2016-02-10 Basile Starynkevitch <basile@starynkevitch.net>

{{merging with even more of GCC 6, using subversion 1.9 svn merge -r227401:227700 ^/trunk }} git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/melt-branch@233282 138bc75d-0d04-0410-961f-82ee72b054a4
author: bstarynk <bstarynk@138bc75d-0d04-0410-961f-82ee72b054a4> 2016-02-10 17:45:52 +0000
committer: bstarynk <bstarynk@138bc75d-0d04-0410-961f-82ee72b054a4> 2016-02-10 17:45:52 +0000
commit: eb76579392e0d61b9f33c90fdd8b620e563d0a12 (patch)
tree: 4307edacc6d226e528b690b44a329d430fe03a61 /gcc
parent: 2d9d01985a7a7866916fafa19c5c296702e69714 (diff)
download: gcc-eb76579392e0d61b9f33c90fdd8b620e563d0a12.tar.gz
268 files changed, 7947 insertions, 1407 deletions
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index e686783f232..abb1d6d0e53 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,698 @@
+2015-09-11  Jeff Law  <law@redhat.com>
+
+	PR tree-optimization/47679
+	* tree-ssa-dom.c (struct cond_equivalence): Update comment.
+	* tree-ssa-scopedtables.h (class const_and_copies): Prefix data
+	member with m_.  Update inline member functions as necessary.  Add
+	toplevel comment.
+	* tree-ssa-scopedtables.c: Update const_and_copies's member
+	functions to use m_ prefix to access the stack.
+
+2015-09-11  Aditya Kumar  <aditya.k7@samsung.com>
+
+        * graphite-optimize-isl.c (disable_tiling): Remove.
+	(get_schedule_for_band): Do not use disable_tiling.
+	(get_prevector_map): Delete function.
+	(enable_polly_vector): Remove.
+        (get_schedule_for_band_list): Remove dead code.
+
+2015-09-11  Aditya Kumar  <aditya.k7@samsung.com>
+
+        * graphite-optimize-isl.c (get_tile_map): Refactor.
+        (get_schedule_for_band): Same.
+        (getScheduleForBand): Same.
+        (get_prevector_map): Same.
+        (get_schedule_for_band_list): Same.
+        (get_schedule_map): Same.
+        (get_single_map): Same.
+        (apply_schedule_map_to_scop): Same.
+        (optimize_isl): Same.
+
+2015-09-10  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>
+
+	PR target/63304
+        * config/aarch64/aarch.md (mov<mode>:GPF_F16): Use GPF_TF_F16.
+        (movtf): Delete.
+        * config/aarch64/iterators.md (GPF_TF_F16): New.
+        (GPF_F16): Delete.
+
+2015-09-10  Nathan Sidwell  <nathan@acm.org>
+
+	* config/nvptx/nvptx.c (nvptx_expand_call): Add spacing.
+	(nvptx_reorg): Adjust comments.
+
+2015-09-15  John David Anglin  <danglin@gcc.gnu.org>
+
+	PR bootstrap/67363
+	* configure.ac: Check if setenv and unsetenv are declared.
+	* configure: Rebuild.
+	* config.in: Rebuild.
+	* system.h: Declare setenv and unsetenv if not declared.
+
+2015-09-10  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
+
+	* config/rs6000/rs6000.c (swap_web_entry): Update preceding
+	commentary to simplify permute mask adjustment equation.
+	(special_handling_values): Add SH_VPERM.
+	(const_load_sequence_p): New function.
+	(insn_is_swappable_p): Add logic to recognize an UNSPEC_VPERM with
+	the mask loaded from the constant pool.
+	(adjust_vperm): New function.
+	(handle_special_swappables): Call adjust_vperm.
+	(dump_swap_insn_table): Handle SH_VPERM.
+
+2015-09-10  H.J. Lu  <hongjiu.lu@intel.com>
+
+	* shrink-wrap.c (requires_stack_frame_p): Remove static.
+	* shrink-wrap.h (requires_stack_frame_p): Put back.
+
+2015-09-10  Richard Sandiford  <richard.sandiford@arm.com>
+
+	* reload1.c (elimination_costs_in_insn): Locally turn
+	-Wmaybe-uninitialized into a warning.
+
+2015-09-10  Segher Boessenkool  <segher@kernel.crashing.org>
+
+	* shrink-wrap.c (requires_stack_frame_p): Make static.
+	(prepare_shrink_wrap): Likewise.
+	(dup_block_and_redirect): Likewise.
+	* shrink-wrap.h: Remove declarations of those functions.
+
+2015-09-10  Mark Wielaard  <mjw@redhat.com>
+
+	* doc/invoke.texi (Wnonnull): Also warns when comparing against NULL.
+
+2015-09-10  Oleg Endo  <olegendo@gcc.gnu.org>
+
+	PR target/67506
+	* config/sh/sh.c (sh_extending_set_of_reg::use_as_extended_reg): Add
+	missing simplify_gen_subreg.
+
+2015-09-10  Andreas Krebbel  <krebbel@linux.vnet.ibm.com>
+
+	* config/s390/s390.c (s390_contiguous_bitmask_vector_p): Reject if
+	the vector element is bigger than 64 bit.
+
+2015-09-10  Andreas Krebbel  <krebbel@linux.vnet.ibm.com>
+
+	* config/s390/vx-builtins.md ("vec_vmal<mode>", "vec_vmah<mode>")
+	("vec_vmalh<mode>"): Change mode iterator from VI_HW to VI_HW_QHS.
+
+2015-09-10  Andreas Krebbel  <krebbel@linux.vnet.ibm.com>
+
+	* config/s390/s390.c: Add V1TImode to constant pool modes.
+
+2015-09-10  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
+
+	PR target/67439
+	* config/arm/arm.md (*arm32_movhf): Remove !arm_restrict_it from
+	predicate.  Set predicable_short_it attr to "no".
+
+2015-09-10  Jiong Wang  <jiong.wang@arm.com>
+
+	PR rtl-optimization/67421
+	* expr.c (expand_expr_real_2): Cost instrcution sequences when doing
+	left wide shift tranformation.
+
+2015-09-10  Claudiu Zissulescu  <claziss@synopsys.com>
+
+	* common/config/arc/arc-common.c: Remove references to A5.
+	* config/arc/arc-opts.h: Likewise.
+	* config/arc/arc.c, config/arc/arc.h, config/arc/arc.md: Likewise.
+	* config/arc/arc.opt, config/arc/constraints.md: Likewise.
+	* config/arc/t-arc-newlib: Likewise.
+
+2015-09-10  Claudiu Zissulescu <claziss@synopsys.com>
+
+	* config/arc/arc.md (length): Fix attribute length for conditional
+	executed instructions with long immediate.
+
+2015-09-10  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
+
+	* config/aarch64/aarch64.md (*and<mode>3nr_compare0): Use logics_imm
+	type for second alternative.
+
+2015-09-10  Markus Trippelsdorf  <markus@trippelsdorf.de>
+
+	* doc/invoke.texi (Downloading GCC): Mention
+	contrib/download_prerequisites script.
+
+2015-09-10  Jakub Jelinek  <jakub@redhat.com>
+
+	PR c++/67523
+	* gimplify.c (gimplify_omp_for): If inner stmt is not found
+	for combined loop, assert seen_error () and return GS_ERROR.
+
+	PR middle-end/67521
+	* gimplify.c (gimplify_omp_for): Don't call omp_add_variable
+	if decl is already in outer->variables.
+
+	PR middle-end/67517
+	* gimplify.c (gimplify_scan_omp_clauses): Instead of
+	asserting that decl is not specified in octx->variables,
+	break out of the loop if it is.
+
+	PR c++/67514
+	* gimplify.c (gimplify_omp_for): For loop SIMD construct, if
+	iterator is not explicitly determined, but is defined inside
+	of the combined workshare region, handle it like if it has
+	DECL_EXPR in OMP_FOR_PRE_BODY.
+
+2015-09-09  Nathan Sidwell  <nathan@acm.org>
+
+	* config/nvptx/nvptx.md (call_operation): Move bound out of loop.
+	(*cmp<mode>): Add assembler spacing.
+	(setcc_int<mode>, set_cc_float<mode>): Likewise.
+	* config/nvptx/nvptx.c (nvptx_option_override): Override debug
+	level.
+	(write_func_decl_from_insn): Refactor argument loops & comma emission.
+	(nvptx_expand_call): Likewise.
+	(nvptx_output_call_insn): Likewise.
+	(nvptx_reorg_subreg): Add spacing.
+
+2015-09-09  Marek Polacek  <polacek@redhat.com>
+
+	PR middle-end/67512
+	* tree-ssa-uninit.c (pred_equal_p): Only call invert_tree_comparison
+	for comparisons.
+
+2015-09-09  Paolo Carlini  <paolo.carlini@oracle.com>
+
+	PR c++/53184
+	* doc/invoke.texi ([Wsubobject-linkage]): Document.
+
+2015-09-09  Tom de Vries  <tom@codesourcery.com>
+
+	* params-list.h: Add missing copyright notice.
+
+2015-09-09  Nathan Sidwell  <nathan@acm.org>
+
+	* config/nvptx/nvptx.md (atomic_compare_and_swap<mode>): Use
+	sel_truesi, not andsi.
+
+2015-09-09  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
+
+	* config/arm/arm.md (*subsi3_compare0): Rename to...
+	(subsi3_compare0): ... This.
+	(modsi3): New define_expand.
+	* config/arm/arm.c (arm_new_rtx_costs, MOD case): Handle case
+	when operand is power of 2.
+
+2015-09-09  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
+
+	* config/aarch64/aarch64.md (mod<mode>3): New define_expand.
+	(*neg<mode>2_compare0): Rename to...
+	(neg<mode>2_compare0): ... This.
+	* config/aarch64/aarch64.c (aarch64_rtx_costs, MOD case):
+	Move check for speed inside the if-then-elses.  Reflect
+	CSNEG sequence in MOD by power of 2 case.
+
+2015-09-09  Alan Modra  <amodra@gmail.com>
+
+	PR target/67378
+	* config/rs6000/rs6000.c (rs6000_secondary_reload_gpr): Find
+	reload replacement for PRE_MODIFY address reg.
+
+2015-09-09  Sebastian Pop  <s.pop@samsung.com>
+
+	PR tree-optimization/53852
+	* config.in: Regenerate.
+	* configure: Regenerate.
+	* configure.ac (HAVE_ISL_CTX_MAX_OPERATIONS): Detect.
+	* graphite-optimize-isl.c (optimize_isl): Stop computation when
+	PARAM_MAX_ISL_OPERATIONS is reached.
+	* params.def (PARAM_MAX_ISL_OPERATIONS): Add.
+	* graphite-dependences.c (extend_schedule): Remove gcc_asserts on
+	result equal to isl_stat_ok as the status now can be isl_error_quota.
+	(subtract_commutative_associative_deps): Same.
+	(compute_deps): Same.
+
+2015-09-08  Aditya Kumar  <hiraditya@msn.com>
+            Sebastian Pop  <s.pop@samsung.com>
+
+	* graphite-isl-ast-to-gimple.c (gcc_expression_from_isl_ast_expr_id):
+	Return the parameter if it was saved in corresponding
+	parameter_rename_map of the region.
+	(copy_def): Copy def from sese region to the newly created region.
+	(copy_internal_parameters): Copy all the internal parameters defined
+	within a region to the newly created region.
+	(graphite_regenerate_ast_isl): Copy parameters to the new region before
+	translating isl to gimple.
+	* graphite-scop-detection.c (graphite_can_represent_loop): Bail out if
+	the loop-nest does not have any data-references.
+	(build_graphite_scops): Create a scop only when there is at least one
+	loop inside it.
+	(contains_only_close_phi_nodes): Deleted.
+	(print_graphite_scop_statistics): Deleted
+	(print_graphite_statistics): Deleted
+	(limit_scops): Deleted.
+	(build_scops): Removed call to limit_scops.
+	* sese.c (new_sese): Construct.
+	(free_sese): Destruct.
+	(sese_add_exit_phis_edge): update_stmt after exit phi edge has been
+	added.
+	(set_rename): Pass sese region so that parameters inside the region can
+	be added to its parameter_rename_map.
+	(rename_uses): Pass sese region.
+	(graphite_copy_stmts_from_block): Do not copy parameters that have been
+	generated in the header of the scop. For each SSA_NAME in the
+	parameter_rename_map rename its usage.
+	(invariant_in_sese_p_rec): Return false if tree t is defined outside
+	sese region.
+	(scalar_evolution_in_region): If the tree t is invariant just return t.
+	* sese.h: Added a parameter renamne map (parameter_rename_map_t) to
+	struct sese to keep track of all the parameters which need renaming.
+	* tree-data-ref.c (loop_nest_has_data_refs): Check if a loop nest has
+	any data-refs.
+	* tree-data-ref.h: Declaration of loop_nest_has_data_refs.
+
+2015-09-08  Tom de Vries  <tom@codesourcery.com>
+
+	* Makefile.in (generated_files): Add params.list.
+	(params.list, s-params.list): Add rule.
+	* params.h (enum compiler_param): Include params-list.h.  Move define
+	DEFPARAM, include params.def and undef DEFPARAM ...
+	* params-list.h: ... here.  New file.
+
+2015-09-08  David Malcolm  <dmalcolm@redhat.com>
+
+	* pretty-print.h (printer_fn): Fix typo in comment.
+
+2015-09-07  Jeff Law  <law@redhat.com>
+
+	* tree-ssa-scopedtables.h (class const_and_copies): Fix comment typo.
+
+2015-09-08  Alan Lawrence  <alan.lawrence@arm.com>
+
+	* doc/sourcebuild.texi (arm_neon_fp16): Correct cross-reference.
+	(arm_neon_fp16_ok): Document adding of -mfp16-format=ieee flag.
+	(arm_neon_fp16_hw): New.
+
+2015-09-08  Alan Lawrence  <alan.lawrence@arm.com>
+
+	* fold-const.c (native_interpret_real): Fix HFmode for bigendian where
+	UNITS_PER_WORD >= 4.
+
+2015-09-08  Alan Lawrence  <alan.lawrence@arm.com>
+
+	* config/aarch64/aarch64-simd.md (aarch64_simd_vec_unpacks_lo_<mode>,
+	aarch64_simd_vec_unpacks_hi_<mode>): New insn.
+	(vec_unpacks_lo_v4sf, vec_unpacks_hi_v4sf): Delete insn.
+	(vec_unpacks_lo_<mode>, vec_unpacks_hi_<mode>): New expand.
+	(aarch64_float_extend_lo_v2df): Rename to...
+	(aarch64_float_extend_lo_<Vwide>): this, using VDF and so adding V4SF.
+
+	* config/aarch64/aarch64-simd-builtins.def (vec_unpacks_hi): Add v8hf.
+	(float_extend_lo): Add v4sf.
+
+	* config/aarch64/arm_neon.h (vcvt_f32_f16, vcvt_high_f32_f16): New.
+	* config/aarch64/iterators.md (VQ_HSF): New iterator.
+	(VWIDE, Vwtype, Vhalftype): Add V8HF, V4SF.
+	(Vwide): New mode_attr.
+
+2015-09-08  Alan Lawrence  <alan.lawrence@arm.com>
+
+	* config/aarch64/aarch64-simd.md (aarch64_simd_dup<mode>,
+	aarch64_dup_lane<mode>, aarch64_dup_lane_<vswap_width_name><mode>,
+	aarch64_simd_vec_set<mode>, vec_set<mode>, vec_perm_const<mode>,
+	vec_init<mode>, *aarch64_simd_ld1r<mode>, vec_extract<mode>): Add
+	V4HF and V8HF variants to iterator.
+
+	* config/aarch64/aarch64.c (aarch64_evpc_dup): Add V4HF and V8HF cases.
+
+	* config/aarch64/iterators.md (VDQF_F16): New.
+	(VSWAP_WIDTH, vswap_width_name): Add V4HF and V8HF cases.
+
+2015-09-08  Alan Lawrence  <alan.lawrence@arm.com>
+
+	* config/aarch64/arm_neon.h (vreinterpret_p8_f16, vreinterpret_p16_f16,
+	vreinterpret_f16_f64, vreinterpret_f16_s8, vreinterpret_f16_s16,
+	vreinterpret_f16_s32, vreinterpret_f16_s64, vreinterpret_f16_f32,
+	vreinterpret_f16_u8, vreinterpret_f16_u16, vreinterpret_f16_u32,
+	vreinterpret_f16_u64, vreinterpret_f16_p8, vreinterpret_f16_p16,
+	vreinterpretq_f16_f64, vreinterpretq_f16_s8, vreinterpretq_f16_s16,
+	vreinterpretq_f16_s32, vreinterpretq_f16_s64, vreinterpretq_f16_f32,
+	vreinterpretq_f16_u8, vreinterpretq_f16_u16, vreinterpretq_f16_u32,
+	vreinterpretq_f16_u64, vreinterpretq_f16_p8, vreinterpretq_f16_p16,
+	vreinterpret_f32_f16, vreinterpret_f64_f16, vreinterpret_s64_f16,
+	vreinterpret_u64_f16, vreinterpretq_u64_f16, vreinterpret_s8_f16,
+	vreinterpret_s16_f16, vreinterpret_s32_f16, vreinterpret_u8_f16,
+	vreinterpret_u16_f16, vreinterpret_u32_f16, vreinterpretq_p8_f16,
+	vreinterpretq_p16_f16, vreinterpretq_f32_f16, vreinterpretq_f64_f16,
+	vreinterpretq_s64_f16, vreinterpretq_s8_f16, vreinterpretq_s16_f16,
+	vreinterpretq_s32_f16, vreinterpretq_u8_f16, vreinterpretq_u16_f16,
+	vreinterpretq_u32_f16, vget_low_f16, vget_high_f16, vld1_dup_f16,
+	vld1q_dup_f16): New.
+
+2015-09-08  Alan Lawrence  <alan.lawrence@arm.com>
+
+	* config/aarch64/aarch64-simd.md (aarch64_float_truncate_lo_v2sf):
+	Reparameterize to...
+	(aarch64_float_truncate_lo_<mode>): ...this, for both V2SF and V4HF.
+	(aarch64_float_truncate_hi_v4sf): Reparameterize to...
+	(aarch64_float_truncate_hi_<Vdbl>): ...this, for both V4SF and V8HF.
+
+	* config/aarch64/aarch64-simd-builtins.def (float_truncate_hi_): Add
+	v8hf variant.
+	(float_truncate_lo_): Use BUILTIN_VDF iterator.
+
+	* config/aarch64/arm_neon.h (vcvt_f16_f32, vcvt_high_f16_f32): New.
+
+	* config/aarch64/iterators.md (VDF, Vdtype): New.
+	(VWIDE, Vmwtype): Add cases for V4HF and V2SF.
+
+2015-09-08  Alan Lawrence  <alan.lawrence@arm.com>
+
+	* config/aarch64/aarch64.c (aarch64_split_simd_combine): Add V4HFmode.
+	* config/aarch64/aarch64-builtins.c (VAR13, VAR14): New.
+	(aarch64_scalar_builtin_types, aarch64_init_simd_builtin_scalar_types):
+	Add __builtin_aarch64_simd_hf.
+	* config/aarch64/arm_neon.h (float16x4x2_t, float16x8x2_t,
+	float16x4x3_t, float16x8x3_t, float16x4x4_t, float16x8x4_t,
+	vcombine_f16, vst2_lane_f16, vst2q_lane_f16, vst3_lane_f16,
+	vst3q_lane_f16, vst4_lane_f16, vst4q_lane_f16, vld2_f16, vld2q_f16,
+	vld3_f16, vld3q_f16, vld4_f16, vld4q_f16, vld2_dup_f16, vld2q_dup_f16,
+	vld3_dup_f16, vld3q_dup_f16, vld4_dup_f16, vld4q_dup_f16,
+	vld2_lane_f16, vld2q_lane_f16, vld3_lane_f16, vld3q_lane_f16,
+	vld4_lane_f16, vld4q_lane_f16, vst2_f16, vst2q_f16, vst3_f16,
+	vst3q_f16, vst4_f16, vst4q_f16, vcreate_f16): New.
+
+	* config/aarch64/iterators.md (VALLDIF, Vtype, Vetype, Vbtype,
+	V_cmp_result, v_cmp_result): Add cases for V4HF and V8HF.
+	(VDC, Vdbl): Add V4HF.
+
+2015-09-08  Alan Lawrence  <alan.lawrence@arm.com>
+
+	* config/aarch64/aarch64.c (aarch64_vector_mode_supported_p): Support
+	V4HFmode and V8HFmode.
+	(aarch64_split_simd_move): Add case for V8HFmode.
+	* config/aarch64/aarch64-builtins.c (v4hf_UP, v8hf_UP): Define.
+	(aarch64_simd_builtin_std_type): Handle HFmode.
+	(aarch64_init_simd_builtin_types): Include Float16x4_t and Float16x8_t.
+
+	* config/aarch64/aarch64-simd.md (mov<mode>, aarch64_get_lane<mode>,
+	aarch64_ld1<VALL:mode>, aarch64_st1<VALL:mode): Use VALL_F16 iterator.
+	(aarch64_be_ld1<mode>, aarch64_be_st1<mode>): Use VALLDI_F16 iterator.
+
+	* config/aarch64/aarch64-simd-builtin-types.def: Add Float16x4_t,
+	Float16x8_t.
+
+	* config/aarch64/aarch64-simd-builtins.def (ld1, st1): Use VALL_F16.
+	* config/aarch64/arm_neon.h (float16x4_t, float16x8_t, float16_t):
+	New typedefs.
+	(vget_lane_f16, vgetq_lane_f16, vset_lane_f16, vsetq_lane_f16,
+	vld1_f16, vld1q_f16, vst1_f16, vst1q_f16, vst1_lane_f16,
+	vst1q_lane_f16): New.
+	* config/aarch64/iterators.md (VD, VQ, VQ_NO2E): Add vectors of HFmode.
+	(VALLDI_F16, VALL_F16): New.
+	(Vmtype, VEL, VCONQ, VHALF, V_TWO_ELEM, V_THREE_ELEM, V_FOUR_ELEM, q):
+	Add cases for V4HF and V8HF.
+	(VDBL, VRL2, VRL3, VRL4): Add V4HF case.
+
+2015-09-08  Alan Lawrence  <alan.lawrence@arm.com>
+
+	* config/arm/arm-builtins.c (VAR11, VAR12): New.
+	* config/arm/arm_neon_builtins.def (vcombine, vld2_dup, vld3_dup,
+	vld4_dup): Add v4hf variant.
+	(vget_high, vget_low): Add v8hf variant.
+	(vld1, vst1, vst1_lane, vld2, vld2_lane, vst2, vst2_lane, vld3,
+	vld3_lane, vst3, vst3_lane, vld4, vld4_lane, vst4, vst4_lane): Add
+	v4hf and v8hf variants.
+
+	* config/arm/iterators.md (VD_LANE, VD_RE, VQ2, VQ_HS): New.
+	(VDX): Add V4HF.
+	(V_DOUBLE): Add case for V4HF.
+	(VQX): Add V8HF.
+	(V_HALF): Add case for V8HF.
+	(VDQX): Add V4HF, V8HF.
+	(V_elem, V_two_elem, V_three_elem, V_four_elem, V_cmp_result,
+	V_uf_sclr, V_sz_elem, V_mode_nunits, q): Add cases for V4HF & V8HF.
+
+	* config/arm/neon.md (vec_set<mode>internal, vec_extract<mode>,
+	neon_vget_lane<mode>_sext_internal, neon_vget_lane<mode>_zext_internal,
+	vec_load_lanesoi<mode>, neon_vld2<mode>, vec_store_lanesoi<mode>,
+	neon_vst2<mode>, vec_load_lanesci<mode>, neon_vld3<mode>,
+	neon_vld3qa<mode>, neon_vld3qb<mode>, vec_store_lanesci<mode>,
+	neon_vst3<mode>, neon_vst3qa<mode>, neon_vst3qb<mode>,
+	vec_load_lanesxi<mode>, neon_vld4<mode>, neon_vld4qa<mode>,
+	neon_vld4qb<mode>, vec_store_lanesxi<mode>, neon_vst4<mode>,
+	neon_vst4qa<mode>, neon_vst4qb<mode>): Change VQ iterator to VQ2.
+
+	(neon_vcreate, neon_vreinterpretv8qi<mode>,
+	neon_vreinterpretv4hi<mode>, neon_vreinterpretv2si<mode>,
+	neon_vreinterpretv2sf<mode>, neon_vreinterpretdi<mode>):
+	Change VDX to VD_RE.
+
+	(neon_vld2_lane<mode>, neon_vst2_lane<mode>, neon_vld3_lane<mode>,
+	neon_vst3_lane<mode>, neon_vld4_lane<mode>, neon_vst4_lane<mode>):
+	Change VD iterator to VD_LANE, and VMQ iterator to VQ_HS.
+
+	* config/arm/arm_neon.h (float16x4x2_t, float16x8x2_t, float16x4x3_t,
+	float16x8x3_t, float16x4x4_t, float16x8x4_t, vcombine_f16,
+	vget_high_f16, vget_low_f16, vld1_f16, vld1q_f16, vst1_f16, vst1q_f16,
+	vst1_lane_f16, vst1q_lane_f16, vld2_f16, vld2q_f16, vld2_lane_f16,
+	vld2q_lane_f16, vld2_dup_f16, vst2_f16, vst2q_f16, vst2_lane_f16,
+	vst2q_lane_f16, vld3_f16, vld3q_f16, vld3_lane_f16, vld3q_lane_f16,
+	vld3_dup_f16, vst3_f16, vst3q_f16, vst3_lane_f16, vst3q_lane_f16,
+	vld4_f16, vld4q_f16, vld4_lane_f16, vld4q_lane_f16, vld4_dup_f16,
+	vst4_f16, vst4q_f16, vst4_lane_f16, vst4q_lane_f16): New.
+
+2015-09-08  Alan Lawrence  <alan.lawrence@arm.com>
+
+	* config/arm/arm_neon.h (vgetq_lane_f16, vsetq_lane_f16, vld1q_lane_f16,
+	vld1q_dup_f16, vreinterpretq_p8_f16, vreinterpretq_p16_f16,
+	vreinterpretq_f16_p8, vreinterpretq_f16_p16, vreinterpretq_f16_f32,
+	vreinterpretq_f16_p64, vreinterpretq_f16_p128, vreinterpretq_f16_s64,
+	vreinterpretq_f16_u64, vreinterpretq_f16_s8, vreinterpretq_f16_s16,
+	vreinterpretq_f16_s32, vreinterpretq_f16_u8, vreinterpretq_f16_u16,
+	vreinterpretq_f16_u32, vreinterpretq_f32_f16, vreinterpretq_p64_f16,
+	vreinterpretq_p128_f16, vreinterpretq_s64_f16, vreinterpretq_u64_f16,
+	vreinterpretq_s8_f16, vreinterpretq_s16_f16, vreinterpretq_s32_f16,
+	vreinterpretq_u8_f16, vreinterpretq_u16_f16, vreinterpretq_u32_f16):
+	New.
+
+2015-09-08  Alan Lawrence  <alan.lawrence@arm.com>
+
+	* config/arm/arm.h (VALID_NEON_QREG_MODE): Add V8HFmode.
+
+	* config/arm/arm.c (arm_vector_mode_supported_p): Support V8HFmode.
+
+	* config/arm/arm-builtins.c (v8hf_UP): New.
+	(arm_init_simd_builtin_types): Initialise Float16x8_t.
+
+	* config/arm/arm-simd-builtin-types.def (Float16x8_t): New.
+
+	* config/arm/arm_neon.h (float16x8_t): New typedef.
+
+2015-09-08  Alan Lawrence  <alan.lawrence@arm.com>
+
+	* config/arm/arm_neon.h (float16_t, vget_lane_f16, vset_lane_f16,
+	vcreate_f16, vld1_lane_f16, vld1_dup_f16, vreinterpret_p8_f16,
+	vreinterpret_p16_f16, vreinterpret_f16_p8, vreinterpret_f16_p16,
+	vreinterpret_f16_f32, vreinterpret_f16_p64, vreinterpret_f16_s64,
+	vreinterpret_f16_u64, vreinterpret_f16_s8, vreinterpret_f16_s16,
+	vreinterpret_f16_s32, vreinterpret_f16_u8, vreinterpret_f16_u16,
+	vreinterpret_f16_u32, vreinterpret_f32_f16, vreinterpret_p64_f16,
+	vreinterpret_s64_f16, vreinterpret_u64_f16, vreinterpret_s8_f16,
+	vreinterpret_s16_f16, vreinterpret_s32_f16, vreinterpret_u8_f16,
+	vreinterpret_u16_f16, vreinterpret_u32_f16): New.
+
+2015-09-07  Ilya Verbin  <ilya.verbin@intel.com>
+
+	* config/i386/intelmic-mkoffload.c (prepare_target_image): Handle all
+	non-alphanumeric characters in the symbol name.
+
+2015-09-07  Marek Polacek  <polacek@redhat.com>
+
+	PR inline-asm/67448
+	* gimplify.c (gimplify_asm_expr): Don't allow MODIFY_EXPR as
+	a memory input.
+
+2015-09-07  Marek Polacek  <polacek@redhat.com>
+
+	* system.h (INTTYPE_MINIMUM): Rewrite to avoid shift warning.
+
+2015-09-04  Paolo Bonzini  <bonzini@gnu.org>
+
+	* config/i386/cygming.h (SUBTARGET_OVERRIDE_OPTIONS): Do
+	not warn.
+
+2015-09-04  Jakub Jelinek  <jakub@redhat.com>
+
+	PR middle-end/67452
+	* tree-ssa-live.c: Include cfgloop.h.
+	(remove_unused_locals): Clear loop->simduid if simduid is about
+	to be removed from cfun->local_decls.
+
+2015-09-02  Senthil Kumar Selvaraj  <senthil_kumar.selvaraj@atmel.com>
+
+	PR target/65210
+	* config/avr/avr.c (avr_eval_addr_attrib): Look for io_low
+	attribute as well.
+
+2015-09-04  Tom de Vries  <tom@codesourcery.com>
+
+	* doc/invoke.texi (@item -ftrapv, @item -fwrapv): Document interaction.
+
+2015-09-04  Jeff Law  <law@redhat.com>
+
+	* tree-ssa-scopedtables.c (const_and_copies::const_and_copies): Remove
+	unnecessary constructor.  It's now trivial and implemented inside...
+	* tree-ssa-scopedtables.h (const_and_copies): Implement trivial
+	constructor.  Add comments to various methods.  Remove unused
+	private fields.
+	* tree-ssa-dom.c (pass_dominator::execute): Corresponding changes.
+	* tree-vrp.c (identify_jump_threads): Likewise.
+	* tree-ssa-threadedge.c (thread_through_normal_block): Fix minor
+	indentation issues.
+	(thread_across_edge): Similarly.
+	(record_temporary_equivalences_from_stmts_at_dest): Remove unused
+	arguments in constructor call.
+
+2015-09-04  Jonas Hahnfeld  <Hahnfeld@itc.rwth-aachen.de>
+
+	* config/i386/intelmic-mkoffload.c (prepare_target_image): Fix if the
+	temp path contains a '-'.
+
+2015-09-04  Andrey Turetskiy  <andrey.turetskiy@intel.com>
+	    Petr Murzin  <petr.murzin@intel.com>
+	    Kirill Yukhin <kirill.yukhin@intel.com>
+
+	* config/i386/i386-builtin-types.def
+	(VOID_PFLOAT_HI_V8DI_V16SF_INT): New.
+	(VOID_PDOUBLE_QI_V16SI_V8DF_INT): Ditto.
+	(VOID_PINT_HI_V8DI_V16SI_INT): Ditto.
+	(VOID_PLONGLONG_QI_V16SI_V8DI_INT): Ditto.
+	* config/i386/i386.c
+	(ix86_builtins): Add IX86_BUILTIN_SCATTERALTSIV8DF,
+	IX86_BUILTIN_SCATTERALTDIV16SF, IX86_BUILTIN_SCATTERALTSIV8DI,
+	IX86_BUILTIN_SCATTERALTDIV16SI.
+	(ix86_init_mmx_sse_builtins): Define __builtin_ia32_scatteraltsiv8df,
+	__builtin_ia32_scatteraltdiv8sf, __builtin_ia32_scatteraltsiv8di,
+	__builtin_ia32_scatteraltdiv8si.
+	(ix86_expand_builtin): Handle IX86_BUILTIN_SCATTERALTSIV8DF,
+	IX86_BUILTIN_SCATTERALTDIV16SF, IX86_BUILTIN_SCATTERALTSIV8DI,
+	IX86_BUILTIN_SCATTERALTDIV16SI.
+	(ix86_vectorize_builtin_scatter): New.
+	(TARGET_VECTORIZE_BUILTIN_SCATTER): Define as
+	ix86_vectorize_builtin_scatter.
+
+2015-09-04  Andrey Turetskiy  <andrey.turetskiy@intel.com>
+	    Petr Murzin  <petr.murzin@intel.com>
+	    Kirill Yukhin <kirill.yukhin@intel.com>
+
+	* doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_SCATTER): New.
+	* doc/tm.texi: Regenerate.
+	* target.def: Add scatter builtin.
+	* tree-vectorizer.h: Rename gather_p to gather_scatter_p and use it
+	for loads/stores in case of gather/scatter accordingly.
+	(STMT_VINFO_GATHER_SCATTER_P(S)): Use it instead of STMT_VINFO_GATHER_P(S).
+	(vect_check_gather): Rename to ...
+	(vect_check_gather_scatter): this.
+	* tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Use
+	STMT_VINFO_GATHER_SCATTER_P instead of STMT_VINFO_SCATTER_P.
+	(vect_check_gather_scatter): Use it instead of vect_check_gather.
+	(vect_analyze_data_refs): Add gatherscatter enum and maybe_scatter variable
+	and new checkings for it accordingly.
+	* tree-vect-stmts.c
+	(STMT_VINFO_GATHER_SCATTER_P(S)): Use it instead of STMT_VINFO_GATHER_P(S).
+	(vect_check_gather_scatter): Use it instead of vect_check_gather.
+	(vectorizable_store): Add checkings for STMT_VINFO_GATHER_SCATTER_P.
+
+2015-09-03  Bill Schmidt  <wschmidt@vnet.linux.ibm.com>
+
+	* config/rs6000/altivec.md (altivec_vperm_v8hiv16qi): New
+	define_insn.
+	(mulv16qi3): New define_expand.
+
+2015-09-03  Martin Sebor  <msebor@redhat.com>
+
+	PR c/66516
+	* doc/extend.texi (Other Builtins): Document when the address
+	of a built-in function can be taken.
+
+2015-09-03  Richard Biener  <rguenther@suse.de>
+
+	* dwarf2out.c (flush_limbo_die_list): Split out from ...
+	(dwarf2out_early_finish): ... here.
+	(dwarf2out_finish): Do not call dwarf2out_early_finish but
+	flush_limbo_die_list.  Assert we have no deferred asm names.
+
+2015-09-03  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
+
+	* optabs.c (expand_binop): Don't create a broadcast vector with a
+	source element wider than the inner mode.
+
+2015-09-03  Richard Biener  <rguenther@suse.de>
+
+	* varasm.c (output_constant): Use fold_convert instead of
+	wide_int_to_tree.
+
+2015-09-03  Tom de Vries  <tom@codesourcery.com>
+
+	PR tree-optimization/65637
+	* omp-low.c (expand_omp_for_static_chunk): Handle case that fin_bb has 2
+	predecessors.
+
+2015-09-03  Tom de Vries  <tom@codesourcery.com>
+
+	PR tree-optimization/65637
+	* omp-low.c (find_phi_with_arg_on_edge): New function.
+	(expand_omp_for_static_chunk): Fix inner loop phi.
+
+2015-09-03  Tom de Vries  <tom@codesourcery.com>
+
+	PR tree-optimization/65637
+	* omp-low.c (expand_omp_for_static_chunk): Fix gcc_assert for the case
+	that head is NULL.
+
+2015-09-03  Tom de Vries  <tom@codesourcery.com>
+
+	* omp-low.c (expand_omp_for_static_chunk): Handle simple latch bb.
+
+2015-09-03  Tom de Vries  <tom@codesourcery.com>
+
+	* doc/invoke.texi (parloops-chunk-size): Add item.
+	* params.def (PARAM_PARLOOPS_CHUNK_SIZE): Add DEFPARAM.
+	* tree-parloops.c: Include params.h.
+	(create_parallel_loop): Set chunk-size of schedule of omp-for loop, if
+	param parloops-chunk-size is used.
+
+2015-09-03  Naveen H.S  <Naveen.Hurugalawadi@caviumnetworks.com>
+
+	PR middle-end/67351
+	* fold-const.c (fold_binary_loc) : Move 
+	Transform (x >> c) << c into x & (-1<<c) or
+	transform (x << c) >> c into x & ((unsigned)-1 >> c) for unsigned
+	types using simplify and match.
+	* match.pd (lshift (rshift @0 INTEGER_CST@1) @1) : New simplifier.
+	(rshift (lshift @0 INTEGER_CST@1) @1) : New Simplifier
+
+2015-09-03  Richard Biener  <rguenther@suse.de>
+
+	PR ipa/66705
+	* tree-ssa-structalias.c (ctor_for_analysis): New function.
+	(create_variable_info_for_1): Use ctor_for_analysis instead
+	of get_constructor.
+	(create_variable_info_for): Likewise.
+
+2015-09-02  Charles Baylis  <charles.baylis@linaro.org>
+
+	PR ipa/67280
+	* cgraphunit.c (cgraph_node::create_wrapper): Set can_throw_external
+	in new callgraph edge.
+
+2015-09-02  Christophe Lyon  <christophe.lyon@linaro.org>
+
+	PR target/59810
+	PR target/63652
+	PR target/63653
+	* config/aarch64/aarch64-simd.md
+	(aarch64_ld<VSTRUCT:nregs><VQ:mode>): Call
+	gen_aarch64_simd_ld<VSTRUCT:nregs><VQ:mode>.
+	(aarch64_st<VSTRUCT:nregs><VQ:mode>): Call
+	gen_aarch64_simd_st<VSTRUCT:nregs><VQ:mode>.
+
 2015-09-02  Alan Modra  <amodra@gmail.com>
 
 	* config/rs6000/sysv4le.h (LINK_TARGET_SPEC): Don't define.
diff --git a/gcc/DATESTAMP b/gcc/DATESTAMP
index 4c97cf77e19..68e1defad14 100644
--- a/gcc/DATESTAMP
+++ b/gcc/DATESTAMP
@@ -1 +1 @@
-20150902
+20150911
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 191cdf177ec..0343d7a4ac0 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2532,7 +2532,7 @@ generated_files = config.h tm.h $(TM_P_H) $(TM_H) multilib.h \
        $(ALL_GTFILES_H) gtype-desc.c gtype-desc.h gcov-iov.h \
        options.h target-hooks-def.h insn-opinit.h \
        common/common-target-hooks-def.h pass-instances.def \
-       c-family/c-target-hooks-def.h
+       c-family/c-target-hooks-def.h params.list
 
 #
 # How to compile object files to run on the build machine.
@@ -3366,6 +3366,12 @@ installdirs:
 	$(mkinstalldirs) $(DESTDIR)$(man1dir)
 	$(mkinstalldirs) $(DESTDIR)$(man7dir)
 
+params.list: s-params.list; @true
+s-params.list: $(srcdir)/params-list.h $(srcdir)/params.def
+	$(CPP) $(srcdir)/params-list.h | sed 's/^#.*//;/^$$/d' > tmp-params.list
+	$(SHELL) $(srcdir)/../move-if-change tmp-params.list params.list
+	$(STAMP) s-params.list
+
 PLUGIN_HEADERS = $(TREE_H) $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
   toplev.h $(DIAGNOSTIC_CORE_H) $(BASIC_BLOCK_H) $(HASH_TABLE_H) \
   tree-ssa-alias.h $(INTERNAL_FN_H) gimple-fold.h tree-eh.h gimple-expr.h \
diff --git a/gcc/c-family/ChangeLog b/gcc/c-family/ChangeLog
index 8b4009e85fd..3acc84f433b 100644
--- a/gcc/c-family/ChangeLog
+++ b/gcc/c-family/ChangeLog
@@ -1,3 +1,26 @@
+2015-09-09  Paolo Carlini  <paolo.carlini@oracle.com>
+
+	PR c++/53184
+	* c.opt ([Wsubobject-linkage]): Add.
+
+2015-09-03  Martin Sebor  <msebor@redhat.com>
+
+	PR c/66516
+	* c-common.h (c_decl_implicit, reject_gcc_builtin): Declare new
+	functions.
+	* c-common.c (reject_gcc_builtin): Define.
+
+2015-09-02  Balaji V. Iyer  <balaji.v.iyer@intel.com>
+
+	PR middle-end/60586
+	* c-common.h (cilk_gimplify_call_params_in_spawned_fn): New
+	prototype.
+	* c-gimplify.c (c_gimplify_expr): Added a call to the function
+	cilk_gimplify_call_params_in_spawned_fn.
+	* cilk.c (cilk_gimplify_call_params_in_spawned_fn): New function.
+	(gimplify_cilk_spawn): Removed EXPR_STMT and CLEANUP_POINT_EXPR
+	unwrapping.
+
 2015-08-25  Marek Polacek  <polacek@redhat.com>
 
 	PR middle-end/67330
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index adfd1e6c301..02866235119 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -12882,4 +12882,41 @@ pointer_to_zero_sized_aggr_p (tree t)
   return (TYPE_SIZE (t) && integer_zerop (TYPE_SIZE (t)));
 }
 
+/* For an EXPR of a FUNCTION_TYPE that references a GCC built-in function
+   with no library fallback or for an ADDR_EXPR whose operand is such type
+   issues an error pointing to the location LOC.
+   Returns true when the expression has been diagnosed and false
+   otherwise.  */
+bool
+reject_gcc_builtin (const_tree expr, location_t loc /* = UNKNOWN_LOCATION */)
+{
+  if (TREE_CODE (expr) == ADDR_EXPR)
+    expr = TREE_OPERAND (expr, 0);
+
+  if (TREE_TYPE (expr)
+      && TREE_CODE (TREE_TYPE (expr)) == FUNCTION_TYPE
+      && DECL_P (expr)
+      /* The intersection of DECL_BUILT_IN and DECL_IS_BUILTIN avoids
+	 false positives for user-declared built-ins such as abs or
+	 strlen, and for C++ operators new and delete.
+	 The c_decl_implicit() test avoids false positives for implicitly
+	 declared built-ins with library fallbacks (such as abs).  */
+      && DECL_BUILT_IN (expr)
+      && DECL_IS_BUILTIN (expr)
+      && !c_decl_implicit (expr)
+      && !DECL_ASSEMBLER_NAME_SET_P (expr))
+    {
+      if (loc == UNKNOWN_LOCATION)
+	loc = EXPR_LOC_OR_LOC (expr, input_location);
+
+      /* Reject arguments that are built-in functions with
+	 no library fallback.  */
+      error_at (loc, "built-in function %qE must be directly called", expr);
+
+      return true;
+    }
+
+  return false;
+}
+
 #include "gt-c-family-c-common.h"
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index d1f6cba20e3..33d228af259 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -573,6 +573,7 @@ extern int field_decl_cmp (const void *, const void *);
 extern void resort_sorted_fields (void *, void *, gt_pointer_operator,
 				  void *);
 extern bool has_c_linkage (const_tree decl);
+extern bool c_decl_implicit (const_tree);
 
 /* Switches common to the C front ends.  */
 
@@ -1425,6 +1426,8 @@ extern vec <tree, va_gc> *fix_sec_implicit_args
 extern tree insert_cilk_frame (tree);
 extern void cilk_init_builtins (void);
 extern int gimplify_cilk_spawn (tree *);
+extern void cilk_gimplify_call_params_in_spawned_fn (tree *, gimple_seq *,
+						     gimple_seq *);
 extern void cilk_install_body_with_frame_cleanup (tree, tree, void *);
 extern bool cilk_detect_spawn_and_unwrap (tree *);
 extern bool cilk_set_spawn_marker (location_t, tree);
@@ -1438,5 +1441,6 @@ extern bool contains_cilk_spawn_stmt (tree);
 extern tree cilk_for_number_of_iterations (tree);
 extern bool check_no_cilk (tree, const char *, const char *,
 		           location_t loc = UNKNOWN_LOCATION);
+extern bool reject_gcc_builtin (const_tree, location_t = UNKNOWN_LOCATION);
 
 #endif /* ! GCC_C_COMMON_H */
diff --git a/gcc/c-family/c-gimplify.c b/gcc/c-family/c-gimplify.c
index 1c81773a91c..92987b588f7 100644
--- a/gcc/c-family/c-gimplify.c
+++ b/gcc/c-family/c-gimplify.c
@@ -288,8 +288,10 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p ATTRIBUTE_UNUSED,
 
       /* If errors are seen, then just process it as a CALL_EXPR.  */
       if (!seen_error ())
-	return (enum gimplify_status) gimplify_cilk_spawn (expr_p);
-
+	{
+	  cilk_gimplify_call_params_in_spawned_fn (expr_p, pre_p, post_p);
+	  return (enum gimplify_status) gimplify_cilk_spawn (expr_p);
+	}
     case MODIFY_EXPR:
     case INIT_EXPR:
     case CALL_EXPR:
@@ -299,7 +301,10 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p ATTRIBUTE_UNUSED,
 	     original expression (MODIFY/INIT/CALL_EXPR) is processes as
 	     it is supposed to be.  */
 	  && !seen_error ())
-	return (enum gimplify_status) gimplify_cilk_spawn (expr_p);
+	{
+	  cilk_gimplify_call_params_in_spawned_fn (expr_p, pre_p, post_p);
+	  return (enum gimplify_status) gimplify_cilk_spawn (expr_p);
+	}
 
     default:;
     }
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 050dcb02734..d519d7a000b 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -944,6 +944,11 @@ Wuseless-cast
 C++ ObjC++ Var(warn_useless_cast) Warning
 Warn about useless casts
 
+Wsubobject-linkage
+C++ ObjC++ Var(warn_subobject_linkage) Warning Init(1)
+Warn if a class type has a base or a field whose type uses the anonymous
+namespace or depends on a type with no linkage
+
 ansi
 C ObjC C++ ObjC++
 A synonym for -std=c89 (for C) or -std=c++98 (for C++)
diff --git a/gcc/c-family/cilk.c b/gcc/c-family/cilk.c
index 1012a4fd3aa..1c316a4eb78 100644
--- a/gcc/c-family/cilk.c
+++ b/gcc/c-family/cilk.c
@@ -762,6 +762,34 @@ create_cilk_wrapper (tree exp, tree *args_out)
   return fndecl;
 }
 
+/* Gimplify all the parameters for the Spawned function.  *EXPR_P can be a
+   CALL_EXPR, INIT_EXPR, MODIFY_EXPR or TARGET_EXPR.  *PRE_P and *POST_P are
+   gimple sequences from the caller of gimplify_cilk_spawn.  */
+
+void
+cilk_gimplify_call_params_in_spawned_fn (tree *expr_p, gimple_seq *pre_p,
+					 gimple_seq *post_p)
+{
+  int ii = 0;
+  tree *fix_parm_expr = expr_p;
+
+  /* Remove CLEANUP_POINT_EXPR and EXPR_STMT from *spawn_p.  */
+  while (TREE_CODE (*fix_parm_expr) == CLEANUP_POINT_EXPR
+	 || TREE_CODE (*fix_parm_expr) == EXPR_STMT)
+    *fix_parm_expr = TREE_OPERAND (*fix_parm_expr, 0);
+
+  if ((TREE_CODE (*expr_p) == INIT_EXPR)
+      || (TREE_CODE (*expr_p) == TARGET_EXPR)
+      || (TREE_CODE (*expr_p) == MODIFY_EXPR))
+    fix_parm_expr = &TREE_OPERAND (*expr_p, 1);
+
+  if (TREE_CODE (*fix_parm_expr) == CALL_EXPR)
+    for (ii = 0; ii < call_expr_nargs (*fix_parm_expr); ii++)
+      gimplify_expr (&CALL_EXPR_ARG (*fix_parm_expr, ii), pre_p, post_p,
+		     is_gimple_reg, fb_rvalue);
+}
+
+
 /* Transform *SPAWN_P, a spawned CALL_EXPR, to gimple.  *SPAWN_P can be a
    CALL_EXPR, INIT_EXPR or MODIFY_EXPR.  Returns GS_OK if everything is fine,
    and GS_UNHANDLED, otherwise.  */
@@ -779,12 +807,6 @@ gimplify_cilk_spawn (tree *spawn_p)
 
   cfun->calls_cilk_spawn = 1;
   cfun->is_cilk_function = 1;
-
-  /* Remove CLEANUP_POINT_EXPR and EXPR_STMT from *spawn_p.  */
-  while (TREE_CODE (expr) == CLEANUP_POINT_EXPR
-	 || TREE_CODE (expr) == EXPR_STMT)
-    expr = TREE_OPERAND (expr, 0);
-  
   new_args = NULL;
   function = create_cilk_wrapper (expr, &new_args);
 
diff --git a/gcc/c/ChangeLog b/gcc/c/ChangeLog
index 275d7879fd6..325686a4b1c 100644
--- a/gcc/c/ChangeLog
+++ b/gcc/c/ChangeLog
@@ -1,3 +1,57 @@
+2015-09-09  Mark Wielaard  <mjw@redhat.com>
+
+	* c-typeck.c (build_binary_op): Check and warn when nonnull arg
+	parm against NULL.
+
+2015-09-10  Jakub Jelinek  <jakub@redhat.com>
+
+	PR c/67502
+	* c-parser.c (c_parser_omp_for_loop): Emit DECL_EXPR stmts
+	into OMP_FOR_PRE_BODY rather than before the loop.
+
+2015-09-09  Jakub Jelinek  <jakub@redhat.com>
+
+	PR c/67501
+	* c-parser.c (c_parser_oacc_all_clauses,
+	c_parser_omp_all_clauses): Remove invalid clause from
+	list of clauses even if parser->error is set.
+
+	PR c/67500
+	* c-parser.c (c_parser_omp_clause_aligned,
+	c_parser_omp_clause_safelen, c_parser_omp_clause_simdlen): Fix up
+	test for errors.
+	* c-decl.c (temp_pop_parm_decls): Allow b->decl equal to
+	error_mark_node.
+
+	PR c/67495
+	* c-parser.c (c_parser_omp_atomic): Use c_parser_cast_expression
+	instead of c_parser_unary_expression.  If the result is !lvalue_p,
+	wrap the result of c_fully_fold into NON_LVALUE_EXPR.
+
+2015-09-04  Marek Polacek  <polacek@redhat.com>
+
+	PR sanitizer/67279
+	* c-typeck.c (build_binary_op): Don't instrument static initializers.
+
+2015-09-03  Martin Sebor  <msebor@redhat.com>
+
+	PR c/66516
+	* c-typeck.c (convert_arguments, parser_build_unary_op,
+	build_conditional_expr, c_cast_expr, convert_for_assignment,
+	build_binary_op, _objc_common_truthvalue_conversion): Call
+	reject_gcc_builtin.
+	(c_decl_implicit): Define.
+
+2015-09-02  Marek Polacek  <polacek@redhat.com>
+
+	PR c/67432
+	* c-parser.c (c_parser_enum_specifier): Give a better error for
+	an empty enum.
+
+2015-08-18  Trevor Saunders  <tbsaunde@tbsaunde.org>
+
+	* c-aux-info.c, c-parser.c, c-tree.h: Remove useless typedefs.
+
 2015-08-12  Marek Polacek  <polacek@redhat.com>
 
 	* c-decl.c (grokdeclarator): Call error_at instead of error and pass
diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index b83c584c701..5e5b6d7dfc2 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -8912,7 +8912,8 @@ temp_pop_parm_decls (void)
   current_scope->bindings = NULL;
   for (; b; b = free_binding_and_advance (b))
     {
-      gcc_assert (TREE_CODE (b->decl) == PARM_DECL);
+      gcc_assert (TREE_CODE (b->decl) == PARM_DECL
+		  || b->decl == error_mark_node);
       gcc_assert (I_SYMBOL_BINDING (b->id) == b);
       I_SYMBOL_BINDING (b->id) = b->shadowed;
       if (b->shadowed && b->shadowed->u.type)
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 30b4302b09e..52eec97152b 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -2555,7 +2555,16 @@ c_parser_enum_specifier (c_parser *parser)
 	  location_t decl_loc, value_loc;
 	  if (c_parser_next_token_is_not (parser, CPP_NAME))
 	    {
-	      c_parser_error (parser, "expected identifier");
+	      /* Give a nicer error for "enum {}".  */
+	      if (c_parser_next_token_is (parser, CPP_CLOSE_BRACE)
+		  && !parser->error)
+		{
+		  error_at (c_parser_peek_token (parser)->location,
+			    "empty enum is invalid");
+		  parser->error = true;
+		}
+	      else
+		c_parser_error (parser, "expected identifier");
 	      c_parser_skip_until_found (parser, CPP_CLOSE_BRACE, NULL);
 	      values = error_mark_node;
 	      break;
@@ -11222,9 +11231,9 @@ c_parser_omp_clause_aligned (c_parser *parser, tree list)
       tree alignment = c_parser_expr_no_commas (parser, NULL).value;
       mark_exp_read (alignment);
       alignment = c_fully_fold (alignment, false, NULL);
-      if (!INTEGRAL_TYPE_P (TREE_TYPE (alignment))
-	  && TREE_CODE (alignment) != INTEGER_CST
-	  && tree_int_cst_sgn (alignment) != 1)
+      if (TREE_CODE (alignment) != INTEGER_CST
+	  || !INTEGRAL_TYPE_P (TREE_TYPE (alignment))
+	  || tree_int_cst_sgn (alignment) != 1)
 	{
 	  error_at (clause_loc, "%<aligned%> clause alignment expression must "
 				"be positive constant integer expression");
@@ -11301,9 +11310,9 @@ c_parser_omp_clause_safelen (c_parser *parser, tree list)
   t = c_parser_expr_no_commas (parser, NULL).value;
   mark_exp_read (t);
   t = c_fully_fold (t, false, NULL);
-  if (!INTEGRAL_TYPE_P (TREE_TYPE (t))
-      && TREE_CODE (t) != INTEGER_CST
-      && tree_int_cst_sgn (t) != 1)
+  if (TREE_CODE (t) != INTEGER_CST
+      || !INTEGRAL_TYPE_P (TREE_TYPE (t))
+      || tree_int_cst_sgn (t) != 1)
     {
       error_at (clause_loc, "%<safelen%> clause expression must "
 			    "be positive constant integer expression");
@@ -11337,9 +11346,9 @@ c_parser_omp_clause_simdlen (c_parser *parser, tree list)
   t = c_parser_expr_no_commas (parser, NULL).value;
   mark_exp_read (t);
   t = c_fully_fold (t, false, NULL);
-  if (!INTEGRAL_TYPE_P (TREE_TYPE (t))
-      && TREE_CODE (t) != INTEGER_CST
-      && tree_int_cst_sgn (t) != 1)
+  if (TREE_CODE (t) != INTEGER_CST
+      || !INTEGRAL_TYPE_P (TREE_TYPE (t))
+      || tree_int_cst_sgn (t) != 1)
     {
       error_at (clause_loc, "%<simdlen%> clause expression must "
 			    "be positive constant integer expression");
@@ -11743,7 +11752,7 @@ c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask mask,
 
       first = false;
 
-      if (((mask >> c_kind) & 1) == 0 && !parser->error)
+      if (((mask >> c_kind) & 1) == 0)
 	{
 	  /* Remove the invalid clause(s) from the list to avoid
 	     confusing the rest of the compiler.  */
@@ -11972,7 +11981,7 @@ c_parser_omp_all_clauses (c_parser *parser, omp_clause_mask mask,
 
       first = false;
 
-      if (((mask >> c_kind) & 1) == 0 && !parser->error)
+      if (((mask >> c_kind) & 1) == 0)
 	{
 	  /* Remove the invalid clause(s) from the list to avoid
 	     confusing the rest of the compiler.  */
@@ -12413,6 +12422,7 @@ c_parser_omp_atomic (location_t loc, c_parser *parser)
   bool structured_block = false;
   bool swapped = false;
   bool seq_cst = false;
+  bool non_lvalue_p;
 
   if (c_parser_next_token_is (parser, CPP_NAME))
     {
@@ -12466,20 +12476,33 @@ c_parser_omp_atomic (location_t loc, c_parser *parser)
     {
     case OMP_ATOMIC_READ:
     case NOP_EXPR: /* atomic write */
-      v = c_parser_unary_expression (parser).value;
+      v = c_parser_cast_expression (parser, NULL).value;
+      non_lvalue_p = !lvalue_p (v);
       v = c_fully_fold (v, false, NULL);
       if (v == error_mark_node)
 	goto saw_error;
+      if (non_lvalue_p)
+	v = non_lvalue (v);
       loc = c_parser_peek_token (parser)->location;
       if (!c_parser_require (parser, CPP_EQ, "expected %<=%>"))
 	goto saw_error;
       if (code == NOP_EXPR)
-	lhs = c_parser_expression (parser).value;
+	{
+	  lhs = c_parser_expression (parser).value;
+	  lhs = c_fully_fold (lhs, false, NULL);
+	  if (lhs == error_mark_node)
+	    goto saw_error;
+	}
       else
-	lhs = c_parser_unary_expression (parser).value;
-      lhs = c_fully_fold (lhs, false, NULL);
-      if (lhs == error_mark_node)
-	goto saw_error;
+	{
+	  lhs = c_parser_cast_expression (parser, NULL).value;
+	  non_lvalue_p = !lvalue_p (lhs);
+	  lhs = c_fully_fold (lhs, false, NULL);
+	  if (lhs == error_mark_node)
+	    goto saw_error;
+	  if (non_lvalue_p)
+	    lhs = non_lvalue (lhs);
+	}
       if (code == NOP_EXPR)
 	{
 	  /* atomic write is represented by OMP_ATOMIC with NOP_EXPR
@@ -12498,10 +12521,13 @@ c_parser_omp_atomic (location_t loc, c_parser *parser)
 	}
       else
 	{
-	  v = c_parser_unary_expression (parser).value;
+	  v = c_parser_cast_expression (parser, NULL).value;
+	  non_lvalue_p = !lvalue_p (v);
 	  v = c_fully_fold (v, false, NULL);
 	  if (v == error_mark_node)
 	    goto saw_error;
+	  if (non_lvalue_p)
+	    v = non_lvalue (v);
 	  if (!c_parser_require (parser, CPP_EQ, "expected %<=%>"))
 	    goto saw_error;
 	}
@@ -12514,7 +12540,7 @@ c_parser_omp_atomic (location_t loc, c_parser *parser)
      old or new x should be captured.  */
 restart:
   eloc = c_parser_peek_token (parser)->location;
-  expr = c_parser_unary_expression (parser);
+  expr = c_parser_cast_expression (parser, NULL);
   lhs = expr.value;
   expr = default_function_array_conversion (eloc, expr);
   unfolded_lhs = expr.value;
@@ -12607,6 +12633,8 @@ restart:
 	}
       /* FALLTHRU */
     default:
+      if (!lvalue_p (unfolded_lhs))
+	lhs = non_lvalue (lhs);
       switch (c_parser_peek_token (parser)->type)
 	{
 	case CPP_MULT_EQ:
@@ -12721,20 +12749,25 @@ stmt_done:
     {
       if (!c_parser_require (parser, CPP_SEMICOLON, "expected %<;%>"))
 	goto saw_error;
-      v = c_parser_unary_expression (parser).value;
+      v = c_parser_cast_expression (parser, NULL).value;
+      non_lvalue_p = !lvalue_p (v);
       v = c_fully_fold (v, false, NULL);
       if (v == error_mark_node)
 	goto saw_error;
+      if (non_lvalue_p)
+	v = non_lvalue (v);
       if (!c_parser_require (parser, CPP_EQ, "expected %<=%>"))
 	goto saw_error;
       eloc = c_parser_peek_token (parser)->location;
-      expr = c_parser_unary_expression (parser);
+      expr = c_parser_cast_expression (parser, NULL);
       lhs1 = expr.value;
       expr = default_function_array_read_conversion (eloc, expr);
       unfolded_lhs1 = expr.value;
       lhs1 = c_fully_fold (lhs1, false, NULL);
       if (lhs1 == error_mark_node)
 	goto saw_error;
+      if (!lvalue_p (unfolded_lhs1))
+	lhs1 = non_lvalue (lhs1);
     }
   if (structured_block)
     {
@@ -12836,7 +12869,8 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
 		       tree clauses, tree *cclauses)
 {
   tree decl, cond, incr, save_break, save_cont, body, init, stmt, cl;
-  tree declv, condv, incrv, initv, ret = NULL;
+  tree declv, condv, incrv, initv, ret = NULL_TREE;
+  tree pre_body = NULL_TREE, this_pre_body;
   bool fail = false, open_brace_parsed = false;
   int i, collapse = 1, nbraces = 0;
   location_t for_loc;
@@ -12880,8 +12914,23 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
 	{
 	  if (i > 0)
 	    vec_safe_push (for_block, c_begin_compound_stmt (true));
+	  this_pre_body = push_stmt_list ();
 	  c_parser_declaration_or_fndef (parser, true, true, true, true, true,
 					 NULL, vNULL);
+	  if (this_pre_body)
+	    {
+	      this_pre_body = pop_stmt_list (this_pre_body);
+	      if (pre_body)
+		{
+		  tree t = pre_body;   
+		  pre_body = push_stmt_list ();
+		  add_stmt (t);
+		  add_stmt (this_pre_body);
+		  pre_body = pop_stmt_list (pre_body);
+		}
+	      else
+		pre_body = this_pre_body;
+	    }
 	  decl = check_for_loop_decls (for_loc, flag_isoc99);
 	  if (decl == NULL)
 	    goto error_init;
@@ -13076,7 +13125,7 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
   if (!fail)
     {
       stmt = c_finish_omp_for (loc, code, declv, initv, condv,
-			       incrv, body, NULL);
+			       incrv, body, pre_body);
       if (stmt)
 	{
 	  if (cclauses != NULL
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index e8c818989ee..4108f27ab7c 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -3339,6 +3339,10 @@ convert_arguments (location_t loc, vec<location_t> arg_loc, tree typelist,
 	  error (invalid_func_diag);
 	  return -1;
 	}
+      else if (TREE_CODE (val) == ADDR_EXPR && reject_gcc_builtin (val))
+	{
+	  return -1;
+	}
       else
 	/* Convert `short' and `char' to full-size `int'.  */
 	parmval = default_conversion (val);
@@ -3376,12 +3380,20 @@ parser_build_unary_op (location_t loc, enum tree_code code, struct c_expr arg)
 {
   struct c_expr result;
 
-  result.value = build_unary_op (loc, code, arg.value, 0);
   result.original_code = code;
   result.original_type = NULL;
 
+  if (reject_gcc_builtin (arg.value))
+    {
+      result.value = error_mark_node;
+    }
+  else
+    {
+      result.value = build_unary_op (loc, code, arg.value, 0);
+
   if (TREE_OVERFLOW_P (result.value) && !TREE_OVERFLOW_P (arg.value))
     overflow_warning (loc, result.value);
+    }
 
   return result;
 }
@@ -4484,6 +4496,12 @@ build_conditional_expr (location_t colon_loc, tree ifexp, bool ifexp_bcp,
   type2 = TREE_TYPE (op2);
   code2 = TREE_CODE (type2);
 
+  if (code1 == POINTER_TYPE && reject_gcc_builtin (op1))
+    return error_mark_node;
+
+  if (code2 == POINTER_TYPE && reject_gcc_builtin (op2))
+    return error_mark_node;
+
   /* C90 does not permit non-lvalue arrays in conditional expressions.
      In C99 they will be pointers by now.  */
   if (code1 == ARRAY_TYPE || code2 == ARRAY_TYPE)
@@ -5222,6 +5240,10 @@ c_cast_expr (location_t loc, struct c_type_name *type_name, tree expr)
   type = groktypename (type_name, &type_expr, &type_expr_const);
   warn_strict_prototypes = saved_wsp;
 
+  if (TREE_CODE (expr) == ADDR_EXPR && !VOID_TYPE_P (type)
+      && reject_gcc_builtin (expr))
+    return error_mark_node;
+
   ret = build_c_cast (loc, type, expr);
   if (type_expr)
     {
@@ -5861,6 +5883,10 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
   rhs = require_complete_type (rhs);
   if (rhs == error_mark_node)
     return error_mark_node;
+
+  if (coder == POINTER_TYPE && reject_gcc_builtin (rhs))
+    return error_mark_node;
+
   /* A non-reference type can convert to a reference.  This handles
      va_start, va_copy and possibly port built-ins.  */
   if (codel == REFERENCE_TYPE && coder != REFERENCE_TYPE)
@@ -10350,6 +10376,14 @@ build_binary_op (location_t location, enum tree_code code,
   if (code0 == ERROR_MARK || code1 == ERROR_MARK)
     return error_mark_node;
 
+  if (code0 == POINTER_TYPE
+      && reject_gcc_builtin (op0, EXPR_LOCATION (orig_op0)))
+    return error_mark_node;
+
+  if (code1 == POINTER_TYPE
+      && reject_gcc_builtin (op1, EXPR_LOCATION (orig_op1)))
+    return error_mark_node;
+
   if ((invalid_op_diag
        = targetm.invalid_binary_op (code, type0, type1)))
     {
@@ -10769,6 +10803,11 @@ build_binary_op (location_t location, enum tree_code code,
 	short_compare = 1;
       else if (code0 == POINTER_TYPE && null_pointer_constant_p (orig_op1))
 	{
+	  if (warn_nonnull
+	      && TREE_CODE (op0) == PARM_DECL && nonnull_arg_p (op0))
+	    warning_at (location, OPT_Wnonnull,
+			"nonnull argument %qD compared to NULL", op0);
+
 	  if (TREE_CODE (op0) == ADDR_EXPR
 	      && decl_with_nonnull_addr_p (TREE_OPERAND (op0, 0)))
 	    {
@@ -10789,6 +10828,11 @@ build_binary_op (location_t location, enum tree_code code,
 	}
       else if (code1 == POINTER_TYPE && null_pointer_constant_p (orig_op0))
 	{
+	  if (warn_nonnull
+	      && TREE_CODE (op1) == PARM_DECL && nonnull_arg_p (op1))
+	    warning_at (location, OPT_Wnonnull,
+			"nonnull argument %qD compared to NULL", op1);
+
 	  if (TREE_CODE (op1) == ADDR_EXPR
 	      && decl_with_nonnull_addr_p (TREE_OPERAND (op1, 0)))
 	    {
@@ -11258,7 +11302,8 @@ build_binary_op (location_t location, enum tree_code code,
   if ((flag_sanitize & (SANITIZE_SHIFT | SANITIZE_DIVIDE
 			| SANITIZE_FLOAT_DIVIDE))
       && do_ubsan_in_current_function ()
-      && (doing_div_or_mod || doing_shift))
+      && (doing_div_or_mod || doing_shift)
+      && !require_constant_value)
     {
       /* OP0 and/or OP1 might have side-effects.  */
       op0 = c_save_expr (op0);
@@ -11330,6 +11375,11 @@ c_objc_common_truthvalue_conversion (location_t location, tree expr)
       error_at (location, "void value not ignored as it ought to be");
       return error_mark_node;
 
+    case POINTER_TYPE:
+      if (reject_gcc_builtin (expr))
+	return error_mark_node;
+      break;
+
     case FUNCTION_TYPE:
       gcc_unreachable ();
 
@@ -12882,3 +12932,13 @@ cilk_install_body_with_frame_cleanup (tree fndecl, tree body, void *w)
   append_to_statement_list (build_stmt (EXPR_LOCATION (body), TRY_FINALLY_EXPR,
 				       	body_list, dtor), &list);
 }
+
+/* Returns true when the function declaration FNDECL is implicit,
+   introduced as a result of a call to an otherwise undeclared
+   function, and false otherwise.  */
+
+bool
+c_decl_implicit (const_tree fndecl)
+{
+  return C_DECL_IMPLICIT (fndecl);
+}
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index be16f5d0d04..278515da096 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -2543,6 +2543,7 @@ cgraph_node::create_wrapper (cgraph_node *target)
   memset (&thunk, 0, sizeof (cgraph_thunk_info));
   thunk.thunk_p = true;
   create_edge (target, NULL, count, CGRAPH_FREQ_BASE);
+  callees->can_throw_external = !TREE_NOTHROW (target->decl);
 
   tree arguments = DECL_ARGUMENTS (decl);
 
diff --git a/gcc/common/config/arc/arc-common.c b/gcc/common/config/arc/arc-common.c
index 95993df355e..489bdb22533 100644
--- a/gcc/common/config/arc/arc-common.c
+++ b/gcc/common/config/arc/arc-common.c
@@ -33,7 +33,7 @@ arc_option_init_struct (struct gcc_options *opts)
 {
   opts->x_flag_no_common = 255; /* Mark as not user-initialized.  */
 
-  /* Which cpu we're compiling for (A5, ARC600, ARC601, ARC700).  */
+  /* Which cpu we're compiling for (ARC600, ARC601, ARC700).  */
   arc_cpu = PROCESSOR_NONE;
 }
 
@@ -82,7 +82,6 @@ arc_handle_option (struct gcc_options *opts, struct gcc_options *opts_set,
 
       switch (value)
 	{
-	case PROCESSOR_A5:
 	case PROCESSOR_ARC600:
 	case PROCESSOR_ARC700:
 	  if (! (opts_set->x_target_flags & MASK_BARREL_SHIFTER) )
diff --git a/gcc/config.in b/gcc/config.in
index 22a4e6b7cb2..431d26218d1 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -934,6 +934,13 @@
 #endif
 
 
+/* Define to 1 if we found a declaration for 'setenv', otherwise define to 0.
+   */
+#ifndef USED_FOR_TARGET
+#undef HAVE_DECL_SETENV
+#endif
+
+
 /* Define to 1 if we found a declaration for 'setrlimit', otherwise define to
    0. */
 #ifndef USED_FOR_TARGET
@@ -1025,6 +1032,13 @@
 #endif
 
 
+/* Define to 1 if we found a declaration for 'unsetenv', otherwise define to
+   0. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_DECL_UNSETENV
+#endif
+
+
 /* Define to 1 if we found a declaration for 'vasprintf', otherwise define to
    0. */
 #ifndef USED_FOR_TARGET
@@ -1332,6 +1346,12 @@
 #endif
 
 
+/* Define if isl_ctx_get_max_operations exists. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_ISL_CTX_MAX_OPERATIONS
+#endif
+
+
 /* Define if isl_options_set_schedule_serialize_sccs exists. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index e3a90b5e4dd..5a0426348ee 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -61,6 +61,7 @@
 
 #define v8qi_UP  V8QImode
 #define v4hi_UP  V4HImode
+#define v4hf_UP  V4HFmode
 #define v2si_UP  V2SImode
 #define v2sf_UP  V2SFmode
 #define v1df_UP  V1DFmode
@@ -68,6 +69,7 @@
 #define df_UP    DFmode
 #define v16qi_UP V16QImode
 #define v8hi_UP  V8HImode
+#define v8hf_UP  V8HFmode
 #define v4si_UP  V4SImode
 #define v4sf_UP  V4SFmode
 #define v2di_UP  V2DImode
@@ -295,6 +297,12 @@ aarch64_types_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define VAR12(T, N, MAP, A, B, C, D, E, F, G, H, I, J, K, L) \
   VAR11 (T, N, MAP, A, B, C, D, E, F, G, H, I, J, K) \
   VAR1 (T, N, MAP, L)
+#define VAR13(T, N, MAP, A, B, C, D, E, F, G, H, I, J, K, L, M) \
+  VAR12 (T, N, MAP, A, B, C, D, E, F, G, H, I, J, K, L) \
+  VAR1 (T, N, MAP, M)
+#define VAR14(T, X, MAP, A, B, C, D, E, F, G, H, I, J, K, L, M, N) \
+  VAR13 (T, X, MAP, A, B, C, D, E, F, G, H, I, J, K, L, M) \
+  VAR1 (T, X, MAP, N)
 
 #include "aarch64-builtin-iterators.h"
 
@@ -372,6 +380,7 @@ const char *aarch64_scalar_builtin_types[] = {
   "__builtin_aarch64_simd_qi",
   "__builtin_aarch64_simd_hi",
   "__builtin_aarch64_simd_si",
+  "__builtin_aarch64_simd_hf",
   "__builtin_aarch64_simd_sf",
   "__builtin_aarch64_simd_di",
   "__builtin_aarch64_simd_df",
@@ -520,6 +529,8 @@ aarch64_simd_builtin_std_type (enum machine_mode mode,
       return aarch64_simd_intCI_type_node;
     case XImode:
       return aarch64_simd_intXI_type_node;
+    case HFmode:
+      return aarch64_fp16_type_node;
     case SFmode:
       return float_type_node;
     case DFmode:
@@ -604,6 +615,8 @@ aarch64_init_simd_builtin_types (void)
   aarch64_simd_types[Poly64x2_t].eltype = aarch64_simd_types[Poly64_t].itype;
 
   /* Continue with standard types.  */
+  aarch64_simd_types[Float16x4_t].eltype = aarch64_fp16_type_node;
+  aarch64_simd_types[Float16x8_t].eltype = aarch64_fp16_type_node;
   aarch64_simd_types[Float32x2_t].eltype = float_type_node;
   aarch64_simd_types[Float32x4_t].eltype = float_type_node;
   aarch64_simd_types[Float64x1_t].eltype = double_type_node;
@@ -655,6 +668,8 @@ aarch64_init_simd_builtin_scalar_types (void)
 					     "__builtin_aarch64_simd_qi");
   (*lang_hooks.types.register_builtin_type) (intHI_type_node,
 					     "__builtin_aarch64_simd_hi");
+  (*lang_hooks.types.register_builtin_type) (aarch64_fp16_type_node,
+					     "__builtin_aarch64_simd_hf");
   (*lang_hooks.types.register_builtin_type) (intSI_type_node,
 					     "__builtin_aarch64_simd_si");
   (*lang_hooks.types.register_builtin_type) (float_type_node,
diff --git a/gcc/config/aarch64/aarch64-simd-builtin-types.def b/gcc/config/aarch64/aarch64-simd-builtin-types.def
index bb54e56ce63..ea219b72ff9 100644
--- a/gcc/config/aarch64/aarch64-simd-builtin-types.def
+++ b/gcc/config/aarch64/aarch64-simd-builtin-types.def
@@ -44,6 +44,8 @@
   ENTRY (Poly16x8_t, V8HI, poly, 12)
   ENTRY (Poly64x1_t, DI, poly, 12)
   ENTRY (Poly64x2_t, V2DI, poly, 12)
+  ENTRY (Float16x4_t, V4HF, none, 13)
+  ENTRY (Float16x8_t, V8HF, none, 13)
   ENTRY (Float32x2_t, V2SF, none, 13)
   ENTRY (Float32x4_t, V4SF, none, 13)
   ENTRY (Float64x1_t, V1DF, none, 13)
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index d0f298a1f07..2c13cfb0823 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -361,17 +361,19 @@
   BUILTIN_VSDQ_I_DI (UNOP, abs, 0)
   BUILTIN_VDQF (UNOP, abs, 2)
 
-  VAR1 (UNOP, vec_unpacks_hi_, 10, v4sf)
+  BUILTIN_VQ_HSF (UNOP, vec_unpacks_hi_, 10)
   VAR1 (BINOP, float_truncate_hi_, 0, v4sf)
+  VAR1 (BINOP, float_truncate_hi_, 0, v8hf)
 
   VAR1 (UNOP, float_extend_lo_, 0, v2df)
-  VAR1 (UNOP, float_truncate_lo_, 0, v2sf)
+  VAR1 (UNOP, float_extend_lo_,  0, v4sf)
+  BUILTIN_VDF (UNOP, float_truncate_lo_, 0)
 
-  /* Implemented by aarch64_ld1<VALL:mode>.  */
-  BUILTIN_VALL (LOAD1, ld1, 0)
+  /* Implemented by aarch64_ld1<VALL_F16:mode>.  */
+  BUILTIN_VALL_F16 (LOAD1, ld1, 0)
 
-  /* Implemented by aarch64_st1<VALL:mode>.  */
-  BUILTIN_VALL (STORE1, st1, 0)
+  /* Implemented by aarch64_st1<VALL_F16:mode>.  */
+  BUILTIN_VALL_F16 (STORE1, st1, 0)
 
   /* Implemented by fma<mode>4.  */
   BUILTIN_VDQF (TERNOP, fma, 4)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 97774181fab..a4eaecae2a0 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -19,8 +19,8 @@
 ;; <http://www.gnu.org/licenses/>.
 
 (define_expand "mov<mode>"
-  [(set (match_operand:VALL 0 "nonimmediate_operand" "")
-	(match_operand:VALL 1 "general_operand" ""))]
+  [(set (match_operand:VALL_F16 0 "nonimmediate_operand" "")
+	(match_operand:VALL_F16 1 "general_operand" ""))]
   "TARGET_SIMD"
   "
     if (GET_CODE (operands[0]) == MEM)
@@ -53,18 +53,19 @@
 )
 
 (define_insn "aarch64_simd_dup<mode>"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-        (vec_duplicate:VDQF (match_operand:<VEL> 1 "register_operand" "w")))]
+  [(set (match_operand:VDQF_F16 0 "register_operand" "=w")
+	(vec_duplicate:VDQF_F16
+	  (match_operand:<VEL> 1 "register_operand" "w")))]
   "TARGET_SIMD"
   "dup\\t%0.<Vtype>, %1.<Vetype>[0]"
   [(set_attr "type" "neon_dup<q>")]
 )
 
 (define_insn "aarch64_dup_lane<mode>"
-  [(set (match_operand:VALL 0 "register_operand" "=w")
-	(vec_duplicate:VALL
+  [(set (match_operand:VALL_F16 0 "register_operand" "=w")
+	(vec_duplicate:VALL_F16
 	  (vec_select:<VEL>
-	    (match_operand:VALL 1 "register_operand" "w")
+	    (match_operand:VALL_F16 1 "register_operand" "w")
 	    (parallel [(match_operand:SI 2 "immediate_operand" "i")])
           )))]
   "TARGET_SIMD"
@@ -76,8 +77,8 @@
 )
 
 (define_insn "aarch64_dup_lane_<vswap_width_name><mode>"
-  [(set (match_operand:VALL 0 "register_operand" "=w")
-	(vec_duplicate:VALL
+  [(set (match_operand:VALL_F16 0 "register_operand" "=w")
+	(vec_duplicate:VALL_F16
 	  (vec_select:<VEL>
 	    (match_operand:<VSWAP_WIDTH> 1 "register_operand" "w")
 	    (parallel [(match_operand:SI 2 "immediate_operand" "i")])
@@ -834,11 +835,11 @@
 )
 
 (define_insn "aarch64_simd_vec_set<mode>"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-        (vec_merge:VDQF
-	    (vec_duplicate:VDQF
+  [(set (match_operand:VDQF_F16 0 "register_operand" "=w")
+	(vec_merge:VDQF_F16
+	    (vec_duplicate:VDQF_F16
 		(match_operand:<VEL> 1 "register_operand" "w"))
-	    (match_operand:VDQF 3 "register_operand" "0")
+	    (match_operand:VDQF_F16 3 "register_operand" "0")
 	    (match_operand:SI 2 "immediate_operand" "i")))]
   "TARGET_SIMD"
   {
@@ -851,7 +852,7 @@
 )
 
 (define_expand "vec_set<mode>"
-  [(match_operand:VDQF 0 "register_operand" "+w")
+  [(match_operand:VDQF_F16 0 "register_operand" "+w")
    (match_operand:<VEL> 1 "register_operand" "w")
    (match_operand:SI 2 "immediate_operand" "")]
   "TARGET_SIMD"
@@ -1691,58 +1692,79 @@
 
 ;; Float widening operations.
 
-(define_insn "vec_unpacks_lo_v4sf"
-  [(set (match_operand:V2DF 0 "register_operand" "=w")
-	(float_extend:V2DF
-	  (vec_select:V2SF
-	    (match_operand:V4SF 1 "register_operand" "w")
-	    (parallel [(const_int 0) (const_int 1)])
-	  )))]
+(define_insn "aarch64_simd_vec_unpacks_lo_<mode>"
+  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
+        (float_extend:<VWIDE> (vec_select:<VHALF>
+			       (match_operand:VQ_HSF 1 "register_operand" "w")
+			       (match_operand:VQ_HSF 2 "vect_par_cnst_lo_half" "")
+			    )))]
   "TARGET_SIMD"
-  "fcvtl\\t%0.2d, %1.2s"
+  "fcvtl\\t%0.<Vwtype>, %1.<Vhalftype>"
   [(set_attr "type" "neon_fp_cvt_widen_s")]
 )
 
-(define_insn "aarch64_float_extend_lo_v2df"
-  [(set (match_operand:V2DF 0 "register_operand" "=w")
-	(float_extend:V2DF
-	  (match_operand:V2SF 1 "register_operand" "w")))]
+(define_expand "vec_unpacks_lo_<mode>"
+  [(match_operand:<VWIDE> 0 "register_operand" "")
+   (match_operand:VQ_HSF 1 "register_operand" "")]
   "TARGET_SIMD"
-  "fcvtl\\t%0.2d, %1.2s"
+  {
+    rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, false);
+    emit_insn (gen_aarch64_simd_vec_unpacks_lo_<mode> (operands[0],
+						       operands[1], p));
+    DONE;
+  }
+)
+
+(define_insn "aarch64_simd_vec_unpacks_hi_<mode>"
+  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
+        (float_extend:<VWIDE> (vec_select:<VHALF>
+			       (match_operand:VQ_HSF 1 "register_operand" "w")
+			       (match_operand:VQ_HSF 2 "vect_par_cnst_hi_half" "")
+			    )))]
+  "TARGET_SIMD"
+  "fcvtl2\\t%0.<Vwtype>, %1.<Vtype>"
   [(set_attr "type" "neon_fp_cvt_widen_s")]
 )
 
-(define_insn "vec_unpacks_hi_v4sf"
-  [(set (match_operand:V2DF 0 "register_operand" "=w")
-	(float_extend:V2DF
-	  (vec_select:V2SF
-	    (match_operand:V4SF 1 "register_operand" "w")
-	    (parallel [(const_int 2) (const_int 3)])
-	  )))]
+(define_expand "vec_unpacks_hi_<mode>"
+  [(match_operand:<VWIDE> 0 "register_operand" "")
+   (match_operand:VQ_HSF 1 "register_operand" "")]
   "TARGET_SIMD"
-  "fcvtl2\\t%0.2d, %1.4s"
+  {
+    rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
+    emit_insn (gen_aarch64_simd_vec_unpacks_lo_<mode> (operands[0],
+						       operands[1], p));
+    DONE;
+  }
+)
+(define_insn "aarch64_float_extend_lo_<Vwide>"
+  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
+	(float_extend:<VWIDE>
+	  (match_operand:VDF 1 "register_operand" "w")))]
+  "TARGET_SIMD"
+  "fcvtl\\t%0<Vmwtype>, %1<Vmtype>"
   [(set_attr "type" "neon_fp_cvt_widen_s")]
 )
 
 ;; Float narrowing operations.
 
-(define_insn "aarch64_float_truncate_lo_v2sf"
-  [(set (match_operand:V2SF 0 "register_operand" "=w")
-      (float_truncate:V2SF
-	(match_operand:V2DF 1 "register_operand" "w")))]
+(define_insn "aarch64_float_truncate_lo_<mode>"
+  [(set (match_operand:VDF 0 "register_operand" "=w")
+      (float_truncate:VDF
+	(match_operand:<VWIDE> 1 "register_operand" "w")))]
   "TARGET_SIMD"
-  "fcvtn\\t%0.2s, %1.2d"
+  "fcvtn\\t%0.<Vtype>, %1<Vmwtype>"
   [(set_attr "type" "neon_fp_cvt_narrow_d_q")]
 )
 
-(define_insn "aarch64_float_truncate_hi_v4sf"
-  [(set (match_operand:V4SF 0 "register_operand" "=w")
-    (vec_concat:V4SF
-      (match_operand:V2SF 1 "register_operand" "0")
-      (float_truncate:V2SF
-	(match_operand:V2DF 2 "register_operand" "w"))))]
+(define_insn "aarch64_float_truncate_hi_<Vdbl>"
+  [(set (match_operand:<VDBL> 0 "register_operand" "=w")
+    (vec_concat:<VDBL>
+      (match_operand:VDF 1 "register_operand" "0")
+      (float_truncate:VDF
+	(match_operand:<VWIDE> 2 "register_operand" "w"))))]
   "TARGET_SIMD"
-  "fcvtn2\\t%0.4s, %2.2d"
+  "fcvtn2\\t%0.<Vdtype>, %2<Vmwtype>"
   [(set_attr "type" "neon_fp_cvt_narrow_d_q")]
 )
 
@@ -2450,7 +2472,7 @@
 (define_insn "aarch64_get_lane<mode>"
   [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv")
 	(vec_select:<VEL>
-	  (match_operand:VALL 1 "register_operand" "w, w, w")
+	  (match_operand:VALL_F16 1 "register_operand" "w, w, w")
 	  (parallel [(match_operand:SI 2 "immediate_operand" "i, i, i")])))]
   "TARGET_SIMD"
   {
@@ -4243,8 +4265,9 @@
 )
 
 (define_insn "aarch64_be_ld1<mode>"
-  [(set (match_operand:VALLDI 0	"register_operand" "=w")
-	(unspec:VALLDI [(match_operand:VALLDI 1 "aarch64_simd_struct_operand" "Utv")]
+  [(set (match_operand:VALLDI_F16 0	"register_operand" "=w")
+	(unspec:VALLDI_F16 [(match_operand:VALLDI_F16 1
+			     "aarch64_simd_struct_operand" "Utv")]
 	UNSPEC_LD1))]
   "TARGET_SIMD"
   "ld1\\t{%0<Vmtype>}, %1"
@@ -4252,8 +4275,8 @@
 )
 
 (define_insn "aarch64_be_st1<mode>"
-  [(set (match_operand:VALLDI 0 "aarch64_simd_struct_operand" "=Utv")
-	(unspec:VALLDI [(match_operand:VALLDI 1 "register_operand" "w")]
+  [(set (match_operand:VALLDI_F16 0 "aarch64_simd_struct_operand" "=Utv")
+	(unspec:VALLDI_F16 [(match_operand:VALLDI_F16 1 "register_operand" "w")]
 	UNSPEC_ST1))]
   "TARGET_SIMD"
   "st1\\t{%1<Vmtype>}, %0"
@@ -4542,16 +4565,16 @@
   DONE;
 })
 
-(define_expand "aarch64_ld1<VALL:mode>"
- [(match_operand:VALL 0 "register_operand")
+(define_expand "aarch64_ld1<VALL_F16:mode>"
+ [(match_operand:VALL_F16 0 "register_operand")
   (match_operand:DI 1 "register_operand")]
   "TARGET_SIMD"
 {
-  machine_mode mode = <VALL:MODE>mode;
+  machine_mode mode = <VALL_F16:MODE>mode;
   rtx mem = gen_rtx_MEM (mode, operands[1]);
 
   if (BYTES_BIG_ENDIAN)
-    emit_insn (gen_aarch64_be_ld1<VALL:mode> (operands[0], mem));
+    emit_insn (gen_aarch64_be_ld1<VALL_F16:mode> (operands[0], mem));
   else
     emit_move_insn (operands[0], mem);
   DONE;
@@ -4566,7 +4589,7 @@
   machine_mode mode = <VSTRUCT:MODE>mode;
   rtx mem = gen_rtx_MEM (mode, operands[1]);
 
-  emit_insn (gen_vec_load_lanes<VSTRUCT:mode><VQ:mode> (operands[0], mem));
+  emit_insn (gen_aarch64_simd_ld<VSTRUCT:nregs><VQ:mode> (operands[0], mem));
   DONE;
 })
 
@@ -4669,9 +4692,9 @@
 ;; vec_perm support
 
 (define_expand "vec_perm_const<mode>"
-  [(match_operand:VALL 0 "register_operand")
-   (match_operand:VALL 1 "register_operand")
-   (match_operand:VALL 2 "register_operand")
+  [(match_operand:VALL_F16 0 "register_operand")
+   (match_operand:VALL_F16 1 "register_operand")
+   (match_operand:VALL_F16 2 "register_operand")
    (match_operand:<V_cmp_result> 3)]
   "TARGET_SIMD"
 {
@@ -4849,7 +4872,7 @@
   machine_mode mode = <VSTRUCT:MODE>mode;
   rtx mem = gen_rtx_MEM (mode, operands[0]);
 
-  emit_insn (gen_vec_store_lanes<VSTRUCT:mode><VQ:mode> (mem, operands[1]));
+  emit_insn (gen_aarch64_simd_st<VSTRUCT:nregs><VQ:mode> (mem, operands[1]));
   DONE;
 })
 
@@ -4895,16 +4918,16 @@
   DONE;
 })
 
-(define_expand "aarch64_st1<VALL:mode>"
+(define_expand "aarch64_st1<VALL_F16:mode>"
  [(match_operand:DI 0 "register_operand")
-  (match_operand:VALL 1 "register_operand")]
+  (match_operand:VALL_F16 1 "register_operand")]
   "TARGET_SIMD"
 {
-  machine_mode mode = <VALL:MODE>mode;
+  machine_mode mode = <VALL_F16:MODE>mode;
   rtx mem = gen_rtx_MEM (mode, operands[0]);
 
   if (BYTES_BIG_ENDIAN)
-    emit_insn (gen_aarch64_be_st1<VALL:mode> (mem, operands[1]));
+    emit_insn (gen_aarch64_be_st1<VALL_F16:mode> (mem, operands[1]));
   else
     emit_move_insn (mem, operands[1]);
   DONE;
@@ -4935,7 +4958,7 @@
 ;; Standard pattern name vec_init<mode>.
 
 (define_expand "vec_init<mode>"
-  [(match_operand:VALL 0 "register_operand" "")
+  [(match_operand:VALL_F16 0 "register_operand" "")
    (match_operand 1 "" "")]
   "TARGET_SIMD"
 {
@@ -4944,8 +4967,8 @@
 })
 
 (define_insn "*aarch64_simd_ld1r<mode>"
-  [(set (match_operand:VALL 0 "register_operand" "=w")
-	(vec_duplicate:VALL
+  [(set (match_operand:VALL_F16 0 "register_operand" "=w")
+	(vec_duplicate:VALL_F16
 	  (match_operand:<VEL> 1 "aarch64_simd_struct_operand" "Utv")))]
   "TARGET_SIMD"
   "ld1r\\t{%0.<Vtype>}, %1"
@@ -4992,7 +5015,7 @@
 
 (define_expand "vec_extract<mode>"
   [(match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "")
-   (match_operand:VALL 1 "register_operand" "")
+   (match_operand:VALL_F16 1 "register_operand" "")
    (match_operand:SI 2 "immediate_operand" "")]
   "TARGET_SIMD"
 {
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index bc612e47d4f..b2a481b4c29 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1335,6 +1335,9 @@ aarch64_split_simd_combine (rtx dst, rtx src1, rtx src2)
 	case V2SImode:
 	  gen = gen_aarch64_simd_combinev2si;
 	  break;
+	case V4HFmode:
+	  gen = gen_aarch64_simd_combinev4hf;
+	  break;
 	case V2SFmode:
 	  gen = gen_aarch64_simd_combinev2sf;
 	  break;
@@ -1383,6 +1386,9 @@ aarch64_split_simd_move (rtx dst, rtx src)
 	case V2DImode:
 	  gen = gen_aarch64_split_simd_movv2di;
 	  break;
+	case V8HFmode:
+	  gen = gen_aarch64_split_simd_movv8hf;
+	  break;
 	case V4SFmode:
 	  gen = gen_aarch64_split_simd_movv4sf;
 	  break;
@@ -6763,6 +6769,25 @@ cost_plus:
       return true;
 
     case MOD:
+    /* We can expand signed mod by power of 2 using a NEGS, two parallel
+       ANDs and a CSNEG.  Assume here that CSNEG is the same as the cost of
+       an unconditional negate.  This case should only ever be reached through
+       the set_smod_pow2_cheap check in expmed.c.  */
+      if (CONST_INT_P (XEXP (x, 1))
+	  && exact_log2 (INTVAL (XEXP (x, 1))) > 0
+	  && (mode == SImode || mode == DImode))
+	{
+	  /* We expand to 4 instructions.  Reset the baseline.  */
+	  *cost = COSTS_N_INSNS (4);
+
+	  if (speed)
+	    *cost += 2 * extra_cost->alu.logical
+		     + 2 * extra_cost->alu.arith;
+
+	  return true;
+	}
+
+    /* Fall-through.  */
     case UMOD:
       if (speed)
 	{
@@ -9780,6 +9805,7 @@ aarch64_vector_mode_supported_p (machine_mode mode)
 	  || mode == V2SImode  || mode == V4HImode
 	  || mode == V8QImode || mode == V2SFmode
 	  || mode == V4SFmode || mode == V2DFmode
+	  || mode == V4HFmode || mode == V8HFmode
 	  || mode == V1DFmode))
     return true;
 
@@ -11899,6 +11925,8 @@ aarch64_evpc_dup (struct expand_vec_perm_d *d)
     case V4SImode: gen = gen_aarch64_dup_lanev4si; break;
     case V2SImode: gen = gen_aarch64_dup_lanev2si; break;
     case V2DImode: gen = gen_aarch64_dup_lanev2di; break;
+    case V8HFmode: gen = gen_aarch64_dup_lanev8hf; break;
+    case V4HFmode: gen = gen_aarch64_dup_lanev4hf; break;
     case V4SFmode: gen = gen_aarch64_dup_lanev4sf; break;
     case V2SFmode: gen = gen_aarch64_dup_lanev2sf; break;
     case V2DFmode: gen = gen_aarch64_dup_lanev2df; break;
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 1e5f5dbd4fa..9669e014882 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -875,7 +875,8 @@ extern enum aarch64_code_model aarch64_cmodel;
 /* Modes valid for AdvSIMD Q registers.  */
 #define AARCH64_VALID_SIMD_QREG_MODE(MODE) \
   ((MODE) == V4SImode || (MODE) == V8HImode || (MODE) == V16QImode \
-   || (MODE) == V4SFmode || (MODE) == V2DImode || mode == V2DFmode)
+   || (MODE) == V4SFmode || (MODE) == V8HFmode || (MODE) == V2DImode \
+   || (MODE) == V2DFmode)
 
 #define ENDIAN_LANE_N(mode, n)  \
   (BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (mode) - 1 - n : n)
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 25229824fb5..5a005b572c8 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -313,6 +313,62 @@
   }
 )
 
+;; Expansion of signed mod by a power of 2 using CSNEG.
+;; For x0 % n where n is a power of 2 produce:
+;; negs   x1, x0
+;; and    x0, x0, #(n - 1)
+;; and    x1, x1, #(n - 1)
+;; csneg  x0, x0, x1, mi
+
+(define_expand "mod<mode>3"
+  [(match_operand:GPI 0 "register_operand" "")
+   (match_operand:GPI 1 "register_operand" "")
+   (match_operand:GPI 2 "const_int_operand" "")]
+  ""
+  {
+    HOST_WIDE_INT val = INTVAL (operands[2]);
+
+    if (val <= 0
+       || exact_log2 (val) <= 0
+       || !aarch64_bitmask_imm (val - 1, <MODE>mode))
+      FAIL;
+
+    rtx mask = GEN_INT (val - 1);
+
+    /* In the special case of x0 % 2 we can do the even shorter:
+	cmp    x0, xzr
+	and    x0, x0, 1
+	cneg   x0, x0, lt.  */
+    if (val == 2)
+      {
+	rtx masked = gen_reg_rtx (<MODE>mode);
+	rtx ccreg = aarch64_gen_compare_reg (LT, operands[1], const0_rtx);
+	emit_insn (gen_and<mode>3 (masked, operands[1], mask));
+	rtx x = gen_rtx_LT (VOIDmode, ccreg, const0_rtx);
+	emit_insn (gen_csneg3<mode>_insn (operands[0], x, masked, masked));
+	DONE;
+      }
+
+    rtx neg_op = gen_reg_rtx (<MODE>mode);
+    rtx_insn *insn = emit_insn (gen_neg<mode>2_compare0 (neg_op, operands[1]));
+
+    /* Extract the condition register and mode.  */
+    rtx cmp = XVECEXP (PATTERN (insn), 0, 0);
+    rtx cc_reg = SET_DEST (cmp);
+    rtx cond = gen_rtx_GE (VOIDmode, cc_reg, const0_rtx);
+
+    rtx masked_pos = gen_reg_rtx (<MODE>mode);
+    emit_insn (gen_and<mode>3 (masked_pos, operands[1], mask));
+
+    rtx masked_neg = gen_reg_rtx (<MODE>mode);
+    emit_insn (gen_and<mode>3 (masked_neg, neg_op, mask));
+
+    emit_insn (gen_csneg3<mode>_insn (operands[0], cond,
+				       masked_neg, masked_pos));
+    DONE;
+  }
+)
+
 (define_insn "*condjump"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
 			    [(match_operand 1 "cc_register" "") (const_int 0)])
@@ -1043,8 +1099,8 @@
 })
 
 (define_expand "mov<mode>"
-  [(set (match_operand:GPF_F16 0 "nonimmediate_operand" "")
-	(match_operand:GPF_F16 1 "general_operand" ""))]
+  [(set (match_operand:GPF_TF_F16 0 "nonimmediate_operand" "")
+	(match_operand:GPF_TF_F16 1 "general_operand" ""))]
   ""
   {
     if (!TARGET_FLOAT)
@@ -1118,24 +1174,6 @@
                      f_loadd,f_stored,load1,store1,mov_reg")]
 )
 
-(define_expand "movtf"
-  [(set (match_operand:TF 0 "nonimmediate_operand" "")
-	(match_operand:TF 1 "general_operand" ""))]
-  ""
-  {
-    if (!TARGET_FLOAT)
-      {
-	aarch64_err_no_fpadvsimd (TFmode, "code");
-	FAIL;
-      }
-
-    if (GET_CODE (operands[0]) == MEM
-        && ! (GET_CODE (operands[1]) == CONST_DOUBLE
-	      && aarch64_float_const_zero_rtx_p (operands[1])))
-      operands[1] = force_reg (TFmode, operands[1]);
-  }
-)
-
 (define_insn "*movtf_aarch64"
   [(set (match_operand:TF 0
 	 "nonimmediate_operand" "=w,?&r,w ,?r,w,?w,w,m,?r ,Ump,Ump")
@@ -2457,7 +2495,7 @@
   [(set_attr "type" "adc_reg")]
 )
 
-(define_insn "*neg<mode>2_compare0"
+(define_insn "neg<mode>2_compare0"
   [(set (reg:CC_NZ CC_REGNUM)
 	(compare:CC_NZ (neg:GPI (match_operand:GPI 1 "register_operand" "r"))
 		       (const_int 0)))
@@ -3501,7 +3539,7 @@
 	 (const_int 0)))]
   ""
   "tst\\t%<w>0, %<w>1"
-  [(set_attr "type" "logics_reg")]
+  [(set_attr "type" "logics_reg,logics_imm")]
 )
 
 (define_insn "*and_<SHIFT:optab><mode>3nr_compare0"
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 6dfebe7eea6..91ada618b79 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -39,6 +39,7 @@ typedef __Int8x8_t int8x8_t;
 typedef __Int16x4_t int16x4_t;
 typedef __Int32x2_t int32x2_t;
 typedef __Int64x1_t int64x1_t;
+typedef __Float16x4_t float16x4_t;
 typedef __Float32x2_t float32x2_t;
 typedef __Poly8x8_t poly8x8_t;
 typedef __Poly16x4_t poly16x4_t;
@@ -51,6 +52,7 @@ typedef __Int8x16_t int8x16_t;
 typedef __Int16x8_t int16x8_t;
 typedef __Int32x4_t int32x4_t;
 typedef __Int64x2_t int64x2_t;
+typedef __Float16x8_t float16x8_t;
 typedef __Float32x4_t float32x4_t;
 typedef __Float64x2_t float64x2_t;
 typedef __Poly8x16_t poly8x16_t;
@@ -66,6 +68,7 @@ typedef __Poly16_t poly16_t;
 typedef __Poly64_t poly64_t;
 typedef __Poly128_t poly128_t;
 
+typedef __fp16 float16_t;
 typedef float float32_t;
 typedef double float64_t;
 
@@ -149,6 +152,16 @@ typedef struct uint64x2x2_t
   uint64x2_t val[2];
 } uint64x2x2_t;
 
+typedef struct float16x4x2_t
+{
+  float16x4_t val[2];
+} float16x4x2_t;
+
+typedef struct float16x8x2_t
+{
+  float16x8_t val[2];
+} float16x8x2_t;
+
 typedef struct float32x2x2_t
 {
   float32x2_t val[2];
@@ -269,6 +282,16 @@ typedef struct uint64x2x3_t
   uint64x2_t val[3];
 } uint64x2x3_t;
 
+typedef struct float16x4x3_t
+{
+  float16x4_t val[3];
+} float16x4x3_t;
+
+typedef struct float16x8x3_t
+{
+  float16x8_t val[3];
+} float16x8x3_t;
+
 typedef struct float32x2x3_t
 {
   float32x2_t val[3];
@@ -389,6 +412,16 @@ typedef struct uint64x2x4_t
   uint64x2_t val[4];
 } uint64x2x4_t;
 
+typedef struct float16x4x4_t
+{
+  float16x4_t val[4];
+} float16x4x4_t;
+
+typedef struct float16x8x4_t
+{
+  float16x8_t val[4];
+} float16x8x4_t;
+
 typedef struct float32x2x4_t
 {
   float32x2_t val[4];
@@ -2640,6 +2673,12 @@ vcreate_s64 (uint64_t __a)
   return (int64x1_t) {__a};
 }
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcreate_f16 (uint64_t __a)
+{
+  return (float16x4_t) __a;
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vcreate_f32 (uint64_t __a)
 {
@@ -2690,6 +2729,12 @@ vcreate_p16 (uint64_t __a)
 
 /* vget_lane  */
 
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vget_lane_f16 (float16x4_t __a, const int __b)
+{
+  return __aarch64_vget_lane_any (__a, __b);
+}
+
 __extension__ static __inline float32_t __attribute__ ((__always_inline__))
 vget_lane_f32 (float32x2_t __a, const int __b)
 {
@@ -2764,6 +2809,12 @@ vget_lane_u64 (uint64x1_t __a, const int __b)
 
 /* vgetq_lane  */
 
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vgetq_lane_f16 (float16x8_t __a, const int __b)
+{
+  return __aarch64_vget_lane_any (__a, __b);
+}
+
 __extension__ static __inline float32_t __attribute__ ((__always_inline__))
 vgetq_lane_f32 (float32x4_t __a, const int __b)
 {
@@ -2839,6 +2890,12 @@ vgetq_lane_u64 (uint64x2_t __a, const int __b)
 /* vreinterpret  */
 
 __extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vreinterpret_p8_f16 (float16x4_t __a)
+{
+  return (poly8x8_t) __a;
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
 vreinterpret_p8_f64 (float64x1_t __a)
 {
   return (poly8x8_t) __a;
@@ -2935,6 +2992,12 @@ vreinterpretq_p8_s64 (int64x2_t __a)
 }
 
 __extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_p8_f16 (float16x8_t __a)
+{
+  return (poly8x16_t) __a;
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
 vreinterpretq_p8_f32 (float32x4_t __a)
 {
   return (poly8x16_t) __a;
@@ -2971,6 +3034,12 @@ vreinterpretq_p8_p16 (poly16x8_t __a)
 }
 
 __extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+vreinterpret_p16_f16 (float16x4_t __a)
+{
+  return (poly16x4_t) __a;
+}
+
+__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
 vreinterpret_p16_f64 (float64x1_t __a)
 {
   return (poly16x4_t) __a;
@@ -3067,6 +3136,12 @@ vreinterpretq_p16_s64 (int64x2_t __a)
 }
 
 __extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_p16_f16 (float16x8_t __a)
+{
+  return (poly16x8_t) __a;
+}
+
+__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
 vreinterpretq_p16_f32 (float32x4_t __a)
 {
   return (poly16x8_t) __a;
@@ -3102,6 +3177,156 @@ vreinterpretq_p16_p8 (poly8x16_t __a)
   return (poly16x8_t) __a;
 }
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_f64 (float64x1_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_s8 (int8x8_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_s16 (int16x4_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_s32 (int32x2_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_s64 (int64x1_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_f32 (float32x2_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_u8 (uint8x8_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_u16 (uint16x4_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_u32 (uint32x2_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_u64 (uint64x1_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_p8 (poly8x8_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_p16 (poly16x4_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_f64 (float64x2_t __a)
+{
+  return (float16x8_t) __a;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_s8 (int8x16_t __a)
+{
+  return (float16x8_t) __a;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_s16 (int16x8_t __a)
+{
+  return (float16x8_t) __a;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_s32 (int32x4_t __a)
+{
+  return (float16x8_t) __a;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_s64 (int64x2_t __a)
+{
+  return (float16x8_t) __a;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_f32 (float32x4_t __a)
+{
+  return (float16x8_t) __a;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_u8 (uint8x16_t __a)
+{
+  return (float16x8_t) __a;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_u16 (uint16x8_t __a)
+{
+  return (float16x8_t) __a;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_u32 (uint32x4_t __a)
+{
+  return (float16x8_t) __a;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_u64 (uint64x2_t __a)
+{
+  return (float16x8_t) __a;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_p8 (poly8x16_t __a)
+{
+  return (float16x8_t) __a;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_p16 (poly16x8_t __a)
+{
+  return (float16x8_t) __a;
+}
+
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vreinterpret_f32_f16 (float16x4_t __a)
+{
+  return (float32x2_t) __a;
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vreinterpret_f32_f64 (float64x1_t __a)
 {
@@ -3169,6 +3394,12 @@ vreinterpret_f32_p16 (poly16x4_t __a)
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_f32_f16 (float16x8_t __a)
+{
+  return (float32x4_t) __a;
+}
+
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vreinterpretq_f32_f64 (float64x2_t __a)
 {
   return (float32x4_t) __a;
@@ -3235,6 +3466,12 @@ vreinterpretq_f32_p16 (poly16x8_t __a)
 }
 
 __extension__ static __inline float64x1_t __attribute__((__always_inline__))
+vreinterpret_f64_f16 (float16x4_t __a)
+{
+  return (float64x1_t) __a;
+}
+
+__extension__ static __inline float64x1_t __attribute__((__always_inline__))
 vreinterpret_f64_f32 (float32x2_t __a)
 {
   return (float64x1_t) __a;
@@ -3301,6 +3538,12 @@ vreinterpret_f64_u64 (uint64x1_t __a)
 }
 
 __extension__ static __inline float64x2_t __attribute__((__always_inline__))
+vreinterpretq_f64_f16 (float16x8_t __a)
+{
+  return (float64x2_t) __a;
+}
+
+__extension__ static __inline float64x2_t __attribute__((__always_inline__))
 vreinterpretq_f64_f32 (float32x4_t __a)
 {
   return (float64x2_t) __a;
@@ -3367,6 +3610,12 @@ vreinterpretq_f64_u64 (uint64x2_t __a)
 }
 
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+vreinterpret_s64_f16 (float16x4_t __a)
+{
+  return (int64x1_t) __a;
+}
+
+__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
 vreinterpret_s64_f64 (float64x1_t __a)
 {
   return (int64x1_t) __a;
@@ -3457,6 +3706,12 @@ vreinterpretq_s64_s32 (int32x4_t __a)
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_s64_f16 (float16x8_t __a)
+{
+  return (int64x2_t) __a;
+}
+
+__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vreinterpretq_s64_f32 (float32x4_t __a)
 {
   return (int64x2_t) __a;
@@ -3499,6 +3754,12 @@ vreinterpretq_s64_p16 (poly16x8_t __a)
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+vreinterpret_u64_f16 (float16x4_t __a)
+{
+  return (uint64x1_t) __a;
+}
+
+__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
 vreinterpret_u64_f64 (float64x1_t __a)
 {
   return (uint64x1_t) __a;
@@ -3595,6 +3856,12 @@ vreinterpretq_u64_s64 (int64x2_t __a)
 }
 
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_u64_f16 (float16x8_t __a)
+{
+  return (uint64x2_t) __a;
+}
+
+__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vreinterpretq_u64_f32 (float32x4_t __a)
 {
   return (uint64x2_t) __a;
@@ -3631,6 +3898,12 @@ vreinterpretq_u64_p16 (poly16x8_t __a)
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vreinterpret_s8_f16 (float16x4_t __a)
+{
+  return (int8x8_t) __a;
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vreinterpret_s8_f64 (float64x1_t __a)
 {
   return (int8x8_t) __a;
@@ -3721,6 +3994,12 @@ vreinterpretq_s8_s64 (int64x2_t __a)
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_s8_f16 (float16x8_t __a)
+{
+  return (int8x16_t) __a;
+}
+
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vreinterpretq_s8_f32 (float32x4_t __a)
 {
   return (int8x16_t) __a;
@@ -3763,6 +4042,12 @@ vreinterpretq_s8_p16 (poly16x8_t __a)
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vreinterpret_s16_f16 (float16x4_t __a)
+{
+  return (int16x4_t) __a;
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vreinterpret_s16_f64 (float64x1_t __a)
 {
   return (int16x4_t) __a;
@@ -3853,6 +4138,12 @@ vreinterpretq_s16_s64 (int64x2_t __a)
 }
 
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_s16_f16 (float16x8_t __a)
+{
+  return (int16x8_t) __a;
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vreinterpretq_s16_f32 (float32x4_t __a)
 {
   return (int16x8_t) __a;
@@ -3895,6 +4186,12 @@ vreinterpretq_s16_p16 (poly16x8_t __a)
 }
 
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vreinterpret_s32_f16 (float16x4_t __a)
+{
+  return (int32x2_t) __a;
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vreinterpret_s32_f64 (float64x1_t __a)
 {
   return (int32x2_t) __a;
@@ -3985,6 +4282,12 @@ vreinterpretq_s32_s64 (int64x2_t __a)
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_s32_f16 (float16x8_t __a)
+{
+  return (int32x4_t) __a;
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vreinterpretq_s32_f32 (float32x4_t __a)
 {
   return (int32x4_t) __a;
@@ -4027,6 +4330,12 @@ vreinterpretq_s32_p16 (poly16x8_t __a)
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vreinterpret_u8_f16 (float16x4_t __a)
+{
+  return (uint8x8_t) __a;
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vreinterpret_u8_f64 (float64x1_t __a)
 {
   return (uint8x8_t) __a;
@@ -4123,6 +4432,12 @@ vreinterpretq_u8_s64 (int64x2_t __a)
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_u8_f16 (float16x8_t __a)
+{
+  return (uint8x16_t) __a;
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vreinterpretq_u8_f32 (float32x4_t __a)
 {
   return (uint8x16_t) __a;
@@ -4159,6 +4474,12 @@ vreinterpretq_u8_p16 (poly16x8_t __a)
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vreinterpret_u16_f16 (float16x4_t __a)
+{
+  return (uint16x4_t) __a;
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vreinterpret_u16_f64 (float64x1_t __a)
 {
   return (uint16x4_t) __a;
@@ -4255,6 +4576,12 @@ vreinterpretq_u16_s64 (int64x2_t __a)
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_u16_f16 (float16x8_t __a)
+{
+  return (uint16x8_t) __a;
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vreinterpretq_u16_f32 (float32x4_t __a)
 {
   return (uint16x8_t) __a;
@@ -4291,6 +4618,12 @@ vreinterpretq_u16_p16 (poly16x8_t __a)
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vreinterpret_u32_f16 (float16x4_t __a)
+{
+  return (uint32x2_t) __a;
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vreinterpret_u32_f64 (float64x1_t __a)
 {
   return (uint32x2_t) __a;
@@ -4387,6 +4720,12 @@ vreinterpretq_u32_s64 (int64x2_t __a)
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_u32_f16 (float16x8_t __a)
+{
+  return (uint32x4_t) __a;
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vreinterpretq_u32_f32 (float32x4_t __a)
 {
   return (uint32x4_t) __a;
@@ -4424,6 +4763,12 @@ vreinterpretq_u32_p16 (poly16x8_t __a)
 
 /* vset_lane  */
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vset_lane_f16 (float16_t __elem, float16x4_t __vec, const int __index)
+{
+  return __aarch64_vset_lane_any (__elem, __vec, __index);
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vset_lane_f32 (float32_t __elem, float32x2_t __vec, const int __index)
 {
@@ -4498,6 +4843,12 @@ vset_lane_u64 (uint64_t __elem, uint64x1_t __vec, const int __index)
 
 /* vsetq_lane  */
 
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vsetq_lane_f16 (float16_t __elem, float16x8_t __vec, const int __index)
+{
+  return __aarch64_vset_lane_any (__elem, __vec, __index);
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vsetq_lane_f32 (float32_t __elem, float32x4_t __vec, const int __index)
 {
@@ -4575,6 +4926,12 @@ vsetq_lane_u64 (uint64_t __elem, uint64x2_t __vec, const int __index)
   uint64x1_t lo = vcreate_u64 (vgetq_lane_u64 (tmp, 0));  \
   return vreinterpret_##__TYPE##_u64 (lo);
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vget_low_f16 (float16x8_t __a)
+{
+  __GET_LOW (f16);
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vget_low_f32 (float32x4_t __a)
 {
@@ -4654,6 +5011,12 @@ vget_low_u64 (uint64x2_t __a)
   uint64x1_t hi = vcreate_u64 (vgetq_lane_u64 (tmp, 1));	\
   return vreinterpret_##__TYPE##_u64 (hi);
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vget_high_f16 (float16x8_t __a)
+{
+  __GET_HIGH (f16);
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vget_high_f32 (float32x4_t __a)
 {
@@ -4752,6 +5115,12 @@ vcombine_s64 (int64x1_t __a, int64x1_t __b)
   return __builtin_aarch64_combinedi (__a[0], __b[0]);
 }
 
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vcombine_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_combinev4hf (__a, __b);
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vcombine_f32 (float32x2_t __a, float32x2_t __b)
 {
@@ -5656,14 +6025,6 @@ vaddlvq_u32 (uint32x4_t a)
        result;                                                          \
      })
 
-/* vcvt_f16_f32 not supported */
-
-/* vcvt_f32_f16 not supported */
-
-/* vcvt_high_f16_f32 not supported */
-
-/* vcvt_high_f32_f16 not supported */
-
 #define vcvt_n_f32_s32(a, b)                                            \
   __extension__                                                         \
     ({                                                                  \
@@ -9880,7 +10241,7 @@ vtstq_p16 (poly16x8_t a, poly16x8_t b)
    +------+----+----+----+----+
    |uint  | Y  | Y  | N  | N  |
    +------+----+----+----+----+
-   |float | -  | -  | N  | N  |
+   |float | -  | Y  | N  | N  |
    +------+----+----+----+----+
    |poly  | Y  | Y  | -  | -  |
    +------+----+----+----+----+
@@ -9894,7 +10255,7 @@ vtstq_p16 (poly16x8_t a, poly16x8_t b)
    +------+----+----+----+----+
    |uint  | Y  | Y  | Y  | Y  |
    +------+----+----+----+----+
-   |float | -  | -  | Y  | Y  |
+   |float | -  | Y  | Y  | Y  |
    +------+----+----+----+----+
    |poly  | Y  | Y  | -  | -  |
    +------+----+----+----+----+
@@ -9908,7 +10269,7 @@ vtstq_p16 (poly16x8_t a, poly16x8_t b)
    +------+----+----+----+----+
    |uint  | Y  | N  | N  | Y  |
    +------+----+----+----+----+
-   |float | -  | -  | N  | Y  |
+   |float | -  | N  | N  | Y  |
    +------+----+----+----+----+
    |poly  | Y  | N  | -  | -  |
    +------+----+----+----+----+
@@ -9924,6 +10285,7 @@ __STRUCTN (int, 8, 2)
 __STRUCTN (int, 16, 2)
 __STRUCTN (uint, 8, 2)
 __STRUCTN (uint, 16, 2)
+__STRUCTN (float, 16, 2)
 __STRUCTN (poly, 8, 2)
 __STRUCTN (poly, 16, 2)
 /* 3-element structs.  */
@@ -9935,6 +10297,7 @@ __STRUCTN (uint, 8, 3)
 __STRUCTN (uint, 16, 3)
 __STRUCTN (uint, 32, 3)
 __STRUCTN (uint, 64, 3)
+__STRUCTN (float, 16, 3)
 __STRUCTN (float, 32, 3)
 __STRUCTN (float, 64, 3)
 __STRUCTN (poly, 8, 3)
@@ -9972,6 +10335,8 @@ vst2_lane_ ## funcsuffix (ptrtype *__ptr,				     \
 				     __ptr, __o, __c);			     \
 }
 
+__ST2_LANE_FUNC (float16x4x2_t, float16x8x2_t, float16_t, v4hf, v8hf, hf, f16,
+		 float16x8_t)
 __ST2_LANE_FUNC (float32x2x2_t, float32x4x2_t, float32_t, v2sf, v4sf, sf, f32,
 		 float32x4_t)
 __ST2_LANE_FUNC (float64x1x2_t, float64x2x2_t, float64_t, df, v2df, df, f64,
@@ -10010,6 +10375,7 @@ vst2q_lane_ ## funcsuffix (ptrtype *__ptr,				    \
 				    __ptr, __temp.__o, __c);		    \
 }
 
+__ST2_LANE_FUNC (float16x8x2_t, float16_t, v8hf, hf, f16)
 __ST2_LANE_FUNC (float32x4x2_t, float32_t, v4sf, sf, f32)
 __ST2_LANE_FUNC (float64x2x2_t, float64_t, v2df, df, f64)
 __ST2_LANE_FUNC (poly8x16x2_t, poly8_t, v16qi, qi, p8)
@@ -10051,6 +10417,8 @@ vst3_lane_ ## funcsuffix (ptrtype *__ptr,				     \
 				     __ptr, __o, __c);			     \
 }
 
+__ST3_LANE_FUNC (float16x4x3_t, float16x8x3_t, float16_t, v4hf, v8hf, hf, f16,
+		 float16x8_t)
 __ST3_LANE_FUNC (float32x2x3_t, float32x4x3_t, float32_t, v2sf, v4sf, sf, f32,
 		 float32x4_t)
 __ST3_LANE_FUNC (float64x1x3_t, float64x2x3_t, float64_t, df, v2df, df, f64,
@@ -10089,6 +10457,7 @@ vst3q_lane_ ## funcsuffix (ptrtype *__ptr,				    \
 				    __ptr, __temp.__o, __c);		    \
 }
 
+__ST3_LANE_FUNC (float16x8x3_t, float16_t, v8hf, hf, f16)
 __ST3_LANE_FUNC (float32x4x3_t, float32_t, v4sf, sf, f32)
 __ST3_LANE_FUNC (float64x2x3_t, float64_t, v2df, df, f64)
 __ST3_LANE_FUNC (poly8x16x3_t, poly8_t, v16qi, qi, p8)
@@ -10135,6 +10504,8 @@ vst4_lane_ ## funcsuffix (ptrtype *__ptr,				     \
 				     __ptr, __o, __c);			     \
 }
 
+__ST4_LANE_FUNC (float16x4x4_t, float16x8x4_t, float16_t, v4hf, v8hf, hf, f16,
+		 float16x8_t)
 __ST4_LANE_FUNC (float32x2x4_t, float32x4x4_t, float32_t, v2sf, v4sf, sf, f32,
 		 float32x4_t)
 __ST4_LANE_FUNC (float64x1x4_t, float64x2x4_t, float64_t, df, v2df, df, f64,
@@ -10173,6 +10544,7 @@ vst4q_lane_ ## funcsuffix (ptrtype *__ptr,				    \
 				    __ptr, __temp.__o, __c);		    \
 }
 
+__ST4_LANE_FUNC (float16x8x4_t, float16_t, v8hf, hf, f16)
 __ST4_LANE_FUNC (float32x4x4_t, float32_t, v4sf, sf, f32)
 __ST4_LANE_FUNC (float64x2x4_t, float64_t, v2df, df, f64)
 __ST4_LANE_FUNC (poly8x16x4_t, poly8_t, v16qi, qi, p8)
@@ -13034,6 +13406,18 @@ vcntq_u8 (uint8x16_t __a)
 
 /* vcvt (double -> float).  */
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcvt_f16_f32 (float32x4_t __a)
+{
+  return __builtin_aarch64_float_truncate_lo_v4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vcvt_high_f16_f32 (float16x4_t __a, float32x4_t __b)
+{
+  return __builtin_aarch64_float_truncate_hi_v8hf (__a, __b);
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vcvt_f32_f64 (float64x2_t __a)
 {
@@ -13048,6 +13432,12 @@ vcvt_high_f32_f64 (float32x2_t __a, float64x2_t __b)
 
 /* vcvt (float -> double).  */
 
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vcvt_f32_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_float_extend_lo_v4sf (__a);
+}
+
 __extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
 vcvt_f64_f32 (float32x2_t __a)
 {
@@ -13055,6 +13445,12 @@ vcvt_f64_f32 (float32x2_t __a)
   return __builtin_aarch64_float_extend_lo_v2df (__a);
 }
 
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vcvt_high_f32_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_vec_unpacks_hi_v8hf (__a);
+}
+
 __extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
 vcvt_high_f64_f32 (float32x4_t __a)
 {
@@ -14628,6 +15024,12 @@ vfmsq_laneq_f64 (float64x2_t __a, float64x2_t __b,
 
 /* vld1 */
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vld1_f16 (const float16_t *__a)
+{
+  return __builtin_aarch64_ld1v4hf (__a);
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vld1_f32 (const float32_t *a)
 {
@@ -14707,6 +15109,12 @@ vld1_u64 (const uint64_t *a)
 
 /* vld1q */
 
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vld1q_f16 (const float16_t *__a)
+{
+  return __builtin_aarch64_ld1v8hf (__a);
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vld1q_f32 (const float32_t *a)
 {
@@ -14787,6 +15195,13 @@ vld1q_u64 (const uint64_t *a)
 
 /* vld1_dup  */
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vld1_dup_f16 (const float16_t* __a)
+{
+  float16_t __f = *__a;
+  return (float16x4_t) { __f, __f, __f, __f };
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vld1_dup_f32 (const float32_t* __a)
 {
@@ -14861,6 +15276,13 @@ vld1_dup_u64 (const uint64_t* __a)
 
 /* vld1q_dup  */
 
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vld1q_dup_f16 (const float16_t* __a)
+{
+  float16_t __f = *__a;
+  return (float16x8_t) { __f, __f, __f, __f, __f, __f, __f, __f };
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vld1q_dup_f32 (const float32_t* __a)
 {
@@ -14935,6 +15357,12 @@ vld1q_dup_u64 (const uint64_t* __a)
 
 /* vld1_lane  */
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vld1_lane_f16 (const float16_t *__src, float16x4_t __vec, const int __lane)
+{
+  return __aarch64_vset_lane_any (*__src, __vec, __lane);
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vld1_lane_f32 (const float32_t *__src, float32x2_t __vec, const int __lane)
 {
@@ -15009,6 +15437,12 @@ vld1_lane_u64 (const uint64_t *__src, uint64x1_t __vec, const int __lane)
 
 /* vld1q_lane  */
 
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vld1q_lane_f16 (const float16_t *__src, float16x8_t __vec, const int __lane)
+{
+  return __aarch64_vset_lane_any (*__src, __vec, __lane);
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vld1q_lane_f32 (const float32_t *__src, float32x4_t __vec, const int __lane)
 {
@@ -15204,6 +15638,17 @@ vld2_u32 (const uint32_t * __a)
   return ret;
 }
 
+__extension__ static __inline float16x4x2_t __attribute__ ((__always_inline__))
+vld2_f16 (const float16_t * __a)
+{
+  float16x4x2_t ret;
+  __builtin_aarch64_simd_oi __o;
+  __o = __builtin_aarch64_ld2v4hf (__a);
+  ret.val[0] = __builtin_aarch64_get_dregoiv4hf (__o, 0);
+  ret.val[1] = __builtin_aarch64_get_dregoiv4hf (__o, 1);
+  return ret;
+}
+
 __extension__ static __inline float32x2x2_t __attribute__ ((__always_inline__))
 vld2_f32 (const float32_t * __a)
 {
@@ -15325,6 +15770,17 @@ vld2q_u64 (const uint64_t * __a)
   return ret;
 }
 
+__extension__ static __inline float16x8x2_t __attribute__ ((__always_inline__))
+vld2q_f16 (const float16_t * __a)
+{
+  float16x8x2_t ret;
+  __builtin_aarch64_simd_oi __o;
+  __o = __builtin_aarch64_ld2v8hf (__a);
+  ret.val[0] = __builtin_aarch64_get_qregoiv8hf (__o, 0);
+  ret.val[1] = __builtin_aarch64_get_qregoiv8hf (__o, 1);
+  return ret;
+}
+
 __extension__ static __inline float32x4x2_t __attribute__ ((__always_inline__))
 vld2q_f32 (const float32_t * __a)
 {
@@ -15479,6 +15935,18 @@ vld3_u32 (const uint32_t * __a)
   return ret;
 }
 
+__extension__ static __inline float16x4x3_t __attribute__ ((__always_inline__))
+vld3_f16 (const float16_t * __a)
+{
+  float16x4x3_t ret;
+  __builtin_aarch64_simd_ci __o;
+  __o = __builtin_aarch64_ld3v4hf (__a);
+  ret.val[0] = __builtin_aarch64_get_dregciv4hf (__o, 0);
+  ret.val[1] = __builtin_aarch64_get_dregciv4hf (__o, 1);
+  ret.val[2] = __builtin_aarch64_get_dregciv4hf (__o, 2);
+  return ret;
+}
+
 __extension__ static __inline float32x2x3_t __attribute__ ((__always_inline__))
 vld3_f32 (const float32_t * __a)
 {
@@ -15611,6 +16079,18 @@ vld3q_u64 (const uint64_t * __a)
   return ret;
 }
 
+__extension__ static __inline float16x8x3_t __attribute__ ((__always_inline__))
+vld3q_f16 (const float16_t * __a)
+{
+  float16x8x3_t ret;
+  __builtin_aarch64_simd_ci __o;
+  __o = __builtin_aarch64_ld3v8hf (__a);
+  ret.val[0] = __builtin_aarch64_get_qregciv8hf (__o, 0);
+  ret.val[1] = __builtin_aarch64_get_qregciv8hf (__o, 1);
+  ret.val[2] = __builtin_aarch64_get_qregciv8hf (__o, 2);
+  return ret;
+}
+
 __extension__ static __inline float32x4x3_t __attribute__ ((__always_inline__))
 vld3q_f32 (const float32_t * __a)
 {
@@ -15778,6 +16258,19 @@ vld4_u32 (const uint32_t * __a)
   return ret;
 }
 
+__extension__ static __inline float16x4x4_t __attribute__ ((__always_inline__))
+vld4_f16 (const float16_t * __a)
+{
+  float16x4x4_t ret;
+  __builtin_aarch64_simd_xi __o;
+  __o = __builtin_aarch64_ld4v4hf (__a);
+  ret.val[0] = __builtin_aarch64_get_dregxiv4hf (__o, 0);
+  ret.val[1] = __builtin_aarch64_get_dregxiv4hf (__o, 1);
+  ret.val[2] = __builtin_aarch64_get_dregxiv4hf (__o, 2);
+  ret.val[3] = __builtin_aarch64_get_dregxiv4hf (__o, 3);
+  return ret;
+}
+
 __extension__ static __inline float32x2x4_t __attribute__ ((__always_inline__))
 vld4_f32 (const float32_t * __a)
 {
@@ -15921,6 +16414,19 @@ vld4q_u64 (const uint64_t * __a)
   return ret;
 }
 
+__extension__ static __inline float16x8x4_t __attribute__ ((__always_inline__))
+vld4q_f16 (const float16_t * __a)
+{
+  float16x8x4_t ret;
+  __builtin_aarch64_simd_xi __o;
+  __o = __builtin_aarch64_ld4v8hf (__a);
+  ret.val[0] = __builtin_aarch64_get_qregxiv8hf (__o, 0);
+  ret.val[1] = __builtin_aarch64_get_qregxiv8hf (__o, 1);
+  ret.val[2] = __builtin_aarch64_get_qregxiv8hf (__o, 2);
+  ret.val[3] = __builtin_aarch64_get_qregxiv8hf (__o, 3);
+  return ret;
+}
+
 __extension__ static __inline float32x4x4_t __attribute__ ((__always_inline__))
 vld4q_f32 (const float32_t * __a)
 {
@@ -15982,6 +16488,17 @@ vld2_dup_s32 (const int32_t * __a)
   return ret;
 }
 
+__extension__ static __inline float16x4x2_t __attribute__ ((__always_inline__))
+vld2_dup_f16 (const float16_t * __a)
+{
+  float16x4x2_t ret;
+  __builtin_aarch64_simd_oi __o;
+  __o = __builtin_aarch64_ld2rv4hf ((const __builtin_aarch64_simd_hf *) __a);
+  ret.val[0] = __builtin_aarch64_get_dregoiv4hf (__o, 0);
+  ret.val[1] = (float16x4_t) __builtin_aarch64_get_dregoiv4hf (__o, 1);
+  return ret;
+}
+
 __extension__ static __inline float32x2x2_t __attribute__ ((__always_inline__))
 vld2_dup_f32 (const float32_t * __a)
 {
@@ -16191,6 +16708,17 @@ vld2q_dup_u64 (const uint64_t * __a)
   return ret;
 }
 
+__extension__ static __inline float16x8x2_t __attribute__ ((__always_inline__))
+vld2q_dup_f16 (const float16_t * __a)
+{
+  float16x8x2_t ret;
+  __builtin_aarch64_simd_oi __o;
+  __o = __builtin_aarch64_ld2rv8hf ((const __builtin_aarch64_simd_hf *) __a);
+  ret.val[0] = (float16x8_t) __builtin_aarch64_get_qregoiv8hf (__o, 0);
+  ret.val[1] = __builtin_aarch64_get_qregoiv8hf (__o, 1);
+  return ret;
+}
+
 __extension__ static __inline float32x4x2_t __attribute__ ((__always_inline__))
 vld2q_dup_f32 (const float32_t * __a)
 {
@@ -16345,6 +16873,18 @@ vld3_dup_u32 (const uint32_t * __a)
   return ret;
 }
 
+__extension__ static __inline float16x4x3_t __attribute__ ((__always_inline__))
+vld3_dup_f16 (const float16_t * __a)
+{
+  float16x4x3_t ret;
+  __builtin_aarch64_simd_ci __o;
+  __o = __builtin_aarch64_ld3rv4hf ((const __builtin_aarch64_simd_hf *) __a);
+  ret.val[0] = (float16x4_t) __builtin_aarch64_get_dregciv4hf (__o, 0);
+  ret.val[1] = (float16x4_t) __builtin_aarch64_get_dregciv4hf (__o, 1);
+  ret.val[2] = (float16x4_t) __builtin_aarch64_get_dregciv4hf (__o, 2);
+  return ret;
+}
+
 __extension__ static __inline float32x2x3_t __attribute__ ((__always_inline__))
 vld3_dup_f32 (const float32_t * __a)
 {
@@ -16477,6 +17017,18 @@ vld3q_dup_u64 (const uint64_t * __a)
   return ret;
 }
 
+__extension__ static __inline float16x8x3_t __attribute__ ((__always_inline__))
+vld3q_dup_f16 (const float16_t * __a)
+{
+  float16x8x3_t ret;
+  __builtin_aarch64_simd_ci __o;
+  __o = __builtin_aarch64_ld3rv8hf ((const __builtin_aarch64_simd_hf *) __a);
+  ret.val[0] = (float16x8_t) __builtin_aarch64_get_qregciv8hf (__o, 0);
+  ret.val[1] = (float16x8_t) __builtin_aarch64_get_qregciv8hf (__o, 1);
+  ret.val[2] = (float16x8_t) __builtin_aarch64_get_qregciv8hf (__o, 2);
+  return ret;
+}
+
 __extension__ static __inline float32x4x3_t __attribute__ ((__always_inline__))
 vld3q_dup_f32 (const float32_t * __a)
 {
@@ -16644,6 +17196,19 @@ vld4_dup_u32 (const uint32_t * __a)
   return ret;
 }
 
+__extension__ static __inline float16x4x4_t __attribute__ ((__always_inline__))
+vld4_dup_f16 (const float16_t * __a)
+{
+  float16x4x4_t ret;
+  __builtin_aarch64_simd_xi __o;
+  __o = __builtin_aarch64_ld4rv4hf ((const __builtin_aarch64_simd_hf *) __a);
+  ret.val[0] = (float16x4_t) __builtin_aarch64_get_dregxiv4hf (__o, 0);
+  ret.val[1] = (float16x4_t) __builtin_aarch64_get_dregxiv4hf (__o, 1);
+  ret.val[2] = (float16x4_t) __builtin_aarch64_get_dregxiv4hf (__o, 2);
+  ret.val[3] = (float16x4_t) __builtin_aarch64_get_dregxiv4hf (__o, 3);
+  return ret;
+}
+
 __extension__ static __inline float32x2x4_t __attribute__ ((__always_inline__))
 vld4_dup_f32 (const float32_t * __a)
 {
@@ -16787,6 +17352,19 @@ vld4q_dup_u64 (const uint64_t * __a)
   return ret;
 }
 
+__extension__ static __inline float16x8x4_t __attribute__ ((__always_inline__))
+vld4q_dup_f16 (const float16_t * __a)
+{
+  float16x8x4_t ret;
+  __builtin_aarch64_simd_xi __o;
+  __o = __builtin_aarch64_ld4rv8hf ((const __builtin_aarch64_simd_hf *) __a);
+  ret.val[0] = (float16x8_t) __builtin_aarch64_get_qregxiv8hf (__o, 0);
+  ret.val[1] = (float16x8_t) __builtin_aarch64_get_qregxiv8hf (__o, 1);
+  ret.val[2] = (float16x8_t) __builtin_aarch64_get_qregxiv8hf (__o, 2);
+  ret.val[3] = (float16x8_t) __builtin_aarch64_get_qregxiv8hf (__o, 3);
+  return ret;
+}
+
 __extension__ static __inline float32x4x4_t __attribute__ ((__always_inline__))
 vld4q_dup_f32 (const float32_t * __a)
 {
@@ -16839,6 +17417,8 @@ vld2_lane_##funcsuffix (const ptrtype * __ptr, intype __b, const int __c)  \
   return __b;								   \
 }
 
+__LD2_LANE_FUNC (float16x4x2_t, float16x4_t, float16x8x2_t, float16_t, v4hf,
+		 v8hf, hf, f16, float16x8_t)
 __LD2_LANE_FUNC (float32x2x2_t, float32x2_t, float32x4x2_t, float32_t, v2sf, v4sf,
 		 sf, f32, float32x4_t)
 __LD2_LANE_FUNC (float64x1x2_t, float64x1_t, float64x2x2_t, float64_t, df, v2df,
@@ -16883,6 +17463,7 @@ vld2q_lane_##funcsuffix (const ptrtype * __ptr, intype __b, const int __c) \
   return ret;								   \
 }
 
+__LD2_LANE_FUNC (float16x8x2_t, float16x8_t, float16_t, v8hf, hf, f16)
 __LD2_LANE_FUNC (float32x4x2_t, float32x4_t, float32_t, v4sf, sf, f32)
 __LD2_LANE_FUNC (float64x2x2_t, float64x2_t, float64_t, v2df, df, f64)
 __LD2_LANE_FUNC (poly8x16x2_t, poly8x16_t, poly8_t, v16qi, qi, p8)
@@ -16930,6 +17511,8 @@ vld3_lane_##funcsuffix (const ptrtype * __ptr, intype __b, const int __c)  \
   return __b;								   \
 }
 
+__LD3_LANE_FUNC (float16x4x3_t, float16x4_t, float16x8x3_t, float16_t, v4hf,
+		 v8hf, hf, f16, float16x8_t)
 __LD3_LANE_FUNC (float32x2x3_t, float32x2_t, float32x4x3_t, float32_t, v2sf, v4sf,
 		 sf, f32, float32x4_t)
 __LD3_LANE_FUNC (float64x1x3_t, float64x1_t, float64x2x3_t, float64_t, df, v2df,
@@ -16976,6 +17559,7 @@ vld3q_lane_##funcsuffix (const ptrtype * __ptr, intype __b, const int __c) \
   return ret;								   \
 }
 
+__LD3_LANE_FUNC (float16x8x3_t, float16x8_t, float16_t, v8hf, hf, f16)
 __LD3_LANE_FUNC (float32x4x3_t, float32x4_t, float32_t, v4sf, sf, f32)
 __LD3_LANE_FUNC (float64x2x3_t, float64x2_t, float64_t, v2df, df, f64)
 __LD3_LANE_FUNC (poly8x16x3_t, poly8x16_t, poly8_t, v16qi, qi, p8)
@@ -17031,6 +17615,8 @@ vld4_lane_##funcsuffix (const ptrtype * __ptr, intype __b, const int __c)  \
 
 /* vld4q_lane */
 
+__LD4_LANE_FUNC (float16x4x4_t, float16x4_t, float16x8x4_t, float16_t, v4hf,
+		 v8hf, hf, f16, float16x8_t)
 __LD4_LANE_FUNC (float32x2x4_t, float32x2_t, float32x4x4_t, float32_t, v2sf, v4sf,
 		 sf, f32, float32x4_t)
 __LD4_LANE_FUNC (float64x1x4_t, float64x1_t, float64x2x4_t, float64_t, df, v2df,
@@ -17079,6 +17665,7 @@ vld4q_lane_##funcsuffix (const ptrtype * __ptr, intype __b, const int __c) \
   return ret;								   \
 }
 
+__LD4_LANE_FUNC (float16x8x4_t, float16x8_t, float16_t, v8hf, hf, f16)
 __LD4_LANE_FUNC (float32x4x4_t, float32x4_t, float32_t, v4sf, sf, f32)
 __LD4_LANE_FUNC (float64x2x4_t, float64x2_t, float64_t, v2df, df, f64)
 __LD4_LANE_FUNC (poly8x16x4_t, poly8x16_t, poly8_t, v16qi, qi, p8)
@@ -21977,6 +22564,12 @@ vsrid_n_u64 (uint64_t __a, uint64_t __b, const int __c)
 /* vst1 */
 
 __extension__ static __inline void __attribute__ ((__always_inline__))
+vst1_f16 (float16_t *__a, float16x4_t __b)
+{
+  __builtin_aarch64_st1v4hf (__a, __b);
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
 vst1_f32 (float32_t *a, float32x2_t b)
 {
   __builtin_aarch64_st1v2sf ((__builtin_aarch64_simd_sf *) a, b);
@@ -22056,6 +22649,12 @@ vst1_u64 (uint64_t *a, uint64x1_t b)
 /* vst1q */
 
 __extension__ static __inline void __attribute__ ((__always_inline__))
+vst1q_f16 (float16_t *__a, float16x8_t __b)
+{
+  __builtin_aarch64_st1v8hf (__a, __b);
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
 vst1q_f32 (float32_t *a, float32x4_t b)
 {
   __builtin_aarch64_st1v4sf ((__builtin_aarch64_simd_sf *) a, b);
@@ -22136,6 +22735,12 @@ vst1q_u64 (uint64_t *a, uint64x2_t b)
 /* vst1_lane */
 
 __extension__ static __inline void __attribute__ ((__always_inline__))
+vst1_lane_f16 (float16_t *__a, float16x4_t __b, const int __lane)
+{
+  *__a = __aarch64_vget_lane_any (__b, __lane);
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
 vst1_lane_f32 (float32_t *__a, float32x2_t __b, const int __lane)
 {
   *__a = __aarch64_vget_lane_any (__b, __lane);
@@ -22210,6 +22815,12 @@ vst1_lane_u64 (uint64_t *__a, uint64x1_t __b, const int __lane)
 /* vst1q_lane */
 
 __extension__ static __inline void __attribute__ ((__always_inline__))
+vst1q_lane_f16 (float16_t *__a, float16x8_t __b, const int __lane)
+{
+  *__a = __aarch64_vget_lane_any (__b, __lane);
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
 vst1q_lane_f32 (float32_t *__a, float32x4_t __b, const int __lane)
 {
   *__a = __aarch64_vget_lane_any (__b, __lane);
@@ -22416,6 +23027,18 @@ vst2_u32 (uint32_t * __a, uint32x2x2_t val)
 }
 
 __extension__ static __inline void __attribute__ ((__always_inline__))
+vst2_f16 (float16_t * __a, float16x4x2_t val)
+{
+  __builtin_aarch64_simd_oi __o;
+  float16x8x2_t temp;
+  temp.val[0] = vcombine_f16 (val.val[0], vcreate_f16 (__AARCH64_UINT64_C (0)));
+  temp.val[1] = vcombine_f16 (val.val[1], vcreate_f16 (__AARCH64_UINT64_C (0)));
+  __o = __builtin_aarch64_set_qregoiv8hf (__o, temp.val[0], 0);
+  __o = __builtin_aarch64_set_qregoiv8hf (__o, temp.val[1], 1);
+  __builtin_aarch64_st2v4hf (__a, __o);
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
 vst2_f32 (float32_t * __a, float32x2x2_t val)
 {
   __builtin_aarch64_simd_oi __o;
@@ -22518,6 +23141,15 @@ vst2q_u64 (uint64_t * __a, uint64x2x2_t val)
 }
 
 __extension__ static __inline void __attribute__ ((__always_inline__))
+vst2q_f16 (float16_t * __a, float16x8x2_t val)
+{
+  __builtin_aarch64_simd_oi __o;
+  __o = __builtin_aarch64_set_qregoiv8hf (__o, val.val[0], 0);
+  __o = __builtin_aarch64_set_qregoiv8hf (__o, val.val[1], 1);
+  __builtin_aarch64_st2v8hf (__a, __o);
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
 vst2q_f32 (float32_t * __a, float32x4x2_t val)
 {
   __builtin_aarch64_simd_oi __o;
@@ -22690,6 +23322,20 @@ vst3_u32 (uint32_t * __a, uint32x2x3_t val)
 }
 
 __extension__ static __inline void __attribute__ ((__always_inline__))
+vst3_f16 (float16_t * __a, float16x4x3_t val)
+{
+  __builtin_aarch64_simd_ci __o;
+  float16x8x3_t temp;
+  temp.val[0] = vcombine_f16 (val.val[0], vcreate_f16 (__AARCH64_UINT64_C (0)));
+  temp.val[1] = vcombine_f16 (val.val[1], vcreate_f16 (__AARCH64_UINT64_C (0)));
+  temp.val[2] = vcombine_f16 (val.val[2], vcreate_f16 (__AARCH64_UINT64_C (0)));
+  __o = __builtin_aarch64_set_qregciv8hf (__o, (float16x8_t) temp.val[0], 0);
+  __o = __builtin_aarch64_set_qregciv8hf (__o, (float16x8_t) temp.val[1], 1);
+  __o = __builtin_aarch64_set_qregciv8hf (__o, (float16x8_t) temp.val[2], 2);
+  __builtin_aarch64_st3v4hf ((__builtin_aarch64_simd_hf *) __a, __o);
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
 vst3_f32 (float32_t * __a, float32x2x3_t val)
 {
   __builtin_aarch64_simd_ci __o;
@@ -22804,6 +23450,16 @@ vst3q_u64 (uint64_t * __a, uint64x2x3_t val)
 }
 
 __extension__ static __inline void __attribute__ ((__always_inline__))
+vst3q_f16 (float16_t * __a, float16x8x3_t val)
+{
+  __builtin_aarch64_simd_ci __o;
+  __o = __builtin_aarch64_set_qregciv8hf (__o, (float16x8_t) val.val[0], 0);
+  __o = __builtin_aarch64_set_qregciv8hf (__o, (float16x8_t) val.val[1], 1);
+  __o = __builtin_aarch64_set_qregciv8hf (__o, (float16x8_t) val.val[2], 2);
+  __builtin_aarch64_st3v8hf ((__builtin_aarch64_simd_hf *) __a, __o);
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
 vst3q_f32 (float32_t * __a, float32x4x3_t val)
 {
   __builtin_aarch64_simd_ci __o;
@@ -23000,6 +23656,22 @@ vst4_u32 (uint32_t * __a, uint32x2x4_t val)
 }
 
 __extension__ static __inline void __attribute__ ((__always_inline__))
+vst4_f16 (float16_t * __a, float16x4x4_t val)
+{
+  __builtin_aarch64_simd_xi __o;
+  float16x8x4_t temp;
+  temp.val[0] = vcombine_f16 (val.val[0], vcreate_f16 (__AARCH64_UINT64_C (0)));
+  temp.val[1] = vcombine_f16 (val.val[1], vcreate_f16 (__AARCH64_UINT64_C (0)));
+  temp.val[2] = vcombine_f16 (val.val[2], vcreate_f16 (__AARCH64_UINT64_C (0)));
+  temp.val[3] = vcombine_f16 (val.val[3], vcreate_f16 (__AARCH64_UINT64_C (0)));
+  __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) temp.val[0], 0);
+  __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) temp.val[1], 1);
+  __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) temp.val[2], 2);
+  __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) temp.val[3], 3);
+  __builtin_aarch64_st4v4hf ((__builtin_aarch64_simd_hf *) __a, __o);
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
 vst4_f32 (float32_t * __a, float32x2x4_t val)
 {
   __builtin_aarch64_simd_xi __o;
@@ -23126,6 +23798,17 @@ vst4q_u64 (uint64_t * __a, uint64x2x4_t val)
 }
 
 __extension__ static __inline void __attribute__ ((__always_inline__))
+vst4q_f16 (float16_t * __a, float16x8x4_t val)
+{
+  __builtin_aarch64_simd_xi __o;
+  __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) val.val[0], 0);
+  __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) val.val[1], 1);
+  __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) val.val[2], 2);
+  __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) val.val[3], 3);
+  __builtin_aarch64_st4v8hf ((__builtin_aarch64_simd_hf *) __a, __o);
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
 vst4q_f32 (float32_t * __a, float32x4x4_t val)
 {
   __builtin_aarch64_simd_xi __o;
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 475aa6e6d37..ff698001d68 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -38,8 +38,11 @@
 ;; Iterator for General Purpose Floating-point registers (32- and 64-bit modes)
 (define_mode_iterator GPF [SF DF])
 
-;; Iterator for General Purpose Float registers, inc __fp16.
-(define_mode_iterator GPF_F16 [HF SF DF])
+;; Iterator for all scalar floating point modes (HF, SF, DF and TF)
+(define_mode_iterator GPF_TF_F16 [HF SF DF TF])
+
+;; Double vector modes.
+(define_mode_iterator VDF [V2SF V4HF])
 
 ;; Integer vector modes.
 (define_mode_iterator VDQ_I [V8QI V16QI V4HI V8HI V2SI V4SI V2DI])
@@ -52,7 +55,7 @@
 (define_mode_iterator VSDQ_I_DI [V8QI V16QI V4HI V8HI V2SI V4SI V2DI DI])
 
 ;; Double vector modes.
-(define_mode_iterator VD [V8QI V4HI V2SI V2SF])
+(define_mode_iterator VD [V8QI V4HI V4HF V2SI V2SF])
 
 ;; vector, 64-bit container, all integer modes
 (define_mode_iterator VD_BHSI [V8QI V4HI V2SI])
@@ -61,10 +64,10 @@
 (define_mode_iterator VDQ_BHSI [V8QI V16QI V4HI V8HI V2SI V4SI])
 
 ;; Quad vector modes.
-(define_mode_iterator VQ [V16QI V8HI V4SI V2DI V4SF V2DF])
+(define_mode_iterator VQ [V16QI V8HI V4SI V2DI V8HF V4SF V2DF])
 
 ;; VQ without 2 element modes.
-(define_mode_iterator VQ_NO2E [V16QI V8HI V4SI V4SF])
+(define_mode_iterator VQ_NO2E [V16QI V8HI V4SI V8HF V4SF])
 
 ;; Quad vector with only 2 element modes.
 (define_mode_iterator VQ_2E [V2DI V2DF])
@@ -79,7 +82,10 @@
 ;; pointer-sized quantities.  Exactly one of the two alternatives will match.
 (define_mode_iterator PTR [(SI "ptr_mode == SImode") (DI "ptr_mode == DImode")])
 
-;; Vector Float modes.
+;; Vector Float modes suitable for moving, loading and storing.
+(define_mode_iterator VDQF_F16 [V4HF V8HF V2SF V4SF V2DF])
+
+;; Vector Float modes, barring HF modes.
 (define_mode_iterator VDQF [V2SF V4SF V2DF])
 
 ;; Vector Float modes, and DF.
@@ -88,6 +94,9 @@
 ;; Vector single Float modes.
 (define_mode_iterator VDQSF [V2SF V4SF])
 
+;; Quad vector Float modes with half/single elements.
+(define_mode_iterator VQ_HSF [V8HF V4SF])
+
 ;; Modes suitable to use as the return type of a vcond expression.
 (define_mode_iterator VDQF_COND [V2SF V2SI V4SF V4SI V2DF V2DI])
 
@@ -97,15 +106,23 @@
 ;; Vector Float modes with 2 elements.
 (define_mode_iterator V2F [V2SF V2DF])
 
-;; All modes.
+;; All vector modes on which we support any arithmetic operations.
 (define_mode_iterator VALL [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF V2DF])
 
-;; All vector modes and DI.
+;; All vector modes suitable for moving, loading, and storing.
+(define_mode_iterator VALL_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
+				V4HF V8HF V2SF V4SF V2DF])
+
+;; All vector modes barring HF modes, plus DI.
 (define_mode_iterator VALLDI [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF V2DF DI])
 
-;; All vector modes and DI and DF.
+;; All vector modes and DI.
+(define_mode_iterator VALLDI_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
+				  V4HF V8HF V2SF V4SF V2DF DI])
+
+;; All vector modes, plus DI and DF.
 (define_mode_iterator VALLDIF [V8QI V16QI V4HI V8HI V2SI V4SI
-			       V2DI V2SF V4SF V2DF DI DF])
+			       V2DI V4HF V8HF V2SF V4SF V2DF DI DF])
 
 ;; Vector modes for Integer reduction across lanes.
 (define_mode_iterator VDQV [V8QI V16QI V4HI V8HI V4SI V2DI])
@@ -126,7 +143,7 @@
 (define_mode_iterator VQW [V16QI V8HI V4SI])
 
 ;; Double vector modes for combines.
-(define_mode_iterator VDC [V8QI V4HI V2SI V2SF DI DF])
+(define_mode_iterator VDC [V8QI V4HI V4HF V2SI V2SF DI DF])
 
 ;; Vector modes except double int.
 (define_mode_iterator VDQIF [V8QI V16QI V4HI V8HI V2SI V4SI V2SF V4SF V2DF])
@@ -353,7 +370,8 @@
                          (V2SI "2s") (V4SI  "4s")
                          (DI   "1d") (DF    "1d")
                          (V2DI "2d") (V2SF "2s")
-			 (V4SF "4s") (V2DF "2d")])
+			 (V4SF "4s") (V2DF "2d")
+			 (V4HF "4h") (V8HF "8h")])
 
 (define_mode_attr Vrevsuff [(V4HI "16") (V8HI "16") (V2SI "32")
                             (V4SI "32") (V2DI "64")])
@@ -361,7 +379,8 @@
 (define_mode_attr Vmtype [(V8QI ".8b") (V16QI ".16b")
 			 (V4HI ".4h") (V8HI  ".8h")
 			 (V2SI ".2s") (V4SI  ".4s")
-			 (V2DI ".2d") (V2SF ".2s")
+			 (V2DI ".2d") (V4HF ".4h")
+			 (V8HF ".8h") (V2SF ".2s")
 			 (V4SF ".4s") (V2DF ".2d")
 			 (DI   "")    (SI   "")
 			 (HI   "")    (QI   "")
@@ -378,7 +397,8 @@
 (define_mode_attr Vetype [(V8QI "b") (V16QI "b")
 			  (V4HI "h") (V8HI  "h")
                           (V2SI "s") (V4SI  "s")
-			  (V2DI "d") (V2SF  "s")
+			  (V2DI "d") (V4HF "h")
+			  (V8HF "h") (V2SF  "s")
 			  (V4SF "s") (V2DF  "d")
 			  (SF   "s") (DF  "d")
 			  (QI "b")   (HI "h")
@@ -388,7 +408,8 @@
 (define_mode_attr Vbtype [(V8QI "8b")  (V16QI "16b")
 			  (V4HI "8b") (V8HI  "16b")
 			  (V2SI "8b") (V4SI  "16b")
-			  (V2DI "16b") (V2SF  "8b")
+			  (V2DI "16b") (V4HF "8b")
+			  (V8HF "16b") (V2SF  "8b")
 			  (V4SF "16b") (V2DF  "16b")
 			  (DI   "8b")  (DF    "8b")
 			  (SI   "8b")])
@@ -398,6 +419,7 @@
 			(V4HI "HI") (V8HI "HI")
                         (V2SI "SI") (V4SI "SI")
                         (DI "DI")   (V2DI "DI")
+                        (V4HF "HF") (V8HF "HF")
                         (V2SF "SF") (V4SF "SF")
                         (V2DF "DF") (DF "DF")
 			(SI   "SI") (HI   "HI")
@@ -416,6 +438,7 @@
 			 (V4HI "V8HI") (V8HI "V8HI")
 			 (V2SI "V4SI") (V4SI "V4SI")
 			 (DI   "V2DI") (V2DI "V2DI")
+			 (V4HF "V8HF") (V8HF "V8HF")
 			 (V2SF "V2SF") (V4SF "V4SF")
 			 (V2DF "V2DF") (SI   "V4SI")
 			 (HI   "V8HI") (QI   "V16QI")])
@@ -425,16 +448,22 @@
 			 (V4HI "V2HI")  (V8HI  "V4HI")
 			 (V2SI "SI")    (V4SI  "V2SI")
 			 (V2DI "DI")    (V2SF  "SF")
-			 (V4SF "V2SF")  (V2DF  "DF")])
+			 (V4SF "V2SF")  (V4HF "V2HF")
+			 (V8HF "V4HF")  (V2DF  "DF")])
 
 ;; Double modes of vector modes.
 (define_mode_attr VDBL [(V8QI "V16QI") (V4HI "V8HI")
+			(V4HF "V8HF")
 			(V2SI "V4SI")  (V2SF "V4SF")
 			(SI   "V2SI")  (DI   "V2DI")
 			(DF   "V2DF")])
 
+;; Register suffix for double-length mode.
+(define_mode_attr Vdtype [(V4HF "8h") (V2SF "4s")])
+
 ;; Double modes of vector modes (lower case).
 (define_mode_attr Vdbl [(V8QI "v16qi") (V4HI "v8hi")
+			(V4HF "v8hf")
 			(V2SI "v4si")  (V2SF "v4sf")
 			(SI   "v2si")  (DI   "v2di")
 			(DF   "v2df")])
@@ -465,24 +494,31 @@
 (define_mode_attr VWIDE [(V8QI "V8HI") (V4HI "V4SI")
 			 (V2SI "V2DI") (V16QI "V8HI") 
 			 (V8HI "V4SI") (V4SI "V2DI")
-			 (HI "SI")     (SI "DI")]
-
+			 (HI "SI")     (SI "DI")
+			 (V8HF "V4SF") (V4SF "V2DF")
+			 (V4HF "V4SF") (V2SF "V2DF")]
 )
 
-;; Widened mode register suffixes for VD_BHSI/VQW.
+;; Widened modes of vector modes, lowercase
+(define_mode_attr Vwide [(V2SF "v2df") (V4HF "v4sf")])
+
+;; Widened mode register suffixes for VD_BHSI/VQW/VQ_HSF.
 (define_mode_attr Vwtype [(V8QI "8h") (V4HI "4s")
 			  (V2SI "2d") (V16QI "8h") 
-			  (V8HI "4s") (V4SI "2d")])
+			  (V8HI "4s") (V4SI "2d")
+			  (V8HF "4s") (V4SF "2d")])
 
 ;; Widened mode register suffixes for VDW/VQW.
 (define_mode_attr Vmwtype [(V8QI ".8h") (V4HI ".4s")
 			   (V2SI ".2d") (V16QI ".8h") 
 			   (V8HI ".4s") (V4SI ".2d")
+			   (V4HF ".4s") (V2SF ".2d")
 			   (SI   "")    (HI   "")])
 
-;; Lower part register suffixes for VQW.
+;; Lower part register suffixes for VQW/VQ_HSF.
 (define_mode_attr Vhalftype [(V16QI "8b") (V8HI "4h")
-			     (V4SI "2s")])
+			     (V4SI "2s") (V8HF "4h")
+			     (V4SF "2s")])
 
 ;; Define corresponding core/FP element mode for each vector mode.
 (define_mode_attr vw   [(V8QI "w") (V16QI "w")
@@ -509,6 +545,7 @@
 				(V4HI "V4HI") (V8HI  "V8HI")
 				(V2SI "V2SI") (V4SI  "V4SI")
 				(DI   "DI")   (V2DI  "V2DI")
+				(V4HF "V4HI") (V8HF  "V8HI")
 				(V2SF "V2SI") (V4SF  "V4SI")
 				(V2DF "V2DI") (DF    "DI")
 				(SF   "SI")])
@@ -518,6 +555,7 @@
 				(V4HI "v4hi") (V8HI  "v8hi")
 				(V2SI "v2si") (V4SI  "v4si")
 				(DI   "di")   (V2DI  "v2di")
+				(V4HF "v4hi") (V8HF  "v8hi")
 				(V2SF "v2si") (V4SF  "v4si")
 				(V2DF "v2di") (DF    "di")
 				(SF   "si")])
@@ -539,14 +577,17 @@
 (define_mode_attr nregs [(OI "2") (CI "3") (XI "4")])
 
 (define_mode_attr VRL2 [(V8QI "V32QI") (V4HI "V16HI")
+			(V4HF "V16HF")
 			(V2SI "V8SI")  (V2SF "V8SF")
 			(DI   "V4DI")  (DF   "V4DF")])
 
 (define_mode_attr VRL3 [(V8QI "V48QI") (V4HI "V24HI")
+			(V4HF "V24HF")
 			(V2SI "V12SI")  (V2SF "V12SF")
 			(DI   "V6DI")  (DF   "V6DF")])
 
 (define_mode_attr VRL4 [(V8QI "V64QI") (V4HI "V32HI")
+			(V4HF "V32HF")
 			(V2SI "V16SI")  (V2SF "V16SF")
 			(DI   "V8DI")  (DF   "V8DF")])
 
@@ -559,6 +600,7 @@
                               (V2SI "V2SI") (V4SI "V2SI")
                               (DI "V2DI")   (V2DI "V2DI")
                               (V2SF "V2SF") (V4SF "V2SF")
+                              (V4HF "SF") (V8HF "SF")
                               (DF "V2DI")   (V2DF "V2DI")])
 
 ;; Similar, for three elements.
@@ -567,6 +609,7 @@
                                 (V2SI "BLK") (V4SI "BLK")
                                 (DI "EI")    (V2DI "EI")
                                 (V2SF "BLK") (V4SF "BLK")
+                                (V4HF "BLK") (V8HF "BLK")
                                 (DF "EI")    (V2DF "EI")])
 
 ;; Similar, for four elements.
@@ -575,6 +618,7 @@
                                (V2SI "V4SI") (V4SI "V4SI")
                                (DI "OI")     (V2DI "OI")
                                (V2SF "V4SF") (V4SF "V4SF")
+                               (V4HF "V4HF") (V8HF "V4HF")
                                (DF "OI")     (V2DF "OI")])
 
 
@@ -594,12 +638,14 @@
 				(V2SI "V4SI") (V4SI  "V2SI")
 				(DI   "V2DI") (V2DI  "DI")
 				(V2SF "V4SF") (V4SF  "V2SF")
+				(V4HF "V8HF") (V8HF  "V4HF")
 				(DF   "V2DF") (V2DF  "DF")])
 
 (define_mode_attr vswap_width_name [(V8QI "to_128") (V16QI "to_64")
 				    (V4HI "to_128") (V8HI  "to_64")
 				    (V2SI "to_128") (V4SI  "to_64")
 				    (DI   "to_128") (V2DI  "to_64")
+				    (V4HF "to_128") (V8HF  "to_64")
 				    (V2SF "to_128") (V4SF  "to_64")
 				    (DF   "to_128") (V2DF  "to_64")])
 
@@ -633,6 +679,7 @@
 		     (V4HI "") (V8HI  "_q")
 		     (V2SI "") (V4SI  "_q")
 		     (DI   "") (V2DI  "_q")
+		     (V4HF "") (V8HF "_q")
 		     (V2SF "") (V4SF  "_q")
 			       (V2DF  "_q")
 		     (QI "") (HI "") (SI "") (DI "") (SF "") (DF "")])
diff --git a/gcc/config/arc/arc-opts.h b/gcc/config/arc/arc-opts.h
index f259a470e9d..cca1f035636 100644
--- a/gcc/config/arc/arc-opts.h
+++ b/gcc/config/arc/arc-opts.h
@@ -21,7 +21,6 @@
 enum processor_type
 {
   PROCESSOR_NONE,
-  PROCESSOR_A5,
   PROCESSOR_ARC600,
   PROCESSOR_ARC601,
   PROCESSOR_ARC700
diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index b5b644c5eb1..e9ecc908cb8 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -77,7 +77,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "rtl-iter.h"
 
-/* Which cpu we're compiling for (A5, ARC600, ARC601, ARC700).  */
+/* Which cpu we're compiling for (ARC600, ARC601, ARC700).  */
 static const char *arc_cpu_string = "";
 
 /* ??? Loads can handle any constant, stores can only handle small ones.  */
@@ -702,11 +702,7 @@ arc_init (void)
 {
   enum attr_tune tune_dflt = TUNE_NONE;
 
-  if (TARGET_A5)
-    {
-      arc_cpu_string = "A5";
-    }
-  else if (TARGET_ARC600)
+  if (TARGET_ARC600)
     {
       arc_cpu_string = "ARC600";
       tune_dflt = TUNE_ARC600;
@@ -755,7 +751,7 @@ arc_init (void)
 	break;
       }
 
-  /* Support mul64 generation only for A5 and ARC600.  */
+  /* Support mul64 generation only for ARC600.  */
   if (TARGET_MUL64_SET && TARGET_ARC700)
       error ("-mmul64 not supported for ARC700");
 
@@ -1280,7 +1276,7 @@ arc_conditional_register_usage (void)
 	   i <= ARC_LAST_SIMD_DMA_CONFIG_REG; i++)
 	reg_alloc_order [i] = i;
     }
-  /* For Arctangent-A5 / ARC600, lp_count may not be read in an instruction
+  /* For ARC600, lp_count may not be read in an instruction
      following immediately after another one setting it to a new value.
      There was some discussion on how to enforce scheduling constraints for
      processors with missing interlocks on the gcc mailing list:
@@ -2093,7 +2089,7 @@ arc_compute_frame_size (int size)	/* size = # of var. bytes allocated.  */
   total_size = ARC_STACK_ALIGN (total_size);
 
   /* Compute offset of register save area from stack pointer:
-     A5 Frame: pretend_size <blink> reg_size <fp> var_size args_size <--sp
+     Frame: pretend_size <blink> reg_size <fp> var_size args_size <--sp
   */
   reg_offset = (total_size - (pretend_size + reg_size + extra_size)
 		+ (frame_pointer_needed ? 4 : 0));
diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
index d98cce11257..874b118421d 100644
--- a/gcc/config/arc/arc.h
+++ b/gcc/config/arc/arc.h
@@ -66,9 +66,7 @@ along with GCC; see the file COPYING3.  If not see
 #define TARGET_CPU_CPP_BUILTINS()	\
  do {					\
     builtin_define ("__arc__");		\
-    if (TARGET_A5)			\
-      builtin_define ("__A5__");	\
-    else if (TARGET_ARC600)			\
+    if (TARGET_ARC600)			\
       {					\
 	builtin_define ("__A6__");	\
 	builtin_define ("__ARC600__");	\
@@ -133,7 +131,6 @@ along with GCC; see the file COPYING3.  If not see
 
 #define ASM_SPEC  "\
 %{mbig-endian|EB:-EB} %{EL} \
-%{mcpu=A5|mcpu=a5|mA5:-mA5} \
 %{mcpu=ARC600:-mARC600} \
 %{mcpu=ARC601:-mARC601} \
 %{mcpu=ARC700:-mARC700} \
@@ -224,7 +221,6 @@ along with GCC; see the file COPYING3.  If not see
 #endif
 
 #define DRIVER_SELF_SPECS DRIVER_ENDIAN_SELF_SPECS \
-  "%{mARC5|mA5: -mcpu=A5 %<mARC5 %<mA5}" \
   "%{mARC600|mA6: -mcpu=ARC600 %<mARC600 %<mA6}" \
   "%{mARC601: -mcpu=ARC601 %<mARC601}" \
   "%{mARC700|mA7: -mcpu=ARC700 %<mARC700 %<mA7}" \
@@ -277,7 +273,6 @@ along with GCC; see the file COPYING3.  If not see
    use conditional execution?  */
 #define TARGET_AT_DBR_CONDEXEC  (!TARGET_ARC700)
 
-#define TARGET_A5 (arc_cpu == PROCESSOR_A5)
 #define TARGET_ARC600 (arc_cpu == PROCESSOR_ARC600)
 #define TARGET_ARC601 (arc_cpu == PROCESSOR_ARC601)
 #define TARGET_ARC700 (arc_cpu == PROCESSOR_ARC700)
diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index 931f9a18703..e1da4d70085 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -188,7 +188,7 @@
 
 
 ;; Attribute describing the processor
-(define_attr "cpu" "none,A5,ARC600,ARC700"
+(define_attr "cpu" "none,ARC600,ARC700"
   (const (symbol_ref "arc_cpu_attr")))
 
 ;; true for compact instructions (those with _s suffix)
@@ -337,9 +337,13 @@
 	(match_test "GET_CODE (PATTERN (insn)) == COND_EXEC") (const_int 4)]
       (const_int 2))
 
-    (eq_attr "iscompact" "true_limm,maybe_limm")
+    (eq_attr "iscompact" "true_limm")
     (const_int 6)
 
+    (eq_attr "iscompact" "maybe_limm")
+    (cond [(match_test "GET_CODE (PATTERN (insn)) == COND_EXEC") (const_int 8)]
+	  (const_int 6))
+
     (eq_attr "type" "load")
     (if_then_else
        (match_operand 1 "long_immediate_loadstore_operand" "")
@@ -4899,9 +4903,7 @@
 
 ; operand 0 is the loop count pseudo register
 ; operand 1 is the label to jump to at the top of the loop
-; Use this for the ARC600 and ARC700.  For ARCtangent-A5, this is unsafe
-; without further checking for nearby branches etc., and without proper
-; annotation of shift patterns that clobber lp_count
+; Use this for the ARC600 and ARC700.
 ; ??? ARC600 might want to check if the loop has few iteration and only a
 ; single insn - loop setup is expensive then.
 (define_expand "doloop_end"
diff --git a/gcc/config/arc/arc.opt b/gcc/config/arc/arc.opt
index 6352a2e82de..7c859e4bfcd 100644
--- a/gcc/config/arc/arc.opt
+++ b/gcc/config/arc/arc.opt
@@ -33,10 +33,6 @@ mno-cond-exec
 Target Report RejectNegative Mask(NO_COND_EXEC)
 Disable ARCompact specific pass to generate conditional execution instructions
 
-mA5
-Target Report
-Generate ARCompact 32-bit code for ARCtangent-A5 processor
-
 mA6
 Target Report
 Generate ARCompact 32-bit code for ARC600 processor
@@ -61,7 +57,7 @@ mmixed-code
 Target Report Mask(MIXED_CODE_SET)
 Tweak register allocation to help 16-bit instruction generation
 ; originally this was:
-;Generate ARCompact 16-bit instructions intermixed with 32-bit instructions for ARCtangent-A5 and higher processors
+;Generate ARCompact 16-bit instructions intermixed with 32-bit instructions
 ; but we do that without -mmixed-code, too, it's just a different instruction
 ; count / size tradeoff.
 
@@ -163,9 +159,6 @@ Enum
 Name(processor_type) Type(enum processor_type)
 
 EnumValue
-Enum(processor_type) String(A5) Value(PROCESSOR_A5)
-
-EnumValue
 Enum(processor_type) String(ARC600) Value(PROCESSOR_ARC600)
 
 EnumValue
diff --git a/gcc/config/arc/constraints.md b/gcc/config/arc/constraints.md
index 9540075f369..8902246ff21 100644
--- a/gcc/config/arc/constraints.md
+++ b/gcc/config/arc/constraints.md
@@ -21,7 +21,7 @@
 
 ; Most instructions accept arbitrary core registers for their inputs, even
 ; if the core register in question cannot be written to, like the multiply
-; result registers of the ARCtangent-A5 and ARC600 .
+; result registers of ARC600.
 ; First, define a class for core registers that can be read cheaply.  This
 ; is most or all core registers for ARC600, but only r0-r31 for ARC700
 (define_register_constraint "c" "CHEAP_CORE_REGS"
diff --git a/gcc/config/arc/t-arc-newlib b/gcc/config/arc/t-arc-newlib
index 135bef665c9..8823805b8aa 100644
--- a/gcc/config/arc/t-arc-newlib
+++ b/gcc/config/arc/t-arc-newlib
@@ -17,8 +17,6 @@
 # with GCC; see the file COPYING3.  If not see
 # <http://www.gnu.org/licenses/>.
 
-# Selecting -mA5 uses the same functional multilib files/libraries
-# as get used for -mARC600 aka -mA6.
 MULTILIB_OPTIONS=mcpu=ARC600/mcpu=ARC601 mmul64/mmul32x16 mnorm
 MULTILIB_DIRNAMES=arc600 arc601 mul64 mul32x16 norm
 #
@@ -26,7 +24,6 @@ MULTILIB_DIRNAMES=arc600 arc601 mul64 mul32x16 norm
 MULTILIB_MATCHES  = mcpu?ARC600=mcpu?arc600
 MULTILIB_MATCHES += mcpu?ARC600=mARC600
 MULTILIB_MATCHES += mcpu?ARC600=mA6
-MULTILIB_MATCHES += mcpu?ARC600=mA5
 MULTILIB_MATCHES += mcpu?ARC600=mno-mpy
 MULTILIB_MATCHES += mcpu?ARC601=mcpu?arc601
 MULTILIB_MATCHES += EL=mlittle-endian
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 4391f17c655..0f5a1f1aaf8 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -190,6 +190,7 @@ arm_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define di_UP    DImode
 #define v16qi_UP V16QImode
 #define v8hi_UP  V8HImode
+#define v8hf_UP  V8HFmode
 #define v4si_UP  V4SImode
 #define v4sf_UP  V4SFmode
 #define v2di_UP  V2DImode
@@ -238,6 +239,12 @@ typedef struct {
 #define VAR10(T, N, A, B, C, D, E, F, G, H, I, J) \
   VAR9 (T, N, A, B, C, D, E, F, G, H, I) \
   VAR1 (T, N, J)
+#define VAR11(T, N, A, B, C, D, E, F, G, H, I, J, K) \
+  VAR10 (T, N, A, B, C, D, E, F, G, H, I, J) \
+  VAR1 (T, N, K)
+#define VAR12(T, N, A, B, C, D, E, F, G, H, I, J, K, L) \
+  VAR11 (T, N, A, B, C, D, E, F, G, H, I, J, K) \
+  VAR1 (T, N, L)
 
 /* The NEON builtin data can be found in arm_neon_builtins.def.
    The mode entries in the following table correspond to the "key" type of the
@@ -819,6 +826,7 @@ arm_init_simd_builtin_types (void)
   /* The __builtin_simd{64,128}_float16 types are kept private unless
      we have a scalar __fp16 type.  */
   arm_simd_types[Float16x4_t].eltype = arm_simd_floatHF_type_node;
+  arm_simd_types[Float16x8_t].eltype = arm_simd_floatHF_type_node;
   arm_simd_types[Float32x2_t].eltype = float_type_node;
   arm_simd_types[Float32x4_t].eltype = float_type_node;
 
diff --git a/gcc/config/arm/arm-simd-builtin-types.def b/gcc/config/arm/arm-simd-builtin-types.def
index bcbd20be057..b178ae6c05f 100644
--- a/gcc/config/arm/arm-simd-builtin-types.def
+++ b/gcc/config/arm/arm-simd-builtin-types.def
@@ -44,5 +44,7 @@
 
   ENTRY (Float16x4_t, V4HF, none, 64, float16, 18)
   ENTRY (Float32x2_t, V2SF, none, 64, float32, 18)
+
+  ENTRY (Float16x8_t, V8HF, none, 128, float16, 19)
   ENTRY (Float32x4_t, V4SF, none, 128, float32, 19)
 
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index fa4e083adfe..5f3180d38ce 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9580,6 +9580,24 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer_code,
       return false;	/* All arguments must be in registers.  */
 
     case MOD:
+      /* MOD by a power of 2 can be expanded as:
+	 rsbs    r1, r0, #0
+	 and     r0, r0, #(n - 1)
+	 and     r1, r1, #(n - 1)
+	 rsbpl   r0, r1, #0.  */
+      if (CONST_INT_P (XEXP (x, 1))
+	  && exact_log2 (INTVAL (XEXP (x, 1))) > 0
+	  && mode == SImode)
+	{
+	  *cost += COSTS_N_INSNS (3);
+
+	  if (speed_p)
+	    *cost += 2 * extra_cost->alu.logical
+		     + extra_cost->alu.arith;
+	  return true;
+	}
+
+    /* Fall-through.  */
     case UMOD:
       *cost = LIBCALL_COST (2);
       return false;	/* All arguments must be in registers.  */
@@ -26278,7 +26296,8 @@ arm_vector_mode_supported_p (machine_mode mode)
 {
   /* Neon also supports V2SImode, etc. listed in the clause below.  */
   if (TARGET_NEON && (mode == V2SFmode || mode == V4SImode || mode == V8HImode
-      || mode == V4HFmode || mode == V16QImode || mode == V4SFmode || mode == V2DImode))
+      || mode == V4HFmode || mode == V16QImode || mode == V4SFmode
+      || mode == V2DImode || mode == V8HFmode))
     return true;
 
   if ((TARGET_NEON || TARGET_IWMMXT)
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index eee9e8b551f..f7a9d638673 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1016,7 +1016,7 @@ extern int arm_arch_crc;
 /* Modes valid for Neon Q registers.  */
 #define VALID_NEON_QREG_MODE(MODE) \
   ((MODE) == V4SImode || (MODE) == V8HImode || (MODE) == V16QImode \
-   || (MODE) == V4SFmode || (MODE) == V2DImode)
+   || (MODE) == V8HFmode || (MODE) == V4SFmode || (MODE) == V2DImode)
 
 /* Structure modes valid for Neon registers.  */
 #define VALID_NEON_STRUCT_MODE(MODE) \
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index b6c20478f9c..878576a3cd5 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -1229,7 +1229,7 @@
   ""
 )
 
-(define_insn "*subsi3_compare0"
+(define_insn "subsi3_compare0"
   [(set (reg:CC_NOOV CC_REGNUM)
 	(compare:CC_NOOV
 	 (minus:SI (match_operand:SI 1 "arm_rhs_operand" "r,r,I")
@@ -6620,7 +6620,7 @@
 (define_insn "*arm32_movhf"
   [(set (match_operand:HF 0 "nonimmediate_operand" "=r,m,r,r")
 	(match_operand:HF 1 "general_operand"	   " m,r,r,F"))]
-  "TARGET_32BIT && !(TARGET_HARD_FLOAT && TARGET_FP16) && !arm_restrict_it
+  "TARGET_32BIT && !(TARGET_HARD_FLOAT && TARGET_FP16)
    && (	  s_register_operand (operands[0], HFmode)
        || s_register_operand (operands[1], HFmode))"
   "*
@@ -6658,7 +6658,8 @@
   [(set_attr "conds" "unconditional")
    (set_attr "type" "load1,store1,mov_reg,multiple")
    (set_attr "length" "4,4,4,8")
-   (set_attr "predicable" "yes")]
+   (set_attr "predicable" "yes")
+   (set_attr "predicable_short_it" "no")]
 )
 
 (define_expand "movsf"
@@ -11142,6 +11143,75 @@
   ""
 )
 
+;; ARM-specific expansion of signed mod by power of 2
+;; using conditional negate.
+;; For r0 % n where n is a power of 2 produce:
+;; rsbs    r1, r0, #0
+;; and     r0, r0, #(n - 1)
+;; and     r1, r1, #(n - 1)
+;; rsbpl   r0, r1, #0
+
+(define_expand "modsi3"
+  [(match_operand:SI 0 "register_operand" "")
+   (match_operand:SI 1 "register_operand" "")
+   (match_operand:SI 2 "const_int_operand" "")]
+  "TARGET_32BIT"
+  {
+    HOST_WIDE_INT val = INTVAL (operands[2]);
+
+    if (val <= 0
+       || exact_log2 (val) <= 0)
+      FAIL;
+
+    rtx mask = GEN_INT (val - 1);
+
+    /* In the special case of x0 % 2 we can do the even shorter:
+	cmp     r0, #0
+	and     r0, r0, #1
+	rsblt   r0, r0, #0.  */
+
+    if (val == 2)
+      {
+	rtx cc_reg = arm_gen_compare_reg (LT,
+					  operands[1], const0_rtx, NULL_RTX);
+	rtx cond = gen_rtx_LT (SImode, cc_reg, const0_rtx);
+	rtx masked = gen_reg_rtx (SImode);
+
+	emit_insn (gen_andsi3 (masked, operands[1], mask));
+	emit_move_insn (operands[0],
+			gen_rtx_IF_THEN_ELSE (SImode, cond,
+					      gen_rtx_NEG (SImode,
+							   masked),
+					      masked));
+	DONE;
+      }
+
+    rtx neg_op = gen_reg_rtx (SImode);
+    rtx_insn *insn = emit_insn (gen_subsi3_compare0 (neg_op, const0_rtx,
+						      operands[1]));
+
+    /* Extract the condition register and mode.  */
+    rtx cmp = XVECEXP (PATTERN (insn), 0, 0);
+    rtx cc_reg = SET_DEST (cmp);
+    rtx cond = gen_rtx_GE (SImode, cc_reg, const0_rtx);
+
+    emit_insn (gen_andsi3 (operands[0], operands[1], mask));
+
+    rtx masked_neg = gen_reg_rtx (SImode);
+    emit_insn (gen_andsi3 (masked_neg, neg_op, mask));
+
+    /* We want a conditional negate here, but emitting COND_EXEC rtxes
+       during expand does not always work.  Do an IF_THEN_ELSE instead.  */
+    emit_move_insn (operands[0],
+		    gen_rtx_IF_THEN_ELSE (SImode, cond,
+					  gen_rtx_NEG (SImode, masked_neg),
+					  operands[0]));
+
+
+    DONE;
+  }
+)
+
 (define_expand "bswapsi2"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
   	(bswap:SI (match_operand:SI 1 "s_register_operand" "r")))]
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 2b30be61a46..66622dfcfe2 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -42,6 +42,7 @@ typedef __simd64_int16_t int16x4_t;
 typedef __simd64_int32_t int32x2_t;
 typedef __builtin_neon_di int64x1_t;
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+typedef __fp16 float16_t;
 typedef __simd64_float16_t float16x4_t;
 #endif
 typedef __simd64_float32_t float32x2_t;
@@ -59,6 +60,9 @@ typedef __simd128_int8_t int8x16_t;
 typedef __simd128_int16_t int16x8_t;
 typedef __simd128_int32_t int32x4_t;
 typedef __simd128_int64_t int64x2_t;
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+typedef __simd128_float16_t float16x8_t;
+#endif
 typedef __simd128_float32_t float32x4_t;
 typedef __simd128_poly8_t poly8x16_t;
 typedef __simd128_poly16_t poly16x8_t;
@@ -162,6 +166,20 @@ typedef struct uint64x2x2_t
   uint64x2_t val[2];
 } uint64x2x2_t;
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+typedef struct float16x4x2_t
+{
+  float16x4_t val[2];
+} float16x4x2_t;
+#endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+typedef struct float16x8x2_t
+{
+  float16x8_t val[2];
+} float16x8x2_t;
+#endif
+
 typedef struct float32x2x2_t
 {
   float32x2_t val[2];
@@ -288,6 +306,20 @@ typedef struct uint64x2x3_t
   uint64x2_t val[3];
 } uint64x2x3_t;
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+typedef struct float16x4x3_t
+{
+  float16x4_t val[3];
+} float16x4x3_t;
+#endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+typedef struct float16x8x3_t
+{
+  float16x8_t val[3];
+} float16x8x3_t;
+#endif
+
 typedef struct float32x2x3_t
 {
   float32x2_t val[3];
@@ -414,6 +446,20 @@ typedef struct uint64x2x4_t
   uint64x2_t val[4];
 } uint64x2x4_t;
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+typedef struct float16x4x4_t
+{
+  float16x4_t val[4];
+} float16x4x4_t;
+#endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+typedef struct float16x8x4_t
+{
+  float16x8_t val[4];
+} float16x8x4_t;
+#endif
+
 typedef struct float32x2x4_t
 {
   float32x2_t val[4];
@@ -5203,6 +5249,21 @@ vget_lane_s32 (int32x2_t __a, const int __b)
   return (int32_t)__builtin_neon_vget_lanev2si (__a, __b);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+/* Functions cannot accept or return __FP16 types.  Even if the function
+   were marked always-inline so there were no call sites, the declaration
+   would nonetheless raise an error.  Hence, we must use a macro instead.  */
+
+#define vget_lane_f16(__v, __idx)		\
+  __extension__					\
+    ({						\
+      float16x4_t __vec = (__v);		\
+      __builtin_arm_lane_check (4, __idx);	\
+      float16_t __res = __vec[__idx];		\
+      __res;					\
+    })
+#endif
+
 __extension__ static __inline float32_t __attribute__ ((__always_inline__))
 vget_lane_f32 (float32x2_t __a, const int __b)
 {
@@ -5269,6 +5330,17 @@ vgetq_lane_s32 (int32x4_t __a, const int __b)
   return (int32_t)__builtin_neon_vget_lanev4si (__a, __b);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+#define vgetq_lane_f16(__v, __idx)		\
+  __extension__					\
+    ({						\
+      float16x8_t __vec = (__v);		\
+      __builtin_arm_lane_check (8, __idx);	\
+      float16_t __res = __vec[__idx];		\
+      __res;					\
+    })
+#endif
+
 __extension__ static __inline float32_t __attribute__ ((__always_inline__))
 vgetq_lane_f32 (float32x4_t __a, const int __b)
 {
@@ -5335,6 +5407,18 @@ vset_lane_s32 (int32_t __a, int32x2_t __b, const int __c)
   return (int32x2_t)__builtin_neon_vset_lanev2si ((__builtin_neon_si) __a, __b, __c);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+#define vset_lane_f16(__e, __v, __idx)		\
+  __extension__					\
+    ({						\
+      float16_t __elem = (__e);			\
+      float16x4_t __vec = (__v);		\
+      __builtin_arm_lane_check (4, __idx);	\
+      __vec[__idx] = __elem;			\
+      __vec;					\
+    })
+#endif
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vset_lane_f32 (float32_t __a, float32x2_t __b, const int __c)
 {
@@ -5401,6 +5485,18 @@ vsetq_lane_s32 (int32_t __a, int32x4_t __b, const int __c)
   return (int32x4_t)__builtin_neon_vset_lanev4si ((__builtin_neon_si) __a, __b, __c);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+#define vsetq_lane_f16(__e, __v, __idx)		\
+  __extension__					\
+    ({						\
+      float16_t __elem = (__e);			\
+      float16x8_t __vec = (__v);		\
+      __builtin_arm_lane_check (8, __idx);	\
+      __vec[__idx] = __elem;			\
+      __vec;					\
+    })
+#endif
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vsetq_lane_f32 (float32_t __a, float32x4_t __b, const int __c)
 {
@@ -5481,6 +5577,14 @@ vcreate_s64 (uint64_t __a)
   return (int64x1_t)__builtin_neon_vcreatedi ((__builtin_neon_di) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcreate_f16 (uint64_t __a)
+{
+  return (float16x4_t) __a;
+}
+#endif
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vcreate_f32 (uint64_t __a)
 {
@@ -5983,6 +6087,14 @@ vcombine_s64 (int64x1_t __a, int64x1_t __b)
   return (int64x2_t)__builtin_neon_vcombinedi (__a, __b);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vcombine_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vcombinev4hf (__a, __b);
+}
+#endif
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vcombine_f32 (float32x2_t __a, float32x2_t __b)
 {
@@ -6057,6 +6169,14 @@ vget_high_s64 (int64x2_t __a)
   return (int64x1_t)__builtin_neon_vget_highv2di (__a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vget_high_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vget_highv8hf (__a);
+}
+#endif
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vget_high_f32 (float32x4_t __a)
 {
@@ -6117,6 +6237,14 @@ vget_low_s32 (int32x4_t __a)
   return (int32x2_t)__builtin_neon_vget_lowv4si (__a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vget_low_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vget_lowv8hf (__a);
+}
+#endif
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vget_low_f32 (float32x4_t __a)
 {
@@ -8668,6 +8796,14 @@ vld1_s64 (const int64_t * __a)
   return (int64x1_t)__builtin_neon_vld1di ((const __builtin_neon_di *) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vld1_f16 (const float16_t * __a)
+{
+  return __builtin_neon_vld1v4hf (__a);
+}
+#endif
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vld1_f32 (const float32_t * __a)
 {
@@ -8742,6 +8878,14 @@ vld1q_s64 (const int64_t * __a)
   return (int64x2_t)__builtin_neon_vld1v2di ((const __builtin_neon_di *) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vld1q_f16 (const float16_t * __a)
+{
+  return __builtin_neon_vld1v8hf (__a);
+}
+#endif
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vld1q_f32 (const float32_t * __a)
 {
@@ -8802,6 +8946,14 @@ vld1_lane_s32 (const int32_t * __a, int32x2_t __b, const int __c)
   return (int32x2_t)__builtin_neon_vld1_lanev2si ((const __builtin_neon_si *) __a, __b, __c);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vld1_lane_f16 (const float16_t * __a, float16x4_t __b, const int __c)
+{
+  return vset_lane_f16 (*__a, __b, __c);
+}
+#endif
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vld1_lane_f32 (const float32_t * __a, float32x2_t __b, const int __c)
 {
@@ -8876,6 +9028,14 @@ vld1q_lane_s32 (const int32_t * __a, int32x4_t __b, const int __c)
   return (int32x4_t)__builtin_neon_vld1_lanev4si ((const __builtin_neon_si *) __a, __b, __c);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vld1q_lane_f16 (const float16_t * __a, float16x8_t __b, const int __c)
+{
+  return vsetq_lane_f16 (*__a, __b, __c);
+}
+#endif
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vld1q_lane_f32 (const float32_t * __a, float32x4_t __b, const int __c)
 {
@@ -8950,6 +9110,15 @@ vld1_dup_s32 (const int32_t * __a)
   return (int32x2_t)__builtin_neon_vld1_dupv2si ((const __builtin_neon_si *) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vld1_dup_f16 (const float16_t * __a)
+{
+  float16_t __f = *__a;
+  return (float16x4_t) { __f, __f, __f, __f };
+}
+#endif
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vld1_dup_f32 (const float32_t * __a)
 {
@@ -9024,6 +9193,15 @@ vld1q_dup_s32 (const int32_t * __a)
   return (int32x4_t)__builtin_neon_vld1_dupv4si ((const __builtin_neon_si *) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vld1q_dup_f16 (const float16_t * __a)
+{
+  float16_t __f = *__a;
+  return (float16x8_t) { __f, __f, __f, __f, __f, __f, __f, __f };
+}
+#endif
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vld1q_dup_f32 (const float32_t * __a)
 {
@@ -9112,6 +9290,14 @@ vst1_s64 (int64_t * __a, int64x1_t __b)
   __builtin_neon_vst1di ((__builtin_neon_di *) __a, __b);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst1_f16 (float16_t * __a, float16x4_t __b)
+{
+  __builtin_neon_vst1v4hf (__a, __b);
+}
+#endif
+
 __extension__ static __inline void __attribute__ ((__always_inline__))
 vst1_f32 (float32_t * __a, float32x2_t __b)
 {
@@ -9186,6 +9372,14 @@ vst1q_s64 (int64_t * __a, int64x2_t __b)
   __builtin_neon_vst1v2di ((__builtin_neon_di *) __a, __b);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst1q_f16 (float16_t * __a, float16x8_t __b)
+{
+  __builtin_neon_vst1v8hf (__a, __b);
+}
+#endif
+
 __extension__ static __inline void __attribute__ ((__always_inline__))
 vst1q_f32 (float32_t * __a, float32x4_t __b)
 {
@@ -9246,6 +9440,14 @@ vst1_lane_s32 (int32_t * __a, int32x2_t __b, const int __c)
   __builtin_neon_vst1_lanev2si ((__builtin_neon_si *) __a, __b, __c);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst1_lane_f16 (float16_t * __a, float16x4_t __b, const int __c)
+{
+  __builtin_neon_vst1_lanev4hf (__a, __b, __c);
+}
+#endif
+
 __extension__ static __inline void __attribute__ ((__always_inline__))
 vst1_lane_f32 (float32_t * __a, float32x2_t __b, const int __c)
 {
@@ -9320,6 +9522,14 @@ vst1q_lane_s32 (int32_t * __a, int32x4_t __b, const int __c)
   __builtin_neon_vst1_lanev4si ((__builtin_neon_si *) __a, __b, __c);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst1q_lane_f16 (float16_t * __a, float16x8_t __b, const int __c)
+{
+  __builtin_neon_vst1_lanev8hf (__a, __b, __c);
+}
+#endif
+
 __extension__ static __inline void __attribute__ ((__always_inline__))
 vst1q_lane_f32 (float32_t * __a, float32x4_t __b, const int __c)
 {
@@ -9400,6 +9610,16 @@ vld2_s32 (const int32_t * __a)
   return __rv.__i;
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x4x2_t __attribute__ ((__always_inline__))
+vld2_f16 (const float16_t * __a)
+{
+  union { float16x4x2_t __i; __builtin_neon_ti __o; } __rv;
+  __rv.__o = __builtin_neon_vld2v4hf (__a);
+  return __rv.__i;
+}
+#endif
+
 __extension__ static __inline float32x2x2_t __attribute__ ((__always_inline__))
 vld2_f32 (const float32_t * __a)
 {
@@ -9498,6 +9718,16 @@ vld2q_s32 (const int32_t * __a)
   return __rv.__i;
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x8x2_t __attribute__ ((__always_inline__))
+vld2q_f16 (const float16_t * __a)
+{
+  union { float16x8x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld2v8hf (__a);
+  return __rv.__i;
+}
+#endif
+
 __extension__ static __inline float32x4x2_t __attribute__ ((__always_inline__))
 vld2q_f32 (const float32_t * __a)
 {
@@ -9573,6 +9803,17 @@ vld2_lane_s32 (const int32_t * __a, int32x2x2_t __b, const int __c)
   return __rv.__i;
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x4x2_t __attribute__ ((__always_inline__))
+vld2_lane_f16 (const float16_t * __a, float16x4x2_t __b, const int __c)
+{
+  union { float16x4x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  union { float16x4x2_t __i; __builtin_neon_ti __o; } __rv;
+  __rv.__o = __builtin_neon_vld2_lanev4hf ( __a, __bu.__o, __c);
+  return __rv.__i;
+}
+#endif
+
 __extension__ static __inline float32x2x2_t __attribute__ ((__always_inline__))
 vld2_lane_f32 (const float32_t * __a, float32x2x2_t __b, const int __c)
 {
@@ -9645,6 +9886,17 @@ vld2q_lane_s32 (const int32_t * __a, int32x4x2_t __b, const int __c)
   return __rv.__i;
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x8x2_t __attribute__ ((__always_inline__))
+vld2q_lane_f16 (const float16_t * __a, float16x8x2_t __b, const int __c)
+{
+  union { float16x8x2_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  union { float16x8x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld2_lanev8hf (__a, __bu.__o, __c);
+  return __rv.__i;
+}
+#endif
+
 __extension__ static __inline float32x4x2_t __attribute__ ((__always_inline__))
 vld2q_lane_f32 (const float32_t * __a, float32x4x2_t __b, const int __c)
 {
@@ -9705,6 +9957,16 @@ vld2_dup_s32 (const int32_t * __a)
   return __rv.__i;
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x4x2_t __attribute__ ((__always_inline__))
+vld2_dup_f16 (const float16_t * __a)
+{
+  union { float16x4x2_t __i; __builtin_neon_ti __o; } __rv;
+  __rv.__o = __builtin_neon_vld2_dupv4hf (__a);
+  return __rv.__i;
+}
+#endif
+
 __extension__ static __inline float32x2x2_t __attribute__ ((__always_inline__))
 vld2_dup_f32 (const float32_t * __a)
 {
@@ -9800,6 +10062,15 @@ vst2_s32 (int32_t * __a, int32x2x2_t __b)
   __builtin_neon_vst2v2si ((__builtin_neon_si *) __a, __bu.__o);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst2_f16 (float16_t * __a, float16x4x2_t __b)
+{
+  union { float16x4x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  __builtin_neon_vst2v4hf (__a, __bu.__o);
+}
+#endif
+
 __extension__ static __inline void __attribute__ ((__always_inline__))
 vst2_f32 (float32_t * __a, float32x2x2_t __b)
 {
@@ -9886,6 +10157,15 @@ vst2q_s32 (int32_t * __a, int32x4x2_t __b)
   __builtin_neon_vst2v4si ((__builtin_neon_si *) __a, __bu.__o);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst2q_f16 (float16_t * __a, float16x8x2_t __b)
+{
+  union { float16x8x2_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst2v8hf (__a, __bu.__o);
+}
+#endif
+
 __extension__ static __inline void __attribute__ ((__always_inline__))
 vst2q_f32 (float32_t * __a, float32x4x2_t __b)
 {
@@ -9949,6 +10229,15 @@ vst2_lane_s32 (int32_t * __a, int32x2x2_t __b, const int __c)
   __builtin_neon_vst2_lanev2si ((__builtin_neon_si *) __a, __bu.__o, __c);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst2_lane_f16 (float16_t * __a, float16x4x2_t __b, const int __c)
+{
+  union { float16x4x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  __builtin_neon_vst2_lanev4hf (__a, __bu.__o, __c);
+}
+#endif
+
 __extension__ static __inline void __attribute__ ((__always_inline__))
 vst2_lane_f32 (float32_t * __a, float32x2x2_t __b, const int __c)
 {
@@ -10005,6 +10294,15 @@ vst2q_lane_s32 (int32_t * __a, int32x4x2_t __b, const int __c)
   __builtin_neon_vst2_lanev4si ((__builtin_neon_si *) __a, __bu.__o, __c);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst2q_lane_f16 (float16_t * __a, float16x8x2_t __b, const int __c)
+{
+  union { float16x8x2_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst2_lanev8hf (__a, __bu.__o, __c);
+}
+#endif
+
 __extension__ static __inline void __attribute__ ((__always_inline__))
 vst2q_lane_f32 (float32_t * __a, float32x4x2_t __b, const int __c)
 {
@@ -10057,6 +10355,16 @@ vld3_s32 (const int32_t * __a)
   return __rv.__i;
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x4x3_t __attribute__ ((__always_inline__))
+vld3_f16 (const float16_t * __a)
+{
+  union { float16x4x3_t __i; __builtin_neon_ei __o; } __rv;
+  __rv.__o = __builtin_neon_vld3v4hf (__a);
+  return __rv.__i;
+}
+#endif
+
 __extension__ static __inline float32x2x3_t __attribute__ ((__always_inline__))
 vld3_f32 (const float32_t * __a)
 {
@@ -10155,6 +10463,16 @@ vld3q_s32 (const int32_t * __a)
   return __rv.__i;
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x8x3_t __attribute__ ((__always_inline__))
+vld3q_f16 (const float16_t * __a)
+{
+  union { float16x8x3_t __i; __builtin_neon_ci __o; } __rv;
+  __rv.__o = __builtin_neon_vld3v8hf (__a);
+  return __rv.__i;
+}
+#endif
+
 __extension__ static __inline float32x4x3_t __attribute__ ((__always_inline__))
 vld3q_f32 (const float32_t * __a)
 {
@@ -10230,6 +10548,17 @@ vld3_lane_s32 (const int32_t * __a, int32x2x3_t __b, const int __c)
   return __rv.__i;
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x4x3_t __attribute__ ((__always_inline__))
+vld3_lane_f16 (const float16_t * __a, float16x4x3_t __b, const int __c)
+{
+  union { float16x4x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  union { float16x4x3_t __i; __builtin_neon_ei __o; } __rv;
+  __rv.__o = __builtin_neon_vld3_lanev4hf (__a, __bu.__o, __c);
+  return __rv.__i;
+}
+#endif
+
 __extension__ static __inline float32x2x3_t __attribute__ ((__always_inline__))
 vld3_lane_f32 (const float32_t * __a, float32x2x3_t __b, const int __c)
 {
@@ -10302,6 +10631,17 @@ vld3q_lane_s32 (const int32_t * __a, int32x4x3_t __b, const int __c)
   return __rv.__i;
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x8x3_t __attribute__ ((__always_inline__))
+vld3q_lane_f16 (const float16_t * __a, float16x8x3_t __b, const int __c)
+{
+  union { float16x8x3_t __i; __builtin_neon_ci __o; } __bu = { __b };
+  union { float16x8x3_t __i; __builtin_neon_ci __o; } __rv;
+  __rv.__o = __builtin_neon_vld3_lanev8hf (__a, __bu.__o, __c);
+  return __rv.__i;
+}
+#endif
+
 __extension__ static __inline float32x4x3_t __attribute__ ((__always_inline__))
 vld3q_lane_f32 (const float32_t * __a, float32x4x3_t __b, const int __c)
 {
@@ -10362,6 +10702,16 @@ vld3_dup_s32 (const int32_t * __a)
   return __rv.__i;
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x4x3_t __attribute__ ((__always_inline__))
+vld3_dup_f16 (const float16_t * __a)
+{
+  union { float16x4x3_t __i; __builtin_neon_ei __o; } __rv;
+  __rv.__o = __builtin_neon_vld3_dupv4hf (__a);
+  return __rv.__i;
+}
+#endif
+
 __extension__ static __inline float32x2x3_t __attribute__ ((__always_inline__))
 vld3_dup_f32 (const float32_t * __a)
 {
@@ -10457,6 +10807,15 @@ vst3_s32 (int32_t * __a, int32x2x3_t __b)
   __builtin_neon_vst3v2si ((__builtin_neon_si *) __a, __bu.__o);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst3_f16 (float16_t * __a, float16x4x3_t __b)
+{
+  union { float16x4x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst3v4hf (__a, __bu.__o);
+}
+#endif
+
 __extension__ static __inline void __attribute__ ((__always_inline__))
 vst3_f32 (float32_t * __a, float32x2x3_t __b)
 {
@@ -10543,6 +10902,15 @@ vst3q_s32 (int32_t * __a, int32x4x3_t __b)
   __builtin_neon_vst3v4si ((__builtin_neon_si *) __a, __bu.__o);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst3q_f16 (float16_t * __a, float16x8x3_t __b)
+{
+  union { float16x8x3_t __i; __builtin_neon_ci __o; } __bu = { __b };
+  __builtin_neon_vst3v8hf (__a, __bu.__o);
+}
+#endif
+
 __extension__ static __inline void __attribute__ ((__always_inline__))
 vst3q_f32 (float32_t * __a, float32x4x3_t __b)
 {
@@ -10606,6 +10974,15 @@ vst3_lane_s32 (int32_t * __a, int32x2x3_t __b, const int __c)
   __builtin_neon_vst3_lanev2si ((__builtin_neon_si *) __a, __bu.__o, __c);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst3_lane_f16 (float16_t * __a, float16x4x3_t __b, const int __c)
+{
+  union { float16x4x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst3_lanev4hf (__a, __bu.__o, __c);
+}
+#endif
+
 __extension__ static __inline void __attribute__ ((__always_inline__))
 vst3_lane_f32 (float32_t * __a, float32x2x3_t __b, const int __c)
 {
@@ -10662,6 +11039,15 @@ vst3q_lane_s32 (int32_t * __a, int32x4x3_t __b, const int __c)
   __builtin_neon_vst3_lanev4si ((__builtin_neon_si *) __a, __bu.__o, __c);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst3q_lane_f16 (float16_t * __a, float16x8x3_t __b, const int __c)
+{
+  union { float16x8x3_t __i; __builtin_neon_ci __o; } __bu = { __b };
+  __builtin_neon_vst3_lanev8hf (__a, __bu.__o, __c);
+}
+#endif
+
 __extension__ static __inline void __attribute__ ((__always_inline__))
 vst3q_lane_f32 (float32_t * __a, float32x4x3_t __b, const int __c)
 {
@@ -10714,6 +11100,16 @@ vld4_s32 (const int32_t * __a)
   return __rv.__i;
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x4x4_t __attribute__ ((__always_inline__))
+vld4_f16 (const float16_t * __a)
+{
+  union { float16x4x4_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld4v4hf (__a);
+  return __rv.__i;
+}
+#endif
+
 __extension__ static __inline float32x2x4_t __attribute__ ((__always_inline__))
 vld4_f32 (const float32_t * __a)
 {
@@ -10812,6 +11208,16 @@ vld4q_s32 (const int32_t * __a)
   return __rv.__i;
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x8x4_t __attribute__ ((__always_inline__))
+vld4q_f16 (const float16_t * __a)
+{
+  union { float16x8x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__o = __builtin_neon_vld4v8hf (__a);
+  return __rv.__i;
+}
+#endif
+
 __extension__ static __inline float32x4x4_t __attribute__ ((__always_inline__))
 vld4q_f32 (const float32_t * __a)
 {
@@ -10887,6 +11293,18 @@ vld4_lane_s32 (const int32_t * __a, int32x2x4_t __b, const int __c)
   return __rv.__i;
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x4x4_t __attribute__ ((__always_inline__))
+vld4_lane_f16 (const float16_t * __a, float16x4x4_t __b, const int __c)
+{
+  union { float16x4x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  union { float16x4x4_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld4_lanev4hf (__a,
+					   __bu.__o, __c);
+  return __rv.__i;
+}
+#endif
+
 __extension__ static __inline float32x2x4_t __attribute__ ((__always_inline__))
 vld4_lane_f32 (const float32_t * __a, float32x2x4_t __b, const int __c)
 {
@@ -10959,6 +11377,18 @@ vld4q_lane_s32 (const int32_t * __a, int32x4x4_t __b, const int __c)
   return __rv.__i;
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x8x4_t __attribute__ ((__always_inline__))
+vld4q_lane_f16 (const float16_t * __a, float16x8x4_t __b, const int __c)
+{
+  union { float16x8x4_t __i; __builtin_neon_xi __o; } __bu = { __b };
+  union { float16x8x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__o = __builtin_neon_vld4_lanev8hf (__a,
+					   __bu.__o, __c);
+  return __rv.__i;
+}
+#endif
+
 __extension__ static __inline float32x4x4_t __attribute__ ((__always_inline__))
 vld4q_lane_f32 (const float32_t * __a, float32x4x4_t __b, const int __c)
 {
@@ -11019,6 +11449,16 @@ vld4_dup_s32 (const int32_t * __a)
   return __rv.__i;
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x4x4_t __attribute__ ((__always_inline__))
+vld4_dup_f16 (const float16_t * __a)
+{
+  union { float16x4x4_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld4_dupv4hf (__a);
+  return __rv.__i;
+}
+#endif
+
 __extension__ static __inline float32x2x4_t __attribute__ ((__always_inline__))
 vld4_dup_f32 (const float32_t * __a)
 {
@@ -11114,6 +11554,15 @@ vst4_s32 (int32_t * __a, int32x2x4_t __b)
   __builtin_neon_vst4v2si ((__builtin_neon_si *) __a, __bu.__o);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst4_f16 (float16_t * __a, float16x4x4_t __b)
+{
+  union { float16x4x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst4v4hf (__a, __bu.__o);
+}
+#endif
+
 __extension__ static __inline void __attribute__ ((__always_inline__))
 vst4_f32 (float32_t * __a, float32x2x4_t __b)
 {
@@ -11200,6 +11649,15 @@ vst4q_s32 (int32_t * __a, int32x4x4_t __b)
   __builtin_neon_vst4v4si ((__builtin_neon_si *) __a, __bu.__o);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst4q_f16 (float16_t * __a, float16x8x4_t __b)
+{
+  union { float16x8x4_t __i; __builtin_neon_xi __o; } __bu = { __b };
+  __builtin_neon_vst4v8hf (__a, __bu.__o);
+}
+#endif
+
 __extension__ static __inline void __attribute__ ((__always_inline__))
 vst4q_f32 (float32_t * __a, float32x4x4_t __b)
 {
@@ -11263,6 +11721,15 @@ vst4_lane_s32 (int32_t * __a, int32x2x4_t __b, const int __c)
   __builtin_neon_vst4_lanev2si ((__builtin_neon_si *) __a, __bu.__o, __c);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst4_lane_f16 (float16_t * __a, float16x4x4_t __b, const int __c)
+{
+  union { float16x4x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst4_lanev4hf (__a, __bu.__o, __c);
+}
+#endif
+
 __extension__ static __inline void __attribute__ ((__always_inline__))
 vst4_lane_f32 (float32_t * __a, float32x2x4_t __b, const int __c)
 {
@@ -11319,6 +11786,15 @@ vst4q_lane_s32 (int32_t * __a, int32x4x4_t __b, const int __c)
   __builtin_neon_vst4_lanev4si ((__builtin_neon_si *) __a, __bu.__o, __c);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst4q_lane_f16 (float16_t * __a, float16x8x4_t __b, const int __c)
+{
+  union { float16x8x4_t __i; __builtin_neon_xi __o; } __bu = { __b };
+  __builtin_neon_vst4_lanev8hf (__a, __bu.__o, __c);
+}
+#endif
+
 __extension__ static __inline void __attribute__ ((__always_inline__))
 vst4q_lane_f32 (float32_t * __a, float32x4x4_t __b, const int __c)
 {
@@ -11833,6 +12309,14 @@ vreinterpret_p8_p16 (poly16x4_t __a)
   return (poly8x8_t)__builtin_neon_vreinterpretv8qiv4hi ((int16x4_t) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vreinterpret_p8_f16 (float16x4_t __a)
+{
+  return (poly8x8_t) __a;
+}
+#endif
+
 __extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
 vreinterpret_p8_f32 (float32x2_t __a)
 {
@@ -11901,6 +12385,14 @@ vreinterpret_p16_p8 (poly8x8_t __a)
   return (poly16x4_t)__builtin_neon_vreinterpretv4hiv8qi ((int8x8_t) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+vreinterpret_p16_f16 (float16x4_t __a)
+{
+  return (poly16x4_t) __a;
+}
+#endif
+
 __extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
 vreinterpret_p16_f32 (float32x2_t __a)
 {
@@ -11963,6 +12455,104 @@ vreinterpret_p16_u32 (uint32x2_t __a)
   return (poly16x4_t)__builtin_neon_vreinterpretv4hiv2si ((int32x2_t) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_p8 (poly8x8_t __a)
+{
+  return (float16x4_t) __a;
+}
+#endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_p16 (poly16x4_t __a)
+{
+  return (float16x4_t) __a;
+}
+#endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_f32 (float32x2_t __a)
+{
+  return (float16x4_t) __a;
+}
+#endif
+
+#ifdef __ARM_FEATURE_CRYPTO
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_p64 (poly64x1_t __a)
+{
+  return (float16x4_t) __a;
+}
+#endif
+#endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_s64 (int64x1_t __a)
+{
+  return (float16x4_t) __a;
+}
+#endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_u64 (uint64x1_t __a)
+{
+  return (float16x4_t) __a;
+}
+#endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_s8 (int8x8_t __a)
+{
+  return (float16x4_t) __a;
+}
+#endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_s16 (int16x4_t __a)
+{
+  return (float16x4_t) __a;
+}
+#endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_s32 (int32x2_t __a)
+{
+  return (float16x4_t) __a;
+}
+#endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_u8 (uint8x8_t __a)
+{
+  return (float16x4_t) __a;
+}
+#endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_u16 (uint16x4_t __a)
+{
+  return (float16x4_t) __a;
+}
+#endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_u32 (uint32x2_t __a)
+{
+  return (float16x4_t) __a;
+}
+#endif
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vreinterpret_f32_p8 (poly8x8_t __a)
 {
@@ -11975,6 +12565,14 @@ vreinterpret_f32_p16 (poly16x4_t __a)
   return (float32x2_t)__builtin_neon_vreinterpretv2sfv4hi ((int16x4_t) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vreinterpret_f32_f16 (float16x4_t __a)
+{
+  return (float32x2_t) __a;
+}
+#endif
+
 #ifdef __ARM_FEATURE_CRYPTO
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vreinterpret_f32_p64 (poly64x1_t __a)
@@ -12047,6 +12645,17 @@ vreinterpret_p64_p16 (poly16x4_t __a)
 }
 
 #endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1_t __attribute__ ((__always_inline__))
+vreinterpret_p64_f16 (float16x4_t __a)
+{
+  return (poly64x1_t) __a;
+}
+#endif
+#endif
+
 #ifdef __ARM_FEATURE_CRYPTO
 __extension__ static __inline poly64x1_t __attribute__ ((__always_inline__))
 vreinterpret_p64_f32 (float32x2_t __a)
@@ -12131,6 +12740,14 @@ vreinterpret_s64_p16 (poly16x4_t __a)
   return (int64x1_t)__builtin_neon_vreinterpretdiv4hi ((int16x4_t) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+vreinterpret_s64_f16 (float16x4_t __a)
+{
+  return (int64x1_t) __a;
+}
+#endif
+
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
 vreinterpret_s64_f32 (float32x2_t __a)
 {
@@ -12199,6 +12816,14 @@ vreinterpret_u64_p16 (poly16x4_t __a)
   return (uint64x1_t)__builtin_neon_vreinterpretdiv4hi ((int16x4_t) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+vreinterpret_u64_f16 (float16x4_t __a)
+{
+  return (uint64x1_t) __a;
+}
+#endif
+
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
 vreinterpret_u64_f32 (float32x2_t __a)
 {
@@ -12267,6 +12892,14 @@ vreinterpret_s8_p16 (poly16x4_t __a)
   return (int8x8_t)__builtin_neon_vreinterpretv8qiv4hi ((int16x4_t) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vreinterpret_s8_f16 (float16x4_t __a)
+{
+  return (int8x8_t) __a;
+}
+#endif
+
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vreinterpret_s8_f32 (float32x2_t __a)
 {
@@ -12335,6 +12968,14 @@ vreinterpret_s16_p16 (poly16x4_t __a)
   return (int16x4_t)__builtin_neon_vreinterpretv4hiv4hi ((int16x4_t) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vreinterpret_s16_f16 (float16x4_t __a)
+{
+  return (int16x4_t) __a;
+}
+#endif
+
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vreinterpret_s16_f32 (float32x2_t __a)
 {
@@ -12403,6 +13044,14 @@ vreinterpret_s32_p16 (poly16x4_t __a)
   return (int32x2_t)__builtin_neon_vreinterpretv2siv4hi ((int16x4_t) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vreinterpret_s32_f16 (float16x4_t __a)
+{
+  return (int32x2_t) __a;
+}
+#endif
+
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vreinterpret_s32_f32 (float32x2_t __a)
 {
@@ -12471,6 +13120,14 @@ vreinterpret_u8_p16 (poly16x4_t __a)
   return (uint8x8_t)__builtin_neon_vreinterpretv8qiv4hi ((int16x4_t) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vreinterpret_u8_f16 (float16x4_t __a)
+{
+  return (uint8x8_t) __a;
+}
+#endif
+
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vreinterpret_u8_f32 (float32x2_t __a)
 {
@@ -12539,6 +13196,14 @@ vreinterpret_u16_p16 (poly16x4_t __a)
   return (uint16x4_t)__builtin_neon_vreinterpretv4hiv4hi ((int16x4_t) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vreinterpret_u16_f16 (float16x4_t __a)
+{
+  return (uint16x4_t) __a;
+}
+#endif
+
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vreinterpret_u16_f32 (float32x2_t __a)
 {
@@ -12607,6 +13272,14 @@ vreinterpret_u32_p16 (poly16x4_t __a)
   return (uint32x2_t)__builtin_neon_vreinterpretv2siv4hi ((int16x4_t) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vreinterpret_u32_f16 (float16x4_t __a)
+{
+  return (uint32x2_t) __a;
+}
+#endif
+
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vreinterpret_u32_f32 (float32x2_t __a)
 {
@@ -12669,6 +13342,14 @@ vreinterpretq_p8_p16 (poly16x8_t __a)
   return (poly8x16_t)__builtin_neon_vreinterpretv16qiv8hi ((int16x8_t) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_p8_f16 (float16x8_t __a)
+{
+  return (poly8x16_t) __a;
+}
+#endif
+
 __extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
 vreinterpretq_p8_f32 (float32x4_t __a)
 {
@@ -12745,6 +13426,14 @@ vreinterpretq_p16_p8 (poly8x16_t __a)
   return (poly16x8_t)__builtin_neon_vreinterpretv8hiv16qi ((int8x16_t) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_p16_f16 (float16x8_t __a)
+{
+  return (poly16x8_t) __a;
+}
+#endif
+
 __extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
 vreinterpretq_p16_f32 (float32x4_t __a)
 {
@@ -12815,6 +13504,114 @@ vreinterpretq_p16_u32 (uint32x4_t __a)
   return (poly16x8_t)__builtin_neon_vreinterpretv8hiv4si ((int32x4_t) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_p8 (poly8x16_t __a)
+{
+  return (float16x8_t) __a;
+}
+#endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_p16 (poly16x8_t __a)
+{
+  return (float16x8_t) __a;
+}
+#endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_f32 (float32x4_t __a)
+{
+  return (float16x8_t) __a;
+}
+#endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_p64 (poly64x2_t __a)
+{
+  return (float16x8_t) __a;
+}
+#endif
+#endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_p128 (poly128_t __a)
+{
+  return (float16x8_t) __a;
+}
+#endif
+#endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_s64 (int64x2_t __a)
+{
+  return (float16x8_t) __a;
+}
+#endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_u64 (uint64x2_t __a)
+{
+  return (float16x8_t) __a;
+}
+#endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_s8 (int8x16_t __a)
+{
+  return (float16x8_t) __a;
+}
+#endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_s16 (int16x8_t __a)
+{
+  return (float16x8_t) __a;
+}
+#endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_s32 (int32x4_t __a)
+{
+  return (float16x8_t) __a;
+}
+#endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_u8 (uint8x16_t __a)
+{
+  return (float16x8_t) __a;
+}
+#endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_u16 (uint16x8_t __a)
+{
+  return (float16x8_t) __a;
+}
+#endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_u32 (uint32x4_t __a)
+{
+  return (float16x8_t) __a;
+}
+#endif
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vreinterpretq_f32_p8 (poly8x16_t __a)
 {
@@ -12827,6 +13624,14 @@ vreinterpretq_f32_p16 (poly16x8_t __a)
   return (float32x4_t)__builtin_neon_vreinterpretv4sfv8hi ((int16x8_t) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_f32_f16 (float16x8_t __a)
+{
+  return (float32x4_t) __a;
+}
+#endif
+
 #ifdef __ARM_FEATURE_CRYPTO
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vreinterpretq_f32_p64 (poly64x2_t __a)
@@ -12907,6 +13712,17 @@ vreinterpretq_p64_p16 (poly16x8_t __a)
 }
 
 #endif
+
+#ifdef __ARM_FEATURE_CRYPTO
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline poly64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_p64_f16 (float16x8_t __a)
+{
+  return (poly64x2_t) __a;
+}
+#endif
+#endif
+
 #ifdef __ARM_FEATURE_CRYPTO
 __extension__ static __inline poly64x2_t __attribute__ ((__always_inline__))
 vreinterpretq_p64_f32 (float32x4_t __a)
@@ -13001,8 +13817,18 @@ vreinterpretq_p128_p16 (poly16x8_t __a)
 {
   return (poly128_t)__builtin_neon_vreinterprettiv8hi ((int16x8_t) __a);
 }
+#endif
 
+#ifdef __ARM_FEATURE_CRYPTO
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline poly128_t __attribute__ ((__always_inline__))
+vreinterpretq_p128_f16 (float16x8_t __a)
+{
+  return (poly128_t) __a;
+}
 #endif
+#endif
+
 #ifdef __ARM_FEATURE_CRYPTO
 __extension__ static __inline poly128_t __attribute__ ((__always_inline__))
 vreinterpretq_p128_f32 (float32x4_t __a)
@@ -13095,6 +13921,14 @@ vreinterpretq_s64_p16 (poly16x8_t __a)
   return (int64x2_t)__builtin_neon_vreinterpretv2div8hi ((int16x8_t) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_s64_f16 (float16x8_t __a)
+{
+  return (int64x2_t) __a;
+}
+#endif
+
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vreinterpretq_s64_f32 (float32x4_t __a)
 {
@@ -13171,6 +14005,14 @@ vreinterpretq_u64_p16 (poly16x8_t __a)
   return (uint64x2_t)__builtin_neon_vreinterpretv2div8hi ((int16x8_t) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_u64_f16 (float16x8_t __a)
+{
+  return (uint64x2_t) __a;
+}
+#endif
+
 __extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
 vreinterpretq_u64_f32 (float32x4_t __a)
 {
@@ -13247,6 +14089,14 @@ vreinterpretq_s8_p16 (poly16x8_t __a)
   return (int8x16_t)__builtin_neon_vreinterpretv16qiv8hi ((int16x8_t) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_s8_f16 (float16x8_t __a)
+{
+  return (int8x16_t) __a;
+}
+#endif
+
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vreinterpretq_s8_f32 (float32x4_t __a)
 {
@@ -13323,6 +14173,14 @@ vreinterpretq_s16_p16 (poly16x8_t __a)
   return (int16x8_t)__builtin_neon_vreinterpretv8hiv8hi ((int16x8_t) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_s16_f16 (float16x8_t __a)
+{
+  return (int16x8_t) __a;
+}
+#endif
+
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vreinterpretq_s16_f32 (float32x4_t __a)
 {
@@ -13399,6 +14257,14 @@ vreinterpretq_s32_p16 (poly16x8_t __a)
   return (int32x4_t)__builtin_neon_vreinterpretv4siv8hi ((int16x8_t) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_s32_f16 (float16x8_t __a)
+{
+  return (int32x4_t)__builtin_neon_vreinterpretv4siv8hi ((int16x8_t) __a);
+}
+#endif
+
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vreinterpretq_s32_f32 (float32x4_t __a)
 {
@@ -13475,6 +14341,14 @@ vreinterpretq_u8_p16 (poly16x8_t __a)
   return (uint8x16_t)__builtin_neon_vreinterpretv16qiv8hi ((int16x8_t) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_u8_f16 (float16x8_t __a)
+{
+  return (uint8x16_t) __a;
+}
+#endif
+
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vreinterpretq_u8_f32 (float32x4_t __a)
 {
@@ -13551,6 +14425,14 @@ vreinterpretq_u16_p16 (poly16x8_t __a)
   return (uint16x8_t)__builtin_neon_vreinterpretv8hiv8hi ((int16x8_t) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_u16_f16 (float16x8_t __a)
+{
+  return (uint16x8_t) __a;
+}
+#endif
+
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
 vreinterpretq_u16_f32 (float32x4_t __a)
 {
@@ -13627,6 +14509,14 @@ vreinterpretq_u32_p16 (poly16x8_t __a)
   return (uint32x4_t)__builtin_neon_vreinterpretv4siv8hi ((int16x8_t) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_u32_f16 (float16x8_t __a)
+{
+  return (uint32x4_t) __a;
+}
+#endif
+
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vreinterpretq_u32_f32 (float32x4_t __a)
 {
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index f150b98b809..0b719df7607 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -164,9 +164,9 @@ VAR10 (UNOP, vdup_n,
 	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
 VAR10 (GETLANE, vdup_lane,
 	 v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
-VAR5 (COMBINE, vcombine, v8qi, v4hi, v2si, v2sf, di)
-VAR5 (UNOP, vget_high, v16qi, v8hi, v4si, v4sf, v2di)
-VAR5 (UNOP, vget_low, v16qi, v8hi, v4si, v4sf, v2di)
+VAR6 (COMBINE, vcombine, v8qi, v4hi, v4hf, v2si, v2sf, di)
+VAR6 (UNOP, vget_high, v16qi, v8hi, v8hf, v4si, v4sf, v2di)
+VAR6 (UNOP, vget_low, v16qi, v8hi, v8hf, v4si, v4sf, v2di)
 VAR3 (UNOP, vmovn, v8hi, v4si, v2di)
 VAR3 (UNOP, vqmovns, v8hi, v4si, v2di)
 VAR3 (UNOP, vqmovnu, v8hi, v4si, v2di)
@@ -242,40 +242,40 @@ VAR6 (UNOP, vreinterpretv4si, v16qi, v8hi, v4si, v4sf, v2di, ti)
 VAR6 (UNOP, vreinterpretv4sf, v16qi, v8hi, v4si, v4sf, v2di, ti)
 VAR6 (UNOP, vreinterpretv2di, v16qi, v8hi, v4si, v4sf, v2di, ti)
 VAR6 (UNOP, vreinterpretti, v16qi, v8hi, v4si, v4sf, v2di, ti)
-VAR10 (LOAD1, vld1,
-        v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
+VAR12 (LOAD1, vld1,
+        v8qi, v4hi, v4hf, v2si, v2sf, di, v16qi, v8hi, v8hf, v4si, v4sf, v2di)
 VAR10 (LOAD1LANE, vld1_lane,
 	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
 VAR10 (LOAD1, vld1_dup,
 	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
-VAR10 (STORE1, vst1,
-	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
-VAR10 (STORE1LANE, vst1_lane,
-	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
-VAR9 (LOAD1, vld2,
-	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf)
-VAR7 (LOAD1LANE, vld2_lane,
-	v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
-VAR5 (LOAD1, vld2_dup, v8qi, v4hi, v2si, v2sf, di)
-VAR9 (STORE1, vst2,
-	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf)
-VAR7 (STORE1LANE, vst2_lane,
-	v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
-VAR9 (LOAD1, vld3,
-	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf)
-VAR7 (LOAD1LANE, vld3_lane,
-	v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
-VAR5 (LOAD1, vld3_dup, v8qi, v4hi, v2si, v2sf, di)
-VAR9 (STORE1, vst3,
-	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf)
-VAR7 (STORE1LANE, vst3_lane,
-	v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
-VAR9 (LOAD1, vld4,
-	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf)
-VAR7 (LOAD1LANE, vld4_lane,
-	v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
-VAR5 (LOAD1, vld4_dup, v8qi, v4hi, v2si, v2sf, di)
-VAR9 (STORE1, vst4,
-	v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf)
-VAR7 (STORE1LANE, vst4_lane,
-	v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
+VAR12 (STORE1, vst1,
+	v8qi, v4hi, v4hf, v2si, v2sf, di, v16qi, v8hi, v8hf, v4si, v4sf, v2di)
+VAR12 (STORE1LANE, vst1_lane,
+	v8qi, v4hi, v4hf, v2si, v2sf, di, v16qi, v8hi, v8hf, v4si, v4sf, v2di)
+VAR11 (LOAD1, vld2,
+	v8qi, v4hi, v4hf, v2si, v2sf, di, v16qi, v8hi, v8hf, v4si, v4sf)
+VAR9 (LOAD1LANE, vld2_lane,
+	v8qi, v4hi, v4hf, v2si, v2sf, v8hi, v8hf, v4si, v4sf)
+VAR6 (LOAD1, vld2_dup, v8qi, v4hi, v4hf, v2si, v2sf, di)
+VAR11 (STORE1, vst2,
+	v8qi, v4hi, v4hf, v2si, v2sf, di, v16qi, v8hi, v8hf, v4si, v4sf)
+VAR9 (STORE1LANE, vst2_lane,
+	v8qi, v4hi, v4hf, v2si, v2sf, v8hi, v8hf, v4si, v4sf)
+VAR11 (LOAD1, vld3,
+	v8qi, v4hi, v4hf, v2si, v2sf, di, v16qi, v8hi, v8hf, v4si, v4sf)
+VAR9 (LOAD1LANE, vld3_lane,
+	v8qi, v4hi, v4hf, v2si, v2sf, v8hi, v8hf, v4si, v4sf)
+VAR6 (LOAD1, vld3_dup, v8qi, v4hi, v4hf, v2si, v2sf, di)
+VAR11 (STORE1, vst3,
+	v8qi, v4hi, v4hf, v2si, v2sf, di, v16qi, v8hi, v8hf, v4si, v4sf)
+VAR9 (STORE1LANE, vst3_lane,
+	v8qi, v4hi, v4hf, v2si, v2sf, v8hi, v8hf, v4si, v4sf)
+VAR11 (LOAD1, vld4,
+	v8qi, v4hi, v4hf, v2si, v2sf, di, v16qi, v8hi, v8hf, v4si, v4sf)
+VAR9 (LOAD1LANE, vld4_lane,
+	v8qi, v4hi, v4hf, v2si, v2sf, v8hi, v8hf, v4si, v4sf)
+VAR6 (LOAD1, vld4_dup, v8qi, v4hi, v4hf, v2si, v2sf, di)
+VAR11 (STORE1, vst4,
+	v8qi, v4hi, v4hf, v2si, v2sf, di, v16qi, v8hi, v8hf, v4si, v4sf)
+VAR9 (STORE1LANE, vst4_lane,
+	v8qi, v4hi, v4hf, v2si, v2sf, v8hi, v8hf, v4si, v4sf)
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 1e7f3f17a8a..47cc1eebecd 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -65,20 +65,32 @@
 ;; Integer modes supported by Neon and IWMMXT, except V2DI
 (define_mode_iterator VINTW [V2SI V4HI V8QI V4SI V8HI V16QI])
 
-;; Double-width vector modes.
+;; Double-width vector modes, on which we support arithmetic (no HF!)
 (define_mode_iterator VD [V8QI V4HI V2SI V2SF])
 
+;; Double-width vector modes plus 64-bit elements for vreinterpret + vcreate.
+(define_mode_iterator VD_RE [V8QI V4HI V2SI V2SF DI])
+
 ;; Double-width vector modes plus 64-bit elements.
-(define_mode_iterator VDX [V8QI V4HI V2SI V2SF DI])
+(define_mode_iterator VDX [V8QI V4HI V4HF V2SI V2SF DI])
+
+;; Double-width vector modes, with V4HF - for vldN_lane and vstN_lane.
+(define_mode_iterator VD_LANE [V8QI V4HI V4HF V2SI V2SF])
 
 ;; Double-width vector modes without floating-point elements.
 (define_mode_iterator VDI [V8QI V4HI V2SI])
 
-;; Quad-width vector modes.
+;; Quad-width vector modes supporting arithmetic (no HF!).
 (define_mode_iterator VQ [V16QI V8HI V4SI V4SF])
 
+;; Quad-width vector modes, including V8HF.
+(define_mode_iterator VQ2 [V16QI V8HI V8HF V4SI V4SF])
+
+;; Quad-width vector modes with 16- or 32-bit elements
+(define_mode_iterator VQ_HS [V8HI V8HF V4SI V4SF])
+
 ;; Quad-width vector modes plus 64-bit elements.
-(define_mode_iterator VQX [V16QI V8HI V4SI V4SF V2DI])
+(define_mode_iterator VQX [V16QI V8HI V8HF V4SI V4SF V2DI])
 
 ;; Quad-width vector modes without floating-point elements.
 (define_mode_iterator VQI [V16QI V8HI V4SI])
@@ -111,7 +123,8 @@
 (define_mode_iterator VDQI [V8QI V16QI V4HI V8HI V2SI V4SI V2DI])
 
 ;; Vector modes, including 64-bit integer elements.
-(define_mode_iterator VDQX [V8QI V16QI V4HI V8HI V2SI V4SI V2SF V4SF DI V2DI])
+(define_mode_iterator VDQX [V8QI V16QI V4HI V8HI V2SI V4SI
+			    V4HF V8HF V2SF V4SF DI V2DI])
 
 ;; Vector modes including 64-bit integer elements, but no floats.
 (define_mode_iterator VDQIX [V8QI V16QI V4HI V8HI V2SI V4SI DI V2DI])
@@ -366,7 +379,8 @@
 
 ;; Define element mode for each vector mode.
 (define_mode_attr V_elem [(V8QI "QI") (V16QI "QI")
-              (V4HI "HI") (V8HI "HI")
+			  (V4HI "HI") (V8HI "HI")
+			  (V4HF "HF") (V8HF "HF")
                           (V2SI "SI") (V4SI "SI")
                           (V2SF "SF") (V4SF "SF")
                           (DI "DI")   (V2DI "DI")])
@@ -383,6 +397,7 @@
 ;; size for structure lane/dup loads and stores.
 (define_mode_attr V_two_elem [(V8QI "HI")   (V16QI "HI")
                               (V4HI "SI")   (V8HI "SI")
+                              (V4HF "SF")   (V8HF "SF")
                               (V2SI "V2SI") (V4SI "V2SI")
                               (V2SF "V2SF") (V4SF "V2SF")
                               (DI "V2DI")   (V2DI "V2DI")])
@@ -390,6 +405,7 @@
 ;; Similar, for three elements.
 (define_mode_attr V_three_elem [(V8QI "BLK") (V16QI "BLK")
                                 (V4HI "BLK") (V8HI "BLK")
+                                (V4HF "BLK") (V8HF "BLK")
                                 (V2SI "BLK") (V4SI "BLK")
                                 (V2SF "BLK") (V4SF "BLK")
                                 (DI "EI")    (V2DI "EI")])
@@ -397,6 +413,7 @@
 ;; Similar, for four elements.
 (define_mode_attr V_four_elem [(V8QI "SI")   (V16QI "SI")
                                (V4HI "V4HI") (V8HI "V4HI")
+                               (V4HF "V4HF") (V8HF "V4HF")
                                (V2SI "V4SI") (V4SI "V4SI")
                                (V2SF "V4SF") (V4SF "V4SF")
                                (DI "OI")     (V2DI "OI")])
@@ -421,7 +438,8 @@
 
 ;; Modes with half the number of equal-sized elements.
 (define_mode_attr V_HALF [(V16QI "V8QI") (V8HI "V4HI")
-              (V4SI  "V2SI") (V4SF "V2SF") (V2DF "DF")
+			  (V8HF "V4HF") (V4SI  "V2SI")
+			  (V4SF "V2SF") (V2DF "DF")
                           (V2DI "DI")])
 
 ;; Same, but lower-case.
@@ -431,8 +449,9 @@
 
 ;; Modes with twice the number of equal-sized elements.
 (define_mode_attr V_DOUBLE [(V8QI "V16QI") (V4HI "V8HI")
-                (V2SI "V4SI") (V2SF "V4SF") (DF "V2DF")
-                            (DI "V2DI")])
+			    (V2SI "V4SI") (V4HF "V8HF")
+			    (V2SF "V4SF") (DF "V2DF")
+			    (DI "V2DI")])
 
 ;; Same, but lower-case.
 (define_mode_attr V_double [(V8QI "v16qi") (V4HI "v8hi")
@@ -454,8 +473,9 @@
 
 ;; Mode of result of comparison operations (and bit-select operand 1).
 (define_mode_attr V_cmp_result [(V8QI "V8QI") (V16QI "V16QI")
-                    (V4HI "V4HI") (V8HI  "V8HI")
+				(V4HI "V4HI") (V8HI  "V8HI")
                                 (V2SI "V2SI") (V4SI  "V4SI")
+				(V4HF "V4HI") (V8HF  "V8HI")
                                 (V2SF "V2SI") (V4SF  "V4SI")
                                 (DI   "DI")   (V2DI  "V2DI")])
 
@@ -492,12 +512,14 @@
 (define_mode_attr V_uf_sclr [(V8QI "u8")  (V16QI "u8")
                  (V4HI "u16") (V8HI "u16")
                              (V2SI "32") (V4SI "32")
+                             (V4HF "u16") (V8HF "u16")
                              (V2SF "32") (V4SF "32")])
 
 (define_mode_attr V_sz_elem [(V8QI "8")  (V16QI "8")
                  (V4HI "16") (V8HI  "16")
                              (V2SI "32") (V4SI  "32")
                              (DI   "64") (V2DI  "64")
+			     (V4HF "16") (V8HF "16")
                  (V2SF "32") (V4SF  "32")])
 
 (define_mode_attr V_elem_ch [(V8QI "b")  (V16QI "b")
@@ -564,6 +586,7 @@
                             (DI   "true") (V2DI  "false")])
 
 (define_mode_attr V_mode_nunits [(V8QI "8") (V16QI "16")
+				 (V4HF "4") (V8HF "8")
                                  (V4HI "4") (V8HI "8")
                                  (V2SI "2") (V4SI "4")
                                  (V2SF "2") (V4SF "4")
@@ -607,6 +630,7 @@
 (define_mode_attr q [(V8QI "") (V16QI "_q")
                      (V4HI "") (V8HI "_q")
                      (V2SI "") (V4SI "_q")
+		     (V4HF "") (V8HF "_q")
                      (V2SF "") (V4SF "_q")
                      (DI "")   (V2DI "_q")
                      (DF "")   (V2DF "_q")])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 873330fc7de..26678663a64 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -320,11 +320,11 @@
   [(set_attr "type" "neon_load1_all_lanes<q>,neon_from_gp<q>")])
 
 (define_insn "vec_set<mode>_internal"
-  [(set (match_operand:VQ 0 "s_register_operand" "=w,w")
-        (vec_merge:VQ
-          (vec_duplicate:VQ
+  [(set (match_operand:VQ2 0 "s_register_operand" "=w,w")
+        (vec_merge:VQ2
+          (vec_duplicate:VQ2
             (match_operand:<V_elem> 1 "nonimmediate_operand" "Um,r"))
-          (match_operand:VQ 3 "s_register_operand" "0,0")
+          (match_operand:VQ2 3 "s_register_operand" "0,0")
           (match_operand:SI 2 "immediate_operand" "i,i")))]
   "TARGET_NEON"
 {
@@ -407,7 +407,7 @@
 (define_insn "vec_extract<mode>"
   [(set (match_operand:<V_elem> 0 "nonimmediate_operand" "=Um,r")
 	(vec_select:<V_elem>
-          (match_operand:VQ 1 "s_register_operand" "w,w")
+          (match_operand:VQ2 1 "s_register_operand" "w,w")
           (parallel [(match_operand:SI 2 "immediate_operand" "i,i")])))]
   "TARGET_NEON"
 {
@@ -2607,7 +2607,7 @@
   [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(sign_extend:SI
 	  (vec_select:<V_elem>
-	    (match_operand:VQ 1 "s_register_operand" "w")
+	    (match_operand:VQ2 1 "s_register_operand" "w")
 	    (parallel [(match_operand:SI 2 "immediate_operand" "i")]))))]
   "TARGET_NEON"
 {
@@ -2634,7 +2634,7 @@
   [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(zero_extend:SI
 	  (vec_select:<V_elem>
-	    (match_operand:VQ 1 "s_register_operand" "w")
+	    (match_operand:VQ2 1 "s_register_operand" "w")
 	    (parallel [(match_operand:SI 2 "immediate_operand" "i")]))))]
   "TARGET_NEON"
 {
@@ -2789,7 +2789,7 @@ if (BYTES_BIG_ENDIAN)
 })
 
 (define_expand "neon_vcreate<mode>"
-  [(match_operand:VDX 0 "s_register_operand" "")
+  [(match_operand:VD_RE 0 "s_register_operand" "")
    (match_operand:DI 1 "general_operand" "")]
   "TARGET_NEON"
 {
@@ -4140,7 +4140,7 @@ if (BYTES_BIG_ENDIAN)
 
 (define_expand "neon_vreinterpretv8qi<mode>"
   [(match_operand:V8QI 0 "s_register_operand" "")
-   (match_operand:VDX 1 "s_register_operand" "")]
+   (match_operand:VD_RE 1 "s_register_operand" "")]
   "TARGET_NEON"
 {
   neon_reinterpret (operands[0], operands[1]);
@@ -4149,7 +4149,7 @@ if (BYTES_BIG_ENDIAN)
 
 (define_expand "neon_vreinterpretv4hi<mode>"
   [(match_operand:V4HI 0 "s_register_operand" "")
-   (match_operand:VDX 1 "s_register_operand" "")]
+   (match_operand:VD_RE 1 "s_register_operand" "")]
   "TARGET_NEON"
 {
   neon_reinterpret (operands[0], operands[1]);
@@ -4158,7 +4158,7 @@ if (BYTES_BIG_ENDIAN)
 
 (define_expand "neon_vreinterpretv2si<mode>"
   [(match_operand:V2SI 0 "s_register_operand" "")
-   (match_operand:VDX 1 "s_register_operand" "")]
+   (match_operand:VD_RE 1 "s_register_operand" "")]
   "TARGET_NEON"
 {
   neon_reinterpret (operands[0], operands[1]);
@@ -4167,7 +4167,7 @@ if (BYTES_BIG_ENDIAN)
 
 (define_expand "neon_vreinterpretv2sf<mode>"
   [(match_operand:V2SF 0 "s_register_operand" "")
-   (match_operand:VDX 1 "s_register_operand" "")]
+   (match_operand:VD_RE 1 "s_register_operand" "")]
   "TARGET_NEON"
 {
   neon_reinterpret (operands[0], operands[1]);
@@ -4176,7 +4176,7 @@ if (BYTES_BIG_ENDIAN)
 
 (define_expand "neon_vreinterpretdi<mode>"
   [(match_operand:DI 0 "s_register_operand" "")
-   (match_operand:VDX 1 "s_register_operand" "")]
+   (match_operand:VD_RE 1 "s_register_operand" "")]
   "TARGET_NEON"
 {
   neon_reinterpret (operands[0], operands[1]);
@@ -4435,14 +4435,14 @@ if (BYTES_BIG_ENDIAN)
 (define_expand "vec_load_lanesoi<mode>"
   [(set (match_operand:OI 0 "s_register_operand")
         (unspec:OI [(match_operand:OI 1 "neon_struct_operand")
-                    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+                    (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
 		   UNSPEC_VLD2))]
   "TARGET_NEON")
 
 (define_insn "neon_vld2<mode>"
   [(set (match_operand:OI 0 "s_register_operand" "=w")
         (unspec:OI [(match_operand:OI 1 "neon_struct_operand" "Um")
-                    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+                    (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
                    UNSPEC_VLD2))]
   "TARGET_NEON"
   "vld2.<V_sz_elem>\t%h0, %A1"
@@ -4453,7 +4453,7 @@ if (BYTES_BIG_ENDIAN)
         (unspec:TI [(match_operand:<V_two_elem> 1 "neon_struct_operand" "Um")
                     (match_operand:TI 2 "s_register_operand" "0")
                     (match_operand:SI 3 "immediate_operand" "i")
-                    (unspec:VD [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+                    (unspec:VD_LANE [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
                    UNSPEC_VLD2_LANE))]
   "TARGET_NEON"
 {
@@ -4478,7 +4478,7 @@ if (BYTES_BIG_ENDIAN)
         (unspec:OI [(match_operand:<V_two_elem> 1 "neon_struct_operand" "Um")
                     (match_operand:OI 2 "s_register_operand" "0")
                     (match_operand:SI 3 "immediate_operand" "i")
-                    (unspec:VMQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+                    (unspec:VQ_HS [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
                    UNSPEC_VLD2_LANE))]
   "TARGET_NEON"
 {
@@ -4549,14 +4549,14 @@ if (BYTES_BIG_ENDIAN)
 (define_expand "vec_store_lanesoi<mode>"
   [(set (match_operand:OI 0 "neon_struct_operand")
 	(unspec:OI [(match_operand:OI 1 "s_register_operand")
-                    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+                    (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
                    UNSPEC_VST2))]
   "TARGET_NEON")
 
 (define_insn "neon_vst2<mode>"
   [(set (match_operand:OI 0 "neon_struct_operand" "=Um")
 	(unspec:OI [(match_operand:OI 1 "s_register_operand" "w")
-		    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+		    (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
 		   UNSPEC_VST2))]
   "TARGET_NEON"
   "vst2.<V_sz_elem>\t%h1, %A0"
@@ -4568,7 +4568,7 @@ if (BYTES_BIG_ENDIAN)
 	(unspec:<V_two_elem>
 	  [(match_operand:TI 1 "s_register_operand" "w")
 	   (match_operand:SI 2 "immediate_operand" "i")
-	   (unspec:VD [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+	   (unspec:VD_LANE [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
 	  UNSPEC_VST2_LANE))]
   "TARGET_NEON"
 {
@@ -4593,7 +4593,7 @@ if (BYTES_BIG_ENDIAN)
         (unspec:<V_two_elem>
            [(match_operand:OI 1 "s_register_operand" "w")
             (match_operand:SI 2 "immediate_operand" "i")
-            (unspec:VMQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+            (unspec:VQ_HS [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
            UNSPEC_VST2_LANE))]
   "TARGET_NEON"
 {
@@ -4646,7 +4646,7 @@ if (BYTES_BIG_ENDIAN)
 (define_expand "vec_load_lanesci<mode>"
   [(match_operand:CI 0 "s_register_operand")
    (match_operand:CI 1 "neon_struct_operand")
-   (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+   (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
   "TARGET_NEON"
 {
   emit_insn (gen_neon_vld3<mode> (operands[0], operands[1]));
@@ -4656,7 +4656,7 @@ if (BYTES_BIG_ENDIAN)
 (define_expand "neon_vld3<mode>"
   [(match_operand:CI 0 "s_register_operand")
    (match_operand:CI 1 "neon_struct_operand")
-   (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+   (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
   "TARGET_NEON"
 {
   rtx mem;
@@ -4671,7 +4671,7 @@ if (BYTES_BIG_ENDIAN)
 (define_insn "neon_vld3qa<mode>"
   [(set (match_operand:CI 0 "s_register_operand" "=w")
         (unspec:CI [(match_operand:EI 1 "neon_struct_operand" "Um")
-                    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+                    (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
                    UNSPEC_VLD3A))]
   "TARGET_NEON"
 {
@@ -4691,7 +4691,7 @@ if (BYTES_BIG_ENDIAN)
   [(set (match_operand:CI 0 "s_register_operand" "=w")
         (unspec:CI [(match_operand:EI 1 "neon_struct_operand" "Um")
                     (match_operand:CI 2 "s_register_operand" "0")
-                    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+                    (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
                    UNSPEC_VLD3B))]
   "TARGET_NEON"
 {
@@ -4712,7 +4712,7 @@ if (BYTES_BIG_ENDIAN)
         (unspec:EI [(match_operand:<V_three_elem> 1 "neon_struct_operand" "Um")
                     (match_operand:EI 2 "s_register_operand" "0")
                     (match_operand:SI 3 "immediate_operand" "i")
-                    (unspec:VD [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+                    (unspec:VD_LANE [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
                    UNSPEC_VLD3_LANE))]
   "TARGET_NEON"
 {
@@ -4739,7 +4739,7 @@ if (BYTES_BIG_ENDIAN)
         (unspec:CI [(match_operand:<V_three_elem> 1 "neon_struct_operand" "Um")
                     (match_operand:CI 2 "s_register_operand" "0")
                     (match_operand:SI 3 "immediate_operand" "i")
-                    (unspec:VMQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+                    (unspec:VQ_HS [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
                    UNSPEC_VLD3_LANE))]
   "TARGET_NEON"
 {
@@ -4819,7 +4819,7 @@ if (BYTES_BIG_ENDIAN)
 (define_expand "vec_store_lanesci<mode>"
   [(match_operand:CI 0 "neon_struct_operand")
    (match_operand:CI 1 "s_register_operand")
-   (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+   (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
   "TARGET_NEON"
 {
   emit_insn (gen_neon_vst3<mode> (operands[0], operands[1]));
@@ -4829,7 +4829,7 @@ if (BYTES_BIG_ENDIAN)
 (define_expand "neon_vst3<mode>"
   [(match_operand:CI 0 "neon_struct_operand")
    (match_operand:CI 1 "s_register_operand")
-   (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+   (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
   "TARGET_NEON"
 {
   rtx mem;
@@ -4844,7 +4844,7 @@ if (BYTES_BIG_ENDIAN)
 (define_insn "neon_vst3qa<mode>"
   [(set (match_operand:EI 0 "neon_struct_operand" "=Um")
         (unspec:EI [(match_operand:CI 1 "s_register_operand" "w")
-                    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+                    (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
                    UNSPEC_VST3A))]
   "TARGET_NEON"
 {
@@ -4863,7 +4863,7 @@ if (BYTES_BIG_ENDIAN)
 (define_insn "neon_vst3qb<mode>"
   [(set (match_operand:EI 0 "neon_struct_operand" "=Um")
         (unspec:EI [(match_operand:CI 1 "s_register_operand" "w")
-                    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+                    (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
                    UNSPEC_VST3B))]
   "TARGET_NEON"
 {
@@ -4884,7 +4884,7 @@ if (BYTES_BIG_ENDIAN)
         (unspec:<V_three_elem>
            [(match_operand:EI 1 "s_register_operand" "w")
             (match_operand:SI 2 "immediate_operand" "i")
-            (unspec:VD [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+            (unspec:VD_LANE [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
            UNSPEC_VST3_LANE))]
   "TARGET_NEON"
 {
@@ -4911,7 +4911,7 @@ if (BYTES_BIG_ENDIAN)
         (unspec:<V_three_elem>
            [(match_operand:CI 1 "s_register_operand" "w")
             (match_operand:SI 2 "immediate_operand" "i")
-            (unspec:VMQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+            (unspec:VQ_HS [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
            UNSPEC_VST3_LANE))]
   "TARGET_NEON"
 {
@@ -4966,7 +4966,7 @@ if (BYTES_BIG_ENDIAN)
 (define_expand "vec_load_lanesxi<mode>"
   [(match_operand:XI 0 "s_register_operand")
    (match_operand:XI 1 "neon_struct_operand")
-   (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+   (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
   "TARGET_NEON"
 {
   emit_insn (gen_neon_vld4<mode> (operands[0], operands[1]));
@@ -4976,7 +4976,7 @@ if (BYTES_BIG_ENDIAN)
 (define_expand "neon_vld4<mode>"
   [(match_operand:XI 0 "s_register_operand")
    (match_operand:XI 1 "neon_struct_operand")
-   (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+   (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
   "TARGET_NEON"
 {
   rtx mem;
@@ -4991,7 +4991,7 @@ if (BYTES_BIG_ENDIAN)
 (define_insn "neon_vld4qa<mode>"
   [(set (match_operand:XI 0 "s_register_operand" "=w")
         (unspec:XI [(match_operand:OI 1 "neon_struct_operand" "Um")
-                    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+                    (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
                    UNSPEC_VLD4A))]
   "TARGET_NEON"
 {
@@ -5012,7 +5012,7 @@ if (BYTES_BIG_ENDIAN)
   [(set (match_operand:XI 0 "s_register_operand" "=w")
         (unspec:XI [(match_operand:OI 1 "neon_struct_operand" "Um")
                     (match_operand:XI 2 "s_register_operand" "0")
-                    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+                    (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
                    UNSPEC_VLD4B))]
   "TARGET_NEON"
 {
@@ -5034,7 +5034,7 @@ if (BYTES_BIG_ENDIAN)
         (unspec:OI [(match_operand:<V_four_elem> 1 "neon_struct_operand" "Um")
                     (match_operand:OI 2 "s_register_operand" "0")
                     (match_operand:SI 3 "immediate_operand" "i")
-                    (unspec:VD [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+                    (unspec:VD_LANE [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
                    UNSPEC_VLD4_LANE))]
   "TARGET_NEON"
 {
@@ -5062,7 +5062,7 @@ if (BYTES_BIG_ENDIAN)
         (unspec:XI [(match_operand:<V_four_elem> 1 "neon_struct_operand" "Um")
                     (match_operand:XI 2 "s_register_operand" "0")
                     (match_operand:SI 3 "immediate_operand" "i")
-                    (unspec:VMQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+                    (unspec:VQ_HS [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
                    UNSPEC_VLD4_LANE))]
   "TARGET_NEON"
 {
@@ -5147,7 +5147,7 @@ if (BYTES_BIG_ENDIAN)
 (define_expand "vec_store_lanesxi<mode>"
   [(match_operand:XI 0 "neon_struct_operand")
    (match_operand:XI 1 "s_register_operand")
-   (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+   (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
   "TARGET_NEON"
 {
   emit_insn (gen_neon_vst4<mode> (operands[0], operands[1]));
@@ -5157,7 +5157,7 @@ if (BYTES_BIG_ENDIAN)
 (define_expand "neon_vst4<mode>"
   [(match_operand:XI 0 "neon_struct_operand")
    (match_operand:XI 1 "s_register_operand")
-   (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+   (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
   "TARGET_NEON"
 {
   rtx mem;
@@ -5172,7 +5172,7 @@ if (BYTES_BIG_ENDIAN)
 (define_insn "neon_vst4qa<mode>"
   [(set (match_operand:OI 0 "neon_struct_operand" "=Um")
         (unspec:OI [(match_operand:XI 1 "s_register_operand" "w")
-                    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+                    (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
                    UNSPEC_VST4A))]
   "TARGET_NEON"
 {
@@ -5192,7 +5192,7 @@ if (BYTES_BIG_ENDIAN)
 (define_insn "neon_vst4qb<mode>"
   [(set (match_operand:OI 0 "neon_struct_operand" "=Um")
         (unspec:OI [(match_operand:XI 1 "s_register_operand" "w")
-                    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+                    (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
                    UNSPEC_VST4B))]
   "TARGET_NEON"
 {
@@ -5214,7 +5214,7 @@ if (BYTES_BIG_ENDIAN)
         (unspec:<V_four_elem>
            [(match_operand:OI 1 "s_register_operand" "w")
             (match_operand:SI 2 "immediate_operand" "i")
-            (unspec:VD [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+            (unspec:VD_LANE [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
            UNSPEC_VST4_LANE))]
   "TARGET_NEON"
 {
@@ -5242,7 +5242,7 @@ if (BYTES_BIG_ENDIAN)
         (unspec:<V_four_elem>
            [(match_operand:XI 1 "s_register_operand" "w")
             (match_operand:SI 2 "immediate_operand" "i")
-            (unspec:VMQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+            (unspec:VQ_HS [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
            UNSPEC_VST4_LANE))]
   "TARGET_NEON"
 {
diff --git a/gcc/config/avr/avr.c b/gcc/config/avr/avr.c
index bec9a8bb788..9f5bc88ce30 100644
--- a/gcc/config/avr/avr.c
+++ b/gcc/config/avr/avr.c
@@ -9069,6 +9069,8 @@ avr_eval_addr_attrib (rtx x)
       if (SYMBOL_REF_FLAGS (x) & SYMBOL_FLAG_IO)
 	{
 	  attr = lookup_attribute ("io", DECL_ATTRIBUTES (decl));
+         if (!attr || !TREE_VALUE (attr))
+           attr = lookup_attribute ("io_low", DECL_ATTRIBUTES (decl));
 	  gcc_assert (attr);
 	}
       if (!attr || !TREE_VALUE (attr))
diff --git a/gcc/config/i386/cygming.h b/gcc/config/i386/cygming.h
index fda59fb3c06..71506ecbcd0 100644
--- a/gcc/config/i386/cygming.h
+++ b/gcc/config/i386/cygming.h
@@ -198,20 +198,7 @@ along with GCC; see the file COPYING3.  If not see
 #undef  SUBTARGET_OVERRIDE_OPTIONS
 #define SUBTARGET_OVERRIDE_OPTIONS					\
 do {									\
-  if (TARGET_64BIT && flag_pic != 1)					\
-    {									\
-      if (flag_pic > 1)							\
-        warning (0,							\
-	         "-fPIC ignored for target (all code is position independent)"\
-                 );                         				\
-      flag_pic = 1;							\
-    }									\
-  else if (!TARGET_64BIT && flag_pic)					\
-    {									\
-      warning (0, "-f%s ignored for target (all code is position independent)",\
-	       (flag_pic > 1) ? "PIC" : "pic");				\
-      flag_pic = 0;							\
-    }									\
+  flag_pic = TARGET_64BIT ? 1 : 0;                                      \
 } while (0)
 
 /* Define this macro if references to a symbol must be treated
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index ee31ee34c48..b892f086798 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -1021,6 +1021,10 @@ DEF_FUNCTION_TYPE (VOID, PINT, QI, V8DI, V8SI, INT)
 DEF_FUNCTION_TYPE (VOID, PINT, QI, V4DI, V4SI, INT)
 DEF_FUNCTION_TYPE (VOID, PINT, QI, V2DI, V4SI, INT)
 DEF_FUNCTION_TYPE (VOID, PLONGLONG, QI, V8DI, V8DI, INT)
+DEF_FUNCTION_TYPE (VOID, PFLOAT, HI, V8DI, V16SF, INT)
+DEF_FUNCTION_TYPE (VOID, PDOUBLE, QI, V16SI, V8DF, INT)
+DEF_FUNCTION_TYPE (VOID, PINT, HI, V8DI, V16SI, INT)
+DEF_FUNCTION_TYPE (VOID, PLONGLONG, QI, V16SI, V8DI, INT)
 
 DEF_FUNCTION_TYPE (VOID, QI, V8SI, PCINT64, INT, INT)
 DEF_FUNCTION_TYPE (VOID, PLONGLONG, QI, V4DI, V4DI, INT)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index c69c738caa0..d78f4e7f175 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -30388,6 +30388,10 @@ enum ix86_builtins
   IX86_BUILTIN_GATHER3SIV16SI,
   IX86_BUILTIN_GATHER3SIV8DF,
   IX86_BUILTIN_GATHER3SIV8DI,
+  IX86_BUILTIN_SCATTERALTSIV8DF,
+  IX86_BUILTIN_SCATTERALTDIV16SF,
+  IX86_BUILTIN_SCATTERALTSIV8DI,
+  IX86_BUILTIN_SCATTERALTDIV16SI,
   IX86_BUILTIN_SCATTERDIV16SF,
   IX86_BUILTIN_SCATTERDIV16SI,
   IX86_BUILTIN_SCATTERDIV8DF,
@@ -34204,6 +34208,21 @@ ix86_init_mmx_sse_builtins (void)
   def_builtin (OPTION_MASK_ISA_AVX512VL, "__builtin_ia32_scatterdiv2di",
 	       VOID_FTYPE_PLONGLONG_QI_V2DI_V2DI_INT,
 	       IX86_BUILTIN_SCATTERDIV2DI);
+  def_builtin (OPTION_MASK_ISA_AVX512F, "__builtin_ia32_scatteraltsiv8df ",
+	       VOID_FTYPE_PDOUBLE_QI_V16SI_V8DF_INT,
+	       IX86_BUILTIN_SCATTERALTSIV8DF);
+
+  def_builtin (OPTION_MASK_ISA_AVX512F, "__builtin_ia32_scatteraltdiv8sf ",
+	       VOID_FTYPE_PFLOAT_HI_V8DI_V16SF_INT,
+	       IX86_BUILTIN_SCATTERALTDIV16SF);
+
+  def_builtin (OPTION_MASK_ISA_AVX512F, "__builtin_ia32_scatteraltsiv8di ",
+	       VOID_FTYPE_PLONGLONG_QI_V16SI_V8DI_INT,
+	       IX86_BUILTIN_SCATTERALTSIV8DI);
+
+  def_builtin (OPTION_MASK_ISA_AVX512F, "__builtin_ia32_scatteraltdiv8si ",
+	       VOID_FTYPE_PINT_HI_V8DI_V16SI_INT,
+	       IX86_BUILTIN_SCATTERALTDIV16SI);
 
   /* AVX512PF */
   def_builtin (OPTION_MASK_ISA_AVX512PF, "__builtin_ia32_gatherpfdpd",
@@ -39860,6 +39879,18 @@ rdseed_step:
     case IX86_BUILTIN_GATHERPFDPD:
       icode = CODE_FOR_avx512pf_gatherpfv8sidf;
       goto vec_prefetch_gen;
+    case IX86_BUILTIN_SCATTERALTSIV8DF:
+      icode = CODE_FOR_avx512f_scattersiv8df;
+      goto scatter_gen;
+    case IX86_BUILTIN_SCATTERALTDIV16SF:
+      icode = CODE_FOR_avx512f_scatterdiv16sf;
+      goto scatter_gen;
+    case IX86_BUILTIN_SCATTERALTSIV8DI:
+      icode = CODE_FOR_avx512f_scattersiv8di;
+      goto scatter_gen;
+    case IX86_BUILTIN_SCATTERALTDIV16SI:
+      icode = CODE_FOR_avx512f_scatterdiv16si;
+      goto scatter_gen;
     case IX86_BUILTIN_GATHERPFDPS:
       icode = CODE_FOR_avx512pf_gatherpfv16sisf;
       goto vec_prefetch_gen;
@@ -40123,6 +40154,36 @@ rdseed_step:
       mode3 = insn_data[icode].operand[3].mode;
       mode4 = insn_data[icode].operand[4].mode;
 
+      /* Scatter instruction stores operand op3 to memory with
+	 indices from op2 and scale from op4 under writemask op1.
+	 If index operand op2 has more elements then source operand
+	 op3 one need to use only its low half. And vice versa.  */
+      switch (fcode)
+	{
+	case IX86_BUILTIN_SCATTERALTSIV8DF:
+	case IX86_BUILTIN_SCATTERALTSIV8DI:
+	  half = gen_reg_rtx (V8SImode);
+	  if (!nonimmediate_operand (op2, V16SImode))
+	    op2 = copy_to_mode_reg (V16SImode, op2);
+	  emit_insn (gen_vec_extract_lo_v16si (half, op2));
+	  op2 = half;
+	  break;
+	case IX86_BUILTIN_SCATTERALTDIV16SF:
+	case IX86_BUILTIN_SCATTERALTDIV16SI:
+	  half = gen_reg_rtx (mode3);
+	  if (mode3 == V8SFmode)
+	    gen = gen_vec_extract_lo_v16sf;
+	  else
+	    gen = gen_vec_extract_lo_v16si;
+	  if (!nonimmediate_operand (op3, GET_MODE (op3)))
+	    op3 = copy_to_mode_reg (GET_MODE (op3), op3);
+	  emit_insn (gen (half, op3));
+	  op3 = half;
+	  break;
+	default:
+	  break;
+	}
+
       /* Force memory operand only with base register here.  But we
 	 don't want to do it on memory operand for other builtin
 	 functions.  */
@@ -41202,6 +41263,62 @@ ix86_vectorize_builtin_gather (const_tree mem_vectype,
   return ix86_get_builtin (code);
 }
 
+/* Returns a decl of a function that implements scatter store with
+   register type VECTYPE and index type INDEX_TYPE and SCALE.
+   Return NULL_TREE if it is not available.  */
+
+static tree
+ix86_vectorize_builtin_scatter (const_tree vectype,
+				const_tree index_type, int scale)
+{
+  bool si;
+  enum ix86_builtins code;
+
+  if (!TARGET_AVX512F)
+    return NULL_TREE;
+
+  if ((TREE_CODE (index_type) != INTEGER_TYPE
+       && !POINTER_TYPE_P (index_type))
+      || (TYPE_MODE (index_type) != SImode
+	  && TYPE_MODE (index_type) != DImode))
+    return NULL_TREE;
+
+  if (TYPE_PRECISION (index_type) > POINTER_SIZE)
+    return NULL_TREE;
+
+  /* v*scatter* insn sign extends index to pointer mode.  */
+  if (TYPE_PRECISION (index_type) < POINTER_SIZE
+      && TYPE_UNSIGNED (index_type))
+    return NULL_TREE;
+
+  /* Scale can be 1, 2, 4 or 8.  */
+  if (scale <= 0
+      || scale > 8
+      || (scale & (scale - 1)) != 0)
+    return NULL_TREE;
+
+  si = TYPE_MODE (index_type) == SImode;
+  switch (TYPE_MODE (vectype))
+    {
+    case V8DFmode:
+      code = si ? IX86_BUILTIN_SCATTERALTSIV8DF : IX86_BUILTIN_SCATTERDIV8DF;
+      break;
+    case V8DImode:
+      code = si ? IX86_BUILTIN_SCATTERALTSIV8DI : IX86_BUILTIN_SCATTERDIV8DI;
+      break;
+    case V16SFmode:
+      code = si ? IX86_BUILTIN_SCATTERSIV16SF : IX86_BUILTIN_SCATTERALTDIV16SF;
+      break;
+    case V16SImode:
+      code = si ? IX86_BUILTIN_SCATTERSIV16SI : IX86_BUILTIN_SCATTERALTDIV16SI;
+      break;
+    default:
+      return NULL_TREE;
+    }
+
+  return ix86_builtins[code];
+}
+
 /* Returns a code for a target-specific builtin that implements
    reciprocal of the function, or NULL_TREE if not available.  */
 
@@ -52332,6 +52449,9 @@ ix86_operands_ok_for_move_multiple (rtx *operands, bool load,
 #undef TARGET_VECTORIZE_BUILTIN_GATHER
 #define TARGET_VECTORIZE_BUILTIN_GATHER ix86_vectorize_builtin_gather
 
+#undef TARGET_VECTORIZE_BUILTIN_SCATTER
+#define TARGET_VECTORIZE_BUILTIN_SCATTER ix86_vectorize_builtin_scatter
+
 #undef TARGET_BUILTIN_RECIPROCAL
 #define TARGET_BUILTIN_RECIPROCAL ix86_builtin_reciprocal
 
diff --git a/gcc/config/i386/intelmic-mkoffload.c b/gcc/config/i386/intelmic-mkoffload.c
index ca15868dde1..4a7812c2499 100644
--- a/gcc/config/i386/intelmic-mkoffload.c
+++ b/gcc/config/i386/intelmic-mkoffload.c
@@ -453,17 +453,18 @@ prepare_target_image (const char *target_compiler, int argc, char **argv)
   fork_execute (objcopy_argv[0], CONST_CAST (char **, objcopy_argv), false);
 
   /* Objcopy has created symbols, containing the input file name with
-     special characters replaced with '_'.  We are going to rename these
-     new symbols.  */
+     non-alphanumeric characters replaced by underscores.
+     We are going to rename these new symbols.  */
   size_t symbol_name_len = strlen (target_so_filename);
   char *symbol_name = XALLOCAVEC (char, symbol_name_len + 1);
-  for (size_t i = 0; i <= symbol_name_len; i++)
+  for (size_t i = 0; i < symbol_name_len; i++)
     {
       char c = target_so_filename[i];
-      if ((c == '/') || (c == '.'))
+      if (!ISALNUM (c))
 	c = '_';
       symbol_name[i] = c;
     }
+  symbol_name[symbol_name_len] = '\0';
 
   char *opt_for_objcopy[3];
   opt_for_objcopy[0] = XALLOCAVEC (char, sizeof ("_binary__start=")
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index e6853680078..1069d0e85c1 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -102,6 +102,8 @@ nvptx_option_override (void)
   flag_toplevel_reorder = 1;
   /* Assumes that it will see only hard registers.  */
   flag_var_tracking = 0;
+  write_symbols = NO_DEBUG;
+  debug_info_level = DINFO_LEVEL_NONE;
 
   declared_fndecls_htab = hash_table<tree_hasher>::create_ggc (17);
   needed_fndecls_htab = hash_table<tree_hasher>::create_ggc (17);
@@ -681,29 +683,30 @@ write_func_decl_from_insn (std::stringstream &s, rtx result, rtx pat,
 
   s << name;
 
-  int nargs = XVECLEN (pat, 0) - 1;
-  if (nargs > 0)
+  int arg_end = XVECLEN (pat, 0);
+      
+  if (1 < arg_end)
     {
+      const char *comma = "";
       s << " (";
-      for (int i = 0; i < nargs; i++)
+      for (int i = 1; i < arg_end; i++)
 	{
-	  rtx t = XEXP (XVECEXP (pat, 0, i + 1), 0);
+	  rtx t = XEXP (XVECEXP (pat, 0, i), 0);
 	  machine_mode mode = GET_MODE (t);
 	  int count = maybe_split_mode (&mode);
 
-	  while (count-- > 0)
+	  while (count--)
 	    {
-	      s << ".param";
+	      s << comma << ".param";
 	      s << nvptx_ptx_type_from_mode (mode, false);
 	      s << " ";
 	      if (callprototype)
 		s << "_";
 	      else
-		s << "%arg" << i;
+		s << "%arg" << i - 1;
 	      if (mode == QImode || mode == HImode)
 		s << "[1]";
-	      if (i + 1 < nargs || count > 0)
-		s << ", ";
+	      comma = ", ";
 	    }
 	}
       s << ")";
@@ -775,19 +778,17 @@ nvptx_end_call_args (void)
 void
 nvptx_expand_call (rtx retval, rtx address)
 {
-  int nargs;
+  int nargs = 0;
   rtx callee = XEXP (address, 0);
   rtx pat, t;
   rtvec vec;
   bool external_decl = false;
+  rtx varargs = NULL_RTX;
+  tree decl_type = NULL_TREE;
 
-  nargs = 0;
   for (t = cfun->machine->call_args; t; t = XEXP (t, 1))
     nargs++;
 
-  bool has_varargs = false;
-  tree decl_type = NULL_TREE;
-
   if (!call_insn_operand (callee, Pmode))
     {
       callee = force_reg (Pmode, callee);
@@ -806,6 +807,7 @@ nvptx_expand_call (rtx retval, rtx address)
 	    external_decl = true;
 	}
     }
+
   if (cfun->machine->funtype
       /* It's possible to construct testcases where we call a variable.
 	 See compile/20020129-1.c.  stdarg_p will crash so avoid calling it
@@ -814,31 +816,18 @@ nvptx_expand_call (rtx retval, rtx address)
 	  || TREE_CODE (cfun->machine->funtype) == METHOD_TYPE)
       && stdarg_p (cfun->machine->funtype))
     {
-      has_varargs = true;
-      cfun->machine->has_call_with_varargs = true;
-    }
-  vec = rtvec_alloc (nargs + 1 + (has_varargs ? 1 : 0));
-  pat = gen_rtx_PARALLEL (VOIDmode, vec);
-  if (has_varargs)
-    {
-      rtx this_arg = gen_reg_rtx (Pmode);
+      varargs = gen_reg_rtx (Pmode);
       if (Pmode == DImode)
-	emit_move_insn (this_arg, stack_pointer_rtx);
+	emit_move_insn (varargs, stack_pointer_rtx);
       else
-	emit_move_insn (this_arg, stack_pointer_rtx);
-      XVECEXP (pat, 0, nargs + 1) = gen_rtx_USE (VOIDmode, this_arg);
-    }
-
-  /* Construct the call insn, including a USE for each argument pseudo
-     register.  These will be used when printing the insn.  */
-  int i;
-  rtx arg;
-  for (i = 1, arg = cfun->machine->call_args; arg; arg = XEXP (arg, 1), i++)
-    {
-      rtx this_arg = XEXP (arg, 0);
-      XVECEXP (pat, 0, i) = gen_rtx_USE (VOIDmode, this_arg);
+	emit_move_insn (varargs, stack_pointer_rtx);
+      cfun->machine->has_call_with_varargs = true;
     }
+  vec = rtvec_alloc (nargs + 1 + (varargs ? 1 : 0));
+  pat = gen_rtx_PARALLEL (VOIDmode, vec);
 
+  int vec_pos = 0;
+  
   rtx tmp_retval = retval;
   t = gen_rtx_CALL (VOIDmode, address, const0_rtx);
   if (retval != NULL_RTX)
@@ -847,7 +836,20 @@ nvptx_expand_call (rtx retval, rtx address)
 	tmp_retval = gen_reg_rtx (GET_MODE (retval));
       t = gen_rtx_SET (tmp_retval, t);
     }
-  XVECEXP (pat, 0, 0) = t;
+  XVECEXP (pat, 0, vec_pos++) = t;
+
+  /* Construct the call insn, including a USE for each argument pseudo
+     register.  These will be used when printing the insn.  */
+  for (rtx arg = cfun->machine->call_args; arg; arg = XEXP (arg, 1))
+    {
+      rtx this_arg = XEXP (arg, 0);
+      XVECEXP (pat, 0, vec_pos++) = gen_rtx_USE (VOIDmode, this_arg);
+    }
+
+  if (varargs)
+      XVECEXP (pat, 0, vec_pos++) = gen_rtx_USE (VOIDmode, varargs);
+
+  gcc_assert (vec_pos = XVECLEN (pat, 0));
 
   /* If this is a libcall, decl_type is NULL. For a call to a non-libcall
      undeclared function, we'll have an external decl without arg types.
@@ -1498,16 +1500,14 @@ nvptx_output_call_insn (rtx_insn *insn, rtx result, rtx callee)
   static int labelno;
   bool needs_tgt = register_operand (callee, Pmode);
   rtx pat = PATTERN (insn);
-  int nargs = XVECLEN (pat, 0) - 1;
+  int arg_end = XVECLEN (pat, 0);
   tree decl = NULL_TREE;
 
   fprintf (asm_out_file, "\t{\n");
   if (result != NULL)
-    {
-      fprintf (asm_out_file, "\t\t.param%s %%retval_in;\n",
-	       nvptx_ptx_type_from_mode (arg_promotion (GET_MODE (result)),
-					 false));
-    }
+    fprintf (asm_out_file, "\t\t.param%s %%retval_in;\n",
+	     nvptx_ptx_type_from_mode (arg_promotion (GET_MODE (result)),
+				       false));
 
   /* Ensure we have a ptx declaration in the output if necessary.  */
   if (GET_CODE (callee) == SYMBOL_REF)
@@ -1527,20 +1527,20 @@ nvptx_output_call_insn (rtx_insn *insn, rtx result, rtx callee)
       fputs (s.str().c_str(), asm_out_file);
     }
 
-  for (int i = 0, argno = 0; i < nargs; i++)
+  for (int i = 1, argno = 0; i < arg_end; i++)
     {
-      rtx t = XEXP (XVECEXP (pat, 0, i + 1), 0);
+      rtx t = XEXP (XVECEXP (pat, 0, i), 0);
       machine_mode mode = GET_MODE (t);
       int count = maybe_split_mode (&mode);
 
-      while (count-- > 0)
+      while (count--)
 	fprintf (asm_out_file, "\t\t.param%s %%out_arg%d%s;\n",
 		 nvptx_ptx_type_from_mode (mode, false), argno++,
 		 mode == QImode || mode == HImode ? "[1]" : "");
     }
-  for (int i = 0, argno = 0; i < nargs; i++)
+  for (int i = 1, argno = 0; i < arg_end; i++)
     {
-      rtx t = XEXP (XVECEXP (pat, 0, i + 1), 0);
+      rtx t = XEXP (XVECEXP (pat, 0, i), 0);
       gcc_assert (REG_P (t));
       machine_mode mode = GET_MODE (t);
       int count = maybe_split_mode (&mode);
@@ -1552,7 +1552,7 @@ nvptx_output_call_insn (rtx_insn *insn, rtx result, rtx callee)
       else
 	{
 	  int n = 0;
-	  while (count-- > 0)
+	  while (count--)
 	    fprintf (asm_out_file, "\t\tst.param%s [%%out_arg%d], %%r%d$%d;\n",
 		     nvptx_ptx_type_from_mode (mode, false), argno++,
 		     REGNO (t), n++);
@@ -1572,33 +1572,30 @@ nvptx_output_call_insn (rtx_insn *insn, rtx result, rtx callee)
   else
     output_address (callee);
 
-  if (nargs > 0 || (decl && DECL_STATIC_CHAIN (decl)))
+  if (arg_end > 1 || (decl && DECL_STATIC_CHAIN (decl)))
     {
+      const char *comma = "";
+      
       fprintf (asm_out_file, ", (");
-      int i, argno;
-      for (i = 0, argno = 0; i < nargs; i++)
+      for (int i = 1, argno = 0; i < arg_end; i++)
 	{
-	  rtx t = XEXP (XVECEXP (pat, 0, i + 1), 0);
+	  rtx t = XEXP (XVECEXP (pat, 0, i), 0);
 	  machine_mode mode = GET_MODE (t);
 	  int count = maybe_split_mode (&mode);
 
-	  while (count-- > 0)
+	  while (count--)
 	    {
-	      fprintf (asm_out_file, "%%out_arg%d", argno++);
-	      if (i + 1 < nargs || count > 0)
-		fprintf (asm_out_file, ", ");
+	      fprintf (asm_out_file, "%s%%out_arg%d", comma, argno++);
+	      comma = ", ";
 	    }
 	}
       if (decl && DECL_STATIC_CHAIN (decl))
-	{
-	  if (i > 0)
-	    fprintf (asm_out_file, ", ");
-	  fprintf (asm_out_file, "%s",
-		   reg_names [OUTGOING_STATIC_CHAIN_REGNUM]);
-	}
+	fprintf (asm_out_file, "%s%s", comma,
+		 reg_names [OUTGOING_STATIC_CHAIN_REGNUM]);
 
       fprintf (asm_out_file, ")");
     }
+
   if (needs_tgt)
     {
       fprintf (asm_out_file, ", ");
@@ -1922,12 +1919,14 @@ nvptx_reorg_subreg (void)
 	  || GET_CODE (PATTERN (insn)) == USE
 	  || GET_CODE (PATTERN (insn)) == CLOBBER)
 	continue;
+
       qiregs.n_in_use = 0;
       hiregs.n_in_use = 0;
       siregs.n_in_use = 0;
       diregs.n_in_use = 0;
       extract_insn (insn);
       enum attr_subregs_ok s_ok = get_attr_subregs_ok (insn);
+
       for (int i = 0; i < recog_data.n_operands; i++)
 	{
 	  rtx op = recog_data.operand[i];
@@ -1983,9 +1982,10 @@ nvptx_reorg_subreg (void)
 }
 
 /* PTX-specific reorganization
-   1) mark now-unused registers, so function begin doesn't declare
+   - Compute live registers
+   - Mark now-unused registers, so function begin doesn't declare
    unused registers.
-   2) replace subregs with suitable sequences.
+   - Replace subregs with suitable sequences.
 */
 
 static void
@@ -1997,6 +1997,7 @@ nvptx_reorg (void)
 
   thread_prologue_and_epilogue_insns ();
 
+  /* Compute live regs */
   df_clear_flags (DF_LR_RUN_DCE);
   df_set_flags (DF_NO_INSN_RESCAN | DF_NO_HARD_REGS);
   df_analyze ();
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index b857e53bb22..049f34c6fbf 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -197,9 +197,9 @@
 (define_predicate "call_operation"
   (match_code "parallel")
 {
-  int i;
+  int arg_end = XVECLEN (op, 0);
 
-  for (i = 1; i < XVECLEN (op, 0); i++)
+  for (int i = 1; i < arg_end; i++)
     {
       rtx elt = XVECEXP (op, 0, i);
 
@@ -783,7 +783,7 @@
 	   [(match_operand:HSDIM 2 "nvptx_register_operand" "R")
 	    (match_operand:HSDIM 3 "nvptx_nonmemory_operand" "Ri")]))]
   ""
-  "%.\\tsetp%c1 %0,%2,%3;")
+  "%.\\tsetp%c1\\t%0, %2, %3;")
 
 (define_insn "*cmp<mode>"
   [(set (match_operand:BI 0 "nvptx_register_operand" "=R")
@@ -791,7 +791,7 @@
 	   [(match_operand:SDFM 2 "nvptx_register_operand" "R")
 	    (match_operand:SDFM 3 "nvptx_nonmemory_operand" "RF")]))]
   ""
-  "%.\\tsetp%c1 %0,%2,%3;")
+  "%.\\tsetp%c1\\t%0, %2, %3;")
 
 (define_insn "jump"
   [(set (pc)
@@ -908,7 +908,7 @@
 	  [(match_operand:HSDIM 2 "nvptx_register_operand" "R")
 	   (match_operand:HSDIM 3 "nvptx_nonmemory_operand" "Ri")]))]
   ""
-  "%.\\tset%t0%c1 %0,%2,%3;")
+  "%.\\tset%t0%c1\\t%0, %2, %3;")
 
 (define_insn "setcc_int<mode>"
   [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
@@ -916,7 +916,7 @@
 	   [(match_operand:SDFM 2 "nvptx_register_operand" "R")
 	    (match_operand:SDFM 3 "nvptx_nonmemory_operand" "RF")]))]
   ""
-  "%.\\tset%t0%c1 %0,%2,%3;")
+  "%.\\tset%t0%c1\\t%0, %2, %3;")
 
 (define_insn "setcc_float<mode>"
   [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
@@ -924,7 +924,7 @@
 	   [(match_operand:HSDIM 2 "nvptx_register_operand" "R")
 	    (match_operand:HSDIM 3 "nvptx_nonmemory_operand" "Ri")]))]
   ""
-  "%.\\tset%t0%c1 %0,%2,%3;")
+  "%.\\tset%t0%c1\\t%0, %2, %3;")
 
 (define_insn "setcc_float<mode>"
   [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
@@ -932,7 +932,7 @@
 	   [(match_operand:SDFM 2 "nvptx_register_operand" "R")
 	    (match_operand:SDFM 3 "nvptx_nonmemory_operand" "RF")]))]
   ""
-  "%.\\tset%t0%c1 %0,%2,%3;")
+  "%.\\tset%t0%c1\\t%0, %2, %3;")
 
 (define_expand "cstorebi4"
   [(set (match_operand:SI 0 "nvptx_register_operand")
@@ -1351,14 +1351,12 @@
    (match_operand:SI 7 "const_int_operand")]		;; failure model
   ""
 {
-  emit_insn (gen_atomic_compare_and_swap<mode>_1 (operands[1], operands[2], operands[3],
-					          operands[4], operands[6]));
-
-  rtx tmp = gen_reg_rtx (GET_MODE (operands[0]));
-  emit_insn (gen_cstore<mode>4 (tmp,
-				gen_rtx_EQ (SImode, operands[1], operands[3]),
-				operands[1], operands[3]));
-  emit_insn (gen_andsi3 (operands[0], tmp, GEN_INT (1)));
+  emit_insn (gen_atomic_compare_and_swap<mode>_1
+    (operands[1], operands[2], operands[3], operands[4], operands[6]));
+
+  rtx cond = gen_reg_rtx (BImode);
+  emit_move_insn (cond, gen_rtx_EQ (BImode, operands[1], operands[3]));
+  emit_insn (gen_sel_truesi (operands[0], cond, GEN_INT (1), GEN_INT (0)));
   DONE;
 })
 
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 4170f38b7db..9cacca49050 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1957,6 +1957,16 @@
   "vperm %0,%1,%2,%3"
   [(set_attr "type" "vecperm")])
 
+(define_insn "altivec_vperm_v8hiv16qi"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+	(unspec:V16QI [(match_operand:V8HI 1 "register_operand" "v")
+   	               (match_operand:V8HI 2 "register_operand" "v")
+		       (match_operand:V16QI 3 "register_operand" "v")]
+		   UNSPEC_VPERM))]
+  "TARGET_ALTIVEC"
+  "vperm %0,%1,%2,%3"
+  [(set_attr "type" "vecperm")])
+
 (define_expand "altivec_vperm_<mode>_uns"
   [(set (match_operand:VM 0 "register_operand" "=v")
 	(unspec:VM [(match_operand:VM 1 "register_operand" "v")
@@ -3161,6 +3171,33 @@
   "<VI_unit>"
   "")
 
+(define_expand "mulv16qi3"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+        (mult:V16QI (match_operand:V16QI 1 "register_operand" "v")
+                    (match_operand:V16QI 2 "register_operand" "v")))]
+  "TARGET_ALTIVEC"
+  "
+{
+  rtx even = gen_reg_rtx (V8HImode);
+  rtx odd = gen_reg_rtx (V8HImode);
+  rtx mask = gen_reg_rtx (V16QImode);
+  rtvec v = rtvec_alloc (16);
+  int i;
+
+  for (i = 0; i < 8; ++i) {
+    RTVEC_ELT (v, 2 * i)
+     = gen_rtx_CONST_INT (QImode, BYTES_BIG_ENDIAN ? 2 * i + 1 : 31 - 2 * i);
+    RTVEC_ELT (v, 2 * i + 1)
+     = gen_rtx_CONST_INT (QImode, BYTES_BIG_ENDIAN ? 2 * i + 17 : 15 - 2 * i);
+  }
+
+  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
+  emit_insn (gen_altivec_vmulesb (even, operands[1], operands[2]));
+  emit_insn (gen_altivec_vmulosb (odd, operands[1], operands[2]));
+  emit_insn (gen_altivec_vperm_v8hiv16qi (operands[0], even, odd, mask));
+  DONE;
+}")
+
 (define_expand "altivec_negv4sf2"
   [(use (match_operand:V4SF 0 "register_operand" ""))
    (use (match_operand:V4SF 1 "register_operand" ""))]
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 8107bec8e6e..7278792d0dc 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -18198,8 +18198,21 @@ rs6000_secondary_reload_gpr (rtx reg, rtx mem, rtx scratch, bool store_p)
 
   if (GET_CODE (addr) == PRE_MODIFY)
     {
+      gcc_assert (REG_P (XEXP (addr, 0))
+		  && GET_CODE (XEXP (addr, 1)) == PLUS
+		  && XEXP (XEXP (addr, 1), 0) == XEXP (addr, 0));
       scratch_or_premodify = XEXP (addr, 0);
-      gcc_assert (REG_P (scratch_or_premodify));
+      if (!HARD_REGISTER_P (scratch_or_premodify))
+	/* If we have a pseudo here then reload will have arranged
+	   to have it replaced, but only in the original insn.
+	   Use the replacement here too.  */
+	scratch_or_premodify = find_replacement (&XEXP (addr, 0));
+
+      /* RTL emitted by rs6000_secondary_reload_gpr uses RTL
+	 expressions from the original insn, without unsharing them.
+	 Any RTL that points into the original insn will of course
+	 have register replacements applied.  That is why we don't
+	 need to look for replacements under the PLUS.  */
       addr = XEXP (addr, 1);
     }
   gcc_assert (GET_CODE (addr) == PLUS || GET_CODE (addr) == LO_SUM);
@@ -34930,10 +34943,8 @@ emit_fusion_gpr_load (rtx target, rtx mem)
    throughout the computation, we can get correct behavior by replacing
    M with M' as follows:
 
-            { M[i+8]+8 : i < 8, M[i+8] in [0,7] U [16,23]
-    M'[i] = { M[i+8]-8 : i < 8, M[i+8] in [8,15] U [24,31]
-            { M[i-8]+8 : i >= 8, M[i-8] in [0,7] U [16,23]
-            { M[i-8]-8 : i >= 8, M[i-8] in [8,15] U [24,31]
+    M'[i] = { (M[i]+8)%16      : M[i] in [0,15]
+            { ((M[i]+8)%16)+16 : M[i] in [16,31]
 
    This seems promising at first, since we are just replacing one mask
    with another.  But certain masks are preferable to others.  If M
@@ -34951,7 +34962,11 @@ emit_fusion_gpr_load (rtx target, rtx mem)
    mask to be produced by an UNSPEC_LVSL, in which case the mask 
    cannot be known at compile time.  In such a case we would have to
    generate several instructions to compute M' as above at run time,
-   and a cost model is needed again.  */
+   and a cost model is needed again.
+
+   However, when the mask M for an UNSPEC_VPERM is loaded from the
+   constant pool, we can replace M with M' as above at no cost
+   beyond adding a constant pool entry.  */
 
 /* This is based on the union-find logic in web.c.  web_entry_base is
    defined in df.h.  */
@@ -35003,7 +35018,8 @@ enum special_handling_values {
   SH_EXTRACT,
   SH_SPLAT,
   SH_XXPERMDI,
-  SH_CONCAT
+  SH_CONCAT,
+  SH_VPERM
 };
 
 /* Union INSN with all insns containing definitions that reach USE.
@@ -35138,6 +35154,64 @@ insn_is_swap_p (rtx insn)
   return 1;
 }
 
+/* Return TRUE if insn is a swap fed by a load from the constant pool.  */
+static bool
+const_load_sequence_p (swap_web_entry *insn_entry, rtx insn)
+{
+  unsigned uid = INSN_UID (insn);
+  if (!insn_entry[uid].is_swap || insn_entry[uid].is_load)
+    return false;
+
+  /* Find the unique use in the swap and locate its def.  If the def
+     isn't unique, punt.  */
+  struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
+  df_ref use;
+  FOR_EACH_INSN_INFO_USE (use, insn_info)
+    {
+      struct df_link *def_link = DF_REF_CHAIN (use);
+      if (!def_link || def_link->next)
+	return false;
+
+      rtx def_insn = DF_REF_INSN (def_link->ref);
+      unsigned uid2 = INSN_UID (def_insn);
+      if (!insn_entry[uid2].is_load || !insn_entry[uid2].is_swap)
+	return false;
+
+      rtx body = PATTERN (def_insn);
+      if (GET_CODE (body) != SET
+	  || GET_CODE (SET_SRC (body)) != VEC_SELECT
+	  || GET_CODE (XEXP (SET_SRC (body), 0)) != MEM)
+	return false;
+
+      rtx mem = XEXP (SET_SRC (body), 0);
+      rtx base_reg = XEXP (mem, 0);
+
+      df_ref base_use;
+      insn_info = DF_INSN_INFO_GET (def_insn);
+      FOR_EACH_INSN_INFO_USE (base_use, insn_info)
+	{
+	  if (!rtx_equal_p (DF_REF_REG (base_use), base_reg))
+	    continue;
+
+	  struct df_link *base_def_link = DF_REF_CHAIN (base_use);
+	  if (!base_def_link || base_def_link->next)
+	    return false;
+
+	  rtx tocrel_insn = DF_REF_INSN (base_def_link->ref);
+	  rtx tocrel_body = PATTERN (tocrel_insn);
+	  rtx base, offset;
+	  if (GET_CODE (tocrel_body) != SET)
+	    return false;
+	  if (!toc_relative_expr_p (SET_SRC (tocrel_body), false))
+	    return false;
+	  split_const (XVECEXP (tocrel_base, 0, 0), &base, &offset);
+	  if (GET_CODE (base) != SYMBOL_REF || !CONSTANT_POOL_ADDRESS_P (base))
+	    return false;
+	}
+    }
+  return true;
+}
+
 /* Return 1 iff OP is an operand that will not be affected by having
    vector doublewords swapped in memory.  */
 static unsigned int
@@ -35397,6 +35471,32 @@ insn_is_swappable_p (swap_web_entry *insn_entry, rtx insn,
       return 1;
     }
 
+  /* An UNSPEC_VPERM is ok if the mask operand is loaded from the
+     constant pool.  */
+  if (GET_CODE (body) == SET
+      && GET_CODE (SET_SRC (body)) == UNSPEC
+      && XINT (SET_SRC (body), 1) == UNSPEC_VPERM
+      && XVECLEN (SET_SRC (body), 0) == 3
+      && GET_CODE (XVECEXP (SET_SRC (body), 0, 2)) == REG)
+    {
+      rtx mask_reg = XVECEXP (SET_SRC (body), 0, 2);
+      struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
+      df_ref use;
+      FOR_EACH_INSN_INFO_USE (use, insn_info)
+	if (rtx_equal_p (DF_REF_REG (use), mask_reg))
+	  {
+	    struct df_link *def_link = DF_REF_CHAIN (use);
+	    /* Punt if multiple definitions for this reg.  */
+	    if (def_link && !def_link->next &&
+		const_load_sequence_p (insn_entry,
+				       DF_REF_INSN (def_link->ref)))
+	      {
+		*special = SH_VPERM;
+		return 1;
+	      }
+	  }
+    }
+
   /* Otherwise check the operands for vector lane violations.  */
   return rtx_is_swappable_p (body, special);
 }
@@ -35729,6 +35829,105 @@ adjust_concat (rtx_insn *insn)
     fprintf (dump_file, "Reversing inputs for concat %d\n", INSN_UID (insn));
 }
 
+/* Given an UNSPEC_VPERM insn, modify the mask loaded from the
+   constant pool to reflect swapped doublewords.  */
+static void
+adjust_vperm (rtx_insn *insn)
+{
+  /* We previously determined that the UNSPEC_VPERM was fed by a
+     swap of a swapping load of a TOC-relative constant pool symbol.
+     Find the MEM in the swapping load and replace it with a MEM for
+     the adjusted mask constant.  */
+  rtx set = PATTERN (insn);
+  rtx mask_reg = XVECEXP (SET_SRC (set), 0, 2);
+
+  /* Find the swap.  */
+  struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
+  df_ref use;
+  rtx_insn *swap_insn = 0;
+  FOR_EACH_INSN_INFO_USE (use, insn_info)
+    if (rtx_equal_p (DF_REF_REG (use), mask_reg))
+      {
+	struct df_link *def_link = DF_REF_CHAIN (use);
+	gcc_assert (def_link && !def_link->next);
+	swap_insn = DF_REF_INSN (def_link->ref);
+	break;
+      }
+  gcc_assert (swap_insn);
+  
+  /* Find the load.  */
+  insn_info = DF_INSN_INFO_GET (swap_insn);
+  rtx_insn *load_insn = 0;
+  FOR_EACH_INSN_INFO_USE (use, insn_info)
+    {
+      struct df_link *def_link = DF_REF_CHAIN (use);
+      gcc_assert (def_link && !def_link->next);
+      load_insn = DF_REF_INSN (def_link->ref);
+      break;
+    }
+  gcc_assert (load_insn);
+
+  /* Find the TOC-relative symbol access.  */
+  insn_info = DF_INSN_INFO_GET (load_insn);
+  rtx_insn *tocrel_insn = 0;
+  FOR_EACH_INSN_INFO_USE (use, insn_info)
+    {
+      struct df_link *def_link = DF_REF_CHAIN (use);
+      gcc_assert (def_link && !def_link->next);
+      tocrel_insn = DF_REF_INSN (def_link->ref);
+      break;
+    }
+  gcc_assert (tocrel_insn);
+
+  /* Find the embedded CONST_VECTOR.  We have to call toc_relative_expr_p
+     to set tocrel_base; otherwise it would be unnecessary as we've
+     already established it will return true.  */
+  rtx base, offset;
+  if (!toc_relative_expr_p (SET_SRC (PATTERN (tocrel_insn)), false))
+    gcc_unreachable ();
+  split_const (XVECEXP (tocrel_base, 0, 0), &base, &offset);
+  rtx const_vector = get_pool_constant (base);
+  gcc_assert (GET_CODE (const_vector) == CONST_VECTOR);
+
+  /* Create an adjusted mask from the initial mask.  */
+  unsigned int new_mask[16], i, val;
+  for (i = 0; i < 16; ++i) {
+    val = INTVAL (XVECEXP (const_vector, 0, i));
+    if (val < 16)
+      new_mask[i] = (val + 8) % 16;
+    else
+      new_mask[i] = ((val + 8) % 16) + 16;
+  }
+
+  /* Create a new CONST_VECTOR and a MEM that references it.  */
+  rtx vals = gen_rtx_PARALLEL (V16QImode, rtvec_alloc (16));
+  for (i = 0; i < 16; ++i)
+    XVECEXP (vals, 0, i) = GEN_INT (new_mask[i]);
+  rtx new_const_vector = gen_rtx_CONST_VECTOR (V16QImode, XVEC (vals, 0));
+  rtx new_mem = force_const_mem (V16QImode, new_const_vector);
+  /* This gives us a MEM whose base operand is a SYMBOL_REF, which we
+     can't recognize.  Force the SYMBOL_REF into a register.  */
+  if (!REG_P (XEXP (new_mem, 0))) {
+    rtx base_reg = force_reg (Pmode, XEXP (new_mem, 0));
+    XEXP (new_mem, 0) = base_reg;
+    /* Move the newly created insn ahead of the load insn.  */
+    rtx_insn *force_insn = get_last_insn ();
+    remove_insn (force_insn);
+    rtx_insn *before_load_insn = PREV_INSN (load_insn);
+    add_insn_after (force_insn, before_load_insn, BLOCK_FOR_INSN (load_insn));
+    df_insn_rescan (before_load_insn);
+    df_insn_rescan (force_insn);
+  }
+
+  /* Replace the MEM in the load instruction and rescan it.  */
+  XEXP (SET_SRC (PATTERN (load_insn)), 0) = new_mem;
+  INSN_CODE (load_insn) = -1; /* Force re-recognition.  */
+  df_insn_rescan (load_insn);
+
+  if (dump_file)
+    fprintf (dump_file, "Adjusting mask for vperm %d\n", INSN_UID (insn));
+}
+
 /* The insn described by INSN_ENTRY[I] can be swapped, but only
    with special handling.  Take care of that here.  */
 static void
@@ -35783,6 +35982,10 @@ handle_special_swappables (swap_web_entry *insn_entry, unsigned i)
       /* Reverse the order of a concatenation operation.  */
       adjust_concat (insn);
       break;
+    case SH_VPERM:
+      /* Change the mask loaded from the constant pool for a VPERM.  */
+      adjust_vperm (insn);
+      break;
     }
 }
 
@@ -35859,6 +36062,8 @@ dump_swap_insn_table (swap_web_entry *insn_entry)
 	      fputs ("special:xxpermdi ", dump_file);
 	    else if (insn_entry[i].special_handling == SH_CONCAT)
 	      fputs ("special:concat ", dump_file);
+	    else if (insn_entry[i].special_handling == SH_VPERM)
+	      fputs ("special:vperm ", dump_file);
 	  }
 	if (insn_entry[i].web_not_optimizable)
 	  fputs ("unoptimizable ", dump_file);
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index cbfc80073c9..d276ab21ab3 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -2265,6 +2265,11 @@ s390_contiguous_bitmask_vector_p (rtx op, int *start, int *end)
     return false;
 
   size = GET_MODE_UNIT_BITSIZE (GET_MODE (op));
+
+  /* We cannot deal with V1TI/V1TF. This would require a vgmq.  */
+  if (size > 64)
+    return false;
+
   mask = UINTVAL (elt);
   if (s390_contiguous_bitmask_p (mask, size, start,
 				 end != NULL ? &length : NULL))
@@ -7703,11 +7708,12 @@ replace_ltrel_base (rtx *x)
 /* We keep a list of constants which we have to add to internal
    constant tables in the middle of large functions.  */
 
-#define NR_C_MODES 31
+#define NR_C_MODES 32
 machine_mode constant_modes[NR_C_MODES] =
 {
   TFmode, TImode, TDmode,
-  V16QImode, V8HImode, V4SImode, V2DImode, V4SFmode, V2DFmode, V1TFmode,
+  V16QImode, V8HImode, V4SImode, V2DImode, V1TImode,
+  V4SFmode, V2DFmode, V1TFmode,
   DFmode, DImode, DDmode,
   V8QImode, V4HImode, V2SImode, V1DImode, V2SFmode, V1DFmode,
   SFmode, SImode, SDmode,
diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
index 35ada1371ff..7e20d2b6986 100644
--- a/gcc/config/s390/vx-builtins.md
+++ b/gcc/config/s390/vx-builtins.md
@@ -870,11 +870,11 @@
 ; vec_mladd -> vec_vmal
 ; vmalb, vmalh, vmalf, vmalg
 (define_insn "vec_vmal<mode>"
-  [(set (match_operand:VI_HW 0 "register_operand" "=v")
-	(unspec:VI_HW [(match_operand:VI_HW 1 "register_operand" "v")
-		       (match_operand:VI_HW 2 "register_operand" "v")
-		       (match_operand:VI_HW 3 "register_operand" "v")]
-		      UNSPEC_VEC_VMAL))]
+  [(set (match_operand:VI_HW_QHS 0 "register_operand" "=v")
+	(unspec:VI_HW_QHS [(match_operand:VI_HW_QHS 1 "register_operand" "v")
+			   (match_operand:VI_HW_QHS 2 "register_operand" "v")
+			   (match_operand:VI_HW_QHS 3 "register_operand" "v")]
+			  UNSPEC_VEC_VMAL))]
   "TARGET_VX"
   "vmal<bhfgq><w>\t%v0,%v1,%v2,%v3"
   [(set_attr "op_type" "VRR")])
@@ -883,22 +883,22 @@
 
 ; vmahb; vmahh, vmahf, vmahg
 (define_insn "vec_vmah<mode>"
-  [(set (match_operand:VI_HW 0 "register_operand" "=v")
-	(unspec:VI_HW [(match_operand:VI_HW 1 "register_operand" "v")
-		       (match_operand:VI_HW 2 "register_operand" "v")
-		       (match_operand:VI_HW 3 "register_operand" "v")]
-		      UNSPEC_VEC_VMAH))]
+  [(set (match_operand:VI_HW_QHS 0 "register_operand" "=v")
+	(unspec:VI_HW_QHS [(match_operand:VI_HW_QHS 1 "register_operand" "v")
+			   (match_operand:VI_HW_QHS 2 "register_operand" "v")
+			   (match_operand:VI_HW_QHS 3 "register_operand" "v")]
+			  UNSPEC_VEC_VMAH))]
   "TARGET_VX"
   "vmah<bhfgq>\t%v0,%v1,%v2,%v3"
   [(set_attr "op_type" "VRR")])
 
 ; vmalhb; vmalhh, vmalhf, vmalhg
 (define_insn "vec_vmalh<mode>"
-  [(set (match_operand:VI_HW 0 "register_operand" "=v")
-	(unspec:VI_HW [(match_operand:VI_HW 1 "register_operand" "v")
-		       (match_operand:VI_HW 2 "register_operand" "v")
-		       (match_operand:VI_HW 3 "register_operand" "v")]
-		      UNSPEC_VEC_VMALH))]
+  [(set (match_operand:VI_HW_QHS 0 "register_operand" "=v")
+	(unspec:VI_HW_QHS [(match_operand:VI_HW_QHS 1 "register_operand" "v")
+			   (match_operand:VI_HW_QHS 2 "register_operand" "v")
+			   (match_operand:VI_HW_QHS 3 "register_operand" "v")]
+			  UNSPEC_VEC_VMALH))]
   "TARGET_VX"
   "vmalh<bhfgq>\t%v0,%v1,%v2,%v3"
   [(set_attr "op_type" "VRR")])
diff --git a/gcc/config/sh/sh.c b/gcc/config/sh/sh.c
index 1442b7fc790..25149a60ff7 100644
--- a/gcc/config/sh/sh.c
+++ b/gcc/config/sh/sh.c
@@ -14016,6 +14016,9 @@ sh_extending_set_of_reg::use_as_extended_reg (rtx_insn* use_at_insn) const
   else
     {
       rtx extension_dst = XEXP (set_rtx, 0);
+      if (GET_MODE (extension_dst) != SImode)
+	extension_dst = simplify_gen_subreg (SImode, extension_dst,
+					     GET_MODE (extension_dst), 0);
       if (modified_between_p (extension_dst, insn, use_at_insn))
 	{
 	  if (dump_file)
diff --git a/gcc/configure b/gcc/configure
index 0d313831aac..846c996342f 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -10926,7 +10926,7 @@ _ACEOF
 
 for ac_func in getenv atol atoll asprintf sbrk abort atof getcwd getwd \
 	madvise stpcpy strnlen strsignal strverscmp \
-	strtol strtoul strtoll strtoull \
+	strtol strtoul strtoll strtoull setenv unsetenv \
 	errno snprintf vsnprintf vasprintf malloc realloc calloc \
 	free getopt clock getpagesize ffs clearerr_unlocked feof_unlocked   ferror_unlocked fflush_unlocked fgetc_unlocked fgets_unlocked   fileno_unlocked fprintf_unlocked fputc_unlocked fputs_unlocked   fread_unlocked fwrite_unlocked getchar_unlocked getc_unlocked   putchar_unlocked putc_unlocked
 do
@@ -28625,6 +28625,29 @@ rm -f core conftest.err conftest.$ac_objext \
   { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_has_isl_options_set_schedule_serialize_sccs" >&5
 $as_echo "$ac_has_isl_options_set_schedule_serialize_sccs" >&6; }
 
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking Checking for isl_ctx_get_max_operations" >&5
+$as_echo_n "checking Checking for isl_ctx_get_max_operations... " >&6; }
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <isl/ctx.h>
+int
+main ()
+{
+isl_ctx_get_max_operations (isl_ctx_alloc ());
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_cxx_try_link "$LINENO"; then :
+  ac_has_isl_ctx_get_max_operations=yes
+else
+  ac_has_isl_ctx_get_max_operations=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+  { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_has_isl_ctx_get_max_operations" >&5
+$as_echo "$ac_has_isl_ctx_get_max_operations" >&6; }
+
   LIBS="$saved_LIBS"
   CXXFLAGS="$saved_CXXFLAGS"
 
@@ -28639,6 +28662,11 @@ $as_echo "#define HAVE_ISL_SCHED_CONSTRAINTS_COMPUTE_SCHEDULE 1" >>confdefs.h
 $as_echo "#define HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS 1" >>confdefs.h
 
   fi
+  if test x"$ac_has_isl_ctx_get_max_operations" = x"yes"; then
+
+$as_echo "#define HAVE_ISL_CTX_MAX_OPERATIONS 1" >>confdefs.h
+
+  fi
 fi
 
 # Check for plugin support
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 846651d01f9..34c43d54228 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -1247,7 +1247,7 @@ AC_CHECK_DECLS([basename(const char*), strstr(const char*,const char*)], , ,[
 
 gcc_AC_CHECK_DECLS(getenv atol atoll asprintf sbrk abort atof getcwd getwd \
 	madvise stpcpy strnlen strsignal strverscmp \
-	strtol strtoul strtoll strtoull \
+	strtol strtoul strtoll strtoull setenv unsetenv \
 	errno snprintf vsnprintf vasprintf malloc realloc calloc \
 	free getopt clock getpagesize ffs gcc_UNLOCKED_FUNCS, , ,[
 #include "ansidecl.h"
@@ -5790,6 +5790,13 @@ if test "x${ISLLIBS}" != "x" ; then
               [ac_has_isl_options_set_schedule_serialize_sccs=no])
   AC_MSG_RESULT($ac_has_isl_options_set_schedule_serialize_sccs)
 
+  AC_MSG_CHECKING([Checking for isl_ctx_get_max_operations])
+  AC_TRY_LINK([#include <isl/ctx.h>],
+              [isl_ctx_get_max_operations (isl_ctx_alloc ());],
+              [ac_has_isl_ctx_get_max_operations=yes],
+              [ac_has_isl_ctx_get_max_operations=no])
+  AC_MSG_RESULT($ac_has_isl_ctx_get_max_operations)
+
   LIBS="$saved_LIBS"
   CXXFLAGS="$saved_CXXFLAGS"
 
@@ -5802,6 +5809,10 @@ if test "x${ISLLIBS}" != "x" ; then
      AC_DEFINE(HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS, 1,
                [Define if isl_options_set_schedule_serialize_sccs exists.])
   fi
+  if test x"$ac_has_isl_ctx_get_max_operations" = x"yes"; then
+     AC_DEFINE(HAVE_ISL_CTX_MAX_OPERATIONS, 1,
+               [Define if isl_ctx_get_max_operations exists.])
+  fi
 fi
 
 GCC_ENABLE_PLUGINS
diff --git a/gcc/cp/ChangeLog b/gcc/cp/ChangeLog
index 477bb209388..a9952fcae75 100644
--- a/gcc/cp/ChangeLog
+++ b/gcc/cp/ChangeLog
@@ -1,3 +1,69 @@
+2015-09-10  Paolo Carlini  <paolo.carlini@oracle.com>
+
+	PR c++/67318
+	* parser.c (cp_parser_parameter_declaration): Consume the ellipsis
+	and set template_parameter_pack_p also when the type is null.
+
+2015-09-09  Mark Wielaard  <mjw@redhat.com>
+
+	* typeck.c (cp_build_binary_op): Check and warn when nonnull arg
+	parm against NULL.
+
+2015-09-10  Jakub Jelinek  <jakub@redhat.com>
+
+	PR c++/67522
+	* semantics.c (handle_omp_array_sections_1): Only run
+	type_dependent_expression_p on VAR_DECL/PARM_DECLs.
+	(finish_omp_clauses) <case OMP_CLAUSE_LINEAR>: Likewise.
+	Don't adjust OMP_CLAUSE_LINEAR_STEP if OMP_CLAUSE_DECL
+	is not a VAR_DECL/PARM_DECL.
+
+	PR c++/67511
+	* semantics.c (handle_omp_for_class_iterator): Don't wrap
+	error_mark_node into a NOP_EXPR to void_type_node.
+
+2015-09-09  Paolo Carlini  <paolo.carlini@oracle.com>
+
+	PR c++/53184
+	* decl2.c (constrain_class_visibility): Use Wsubobject-linkage.
+
+2015-09-09  Jakub Jelinek  <jakub@redhat.com>
+
+	PR c++/67504
+	* parser.c (cp_parser_omp_clause_collapse): Test tree_fits_shwi_p
+	before INTEGRAL_TYPE_P test.
+
+2015-09-08  Jason Merrill  <jason@redhat.com>
+
+	PR c++/67041
+	* pt.c (tsubst_copy_and_build): Handle variables like functions.
+
+2015-09-08  Paolo Carlini  <paolo.carlini@oracle.com>
+
+	PR c++/67369
+	* pt.c (tsubst_copy, [case FUNCTION_DECL]): Do not call tsubst
+	if the first argument isn't a template.
+
+2015-09-03  Martin Sebor  <msebor@redhat.com>
+
+	PR c/66516
+	* cp-tree.h (mark_rvalue_use, decay_conversion): Add new
+	argument(s).
+	* expr.c (mark_rvalue_use): Use new argument.
+	* call.c (build_addr_func): Call decay_conversion with new
+	argument.
+	* pt.c (convert_template_argument): Call reject_gcc_builtin.
+	* typeck.c (decay_conversion): Use new argument.
+	(c_decl_implicit): Define.
+
+2015-09-02  Balaji V. Iyer  <balaji.v.iyer@intel.com>
+
+	PR middle-end/60586
+	* cp-gimplify.c (cilk_cp_gimplify_call_params_in_spawned_fn): New
+	function.
+	(cp_gimplify_expr): Added a call to the function
+	cilk_cp_gimplify_call_params_in_spawned_fn.
+
 2015-09-01  Paolo Carlini  <paolo.carlini@oracle.com>
 
 	PR c++/61753
diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 909ac990189..bf9f68fb638 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -289,7 +289,7 @@ build_addr_func (tree function, tsubst_flags_t complain)
       function = build_address (function);
     }
   else
-    function = decay_conversion (function, complain);
+    function = decay_conversion (function, complain, /*reject_builtin=*/false);
 
   return function;
 }
diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index c36d3399133..5ab060431a3 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -95,6 +95,25 @@ finish_bc_block (tree *block, enum bc_t bc, tree label)
   DECL_CHAIN (label) = NULL_TREE;
 }
 
+/* This function is a wrapper for cilk_gimplify_call_params_in_spawned_fn.
+   *EXPR_P can be a CALL_EXPR, INIT_EXPR, MODIFY_EXPR, AGGR_INIT_EXPR or
+   TARGET_EXPR.  *PRE_P and *POST_P are gimple sequences from the caller
+   of gimplify_cilk_spawn.  */
+
+static void
+cilk_cp_gimplify_call_params_in_spawned_fn (tree *expr_p, gimple_seq *pre_p,
+					    gimple_seq *post_p)
+{
+  int ii = 0;
+
+  cilk_gimplify_call_params_in_spawned_fn (expr_p, pre_p, post_p);  
+  if (TREE_CODE (*expr_p) == AGGR_INIT_EXPR)
+    for (ii = 0; ii < aggr_init_expr_nargs (*expr_p); ii++)
+      gimplify_expr (&AGGR_INIT_EXPR_ARG (*expr_p, ii), pre_p, post_p,
+		     is_gimple_reg, fb_rvalue);
+}
+
+
 /* Get the LABEL_EXPR to represent a break or continue statement
    in the current block scope.  BC indicates which.  */
 
@@ -603,7 +622,10 @@ cp_gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p)
       if (fn_contains_cilk_spawn_p (cfun)
 	  && cilk_detect_spawn_and_unwrap (expr_p)
 	  && !seen_error ())
-	return (enum gimplify_status) gimplify_cilk_spawn (expr_p);
+	{
+	  cilk_cp_gimplify_call_params_in_spawned_fn (expr_p, pre_p, post_p);
+	  return (enum gimplify_status) gimplify_cilk_spawn (expr_p);
+	}
       cp_gimplify_init_expr (expr_p);
       if (TREE_CODE (*expr_p) != INIT_EXPR)
 	return GS_OK;
@@ -614,8 +636,10 @@ cp_gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p)
 	if (fn_contains_cilk_spawn_p (cfun)
 	    && cilk_detect_spawn_and_unwrap (expr_p)
 	    && !seen_error ())
-	  return (enum gimplify_status) gimplify_cilk_spawn (expr_p);
-
+	  {
+	    cilk_cp_gimplify_call_params_in_spawned_fn (expr_p, pre_p, post_p);
+	    return (enum gimplify_status) gimplify_cilk_spawn (expr_p);
+	  }
 	/* If the back end isn't clever enough to know that the lhs and rhs
 	   types are the same, add an explicit conversion.  */
 	tree op0 = TREE_OPERAND (*expr_p, 0);
@@ -715,14 +739,18 @@ cp_gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p)
 
       /* If errors are seen, then just process it as a CALL_EXPR.  */
       if (!seen_error ())
-	return (enum gimplify_status) gimplify_cilk_spawn (expr_p);
-      
+	{
+	  cilk_cp_gimplify_call_params_in_spawned_fn (expr_p, pre_p, post_p);
+	  return (enum gimplify_status) gimplify_cilk_spawn (expr_p);
+	}
     case CALL_EXPR:
       if (fn_contains_cilk_spawn_p (cfun)
 	  && cilk_detect_spawn_and_unwrap (expr_p)
 	  && !seen_error ())
-	return (enum gimplify_status) gimplify_cilk_spawn (expr_p);
-
+	{
+	  cilk_cp_gimplify_call_params_in_spawned_fn (expr_p, pre_p, post_p);
+	  return (enum gimplify_status) gimplify_cilk_spawn (expr_p);
+	}
       /* DR 1030 says that we need to evaluate the elements of an
 	 initializer-list in forward order even when it's used as arguments to
 	 a constructor.  So if the target wants to evaluate them in reverse
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 529eeeca9c0..6bd0b1efcff 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5788,7 +5788,9 @@ extern tree create_try_catch_expr               (tree, tree);
 
 /* in expr.c */
 extern tree cplus_expand_constant		(tree);
-extern tree mark_rvalue_use			(tree);
+extern tree mark_rvalue_use			(tree,
+                                                 location_t = UNKNOWN_LOCATION,
+                                                 bool = true);
 extern tree mark_lvalue_use			(tree);
 extern tree mark_type_use			(tree);
 extern void mark_exp_read			(tree);
@@ -6462,7 +6464,9 @@ extern tree cxx_alignas_expr                    (tree);
 extern tree cxx_sizeof_nowarn                   (tree);
 extern tree is_bitfield_expr_with_lowered_type  (const_tree);
 extern tree unlowered_expr_type                 (const_tree);
-extern tree decay_conversion			(tree, tsubst_flags_t);
+extern tree decay_conversion			(tree,
+                                                 tsubst_flags_t,
+                                                 bool = true);
 extern tree build_class_member_access_expr      (tree, tree, tree, bool,
 						 tsubst_flags_t);
 extern tree finish_class_member_access_expr     (tree, tree, bool, 
diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index 74ba380c44d..6c1f0842331 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -2564,10 +2564,25 @@ constrain_class_visibility (tree type)
 
 	if (subvis == VISIBILITY_ANON)
 	  {
-	    if (!in_main_input_context ())
-	      warning (0, "\
+	    if (!in_main_input_context())
+	      {
+		tree nlt = no_linkage_check (ftype, /*relaxed_p=*/false);
+		if (nlt)
+		  {
+		    if (same_type_p (TREE_TYPE (t), nlt))
+		      warning (OPT_Wsubobject_linkage, "\
+%qT has a field %qD whose type has no linkage",
+			       type, t);
+		    else
+		      warning (OPT_Wsubobject_linkage, "\
+%qT has a field %qD whose type depends on the type %qT which has no linkage",
+			       type, t, nlt);
+		  }
+		else
+		  warning (OPT_Wsubobject_linkage, "\
 %qT has a field %qD whose type uses the anonymous namespace",
-		       type, t);
+			   type, t);
+	      }
 	  }
 	else if (MAYBE_CLASS_TYPE_P (ftype)
 		 && vis < VISIBILITY_HIDDEN
@@ -2585,9 +2600,24 @@ constrain_class_visibility (tree type)
       if (subvis == VISIBILITY_ANON)
         {
 	  if (!in_main_input_context())
-	    warning (0, "\
+	    {
+	      tree nlt = no_linkage_check (TREE_TYPE (t), /*relaxed_p=*/false);
+	      if (nlt)
+		{
+		  if (same_type_p (TREE_TYPE (t), nlt))
+		    warning (OPT_Wsubobject_linkage, "\
+%qT has a base %qT whose type has no linkage",
+			     type, TREE_TYPE (t));
+		  else
+		    warning (OPT_Wsubobject_linkage, "\
+%qT has a base %qT whose type depends on the type %qT which has no linkage",
+			     type, TREE_TYPE (t), nlt);
+		}
+	      else
+		warning (OPT_Wsubobject_linkage, "\
 %qT has a base %qT whose type uses the anonymous namespace",
-		     type, TREE_TYPE (t));
+			 type, TREE_TYPE (t));
+	    }
 	}
       else if (vis < VISIBILITY_HIDDEN
 	       && subvis >= VISIBILITY_HIDDEN)
diff --git a/gcc/cp/expr.c b/gcc/cp/expr.c
index 6d2d658c7fa..71dec3da033 100644
--- a/gcc/cp/expr.c
+++ b/gcc/cp/expr.c
@@ -91,18 +91,24 @@ cplus_expand_constant (tree cst)
   return cst;
 }
 
-/* Called whenever an expression is used
-   in a rvalue context.  */
-
+/* Called whenever the expression EXPR is used in an rvalue context.
+   When REJECT_BUILTIN is true the expression is checked to make sure
+   it doesn't make it possible to obtain the address of a GCC built-in
+   function with no library fallback (or any of its bits, such as in
+   a conversion to bool).  */
 tree
-mark_rvalue_use (tree expr)
+mark_rvalue_use (tree expr,
+		 location_t loc /* = UNKNOWN_LOCATION */,
+		 bool reject_builtin /* = true */)
 {
+  if (reject_builtin && reject_gcc_builtin (expr, loc))
+    return error_mark_node;
+
   mark_exp_read (expr);
   return expr;
 }
 
-/* Called whenever an expression is used
-   in a lvalue context.  */
+/* Called whenever an expression is used in an lvalue context.  */
 
 tree
 mark_lvalue_use (tree expr)
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index a0b249bbf5b..2e8e34eaf5e 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -19613,11 +19613,12 @@ cp_parser_parameter_declaration (cp_parser *parser,
 	}
     }
 
-  /* If the next token is an ellipsis, and we have not seen a
-     declarator name, and the type of the declarator contains parameter
-     packs but it is not a TYPE_PACK_EXPANSION, then we actually have
-     a parameter pack expansion expression. Otherwise, leave the
-     ellipsis for a C-style variadic function. */
+  /* If the next token is an ellipsis, and we have not seen a declarator
+     name, and if either the type of the declarator contains parameter
+     packs but it is not a TYPE_PACK_EXPANSION or is null (this happens
+     for, eg, abbreviated integral type names), then we actually have a
+     parameter pack expansion expression. Otherwise, leave the ellipsis
+     for a C-style variadic function. */
   token = cp_lexer_peek_token (parser->lexer);
   if (cp_lexer_next_token_is (parser->lexer, CPP_ELLIPSIS))
     {
@@ -19626,11 +19627,12 @@ cp_parser_parameter_declaration (cp_parser *parser,
       if (type && DECL_P (type))
         type = TREE_TYPE (type);
 
-      if (type
-	  && TREE_CODE (type) != TYPE_PACK_EXPANSION
-	  && declarator_can_be_parameter_pack (declarator)
-          && (template_parm_p || uses_parameter_packs (type)))
-        {
+      if (((type
+	    && TREE_CODE (type) != TYPE_PACK_EXPANSION
+	    && (template_parm_p || uses_parameter_packs (type)))
+	   || (!type && template_parm_p))
+	  && declarator_can_be_parameter_pack (declarator))
+	{
 	  /* Consume the `...'. */
 	  cp_lexer_consume_token (parser->lexer);
 	  maybe_warn_variadic_templates ();
@@ -29228,8 +29230,8 @@ cp_parser_omp_clause_collapse (cp_parser *parser, tree list, location_t location
   if (num == error_mark_node)
     return list;
   num = fold_non_dependent_expr (num);
-  if (!INTEGRAL_TYPE_P (TREE_TYPE (num))
-      || !tree_fits_shwi_p (num)
+  if (!tree_fits_shwi_p (num)
+      || !INTEGRAL_TYPE_P (TREE_TYPE (num))
       || (n = tree_to_shwi (num)) <= 0
       || (int) n != n)
     {
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index fb7b9d21056..05e6d83a396 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -7199,6 +7199,18 @@ convert_template_argument (tree parm,
       else if (val == error_mark_node && (complain & tf_error))
 	error ("could not convert template argument %qE to %qT",  orig_arg, t);
 
+      if (INDIRECT_REF_P (val))
+        {
+          /* Reject template arguments that are references to built-in
+             functions with no library fallbacks.  */
+          const_tree inner = TREE_OPERAND (val, 0);
+          if (TREE_CODE (TREE_TYPE (inner)) == REFERENCE_TYPE
+              && TREE_CODE (TREE_TYPE (TREE_TYPE (inner))) == FUNCTION_TYPE
+              && TREE_CODE (TREE_TYPE (inner)) == REFERENCE_TYPE
+              && reject_gcc_builtin (TREE_OPERAND (inner, 0)))
+              return error_mark_node;
+        }
+
       if (TREE_CODE (val) == SCOPE_REF)
 	{
 	  /* Strip typedefs from the SCOPE_REF.  */
@@ -13587,8 +13599,9 @@ tsubst_copy (tree t, tree args, tsubst_flags_t complain, tree in_decl)
 	      if (r)
 		{
 		  /* Make sure that the one we found is the one we want.  */
-		  tree ctx = tsubst (DECL_CONTEXT (t), args,
-				     complain, in_decl);
+		  tree ctx = DECL_CONTEXT (t);
+		  if (DECL_LANG_SPECIFIC (ctx) && DECL_TEMPLATE_INFO (ctx))
+		    ctx = tsubst (ctx, args, complain, in_decl);
 		  if (ctx != DECL_CONTEXT (r))
 		    r = NULL_TREE;
 		}
@@ -16308,15 +16321,14 @@ tsubst_copy_and_build (tree t,
 	LAMBDA_EXPR_MUTABLE_P (r) = LAMBDA_EXPR_MUTABLE_P (t);
 	LAMBDA_EXPR_DISCRIMINATOR (r)
 	  = (LAMBDA_EXPR_DISCRIMINATOR (t));
-	/* For a function scope, we want to use tsubst so that we don't
-	   complain about referring to an auto function before its return
-	   type has been deduced.  Otherwise, we want to use tsubst_copy so
-	   that we look up the existing field/parameter/variable rather
-	   than build a new one.  */
 	tree scope = LAMBDA_EXPR_EXTRA_SCOPE (t);
-	if (scope && TREE_CODE (scope) == FUNCTION_DECL)
+	if (!scope)
+	  /* No substitution needed.  */;
+	else if (VAR_OR_FUNCTION_DECL_P (scope))
+	  /* For a function or variable scope, we want to use tsubst so that we
+	     don't complain about referring to an auto before deduction.  */
 	  scope = tsubst (scope, args, complain, in_decl);
-	else if (scope && TREE_CODE (scope) == PARM_DECL)
+	else if (TREE_CODE (scope) == PARM_DECL)
 	  {
 	    /* Look up the parameter we want directly, as tsubst_copy
 	       doesn't do what we need.  */
@@ -16329,8 +16341,12 @@ tsubst_copy_and_build (tree t,
 	    if (DECL_CONTEXT (scope) == NULL_TREE)
 	      DECL_CONTEXT (scope) = fn;
 	  }
-	else
+	else if (TREE_CODE (scope) == FIELD_DECL)
+	  /* For a field, use tsubst_copy so that we look up the existing field
+	     rather than build a new one.  */
 	  scope = RECUR (scope);
+	else
+	  gcc_unreachable ();
 	LAMBDA_EXPR_EXTRA_SCOPE (r) = scope;
 	LAMBDA_EXPR_RETURN_TYPE (r)
 	  = tsubst (LAMBDA_EXPR_RETURN_TYPE (t), args, complain, in_decl);
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 7bbae06f717..618b54ecba2 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -4294,8 +4294,6 @@ handle_omp_array_sections_1 (tree c, tree t, vec<tree> &types,
     {
       if (error_operand_p (t))
 	return error_mark_node;
-      if (type_dependent_expression_p (t))
-	return NULL_TREE;
       if (!VAR_P (t) && TREE_CODE (t) != PARM_DECL)
 	{
 	  if (processing_template_decl)
@@ -4318,6 +4316,8 @@ handle_omp_array_sections_1 (tree c, tree t, vec<tree> &types,
 		    omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
 	  return error_mark_node;
 	}
+      if (type_dependent_expression_p (t))
+	return NULL_TREE;
       t = convert_from_reference (t);
       return t;
     }
@@ -5332,7 +5332,8 @@ finish_omp_clauses (tree clauses)
 	  goto check_dup_generic;
 	case OMP_CLAUSE_LINEAR:
 	  t = OMP_CLAUSE_DECL (c);
-	  if (!type_dependent_expression_p (t)
+	  if ((VAR_P (t) || TREE_CODE (t) == PARM_DECL)
+	      && !type_dependent_expression_p (t)
 	      && !INTEGRAL_TYPE_P (TREE_TYPE (t))
 	      && TREE_CODE (TREE_TYPE (t)) != POINTER_TYPE)
 	    {
@@ -5359,7 +5360,9 @@ finish_omp_clauses (tree clauses)
 	  else
 	    {
 	      t = mark_rvalue_use (t);
-	      if (!processing_template_decl)
+	      if (!processing_template_decl
+		  && (VAR_P (OMP_CLAUSE_DECL (c))
+		      || TREE_CODE (OMP_CLAUSE_DECL (c)) == PARM_DECL))
 		{
 		  if (TREE_CODE (OMP_CLAUSE_DECL (c)) == PARM_DECL)
 		    t = maybe_constant_value (t);
@@ -6453,7 +6456,8 @@ handle_omp_for_class_iterator (int i, location_t locus, tree declv, tree initv,
   iter_init = build_x_modify_expr (elocus,
 				   iter, PLUS_EXPR, iter_init,
 				   tf_warning_or_error);
-  iter_init = build1 (NOP_EXPR, void_type_node, iter_init);
+  if (iter_init != error_mark_node)
+    iter_init = build1 (NOP_EXPR, void_type_node, iter_init);
   finish_expr_stmt (iter_init);
   finish_expr_stmt (build_x_modify_expr (elocus,
 					 last, NOP_EXPR, decl,
diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index 83fd34ca80a..482e42c819b 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -1911,7 +1911,9 @@ unlowered_expr_type (const_tree exp)
    that the return value is no longer an lvalue.  */
 
 tree
-decay_conversion (tree exp, tsubst_flags_t complain)
+decay_conversion (tree exp,
+		  tsubst_flags_t complain,
+		  bool reject_builtin /* = true */)
 {
   tree type;
   enum tree_code code;
@@ -1921,7 +1923,7 @@ decay_conversion (tree exp, tsubst_flags_t complain)
   if (type == error_mark_node)
     return error_mark_node;
 
-  exp = mark_rvalue_use (exp);
+  exp = mark_rvalue_use (exp, loc, reject_builtin);
 
   exp = resolve_nondeduced_context (exp);
   if (type_unknown_p (exp))
@@ -4436,6 +4438,11 @@ cp_build_binary_op (location_t location,
 	       || (code0 == POINTER_TYPE
 		   && TYPE_PTR_P (type1) && integer_zerop (op1)))
 	{
+	  if (warn_nonnull
+	      && TREE_CODE (op0) == PARM_DECL && nonnull_arg_p (op0))
+	    warning_at (location, OPT_Wnonnull,
+			"nonnull argument %qD compared to NULL", op0);
+
 	  if (TYPE_PTR_P (type1))
 	    result_type = composite_pointer_type (type0, type1, op0, op1,
 						  CPO_COMPARISON, complain);
@@ -4475,6 +4482,11 @@ cp_build_binary_op (location_t location,
 	       || (code1 == POINTER_TYPE
 		   && TYPE_PTR_P (type0) && integer_zerop (op0)))
 	{
+	  if (warn_nonnull
+	      && TREE_CODE (op1) == PARM_DECL && nonnull_arg_p (op1))
+	    warning_at (location, OPT_Wnonnull,
+			"nonnull argument %qD compared to NULL", op1);
+
 	  if (TYPE_PTR_P (type0))
 	    result_type = composite_pointer_type (type0, type1, op0, op1,
 						  CPO_COMPARISON, complain);
@@ -9397,3 +9409,12 @@ check_literal_operator_args (const_tree decl,
       return true;
     }
 }
+
+/* Always returns false since unlike C90, C++ has no concept of implicit
+   function declarations.  */
+
+bool
+c_decl_implicit (const_tree)
+{
+  return false;
+}
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index dba8b4382b2..23e6a76b8a8 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -10316,14 +10316,22 @@ recommend general use of these functions.
 
 The remaining functions are provided for optimization purposes.
 
+With the exception of built-ins that have library equivalents such as
+the standard C library functions discussed below, or that expand to
+library calls, GCC built-in functions are always expanded inline and
+thus do not have corresponding entry points and their address cannot
+be obtained.  Attempting to use them in an expression other than
+a function call results in a compile-time error.
+
 @opindex fno-builtin
 GCC includes built-in versions of many of the functions in the standard
-C library.  The versions prefixed with @code{__builtin_} are always
-treated as having the same meaning as the C library function even if you
-specify the @option{-fno-builtin} option.  (@pxref{C Dialect Options})
-Many of these functions are only optimized in certain cases; if they are
-not optimized in a particular case, a call to the library function is
-emitted.
+C library.  These functions come in two forms: one whose names start with
+the @code{__builtin_} prefix, and the other without.  Both forms have the
+same type (including prototype), the same address (when their address is
+taken), and the same meaning as the C library functions even if you specify
+the @option{-fno-builtin} option @pxref{C Dialect Options}).  Many of these
+functions are only optimized in certain cases; if they are not optimized in
+a particular case, a call to the library function is emitted.
 
 @opindex ansi
 @opindex std
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 24e7eed1380..c18dbdd9a74 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -565,7 +565,10 @@ components of the binutils you intend to build alongside the compiler
 @file{opcodes}, @dots{}) to the directory containing the GCC sources.
 
 Likewise the GMP, MPFR and MPC libraries can be automatically built
-together with GCC.  Unpack the GMP, MPFR and/or MPC source
+together with GCC.  You may simply run the
+./contrib/download_prerequisites script in the GCC source directory
+to set up everything.
+Otherwise unpack the GMP, MPFR and/or MPC source
 distributions in the directory containing the GCC sources and rename
 their directories to @file{gmp}, @file{mpfr} and @file{mpc},
 respectively (or use symbolic links with the same name).
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 7726368da60..25775cf9305 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -282,7 +282,7 @@ Objective-C and Objective-C++ Dialects}.
 -Wstrict-aliasing=n @gol -Wstrict-overflow -Wstrict-overflow=@var{n} @gol
 -Wsuggest-attribute=@r{[}pure@r{|}const@r{|}noreturn@r{|}format@r{]} @gol
 -Wsuggest-final-types @gol -Wsuggest-final-methods -Wsuggest-override @gol
--Wmissing-format-attribute @gol
+-Wmissing-format-attribute -Wsubobject-linkage @gol
 -Wswitch  -Wswitch-default  -Wswitch-enum -Wswitch-bool -Wsync-nand @gol
 -Wsystem-headers  -Wtautological-compare  -Wtrampolines  -Wtrigraphs @gol
 -Wtype-limits  -Wundef @gol
@@ -3724,6 +3724,9 @@ formats that may yield only a two-digit year.
 Warn about passing a null pointer for arguments marked as
 requiring a non-null value by the @code{nonnull} function attribute.
 
+Also warns when comparing an argument marked with the @code{nonnull}
+function attribute against null inside the function.
+
 @option{-Wnonnull} is included in @option{-Wall} and @option{-Wformat}.  It
 can be disabled with the @option{-Wno-nonnull} option.
 
@@ -4927,6 +4930,13 @@ types. @option{-Wconversion-null} is enabled by default.
 Warn when a literal '0' is used as null pointer constant.  This can
 be useful to facilitate the conversion to @code{nullptr} in C++11.
 
+@item -Wsubobject-linkage @r{(C++ and Objective-C++ only)}
+@opindex Wsubobject-linkage
+@opindex Wno-subobject-linkage
+Warn if a class type has a base or a field whose type uses the anonymous
+namespace or depends on a type with no linkage.  This warning is
+enabled by default.
+
 @item -Wdate-time
 @opindex Wdate-time
 @opindex Wno-date-time
@@ -11006,6 +11016,10 @@ path.  The default is 10.
 Maximum number of new jump thread paths to create for a finite state
 automaton.  The default is 50.
 
+@item parloops-chunk-size
+Chunk size of omp schedule for loops parallelized by parloops.  The default
+is 0.
+
 @end table
 @end table
 
@@ -23697,6 +23711,11 @@ option is used to control the temporary stack reuse optimization.
 @opindex ftrapv
 This option generates traps for signed overflow on addition, subtraction,
 multiplication operations.
+The options @option{-ftrapv} and @option{-fwrapv} override each other, so using
+@option{-ftrapv} @option{-fwrapv} on the command-line results in
+@option{-fwrapv} being effective.  Note that only active options override, so
+using @option{-ftrapv} @option{-fwrapv} @option{-fno-wrapv} on the command-line
+results in @option{-ftrapv} being effective.
 
 @item -fwrapv
 @opindex fwrapv
@@ -23705,6 +23724,11 @@ overflow of addition, subtraction and multiplication wraps around
 using twos-complement representation.  This flag enables some optimizations
 and disables others.  This option is enabled by default for the Java
 front end, as required by the Java language specification.
+The options @option{-ftrapv} and @option{-fwrapv} override each other, so using
+@option{-ftrapv} @option{-fwrapv} on the command-line results in
+@option{-fwrapv} being effective.  Note that only active options override, so
+using @option{-ftrapv} @option{-fwrapv} @option{-fno-wrapv} on the command-line
+results in @option{-ftrapv} being effective.
 
 @item -fexceptions
 @opindex fexceptions
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 7aa9c9dfa03..5dc7c81bdda 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1549,7 +1549,12 @@ options.  Some multilibs may be incompatible with these options.
 @item arm_neon_fp16_ok
 @anchor{arm_neon_fp16_ok}
 ARM Target supports @code{-mfpu=neon-fp16 -mfloat-abi=softfp} or compatible
-options.  Some multilibs may be incompatible with these options.
+options, including @code{-mfp16-format=ieee} if necessary to obtain the
+@code{__fp16} type.  Some multilibs may be incompatible with these options.
+
+@item arm_neon_fp16_hw
+Test system supports executing Neon half-precision float instructions.
+(Implies previous.)
 
 @item arm_thumb1_ok
 ARM target generates Thumb-1 code for @code{-mthumb}.
@@ -2035,7 +2040,7 @@ keyword}.
 @item arm_neon_fp16
 NEON and half-precision floating point support.  Only ARM targets
 support this feature, and only then in certain modes; see
-the @ref{arm_neon_ok,,arm_neon_fp16_ok effective target keyword}.
+the @ref{arm_neon_fp16_ok,,arm_neon_fp16_ok effective target keyword}.
 
 @item arm_vfp3
 arm vfp3 floating point support; see
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index f5a1f84793e..d548d96b234 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5720,6 +5720,14 @@ The default is @code{NULL_TREE} which means to not vectorize gather
 loads.
 @end deftypefn
 
+@deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_SCATTER (const_tree @var{vectype}, const_tree @var{index_type}, int @var{scale})
+Target builtin that implements vector scatter operation.  @var{vectype}
+is the vector type of the store and @var{index_type} is scalar type of
+the index, scaled by @var{scale}.
+The default is @code{NULL_TREE} which means to not vectorize scatter
+stores.
+@end deftypefn
+
 @deftypefn {Target Hook} int TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN (struct cgraph_node *@var{}, struct cgraph_simd_clone *@var{}, @var{tree}, @var{int})
 This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float}
 fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 9d5ac0a10e4..9bef4a59bed 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4239,6 +4239,8 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_VECTORIZE_BUILTIN_GATHER
 
+@hook TARGET_VECTORIZE_BUILTIN_SCATTER
+
 @hook TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN
 
 @hook TARGET_SIMD_CLONE_ADJUST
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index d9d3063be1d..b6ab869e0d1 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -25127,6 +25127,62 @@ optimize_location_lists (dw_die_ref die)
   optimize_location_lists_1 (die, &htab);
 }
 
+/* Traverse the limbo die list, and add parent/child links.  The only
+   dies without parents that should be here are concrete instances of
+   inline functions, and the comp_unit_die.  We can ignore the comp_unit_die.
+   For concrete instances, we can get the parent die from the abstract
+   instance.  */
+
+static void
+flush_limbo_die_list (void)
+{
+  limbo_die_node *node, *next_node;
+
+  for (node = limbo_die_list; node; node = next_node)
+    {
+      dw_die_ref die = node->die;
+      next_node = node->next;
+
+      if (die->die_parent == NULL)
+	{
+	  dw_die_ref origin = get_AT_ref (die, DW_AT_abstract_origin);
+
+	  if (origin && origin->die_parent)
+	    add_child_die (origin->die_parent, die);
+	  else if (is_cu_die (die))
+	    ;
+	  else if (seen_error ())
+	    /* It's OK to be confused by errors in the input.  */
+	    add_child_die (comp_unit_die (), die);
+	  else
+	    {
+	      /* In certain situations, the lexical block containing a
+		 nested function can be optimized away, which results
+		 in the nested function die being orphaned.  Likewise
+		 with the return type of that nested function.  Force
+		 this to be a child of the containing function.
+
+		 It may happen that even the containing function got fully
+		 inlined and optimized out.  In that case we are lost and
+		 assign the empty child.  This should not be big issue as
+		 the function is likely unreachable too.  */
+	      gcc_assert (node->created_for);
+
+	      if (DECL_P (node->created_for))
+		origin = get_context_die (DECL_CONTEXT (node->created_for));
+	      else if (TYPE_P (node->created_for))
+		origin = scope_die_for (node->created_for, comp_unit_die ());
+	      else
+		origin = comp_unit_die ();
+
+	      add_child_die (origin, die);
+	    }
+	}
+    }
+
+  limbo_die_list = NULL;
+}
+
 /* Output stuff that dwarf requires at the end of every file,
    and generate the DWARF-2 debugging info.  */
 
@@ -25137,7 +25193,11 @@ dwarf2out_finish (const char *filename)
   dw_die_ref main_comp_unit_die;
 
   /* Flush out any latecomers to the limbo party.  */
-  dwarf2out_early_finish ();
+  flush_limbo_die_list ();
+
+  /* We shouldn't have any symbols with delayed asm names for
+     DIEs generated after early finish.  */
+  gcc_assert (deferred_asm_name == NULL);
 
   /* PCH might result in DW_AT_producer string being restored from the
      header compilation, so always fill it with empty string initially
@@ -25483,7 +25543,7 @@ dwarf2out_finish (const char *filename)
 static void
 dwarf2out_early_finish (void)
 {
-  limbo_die_node *node, *next_node;
+  limbo_die_node *node;
 
   /* Add DW_AT_linkage_name for all deferred DIEs.  */
   for (node = deferred_asm_name; node; node = node->next)
@@ -25501,57 +25561,9 @@ dwarf2out_early_finish (void)
     }
   deferred_asm_name = NULL;
 
-  /* Traverse the limbo die list, and add parent/child links.  The only
-     dies without parents that should be here are concrete instances of
-     inline functions, and the comp_unit_die.  We can ignore the comp_unit_die.
-     For concrete instances, we can get the parent die from the abstract
-     instance.
-
-     The point here is to flush out the limbo list so that it is empty
+  /* The point here is to flush out the limbo list so that it is empty
      and we don't need to stream it for LTO.  */
-  for (node = limbo_die_list; node; node = next_node)
-    {
-      dw_die_ref die = node->die;
-      next_node = node->next;
-
-      if (die->die_parent == NULL)
-	{
-	  dw_die_ref origin = get_AT_ref (die, DW_AT_abstract_origin);
-
-	  if (origin && origin->die_parent)
-	    add_child_die (origin->die_parent, die);
-	  else if (is_cu_die (die))
-	    ;
-	  else if (seen_error ())
-	    /* It's OK to be confused by errors in the input.  */
-	    add_child_die (comp_unit_die (), die);
-	  else
-	    {
-	      /* In certain situations, the lexical block containing a
-		 nested function can be optimized away, which results
-		 in the nested function die being orphaned.  Likewise
-		 with the return type of that nested function.  Force
-		 this to be a child of the containing function.
-
-		 It may happen that even the containing function got fully
-		 inlined and optimized out.  In that case we are lost and
-		 assign the empty child.  This should not be big issue as
-		 the function is likely unreachable too.  */
-	      gcc_assert (node->created_for);
-
-	      if (DECL_P (node->created_for))
-		origin = get_context_die (DECL_CONTEXT (node->created_for));
-	      else if (TYPE_P (node->created_for))
-		origin = scope_die_for (node->created_for, comp_unit_die ());
-	      else
-		origin = comp_unit_die ();
-
-	      add_child_die (origin, die);
-	    }
-	}
-    }
-
-  limbo_die_list = NULL;
+  flush_limbo_die_list ();
 }
 
 /* Reset all state within dwarf2out.c so that we can rerun the compiler
diff --git a/gcc/expr.c b/gcc/expr.c
index ee0c1f93249..cf28f449309 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -8892,7 +8892,6 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	    && ! unsignedp
 	    && mode == GET_MODE_WIDER_MODE (word_mode)
 	    && GET_MODE_SIZE (mode) == 2 * GET_MODE_SIZE (word_mode)
-	    && ! have_insn_for (ASHIFT, mode)
 	    && TREE_CONSTANT (treeop1)
 	    && TREE_CODE (treeop0) == SSA_NAME)
 	  {
@@ -8908,6 +8907,7 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 		    && ((TREE_INT_CST_LOW (treeop1) + GET_MODE_BITSIZE (rmode))
 			>= GET_MODE_BITSIZE (word_mode)))
 		  {
+		    rtx_insn *seq, *seq_old;
 		    unsigned int high_off = subreg_highpart_offset (word_mode,
 								    mode);
 		    rtx low = lowpart_subreg (word_mode, op0, mode);
@@ -8918,6 +8918,7 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 					     - TREE_INT_CST_LOW (treeop1));
 		    tree rshift = build_int_cst (TREE_TYPE (treeop1), ramount);
 
+		    start_sequence ();
 		    /* dest_high = src_low >> (word_size - C).  */
 		    temp = expand_variable_shift (RSHIFT_EXPR, word_mode, low,
 						  rshift, dest_high, unsignedp);
@@ -8930,7 +8931,28 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 		    if (temp != dest_low)
 		      emit_move_insn (dest_low, temp);
 
+		    seq = get_insns ();
+		    end_sequence ();
 		    temp = target ;
+
+		    if (have_insn_for (ASHIFT, mode))
+		      {
+			bool speed_p = optimize_insn_for_speed_p ();
+			start_sequence ();
+			rtx ret_old = expand_variable_shift (code, mode, op0,
+							     treeop1, target,
+							     unsignedp);
+
+			seq_old = get_insns ();
+			end_sequence ();
+			if (seq_cost (seq, speed_p)
+			    >= seq_cost (seq_old, speed_p))
+			  {
+			    seq = seq_old;
+			    temp = ret_old;
+			  }
+		      }
+		      emit_insn (seq);
 		  }
 	      }
 	  }
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index d478c4dc1c2..e9366e2de6e 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -7182,7 +7182,6 @@ native_interpret_real (tree type, const unsigned char *ptr, int len)
 {
   machine_mode mode = TYPE_MODE (type);
   int total_bytes = GET_MODE_SIZE (mode);
-  int byte, offset, word, words, bitpos;
   unsigned char value;
   /* There are always 32 bits in each long, no matter the size of
      the hosts long.  We handle floating point representations with
@@ -7193,16 +7192,18 @@ native_interpret_real (tree type, const unsigned char *ptr, int len)
   total_bytes = GET_MODE_SIZE (TYPE_MODE (type));
   if (total_bytes > len || total_bytes > 24)
     return NULL_TREE;
-  words = (32 / BITS_PER_UNIT) / UNITS_PER_WORD;
+  int words = (32 / BITS_PER_UNIT) / UNITS_PER_WORD;
 
   memset (tmp, 0, sizeof (tmp));
-  for (bitpos = 0; bitpos < total_bytes * BITS_PER_UNIT;
+  for (int bitpos = 0; bitpos < total_bytes * BITS_PER_UNIT;
        bitpos += BITS_PER_UNIT)
     {
-      byte = (bitpos / BITS_PER_UNIT) & 3;
+      /* Both OFFSET and BYTE index within a long;
+	 bitpos indexes the whole float.  */
+      int offset, byte = (bitpos / BITS_PER_UNIT) & 3;
       if (UNITS_PER_WORD < 4)
 	{
-	  word = byte / UNITS_PER_WORD;
+	  int word = byte / UNITS_PER_WORD;
 	  if (WORDS_BIG_ENDIAN)
 	    word = (words - 1) - word;
 	  offset = word * UNITS_PER_WORD;
@@ -7212,7 +7213,16 @@ native_interpret_real (tree type, const unsigned char *ptr, int len)
 	    offset += byte % UNITS_PER_WORD;
 	}
       else
-	offset = BYTES_BIG_ENDIAN ? 3 - byte : byte;
+	{
+	  offset = byte;
+	  if (BYTES_BIG_ENDIAN)
+	    {
+	      /* Reverse bytes within each long, or within the entire float
+		 if it's smaller than a long (for HFmode).  */
+	      offset = MIN (3, total_bytes - 1) - offset;
+	      gcc_assert (offset >= 0);
+	    }
+	}
       value = ptr[offset + ((bitpos / BITS_PER_UNIT) & ~3)];
 
       tmp[bitpos / 32] |= (unsigned long)value << (bitpos & 31);
@@ -10412,32 +10422,6 @@ fold_binary_loc (location_t loc,
 
       prec = element_precision (type);
 
-      /* Transform (x >> c) << c into x & (-1<<c), or transform (x << c) >> c
-         into x & ((unsigned)-1 >> c) for unsigned types.  */
-      if (((code == LSHIFT_EXPR && TREE_CODE (arg0) == RSHIFT_EXPR)
-           || (TYPE_UNSIGNED (type)
-	       && code == RSHIFT_EXPR && TREE_CODE (arg0) == LSHIFT_EXPR))
-	  && tree_fits_uhwi_p (arg1)
-	  && tree_to_uhwi (arg1) < prec
-	  && tree_fits_uhwi_p (TREE_OPERAND (arg0, 1))
-	  && tree_to_uhwi (TREE_OPERAND (arg0, 1)) < prec)
-	{
-	  HOST_WIDE_INT low0 = tree_to_uhwi (TREE_OPERAND (arg0, 1));
-	  HOST_WIDE_INT low1 = tree_to_uhwi (arg1);
-	  tree lshift;
-	  tree arg00;
-
-	  if (low0 == low1)
-	    {
-	      arg00 = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0));
-
-	      lshift = build_minus_one_cst (type);
-	      lshift = const_binop (code, lshift, arg1);
-
-	      return fold_build2_loc (loc, BIT_AND_EXPR, type, arg00, lshift);
-	    }
-	}
-
       /* If we have a rotate of a bit operation with the rotate count and
 	 the second operand of the bit operation both constant,
 	 permute the two operations.  */
diff --git a/gcc/fortran/ChangeLog b/gcc/fortran/ChangeLog
index 88c1a117caa..13bb7b3fd63 100644
--- a/gcc/fortran/ChangeLog
+++ b/gcc/fortran/ChangeLog
@@ -1,3 +1,29 @@
+2015-09-10  Steven G. Kargl  <kargl@gcc.gnu.org>
+
+	PR fortran/67526
+	* expr.c (gfc_check_init_expr): Do not dereference a NULL pointer.
+
+2015-09-10  Paul Thomas  <pault@gcc.gnu.org>
+
+	PR fortran/66993
+	* module.c (read_module): If a symtree exists and the symbol has
+	been associated in a submodule from a parent (sub)module, attach
+	the symbol to a 'unique symtree' and the new symbol to the
+	existing symtree.
+
+2015-09-04  Francois-Xavier Coudert  <fxcoudert@gcc.gnu.org>
+
+	* intrinsic.h (gfc_simplify_mvbits): Remove.
+	* simplify.c (gfc_simplify_mvbits): Remove.
+	* intrinsic.c (add_subroutines): Remove reference to
+	gfc_simplify_mvbits.
+
+2015-09-04  Manuel López-Ibáñez  <manu@gcc.gnu.org>
+
+	PR fortran/67429
+	* error.c (gfc_clear_pp_buffer): Reset last_location, otherwise
+	caret lines might be skipped when actually giving a diagnostic.
+
 2015-08-31  Francois-Xavier Coudert  <fxcoudert@gcc.gnu.org>
 
 	PR fortran/54833
diff --git a/gcc/fortran/error.c b/gcc/fortran/error.c
index 7689bbd8941..3825751ddd0 100644
--- a/gcc/fortran/error.c
+++ b/gcc/fortran/error.c
@@ -757,6 +757,9 @@ gfc_clear_pp_buffer (output_buffer *this_buffer)
   pp->buffer = this_buffer;
   pp_clear_output_area (pp);
   pp->buffer = tmp_buffer;
+  /* We need to reset last_location, otherwise we may skip caret lines
+     when we actually give a diagnostic.  */
+  global_dc->last_location = UNKNOWN_LOCATION;
 }
 
 
diff --git a/gcc/fortran/expr.c b/gcc/fortran/expr.c
index 1d6f310f28c..3a0ef4d8f55 100644
--- a/gcc/fortran/expr.c
+++ b/gcc/fortran/expr.c
@@ -2600,14 +2600,18 @@ gfc_check_init_expr (gfc_expr *e)
       break;
 
     case EXPR_SUBSTRING:
-      t = gfc_check_init_expr (e->ref->u.ss.start);
-      if (!t)
-	break;
-
-      t = gfc_check_init_expr (e->ref->u.ss.end);
-      if (t)
-	t = gfc_simplify_expr (e, 0);
+      if (e->ref)
+	{
+	  t = gfc_check_init_expr (e->ref->u.ss.start);
+	  if (!t)
+	    break;
 
+	  t = gfc_check_init_expr (e->ref->u.ss.end);
+	  if (t)
+	    t = gfc_simplify_expr (e, 0);
+	}
+      else
+	t = false;
       break;
 
     case EXPR_STRUCTURE:
diff --git a/gcc/fortran/intrinsic.c b/gcc/fortran/intrinsic.c
index b46a5b21b79..17410925245 100644
--- a/gcc/fortran/intrinsic.c
+++ b/gcc/fortran/intrinsic.c
@@ -3290,8 +3290,7 @@ add_subroutines (void)
 	      t, BT_UNKNOWN, 0, REQUIRED, INTENT_OUT);
 
   add_sym_5s ("mvbits", GFC_ISYM_MVBITS, CLASS_ELEMENTAL, BT_UNKNOWN, 0,
-	      GFC_STD_F95, gfc_check_mvbits, gfc_simplify_mvbits,
-	      gfc_resolve_mvbits,
+	      GFC_STD_F95, gfc_check_mvbits, NULL, gfc_resolve_mvbits,
 	      f, BT_INTEGER, di, REQUIRED, INTENT_IN,
 	      fp, BT_INTEGER, di, REQUIRED, INTENT_IN,
 	      ln, BT_INTEGER, di, REQUIRED, INTENT_IN,
diff --git a/gcc/fortran/intrinsic.h b/gcc/fortran/intrinsic.h
index 4e91b822b22..971cf7c9feb 100644
--- a/gcc/fortran/intrinsic.h
+++ b/gcc/fortran/intrinsic.h
@@ -345,8 +345,6 @@ gfc_expr *gfc_simplify_maxexponent (gfc_expr *);
 gfc_expr *gfc_simplify_minexponent (gfc_expr *);
 gfc_expr *gfc_simplify_mod (gfc_expr *, gfc_expr *);
 gfc_expr *gfc_simplify_modulo (gfc_expr *, gfc_expr *);
-gfc_expr *gfc_simplify_mvbits (gfc_expr *, gfc_expr *, gfc_expr *, gfc_expr *,
-			       gfc_expr *);
 gfc_expr *gfc_simplify_nearest (gfc_expr *, gfc_expr *);
 gfc_expr *gfc_simplify_new_line (gfc_expr *);
 gfc_expr *gfc_simplify_nint (gfc_expr *, gfc_expr *);
diff --git a/gcc/fortran/module.c b/gcc/fortran/module.c
index 86dca1c5382..572bd501370 100644
--- a/gcc/fortran/module.c
+++ b/gcc/fortran/module.c
@@ -5134,7 +5134,8 @@ read_module (void)
 
 	  st = gfc_find_symtree (gfc_current_ns->sym_root, p);
 
-	  if (st != NULL)
+	  if (st != NULL
+	      && !(st->n.sym && st->n.sym->attr.used_in_submodule))
 	    {
 	      /* Check for ambiguous symbols.  */
 	      if (check_for_ambiguous (st, info))
@@ -5144,14 +5145,23 @@ read_module (void)
 	    }
 	  else
 	    {
-	      st = gfc_find_symtree (gfc_current_ns->sym_root, name);
-
-	      /* Create a symtree node in the current namespace for this
-		 symbol.  */
-	      st = check_unique_name (p)
-		   ? gfc_get_unique_symtree (gfc_current_ns)
-		   : gfc_new_symtree (&gfc_current_ns->sym_root, p);
-	      st->ambiguous = ambiguous;
+	      if (st)
+		{
+		  /* This symbol is host associated from a module in a
+		     submodule.  Hide it with a unique symtree.  */
+		  gfc_symtree *s = gfc_get_unique_symtree (gfc_current_ns);
+		  s->n.sym = st->n.sym;
+		  st->n.sym = NULL;
+		}
+	      else
+		{
+		  /* Create a symtree node in the current namespace for this
+		     symbol.  */
+		  st = check_unique_name (p)
+		       ? gfc_get_unique_symtree (gfc_current_ns)
+		       : gfc_new_symtree (&gfc_current_ns->sym_root, p);
+		  st->ambiguous = ambiguous;
+		}
 
 	      sym = info->u.rsym.sym;
 
diff --git a/gcc/fortran/simplify.c b/gcc/fortran/simplify.c
index 124558efa5d..6a81dc32eca 100644
--- a/gcc/fortran/simplify.c
+++ b/gcc/fortran/simplify.c
@@ -4443,18 +4443,6 @@ gfc_simplify_modulo (gfc_expr *a, gfc_expr *p)
 }
 
 
-/* Exists for the sole purpose of consistency with other intrinsics.  */
-gfc_expr *
-gfc_simplify_mvbits (gfc_expr *f  ATTRIBUTE_UNUSED,
-		     gfc_expr *fp ATTRIBUTE_UNUSED,
-		     gfc_expr *l  ATTRIBUTE_UNUSED,
-		     gfc_expr *to ATTRIBUTE_UNUSED,
-		     gfc_expr *tp ATTRIBUTE_UNUSED)
-{
-  return NULL;
-}
-
-
 gfc_expr *
 gfc_simplify_nearest (gfc_expr *x, gfc_expr *s)
 {
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index c8f27188418..10f84d47ec2 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -5210,7 +5210,8 @@ gimplify_asm_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p)
 	  if (TREE_CODE (inputv) == PREDECREMENT_EXPR
 	      || TREE_CODE (inputv) == PREINCREMENT_EXPR
 	      || TREE_CODE (inputv) == POSTDECREMENT_EXPR
-	      || TREE_CODE (inputv) == POSTINCREMENT_EXPR)
+	      || TREE_CODE (inputv) == POSTINCREMENT_EXPR
+	      || TREE_CODE (inputv) == MODIFY_EXPR)
 	    TREE_VALUE (link) = error_mark_node;
 	  tret = gimplify_expr (&TREE_VALUE (link), pre_p, post_p,
 				is_gimple_lvalue, fb_lvalue | fb_mayfail);
@@ -6213,9 +6214,12 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 		    }
 		  else
 		    break;
-		  gcc_checking_assert (splay_tree_lookup (octx->variables,
-							  (splay_tree_key)
-							  decl) == NULL);
+		  if (splay_tree_lookup (octx->variables,
+					 (splay_tree_key) decl) != NULL)
+		    {
+		      octx = NULL;
+		      break;
+		    }
 		  flags = GOVD_SEEN;
 		  if (!OMP_CLAUSE_LINEAR_NO_COPYIN (c))
 		    flags |= GOVD_FIRSTPRIVATE;
@@ -6997,7 +7001,7 @@ find_combined_omp_for (tree *tp, int *walk_subtrees, void *)
 static enum gimplify_status
 gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
 {
-  tree for_stmt, orig_for_stmt, decl, var, t;
+  tree for_stmt, orig_for_stmt, inner_for_stmt = NULL_TREE, decl, var, t;
   enum gimplify_status ret = GS_ALL_DONE;
   enum gimplify_status tret;
   gomp_for *gfor;
@@ -7040,6 +7044,19 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
 	  }
     }
 
+  if (OMP_FOR_INIT (for_stmt) == NULL_TREE)
+    {
+      gcc_assert (TREE_CODE (for_stmt) != OACC_LOOP);
+      inner_for_stmt = walk_tree (&OMP_FOR_BODY (for_stmt),
+				  find_combined_omp_for, NULL, NULL);
+      if (inner_for_stmt == NULL_TREE)
+	{
+	  gcc_assert (seen_error ());
+	  *expr_p = NULL_TREE;
+	  return GS_ERROR;
+	}
+    }
+
   gimplify_scan_omp_clauses (&OMP_FOR_CLAUSES (for_stmt), pre_p,
 			     simd ? ORT_SIMD : ORT_WORKSHARE);
   if (TREE_CODE (for_stmt) == OMP_DISTRIBUTE)
@@ -7075,10 +7092,7 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
 
   if (OMP_FOR_INIT (for_stmt) == NULL_TREE)
     {
-      gcc_assert (TREE_CODE (for_stmt) != OACC_LOOP);
-      for_stmt = walk_tree (&OMP_FOR_BODY (for_stmt), find_combined_omp_for,
-			    NULL, NULL);
-      gcc_assert (for_stmt != NULL_TREE);
+      for_stmt = inner_for_stmt;
       gimplify_omp_ctxp->combined_loop = true;
     }
 
@@ -7122,13 +7136,27 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
 		  OMP_CLAUSE_LINEAR_NO_COPYOUT (c) = 1;
 		  flags |= GOVD_LINEAR_LASTPRIVATE_NO_OUTER;
 		}
+	      struct gimplify_omp_ctx *outer
+		= gimplify_omp_ctxp->outer_context;
+	      if (outer && !OMP_CLAUSE_LINEAR_NO_COPYOUT (c))
+		{
+		  if (outer->region_type == ORT_WORKSHARE
+		      && outer->combined_loop)
+		    {
+		      n = splay_tree_lookup (outer->variables,
+					     (splay_tree_key)decl);
+		      if (n != NULL && (n->value & GOVD_LOCAL) != 0)
+			{
+			  OMP_CLAUSE_LINEAR_NO_COPYOUT (c) = 1;
+			  flags |= GOVD_LINEAR_LASTPRIVATE_NO_OUTER;
+			}
+		    }
+		}
+
 	      OMP_CLAUSE_DECL (c) = decl;
 	      OMP_CLAUSE_CHAIN (c) = OMP_FOR_CLAUSES (for_stmt);
 	      OMP_FOR_CLAUSES (for_stmt) = c;
-	      
 	      omp_add_variable (gimplify_omp_ctxp, decl, flags);
-	      struct gimplify_omp_ctx *outer
-		= gimplify_omp_ctxp->outer_context;
 	      if (outer && !OMP_CLAUSE_LINEAR_NO_COPYOUT (c))
 		{
 		  if (outer->region_type == ORT_WORKSHARE
@@ -7145,10 +7173,16 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
 		    outer = NULL;
 		  if (outer)
 		    {
-		      omp_add_variable (outer, decl,
-					GOVD_LASTPRIVATE | GOVD_SEEN);
-		      if (outer->outer_context)
-			omp_notice_variable (outer->outer_context, decl, true);
+		      n = splay_tree_lookup (outer->variables,
+					     (splay_tree_key)decl);
+		      if (n == NULL || (n->value & GOVD_DATA_SHARE_CLASS) == 0)
+			{
+			  omp_add_variable (outer, decl,
+					    GOVD_LASTPRIVATE | GOVD_SEEN);
+			  if (outer->outer_context)
+			    omp_notice_variable (outer->outer_context, decl,
+						 true);
+			}
 		    }
 		}
 	    }
@@ -7165,9 +7199,16 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
 		  if (outer->region_type == ORT_WORKSHARE
 		      && outer->combined_loop)
 		    {
-		      if (outer->outer_context
-			  && (outer->outer_context->region_type
-			      == ORT_COMBINED_PARALLEL))
+		      n = splay_tree_lookup (outer->variables,
+					     (splay_tree_key)decl);
+		      if (n != NULL && (n->value & GOVD_LOCAL) != 0)
+			{
+			  lastprivate = false;
+			  outer = NULL;
+			}
+		      else if (outer->outer_context
+			       && (outer->outer_context->region_type
+				   == ORT_COMBINED_PARALLEL))
 			outer = outer->outer_context;
 		      else if (omp_check_private (outer, decl, false))
 			outer = NULL;
@@ -7176,10 +7217,16 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
 		    outer = NULL;
 		  if (outer)
 		    {
-		      omp_add_variable (outer, decl,
-					GOVD_LASTPRIVATE | GOVD_SEEN);
-		      if (outer->outer_context)
-			omp_notice_variable (outer->outer_context, decl, true);
+		      n = splay_tree_lookup (outer->variables,
+					     (splay_tree_key)decl);
+		      if (n == NULL || (n->value & GOVD_DATA_SHARE_CLASS) == 0)
+			{
+			  omp_add_variable (outer, decl,
+					    GOVD_LASTPRIVATE | GOVD_SEEN);
+			  if (outer->outer_context)
+			    omp_notice_variable (outer->outer_context, decl,
+						 true);
+			}
 		    }
 		}
 
diff --git a/gcc/go/ChangeLog b/gcc/go/ChangeLog
index 90520ded455..2fa56ab0ac0 100644
--- a/gcc/go/ChangeLog
+++ b/gcc/go/ChangeLog
@@ -1,3 +1,8 @@
+2015-09-10  Chris Manghane  <cmang@google.com>
+
+	* go-gcc.cc (Gcc_backend::type_size): Return -1 for
+	unrepresentable size.
+
 2015-08-24  Marek Polacek  <polacek@redhat.com>
 
 	PR tree-optimization/67284
diff --git a/gcc/go/go-gcc.cc b/gcc/go/go-gcc.cc
index cb4c2e5c73a..13143440761 100644
--- a/gcc/go/go-gcc.cc
+++ b/gcc/go/go-gcc.cc
@@ -1099,7 +1099,8 @@ Gcc_backend::type_size(Btype* btype)
   gcc_assert(tree_fits_uhwi_p (t));
   unsigned HOST_WIDE_INT val_wide = TREE_INT_CST_LOW(t);
   int64_t ret = static_cast<int64_t>(val_wide);
-  gcc_assert(ret >= 0 && static_cast<unsigned HOST_WIDE_INT>(ret) == val_wide);
+  if (ret < 0 || static_cast<unsigned HOST_WIDE_INT>(ret) != val_wide)
+    return -1;
   return ret;
 }
 
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 5fcf1bd7961..ef21b544a5f 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-a63e173b20baa1a48470dd31a1fb1f2704b37011
+aea4360ca9c37f8e929f177ae7e42593ee62aa79
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/expressions.cc b/gcc/go/gofrontend/expressions.cc
index 1df9f326561..dc37cf0b01e 100644
--- a/gcc/go/gofrontend/expressions.cc
+++ b/gcc/go/gofrontend/expressions.cc
@@ -3626,8 +3626,13 @@ Unary_expression::do_flatten(Gogo* gogo, Named_object*,
       Type* ptype = this->expr_->type()->points_to();
       if (!ptype->is_void_type())
         {
-          Btype* pbtype = ptype->get_backend(gogo);
-          int64_t s = gogo->backend()->type_size(pbtype);
+          int64_t s;
+          bool ok = ptype->backend_type_size(gogo, &s);
+          if (!ok)
+            {
+              go_assert(saw_errors());
+              return Expression::make_error(this->location());
+            }
           if (s >= 4096 || this->issue_nil_check_)
             {
               Temporary_statement* temp =
@@ -4131,7 +4136,13 @@ Unary_expression::do_get_backend(Translate_context* context)
         Btype* pbtype = ptype->get_backend(gogo);
         if (!ptype->is_void_type())
 	  {
-            int64_t s = gogo->backend()->type_size(pbtype);
+            int64_t s;
+            bool ok = ptype->backend_type_size(gogo, &s);
+            if (!ok)
+              {
+                go_assert(saw_errors());
+                return gogo->backend()->error_expression();
+              }
 	    if (s >= 4096 || this->issue_nil_check_)
 	      {
                 go_assert(this->expr_->is_variable());
@@ -4523,6 +4534,12 @@ Binary_expression::eval_constant(Operator op, Numeric_constant* left_nc,
     return false;
   if (!is_shift && !right_nc->set_type(type, true, location))
     return false;
+  if (is_shift
+      && ((left_type->integer_type() == NULL
+           && !left_type->is_abstract())
+          || (right_type->integer_type() == NULL
+              && !right_type->is_abstract())))
+    return false;
 
   bool r;
   if (type->complex_type() != NULL)
@@ -4567,6 +4584,7 @@ Binary_expression::eval_integer(Operator op, const Numeric_constant* left_nc,
       if (mpz_sizeinbase(val, 2) > 0x100000)
 	{
 	  error_at(location, "constant addition overflow");
+          nc->set_invalid();
 	  mpz_set_ui(val, 1);
 	}
       break;
@@ -4575,6 +4593,7 @@ Binary_expression::eval_integer(Operator op, const Numeric_constant* left_nc,
       if (mpz_sizeinbase(val, 2) > 0x100000)
 	{
 	  error_at(location, "constant subtraction overflow");
+          nc->set_invalid();
 	  mpz_set_ui(val, 1);
 	}
       break;
@@ -4589,6 +4608,7 @@ Binary_expression::eval_integer(Operator op, const Numeric_constant* left_nc,
       if (mpz_sizeinbase(val, 2) > 0x100000)
 	{
 	  error_at(location, "constant multiplication overflow");
+          nc->set_invalid();
 	  mpz_set_ui(val, 1);
 	}
       break;
@@ -4598,6 +4618,7 @@ Binary_expression::eval_integer(Operator op, const Numeric_constant* left_nc,
       else
 	{
 	  error_at(location, "division by zero");
+          nc->set_invalid();
 	  mpz_set_ui(val, 0);
 	}
       break;
@@ -4607,6 +4628,7 @@ Binary_expression::eval_integer(Operator op, const Numeric_constant* left_nc,
       else
 	{
 	  error_at(location, "division by zero");
+          nc->set_invalid();
 	  mpz_set_ui(val, 0);
 	}
       break;
@@ -4618,6 +4640,7 @@ Binary_expression::eval_integer(Operator op, const Numeric_constant* left_nc,
 	else
 	  {
 	    error_at(location, "shift count overflow");
+            nc->set_invalid();
 	    mpz_set_ui(val, 1);
 	  }
 	break;
@@ -4629,6 +4652,7 @@ Binary_expression::eval_integer(Operator op, const Numeric_constant* left_nc,
 	if (mpz_cmp_ui(right_val, shift) != 0)
 	  {
 	    error_at(location, "shift count overflow");
+            nc->set_invalid();
 	    mpz_set_ui(val, 1);
 	  }
 	else
@@ -4723,6 +4747,7 @@ Binary_expression::eval_float(Operator op, const Numeric_constant* left_nc,
       else
 	{
 	  error_at(location, "division by zero");
+          nc->set_invalid();
 	  mpfr_set_ui(val, 0, GMP_RNDN);
 	}
       break;
@@ -4787,6 +4812,7 @@ Binary_expression::eval_complex(Operator op, const Numeric_constant* left_nc,
       if (mpc_cmp_si(right_val, 0) == 0)
 	{
 	  error_at(location, "division by zero");
+          nc->set_invalid();
 	  mpc_set_ui(val, 0, MPC_RNDNN);
 	  break;
 	}
@@ -4849,7 +4875,14 @@ Binary_expression::do_lower(Gogo* gogo, Named_object*,
 	    Numeric_constant nc;
 	    if (!Binary_expression::eval_constant(op, &left_nc, &right_nc,
 						  location, &nc))
-	      return this;
+              {
+                if (nc.is_invalid())
+                  {
+                    go_assert(saw_errors());
+                    return Expression::make_error(location);
+                  }
+                return this;
+              }
 	    return nc.expression(location);
 	  }
       }
@@ -8317,8 +8350,14 @@ Builtin_call_expression::do_get_backend(Translate_context* context)
             Expression::make_conditional(cond, arg1_len, arg2_len, location);
 
 	Type* element_type = at->element_type();
-	Btype* element_btype = element_type->get_backend(gogo);
-	int64_t element_size = gogo->backend()->type_size(element_btype);
+	int64_t element_size;
+        bool ok = element_type->backend_type_size(gogo, &element_size);
+        if (!ok)
+          {
+            go_assert(saw_errors());
+            return gogo->backend()->error_expression();
+          }
+
 	Expression* size_expr = Expression::make_integer_int64(element_size,
 							       length->type(),
 							       location);
@@ -8359,8 +8398,12 @@ Builtin_call_expression::do_get_backend(Translate_context* context)
 	  {
 	    arg2_val = at->get_value_pointer(gogo, arg2);
 	    arg2_len = at->get_length(gogo, arg2);
-	    Btype* element_btype = element_type->get_backend(gogo);
-	    size = gogo->backend()->type_size(element_btype);
+            bool ok = element_type->backend_type_size(gogo, &size);
+            if (!ok)
+              {
+                go_assert(saw_errors());
+                return gogo->backend()->error_expression();
+              }
 	  }
         Expression* element_size =
 	  Expression::make_integer_int64(size, NULL, location);
@@ -11517,14 +11560,20 @@ Allocation_expression::do_get_backend(Translate_context* context)
   Gogo* gogo = context->gogo();
   Location loc = this->location();
 
-  Btype* btype = this->type_->get_backend(gogo);
   if (this->allocate_on_stack_)
     {
-      int64_t size = gogo->backend()->type_size(btype);
+      int64_t size;
+      bool ok = this->type_->backend_type_size(gogo, &size);
+      if (!ok)
+        {
+          go_assert(saw_errors());
+          return gogo->backend()->error_expression();
+        }
       return gogo->backend()->stack_allocation_expression(size, loc);
     }
 
-  Bexpression* space = 
+  Btype* btype = this->type_->get_backend(gogo);
+  Bexpression* space =
     gogo->allocate_memory(this->type_, loc)->get_backend(context);
   Btype* pbtype = gogo->backend()->pointer_type(btype);
   return gogo->backend()->convert_expression(pbtype, space, loc);
@@ -13709,23 +13758,28 @@ Type_info_expression::do_type()
 Bexpression*
 Type_info_expression::do_get_backend(Translate_context* context)
 {
-  Btype* btype = this->type_->get_backend(context->gogo());
   Gogo* gogo = context->gogo();
+  bool ok = true;
   int64_t val;
   switch (this->type_info_)
     {
     case TYPE_INFO_SIZE:
-      val = gogo->backend()->type_size(btype);
+      ok = this->type_->backend_type_size(gogo, &val);
       break;
     case TYPE_INFO_ALIGNMENT:
-      val = gogo->backend()->type_alignment(btype);
+      ok = this->type_->backend_type_align(gogo, &val);
       break;
     case TYPE_INFO_FIELD_ALIGNMENT:
-      val = gogo->backend()->type_field_alignment(btype);
+      ok = this->type_->backend_type_field_align(gogo, &val);
       break;
     default:
       go_unreachable();
     }
+  if (!ok)
+    {
+      go_assert(saw_errors());
+      return gogo->backend()->error_expression();
+    }
   Expression* e = Expression::make_integer_int64(val, this->type(),
 						 this->location());
   return e->get_backend(context);
@@ -15189,7 +15243,7 @@ Numeric_constant::set_type(Type* type, bool issue_error, Location loc)
 
 bool
 Numeric_constant::check_int_type(Integer_type* type, bool issue_error,
-				 Location location) const
+				 Location location)
 {
   mpz_t val;
   switch (this->classification_)
@@ -15203,7 +15257,11 @@ Numeric_constant::check_int_type(Integer_type* type, bool issue_error,
       if (!mpfr_integer_p(this->u_.float_val))
 	{
 	  if (issue_error)
-	    error_at(location, "floating point constant truncated to integer");
+            {
+              error_at(location,
+                       "floating point constant truncated to integer");
+              this->set_invalid();
+            }
 	  return false;
 	}
       mpz_init(val);
@@ -15215,7 +15273,10 @@ Numeric_constant::check_int_type(Integer_type* type, bool issue_error,
 	  || !mpfr_zero_p(mpc_imagref(this->u_.complex_val)))
 	{
 	  if (issue_error)
-	    error_at(location, "complex constant truncated to integer");
+            {
+              error_at(location, "complex constant truncated to integer");
+              this->set_invalid();
+            }
 	  return false;
 	}
       mpz_init(val);
@@ -15253,7 +15314,10 @@ Numeric_constant::check_int_type(Integer_type* type, bool issue_error,
     }
 
   if (!ret && issue_error)
-    error_at(location, "integer constant overflow");
+    {
+      error_at(location, "integer constant overflow");
+      this->set_invalid();
+    }
 
   return ret;
 }
@@ -15281,7 +15345,10 @@ Numeric_constant::check_float_type(Float_type* type, bool issue_error,
       if (!mpfr_zero_p(mpc_imagref(this->u_.complex_val)))
 	{
 	  if (issue_error)
-	    error_at(location, "complex constant truncated to float");
+            {
+              this->set_invalid();
+              error_at(location, "complex constant truncated to float");
+            }
 	  return false;
 	}
       mpfr_init_set(val, mpc_realref(this->u_.complex_val), GMP_RNDN);
@@ -15344,7 +15411,10 @@ Numeric_constant::check_float_type(Float_type* type, bool issue_error,
   mpfr_clear(val);
 
   if (!ret && issue_error)
-    error_at(location, "floating point constant overflow");
+    {
+      error_at(location, "floating point constant overflow");
+      this->set_invalid();
+    }
 
   return ret;
 } 
@@ -15399,7 +15469,10 @@ Numeric_constant::check_complex_type(Complex_type* type, bool issue_error,
       && mpfr_get_exp(mpc_realref(val)) > max_exp)
     {
       if (issue_error)
-	error_at(location, "complex real part overflow");
+        {
+          error_at(location, "complex real part overflow");
+          this->set_invalid();
+        }
       ret = false;
     }
 
@@ -15409,7 +15482,10 @@ Numeric_constant::check_complex_type(Complex_type* type, bool issue_error,
       && mpfr_get_exp(mpc_imagref(val)) > max_exp)
     {
       if (issue_error)
-	error_at(location, "complex imaginary part overflow");
+        {
+          error_at(location, "complex imaginary part overflow");
+          this->set_invalid();
+        }
       ret = false;
     }
 
@@ -15455,6 +15531,9 @@ Numeric_constant::expression(Location loc) const
       return Expression::make_float(&this->u_.float_val, this->type_, loc);
     case NC_COMPLEX:
       return Expression::make_complex(&this->u_.complex_val, this->type_, loc);
+    case NC_INVALID:
+      go_assert(saw_errors());
+      return Expression::make_error(loc);
     default:
       go_unreachable();
     }
diff --git a/gcc/go/gofrontend/expressions.h b/gcc/go/gofrontend/expressions.h
index 5358b021339..3e3950985cb 100644
--- a/gcc/go/gofrontend/expressions.h
+++ b/gcc/go/gofrontend/expressions.h
@@ -3460,6 +3460,11 @@ class Numeric_constant
   void
   set_complex(Type*, const mpc_t);
 
+  // Mark numeric constant as invalid.
+  void
+  set_invalid()
+  { this->classification_ = NC_INVALID; }
+
   // Classifiers.
   bool
   is_int() const
@@ -3477,6 +3482,10 @@ class Numeric_constant
   is_complex() const
   { return this->classification_ == Numeric_constant::NC_COMPLEX; }
 
+  bool
+  is_invalid() const
+  { return this->classification_ == Numeric_constant::NC_INVALID; }
+
   // Value retrievers.  These will initialize the values as well as
   // set them.  GET_INT is only valid if IS_INT returns true, and
   // likewise respectively.
@@ -3554,7 +3563,7 @@ class Numeric_constant
   mpfr_to_unsigned_long(const mpfr_t fval, unsigned long *val) const;
 
   bool
-  check_int_type(Integer_type*, bool, Location) const;
+  check_int_type(Integer_type*, bool, Location);
 
   bool
   check_float_type(Float_type*, bool, Location);
diff --git a/gcc/go/gofrontend/gogo.cc b/gcc/go/gofrontend/gogo.cc
index 233ee274cf0..5ecc55d0dbb 100644
--- a/gcc/go/gofrontend/gogo.cc
+++ b/gcc/go/gofrontend/gogo.cc
@@ -1818,7 +1818,11 @@ Gogo::start_function(const std::string& name, Function_type* type,
 								  function);
 	    }
 	  else
-	    go_unreachable();
+            {
+              error_at(type->receiver()->location(),
+                       "invalid receiver type (receiver must be a named type)");
+              ret = Named_object::make_function(name, NULL, function);
+            }
 	}
       this->package_->bindings()->add_method(ret);
     }
diff --git a/gcc/go/gofrontend/lex.cc b/gcc/go/gofrontend/lex.cc
index 67f78034266..98d98da74ec 100644
--- a/gcc/go/gofrontend/lex.cc
+++ b/gcc/go/gofrontend/lex.cc
@@ -1752,7 +1752,9 @@ Lex::skip_cpp_comment()
   // For field tracking analysis: a //go:nointerface comment means
   // that the next interface method should not be stored in the type
   // descriptor.  This permits it to be discarded if it is not needed.
-  if (this->lineoff_ == 2 && memcmp(p, "go:nointerface", 14) == 0)
+  if (this->lineoff_ == 2
+      && pend - p > 14
+      && memcmp(p, "go:nointerface", 14) == 0)
     this->saw_nointerface_ = true;
 
   while (p < pend)
diff --git a/gcc/go/gofrontend/types.cc b/gcc/go/gofrontend/types.cc
index 8331678578d..dcc6bc829c6 100644
--- a/gcc/go/gofrontend/types.cc
+++ b/gcc/go/gofrontend/types.cc
@@ -2524,6 +2524,20 @@ Type::backend_type_size(Gogo* gogo, int64_t *psize)
     return false;
   Btype* bt = this->get_backend_placeholder(gogo);
   *psize = gogo->backend()->type_size(bt);
+  if (*psize == -1)
+    {
+      if (this->named_type() != NULL)
+        error_at(this->named_type()->location(),
+                 "type %s larger than address space",
+                 Gogo::message_name(this->named_type()->name()).c_str());
+      else
+        error("type %s larger than address space",
+              this->reflection(gogo).c_str());
+
+      // Make this an error type to avoid knock-on errors.
+      this->classification_ = TYPE_ERROR;
+      return false;
+    }
   return true;
 }
 
@@ -6400,8 +6414,12 @@ Array_type::slice_gc_symbol(Gogo* gogo, Expression_list** vals,
 
   // Differentiate between slices with zero-length and non-zero-length values.
   Type* element_type = this->element_type();
-  Btype* ebtype = element_type->get_backend(gogo);
-  int64_t element_size = gogo->backend()->type_size(ebtype);
+  int64_t element_size;
+  bool ok = element_type->backend_type_size(gogo, &element_size);
+  if (!ok) {
+    go_assert(saw_errors());
+    element_size = 4;
+  }
 
   Type* uintptr_type = Type::lookup_integer_type("uintptr");
   unsigned long opval = element_size == 0 ? GC_APTR : GC_SLICE;
@@ -6432,7 +6450,13 @@ Array_type::array_gc_symbol(Gogo* gogo, Expression_list** vals,
 
   Btype* pbtype = gogo->backend()->pointer_type(gogo->backend()->void_type());
   int64_t pwidth = gogo->backend()->type_size(pbtype);
-  int64_t iwidth = gogo->backend()->type_size(this->get_backend(gogo));
+  int64_t iwidth;
+  bool ok = this->backend_type_size(gogo, &iwidth);
+  if (!ok)
+    {
+      go_assert(saw_errors());
+      iwidth = 4;
+    }
 
   Type* element_type = this->element_type();
   if (bound < 1 || !element_type->has_pointer())
diff --git a/gcc/graphite-dependences.c b/gcc/graphite-dependences.c
index c3c20909144..85f16f3933b 100644
--- a/gcc/graphite-dependences.c
+++ b/gcc/graphite-dependences.c
@@ -256,17 +256,12 @@ __isl_give isl_union_map *
 extend_schedule (__isl_take isl_union_map *x)
 {
   int max = 0;
-  isl_stat res;
   struct extend_schedule_str str;
 
-  res = isl_union_map_foreach_map (x, max_number_of_out_dimensions, (void *) &max);
-  gcc_assert (res == isl_stat_ok);
-
+  isl_union_map_foreach_map (x, max_number_of_out_dimensions, (void *) &max);
   str.max = max;
   str.umap = isl_union_map_empty (isl_union_map_get_space (x));
-  res = isl_union_map_foreach_map (x, extend_schedule_1, (void *) &str);
-  gcc_assert (res == isl_stat_ok);
-
+  isl_union_map_foreach_map (x, extend_schedule_1, (void *) &str);
   isl_union_map_free (x);
   return str.umap;
 }
@@ -395,7 +390,6 @@ subtract_commutative_associative_deps (scop_p scop,
   FOR_EACH_VEC_ELT (pbbs, i, pbb)
     if (PBB_IS_REDUCTION (pbb))
       {
-	int res;
 	isl_union_map *r = isl_union_map_empty (isl_space_copy (space));
 	isl_union_map *must_w = isl_union_map_empty (isl_space_copy (space));
 	isl_union_map *may_w = isl_union_map_empty (isl_space_copy (space));
@@ -432,27 +426,24 @@ subtract_commutative_associative_deps (scop_p scop,
 	  (isl_union_map_copy (must_w), isl_union_map_copy (may_w));
 	empty = isl_union_map_empty (isl_union_map_get_space (all_w));
 
-	res = isl_union_map_compute_flow (isl_union_map_copy (r),
-					  isl_union_map_copy (must_w),
-					  isl_union_map_copy (may_w),
-					  isl_union_map_copy (original),
-					  &x_must_raw, &x_may_raw,
-					  &x_must_raw_no_source,
-					  &x_may_raw_no_source);
-	gcc_assert (res == 0);
-	res = isl_union_map_compute_flow (isl_union_map_copy (all_w),
-					  r, empty,
-					  isl_union_map_copy (original),
-					  &x_must_war, &x_may_war,
-					  &x_must_war_no_source,
-					  &x_may_war_no_source);
-	gcc_assert (res == 0);
-	res = isl_union_map_compute_flow (all_w, must_w, may_w,
-					  isl_union_map_copy (original),
-					  &x_must_waw, &x_may_waw,
-					  &x_must_waw_no_source,
-					  &x_may_waw_no_source);
-	gcc_assert (res == 0);
+	isl_union_map_compute_flow (isl_union_map_copy (r),
+				    isl_union_map_copy (must_w),
+				    isl_union_map_copy (may_w),
+				    isl_union_map_copy (original),
+				    &x_must_raw, &x_may_raw,
+				    &x_must_raw_no_source,
+				    &x_may_raw_no_source);
+	isl_union_map_compute_flow (isl_union_map_copy (all_w),
+				    r, empty,
+				    isl_union_map_copy (original),
+				    &x_must_war, &x_may_war,
+				    &x_must_war_no_source,
+				    &x_may_war_no_source);
+	isl_union_map_compute_flow (all_w, must_w, may_w,
+				    isl_union_map_copy (original),
+				    &x_must_waw, &x_may_waw,
+				    &x_must_waw_no_source,
+				    &x_may_waw_no_source);
 
 	if (must_raw)
 	  *must_raw = isl_union_map_subtract (*must_raw, x_must_raw);
@@ -551,26 +542,22 @@ compute_deps (scop_p scop, vec<poly_bb_p> pbbs,
   isl_space *space = isl_union_map_get_space (all_writes);
   isl_union_map *empty = isl_union_map_empty (space);
   isl_union_map *original = scop_get_original_schedule (scop, pbbs);
-  int res;
 
-  res = isl_union_map_compute_flow (isl_union_map_copy (reads),
-				    isl_union_map_copy (must_writes),
-				    isl_union_map_copy (may_writes),
-				    isl_union_map_copy (original),
-				    must_raw, may_raw, must_raw_no_source,
-				    may_raw_no_source);
-  gcc_assert (res == 0);
-  res = isl_union_map_compute_flow (isl_union_map_copy (all_writes),
-				    reads, empty,
-				    isl_union_map_copy (original),
-				    must_war, may_war, must_war_no_source,
-				    may_war_no_source);
-  gcc_assert (res == 0);
-  res = isl_union_map_compute_flow (all_writes, must_writes, may_writes,
-				    isl_union_map_copy (original),
-				    must_waw, may_waw, must_waw_no_source,
-				    may_waw_no_source);
-  gcc_assert (res == 0);
+  isl_union_map_compute_flow (isl_union_map_copy (reads),
+			      isl_union_map_copy (must_writes),
+			      isl_union_map_copy (may_writes),
+			      isl_union_map_copy (original),
+			      must_raw, may_raw, must_raw_no_source,
+			      may_raw_no_source);
+  isl_union_map_compute_flow (isl_union_map_copy (all_writes),
+			      reads, empty,
+			      isl_union_map_copy (original),
+			      must_war, may_war, must_war_no_source,
+			      may_war_no_source);
+  isl_union_map_compute_flow (all_writes, must_writes, may_writes,
+			      isl_union_map_copy (original),
+			      must_waw, may_waw, must_waw_no_source,
+			      may_waw_no_source);
 
   subtract_commutative_associative_deps
     (scop, pbbs, original,
diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index 5434bfdeb6f..a8c99c3faad 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -57,7 +57,9 @@ extern "C" {
 #include "tree-ssa-loop-manip.h"
 #include "tree-scalar-evolution.h"
 #include "gimple-ssa.h"
+#include "tree-phinodes.h"
 #include "tree-into-ssa.h"
+#include "ssa-iterators.h"
 #include <map>
 #include "graphite-isl-ast-to-gimple.h"
 
@@ -286,7 +288,12 @@ gcc_expression_from_isl_ast_expr_id (tree type,
   gcc_assert (res != ip.end () &&
 	      "Could not map isl_id to tree expression");
   isl_ast_expr_free (expr_id);
-  return fold_convert (type, res->second);
+  tree t = res->second;
+  tree *val = region->parameter_rename_map->get(t);
+
+  if (!val)
+   val = &t;
+  return fold_convert (type, *val);
 }
 
 /* Converts an isl_ast_expr_int expression E to a GCC expression tree of
@@ -1063,6 +1070,69 @@ scop_to_isl_ast (scop_p scop, ivs_params &ip)
   return ast_isl;
 }
 
+/* Copy def from sese REGION to the newly created TO_REGION. TR is defined by
+   DEF_STMT. GSI points to entry basic block of the TO_REGION.  */
+
+static void
+copy_def(tree tr, gimple def_stmt, sese region, sese to_region, gimple_stmt_iterator *gsi)
+{
+  if (!defined_in_sese_p (tr, region))
+    return;
+  ssa_op_iter iter;
+  use_operand_p use_p;
+
+  FOR_EACH_SSA_USE_OPERAND (use_p, def_stmt, iter, SSA_OP_USE)
+    {
+      tree use_tr = USE_FROM_PTR (use_p);
+
+      /* Do not copy parameters that have been generated in the header of the
+	 scop.  */
+      if (region->parameter_rename_map->get(use_tr))
+	continue;
+
+      gimple def_of_use = SSA_NAME_DEF_STMT (use_tr);
+      if (!def_of_use)
+	continue;
+
+      copy_def (use_tr, def_of_use, region, to_region, gsi);
+    }
+
+  gimple copy = gimple_copy (def_stmt);
+  gsi_insert_after (gsi, copy, GSI_NEW_STMT);
+
+  /* Create new names for all the definitions created by COPY and
+     add replacement mappings for each new name.  */
+  def_operand_p def_p;
+  ssa_op_iter op_iter;
+  FOR_EACH_SSA_DEF_OPERAND (def_p, copy, op_iter, SSA_OP_ALL_DEFS)
+    {
+      tree old_name = DEF_FROM_PTR (def_p);
+      tree new_name = create_new_def_for (old_name, copy, def_p);
+      region->parameter_rename_map->put(old_name, new_name);
+    }
+
+  update_stmt (copy);
+}
+
+static void
+copy_internal_parameters(sese region, sese to_region)
+{
+  /* For all the parameters which definitino is in the if_region->false_region,
+     insert code on true_region (if_region->true_region->entry). */
+
+  int i;
+  tree tr;
+  gimple_stmt_iterator gsi = gsi_start_bb(to_region->entry->dest);
+
+  FOR_EACH_VEC_ELT (region->params, i, tr)
+    {
+      // If def is not in region.
+      gimple def_stmt = SSA_NAME_DEF_STMT (tr);
+      if (def_stmt)
+	copy_def (tr, def_stmt, region, to_region, &gsi);
+    }
+}
+
 /* GIMPLE Loop Generator: generates loops from STMT in GIMPLE form for
    the given SCOP.  Return true if code generation succeeded.
 
@@ -1102,10 +1172,13 @@ graphite_regenerate_ast_isl (scop_p scop)
 
   context_loop = SESE_ENTRY (region)->src->loop_father;
 
-  translate_isl_ast_to_gimple t (region);
+  /* Copy all the parameters which are defined in the region.  */
+  copy_internal_parameters(if_region->false_region, if_region->true_region);
 
-  t.translate_isl_ast (context_loop, root_node, if_region->true_region->entry,
-		     ip);
+  translate_isl_ast_to_gimple t(region);
+  edge e = single_succ_edge (if_region->true_region->entry->dest);
+  split_edge (e);
+  t.translate_isl_ast (context_loop, root_node, e, ip);
 
   mark_virtual_operands_for_renaming (cfun);
   update_ssa (TODO_update_ssa);
diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index ffa44652f63..bd13978f42d 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c
@@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.  If not see
 #include <isl/band.h>
 #include <isl/aff.h>
 #include <isl/options.h>
+#include <isl/ctx.h>
 
 #include "system.h"
 #include "coretypes.h"
@@ -63,335 +64,203 @@ scop_get_domains (scop_p scop ATTRIBUTE_UNUSED)
   return res;
 }
 
-/* getTileMap - Create a map that describes a n-dimensonal tiling.
-  
-   getTileMap creates a map from a n-dimensional scattering space into an
+/* get_tile_map - Create a map that describes a n-dimensonal tiling.
+
+   get_tile_map creates a map from a n-dimensional scattering space into an
    2*n-dimensional scattering space. The map describes a rectangular tiling.
-  
+
    Example:
-     scheduleDimensions = 2, parameterDimensions = 1, tileSize = 32
- 
-    tileMap := [p0] -> {[s0, s1] -> [t0, t1, s0, s1]:
-                         t0 % 32 = 0 and t0 <= s0 < t0 + 32 and
-                         t1 % 32 = 0 and t1 <= s1 < t1 + 32}
- 
+     SCHEDULE_DIMENSIONS = 2, PARAMETER_DIMENSIONS = 1, TILE_SIZE = 32
+
+    tile_map := [p0] -> {[s0, s1] -> [t0, t1, s0, s1]:
+			 t0 % 32 = 0 and t0 <= s0 < t0 + 32 and
+			 t1 % 32 = 0 and t1 <= s1 < t1 + 32}
+
    Before tiling:
- 
+
    for (i = 0; i < N; i++)
      for (j = 0; j < M; j++)
- 	S(i,j)
- 
+	S(i,j)
+
    After tiling:
- 
+
    for (t_i = 0; t_i < N; i+=32)
      for (t_j = 0; t_j < M; j+=32)
- 	for (i = t_i; i < min(t_i + 32, N); i++)  | Unknown that N % 32 = 0
- 	  for (j = t_j; j < t_j + 32; j++)        |   Known that M % 32 = 0
- 	    S(i,j)
-   */
- 
+	for (i = t_i; i < min(t_i + 32, N); i++)  | Unknown that N % 32 = 0
+	  for (j = t_j; j < t_j + 32; j++)        |   Known that M % 32 = 0
+	    S(i,j)
+  */
+
 static isl_basic_map *
-getTileMap (isl_ctx *ctx, int scheduleDimensions, int tileSize)
+get_tile_map (isl_ctx *ctx, int schedule_dimensions, int tile_size)
 {
-  int x;
   /* We construct
 
-     tileMap := [p0] -> {[s0, s1] -> [t0, t1, p0, p1, a0, a1]:
-    	                s0 = a0 * 32 and s0 = p0 and t0 <= p0 < t0 + 32 and
-    	                s1 = a1 * 32 and s1 = p1 and t1 <= p1 < t1 + 32}
+     tile_map := [p0] -> {[s0, s1] -> [t0, t1, p0, p1, a0, a1]:
+			s0 = a0 * 32 and s0 = p0 and t0 <= p0 < t0 + 32 and
+			s1 = a1 * 32 and s1 = p1 and t1 <= p1 < t1 + 32}
 
      and project out the auxilary dimensions a0 and a1.  */
-  isl_space *Space = isl_space_alloc (ctx, 0, scheduleDimensions,
-				      scheduleDimensions * 3);
-  isl_basic_map *tileMap = isl_basic_map_universe (isl_space_copy (Space));
+  isl_space *space
+    = isl_space_alloc (ctx, 0, schedule_dimensions, schedule_dimensions * 3);
+  isl_basic_map *tile_map = isl_basic_map_universe (isl_space_copy (space));
 
-  isl_local_space *LocalSpace = isl_local_space_from_space (Space);
+  isl_local_space *local_space = isl_local_space_from_space (space);
 
-  for (x = 0; x < scheduleDimensions; x++)
+  for (int x = 0; x < schedule_dimensions; x++)
     {
       int sX = x;
       int tX = x;
-      int pX = scheduleDimensions + x;
-      int aX = 2 * scheduleDimensions + x;
+      int pX = schedule_dimensions + x;
+      int aX = 2 * schedule_dimensions + x;
 
       isl_constraint *c;
 
-      /* sX = aX * tileSize; */
-      c = isl_equality_alloc (isl_local_space_copy (LocalSpace));
+      /* sX = aX * tile_size; */
+      c = isl_equality_alloc (isl_local_space_copy (local_space));
       isl_constraint_set_coefficient_si (c, isl_dim_out, sX, 1);
-      isl_constraint_set_coefficient_si (c, isl_dim_out, aX, -tileSize);
-      tileMap = isl_basic_map_add_constraint (tileMap, c);
+      isl_constraint_set_coefficient_si (c, isl_dim_out, aX, -tile_size);
+      tile_map = isl_basic_map_add_constraint (tile_map, c);
 
       /* pX = sX; */
-      c = isl_equality_alloc (isl_local_space_copy (LocalSpace));
+      c = isl_equality_alloc (isl_local_space_copy (local_space));
       isl_constraint_set_coefficient_si (c, isl_dim_out, pX, 1);
       isl_constraint_set_coefficient_si (c, isl_dim_in, sX, -1);
-      tileMap = isl_basic_map_add_constraint (tileMap, c);
+      tile_map = isl_basic_map_add_constraint (tile_map, c);
 
       /* tX <= pX */
-      c = isl_inequality_alloc (isl_local_space_copy (LocalSpace));
+      c = isl_inequality_alloc (isl_local_space_copy (local_space));
       isl_constraint_set_coefficient_si (c, isl_dim_out, pX, 1);
       isl_constraint_set_coefficient_si (c, isl_dim_out, tX, -1);
-      tileMap = isl_basic_map_add_constraint (tileMap, c);
+      tile_map = isl_basic_map_add_constraint (tile_map, c);
 
-      /* pX <= tX + (tileSize - 1) */
-      c = isl_inequality_alloc (isl_local_space_copy (LocalSpace));
+      /* pX <= tX + (tile_size - 1) */
+      c = isl_inequality_alloc (isl_local_space_copy (local_space));
       isl_constraint_set_coefficient_si (c, isl_dim_out, tX, 1);
       isl_constraint_set_coefficient_si (c, isl_dim_out, pX, -1);
-      isl_constraint_set_constant_si (c, tileSize - 1);
-      tileMap = isl_basic_map_add_constraint (tileMap, c);
+      isl_constraint_set_constant_si (c, tile_size - 1);
+      tile_map = isl_basic_map_add_constraint (tile_map, c);
     }
 
   /* Project out auxiliary dimensions.
 
-     The auxiliary dimensions are transformed into existentially quantified ones.
-     This reduces the number of visible scattering dimensions and allows Cloog
+     The auxiliary dimensions are transformed into existentially quantified
+     ones.
+     This reduces the number of visible scattering dimensions and allows isl
      to produces better code.  */
-  tileMap = isl_basic_map_project_out (tileMap, isl_dim_out,
-				       2 * scheduleDimensions,
-				       scheduleDimensions);
-  isl_local_space_free (LocalSpace);
-  return tileMap;
+  tile_map =
+      isl_basic_map_project_out (tile_map, isl_dim_out,
+				 2 * schedule_dimensions, schedule_dimensions);
+  isl_local_space_free (local_space);
+  return tile_map;
 }
 
-/* getScheduleForBand - Get the schedule for this band.
-  
-   Polly applies transformations like tiling on top of the isl calculated value.
+/* get_schedule_for_band - Get the schedule for this BAND.
+
+   Polly applies transformations like tiling on top of the isl calculated
+   value.
    This can influence the number of scheduling dimension. The number of
-   schedule dimensions is returned in the parameter 'Dimension'.  */
-static bool DisableTiling = false;
+   schedule dimensions is returned in DIMENSIONS.  */
 
 static isl_union_map *
-getScheduleForBand (isl_band *Band, int *Dimensions)
+get_schedule_for_band (isl_band *band, int *dimensions)
 {
-  isl_union_map *PartialSchedule;
+  isl_union_map *partial_schedule;
   isl_ctx *ctx;
-  isl_space *Space;
-  isl_basic_map *TileMap;
-  isl_union_map *TileUMap;
+  isl_space *space;
+  isl_basic_map *tile_map;
+  isl_union_map *tile_umap;
 
-  PartialSchedule = isl_band_get_partial_schedule (Band);
-  *Dimensions = isl_band_n_member (Band);
-
-  if (DisableTiling)
-    return PartialSchedule;
+  partial_schedule = isl_band_get_partial_schedule (band);
+  *dimensions = isl_band_n_member (band);
 
   /* It does not make any sense to tile a band with just one dimension.  */
-  if (*Dimensions == 1)
+  if (*dimensions == 1)
     {
       if (dump_file && dump_flags)
 	fprintf (dump_file, "not tiled\n");
-      return PartialSchedule;
+      return partial_schedule;
     }
 
   if (dump_file && dump_flags)
     fprintf (dump_file, "tiled by %d\n",
 	     PARAM_VALUE (PARAM_LOOP_BLOCK_TILE_SIZE));
 
-  ctx = isl_union_map_get_ctx (PartialSchedule);
-  Space = isl_union_map_get_space (PartialSchedule);
-
-  TileMap = getTileMap (ctx, *Dimensions,
-			PARAM_VALUE (PARAM_LOOP_BLOCK_TILE_SIZE));
-  TileUMap = isl_union_map_from_map (isl_map_from_basic_map (TileMap));
-  TileUMap = isl_union_map_align_params (TileUMap, Space);
-  *Dimensions = 2 * *Dimensions;
+  ctx = isl_union_map_get_ctx (partial_schedule);
+  space = isl_union_map_get_space (partial_schedule);
 
-  return isl_union_map_apply_range (PartialSchedule, TileUMap);
-}
-
-/* Create a map that pre-vectorizes one scheduling dimension.
-  
-   getPrevectorMap creates a map that maps each input dimension to the same
-   output dimension, except for the dimension DimToVectorize. DimToVectorize is
-   strip mined by 'VectorWidth' and the newly created point loop of
-   DimToVectorize is moved to the innermost level.
-  
-   Example (DimToVectorize=0, ScheduleDimensions=2, VectorWidth=4):
-  
-   | Before transformation
-   |
-   | A[i,j] -> [i,j]
-   |
-   | for (i = 0; i < 128; i++)
-   |    for (j = 0; j < 128; j++)
-   |      A(i,j);
-  
-     Prevector map:
-     [i,j] -> [it,j,ip] : it % 4 = 0 and it <= ip <= it + 3 and i = ip
-  
-   | After transformation:
-   |
-   | A[i,j] -> [it,j,ip] : it % 4 = 0 and it <= ip <= it + 3 and i = ip
-   |
-   | for (it = 0; it < 128; it+=4)
-   |    for (j = 0; j < 128; j++)
-   |      for (ip = max(0,it); ip < min(128, it + 3); ip++)
-   |        A(ip,j);
-  
-   The goal of this transformation is to create a trivially vectorizable loop.
-   This means a parallel loop at the innermost level that has a constant number
-   of iterations corresponding to the target vector width.
-  
-   This transformation creates a loop at the innermost level. The loop has a
-   constant number of iterations, if the number of loop iterations at
-   DimToVectorize can be devided by VectorWidth. The default VectorWidth is
-   currently constant and not yet target specific. This function does not reason
-   about parallelism.  */
-static isl_map *
-getPrevectorMap (isl_ctx *ctx, int DimToVectorize,
-		 int ScheduleDimensions,
-		 int VectorWidth)
-{
-  isl_space *Space;
-  isl_local_space *LocalSpace, *LocalSpaceRange;
-  isl_set *Modulo;
-  isl_map *TilingMap;
-  isl_constraint *c;
-  isl_aff *Aff;
-  int PointDimension; /* ip */
-  int TileDimension;  /* it */
-  isl_val *VectorWidthMP;
-  int i;
-
-  /* assert (0 <= DimToVectorize && DimToVectorize < ScheduleDimensions);*/
-
-  Space = isl_space_alloc (ctx, 0, ScheduleDimensions, ScheduleDimensions + 1);
-  TilingMap = isl_map_universe (isl_space_copy (Space));
-  LocalSpace = isl_local_space_from_space (Space);
-  PointDimension = ScheduleDimensions;
-  TileDimension = DimToVectorize;
-
-  /* Create an identity map for everything except DimToVectorize and map
-     DimToVectorize to the point loop at the innermost dimension.  */
-  for (i = 0; i < ScheduleDimensions; i++)
-    {
-      c = isl_equality_alloc (isl_local_space_copy (LocalSpace));
-      isl_constraint_set_coefficient_si (c, isl_dim_in, i, -1);
-
-      if (i == DimToVectorize)
-	isl_constraint_set_coefficient_si (c, isl_dim_out, PointDimension, 1);
-      else
-	isl_constraint_set_coefficient_si (c, isl_dim_out, i, 1);
-
-      TilingMap = isl_map_add_constraint (TilingMap, c);
-    }
+  tile_map = get_tile_map (ctx, *dimensions,
+			   PARAM_VALUE (PARAM_LOOP_BLOCK_TILE_SIZE));
+  tile_umap = isl_union_map_from_map (isl_map_from_basic_map (tile_map));
+  tile_umap = isl_union_map_align_params (tile_umap, space);
+  *dimensions = 2 * *dimensions;
 
-  /* it % 'VectorWidth' = 0  */
-  LocalSpaceRange = isl_local_space_range (isl_local_space_copy (LocalSpace));
-  Aff = isl_aff_zero_on_domain (LocalSpaceRange);
-  Aff = isl_aff_set_constant_si (Aff, VectorWidth);
-  Aff = isl_aff_set_coefficient_si (Aff, isl_dim_in, TileDimension, 1);
-
-  VectorWidthMP = isl_val_int_from_si (ctx, VectorWidth);
-  Aff = isl_aff_mod_val (Aff, VectorWidthMP);
-  Modulo = isl_pw_aff_zero_set (isl_pw_aff_from_aff (Aff));
-  TilingMap = isl_map_intersect_range (TilingMap, Modulo);
-
-  /* it <= ip */
-  c = isl_inequality_alloc (isl_local_space_copy (LocalSpace));
-  isl_constraint_set_coefficient_si (c, isl_dim_out, TileDimension, -1);
-  isl_constraint_set_coefficient_si (c, isl_dim_out, PointDimension, 1);
-  TilingMap = isl_map_add_constraint (TilingMap, c);
-
-  /* ip <= it + ('VectorWidth' - 1) */
-  c = isl_inequality_alloc (LocalSpace);
-  isl_constraint_set_coefficient_si (c, isl_dim_out, TileDimension, 1);
-  isl_constraint_set_coefficient_si (c, isl_dim_out, PointDimension, -1);
-  isl_constraint_set_constant_si (c, VectorWidth - 1);
-  TilingMap = isl_map_add_constraint (TilingMap, c);
-
-  return TilingMap;
+  return isl_union_map_apply_range (partial_schedule, tile_umap);
 }
 
-static bool EnablePollyVector = false;
 
-/* getScheduleForBandList - Get the scheduling map for a list of bands.
+/* get_schedule_for_band_list - Get the scheduling map for a list of bands.
 
    We walk recursively the forest of bands to combine the schedules of the
-   individual bands to the overall schedule. In case tiling is requested,
+   individual bands to the overall schedule.  In case tiling is requested,
    the individual bands are tiled.  */
+
 static isl_union_map *
-getScheduleForBandList (isl_band_list *BandList)
+get_schedule_for_band_list (isl_band_list *band_list)
 {
-  int NumBands, i;
-  isl_union_map *Schedule;
+  int num_bands, i;
+  isl_union_map *schedule;
   isl_ctx *ctx;
 
-  ctx = isl_band_list_get_ctx (BandList);
-  NumBands = isl_band_list_n_band (BandList);
-  Schedule = isl_union_map_empty (isl_space_params_alloc (ctx, 0));
+  ctx = isl_band_list_get_ctx (band_list);
+  num_bands = isl_band_list_n_band (band_list);
+  schedule = isl_union_map_empty (isl_space_params_alloc (ctx, 0));
 
-  for (i = 0; i < NumBands; i++)
+  for (i = 0; i < num_bands; i++)
     {
-      isl_band *Band;
-      isl_union_map *PartialSchedule;
-      int ScheduleDimensions;
-      isl_space *Space;
+      isl_band *band;
+      isl_union_map *partial_schedule;
+      int schedule_dimensions;
+      isl_space *space;
 
-      Band = isl_band_list_get_band (BandList, i);
-      PartialSchedule = getScheduleForBand (Band, &ScheduleDimensions);
-      Space = isl_union_map_get_space (PartialSchedule);
+      band = isl_band_list_get_band (band_list, i);
+      partial_schedule = get_schedule_for_band (band, &schedule_dimensions);
+      space = isl_union_map_get_space (partial_schedule);
 
-      if (isl_band_has_children (Band))
-	{
-	  isl_band_list *Children;
-	  isl_union_map *SuffixSchedule;
-
-	  Children = isl_band_get_children (Band);
-	  SuffixSchedule = getScheduleForBandList (Children);
-	  PartialSchedule = isl_union_map_flat_range_product (PartialSchedule,
-							      SuffixSchedule);
-	  isl_band_list_free (Children);
-	}
-      else if (EnablePollyVector)
+      if (isl_band_has_children (band))
 	{
-	  for (i = ScheduleDimensions - 1 ;  i >= 0 ; i--)
-	    {
-#ifdef HAVE_ISL_SCHED_CONSTRAINTS_COMPUTE_SCHEDULE
-	      if (isl_band_member_is_coincident (Band, i))
-#else
-	      if (isl_band_member_is_zero_distance (Band, i))
-#endif
-		{
-		  isl_map *TileMap;
-		  isl_union_map *TileUMap;
-
-		  TileMap = getPrevectorMap (ctx, i, ScheduleDimensions, 4);
-		  TileUMap = isl_union_map_from_map (TileMap);
-		  TileUMap = isl_union_map_align_params
-		    (TileUMap, isl_space_copy (Space));
-		  PartialSchedule = isl_union_map_apply_range
-		    (PartialSchedule, TileUMap);
-		  break;
-		}	
-	    }
+	  isl_band_list *children = isl_band_get_children (band);
+	  isl_union_map *suffixSchedule
+	    = get_schedule_for_band_list (children);
+	  partial_schedule
+	    = isl_union_map_flat_range_product (partial_schedule,
+						suffixSchedule);
+	  isl_band_list_free (children);
 	}
 
-      Schedule = isl_union_map_union (Schedule, PartialSchedule);
+      schedule = isl_union_map_union (schedule, partial_schedule);
 
-      isl_band_free (Band);
-      isl_space_free (Space);
+      isl_band_free (band);
+      isl_space_free (space);
     }
 
-  return Schedule;
+  return schedule;
 }
 
 static isl_union_map *
-getScheduleMap (isl_schedule *Schedule)
+get_schedule_map (isl_schedule *schedule)
 {
-  isl_band_list *BandList = isl_schedule_get_band_forest (Schedule);
-  isl_union_map *ScheduleMap = getScheduleForBandList (BandList);
-  isl_band_list_free (BandList);
-  return ScheduleMap;
+  isl_band_list *bandList = isl_schedule_get_band_forest (schedule);
+  isl_union_map *schedule_map = get_schedule_for_band_list (bandList);
+  isl_band_list_free (bandList);
+  return schedule_map;
 }
 
 static isl_stat
-getSingleMap (__isl_take isl_map *map, void *user)
+get_single_map (__isl_take isl_map *map, void *user)
 {
-  isl_map **singleMap = (isl_map **) user;
-  *singleMap = map;
-
+  isl_map **single_map = (isl_map **)user;
+  *single_map = map;
   return isl_stat_ok;
 }
 
@@ -404,54 +273,55 @@ apply_schedule_map_to_scop (scop_p scop, isl_union_map *schedule_map)
   FOR_EACH_VEC_ELT (scop->bbs, i, pbb)
     {
       isl_set *domain = isl_set_copy (pbb->domain);
-      isl_union_map *stmtBand;
-      isl_map *stmtSchedule;
+      isl_map *stmt_schedule;
 
-      stmtBand = isl_union_map_intersect_domain
-	(isl_union_map_copy (schedule_map),
-	 isl_union_set_from_set (domain));
-      isl_union_map_foreach_map (stmtBand, getSingleMap, &stmtSchedule);
+      isl_union_map *stmt_band
+	= isl_union_map_intersect_domain (isl_union_map_copy (schedule_map),
+					  isl_union_set_from_set (domain));
+      isl_union_map_foreach_map (stmt_band, get_single_map, &stmt_schedule);
       isl_map_free (pbb->transformed);
-      pbb->transformed = stmtSchedule;
-      isl_union_map_free (stmtBand);
+      pbb->transformed = stmt_schedule;
+      isl_union_map_free (stmt_band);
     }
 }
 
 static const int CONSTANT_BOUND = 20;
 
+/* Compute the schedule for SCOP based on its parameters, domain and set of
+   constraints.  Then apply the schedule to SCOP.  */
+
 bool
 optimize_isl (scop_p scop)
 {
-
-  isl_schedule *schedule;
-#ifdef HAVE_ISL_SCHED_CONSTRAINTS_COMPUTE_SCHEDULE
-  isl_schedule_constraints *schedule_constraints;
+#ifdef HAVE_ISL_CTX_MAX_OPERATIONS
+  int old_max_operations = isl_ctx_get_max_operations (scop->ctx);
+  int max_operations = PARAM_VALUE (PARAM_MAX_ISL_OPERATIONS);
+  if (max_operations)
+    isl_ctx_set_max_operations (scop->ctx, max_operations);
 #endif
-  isl_union_set *domain;
-  isl_union_map *validity, *proximity, *dependences;
-  isl_union_map *schedule_map;
-
-  domain = scop_get_domains (scop);
-  dependences = scop_get_dependences (scop);
-  dependences = isl_union_map_gist_domain (dependences,
-					   isl_union_set_copy (domain));
-  dependences = isl_union_map_gist_range (dependences,
-					  isl_union_set_copy (domain));
-  validity = dependences;
+  isl_options_set_on_error (scop->ctx, ISL_ON_ERROR_CONTINUE);
 
-  proximity = isl_union_map_copy (validity);
+  isl_union_set *domain = scop_get_domains (scop);
+  isl_union_map *dependences = scop_get_dependences (scop);
+  dependences
+    = isl_union_map_gist_domain (dependences, isl_union_set_copy (domain));
+  dependences
+    = isl_union_map_gist_range (dependences, isl_union_set_copy (domain));
+  isl_union_map *validity = dependences;
+  isl_union_map *proximity = isl_union_map_copy (validity);
 
 #ifdef HAVE_ISL_SCHED_CONSTRAINTS_COMPUTE_SCHEDULE
+  isl_schedule_constraints *schedule_constraints;
   schedule_constraints = isl_schedule_constraints_on_domain (domain);
   schedule_constraints
-	= isl_schedule_constraints_set_proximity (schedule_constraints,
-						  proximity);
+    = isl_schedule_constraints_set_proximity (schedule_constraints,
+					      proximity);
   schedule_constraints
-	= isl_schedule_constraints_set_validity (schedule_constraints,
-						 isl_union_map_copy (validity));
+    = isl_schedule_constraints_set_validity (schedule_constraints,
+					     isl_union_map_copy (validity));
   schedule_constraints
-	= isl_schedule_constraints_set_coincidence (schedule_constraints,
-						    validity);
+    = isl_schedule_constraints_set_coincidence (schedule_constraints,
+						validity);
 #endif
 
   isl_options_set_schedule_max_constant_term (scop->ctx, CONSTANT_BOUND);
@@ -461,27 +331,40 @@ optimize_isl (scop_p scop)
 #else
   isl_options_set_schedule_fuse (scop->ctx, ISL_SCHEDULE_FUSE_MIN);
 #endif
-  isl_options_set_on_error (scop->ctx, ISL_ON_ERROR_CONTINUE);
 
 #ifdef HAVE_ISL_SCHED_CONSTRAINTS_COMPUTE_SCHEDULE
-  schedule = isl_schedule_constraints_compute_schedule(schedule_constraints);
+  isl_schedule *schedule
+    = isl_schedule_constraints_compute_schedule (schedule_constraints);
 #else
-  schedule = isl_union_set_compute_schedule (domain, validity, proximity);
+  isl_schedule *schedule
+    = isl_union_set_compute_schedule (domain, validity, proximity);
 #endif
 
   isl_options_set_on_error (scop->ctx, ISL_ON_ERROR_ABORT);
 
+#ifdef HAVE_ISL_CTX_MAX_OPERATIONS
+  isl_ctx_reset_operations (scop->ctx);
+  isl_ctx_set_max_operations (scop->ctx, old_max_operations);
+  if (!schedule || isl_ctx_last_error (scop->ctx) == isl_error_quota)
+    {
+      if (dump_file && dump_flags)
+	fprintf (dump_file, "ISL timed out at %d operations\n",
+		 max_operations);
+      if (schedule)
+	isl_schedule_free (schedule);
+      return false;
+    }
+#else
   if (!schedule)
     return false;
+#endif
 
-  schedule_map = getScheduleMap (schedule);
-
+  isl_union_map *schedule_map = get_schedule_map (schedule);
   apply_schedule_map_to_scop (scop, schedule_map);
 
   isl_schedule_free (schedule);
   isl_union_map_free (schedule_map);
-
   return true;
 }
 
-#endif  /* HAVE_isl */
+#endif /* HAVE_isl */
diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index dbd98428265..3ac56dedd04 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -472,6 +472,17 @@ graphite_can_represent_loop (basic_block scop_entry, loop_p loop)
   tree niter;
   struct tree_niter_desc niter_desc;
 
+  if (!loop_nest_has_data_refs (loop))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "[scop-detection-fail] ");
+	  fprintf (dump_file, "Loop %d does not have any data reference.\n",
+		   loop->num);
+	}
+      return false;
+    }
+
   /* FIXME: For the moment, graphite cannot be used on loops that
      iterate using induction variables that wrap.  */
 
@@ -1155,8 +1166,17 @@ build_graphite_scops (vec<sd_region> regions,
       if (!exit)
 	continue;
 
-      scop = new_scop (new_sese (entry, exit));
-      scops->safe_push (scop);
+      sese sese_reg = new_sese (entry, exit);
+      scop = new_scop (sese_reg);
+
+      build_sese_loop_nests (sese_reg);
+
+      /* Scops with one or no loops are not interesting.  */
+      if (SESE_LOOP_NEST (sese_reg).length () > 1)
+	scops->safe_push (scop);
+      else if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "Discarded scop: %d loops\n",
+		 SESE_LOOP_NEST (sese_reg).length ());
 
       /* Are there overlapping SCoPs?  */
 #ifdef ENABLE_CHECKING
@@ -1172,151 +1192,6 @@ build_graphite_scops (vec<sd_region> regions,
     }
 }
 
-/* Returns true when BB contains only close phi nodes.  */
-
-static bool
-contains_only_close_phi_nodes (basic_block bb)
-{
-  gimple_stmt_iterator gsi;
-
-  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-    if (gimple_code (gsi_stmt (gsi)) != GIMPLE_LABEL)
-      return false;
-
-  return true;
-}
-
-/* Print statistics for SCOP to FILE.  */
-
-static void
-print_graphite_scop_statistics (FILE* file, scop_p scop)
-{
-  long n_bbs = 0;
-  long n_loops = 0;
-  long n_stmts = 0;
-  long n_conditions = 0;
-  long n_p_bbs = 0;
-  long n_p_loops = 0;
-  long n_p_stmts = 0;
-  long n_p_conditions = 0;
-
-  basic_block bb;
-
-  FOR_ALL_BB_FN (bb, cfun)
-    {
-      gimple_stmt_iterator psi;
-      loop_p loop = bb->loop_father;
-
-      if (!bb_in_sese_p (bb, SCOP_REGION (scop)))
-	continue;
-
-      n_bbs++;
-      n_p_bbs += bb->count;
-
-      if (EDGE_COUNT (bb->succs) > 1)
-	{
-	  n_conditions++;
-	  n_p_conditions += bb->count;
-	}
-
-      for (psi = gsi_start_bb (bb); !gsi_end_p (psi); gsi_next (&psi))
-	{
-	  n_stmts++;
-	  n_p_stmts += bb->count;
-	}
-
-      if (loop->header == bb && loop_in_sese_p (loop, SCOP_REGION (scop)))
-	{
-	  n_loops++;
-	  n_p_loops += bb->count;
-	}
-
-    }
-
-  fprintf (file, "\nBefore limit_scops SCoP statistics (");
-  fprintf (file, "BBS:%ld, ", n_bbs);
-  fprintf (file, "LOOPS:%ld, ", n_loops);
-  fprintf (file, "CONDITIONS:%ld, ", n_conditions);
-  fprintf (file, "STMTS:%ld)\n", n_stmts);
-  fprintf (file, "\nBefore limit_scops SCoP profiling statistics (");
-  fprintf (file, "BBS:%ld, ", n_p_bbs);
-  fprintf (file, "LOOPS:%ld, ", n_p_loops);
-  fprintf (file, "CONDITIONS:%ld, ", n_p_conditions);
-  fprintf (file, "STMTS:%ld)\n", n_p_stmts);
-}
-
-/* Print statistics for SCOPS to FILE.  */
-
-static void
-print_graphite_statistics (FILE* file, vec<scop_p> scops)
-{
-  int i;
-  scop_p scop;
-
-  FOR_EACH_VEC_ELT (scops, i, scop)
-    print_graphite_scop_statistics (file, scop);
-}
-
-/* We limit all SCoPs to SCoPs, that are completely surrounded by a loop.
-
-   Example:
-
-   for (i      |
-     {         |
-       for (j  |  SCoP 1
-       for (k  |
-     }         |
-
-   * SCoP frontier, as this line is not surrounded by any loop. *
-
-   for (l      |  SCoP 2
-
-   This is necessary as scalar evolution and parameter detection need a
-   outermost loop to initialize parameters correctly.
-
-   TODO: FIX scalar evolution and parameter detection to allow more flexible
-         SCoP frontiers.  */
-
-static void
-limit_scops (vec<scop_p> *scops)
-{
-  auto_vec<sd_region, 3> regions;
-
-  int i;
-  scop_p scop;
-
-  FOR_EACH_VEC_ELT (*scops, i, scop)
-    {
-      int j;
-      loop_p loop;
-      sese region = SCOP_REGION (scop);
-      build_sese_loop_nests (region);
-
-      FOR_EACH_VEC_ELT (SESE_LOOP_NEST (region), j, loop)
-        if (!loop_in_sese_p (loop_outer (loop), region)
-	    && single_exit (loop))
-          {
-	    sd_region open_scop;
-	    open_scop.entry = loop->header;
-	    open_scop.exit = single_exit (loop)->dest;
-
-	    /* This is a hack on top of the limit_scops hack.  The
-	       limit_scops hack should disappear all together.  */
-	    if (single_succ_p (open_scop.exit)
-		&& contains_only_close_phi_nodes (open_scop.exit))
-	      open_scop.exit = single_succ_edge (open_scop.exit)->dest;
-
-	    regions.safe_push (open_scop);
-	  }
-    }
-
-  free_scops (*scops);
-  scops->create (3);
-
-  create_sese_edges (regions);
-  build_graphite_scops (regions, scops);
-}
-
 /* Returns true when P1 and P2 are close phis with the same
    argument.  */
 
@@ -1501,10 +1376,6 @@ build_scops (vec<scop_p> *scops)
   create_sese_edges (regions);
   build_graphite_scops (regions, scops);
 
-  if (dump_file && (dump_flags & TDF_DETAILS))
-    print_graphite_statistics (dump_file, *scops);
-
-  limit_scops (scops);
   regions.release ();
 
   if (dump_file && (dump_flags & TDF_DETAILS))
diff --git a/gcc/match.pd b/gcc/match.pd
index fb4b342d31d..bd5c267f1f8 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -931,6 +931,20 @@ along with GCC; see the file COPYING3.  If not see
       && tree_expr_nonnegative_p (@1))
   @0))
 
+/* Optimize (x >> c) << c into x & (-1<<c).  */
+(simplify
+ (lshift (rshift @0 INTEGER_CST@1) @1)
+ (if (wi::ltu_p (@1, element_precision (type)))
+  (bit_and @0 (lshift { build_minus_one_cst (type); } @1))))
+
+/* Optimize (x << c) >> c into x & ((unsigned)-1 >> c) for unsigned
+   types.  */
+(simplify
+ (rshift (lshift @0 INTEGER_CST@1) @1)
+ (if (TYPE_UNSIGNED (type)
+      && (wi::ltu_p (@1, element_precision (type))))
+  (bit_and @0 (rshift { build_minus_one_cst (type); } @1))))
+
 (for shiftrotate (lrotate rrotate lshift rshift)
  (simplify
   (shiftrotate @0 integer_zerop)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 36d81288c83..60c502a79a4 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -6885,6 +6885,24 @@ expand_omp_for_static_nochunk (struct omp_region *region,
     }
 }
 
+/* Return phi in E->DEST with ARG on edge E.  */
+
+static gphi *
+find_phi_with_arg_on_edge (tree arg, edge e)
+{
+  basic_block bb = e->dest;
+
+  for (gphi_iterator gpi = gsi_start_phis (bb);
+       !gsi_end_p (gpi);
+       gsi_next (&gpi))
+    {
+      gphi *phi = gpi.phi ();
+      if (PHI_ARG_DEF_FROM_EDGE (phi, e) == arg)
+	return phi;
+    }
+
+  return NULL;
+}
 
 /* A subroutine of expand_omp_for.  Generate code for a parallel
    loop with static schedule and a specified chunk size.  Given
@@ -6960,7 +6978,8 @@ expand_omp_for_static_chunk (struct omp_region *region,
   body_bb = single_succ (seq_start_bb);
   if (!broken_loop)
     {
-      gcc_assert (BRANCH_EDGE (cont_bb)->dest == body_bb);
+      gcc_assert (BRANCH_EDGE (cont_bb)->dest == body_bb
+		  || single_succ (BRANCH_EDGE (cont_bb)->dest) == body_bb);
       gcc_assert (EDGE_COUNT (cont_bb->succs) == 2);
       trip_update_bb = split_edge (FALLTHRU_EDGE (cont_bb));
     }
@@ -7016,7 +7035,7 @@ expand_omp_for_static_chunk (struct omp_region *region,
       se->probability = REG_BR_PROB_BASE / 2000 - 1;
       if (gimple_in_ssa_p (cfun))
 	{
-	  int dest_idx = find_edge (entry_bb, fin_bb)->dest_idx;
+	  int dest_idx = find_edge (iter_part_bb, fin_bb)->dest_idx;
 	  for (gphi_iterator gpi = gsi_start_phis (fin_bb);
 	       !gsi_end_p (gpi); gsi_next (&gpi))
 	    {
@@ -7261,6 +7280,11 @@ expand_omp_for_static_chunk (struct omp_region *region,
   if (!broken_loop)
     {
       se = find_edge (cont_bb, body_bb);
+      if (se == NULL)
+	{
+	  se = BRANCH_EDGE (cont_bb);
+	  gcc_assert (single_succ (se->dest) == body_bb);
+	}
       if (gimple_omp_for_combined_p (fd->for_stmt))
 	{
 	  remove_edge (se);
@@ -7292,7 +7316,7 @@ expand_omp_for_static_chunk (struct omp_region *region,
       /* When we redirect the edge from trip_update_bb to iter_part_bb, we
 	 remove arguments of the phi nodes in fin_bb.  We need to create
 	 appropriate phi nodes in iter_part_bb instead.  */
-      se = single_pred_edge (fin_bb);
+      se = find_edge (iter_part_bb, fin_bb);
       re = single_succ_edge (trip_update_bb);
       vec<edge_var_map> *head = redirect_edge_var_map_vector (re);
       ene = single_succ_edge (entry_bb);
@@ -7307,6 +7331,10 @@ expand_omp_for_static_chunk (struct omp_region *region,
 	  phi = psi.phi ();
 	  t = gimple_phi_result (phi);
 	  gcc_assert (t == redirect_edge_var_map_result (vm));
+
+	  if (!single_pred_p (fin_bb))
+	    t = copy_ssa_name (t, phi);
+
 	  nphi = create_phi_node (t, iter_part_bb);
 
 	  t = PHI_ARG_DEF_FROM_EDGE (phi, se);
@@ -7318,17 +7346,33 @@ expand_omp_for_static_chunk (struct omp_region *region,
 	    t = vextra;
 	  add_phi_arg (nphi, t, ene, locus);
 	  locus = redirect_edge_var_map_location (vm);
-	  add_phi_arg (nphi, redirect_edge_var_map_def (vm), re, locus);
+	  tree back_arg = redirect_edge_var_map_def (vm);
+	  add_phi_arg (nphi, back_arg, re, locus);
+	  edge ce = find_edge (cont_bb, body_bb);
+	  if (ce == NULL)
+	    {
+	      ce = BRANCH_EDGE (cont_bb);
+	      gcc_assert (single_succ (ce->dest) == body_bb);
+	      ce = single_succ_edge (ce->dest);
+	    }
+	  gphi *inner_loop_phi = find_phi_with_arg_on_edge (back_arg, ce);
+	  gcc_assert (inner_loop_phi != NULL);
+	  add_phi_arg (inner_loop_phi, gimple_phi_result (nphi),
+		       find_edge (seq_start_bb, body_bb), locus);
+
+	  if (!single_pred_p (fin_bb))
+	    add_phi_arg (phi, gimple_phi_result (nphi), se, locus);
 	}
-      gcc_assert (gsi_end_p (psi) && i == head->length ());
+      gcc_assert (gsi_end_p (psi) && (head == NULL || i == head->length ()));
       redirect_edge_var_map_clear (re);
-      while (1)
-	{
-	  psi = gsi_start_phis (fin_bb);
-	  if (gsi_end_p (psi))
-	    break;
-	  remove_phi_node (&psi, false);
-	}
+      if (single_pred_p (fin_bb))
+	while (1)
+	  {
+	    psi = gsi_start_phis (fin_bb);
+	    if (gsi_end_p (psi))
+	      break;
+	    remove_phi_node (&psi, false);
+	  }
 
       /* Make phi node for trip.  */
       phi = create_phi_node (trip_main, iter_part_bb);
@@ -7351,14 +7395,24 @@ expand_omp_for_static_chunk (struct omp_region *region,
 
   if (!broken_loop)
     {
+      struct loop *loop = body_bb->loop_father;
       struct loop *trip_loop = alloc_loop ();
       trip_loop->header = iter_part_bb;
       trip_loop->latch = trip_update_bb;
       add_loop (trip_loop, iter_part_bb->loop_father);
 
+      if (loop != entry_bb->loop_father)
+	{
+	  gcc_assert (loop->header == body_bb);
+	  gcc_assert (loop->latch == region->cont
+		      || single_pred (loop->latch) == region->cont);
+	  trip_loop->inner = loop;
+	  return;
+	}
+
       if (!gimple_omp_for_combined_p (fd->for_stmt))
 	{
-	  struct loop *loop = alloc_loop ();
+	  loop = alloc_loop ();
 	  loop->header = body_bb;
 	  if (collapse_bb == NULL)
 	    loop->latch = cont_bb;
diff --git a/gcc/optabs.c b/gcc/optabs.c
index e533e6efb36..79c6f06b991 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -1608,6 +1608,15 @@ expand_binop (machine_mode mode, optab binoptab, rtx op0, rtx op1,
 
       if (otheroptab && optab_handler (otheroptab, mode) != CODE_FOR_nothing)
 	{
+	  /* The scalar may have been extended to be too wide.  Truncate
+	     it back to the proper size to fit in the broadcast vector.  */
+	  machine_mode inner_mode = GET_MODE_INNER (mode);
+	  if (!CONST_INT_P (op1)
+	      && (GET_MODE_BITSIZE (inner_mode)
+		  < GET_MODE_BITSIZE (GET_MODE (op1))))
+	    op1 = force_reg (inner_mode,
+			     simplify_gen_unary (TRUNCATE, inner_mode, op1,
+						 GET_MODE (op1)));
 	  rtx vop1 = expand_vector_broadcast (mode, op1);
 	  if (vop1)
 	    {
diff --git a/gcc/params-list.h b/gcc/params-list.h
new file mode 100644
index 00000000000..ee33ef52b36
--- /dev/null
+++ b/gcc/params-list.h
@@ -0,0 +1,23 @@
+/* File used to generate params.list
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#define DEFPARAM(enumerator, option, nocmsgid, default, min, max) \
+  enumerator,
+#include "params.def"
+#undef DEFPARAM
diff --git a/gcc/params.def b/gcc/params.def
index a7291feb2b6..7d240d29b4f 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -868,6 +868,11 @@ DEFPARAM (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION,
 	  "maximum number of basic blocks per function to be analyzed by Graphite",
 	  100, 0, 0)
 
+DEFPARAM (PARAM_MAX_ISL_OPERATIONS,
+	  "max-isl-operations",
+	  "maximum number of ISL operations, 0 means unlimited",
+	  350000, 0, 0)
+
 /* Avoid data dependence analysis on very large loops.  */
 DEFPARAM (PARAM_LOOP_MAX_DATAREFS_FOR_DATADEPS,
 	  "loop-max-datarefs-for-datadeps",
@@ -1159,6 +1164,11 @@ DEFPARAM (PARAM_MAX_FSM_THREAD_PATHS,
 	  "max-fsm-thread-paths",
 	  "Maximum number of new jump thread paths to create for a finite state automaton",
 	  50, 1, 999999)
+
+DEFPARAM (PARAM_PARLOOPS_CHUNK_SIZE,
+	  "parloops-chunk-size",
+	  "Chunk size of omp schedule for loops parallelized by parloops",
+	  0, 0, 0)
 /*
 
 Local variables:
diff --git a/gcc/params.h b/gcc/params.h
index f53426dfea6..9f7618ac1d1 100644
--- a/gcc/params.h
+++ b/gcc/params.h
@@ -81,10 +81,7 @@ extern void set_param_value (const char *name, int value,
 
 enum compiler_param
 {
-#define DEFPARAM(enumerator, option, nocmsgid, default, min, max) \
-  enumerator,
-#include "params.def"
-#undef DEFPARAM
+#include "params.list"
   LAST_PARAM
 };
 
diff --git a/gcc/pretty-print.h b/gcc/pretty-print.h
index 6e8a300fbeb..36d4e37a0d2 100644
--- a/gcc/pretty-print.h
+++ b/gcc/pretty-print.h
@@ -187,7 +187,7 @@ struct pp_wrapping_mode_t
 /* Get or set the wrapping mode as a single entity.  */
 #define pp_wrapping_mode(PP) (PP)->wrapping
 
-/* The type of a hook that formats client-specific data onto a pretty_pinter.
+/* The type of a hook that formats client-specific data onto a pretty_printer.
    A client-supplied formatter returns true if everything goes well,
    otherwise it returns false.  */
 typedef bool (*printer_fn) (pretty_printer *, text_info *, const char *,
diff --git a/gcc/reload1.c b/gcc/reload1.c
index ad243e321d1..c7cc37bc94f 100644
--- a/gcc/reload1.c
+++ b/gcc/reload1.c
@@ -3636,6 +3636,8 @@ eliminate_regs_in_insn (rtx_insn *insn, int replace)
    eliminations in its operands and record cases where eliminating a reg with
    an invariant equivalence would add extra cost.  */
 
+#pragma GCC diagnostic push
+#pragma GCC diagnostic warning "-Wmaybe-uninitialized"
 static void
 elimination_costs_in_insn (rtx_insn *insn)
 {
@@ -3785,6 +3787,7 @@ elimination_costs_in_insn (rtx_insn *insn)
 
   return;
 }
+#pragma GCC diagnostic pop
 
 /* Loop through all elimination pairs.
    Recalculate the number not at initial offset.
diff --git a/gcc/sese.c b/gcc/sese.c
index 2dedd9516b6..3b716f54e1a 100644
--- a/gcc/sese.c
+++ b/gcc/sese.c
@@ -267,6 +267,7 @@ new_sese (edge entry, edge exit)
   SESE_LOOP_NEST (region).create (3);
   SESE_ADD_PARAMS (region) = true;
   SESE_PARAMS (region).create (3);
+  region->parameter_rename_map = new parameter_rename_map_t;
 
   return region;
 }
@@ -281,6 +282,8 @@ free_sese (sese region)
 
   SESE_PARAMS (region).release ();
   SESE_LOOP_NEST (region).release ();
+  delete region->parameter_rename_map;
+  region->parameter_rename_map = NULL;
 
   XDELETE (region);
 }
@@ -294,6 +297,7 @@ sese_add_exit_phis_edge (basic_block exit, tree use, edge false_e, edge true_e)
   create_new_def_for (use, phi, gimple_phi_result_ptr (phi));
   add_phi_arg (phi, use, false_e, UNKNOWN_LOCATION);
   add_phi_arg (phi, use, true_e, UNKNOWN_LOCATION);
+  update_stmt (phi);
 }
 
 /* Insert in the block BB phi nodes for variables defined in REGION
@@ -373,12 +377,19 @@ get_rename (rename_map_type *rename_map, tree old_name)
 /* Register in RENAME_MAP the rename tuple (OLD_NAME, EXPR).  */
 
 static void
-set_rename (rename_map_type *rename_map, tree old_name, tree expr)
+set_rename (rename_map_type *rename_map, tree old_name, tree expr, sese region)
 {
   if (old_name == expr)
     return;
 
   rename_map->put (old_name, expr);
+
+  tree t;
+  int i;
+  /* For a parameter of a scop we dont want to rename it.  */
+  FOR_EACH_VEC_ELT (SESE_PARAMS (region), i, t)
+    if (old_name == t)
+      region->parameter_rename_map->put(old_name, expr);
 }
 
 /* Renames the scalar uses of the statement COPY, using the
@@ -484,7 +495,7 @@ rename_uses (gimple copy, rename_map_type *rename_map,
 	    recompute_tree_invariant_for_addr_expr (rhs);
 	}
 
-      set_rename (rename_map, old_name, new_expr);
+      set_rename (rename_map, old_name, new_expr, region);
     }
 
   return changed;
@@ -525,6 +536,14 @@ graphite_copy_stmts_from_block (basic_block bb, basic_block new_bb,
 	  && scev_analyzable_p (lhs, region))
 	continue;
 
+      /* Do not copy parameters that have been generated in the header of the
+	 scop.  */
+      if (is_gimple_assign (stmt)
+	  && (lhs = gimple_assign_lhs (stmt))
+	  && TREE_CODE (lhs) == SSA_NAME
+	  && region->parameter_rename_map->get(lhs))
+	continue;
+
       /* Create a new copy of STMT and duplicate STMT's virtual
 	 operands.  */
       copy = gimple_copy (stmt);
@@ -539,7 +558,7 @@ graphite_copy_stmts_from_block (basic_block bb, basic_block new_bb,
  	{
  	  tree old_name = DEF_FROM_PTR (def_p);
  	  tree new_name = create_new_def_for (old_name, copy, def_p);
-	  set_rename (rename_map, old_name, new_name);
+	  set_rename (rename_map, old_name, new_name, region);
  	}
 
       if (rename_uses (copy, rename_map, &gsi_tgt, region, loop, iv_map,
@@ -549,6 +568,25 @@ graphite_copy_stmts_from_block (basic_block bb, basic_block new_bb,
 	  fold_stmt_inplace (&gsi_tgt);
 	}
 
+      /* For each SSA_NAME in the parameter_rename_map rename their usage.  */
+      ssa_op_iter iter;
+      use_operand_p use_p;
+      if (!is_gimple_debug (copy))
+	FOR_EACH_SSA_USE_OPERAND (use_p, copy, iter, SSA_OP_USE)
+	  {
+	    tree old_name = USE_FROM_PTR (use_p);
+
+	    if (TREE_CODE (old_name) != SSA_NAME
+		|| SSA_NAME_IS_DEFAULT_DEF (old_name))
+	      continue;
+
+	    tree *new_expr = region->parameter_rename_map->get (old_name);
+	    if (!new_expr)
+	      continue;
+
+	    replace_exp (use_p, *new_expr);
+	  }
+
       update_stmt (copy);
     }
 }
@@ -722,6 +760,35 @@ set_ifsese_condition (ifsese if_region, tree condition)
   gsi_insert_after (&gsi, cond_stmt, GSI_NEW_STMT);
 }
 
+/* Return false if T is completely defined outside REGION.  */
+
+static bool
+invariant_in_sese_p_rec (tree t, sese region)
+{
+  ssa_op_iter iter;
+  use_operand_p use_p;
+  if (!defined_in_sese_p (t, region))
+    return true;
+
+  gimple stmt = SSA_NAME_DEF_STMT (t);
+
+  if (gimple_code (stmt) == GIMPLE_PHI
+      || gimple_code (stmt) == GIMPLE_CALL)
+    return false;
+
+  FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
+    {
+      tree use = USE_FROM_PTR (use_p);
+      if (!defined_in_sese_p (use, region))
+	continue;
+
+      if (!invariant_in_sese_p_rec (use, region))
+	return false;
+    }
+
+  return true;
+}
+
 /* Returns the scalar evolution of T in REGION.  Every variable that
    is not defined in the REGION is considered a parameter.  */
 
@@ -752,6 +819,9 @@ scalar_evolution_in_region (sese region, loop_p loop, tree t)
       t = compute_overall_effect_of_inner_loop (def_loop, t);
       return t;
     }
-  else
-    return instantiate_scev (before, loop, t);
+
+  if (invariant_in_sese_p_rec (t, region))
+    return t;
+
+  return instantiate_scev (before, loop, t);
 }
diff --git a/gcc/sese.h b/gcc/sese.h
index 52aa8688cbf..b025a4dd821 100644
--- a/gcc/sese.h
+++ b/gcc/sese.h
@@ -22,6 +22,8 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_SESE_H
 #define GCC_SESE_H
 
+typedef hash_map<tree, tree> parameter_rename_map_t;
+
 /* A Single Entry, Single Exit region is a part of the CFG delimited
    by two edges.  */
 typedef struct sese_s
@@ -32,6 +34,9 @@ typedef struct sese_s
   /* Parameters used within the SCOP.  */
   vec<tree> params;
 
+  /* Parameters to be renamed.  */
+  parameter_rename_map_t *parameter_rename_map;
+
   /* Loops completely contained in the SCOP.  */
   bitmap loops;
   vec<loop_p> loop_nest;
diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index c90c40a2429..d10795a0b3a 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -420,7 +420,7 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
    to call-saved registers because their values are live across one or
    more calls during the function.  */
 
-void
+static void
 prepare_shrink_wrap (basic_block entry_block)
 {
   rtx_insn *insn, *curr;
@@ -465,7 +465,7 @@ prepare_shrink_wrap (basic_block entry_block)
 
 /* Create a copy of BB instructions and insert at BEFORE.  Redirect
    preds of BB to COPY_BB if they don't appear in NEED_PROLOGUE.  */
-void
+static void
 dup_block_and_redirect (basic_block bb, basic_block copy_bb, rtx_insn *before,
 			bitmap_head *need_prologue)
 {
diff --git a/gcc/shrink-wrap.h b/gcc/shrink-wrap.h
index dda9b926f5c..681990134f5 100644
--- a/gcc/shrink-wrap.h
+++ b/gcc/shrink-wrap.h
@@ -24,10 +24,6 @@ along with GCC; see the file COPYING3.  If not see
 
 /* In shrink-wrap.c.  */
 extern bool requires_stack_frame_p (rtx_insn *, HARD_REG_SET, HARD_REG_SET);
-extern void prepare_shrink_wrap (basic_block entry_block);
-extern void dup_block_and_redirect (basic_block bb, basic_block copy_bb,
-				    rtx_insn *before,
-				    bitmap_head *need_prologue);
 extern void try_shrink_wrapping (edge *entry_edge, edge orig_entry_edge,
 				 bitmap_head *bb_flags, rtx_insn *prologue_seq);
 extern edge get_unconverted_simple_return (edge, bitmap_head,
diff --git a/gcc/system.h b/gcc/system.h
index 9ca5b5fadd3..1cc5d408df0 100644
--- a/gcc/system.h
+++ b/gcc/system.h
@@ -307,7 +307,7 @@ extern int errno;
 /* The outer cast is needed to work around a bug in Cray C 5.0.3.0.
    It is necessary at least when t == time_t.  */
 #define INTTYPE_MINIMUM(t) ((t) (INTTYPE_SIGNED (t) \
-                             ? ~ (t) 0 << (sizeof (t) * CHAR_BIT - 1) : (t) 0))
+			    ? (t) 1 << (sizeof (t) * CHAR_BIT - 1) : (t) 0))
 #define INTTYPE_MAXIMUM(t) ((t) (~ (t) 0 - INTTYPE_MINIMUM (t)))
 
 /* Use that infrastructure to provide a few constants.  */
@@ -465,6 +465,10 @@ extern char *getwd (char *);
 extern void *sbrk (int);
 #endif
 
+#if defined (HAVE_DECL_SETENV) && !HAVE_DECL_SETENV
+int setenv(const char *, const char *, int);
+#endif
+
 #if defined (HAVE_DECL_STRSTR) && !HAVE_DECL_STRSTR
 extern char *strstr (const char *, const char *);
 #endif
@@ -473,6 +477,10 @@ extern char *strstr (const char *, const char *);
 extern char *stpcpy (char *, const char *);
 #endif
 
+#if defined (HAVE_DECL_UNSETENV) && !HAVE_DECL_UNSETENV
+int unsetenv(const char *);
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/gcc/target.def b/gcc/target.def
index 4edc209c464..aa5a1f1b193 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1801,6 +1801,18 @@ loads.",
  (const_tree mem_vectype, const_tree index_type, int scale),
  NULL)
 
+/* Target builtin that implements vector scatter operation.  */
+DEFHOOK
+(builtin_scatter,
+"Target builtin that implements vector scatter operation.  @var{vectype}\n\
+is the vector type of the store and @var{index_type} is scalar type of\n\
+the index, scaled by @var{scale}.\n\
+The default is @code{NULL_TREE} which means to not vectorize scatter\n\
+stores.",
+ tree,
+ (const_tree vectype, const_tree index_type, int scale),
+ NULL)
+
 /* Target function to initialize the cost model for a loop or block.  */
 DEFHOOK
 (init_cost,
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 1e4de51f671..f75a579bc04 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,384 @@
+2015-09-11  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>
+
+	* gcc.dg/pie-link.c: Add -pie to dg-options.
+
+2015-09-11  Alex Velenko  <Alex.Velenko@arm.com>
+
+	* gcc.target/arm/pr63210.c (dg-skip-if): Skip armv4t.
+	(dg-additional-options): Add -march=armv5t if arm_arch_v5t_ok.
+
+2015-09-10  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
+
+	* gcc.target/powerpc/swaps-p8-20.c: New test.
+	* gcc.target/powerpc/swaps-p8-21.c: New test.
+
+2015-09-10  Steven G. Kargl  <kargl@gcc.gnu.org>
+
+	PR fortran/67526
+	* gfortran.dg/pr67526.f90: New test.
+
+2015-09-10  Paolo Carlini  <paolo.carlini@oracle.com>
+
+	PR c++/67318
+	* g++.dg/cpp0x/variadic166.C: New.
+
+2015-09-09  Mark Wielaard  <mjw@redhat.com>
+
+	* c-c++-common/nonnull-1.c: New test.
+
+2015-09-10  Paul Thomas  <pault@gcc.gnu.org>
+
+	PR fortran/66993
+	* gfortran.dg/submodule_11.f08: New test.
+
+2015-09-10  Oleg Endo  <olegendo@gcc.gnu.org>
+
+	PR target/67506
+	* gcc.c-torture/compile/pr67506.c: New test.
+
+2015-09-10  Andreas Krebbel  <krebbel@linux.vnet.ibm.com>
+
+	* gcc.target/s390/vector/vec-genbytemask-1.c: Add check for V1TI
+	initialization with a byte mask.  No change expected here.
+	* gcc.target/s390/vector/vec-genmask-1.c: Fix whitespace.
+	* gcc.target/s390/vector/vec-genmask-2.c: Add check for V1TI
+	initialization with contigious bitmask.  Literal pool is expectd
+	to be used here.
+
+2015-09-10  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
+
+	PR target/67439
+	* gcc.target/arm/pr67439_1.c: New test.
+
+2015-09-10  Jiong Wang  <jiong.wang@arm.com>
+
+	* gcc.target/aarch64/pic-small.c (dg-skip-if): Skip tiny and large code
+	model.
+
+2015-09-10  Jakub Jelinek  <jakub@redhat.com>
+
+	PR c++/67523
+	* g++.dg/gomp/pr67523.C: New test.
+
+	PR c++/67522
+	* g++.dg/gomp/pr67522.C: New test.
+
+	PR middle-end/67521
+	* c-c++-common/gomp/pr67521.c: New test.
+
+	PR middle-end/67517
+	* c-c++-common/gomp/pr67517.c: New test.
+
+	PR c++/67514
+	* g++.dg/gomp/pr67514.C: New test.
+
+	PR c++/67511
+	* g++.dg/gomp/pr67511.C: New test.
+
+	PR c/67502
+	* c-c++-common/gomp/pr67502.c: New test.
+
+2015-09-09  Marek Polacek  <polacek@redhat.com>
+
+	PR middle-end/67512
+	* gcc.dg/pr67512.c: New test.
+
+2015-09-09  Paolo Carlini  <paolo.carlini@oracle.com>
+
+	PR c++/53184
+	* g++.dg/warn/Wsubobject-linkage-1.C: New.
+	* g++.dg/warn/Wsubobject-linkage-2.C: Likewise.
+	* g++.dg/warn/Wsubobject-linkage-3.C: Likewise.
+	* g++.dg/warn/Wsubobject-linkage-4.C: Likewise.
+
+2015-09-09  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
+
+	* gcc.target/aarch64/mod_2.x: New file.
+	* gcc.target/aarch64/mod_256.x: Likewise.
+	* gcc.target/arm/mod_2.c: New test.
+	* gcc.target/arm/mod_256.c: Likewise.
+	* gcc.target/aarch64/mod_2.c: Likewise.
+	* gcc.target/aarch64/mod_256.c: Likewise.
+
+2015-09-09  Jakub Jelinek  <jakub@redhat.com>
+
+	PR c++/67504
+	* g++.dg/gomp/pr67504.C: New test.
+
+	PR c/67501
+	* c-c++-common/gomp/pr67501.c: New test.
+
+	PR c/67500
+	* gcc.dg/gomp/pr67500.c: New test.
+
+	PR c/67495
+	* gcc.dg/gomp/pr67495.c: New test.
+
+2015-09-09  Aditya Kumar  <hiraditya@msn.com>
+            Sebastian Pop  <s.pop@samsung.com>
+
+	PR tree-optimization/53852
+	* gcc.dg/graphite/uns-interchange-12.c: Adjust pattern to pass with
+	both isl-0.12 and isl-0.15.
+	* gcc.dg/graphite/uns-interchange-14.c: Same.
+	* gcc.dg/graphite/uns-interchange-15.c: Same.
+	* gcc.dg/graphite/uns-interchange-mvt.c: Same.
+
+2015-09-08  Aditya Kumar  <hiraditya@msn.com>
+            Sebastian Pop  <s.pop@samsung.com>
+
+	* gcc.dg/graphite/block-0.c: Modifed test case to match current output.
+	* gcc.dg/graphite/block-1.c: Same.
+	* gcc.dg/graphite/block-5.c: Same.
+	* gcc.dg/graphite/block-6.c: Same.
+	* gcc.dg/graphite/interchange-1.c: Same.
+	* gcc.dg/graphite/interchange-10.c: Same.
+	* gcc.dg/graphite/interchange-11.c: Same.
+	* gcc.dg/graphite/interchange-13.c: Same.
+	* gcc.dg/graphite/interchange-14.c: Same.
+	* gcc.dg/graphite/interchange-3.c: Same.
+	* gcc.dg/graphite/interchange-4.c: Same.
+	* gcc.dg/graphite/interchange-7.c: Same.
+	* gcc.dg/graphite/interchange-8.c: Same.
+	* gcc.dg/graphite/interchange-9.c: Same.
+	* gcc.dg/graphite/isl-codegen-loop-dumping.c: Same.
+	* gcc.dg/graphite/pr35356-1.c (foo): Same.
+	* gcc.dg/graphite/pr37485.c: Same.
+	* gcc.dg/graphite/scop-0.c (int toto): Same.
+	* gcc.dg/graphite/scop-1.c: Same.
+	* gcc.dg/graphite/scop-10.c: Same.
+	* gcc.dg/graphite/scop-11.c: Same.
+	* gcc.dg/graphite/scop-12.c: Same.
+	* gcc.dg/graphite/scop-13.c: Same.
+	* gcc.dg/graphite/scop-16.c: Same.
+	* gcc.dg/graphite/scop-17.c: Same.
+	* gcc.dg/graphite/scop-18.c: Same.
+	* gcc.dg/graphite/scop-2.c: Same.
+	* gcc.dg/graphite/scop-21.c (int test): Same.
+	* gcc.dg/graphite/scop-22.c (void foo): Same.
+	* gcc.dg/graphite/scop-4.c: Same.
+	* gcc.dg/graphite/scop-5.c: Same.
+	* gcc.dg/graphite/scop-6.c: Same.
+	* gcc.dg/graphite/scop-7.c: Same.
+	* gcc.dg/graphite/scop-8.c: Same.
+	* gcc.dg/graphite/scop-9.c: Same.
+	* gcc.dg/graphite/scop-mvt.c (void mvt): Introduced dependency so that
+	data-refs remain inside the inner loop.
+	* gcc.dg/graphite/uns-block-1.c: Modifed test case to match o/p.
+	* gcc.dg/graphite/uns-interchange-14.c: Same.
+	* gcc.dg/graphite/uns-interchange-9.c: Same.
+	* gfortran.dg/graphite/interchange-3.f90
+
+2015-09-08  Alan Lawrence  <alan.lawrence@arm.com>
+
+	PR target/63870
+	* gcc.target/aarch64/advsimd-intrinsics/vld2_lane_f16_indices_1.c: New.
+	* gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_f16_indices_1.c: New.
+	* gcc.target/aarch64/advsimd-intrinsics/vld3_lane_f16_indices_1.c: New.
+	* gcc.target/aarch64/advsimd-intrinsics/vld3q_lane_f16_indices_1.c: New.
+	* gcc.target/aarch64/advsimd-intrinsics/vld4_lane_f16_indices_1.c: New.
+	* gcc.target/aarch64/advsimd-intrinsics/vld4q_lane_f16_indices_1.c: New.
+	* gcc.target/aarch64/advsimd-intrinsics/vst2_lane_f16_indices_1.c: New.
+	* gcc.target/aarch64/advsimd-intrinsics/vst2q_lane_f16_indices_1.c: New.
+	* gcc.target/aarch64/advsimd-intrinsics/vst3_lane_f16_indices_1.c: New.
+	* gcc.target/aarch64/advsimd-intrinsics/vst3q_lane_f16_indices_1.c: New.
+	* gcc.target/aarch64/advsimd-intrinsics/vst4_lane_f16_indices_1.c: New.
+	* gcc.target/aarch64/advsimd-intrinsics/vst4q_lane_f16_indices_1.c: New.
+
+2015-09-08  Alan Lawrence  <alan.lawrence@arm.com>
+
+	* gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c: New.
+	* lib/target-supports.exp
+	(check_effective_target_arm_neon_fp16_hw): New.
+
+2015-09-08  Alan Lawrence  <alan.lawrence@arm.com>
+
+	* gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp:
+	Set additional_flags for neon-fp16 if supported, else fallback to neon.
+
+	* gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
+	(hfloat16_t): New.
+	(result, expected, clean_results, DECL_VARIABLE_64BITS_VARIANTS,
+	DECL_VARIABLE_128BITS_VARIANTS): Add float16x4_t and float16x8_t cases
+	if supported.
+	(CHECK_RESULTS): Redefine using CHECK_RESULTS_NAMED.
+	(CHECK_RESULTS_NAMED): Move body to CHECK_RESULTS_NAMED_NO_FP16;
+	redefine in terms of CHECK_RESULTS_NAMED_NO_FP16 with float16 variants
+	when those are supported.
+	(CHECK_RESULTS_NAMED_NO_FP16, CHECK_RESULTS_NO_FP16): New.
+	(vdup_n_f16): New.
+
+	* gcc.target/aarch64/advsimd-intrinsics/compute-ref-data.h (buffer,
+	buffer_pad, buffer_dup, buffer_dup_pad): Add float16x4 and float16x8_t
+	cases if supported.
+
+	* gcc.target/aarch64/advsimd-intrinsics/vbsl.c (exec_vbsl):
+	Use CHECK_RESULTS_NO_FP16 in place of CHECK_RESULTS.
+	* gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c (exec_vdup_vmov):
+	Likewise.
+	* gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c (exec_vdup_lane):
+	Likewise.
+	* gcc.target/aarch64/advsimd-intrinsics/vext.c (exec_vext): Likewise.
+
+	* gcc.target/aarch64/advsimd-intrinsics/vcombine.c (expected):
+	Add float16x8_t case.
+	(main, exec_vcombine): test float16x4_t -> float16x8_t, if supported.
+	* gcc.target/aarch64/advsimd-intrinsics/vcreate.c (expected,
+	main, exec_vcreate): Likewise.
+	* gcc.target/aarch64/advsimd-intrinsics/vget_high (expected,
+	exec_vget_high): Likewise.
+	* gcc.target/aarch64/advsimd-intrinsics/vget_low.c (expected,
+	exec_vget_low): Likewise.
+	* gcc.target/aarch64/advsimd-intrinsics/vld1.c (expected, exec_vld1):
+	Likewise.
+	* gcc.target/aarch64/advsimd-intrinsics/vld1_dup.c (expected,
+	exec_vld1_dup): Likewise.
+	* gcc.target/aarch64/advsimd-intrinsics/vld1_lane.c (expected,
+	exec_vld1_lane): Likewise.
+	* gcc.target/aarch64/advsimd-intrinsics/vldX.c (expected, exec_vldX):
+	Likewise.
+	* gcc.target/aarch64/advsimd-intrinsics/vldX_dup.c (expected,
+	exec_vldX_dup): Likewise.
+	* gcc.target/aarch64/advsimd-intrinsics/vldX_lane.c (expected,
+	exec_vldX_lane): Likewise.
+	* gcc.target/aarch64/advsimd-intrinsics/vset_lane.c (expected,
+	exec_vset_lane): Likewise.
+	* gcc.target/aarch64/advsimd-intrinsics/vst1_lane.c (expected,
+	exec_vst1_lane): Likewise.
+
+2015-09-08  Alan Lawrence  <alan.lawrence@arm.com>
+
+	* gcc.target/aarch64/vget_high_1.c: Add float16x8->float16x4 case.
+	* gcc.target/aarch64/vget_low_1.c: Likewise.
+
+2015-09-08  Alan Lawrence  <alan.lawrence@arm.com>
+
+	* gcc.target/aarch64/vldN_1.c: Add float16x4_t and float16x8_t cases.
+	* gcc.target/aarch64/vldN_dup_1.c: Likewise.
+	* gcc.target/aarch64/vldN_lane_1.c: Likewise.
+	(main): update orig_data to avoid float16 NaN on bigendian.
+
+2015-09-08  Alan Lawrence  <alan.lawrence@arm.com>
+
+	* g++.dg/abi/mangle-neon-aarch64.C: Add cases for float16x4_t and
+	float16x8_t.
+	* gcc.target/aarch64/vset_lane_1.c: Likewise.
+	* gcc.target/aarch64/vld1-vst1_1.c: Likewise.
+	* gcc.target/aarch64/vld1_lane.c: Likewise.
+
+2015-09-08  Paolo Carlini  <paolo.carlini@oracle.com>
+
+	PR c++/67369
+	* g++.dg/cpp1y/lambda-generic-ice4.C: New.
+
+2015-09-07  Marek Polacek  <polacek@redhat.com>
+
+	PR inline-asm/67448
+	* gcc.dg/asm-10.c: New test.
+
+2015-09-04  Jakub Jelinek  <jakub@redhat.com>
+
+	PR middle-end/67452
+	* gcc.dg/lto/pr67452_0.c: New test.
+
+2015-09-02  Senthil Kumar Selvaraj  <senthil_kumar.selvaraj@atmel.com>
+
+	PR target/65210
+	* gcc.target/avr/pr65210.c: New test.
+
+2015-09-04  H.J. Lu  <hongjiu.lu@intel.com>
+
+	PR testsuite/67450
+	* lib/target-supports.exp (check_cached_effective_target):
+	Apppend $prop to et_prop_list only if needed.
+
+2015-09-04  Marek Polacek  <polacek@redhat.com>
+
+	PR sanitizer/67279
+	* gcc.dg/ubsan/pr67279.c: New test.
+
+2015-09-04  Andrey Turetskiy  <andrey.turetskiy@intel.com>
+	    Petr Murzin  <petr.murzin@intel.com>
+	    Kirill Yukhin <kirill.yukhin@intel.com>
+
+	* gcc.target/i386/avx512f-scatter-1.c: New.
+	* gcc.target/i386/avx512f-scatter-2.c: Ditto.
+	* gcc.target/i386/avx512f-scatter-3.c: Ditto.
+
+2015-09-04  Janne Blomqvist  <jb@gcc.gnu.org>
+
+	* gfortran.dg/read_dir.f90: Delete empty directory when closing
+	rather than calling rmdir, cleanup if open fails.
+
+2015-09-03  Bill Schmidt  <wschmidt@vnet.linux.ibm.com>
+
+	* gcc.target/powerpc/vec-mult-char-1.c: New test.
+	* gcc.target/powerpc/vec-mult-char-2.c: New test.
+	* lib/target-supports.exp (check_effective_target_vect_char_mult):
+	Return true for PowerPC targets that implement Altivec.
+
+2015-09-03  Renlin Li  <renlin.li@arm.com>
+
+	* gcc.target/aarch64/arm_align_max_pwr.c: Make it a compile test case,
+	check the assembly.
+	* gcc.target/aarch64/arm_align_max_stack_pwr.c: Likewise.
+
+2015-09-03  Martin Sebor  <msebor@redhat.com>
+
+	PR c/66516
+	* g++.dg/addr_builtin-1.C: New test.
+	* gcc.dg/addr_builtin-1.c: New test.
+
+2015-09-03  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
+
+	* gcc.target/powerpc/vec-shift.c: New test.
+
+2015-09-03  Tom de Vries  <tom@codesourcery.com>
+
+	PR tree-optimization/65637
+	* gcc.dg/autopar/reduc-4.c: New test.
+
+2015-09-03  Tom de Vries  <tom@codesourcery.com>
+
+	PR tree-optimization/65637
+	* gcc.dg/autopar/pr46099-2.c: New test.
+
+2015-09-03  Naveen H.S  <Naveen.Hurugalawadi@caviumnetworks.com>
+
+	PR middle-end/67351
+	* g++.dg/pr67351.C: New test.
+
+2015-09-03  Richard Biener  <rguenther@suse.de>
+
+	PR ipa/66705
+	* g++.dg/lto/pr66705_0.C: New testcase.
+
+2015-09-02  Balaji V. Iyer  <balaji.v.iyer@intel.com>
+
+	PR middle-end/60586
+	* c-c++-common/cilk-plus/CK/pr60586.c: New file.
+	* g++.dg/cilk-plus/CK/pr60586.cc: Likewise.
+
+2015-09-02  Marek Polacek  <polacek@redhat.com>
+
+	PR c/67432
+	* gcc.dg/pr67432.c: New test.
+
+2015-09-02  Christophe Lyon  <christophe.lyon@linaro.org>
+
+	* lib/target-supports.exp (clear_effective_target_cache): New.
+	(check_cached_effective_target): Update et_prop_list.
+	* lib/asan-dg.exp (asan_finish): Call clear_effective_target_cache.
+	* g++.dg/compat/compat.exp: Likewise.
+	* g++.dg/compat/struct-layout-1.exp: Likewise.
+	* lib/asan-dg.exp: Likewise.
+	* lib/atomic-dg.exp: Likewise.
+	* lib/cilk-plus-dg.exp: Likewise.
+	* lib/clearcap.exp: Likewise.
+	* lib/mpx-dg.exp: Likewise.
+	* lib/tsan-dg.exp: Likewise.
+	* lib/ubsan-dg.exp: Likewise.
+
 2015-09-01  Kenneth Zadeck <zadeck@naturalbridge.com>
         * gcc.c-torture/execute/ieee/20000320-1.c Fixed misplaced test case.
 
@@ -128,7 +509,7 @@
 
 2015-08-28  Andrew Bennett  <andrew.bennett@imgtec.com>
 
-	* gcc.target/mips/madd-8.c: Add lo register to clobber list. 
+	* gcc.target/mips/madd-8.c: Add lo register to clobber list.
 	* gcc.target/mips/msub-8.c: Ditto
 
 2015-08-27  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/pr60586.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/pr60586.c
new file mode 100644
index 00000000000..c4012a0a4b1
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/pr60586.c
@@ -0,0 +1,28 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus -O2" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+int noop(int x)
+{
+  return x;
+}
+
+int post_increment(int *x)
+{
+  return (*x)++;
+}
+
+int main(int argc, char *argv[])
+{
+  int m = 5;
+  int n = m;
+  int r = _Cilk_spawn noop(post_increment(&n));
+  int n2 = n;
+  _Cilk_sync;
+
+  if (r != m || n2 != m + 1)
+    return 1;
+  else
+    return 0;
+}
+
diff --git a/gcc/testsuite/c-c++-common/gomp/pr67501.c b/gcc/testsuite/c-c++-common/gomp/pr67501.c
new file mode 100644
index 00000000000..8a7140faf28
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/pr67501.c
@@ -0,0 +1,12 @@
+/* PR c/67501 */
+/* { dg-do compile } */
+/* { dg-options "-fopenmp" } */
+
+void
+foo (void)
+{
+  int i, j;
+  #pragma omp for simd copyprivate(j	/* { dg-error "before end of line" } */
+  for (i = 0; i < 16; ++i)		/* { dg-error "is not valid for" "" { target *-*-* } 9 } */
+    ;
+}
diff --git a/gcc/testsuite/c-c++-common/gomp/pr67502.c b/gcc/testsuite/c-c++-common/gomp/pr67502.c
new file mode 100644
index 00000000000..74fef4d9123
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/pr67502.c
@@ -0,0 +1,16 @@
+/* PR c/67502 */
+/* { dg-do compile } */
+/* { dg-options "-fopenmp" } */
+/* { dg-additional-options "-std=c99" { target c } } */
+
+void bar (int, int);
+
+void
+foo (void)
+{
+#pragma omp parallel
+#pragma omp for simd collapse(2)
+  for (int i = 0; i < 16; ++i)
+    for (int j = 0; j < 16; ++j)
+      bar (i, j);
+}
diff --git a/gcc/testsuite/c-c++-common/gomp/pr67517.c b/gcc/testsuite/c-c++-common/gomp/pr67517.c
new file mode 100644
index 00000000000..3055ffb34eb
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/pr67517.c
@@ -0,0 +1,13 @@
+/* PR middle-end/67517 */
+/* { dg-do compile } */
+/* { dg-options "-fopenmp" } */
+
+int
+foo (int x, int y, int z)
+{
+  int i;
+  #pragma omp parallel for simd linear (y : x & 15) linear (x : 16) linear (z : x & 15)
+  for (i = 0; i < 256; ++i)
+    x += 16, y += x & 15, z += x & 15;
+  return x + y + z;
+}
diff --git a/gcc/testsuite/c-c++-common/gomp/pr67521.c b/gcc/testsuite/c-c++-common/gomp/pr67521.c
new file mode 100644
index 00000000000..b34c117ae32
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/pr67521.c
@@ -0,0 +1,20 @@
+/* PR middle-end/67521 */
+/* { dg-do compile } */
+/* { dg-options "-fopenmp" } */
+
+void
+foo (int x)
+{
+  int i = 0;
+  #pragma omp parallel for simd
+  for (i = (i & x); i < 10; i = i + 2)
+    ;
+  i = 0;
+  #pragma omp parallel for simd
+  for (i = 0; i < (i & x) + 10; i = i + 2)
+    ;
+  i = 0;
+  #pragma omp parallel for simd
+  for (i = 0; i < 10; i = i + ((i & x) + 2))
+    ;
+}
diff --git a/gcc/testsuite/c-c++-common/nonnull-1.c b/gcc/testsuite/c-c++-common/nonnull-1.c
new file mode 100644
index 00000000000..b5c3d7f8866
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/nonnull-1.c
@@ -0,0 +1,28 @@
+/* Test for the bad usage of "nonnull" function attribute parms.  */
+/*  */
+/* { dg-do compile } */
+/* { dg-options "-Wnonnull" } */
+
+#include <stddef.h>
+#include <stdlib.h>
+
+void foo(void *bar) __attribute__((nonnull(1)));
+
+void foo(void *bar) { if (!bar) abort(); } /* { dg-warning "nonnull argument" "bar compared to NULL" } */
+
+extern int func (char *, char *, char *, char *) __attribute__((nonnull));
+
+int
+func (char *cp1, char *cp2, char *cp3, char *cp4)
+{
+  if (cp1) /* { dg-warning "nonnull argument" "cp1 compared to NULL" } */
+    return 1;
+
+  if (cp2 == NULL) /* { dg-warning "nonnull argument" "cp2 compared to NULL" } */
+    return 2;
+
+  if (NULL != cp3) /* { dg-warning "nonnull argument" "cp3 compared to NULL" } */
+    return 3;
+
+  return (cp4 != 0) ? 0 : 1; /* { dg-warning "nonnull argument" "cp4 compared to NULL" } */
+}
diff --git a/gcc/testsuite/g++.dg/abi/mangle-neon-aarch64.C b/gcc/testsuite/g++.dg/abi/mangle-neon-aarch64.C
index 09a20dc985e..5740c0281b2 100644
--- a/gcc/testsuite/g++.dg/abi/mangle-neon-aarch64.C
+++ b/gcc/testsuite/g++.dg/abi/mangle-neon-aarch64.C
@@ -13,6 +13,7 @@ void f3 (uint8x8_t a) {}
 void f4 (uint16x4_t a) {}
 void f5 (uint32x2_t a) {}
 void f23 (uint64x1_t a) {}
+void f61 (float16x4_t a) {}
 void f6 (float32x2_t a) {}
 void f7 (poly8x8_t a) {}
 void f8 (poly16x4_t a) {}
@@ -25,6 +26,7 @@ void f13 (uint8x16_t a) {}
 void f14 (uint16x8_t a) {}
 void f15 (uint32x4_t a) {}
 void f16 (uint64x2_t a) {}
+void f171 (float16x8_t a) {}
 void f17 (float32x4_t a) {}
 void f18 (float64x2_t a) {}
 void f19 (poly8x16_t a) {}
@@ -42,6 +44,7 @@ void g1 (int8x16_t, int8x16_t) {}
 // { dg-final { scan-assembler "_Z2f412__Uint16x4_t:" } }
 // { dg-final { scan-assembler "_Z2f512__Uint32x2_t:" } }
 // { dg-final { scan-assembler "_Z3f2312__Uint64x1_t:" } }
+// { dg-final { scan-assembler "_Z3f6113__Float16x4_t:" } }
 // { dg-final { scan-assembler "_Z2f613__Float32x2_t:" } }
 // { dg-final { scan-assembler "_Z2f711__Poly8x8_t:" } }
 // { dg-final { scan-assembler "_Z2f812__Poly16x4_t:" } }
@@ -53,6 +56,7 @@ void g1 (int8x16_t, int8x16_t) {}
 // { dg-final { scan-assembler "_Z3f1412__Uint16x8_t:" } }
 // { dg-final { scan-assembler "_Z3f1512__Uint32x4_t:" } }
 // { dg-final { scan-assembler "_Z3f1612__Uint64x2_t:" } }
+// { dg-final { scan-assembler "_Z4f17113__Float16x8_t:" } }
 // { dg-final { scan-assembler "_Z3f1713__Float32x4_t:" } }
 // { dg-final { scan-assembler "_Z3f1813__Float64x2_t:" } }
 // { dg-final { scan-assembler "_Z3f1912__Poly8x16_t:" } }
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/pr60586.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/pr60586.cc
new file mode 100644
index 00000000000..6a27cade876
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/pr60586.cc
@@ -0,0 +1,89 @@
+/* { dg-options "-fcilkplus" } */
+/* { dg-do run { target i?86-*-* x86_64-*-* arm*-*-* } } */
+/* { dg-options "-fcilkplus -lcilkrts" { target { i?86-*-* x86_64-*-* arm*-*-* } } } */
+
+class Rectangle
+{
+  int area_val, h, w;
+  public:
+    Rectangle (int, int);
+    Rectangle (int, int, int);
+    ~Rectangle ();
+    int area ();
+};
+Rectangle::~Rectangle ()
+{
+  h = 0;
+  w = 0;
+  area_val = 0;
+}
+Rectangle::Rectangle (int height, int width)
+{
+  h = height;
+  w = width;
+  area_val = 0;
+}
+
+int some_func(int &x)
+{
+  x++;
+  return x;
+}
+
+Rectangle::Rectangle (int height, int width, int area_orig)
+{
+  h = height;
+  w = width;
+  area_val = area_orig;
+}
+
+int Rectangle::area()
+{
+  return (area_val += (h*w));
+}
+
+
+int some_func (int &);
+
+/* Spawning constructor.  */
+int main1 (void)
+{
+  int x = 3;
+  Rectangle r = _Cilk_spawn Rectangle (some_func(x), 3);
+  return r.area();
+}
+ 
+/* Spawning constructor 2.  */
+int main2 (void)
+{
+  Rectangle r (_Cilk_spawn Rectangle (4, 3));
+  return r.area();
+}
+
+/* Spawning copy constructor.  */
+int main3 (void)
+{
+  int x = 3;
+  Rectangle r = _Cilk_spawn Rectangle (some_func(x), 3, 2);
+  return r.area ();
+}
+
+/* Spawning copy constructor 2.  */
+int main4 (void)
+{
+  Rectangle r ( _Cilk_spawn Rectangle (4, 3, 2));
+  return r.area();
+}
+
+int main (void)
+{
+  if (main1 () != 12)
+    __builtin_abort ();
+  if (main2 () != 12)
+    __builtin_abort ();
+  if (main3 () != 14)
+    __builtin_abort ();
+  if (main4() != 14)
+    __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic166.C b/gcc/testsuite/g++.dg/cpp0x/variadic166.C
new file mode 100644
index 00000000000..91455cbf037
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/variadic166.C
@@ -0,0 +1,14 @@
+// PR c++/67318
+// { dg-do compile { target c++11 } }
+
+template<signed...>
+struct MyStruct1;
+
+template<unsigned...>
+struct MyStruct2;
+
+template<short...>
+struct MyStruct3;
+
+template<long...>
+struct MyStruct4;
diff --git a/gcc/testsuite/g++.dg/cpp1y/lambda-generic-ice4.C b/gcc/testsuite/g++.dg/cpp1y/lambda-generic-ice4.C
new file mode 100644
index 00000000000..ec4db83b6e4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/lambda-generic-ice4.C
@@ -0,0 +1,10 @@
+// PR c++/67369
+// { dg-do compile { target c++14 } }
+
+int main() {
+  unsigned const nsz = 0;
+  auto repeat_conditional = [&](auto) {
+    auto new_sz = nsz;
+  };
+  repeat_conditional(1);
+}
diff --git a/gcc/testsuite/g++.dg/cpp1y/lambda-var-templ1.C b/gcc/testsuite/g++.dg/cpp1y/lambda-var-templ1.C
new file mode 100644
index 00000000000..4c2a3cb3e60
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/lambda-var-templ1.C
@@ -0,0 +1,11 @@
+// PR c++/67041
+// { dg-do compile { target c++14 } }
+
+template<typename T>
+auto test = [](){
+    return T{};
+};
+
+int main() {
+    test<int>();
+}
diff --git a/gcc/testsuite/g++.dg/gomp/pr67504.C b/gcc/testsuite/g++.dg/gomp/pr67504.C
new file mode 100644
index 00000000000..0f1758b6f14
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/pr67504.C
@@ -0,0 +1,15 @@
+// PR c++/67504
+// { dg-do compile }
+// { dg-options "-fopenmp" }
+
+int bar (int);
+double bar (double);
+
+template <typename T>
+void
+foo (T x)
+{
+  #pragma omp for collapse (x + 1) // { dg-error "collapse argument needs positive constant integer expression" }
+  for (int i = 0; i < 10; i++)
+    ;
+}
diff --git a/gcc/testsuite/g++.dg/gomp/pr67511.C b/gcc/testsuite/g++.dg/gomp/pr67511.C
new file mode 100644
index 00000000000..3e0e9a388f6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/pr67511.C
@@ -0,0 +1,20 @@
+// PR c++/67511
+// { dg-do compile }
+// { dg-options "-fopenmp" }
+
+struct I
+{
+  I ();
+  I (const I &);
+  I &operator++ ();
+  bool operator< (const I &) const;
+};
+__PTRDIFF_TYPE__ operator- (const I &, const I &);
+
+void
+foo (I &x, I &y)
+{
+#pragma omp for
+  for (I i = x; i < y; ++i)	// { dg-error "no match for" }
+    ;
+}
diff --git a/gcc/testsuite/g++.dg/gomp/pr67514.C b/gcc/testsuite/g++.dg/gomp/pr67514.C
new file mode 100644
index 00000000000..a631b8bfedb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/pr67514.C
@@ -0,0 +1,30 @@
+// PR c++/67514
+// { dg-do compile }
+// { dg-options "-fopenmp" }
+
+template <class T>
+void
+foo (T x, T y)
+{
+  #pragma omp parallel
+  #pragma omp for simd
+  for (T i = x; i < y; ++i)
+    ;
+  #pragma omp parallel
+  #pragma omp for simd collapse (2)
+  for (T i = x; i < y; ++i)
+    for (T j = x; j < y; j++)
+      ;
+}
+
+void
+bar (int *x, int *y)
+{
+  foo (x, y);
+}
+
+void
+baz (int x, int y)
+{
+  foo (x, y);
+}
diff --git a/gcc/testsuite/g++.dg/gomp/pr67522.C b/gcc/testsuite/g++.dg/gomp/pr67522.C
new file mode 100644
index 00000000000..84c854afd92
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/pr67522.C
@@ -0,0 +1,26 @@
+// PR c++/67522
+// { dg-do compile }
+// { dg-options "-fopenmp" }
+
+struct S;
+
+template <int N>
+void
+foo (void)
+{
+  #pragma omp simd linear (S)			// { dg-error "is not a variable in clause" }
+  for (int i = 0; i < 16; i++)
+    ;
+
+  #pragma omp target map (S[0:10])		// { dg-error "is not a variable in" }
+  ;
+
+  #pragma omp task depend (inout: S[0:10])	// { dg-error "is not a variable in" }
+  ;
+}
+
+void
+bar ()
+{
+  foo <0> ();
+}
diff --git a/gcc/testsuite/g++.dg/gomp/pr67523.C b/gcc/testsuite/g++.dg/gomp/pr67523.C
new file mode 100644
index 00000000000..fb12c8c4695
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/pr67523.C
@@ -0,0 +1,29 @@
+// PR c++/67523
+// { dg-do compile }
+// { dg-options "-fopenmp" }
+
+struct S { int s; };
+
+template <typename T>
+void foo (T &x, T &y)
+{
+#pragma omp for simd
+  for (T i = x; i < y; i++)	// { dg-error "used with class iteration variable" }
+    ;
+#pragma omp parallel for simd
+  for (T i = x; i < y; i++)	// { dg-error "used with class iteration variable" }
+    ;
+#pragma omp target teams distribute parallel for simd
+  for (T i = x; i < y; i++)	// { dg-error "used with class iteration variable" }
+    ;
+#pragma omp target teams distribute simd
+  for (T i = x; i < y; i++)	// { dg-error "used with class iteration variable" }
+    ;
+}
+
+void
+bar ()
+{
+  S x, y;
+  foo <S> (x, y);
+}
diff --git a/gcc/testsuite/g++.dg/lto/pr66705_0.C b/gcc/testsuite/g++.dg/lto/pr66705_0.C
new file mode 100644
index 00000000000..faf3f2d24c4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/lto/pr66705_0.C
@@ -0,0 +1,15 @@
+// { dg-lto-do link }
+// { dg-lto-options { { -O2 -flto -flto-partition=max -fipa-pta } } }
+// { dg-extra-ld-options "-r -nostdlib" }
+
+class A {
+public:
+    A();
+};
+int a = 0;
+void foo() {
+    a = 0;
+    A b;
+    for (; a;)
+      ;
+}
diff --git a/gcc/testsuite/g++.dg/pr67351.C b/gcc/testsuite/g++.dg/pr67351.C
new file mode 100644
index 00000000000..f5bdda6cca7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr67351.C
@@ -0,0 +1,106 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+typedef unsigned char uchar;
+typedef unsigned short ushort;
+typedef unsigned int uint;
+typedef unsigned long long uint64;
+
+class MyRgba
+{
+  uint rgba;
+
+public:
+    explicit MyRgba (uint c):rgba (c)
+  {
+  };
+
+  static MyRgba fromRgba (uchar r, uchar g, uchar b, uchar a)
+  {
+    return MyRgba (uint (r) << 24
+		   | uint (g) << 16 | uint (b) << 8 | uint (a));
+  }
+
+  uchar r ()
+  {
+    return rgba >> 24;
+  }
+  uchar g ()
+  {
+    return rgba >> 16;
+  }
+  uchar b ()
+  {
+    return rgba >> 8;
+  }
+  uchar a ()
+  {
+    return rgba;
+  }
+
+  void setG (uchar _g)
+  {
+    *this = fromRgba (r (), _g, b (), a ());
+  }
+};
+
+extern MyRgba giveMe ();
+
+MyRgba
+test ()
+{
+  MyRgba a = giveMe ();
+  a.setG (0xf0);
+  return a;
+}
+
+class MyRgba64
+{
+  uint64 rgba;
+
+public:
+    explicit MyRgba64 (uint64 c):rgba (c)
+  {
+  };
+
+  static MyRgba64 fromRgba64 (ushort r, ushort g, ushort b, ushort a)
+  {
+    return MyRgba64 (uint64 (r) << 48
+		     | uint64 (g) << 32 | uint64 (b) << 16 | uint64 (a));
+  }
+
+  ushort r ()
+  {
+    return rgba >> 48;
+  }
+  ushort g ()
+  {
+    return rgba >> 32;
+  }
+  ushort b ()
+  {
+    return rgba >> 16;
+  }
+  ushort a ()
+  {
+    return rgba;
+  }
+
+  void setG (ushort _g)
+  {
+    *this = fromRgba64 (r (), _g, b (), a ());
+  }
+};
+
+extern MyRgba64 giveMe64 ();
+
+MyRgba64
+test64 ()
+{
+  MyRgba64 a = giveMe64 ();
+  a.setG (0xf0f0);
+  return a;
+}
+
+/* { dg-final { scan-tree-dump-not "<<" "optimized" } } */
+/* { dg-final { scan-tree-dump-not ">>" "optimized" } } */
diff --git a/gcc/testsuite/g++.dg/ubsan/vptr-10.C b/gcc/testsuite/g++.dg/ubsan/vptr-10.C
new file mode 100644
index 00000000000..e05c33b90ba
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ubsan/vptr-10.C
@@ -0,0 +1,15 @@
+// { dg-do run }
+// { dg-options "-fsanitize=vptr -fno-sanitize-recover=vptr" }
+
+struct A
+{
+    virtual ~A() {}
+};
+struct B : virtual A {};
+struct C : virtual A {};
+struct D : B, virtual C {};
+
+int main()
+{
+    D d;
+}
diff --git a/gcc/testsuite/g++.dg/warn/Wsubobject-linkage-1.C b/gcc/testsuite/g++.dg/warn/Wsubobject-linkage-1.C
new file mode 100644
index 00000000000..adcaa6dbdaf
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wsubobject-linkage-1.C
@@ -0,0 +1,9 @@
+// PR c++/53184
+
+typedef volatile struct { } Foo;
+
+#line 6 "foo.C"
+struct Bar { Foo foo; };   // { dg-warning "no linkage" }
+// { dg-bogus "anonymous namespace" "" { target *-*-* } 6 }
+struct Bar2 : Foo { };     // { dg-warning "no linkage" }
+// { dg-bogus "anonymous namespace" "" { target *-*-* } 8 }
diff --git a/gcc/testsuite/g++.dg/warn/Wsubobject-linkage-2.C b/gcc/testsuite/g++.dg/warn/Wsubobject-linkage-2.C
new file mode 100644
index 00000000000..4bb255c79a1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wsubobject-linkage-2.C
@@ -0,0 +1,8 @@
+// PR c++/53184
+// { dg-options "-Wno-subobject-linkage" }
+
+typedef volatile struct { } Foo;
+
+#line 7 "foo.C"
+struct Bar { Foo foo; };
+struct Bar2 : Foo { };
diff --git a/gcc/testsuite/g++.dg/warn/Wsubobject-linkage-3.C b/gcc/testsuite/g++.dg/warn/Wsubobject-linkage-3.C
new file mode 100644
index 00000000000..e9acb633a1c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wsubobject-linkage-3.C
@@ -0,0 +1,9 @@
+// PR c++/53184
+
+namespace { struct Foo { }; }
+
+#line 6 "foo.C"
+struct Bar { Foo foo; };   // { dg-warning "anonymous namespace" }
+// { dg-bogus "no linkage" "" { target *-*-* } 6 }
+struct Bar2 : Foo { };     // { dg-warning "anonymous namespace" }
+// { dg-bogus "no linkage" "" { target *-*-* } 8 }
diff --git a/gcc/testsuite/g++.dg/warn/Wsubobject-linkage-4.C b/gcc/testsuite/g++.dg/warn/Wsubobject-linkage-4.C
new file mode 100644
index 00000000000..033bc473005
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wsubobject-linkage-4.C
@@ -0,0 +1,8 @@
+// PR c++/53184
+// { dg-options "-Wno-subobject-linkage" }
+
+namespace { struct Foo { }; }
+
+#line 7 "foo.C"
+struct Bar { Foo foo; };
+struct Bar2 : Foo { };
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr67506.c b/gcc/testsuite/gcc.c-torture/compile/pr67506.c
new file mode 100644
index 00000000000..2826d0b3aa7
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr67506.c
@@ -0,0 +1,53 @@
+extern struct _IO_FILE *stderr;
+typedef long integer;
+typedef unsigned char byte;
+short nl;
+byte * tfmfilearray;
+integer charbase, ligkernbase;
+unsigned char charsonline;
+short c;
+unsigned short r;
+struct {
+  short cc;
+  integer rr;
+} labeltable[259];
+short sortptr;
+unsigned char activity[(32510) + 1];
+integer ai, acti;
+extern void _IO_putc (char, struct _IO_FILE *);
+
+void
+mainbody (void)
+{
+  register integer for_end;
+  if (c <= for_end)
+    do {
+      if (((tfmfilearray + 1001)[4 * (charbase + c) + 2] % 4) == 1)
+	{
+	  if ( r < nl )
+	    ;
+	  else
+	    {
+	      while (labeltable[sortptr ].rr > r)
+		labeltable[sortptr + 1 ]= labeltable[sortptr];
+	    }
+	}
+    } while (c++ < for_end);
+
+  if (ai <= for_end)
+    do {
+      if (activity[ai]== 2)
+	{
+	  r = (tfmfilearray + 1001)[4 * (ligkernbase + (ai))];
+	  if (r < 128)
+	    {
+	      r = r + ai + 1 ;
+	      if (r >= nl)
+		{
+		  if (charsonline > 0)
+		    _IO_putc ('\n', stderr);
+		}
+	    }
+	}
+    } while (ai++ < for_end);
+}
diff --git a/gcc/testsuite/gcc.dg/asm-10.c b/gcc/testsuite/gcc.dg/asm-10.c
new file mode 100644
index 00000000000..e6c03c62cab
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/asm-10.c
@@ -0,0 +1,12 @@
+/* PR inline-asm/67448 */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+void
+f (int i)
+{
+  asm ("" : : "m"(i += 1)); /* { dg-error "not directly addressable" } */
+  asm ("" : : "m"(i++)); /* { dg-error "not directly addressable" } */
+  asm ("" : : "m"(++i)); /* { dg-error "not directly addressable" } */
+  asm ("" : : "m"(i = 0)); /* { dg-error "not directly addressable" } */
+}
diff --git a/gcc/testsuite/gcc.dg/autopar/pr46099-2.c b/gcc/testsuite/gcc.dg/autopar/pr46099-2.c
new file mode 100644
index 00000000000..2883408365e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/autopar/pr46099-2.c
@@ -0,0 +1,5 @@
+/* PR tree-optimization/46099.  */
+/* { dg-do compile } */
+/* { dg-options "-ftree-parallelize-loops=2 -fcompare-debug -O --param parloops-chunk-size=100" } */
+
+#include "pr46099.c"
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-4.c b/gcc/testsuite/gcc.dg/autopar/reduc-4.c
new file mode 100644
index 00000000000..80b15e2852d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-4.c
@@ -0,0 +1,4 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized --param parloops-chunk-size=100" } */
+
+#include "reduc-3.c"
diff --git a/gcc/testsuite/gcc.dg/gomp/pr67495.c b/gcc/testsuite/gcc.dg/gomp/pr67495.c
new file mode 100644
index 00000000000..1011a266972
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/pr67495.c
@@ -0,0 +1,38 @@
+/* PR c/67495 */
+/* { dg-do compile } */
+/* { dg-options "-fopenmp" } */
+
+int a, b, c;
+
+void
+foo (void)
+{
+#pragma omp atomic capture
+  a = (float)a + b;	/* { dg-error "invalid operator" } */
+#pragma omp atomic read
+  (float) a = b;	/* { dg-error "lvalue required" } */
+#pragma omp atomic write
+  (float) a = b;	/* { dg-error "lvalue required" } */
+#pragma omp atomic read
+  a = (float) b;	/* { dg-error "lvalue required" } */
+#pragma omp atomic capture
+  (float) a = b += c;	/* { dg-error "lvalue required" } */
+#pragma omp atomic capture
+  { a += b; (float) c = a; }	/* { dg-error "lvalue required" } */
+#pragma omp atomic capture
+  { a += b; c = (float) a; }	/* { dg-error "uses two different expressions for memory" } */
+#pragma omp atomic capture
+  a = (int)a + b;	/* { dg-error "invalid operator" } */
+#pragma omp atomic read
+  (int) a = b;		/* { dg-error "lvalue required" } */
+#pragma omp atomic write
+  (int) a = b;		/* { dg-error "lvalue required" } */
+#pragma omp atomic read
+  a = (int) b;		/* { dg-error "lvalue required" } */
+#pragma omp atomic capture
+  (int) a = b += c;	/* { dg-error "lvalue required" } */
+#pragma omp atomic capture
+  { a += b; (int) c = a; }	/* { dg-error "lvalue required" } */
+#pragma omp atomic capture
+  { a += b; c = (int) a; }	/* { dg-error "lvalue required" } */
+}
diff --git a/gcc/testsuite/gcc.dg/gomp/pr67500.c b/gcc/testsuite/gcc.dg/gomp/pr67500.c
new file mode 100644
index 00000000000..13a6903d72d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/pr67500.c
@@ -0,0 +1,42 @@
+/* PR c/67500 */
+/* { dg-do compile } */
+/* { dg-options "-fopenmp" } */
+
+#pragma omp declare simd simdlen(d)	/* { dg-error "clause expression must be positive constant integer expression" } */
+void f1 (int);				/* { dg-error "undeclared here" "" { target *-*-* } 5 } */
+#pragma omp declare simd simdlen(0.5)	/* { dg-error "clause expression must be positive constant integer expression" } */
+void f2 (int);
+#pragma omp declare simd simdlen(-2)	/* { dg-error "clause expression must be positive constant integer expression" } */
+void f3 (int);
+#pragma omp declare simd simdlen(0)	/* { dg-error "clause expression must be positive constant integer expression" } */
+void f4 (int);
+
+void
+foo (int *p)
+{
+  int i;
+  #pragma omp simd safelen(d)		/* { dg-error "must be positive constant integer expression" } */
+  for (i = 0; i < 16; ++i)		/* { dg-error "undeclared" "" { target *-*-* } 18 } */
+    ;
+  #pragma omp simd safelen(0.5)		/* { dg-error "must be positive constant integer expression" } */
+  for (i = 0; i < 16; ++i)
+    ;
+  #pragma omp simd safelen(-2)		/* { dg-error "must be positive constant integer expression" } */
+  for (i = 0; i < 16; ++i)
+    ;
+  #pragma omp simd safelen(0)		/* { dg-error "must be positive constant integer expression" } */
+  for (i = 0; i < 16; ++i)
+    ;
+  #pragma omp simd aligned(p:d)		/* { dg-error "must be positive constant integer expression" } */
+  for (i = 0; i < 16; ++i)
+    ;
+  #pragma omp simd aligned(p:0.5)	/* { dg-error "must be positive constant integer expression" } */
+  for (i = 0; i < 16; ++i)
+    ;
+  #pragma omp simd aligned(p:-2)	/* { dg-error "must be positive constant integer expression" } */
+  for (i = 0; i < 16; ++i)
+    ;
+  #pragma omp simd aligned(p:0)		/* { dg-error "must be positive constant integer expression" } */
+  for (i = 0; i < 16; ++i)
+    ;
+}
diff --git a/gcc/testsuite/gcc.dg/graphite/block-0.c b/gcc/testsuite/gcc.dg/graphite/block-0.c
index cb08a5fe56f..24b3bd060a9 100644
--- a/gcc/testsuite/gcc.dg/graphite/block-0.c
+++ b/gcc/testsuite/gcc.dg/graphite/block-0.c
@@ -42,4 +42,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "not tiled" 3 "graphite" } } */
+/* { dg-final { scan-tree-dump-times "not tiled" 2 "graphite" } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/block-1.c b/gcc/testsuite/gcc.dg/graphite/block-1.c
index 19f9f20c3bc..bb81a95d421 100644
--- a/gcc/testsuite/gcc.dg/graphite/block-1.c
+++ b/gcc/testsuite/gcc.dg/graphite/block-1.c
@@ -45,4 +45,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "tiled by" 5 "graphite" } } */
+/* { dg-final { scan-tree-dump-times "tiled by" 6 "graphite" } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/block-5.c b/gcc/testsuite/gcc.dg/graphite/block-5.c
index d30abf80fda..2f4b2f503b0 100644
--- a/gcc/testsuite/gcc.dg/graphite/block-5.c
+++ b/gcc/testsuite/gcc.dg/graphite/block-5.c
@@ -53,4 +53,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "tiled by" 3 "graphite" } } */
+/* { dg-final { scan-tree-dump-times "tiled by" 4 "graphite" } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/block-6.c b/gcc/testsuite/gcc.dg/graphite/block-6.c
index 9f03448b957..36e9783d151 100644
--- a/gcc/testsuite/gcc.dg/graphite/block-6.c
+++ b/gcc/testsuite/gcc.dg/graphite/block-6.c
@@ -48,4 +48,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "tiled by" 3 "graphite" } } */
+/* { dg-final { scan-tree-dump-times "tiled by" 4 "graphite" } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/interchange-1.c b/gcc/testsuite/gcc.dg/graphite/interchange-1.c
index b9f12c7d20d..2c58ac21e98 100644
--- a/gcc/testsuite/gcc.dg/graphite/interchange-1.c
+++ b/gcc/testsuite/gcc.dg/graphite/interchange-1.c
@@ -49,4 +49,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "tiled by" 2 "graphite" } } */
+/* { dg-final { scan-tree-dump-times "tiled by" 3 "graphite" } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/interchange-10.c b/gcc/testsuite/gcc.dg/graphite/interchange-10.c
index 29e11c72257..9d486448d08 100644
--- a/gcc/testsuite/gcc.dg/graphite/interchange-10.c
+++ b/gcc/testsuite/gcc.dg/graphite/interchange-10.c
@@ -46,4 +46,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "tiled by" 4 "graphite" } } */
+/* { dg-final { scan-tree-dump-times "tiled by" 6 "graphite" } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/interchange-11.c b/gcc/testsuite/gcc.dg/graphite/interchange-11.c
index afd71230a63..4f6918dc691 100644
--- a/gcc/testsuite/gcc.dg/graphite/interchange-11.c
+++ b/gcc/testsuite/gcc.dg/graphite/interchange-11.c
@@ -46,4 +46,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "tiled by" 2 "graphite" } } */
+/* { dg-final { scan-tree-dump-times "tiled by" 3 "graphite" } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/interchange-13.c b/gcc/testsuite/gcc.dg/graphite/interchange-13.c
index 0e722e2632e..c9ea048e482 100644
--- a/gcc/testsuite/gcc.dg/graphite/interchange-13.c
+++ b/gcc/testsuite/gcc.dg/graphite/interchange-13.c
@@ -50,4 +50,4 @@ main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "tiled by" 2 "graphite" } } */
+/* { dg-final { scan-tree-dump-times "tiled by" 3 "graphite" } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/interchange-14.c b/gcc/testsuite/gcc.dg/graphite/interchange-14.c
index 55c600247c0..151bfe71f1f 100644
--- a/gcc/testsuite/gcc.dg/graphite/interchange-14.c
+++ b/gcc/testsuite/gcc.dg/graphite/interchange-14.c
@@ -54,4 +54,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "tiled by" 7 "graphite" } } */
+/* { dg-final { scan-tree-dump-times "tiled by" 6 "graphite" } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/interchange-3.c b/gcc/testsuite/gcc.dg/graphite/interchange-3.c
index cdc02020197..ebdeef7ea8e 100644
--- a/gcc/testsuite/gcc.dg/graphite/interchange-3.c
+++ b/gcc/testsuite/gcc.dg/graphite/interchange-3.c
@@ -47,4 +47,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "tiled by" 2 "graphite" } } */
+/* { dg-final { scan-tree-dump-times "tiled by" 3 "graphite" } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/interchange-4.c b/gcc/testsuite/gcc.dg/graphite/interchange-4.c
index 67125658286..9a50e7a0833 100644
--- a/gcc/testsuite/gcc.dg/graphite/interchange-4.c
+++ b/gcc/testsuite/gcc.dg/graphite/interchange-4.c
@@ -46,4 +46,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "tiled by" 2 "graphite" } } */
+/* { dg-final { scan-tree-dump-times "tiled by" 3 "graphite" } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/interchange-7.c b/gcc/testsuite/gcc.dg/graphite/interchange-7.c
index d99a16a291a..e53d30e8cfe 100644
--- a/gcc/testsuite/gcc.dg/graphite/interchange-7.c
+++ b/gcc/testsuite/gcc.dg/graphite/interchange-7.c
@@ -46,4 +46,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "tiled by" 2 "graphite" } } */
+/* { dg-final { scan-tree-dump-times "tiled by" 3 "graphite" } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/interchange-8.c b/gcc/testsuite/gcc.dg/graphite/interchange-8.c
index 123106bb475..c5e714175a0 100644
--- a/gcc/testsuite/gcc.dg/graphite/interchange-8.c
+++ b/gcc/testsuite/gcc.dg/graphite/interchange-8.c
@@ -82,4 +82,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "tiled by" 5 "graphite" } } */
+/* { dg-final { scan-tree-dump-times "tiled by" 6 "graphite" } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/interchange-9.c b/gcc/testsuite/gcc.dg/graphite/interchange-9.c
index e4c54ae181d..44a5452213e 100644
--- a/gcc/testsuite/gcc.dg/graphite/interchange-9.c
+++ b/gcc/testsuite/gcc.dg/graphite/interchange-9.c
@@ -44,4 +44,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "tiled by" 3 "graphite" } } */
+/* { dg-final { scan-tree-dump-times "tiled by" 4 "graphite" } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/isl-codegen-loop-dumping.c b/gcc/testsuite/gcc.dg/graphite/isl-codegen-loop-dumping.c
index cb5d802db8e..70ac24c46d7 100644
--- a/gcc/testsuite/gcc.dg/graphite/isl-codegen-loop-dumping.c
+++ b/gcc/testsuite/gcc.dg/graphite/isl-codegen-loop-dumping.c
@@ -12,4 +12,6 @@ main (int n, int *a)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "ISL AST generated by ISL: \nfor \\(int c1 = 0; c1 < n - 1; c1 \\+= 1\\)\n  for \\(int c3 = 0; c3 < n; c3 \\+= 1\\)\n    S_4\\(c1, c3\\);" 1 "graphite"} } */
+/* { dg-final { scan-tree-dump-times "ISL AST generated by ISL: \n\\{\n  S_2\\();\n  if \\(P_19 >= 1\\)\n
+    for \\(int c1 = 0; c1 < n - 1; c1 \\+= 1\\) \\{ \n      for \\(int c3 = 0; c3 < n; c3 \\+= 1\\)\n
+        S_4\\(c1, c3\\); \n      S_6\\(c1\\);\n    \\}    \n\\}" 1 "graphite"} } */
diff --git a/gcc/testsuite/gcc.dg/graphite/pr35356-1.c b/gcc/testsuite/gcc.dg/graphite/pr35356-1.c
index 10aa49337b7..7f0e8246e03 100644
--- a/gcc/testsuite/gcc.dg/graphite/pr35356-1.c
+++ b/gcc/testsuite/gcc.dg/graphite/pr35356-1.c
@@ -11,6 +11,10 @@ foo (int bar, int n, int k)
     if (i == k)
       a[i] = bar;
 
+  for (i = 0; i < n; i++)
+    if (i == k)
+      a[i] = bar;
+
   return a[bar];
 }
 
diff --git a/gcc/testsuite/gcc.dg/graphite/pr37485.c b/gcc/testsuite/gcc.dg/graphite/pr37485.c
index 0a6dfbceefc..47138d303af 100644
--- a/gcc/testsuite/gcc.dg/graphite/pr37485.c
+++ b/gcc/testsuite/gcc.dg/graphite/pr37485.c
@@ -31,4 +31,4 @@ void fallbackSort ( UInt32* fmap,
    AssertH ( j < 256, 1005 );
 }
 
-/* { dg-final { scan-tree-dump-times "tiled by" 1 "graphite" } } */
+/* { dg-final { scan-tree-dump-times "tiled by" 4 "graphite" } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/scop-0.c b/gcc/testsuite/gcc.dg/graphite/scop-0.c
index 9cfd5dd14dc..abeabce98a8 100644
--- a/gcc/testsuite/gcc.dg/graphite/scop-0.c
+++ b/gcc/testsuite/gcc.dg/graphite/scop-0.c
@@ -9,7 +9,7 @@ int toto()
   int b[100];
   int N = foo ();
 
-  for (i = 0; i < 2*N+ 100; i++)
+  for (i = 0; i < N+ 100; i++)
     for (j = 0; j < 200; j++)
       a[j][i] = a[j+1][10] + 2;
 
diff --git a/gcc/testsuite/gcc.dg/graphite/scop-1.c b/gcc/testsuite/gcc.dg/graphite/scop-1.c
index 16070d425b8..a569065d095 100644
--- a/gcc/testsuite/gcc.dg/graphite/scop-1.c
+++ b/gcc/testsuite/gcc.dg/graphite/scop-1.c
@@ -27,4 +27,4 @@ int toto()
   return a[3][5] + b[1];
 }
 
-/* { dg-final { scan-tree-dump-times "number of SCoPs: 3" 1 "graphite"} } */ 
+/* { dg-final { scan-tree-dump-times "number of SCoPs: 1" 1 "graphite"} } */
diff --git a/gcc/testsuite/gcc.dg/graphite/scop-10.c b/gcc/testsuite/gcc.dg/graphite/scop-10.c
index f14aabce4c3..39ed5d7ea7b 100644
--- a/gcc/testsuite/gcc.dg/graphite/scop-10.c
+++ b/gcc/testsuite/gcc.dg/graphite/scop-10.c
@@ -12,8 +12,6 @@ int toto()
         b[i+j] = b[i+j-1] + 2;
 
       if (i * 2 == i + 8)
-        bar ();
-      else 
         {
 	  for (j = 1; j < 100; j++)
 	    b[i+j] = b[i+j-1] + 2;
@@ -27,4 +25,4 @@ int toto()
   return a[3][5] + b[1];
 }
 
-/* { dg-final { scan-tree-dump-times "number of SCoPs: 3" 1 "graphite"} } */ 
+/* { dg-final { scan-tree-dump-times "number of SCoPs: 1" 1 "graphite"} } */
diff --git a/gcc/testsuite/gcc.dg/graphite/scop-11.c b/gcc/testsuite/gcc.dg/graphite/scop-11.c
index 4a7286993c5..97fe5393b37 100644
--- a/gcc/testsuite/gcc.dg/graphite/scop-11.c
+++ b/gcc/testsuite/gcc.dg/graphite/scop-11.c
@@ -10,7 +10,6 @@ int toto()
       for (j = 0; j <= 20; j++)
         a[j] = b + i;
       b = 3;
-      bar();
     }
   else 
     {
@@ -28,4 +27,4 @@ int toto()
   return a[b];
 }
 
-/* { dg-final { scan-tree-dump-times "number of SCoPs: 3" 1 "graphite"} } */ 
+/* { dg-final { scan-tree-dump-times "number of SCoPs: 1" 1 "graphite"} } */
diff --git a/gcc/testsuite/gcc.dg/graphite/scop-12.c b/gcc/testsuite/gcc.dg/graphite/scop-12.c
index 221d987bfba..68e12050488 100644
--- a/gcc/testsuite/gcc.dg/graphite/scop-12.c
+++ b/gcc/testsuite/gcc.dg/graphite/scop-12.c
@@ -32,4 +32,4 @@ int toto()
   return a[b];
 }
 
-/* { dg-final { scan-tree-dump-times "number of SCoPs: 5" 1 "graphite"} } */ 
+/* { dg-final { scan-tree-dump-times "number of SCoPs: 0" 1 "graphite"} } */
diff --git a/gcc/testsuite/gcc.dg/graphite/scop-13.c b/gcc/testsuite/gcc.dg/graphite/scop-13.c
index 195b7569389..53a17196d3e 100644
--- a/gcc/testsuite/gcc.dg/graphite/scop-13.c
+++ b/gcc/testsuite/gcc.dg/graphite/scop-13.c
@@ -37,4 +37,4 @@ int toto()
   return a[b];
 }
 
-/* { dg-final { scan-tree-dump-times "number of SCoPs: 2" 1 "graphite"} } */ 
+/* { dg-final { scan-tree-dump-times "number of SCoPs: 0" 1 "graphite"} } */
diff --git a/gcc/testsuite/gcc.dg/graphite/scop-16.c b/gcc/testsuite/gcc.dg/graphite/scop-16.c
index cacd564e9e5..676817014b2 100644
--- a/gcc/testsuite/gcc.dg/graphite/scop-16.c
+++ b/gcc/testsuite/gcc.dg/graphite/scop-16.c
@@ -21,4 +21,4 @@ int test ()
       foo (a[i][j]); 
 }
 
-/* { dg-final { scan-tree-dump-times "number of SCoPs: 2" 1 "graphite"} } */
+/* { dg-final { scan-tree-dump-times "number of SCoPs: 1" 1 "graphite"} } */
diff --git a/gcc/testsuite/gcc.dg/graphite/scop-17.c b/gcc/testsuite/gcc.dg/graphite/scop-17.c
index 2252766b796..3c0d8804549 100644
--- a/gcc/testsuite/gcc.dg/graphite/scop-17.c
+++ b/gcc/testsuite/gcc.dg/graphite/scop-17.c
@@ -20,4 +20,4 @@ int test ()
       foo (a[i][j]); 
 }
 
-/* { dg-final { scan-tree-dump-times "number of SCoPs: 2" 1 "graphite"} } */
+/* { dg-final { scan-tree-dump-times "number of SCoPs: 1" 1 "graphite"} } */
diff --git a/gcc/testsuite/gcc.dg/graphite/scop-18.c b/gcc/testsuite/gcc.dg/graphite/scop-18.c
index 6e1080bc701..3416304075d 100644
--- a/gcc/testsuite/gcc.dg/graphite/scop-18.c
+++ b/gcc/testsuite/gcc.dg/graphite/scop-18.c
@@ -22,4 +22,4 @@ void test (void)
         A[i][j] = B[i][k] * C[k][j];
 }
 
-/* { dg-final { scan-tree-dump-times "number of SCoPs: 2" 1 "graphite"} } */ 
+/* { dg-final { scan-tree-dump-times "number of SCoPs: 1" 1 "graphite"} } */
diff --git a/gcc/testsuite/gcc.dg/graphite/scop-2.c b/gcc/testsuite/gcc.dg/graphite/scop-2.c
index a16717c600a..fb1a4e7b692 100644
--- a/gcc/testsuite/gcc.dg/graphite/scop-2.c
+++ b/gcc/testsuite/gcc.dg/graphite/scop-2.c
@@ -35,4 +35,4 @@ int toto()
   return a[3][5] + b[1];
 }
 
-/* { dg-final { scan-tree-dump-times "number of SCoPs: 4" 1 "graphite"} } */ 
+/* { dg-final { scan-tree-dump-times "number of SCoPs: 2" 1 "graphite"} } */
diff --git a/gcc/testsuite/gcc.dg/graphite/scop-21.c b/gcc/testsuite/gcc.dg/graphite/scop-21.c
index 48a6d2ff5a8..bd3f811d9d1 100644
--- a/gcc/testsuite/gcc.dg/graphite/scop-21.c
+++ b/gcc/testsuite/gcc.dg/graphite/scop-21.c
@@ -6,6 +6,9 @@ int test ()
   int i;
 
   for (i = 0; i < N; i++)
+    a[i] += 32;
+
+  for (i = 0; i < N; i++)
     {
       a[i] = i + 12;
 
diff --git a/gcc/testsuite/gcc.dg/graphite/scop-22.c b/gcc/testsuite/gcc.dg/graphite/scop-22.c
index c936428b7bb..6ff5ccd5b56 100644
--- a/gcc/testsuite/gcc.dg/graphite/scop-22.c
+++ b/gcc/testsuite/gcc.dg/graphite/scop-22.c
@@ -7,6 +7,9 @@ void foo(int N, int *res)
   double sum = 0.0;
 
   for (i = 0; i < N; i++)
+    sum += u[i];
+
+  for (i = 0; i < N; i++)
     {
       a = u[i];
       u[i] = i * i;
diff --git a/gcc/testsuite/gcc.dg/graphite/scop-4.c b/gcc/testsuite/gcc.dg/graphite/scop-4.c
index c6d719e2dd0..4fb0e5ea471 100644
--- a/gcc/testsuite/gcc.dg/graphite/scop-4.c
+++ b/gcc/testsuite/gcc.dg/graphite/scop-4.c
@@ -25,4 +25,4 @@ int toto()
   return a[3][5] + b[1];
 }
 
-/* { dg-final { scan-tree-dump-times "number of SCoPs: 2" 1 "graphite"} } */ 
+/* { dg-final { scan-tree-dump-times "number of SCoPs: 1" 1 "graphite"} } */
diff --git a/gcc/testsuite/gcc.dg/graphite/scop-5.c b/gcc/testsuite/gcc.dg/graphite/scop-5.c
index fa1c64a0ca8..8309257554c 100644
--- a/gcc/testsuite/gcc.dg/graphite/scop-5.c
+++ b/gcc/testsuite/gcc.dg/graphite/scop-5.c
@@ -9,6 +9,8 @@ int toto()
     {
       for (j = 0; j <= 20; j++)
         a[j] = b + i;
+      for (j = 2; j <= 23; j++)
+        a[j] = b + i;
       b = 3;
       bar();
     }
@@ -31,4 +33,4 @@ int toto()
   return a[b];
 }
 
-/* { dg-final { scan-tree-dump-times "number of SCoPs: 3" 1 "graphite"} } */ 
+/* { dg-final { scan-tree-dump-times "number of SCoPs: 2" 1 "graphite"} } */
diff --git a/gcc/testsuite/gcc.dg/graphite/scop-6.c b/gcc/testsuite/gcc.dg/graphite/scop-6.c
index 2a45d6ee559..1da486a2ddf 100644
--- a/gcc/testsuite/gcc.dg/graphite/scop-6.c
+++ b/gcc/testsuite/gcc.dg/graphite/scop-6.c
@@ -17,7 +17,6 @@ int toto()
         {
         for (k = 1; k < 100; k++)
           b[i+k] = b[i+k-1] + 2;
-        bar ();
         }
       
       for (k = 1; k < 100; k++)
@@ -27,4 +26,4 @@ int toto()
   return a[3][5] + b[1];
 }
 
-/* { dg-final { scan-tree-dump-times "number of SCoPs: 3" 1 "graphite"} } */ 
+/* { dg-final { scan-tree-dump-times "number of SCoPs: 1" 1 "graphite"} } */
diff --git a/gcc/testsuite/gcc.dg/graphite/scop-7.c b/gcc/testsuite/gcc.dg/graphite/scop-7.c
index 5866ca736f6..3e337d0c603 100644
--- a/gcc/testsuite/gcc.dg/graphite/scop-7.c
+++ b/gcc/testsuite/gcc.dg/graphite/scop-7.c
@@ -13,7 +13,6 @@ int toto()
 
       if (i * 2 == i + 8)
 	{
-	  bar ();
 	  for (j = 1; j < 100; j++)
 	    b[i+j] = b[i+j-1] + 2;
 	}
@@ -27,4 +26,4 @@ int toto()
   return a[3][5] + b[1];
 }
 
-/* { dg-final { scan-tree-dump-times "number of SCoPs: 3" 1 "graphite"} } */
+/* { dg-final { scan-tree-dump-times "number of SCoPs: 1" 1 "graphite"} } */
diff --git a/gcc/testsuite/gcc.dg/graphite/scop-8.c b/gcc/testsuite/gcc.dg/graphite/scop-8.c
index 9cdc69f670d..71d5c531fb8 100644
--- a/gcc/testsuite/gcc.dg/graphite/scop-8.c
+++ b/gcc/testsuite/gcc.dg/graphite/scop-8.c
@@ -14,8 +14,7 @@ int toto()
       if (i * 2 == i + 8)
 	{
 	  for (j = 1; j < 100; j++)
-	    if (bar ())
-	      b[i+j] = b[i+j-1] + 2;
+	    b[i+j] = b[i+j-1] + 2;
 	}
       else 
 	a[i][i] = 2;
@@ -27,4 +26,4 @@ int toto()
   return a[3][5] + b[1];
 }
 
-/* { dg-final { scan-tree-dump-times "number of SCoPs: 2" 1 "graphite"} } */
+/* { dg-final { scan-tree-dump-times "number of SCoPs: 1" 1 "graphite"} } */
diff --git a/gcc/testsuite/gcc.dg/graphite/scop-9.c b/gcc/testsuite/gcc.dg/graphite/scop-9.c
index d879d9afda3..93888728b0d 100644
--- a/gcc/testsuite/gcc.dg/graphite/scop-9.c
+++ b/gcc/testsuite/gcc.dg/graphite/scop-9.c
@@ -12,8 +12,6 @@ int toto()
         b[i+j] = b[i+j-1] + 2;
 
       if (i * 2 == i + 8)
-        bar ();
-      else 
 	a[i][i] = 2;
 
       for (k = 1; k < 100; k++)
@@ -23,4 +21,4 @@ int toto()
   return a[3][5] + b[1];
 }
 
-/* { dg-final { scan-tree-dump-times "number of SCoPs: 2" 1 "graphite"} } */
+/* { dg-final { scan-tree-dump-times "number of SCoPs: 1" 1 "graphite"} } */
diff --git a/gcc/testsuite/gcc.dg/graphite/scop-mvt.c b/gcc/testsuite/gcc.dg/graphite/scop-mvt.c
index 5d3d19f7413..442a3a0bafa 100644
--- a/gcc/testsuite/gcc.dg/graphite/scop-mvt.c
+++ b/gcc/testsuite/gcc.dg/graphite/scop-mvt.c
@@ -8,16 +8,16 @@ void mvt(long N) {
 
     for (i=0; i<N; i++) {
         for (j=0; j<N; j++) {
-            x1[i] = x1[i] + a[i][j] * y_1[j];
+            x1[j] = x1[j] + a[i][j] * y_1[j];
         }
     }
     
     for (i=0; i<N; i++) {
         for (j=0; j<N; j++) {
-            x2[i] = x2[i] + a[j][i] * y_2[j];
+            x2[j] = x2[j] + a[j][i] * y_2[j];
         }
     }
 }
 
-/* { dg-final { scan-tree-dump-times "number of SCoPs: 2" 1 "graphite" } } */
+/* { dg-final { scan-tree-dump-times "number of SCoPs: 1" 1 "graphite" } } */
 
diff --git a/gcc/testsuite/gcc.dg/graphite/uns-block-1.c b/gcc/testsuite/gcc.dg/graphite/uns-block-1.c
index 12a62919b5f..64ca761c40c 100644
--- a/gcc/testsuite/gcc.dg/graphite/uns-block-1.c
+++ b/gcc/testsuite/gcc.dg/graphite/uns-block-1.c
@@ -45,4 +45,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "tiled by" 4 "graphite" } } */
+/* { dg-final { scan-tree-dump-times "tiled by" 5 "graphite" } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/uns-interchange-12.c b/gcc/testsuite/gcc.dg/graphite/uns-interchange-12.c
index d9c07e2fe21..4e3c705a13a 100644
--- a/gcc/testsuite/gcc.dg/graphite/uns-interchange-12.c
+++ b/gcc/testsuite/gcc.dg/graphite/uns-interchange-12.c
@@ -54,4 +54,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "tiled by" 4 "graphite" } } */
+/* { dg-final { scan-tree-dump "tiled by" "graphite" } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/uns-interchange-14.c b/gcc/testsuite/gcc.dg/graphite/uns-interchange-14.c
index 7ef575b667d..a9d4950a525 100644
--- a/gcc/testsuite/gcc.dg/graphite/uns-interchange-14.c
+++ b/gcc/testsuite/gcc.dg/graphite/uns-interchange-14.c
@@ -55,4 +55,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "tiled by" 6 "graphite" } } */
+/* { dg-final { scan-tree-dump "tiled by" "graphite" } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/uns-interchange-15.c b/gcc/testsuite/gcc.dg/graphite/uns-interchange-15.c
index 0e32fd61456..fe2669f1578 100644
--- a/gcc/testsuite/gcc.dg/graphite/uns-interchange-15.c
+++ b/gcc/testsuite/gcc.dg/graphite/uns-interchange-15.c
@@ -49,4 +49,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "tiled by" 2 "graphite" } } */
+/* { dg-final { scan-tree-dump "tiled by" "graphite" } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/uns-interchange-9.c b/gcc/testsuite/gcc.dg/graphite/uns-interchange-9.c
index 31b132253c6..601169ec39e 100644
--- a/gcc/testsuite/gcc.dg/graphite/uns-interchange-9.c
+++ b/gcc/testsuite/gcc.dg/graphite/uns-interchange-9.c
@@ -45,4 +45,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "tiled by" 2 "graphite" } } */
+/* { dg-final { scan-tree-dump-times "tiled by" 3 "graphite" } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/uns-interchange-mvt.c b/gcc/testsuite/gcc.dg/graphite/uns-interchange-mvt.c
index eebece38698..211c9ab82bd 100644
--- a/gcc/testsuite/gcc.dg/graphite/uns-interchange-mvt.c
+++ b/gcc/testsuite/gcc.dg/graphite/uns-interchange-mvt.c
@@ -59,4 +59,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "tiled by" 3 "graphite" } } */
+/* { dg-final { scan-tree-dump "tiled by" "graphite" } } */
diff --git a/gcc/testsuite/gcc.dg/lto/pr67452_0.c b/gcc/testsuite/gcc.dg/lto/pr67452_0.c
new file mode 100644
index 00000000000..a4984ffcc9a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/pr67452_0.c
@@ -0,0 +1,23 @@
+/* { dg-lto-do link } */
+/* { dg-lto-options { { -O2 -flto -fopenmp-simd } } } */
+
+float b[3][3];
+
+__attribute__((used, noinline)) void
+foo ()
+{
+  int v1, v2;
+#pragma omp simd collapse(2)
+  for (v1 = 0; v1 < 3; v1++)
+    for (v2 = 0; v2 < 3; v2++)
+      b[v1][v2] = 2.5;
+}
+
+int
+main ()
+{
+  asm volatile ("" : : "g" (b) : "memory");
+  foo ();
+  asm volatile ("" : : "g" (b) : "memory");
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/pie-link.c b/gcc/testsuite/gcc.dg/pie-link.c
index c16086cc19e..2be07615f96 100644
--- a/gcc/testsuite/gcc.dg/pie-link.c
+++ b/gcc/testsuite/gcc.dg/pie-link.c
@@ -1,5 +1,5 @@
 /* { dg-do link { target pie } } */
-/* { dg-options "-fpie" } */
+/* { dg-options "-fpie -pie" } */
 
 int main(void)
 {
diff --git a/gcc/testsuite/gcc.dg/pr67432.c b/gcc/testsuite/gcc.dg/pr67432.c
new file mode 100644
index 00000000000..74367a97251
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr67432.c
@@ -0,0 +1,6 @@
+/* PR c/67432 */
+/* { dg-do compile } */
+
+enum {}; /* { dg-error "empty enum is invalid" } */
+enum E {}; /* { dg-error "empty enum is invalid" } */
+enum F {} e; /* { dg-error "empty enum is invalid" } */
diff --git a/gcc/testsuite/gcc.dg/pr67512.c b/gcc/testsuite/gcc.dg/pr67512.c
new file mode 100644
index 00000000000..95f836aea00
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr67512.c
@@ -0,0 +1,15 @@
+/* PR middle-end/67512 */
+/* { dg-do compile }  */
+/* { dg-options "-O -Wuninitialized" } */
+
+extern int fn2 (void);
+extern int fn3 (int);
+void
+fn1 (void)
+{
+  int z, m;
+  if (1 & m) /* { dg-warning "is used uninitialized" } */
+    z = fn2 ();
+  z = 1 == m ? z : 2 == m;
+  fn3 (z);
+}
diff --git a/gcc/testsuite/gcc.dg/ubsan/pr67279.c b/gcc/testsuite/gcc.dg/ubsan/pr67279.c
new file mode 100644
index 00000000000..5b5db42f96a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ubsan/pr67279.c
@@ -0,0 +1,14 @@
+/* PR sanitizer/67279 */
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=undefined -w" } */
+
+#define INT_MIN (-__INT_MAX__ - 1)
+
+void
+foo (void)
+{
+  static int a1 = 1 << 31;
+  static int a2 = 10 << 30;
+  static int a3 = 100 << 28;
+  static int a4 = INT_MIN / -1;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp
index ceada839d98..462696315e0 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp
@@ -52,8 +52,12 @@ if {[istarget arm*-*-*]} then {
 torture-init
 set-torture-options $C_TORTURE_OPTIONS {{}} $LTO_TORTURE_OPTIONS
 
-# Make sure Neon flags are provided, if necessary.
-set additional_flags [add_options_for_arm_neon ""]
+# Make sure Neon flags are provided, if necessary.  Use fp16 if we can.
+if {[check_effective_target_arm_neon_fp16_ok]} then {
+  set additional_flags [add_options_for_arm_neon_fp16 ""]
+} else {
+  set additional_flags [add_options_for_arm_neon ""]
+}
 
 # Main loop.
 gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.c]] \
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
index 4e728d5572c..49fbd843e50 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
@@ -7,6 +7,7 @@
 #include <inttypes.h>
 
 /* helper type, to help write floating point results in integer form.  */
+typedef uint16_t hfloat16_t;
 typedef uint32_t hfloat32_t;
 typedef uint64_t hfloat64_t;
 
@@ -132,6 +133,9 @@ static ARRAY(result, uint, 32, 2);
 static ARRAY(result, uint, 64, 1);
 static ARRAY(result, poly, 8, 8);
 static ARRAY(result, poly, 16, 4);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+static ARRAY(result, float, 16, 4);
+#endif
 static ARRAY(result, float, 32, 2);
 static ARRAY(result, int, 8, 16);
 static ARRAY(result, int, 16, 8);
@@ -143,6 +147,9 @@ static ARRAY(result, uint, 32, 4);
 static ARRAY(result, uint, 64, 2);
 static ARRAY(result, poly, 8, 16);
 static ARRAY(result, poly, 16, 8);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+static ARRAY(result, float, 16, 8);
+#endif
 static ARRAY(result, float, 32, 4);
 #ifdef __aarch64__
 static ARRAY(result, float, 64, 2);
@@ -160,6 +167,7 @@ extern ARRAY(expected, uint, 32, 2);
 extern ARRAY(expected, uint, 64, 1);
 extern ARRAY(expected, poly, 8, 8);
 extern ARRAY(expected, poly, 16, 4);
+extern ARRAY(expected, hfloat, 16, 4);
 extern ARRAY(expected, hfloat, 32, 2);
 extern ARRAY(expected, int, 8, 16);
 extern ARRAY(expected, int, 16, 8);
@@ -171,38 +179,11 @@ extern ARRAY(expected, uint, 32, 4);
 extern ARRAY(expected, uint, 64, 2);
 extern ARRAY(expected, poly, 8, 16);
 extern ARRAY(expected, poly, 16, 8);
+extern ARRAY(expected, hfloat, 16, 8);
 extern ARRAY(expected, hfloat, 32, 4);
 extern ARRAY(expected, hfloat, 64, 2);
 
-/* Check results. Operates on all possible vector types.  */
-#define CHECK_RESULTS(test_name,comment)				\
-  {									\
-    CHECK(test_name, int, 8, 8, PRIx8, expected, comment);		\
-    CHECK(test_name, int, 16, 4, PRIx16, expected, comment);		\
-    CHECK(test_name, int, 32, 2, PRIx32, expected, comment);		\
-    CHECK(test_name, int, 64, 1, PRIx64, expected, comment);		\
-    CHECK(test_name, uint, 8, 8, PRIx8, expected, comment);		\
-    CHECK(test_name, uint, 16, 4, PRIx16, expected, comment);		\
-    CHECK(test_name, uint, 32, 2, PRIx32, expected, comment);		\
-    CHECK(test_name, uint, 64, 1, PRIx64, expected, comment);		\
-    CHECK(test_name, poly, 8, 8, PRIx8, expected, comment);		\
-    CHECK(test_name, poly, 16, 4, PRIx16, expected, comment);		\
-    CHECK_FP(test_name, float, 32, 2, PRIx32, expected, comment);	\
-									\
-    CHECK(test_name, int, 8, 16, PRIx8, expected, comment);		\
-    CHECK(test_name, int, 16, 8, PRIx16, expected, comment);		\
-    CHECK(test_name, int, 32, 4, PRIx32, expected, comment);		\
-    CHECK(test_name, int, 64, 2, PRIx64, expected, comment);		\
-    CHECK(test_name, uint, 8, 16, PRIx8, expected, comment);		\
-    CHECK(test_name, uint, 16, 8, PRIx16, expected, comment);		\
-    CHECK(test_name, uint, 32, 4, PRIx32, expected, comment);		\
-    CHECK(test_name, uint, 64, 2, PRIx64, expected, comment);		\
-    CHECK(test_name, poly, 8, 16, PRIx8, expected, comment);		\
-    CHECK(test_name, poly, 16, 8, PRIx16, expected, comment);		\
-    CHECK_FP(test_name, float, 32, 4, PRIx32, expected, comment);	\
-  }									\
-
-#define CHECK_RESULTS_NAMED(test_name,EXPECTED,comment)			\
+#define CHECK_RESULTS_NAMED_NO_FP16(test_name,EXPECTED,comment)		\
   {									\
     CHECK(test_name, int, 8, 8, PRIx8, EXPECTED, comment);		\
     CHECK(test_name, int, 16, 4, PRIx16, EXPECTED, comment);		\
@@ -229,6 +210,24 @@ extern ARRAY(expected, hfloat, 64, 2);
     CHECK_FP(test_name, float, 32, 4, PRIx32, EXPECTED, comment);	\
   }									\
 
+/* Check results against EXPECTED.  Operates on all possible vector types.  */
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+#define CHECK_RESULTS_NAMED(test_name,EXPECTED,comment)			\
+  {									\
+    CHECK_RESULTS_NAMED_NO_FP16(test_name, EXPECTED, comment)		\
+    CHECK_FP(test_name, float, 16, 4, PRIx16, EXPECTED, comment);	\
+    CHECK_FP(test_name, float, 16, 8, PRIx16, EXPECTED, comment);	\
+  }
+#else
+#define CHECK_RESULTS_NAMED(test_name,EXPECTED,comment)		\
+  CHECK_RESULTS_NAMED_NO_FP16(test_name, EXPECTED, comment)
+#endif
+
+#define CHECK_RESULTS_NO_FP16(test_name,comment)			\
+  CHECK_RESULTS_NAMED_NO_FP16(test_name, expected, comment)
+
+#define CHECK_RESULTS(test_name,comment)		\
+  CHECK_RESULTS_NAMED(test_name, expected, comment)
 
 
 #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
@@ -380,6 +379,9 @@ static void clean_results (void)
   CLEAN(result, uint, 64, 1);
   CLEAN(result, poly, 8, 8);
   CLEAN(result, poly, 16, 4);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  CLEAN(result, float, 16, 4);
+#endif
   CLEAN(result, float, 32, 2);
 
   CLEAN(result, int, 8, 16);
@@ -392,6 +394,9 @@ static void clean_results (void)
   CLEAN(result, uint, 64, 2);
   CLEAN(result, poly, 8, 16);
   CLEAN(result, poly, 16, 8);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  CLEAN(result, float, 16, 8);
+#endif
   CLEAN(result, float, 32, 4);
 
 #if defined(__aarch64__)
@@ -443,21 +448,40 @@ static void clean_results (void)
   DECL_VARIABLE(VAR, uint, 64, 2)
 
 /* Declare all 64 bits variants.  */
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+#define DECL_VARIABLE_64BITS_VARIANTS(VAR)	\
+  DECL_VARIABLE_64BITS_SIGNED_VARIANTS(VAR);	\
+  DECL_VARIABLE_64BITS_UNSIGNED_VARIANTS(VAR);	\
+  DECL_VARIABLE(VAR, poly, 8, 8);		\
+  DECL_VARIABLE(VAR, poly, 16, 4);		\
+  DECL_VARIABLE(VAR, float, 16, 4);		\
+  DECL_VARIABLE(VAR, float, 32, 2)
+#else
 #define DECL_VARIABLE_64BITS_VARIANTS(VAR)	\
   DECL_VARIABLE_64BITS_SIGNED_VARIANTS(VAR);	\
   DECL_VARIABLE_64BITS_UNSIGNED_VARIANTS(VAR);	\
   DECL_VARIABLE(VAR, poly, 8, 8);		\
   DECL_VARIABLE(VAR, poly, 16, 4);		\
   DECL_VARIABLE(VAR, float, 32, 2)
+#endif
 
 /* Declare all 128 bits variants.  */
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 #define DECL_VARIABLE_128BITS_VARIANTS(VAR)	\
   DECL_VARIABLE_128BITS_SIGNED_VARIANTS(VAR);	\
   DECL_VARIABLE_128BITS_UNSIGNED_VARIANTS(VAR);	\
   DECL_VARIABLE(VAR, poly, 8, 16);		\
   DECL_VARIABLE(VAR, poly, 16, 8);		\
+  DECL_VARIABLE(VAR, float, 16, 8);		\
   DECL_VARIABLE(VAR, float, 32, 4)
-
+#else
+#define DECL_VARIABLE_128BITS_VARIANTS(VAR)	\
+  DECL_VARIABLE_128BITS_SIGNED_VARIANTS(VAR);	\
+  DECL_VARIABLE_128BITS_UNSIGNED_VARIANTS(VAR);	\
+  DECL_VARIABLE(VAR, poly, 8, 16);		\
+  DECL_VARIABLE(VAR, poly, 16, 8);		\
+  DECL_VARIABLE(VAR, float, 32, 4)
+#endif
 /* Declare all variants.  */
 #define DECL_VARIABLE_ALL_VARIANTS(VAR)		\
   DECL_VARIABLE_64BITS_VARIANTS(VAR);		\
@@ -476,6 +500,15 @@ static void clean_results (void)
 /* Helpers to initialize vectors.  */
 #define VDUP(VAR, Q, T1, T2, W, N, V)			\
   VECT_VAR(VAR, T1, W, N) = vdup##Q##_n_##T2##W(V)
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+/* Work around that there is no vdup_n_f16 intrinsic.  */
+#define vdup_n_f16(VAL)		\
+  __extension__			\
+    ({				\
+      float16_t f = VAL;	\
+      vld1_dup_f16(&f);		\
+    })
+#endif
 
 #define VSET_LANE(VAR, Q, T1, T2, W, N, L, V)				\
   VECT_VAR(VAR, T1, W, N) = vset##Q##_lane_##T2##W(V,			\
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/compute-ref-data.h b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/compute-ref-data.h
index 26203cc0a69..c8d43367bef 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/compute-ref-data.h
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/compute-ref-data.h
@@ -118,6 +118,10 @@ VECT_VAR_DECL_INIT(buffer, uint, 32, 2);
 PAD(buffer_pad, uint, 32, 2);
 VECT_VAR_DECL_INIT(buffer, uint, 64, 1);
 PAD(buffer_pad, uint, 64, 1);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+VECT_VAR_DECL_INIT(buffer, float, 16, 4);
+PAD(buffer_pad, float, 16, 4);
+#endif
 VECT_VAR_DECL_INIT(buffer, float, 32, 2);
 PAD(buffer_pad, float, 32, 2);
 VECT_VAR_DECL_INIT(buffer, int, 8, 16);
@@ -140,6 +144,10 @@ VECT_VAR_DECL_INIT(buffer, poly, 8, 16);
 PAD(buffer_pad, poly, 8, 16);
 VECT_VAR_DECL_INIT(buffer, poly, 16, 8);
 PAD(buffer_pad, poly, 16, 8);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+VECT_VAR_DECL_INIT(buffer, float, 16, 8);
+PAD(buffer_pad, float, 16, 8);
+#endif
 VECT_VAR_DECL_INIT(buffer, float, 32, 4);
 PAD(buffer_pad, float, 32, 4);
 #ifdef __aarch64__
@@ -170,6 +178,10 @@ VECT_VAR_DECL_INIT(buffer_dup, poly, 8, 8);
 VECT_VAR_DECL(buffer_dup_pad, poly, 8, 8);
 VECT_VAR_DECL_INIT(buffer_dup, poly, 16, 4);
 VECT_VAR_DECL(buffer_dup_pad, poly, 16, 4);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+VECT_VAR_DECL_INIT4(buffer_dup, float, 16, 4);
+VECT_VAR_DECL(buffer_dup_pad, float, 16, 4);
+#endif
 VECT_VAR_DECL_INIT4(buffer_dup, float, 32, 2);
 VECT_VAR_DECL(buffer_dup_pad, float, 32, 2);
 
@@ -193,5 +205,9 @@ VECT_VAR_DECL_INIT(buffer_dup, poly, 8, 16);
 VECT_VAR_DECL(buffer_dup_pad, poly, 8, 16);
 VECT_VAR_DECL_INIT(buffer_dup, poly, 16, 8);
 VECT_VAR_DECL(buffer_dup_pad, poly, 16, 8);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+VECT_VAR_DECL_INIT(buffer_dup, float, 16, 8);
+VECT_VAR_DECL(buffer_dup_pad, float, 16, 8);
+#endif
 VECT_VAR_DECL_INIT(buffer_dup, float, 32, 4);
 VECT_VAR_DECL(buffer_dup_pad, float, 32, 4);
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbsl.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbsl.c
index bb17f0a9649..c4fdbb45102 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbsl.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbsl.c
@@ -114,7 +114,7 @@ void exec_vbsl (void)
   TEST_VBSL(uint, , float, f, 32, 2);
   TEST_VBSL(uint, q, float, f, 32, 4);
 
-  CHECK_RESULTS (TEST_MSG, "");
+  CHECK_RESULTS_NO_FP16 (TEST_MSG, "");
 }
 
 int main (void)
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcombine.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcombine.c
index 295768a0348..5100375e5fe 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcombine.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcombine.c
@@ -27,6 +27,8 @@ VECT_VAR_DECL(expected,poly,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3,
 					 0x66, 0x66, 0x66, 0x66 };
 VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc1800000, 0xc1700000,
 					   0x40533333, 0x40533333 };
+VECT_VAR_DECL(expected,hfloat,16,8) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80,
+					   0x4080, 0x4080, 0x4080, 0x4080 };
 
 #define TEST_MSG "VCOMBINE"
 void exec_vcombine (void)
@@ -44,6 +46,9 @@ void exec_vcombine (void)
 
   /* Initialize input "vector64_a" from "buffer".  */
   TEST_MACRO_64BITS_VARIANTS_2_5(VLOAD, vector64_a, buffer);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  VLOAD(vector64_a, buffer, , float, f, 16, 4);
+#endif
   VLOAD(vector64_a, buffer, , float, f, 32, 2);
 
   /* Choose init value arbitrarily.  */
@@ -57,6 +62,9 @@ void exec_vcombine (void)
   VDUP(vector64_b, , uint, u, 64, 1, 0x88);
   VDUP(vector64_b, , poly, p, 8, 8, 0x55);
   VDUP(vector64_b, , poly, p, 16, 4, 0x66);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  VDUP(vector64_b, , float, f, 16, 4, 2.25);
+#endif
   VDUP(vector64_b, , float, f, 32, 2, 3.3f);
 
   clean_results ();
@@ -72,6 +80,9 @@ void exec_vcombine (void)
   TEST_VCOMBINE(uint, u, 64, 1, 2);
   TEST_VCOMBINE(poly, p, 8, 8, 16);
   TEST_VCOMBINE(poly, p, 16, 4, 8);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  TEST_VCOMBINE(float, f, 16, 4, 8);
+#endif
   TEST_VCOMBINE(float, f, 32, 2, 4);
 
   CHECK(TEST_MSG, int, 8, 16, PRIx8, expected, "");
@@ -84,6 +95,9 @@ void exec_vcombine (void)
   CHECK(TEST_MSG, uint, 64, 2, PRIx64, expected, "");
   CHECK(TEST_MSG, poly, 8, 16, PRIx8, expected, "");
   CHECK(TEST_MSG, poly, 16, 8, PRIx16, expected, "");
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected, "");
+#endif
   CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected, "");
 }
 
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcreate.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcreate.c
index b2289d3a628..b8b338ef3c0 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcreate.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcreate.c
@@ -16,6 +16,7 @@ VECT_VAR_DECL(expected,uint,64,1) [] = { 0x123456789abcdef0 };
 VECT_VAR_DECL(expected,poly,8,8) [] = { 0xf0, 0xde, 0xbc, 0x9a,
 					0x78, 0x56, 0x34, 0x12 };
 VECT_VAR_DECL(expected,poly,16,4) [] = { 0xdef0, 0x9abc, 0x5678, 0x1234 };
+VECT_VAR_DECL(expected,hfloat,16,4) [] = { 0xdef0, 0x9abc, 0x5678, 0x1234 };
 VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x9abcdef0, 0x12345678 };
 
 #define INSN_NAME vcreate
@@ -38,6 +39,9 @@ FNNAME (INSN_NAME)
   DECL_VAL(val, int, 16, 4);
   DECL_VAL(val, int, 32, 2);
   DECL_VAL(val, int, 64, 1);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  DECL_VAL(val, float, 16, 4);
+#endif
   DECL_VAL(val, float, 32, 2);
   DECL_VAL(val, uint, 8, 8);
   DECL_VAL(val, uint, 16, 4);
@@ -50,6 +54,9 @@ FNNAME (INSN_NAME)
   DECL_VARIABLE(vector_res, int, 16, 4);
   DECL_VARIABLE(vector_res, int, 32, 2);
   DECL_VARIABLE(vector_res, int, 64, 1);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  DECL_VARIABLE(vector_res, float, 16, 4);
+#endif
   DECL_VARIABLE(vector_res, float, 32, 2);
   DECL_VARIABLE(vector_res, uint, 8, 8);
   DECL_VARIABLE(vector_res, uint, 16, 4);
@@ -65,6 +72,9 @@ FNNAME (INSN_NAME)
   VECT_VAR(val, int, 16, 4) = 0x123456789abcdef0LL;
   VECT_VAR(val, int, 32, 2) = 0x123456789abcdef0LL;
   VECT_VAR(val, int, 64, 1) = 0x123456789abcdef0LL;
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  VECT_VAR(val, float, 16, 4) = 0x123456789abcdef0LL;
+#endif
   VECT_VAR(val, float, 32, 2) = 0x123456789abcdef0LL;
   VECT_VAR(val, uint, 8, 8) = 0x123456789abcdef0ULL;
   VECT_VAR(val, uint, 16, 4) = 0x123456789abcdef0ULL;
@@ -76,6 +86,9 @@ FNNAME (INSN_NAME)
   TEST_VCREATE(int, s, 8, 8);
   TEST_VCREATE(int, s, 16, 4);
   TEST_VCREATE(int, s, 32, 2);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  TEST_VCREATE(float, f, 16, 4);
+#endif
   TEST_VCREATE(float, f, 32, 2);
   TEST_VCREATE(int, s, 64, 1);
   TEST_VCREATE(uint, u, 8, 8);
@@ -95,6 +108,9 @@ FNNAME (INSN_NAME)
   CHECK(TEST_MSG, uint, 64, 1, PRIx64, expected, "");
   CHECK(TEST_MSG, poly, 8, 8, PRIx8, expected, "");
   CHECK(TEST_MSG, poly, 16, 4, PRIx16, expected, "");
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected, "");
+#endif
   CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected, "");
 }
 
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c
new file mode 100644
index 00000000000..48e50e18263
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c
@@ -0,0 +1,98 @@
+/* { dg-require-effective-target arm_neon_fp16_hw { target { arm*-*-* } } } */
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+#include <math.h>
+
+/* Expected results for vcvt.  */
+VECT_VAR_DECL (expected,hfloat,32,4) [] = { 0x41800000, 0x41700000,
+					    0x41600000, 0x41500000 };
+VECT_VAR_DECL (expected,hfloat,16,4) [] = { 0x3e00, 0x4100, 0x4300, 0x4480 };
+
+/* Expected results for vcvt_high_f32_f16.  */
+VECT_VAR_DECL (expected_high,hfloat,32,4) [] = { 0xc1400000, 0xc1300000,
+						 0xc1200000, 0xc1100000 };
+/* Expected results for vcvt_high_f16_f32.  */
+VECT_VAR_DECL (expected_high,hfloat,16,8) [] = { 0x4000, 0x4000, 0x4000, 0x4000,
+						 0xcc00, 0xcb80, 0xcb00, 0xca80 };
+
+void
+exec_vcvt (void)
+{
+  clean_results ();
+
+#define TEST_MSG vcvt_f32_f16
+  {
+    VECT_VAR_DECL (buffer_src, float, 16, 4) [] = { 16.0, 15.0, 14.0, 13.0 };
+
+    DECL_VARIABLE (vector_src, float, 16, 4);
+
+    VLOAD (vector_src, buffer_src, , float, f, 16, 4);
+    DECL_VARIABLE (vector_res, float, 32, 4) =
+	vcvt_f32_f16 (VECT_VAR (vector_src, float, 16, 4));
+    vst1q_f32 (VECT_VAR (result, float, 32, 4),
+	       VECT_VAR (vector_res, float, 32, 4));
+
+    CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected, "");
+  }
+#undef TEST_MSG
+
+  clean_results ();
+
+#define TEST_MSG vcvt_f16_f32
+  {
+    VECT_VAR_DECL (buffer_src, float, 32, 4) [] = { 1.5, 2.5, 3.5, 4.5 };
+    DECL_VARIABLE (vector_src, float, 32, 4);
+
+    VLOAD (vector_src, buffer_src, q, float, f, 32, 4);
+    DECL_VARIABLE (vector_res, float, 16, 4) =
+      vcvt_f16_f32 (VECT_VAR (vector_src, float, 32, 4));
+    vst1_f16 (VECT_VAR (result, float, 16, 4),
+	      VECT_VAR (vector_res, float, 16 ,4));
+
+    CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected, "");
+  }
+#undef TEST_MSG
+
+#if defined (__aarch64__)
+  clean_results ();
+
+#define TEST_MSG "vcvt_high_f32_f16"
+  {
+    DECL_VARIABLE (vector_src, float, 16, 8);
+    VLOAD (vector_src, buffer, q, float, f, 16, 8);
+    DECL_VARIABLE (vector_res, float, 32, 4);
+    VECT_VAR (vector_res, float, 32, 4) =
+      vcvt_high_f32_f16 (VECT_VAR (vector_src, float, 16, 8));
+    vst1q_f32 (VECT_VAR (result, float, 32, 4),
+	       VECT_VAR (vector_res, float, 32, 4));
+    CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected_high, "");
+  }
+#undef TEST_MSG
+  clean_results ();
+
+#define TEST_MSG "vcvt_high_f16_f32"
+  {
+    DECL_VARIABLE (vector_low, float, 16, 4);
+    VDUP (vector_low, , float, f, 16, 4, 2.0);
+
+    DECL_VARIABLE (vector_src, float, 32, 4);
+    VLOAD (vector_src, buffer, q, float, f, 32, 4);
+
+    DECL_VARIABLE (vector_res, float, 16, 8) =
+      vcvt_high_f16_f32 (VECT_VAR (vector_low, float, 16, 4),
+			 VECT_VAR (vector_src, float, 32, 4));
+    vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	       VECT_VAR (vector_res, float, 16, 8));
+
+    CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_high, "");
+  }
+#endif
+}
+
+int
+main (void)
+{
+  exec_vcvt ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c
index b5132f41ac4..22d45d56c8e 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c
@@ -187,13 +187,13 @@ void exec_vdup_vmov (void)
 
     switch (i) {
     case 0:
-      CHECK_RESULTS_NAMED (TEST_MSG, expected0, "");
+      CHECK_RESULTS_NAMED_NO_FP16 (TEST_MSG, expected0, "");
       break;
     case 1:
-      CHECK_RESULTS_NAMED (TEST_MSG, expected1, "");
+      CHECK_RESULTS_NAMED_NO_FP16 (TEST_MSG, expected1, "");
       break;
     case 2:
-      CHECK_RESULTS_NAMED (TEST_MSG, expected2, "");
+      CHECK_RESULTS_NAMED_NO_FP16 (TEST_MSG, expected2, "");
       break;
     default:
       abort();
@@ -232,13 +232,13 @@ void exec_vdup_vmov (void)
 
     switch (i) {
     case 0:
-      CHECK_RESULTS_NAMED (TEST_MSG, expected0, "");
+      CHECK_RESULTS_NAMED_NO_FP16 (TEST_MSG, expected0, "");
       break;
     case 1:
-      CHECK_RESULTS_NAMED (TEST_MSG, expected1, "");
+      CHECK_RESULTS_NAMED_NO_FP16 (TEST_MSG, expected1, "");
       break;
     case 2:
-      CHECK_RESULTS_NAMED (TEST_MSG, expected2, "");
+      CHECK_RESULTS_NAMED_NO_FP16 (TEST_MSG, expected2, "");
       break;
     default:
       abort();
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c
index c1ff6dd3007..ef708dcba17 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c
@@ -90,7 +90,7 @@ void exec_vdup_lane (void)
   TEST_VDUP_LANE(q, poly, p, 16, 8, 4, 1);
   TEST_VDUP_LANE(q, float, f, 32, 4, 2, 1);
 
-  CHECK_RESULTS (TEST_MSG, "");
+  CHECK_RESULTS_NO_FP16 (TEST_MSG, "");
 }
 
 int main (void)
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vext.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vext.c
index 0b014ebda87..98f88a69898 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vext.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vext.c
@@ -113,7 +113,7 @@ void exec_vext (void)
   TEST_VEXT(q, poly, p, 16, 8, 6);
   TEST_VEXT(q, float, f, 32, 4, 3);
 
-  CHECK_RESULTS (TEST_MSG, "");
+  CHECK_RESULTS_NO_FP16 (TEST_MSG, "");
 }
 
 int main (void)
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vget_high.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vget_high.c
index d7581125edd..9f0a1687f18 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vget_high.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vget_high.c
@@ -16,6 +16,7 @@ VECT_VAR_DECL(expected,uint,64,1) [] = { 0xfffffffffffffff1 };
 VECT_VAR_DECL(expected,poly,8,8) [] = { 0xf8, 0xf9, 0xfa, 0xfb,
 					0xfc, 0xfd, 0xfe, 0xff };
 VECT_VAR_DECL(expected,poly,16,4) [] = { 0xfff4, 0xfff5, 0xfff6, 0xfff7 };
+VECT_VAR_DECL(expected,hfloat,16,4) [] = { 0xca00, 0xc980, 0xc900, 0xc880 };
 VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1600000, 0xc1500000 };
 
 #define TEST_MSG "VGET_HIGH"
@@ -31,6 +32,9 @@ void exec_vget_high (void)
   DECL_VARIABLE_128BITS_VARIANTS(vector128);
 
   TEST_MACRO_128BITS_VARIANTS_2_5(VLOAD, vector128, buffer);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  VLOAD(vector128, buffer, q, float, f, 16, 8);
+#endif
   VLOAD(vector128, buffer, q, float, f, 32, 4);
 
   clean_results ();
@@ -46,6 +50,9 @@ void exec_vget_high (void)
   TEST_VGET_HIGH(uint, u, 64, 1, 2);
   TEST_VGET_HIGH(poly, p, 8, 8, 16);
   TEST_VGET_HIGH(poly, p, 16, 4, 8);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  TEST_VGET_HIGH(float, f, 16, 4, 8);
+#endif
   TEST_VGET_HIGH(float, f, 32, 2, 4);
 
   CHECK(TEST_MSG, int, 8, 8, PRIx8, expected, "");
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vget_low.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vget_low.c
index 12ecfc21ba0..2b875b9b7b8 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vget_low.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vget_low.c
@@ -16,6 +16,7 @@ VECT_VAR_DECL(expected,uint,64,1) [] = { 0xfffffffffffffff0 };
 VECT_VAR_DECL(expected,poly,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
 					0xf4, 0xf5, 0xf6, 0xf7 };
 VECT_VAR_DECL(expected,poly,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected,hfloat,16,4) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80 };
 VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1800000, 0xc1700000 };
 
 #define TEST_MSG "VGET_LOW"
@@ -31,6 +32,9 @@ void exec_vget_low (void)
   DECL_VARIABLE_128BITS_VARIANTS(vector128);
 
   TEST_MACRO_128BITS_VARIANTS_2_5(VLOAD, vector128, buffer);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  VLOAD(vector128, buffer, q, float, f, 16, 8);
+#endif
   VLOAD(vector128, buffer, q, float, f, 32, 4);
 
   clean_results ();
@@ -46,6 +50,9 @@ void exec_vget_low (void)
   TEST_VGET_LOW(uint, u, 64, 1, 2);
   TEST_VGET_LOW(poly, p, 8, 8, 16);
   TEST_VGET_LOW(poly, p, 16, 4, 8);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  TEST_VGET_LOW(float, f, 16, 4, 8);
+#endif
   TEST_VGET_LOW(float, f, 32, 2, 4);
 
   CHECK(TEST_MSG, int, 8, 8, PRIx8, expected, "");
@@ -58,6 +65,9 @@ void exec_vget_low (void)
   CHECK(TEST_MSG, uint, 64, 1, PRIx64, expected, "");
   CHECK(TEST_MSG, poly, 8, 8, PRIx8, expected, "");
   CHECK(TEST_MSG, poly, 16, 4, PRIx16, expected, "");
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected, "");
+#endif
   CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected, "");
 }
 
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1.c
index ced9d736d6d..4ed0e464f9c 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1.c
@@ -16,6 +16,7 @@ VECT_VAR_DECL(expected,uint,64,1) [] = { 0xfffffffffffffff0 };
 VECT_VAR_DECL(expected,poly,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
 					0xf4, 0xf5, 0xf6, 0xf7 };
 VECT_VAR_DECL(expected,poly,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected,hfloat,16,4) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80 };
 VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1800000, 0xc1700000 };
 VECT_VAR_DECL(expected,int,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
 					0xf4, 0xf5, 0xf6, 0xf7,
@@ -44,6 +45,8 @@ VECT_VAR_DECL(expected,poly,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
 					 0xfc, 0xfd, 0xfe, 0xff };
 VECT_VAR_DECL(expected,poly,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3,
 					 0xfff4, 0xfff5, 0xfff6, 0xfff7 };
+VECT_VAR_DECL(expected,hfloat,16,8) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80,
+					   0xca00, 0xc980, 0xc900, 0xc880 };
 VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc1800000, 0xc1700000,
 					   0xc1600000, 0xc1500000 };
 
@@ -62,6 +65,10 @@ void exec_vld1 (void)
 
   TEST_MACRO_ALL_VARIANTS_2_5(TEST_VLD1, vector, buffer);
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  TEST_VLD1(vector, buffer, , float, f, 16, 4);
+  TEST_VLD1(vector, buffer, q, float, f, 16, 8);
+#endif
   TEST_VLD1(vector, buffer, , float, f, 32, 2);
   TEST_VLD1(vector, buffer, q, float, f, 32, 4);
 
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1_dup.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1_dup.c
index 0e052743926..34be214e912 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1_dup.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1_dup.c
@@ -17,6 +17,7 @@ VECT_VAR_DECL(expected0,uint,64,1) [] = { 0xfffffffffffffff0 };
 VECT_VAR_DECL(expected0,poly,8,8) [] = { 0xf0, 0xf0, 0xf0, 0xf0,
 					 0xf0, 0xf0, 0xf0, 0xf0 };
 VECT_VAR_DECL(expected0,poly,16,4) [] = { 0xfff0, 0xfff0, 0xfff0, 0xfff0 };
+VECT_VAR_DECL(expected0,hfloat,16,4) [] = { 0xcc00, 0xcc00, 0xcc00, 0xcc00 };
 VECT_VAR_DECL(expected0,hfloat,32,2) [] = { 0xc1800000, 0xc1800000 };
 VECT_VAR_DECL(expected0,int,8,16) [] = { 0xf0, 0xf0, 0xf0, 0xf0,
 					 0xf0, 0xf0, 0xf0, 0xf0,
@@ -44,6 +45,8 @@ VECT_VAR_DECL(expected0,poly,8,16) [] = { 0xf0, 0xf0, 0xf0, 0xf0,
 					  0xf0, 0xf0, 0xf0, 0xf0 };
 VECT_VAR_DECL(expected0,poly,16,8) [] = { 0xfff0, 0xfff0, 0xfff0, 0xfff0,
 					  0xfff0, 0xfff0, 0xfff0, 0xfff0 };
+VECT_VAR_DECL(expected0,hfloat,16,8) [] = { 0xcc00, 0xcc00, 0xcc00, 0xcc00,
+					    0xcc00, 0xcc00, 0xcc00, 0xcc00 };
 VECT_VAR_DECL(expected0,hfloat,32,4) [] = { 0xc1800000, 0xc1800000,
 					    0xc1800000, 0xc1800000 };
 
@@ -61,6 +64,7 @@ VECT_VAR_DECL(expected1,uint,64,1) [] = { 0xfffffffffffffff1 };
 VECT_VAR_DECL(expected1,poly,8,8) [] = { 0xf1, 0xf1, 0xf1, 0xf1,
 					 0xf1, 0xf1, 0xf1, 0xf1 };
 VECT_VAR_DECL(expected1,poly,16,4) [] = { 0xfff1, 0xfff1, 0xfff1, 0xfff1 };
+VECT_VAR_DECL(expected1,hfloat,16,4) [] = { 0xcb80, 0xcb80, 0xcb80, 0xcb80 };
 VECT_VAR_DECL(expected1,hfloat,32,2) [] = { 0xc1700000, 0xc1700000 };
 VECT_VAR_DECL(expected1,int,8,16) [] = { 0xf1, 0xf1, 0xf1, 0xf1,
 					 0xf1, 0xf1, 0xf1, 0xf1,
@@ -88,6 +92,8 @@ VECT_VAR_DECL(expected1,poly,8,16) [] = { 0xf1, 0xf1, 0xf1, 0xf1,
 					  0xf1, 0xf1, 0xf1, 0xf1 };
 VECT_VAR_DECL(expected1,poly,16,8) [] = { 0xfff1, 0xfff1, 0xfff1, 0xfff1,
 					  0xfff1, 0xfff1, 0xfff1, 0xfff1 };
+VECT_VAR_DECL(expected1,hfloat,16,8) [] = { 0xcb80, 0xcb80, 0xcb80, 0xcb80,
+					    0xcb80, 0xcb80, 0xcb80, 0xcb80 };
 VECT_VAR_DECL(expected1,hfloat,32,4) [] = { 0xc1700000, 0xc1700000,
 					    0xc1700000, 0xc1700000 };
 
@@ -105,6 +111,7 @@ VECT_VAR_DECL(expected2,uint,64,1) [] = { 0xfffffffffffffff2 };
 VECT_VAR_DECL(expected2,poly,8,8) [] = { 0xf2, 0xf2, 0xf2, 0xf2,
 					 0xf2, 0xf2, 0xf2, 0xf2 };
 VECT_VAR_DECL(expected2,poly,16,4) [] = { 0xfff2, 0xfff2, 0xfff2, 0xfff2 };
+VECT_VAR_DECL(expected2,hfloat,16,4) [] = { 0xcb00, 0xcb00, 0xcb00, 0xcb00 };
 VECT_VAR_DECL(expected2,hfloat,32,2) [] = { 0xc1600000, 0xc1600000 };
 VECT_VAR_DECL(expected2,int,8,16) [] = { 0xf2, 0xf2, 0xf2, 0xf2,
 					 0xf2, 0xf2, 0xf2, 0xf2,
@@ -132,6 +139,8 @@ VECT_VAR_DECL(expected2,poly,8,16) [] = { 0xf2, 0xf2, 0xf2, 0xf2,
 					  0xf2, 0xf2, 0xf2, 0xf2 };
 VECT_VAR_DECL(expected2,poly,16,8) [] = { 0xfff2, 0xfff2, 0xfff2, 0xfff2,
 					  0xfff2, 0xfff2, 0xfff2, 0xfff2 };
+VECT_VAR_DECL(expected2,hfloat,16,8) [] = { 0xcb00, 0xcb00, 0xcb00, 0xcb00,
+					    0xcb00, 0xcb00, 0xcb00, 0xcb00 };
 VECT_VAR_DECL(expected2,hfloat,32,4) [] = { 0xc1600000, 0xc1600000,
 					    0xc1600000, 0xc1600000 };
 
@@ -154,6 +163,10 @@ void exec_vld1_dup (void)
 
     TEST_MACRO_ALL_VARIANTS_2_5(TEST_VLD1_DUP, vector, buffer_dup);
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+    TEST_VLD1_DUP(vector, buffer_dup, , float, f, 16, 4);
+    TEST_VLD1_DUP(vector, buffer_dup, q, float, f, 16, 8);
+#endif
     TEST_VLD1_DUP(vector, buffer_dup, , float, f, 32, 2);
     TEST_VLD1_DUP(vector, buffer_dup, q, float, f, 32, 4);
 
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1_lane.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1_lane.c
index d5c5d22a8ce..1f39006498d 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1_lane.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1_lane.c
@@ -16,6 +16,7 @@ VECT_VAR_DECL(expected,uint,64,1) [] = { 0xfffffffffffffff0 };
 VECT_VAR_DECL(expected,poly,8,8) [] = { 0xaa, 0xaa, 0xaa, 0xaa,
 					0xaa, 0xaa, 0xaa, 0xf0 };
 VECT_VAR_DECL(expected,poly,16,4) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xfff0 };
+VECT_VAR_DECL(expected,hfloat,16,4) [] = { 0xaaaa, 0xaaaa, 0xcc00, 0xaaaa };
 VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xaaaaaaaa, 0xc1800000 };
 VECT_VAR_DECL(expected,int,8,16) [] = { 0xaa, 0xaa, 0xaa, 0xaa,
 					0xaa, 0xaa, 0xaa, 0xaa,
@@ -43,6 +44,8 @@ VECT_VAR_DECL(expected,poly,8,16) [] = { 0xaa, 0xaa, 0xaa, 0xaa,
 					 0xf0, 0xaa, 0xaa, 0xaa };
 VECT_VAR_DECL(expected,poly,16,8) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa,
 					 0xaaaa, 0xaaaa, 0xfff0, 0xaaaa };
+VECT_VAR_DECL(expected,hfloat,16,8) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa,
+					   0xaaaa, 0xcc00, 0xaaaa, 0xaaaa };
 VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xaaaaaaaa, 0xaaaaaaaa,
 					   0xc1800000, 0xaaaaaaaa };
 
@@ -72,6 +75,9 @@ void exec_vld1_lane (void)
   ARRAY(buffer_src, uint, 64, 1);
   ARRAY(buffer_src, poly, 8, 8);
   ARRAY(buffer_src, poly, 16, 4);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  ARRAY(buffer_src, float, 16, 4);
+#endif
   ARRAY(buffer_src, float, 32, 2);
 
   ARRAY(buffer_src, int, 8, 16);
@@ -84,6 +90,9 @@ void exec_vld1_lane (void)
   ARRAY(buffer_src, uint, 64, 2);
   ARRAY(buffer_src, poly, 8, 16);
   ARRAY(buffer_src, poly, 16, 8);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  ARRAY(buffer_src, float, 16, 8);
+#endif
   ARRAY(buffer_src, float, 32, 4);
 
   clean_results ();
@@ -99,6 +108,9 @@ void exec_vld1_lane (void)
   TEST_VLD1_LANE(, uint, u, 64, 1, 0);
   TEST_VLD1_LANE(, poly, p, 8, 8, 7);
   TEST_VLD1_LANE(, poly, p, 16, 4, 3);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  TEST_VLD1_LANE(, float, f, 16, 4, 2);
+#endif
   TEST_VLD1_LANE(, float, f, 32, 2, 1);
 
   TEST_VLD1_LANE(q, int, s, 8, 16, 15);
@@ -111,6 +123,9 @@ void exec_vld1_lane (void)
   TEST_VLD1_LANE(q, uint, u, 64, 2, 0);
   TEST_VLD1_LANE(q, poly, p, 8, 16, 12);
   TEST_VLD1_LANE(q, poly, p, 16, 8, 6);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  TEST_VLD1_LANE(q, float, f, 16, 8, 5);
+#endif
   TEST_VLD1_LANE(q, float, f, 32, 4, 2);
 
   CHECK_RESULTS (TEST_MSG, "");
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld2_lane_f16_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld2_lane_f16_indices_1.c
new file mode 100644
index 00000000000..2174d6eaa8f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld2_lane_f16_indices_1.c
@@ -0,0 +1,16 @@
+#include <arm_neon.h>
+
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
+/* { dg-excess-errors "" { xfail arm*-*-* } } */
+
+float16x4x2_t
+f_vld2_lane_f16 (float16_t * p, float16x4x2_t v)
+{
+  float16x4x2_t res;
+  /* { dg-error "lane 4 out of range 0 - 3" "" { xfail arm*-*-* } 0 } */
+  res = vld2_lane_f16 (p, v, 4);
+  /* { dg-error "lane -1 out of range 0 - 3" "" { xfail arm*-*-* } 0 } */
+  res = vld2_lane_f16 (p, v, -1);
+  return res;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_f16_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_f16_indices_1.c
new file mode 100644
index 00000000000..83ae82c8242
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_f16_indices_1.c
@@ -0,0 +1,16 @@
+#include <arm_neon.h>
+
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
+/* { dg-excess-errors "" { xfail arm*-*-* } } */
+
+float16x8x2_t
+f_vld2q_lane_f16 (float16_t * p, float16x8x2_t v)
+{
+  float16x8x2_t res;
+  /* { dg-error "lane 8 out of range 0 - 7" "" { xfail arm*-*-* } 0 } */
+  res = vld2q_lane_f16 (p, v, 8);
+  /* { dg-error "lane -1 out of range 0 - 7" "" { xfail arm*-*-* } 0 } */
+  res = vld2q_lane_f16 (p, v, -1);
+  return res;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld3_lane_f16_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld3_lane_f16_indices_1.c
new file mode 100644
index 00000000000..21b7861ba75
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld3_lane_f16_indices_1.c
@@ -0,0 +1,16 @@
+#include <arm_neon.h>
+
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
+/* { dg-excess-errors "" { xfail arm*-*-* } } */
+
+float16x4x3_t
+f_vld3_lane_f16 (float16_t * p, float16x4x3_t v)
+{
+  float16x4x3_t res;
+  /* { dg-error "lane 4 out of range 0 - 3" "" { xfail arm*-*-* } 0 } */
+  res = vld3_lane_f16 (p, v, 4);
+  /* { dg-error "lane -1 out of range 0 - 3" "" { xfail arm*-*-* } 0 } */
+  res = vld3_lane_f16 (p, v, -1);
+  return res;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld3q_lane_f16_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld3q_lane_f16_indices_1.c
new file mode 100644
index 00000000000..95ec3913eef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld3q_lane_f16_indices_1.c
@@ -0,0 +1,16 @@
+#include <arm_neon.h>
+
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
+/* { dg-excess-errors "" { xfail arm*-*-* } } */
+
+float16x8x3_t
+f_vld3q_lane_f16 (float16_t * p, float16x8x3_t v)
+{
+  float16x8x3_t res;
+  /* { dg-error "lane 8 out of range 0 - 7" "" { xfail arm*-*-* } 0 } */
+  res = vld3q_lane_f16 (p, v, 8);
+  /* { dg-error "lane -1 out of range 0 - 7" "" { xfail arm*-*-* } 0 } */
+  res = vld3q_lane_f16 (p, v, -1);
+  return res;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld4_lane_f16_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld4_lane_f16_indices_1.c
new file mode 100644
index 00000000000..bd7ecf06690
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld4_lane_f16_indices_1.c
@@ -0,0 +1,16 @@
+#include <arm_neon.h>
+
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
+/* { dg-excess-errors "" { xfail arm*-*-* } } */
+
+float16x4x4_t
+f_vld4_lane_f16 (float16_t * p, float16x4x4_t v)
+{
+  float16x4x4_t res;
+  /* { dg-error "lane 4 out of range 0 - 3" "" { xfail arm*-*-* } 0 } */
+  res = vld4_lane_f16 (p, v, 4);
+  /* { dg-error "lane -1 out of range 0 - 3" "" { xfail arm*-*-* } 0 } */
+  res = vld4_lane_f16 (p, v, -1);
+  return res;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld4q_lane_f16_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld4q_lane_f16_indices_1.c
new file mode 100644
index 00000000000..c27559f4ee8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld4q_lane_f16_indices_1.c
@@ -0,0 +1,16 @@
+#include <arm_neon.h>
+
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
+/* { dg-excess-errors "" { xfail arm*-*-* } } */
+
+float16x8x4_t
+f_vld4q_lane_f16 (float16_t * p, float16x8x4_t v)
+{
+  float16x8x4_t res;
+  /* { dg-error "lane 8 out of range 0 - 7" "" { xfail arm*-*-* } 0 } */
+  res = vld4q_lane_f16 (p, v, 8);
+  /* { dg-error "lane -1 out of range 0 - 7" "" { xfail arm*-*-* } 0 } */
+  res = vld4q_lane_f16 (p, v, -1);
+  return res;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX.c
index f20aa03f51b..1e02dc3fa10 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX.c
@@ -18,6 +18,7 @@ VECT_VAR_DECL(expected_vld2_0,uint,64,1) [] = { 0xfffffffffffffff0 };
 VECT_VAR_DECL(expected_vld2_0,poly,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
 					       0xf4, 0xf5, 0xf6, 0xf7 };
 VECT_VAR_DECL(expected_vld2_0,poly,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected_vld2_0,hfloat,16,4) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80 };
 VECT_VAR_DECL(expected_vld2_0,hfloat,32,2) [] = { 0xc1800000, 0xc1700000 };
 VECT_VAR_DECL(expected_vld2_0,int,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
 					       0xf4, 0xf5, 0xf6, 0xf7,
@@ -41,6 +42,8 @@ VECT_VAR_DECL(expected_vld2_0,poly,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
 						0xfc, 0xfd, 0xfe, 0xff };
 VECT_VAR_DECL(expected_vld2_0,poly,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3,
 						0xfff4, 0xfff5, 0xfff6, 0xfff7 };
+VECT_VAR_DECL(expected_vld2_0,hfloat,16,8) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80,
+						  0xca00, 0xc980, 0xc900, 0xc880 };
 VECT_VAR_DECL(expected_vld2_0,hfloat,32,4) [] = { 0xc1800000, 0xc1700000,
 						  0xc1600000, 0xc1500000 };
 
@@ -58,6 +61,7 @@ VECT_VAR_DECL(expected_vld2_1,uint,64,1) [] = { 0xfffffffffffffff1 };
 VECT_VAR_DECL(expected_vld2_1,poly,8,8) [] = { 0xf8, 0xf9, 0xfa, 0xfb,
 					       0xfc, 0xfd, 0xfe, 0xff };
 VECT_VAR_DECL(expected_vld2_1,poly,16,4) [] = { 0xfff4, 0xfff5, 0xfff6, 0xfff7 };
+VECT_VAR_DECL(expected_vld2_1,hfloat,16,4) [] = { 0xca00, 0xc980, 0xc900, 0xc880 };
 VECT_VAR_DECL(expected_vld2_1,hfloat,32,2) [] = { 0xc1600000, 0xc1500000 };
 VECT_VAR_DECL(expected_vld2_1,int,8,16) [] = { 0x0, 0x1, 0x2, 0x3,
 					       0x4, 0x5, 0x6, 0x7,
@@ -81,6 +85,8 @@ VECT_VAR_DECL(expected_vld2_1,poly,8,16) [] = { 0x0, 0x1, 0x2, 0x3,
 						0xc, 0xd, 0xe, 0xf };
 VECT_VAR_DECL(expected_vld2_1,poly,16,8) [] = { 0xfff8, 0xfff9, 0xfffa, 0xfffb,
 						0xfffc, 0xfffd, 0xfffe, 0xffff };
+VECT_VAR_DECL(expected_vld2_1,hfloat,16,8) [] = { 0xc800, 0xc700, 0xc600, 0xc500,
+						  0xc400, 0xc200, 0xc000, 0xbc00 };
 VECT_VAR_DECL(expected_vld2_1,hfloat,32,4) [] = { 0xc1400000, 0xc1300000,
 						  0xc1200000, 0xc1100000 };
 
@@ -98,6 +104,7 @@ VECT_VAR_DECL(expected_vld3_0,uint,64,1) [] = { 0xfffffffffffffff0 };
 VECT_VAR_DECL(expected_vld3_0,poly,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
 					       0xf4, 0xf5, 0xf6, 0xf7 };
 VECT_VAR_DECL(expected_vld3_0,poly,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected_vld3_0,hfloat,16,4) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80 };
 VECT_VAR_DECL(expected_vld3_0,hfloat,32,2) [] = { 0xc1800000, 0xc1700000 };
 VECT_VAR_DECL(expected_vld3_0,int,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
 					       0xf4, 0xf5, 0xf6, 0xf7,
@@ -121,6 +128,8 @@ VECT_VAR_DECL(expected_vld3_0,poly,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
 						0xfc, 0xfd, 0xfe, 0xff };
 VECT_VAR_DECL(expected_vld3_0,poly,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3,
 						0xfff4, 0xfff5, 0xfff6, 0xfff7 };
+VECT_VAR_DECL(expected_vld3_0,hfloat,16,8) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80,
+						  0xca00, 0xc980, 0xc900, 0xc880 };
 VECT_VAR_DECL(expected_vld3_0,hfloat,32,4) [] = { 0xc1800000, 0xc1700000,
 						  0xc1600000, 0xc1500000 };
 
@@ -138,6 +147,7 @@ VECT_VAR_DECL(expected_vld3_1,uint,64,1) [] = { 0xfffffffffffffff1 };
 VECT_VAR_DECL(expected_vld3_1,poly,8,8) [] = { 0xf8, 0xf9, 0xfa, 0xfb,
 					       0xfc, 0xfd, 0xfe, 0xff };
 VECT_VAR_DECL(expected_vld3_1,poly,16,4) [] = { 0xfff4, 0xfff5, 0xfff6, 0xfff7 };
+VECT_VAR_DECL(expected_vld3_1,hfloat,16,4) [] = { 0xca00, 0xc980, 0xc900, 0xc880 };
 VECT_VAR_DECL(expected_vld3_1,hfloat,32,2) [] = { 0xc1600000, 0xc1500000 };
 VECT_VAR_DECL(expected_vld3_1,int,8,16) [] = { 0x0, 0x1, 0x2, 0x3,
 					       0x4, 0x5, 0x6, 0x7,
@@ -161,6 +171,8 @@ VECT_VAR_DECL(expected_vld3_1,poly,8,16) [] = { 0x0, 0x1, 0x2, 0x3,
 						0xc, 0xd, 0xe, 0xf };
 VECT_VAR_DECL(expected_vld3_1,poly,16,8) [] = { 0xfff8, 0xfff9, 0xfffa, 0xfffb,
 						0xfffc, 0xfffd, 0xfffe, 0xffff };
+VECT_VAR_DECL(expected_vld3_1,hfloat,16,8) [] = { 0xc800, 0xc700, 0xc600, 0xc500,
+						  0xc400, 0xc200, 0xc000, 0xbc00 };
 VECT_VAR_DECL(expected_vld3_1,hfloat,32,4) [] = { 0xc1400000, 0xc1300000,
 						  0xc1200000, 0xc1100000 };
 
@@ -181,6 +193,7 @@ VECT_VAR_DECL(expected_vld3_2,poly,8,8) [] = { 0x0, 0x1, 0x2, 0x3,
 					       0x4, 0x5, 0x6, 0x7 };
 VECT_VAR_DECL(expected_vld3_2,poly,16,4) [] = { 0xfff8, 0xfff9,
 						0xfffa, 0xfffb };
+VECT_VAR_DECL(expected_vld3_2,hfloat,16,4) [] = { 0xc800, 0xc700, 0xc600, 0xc500 };
 VECT_VAR_DECL(expected_vld3_2,hfloat,32,2) [] = { 0xc1400000, 0xc1300000 };
 VECT_VAR_DECL(expected_vld3_2,int,8,16) [] = { 0x10, 0x11, 0x12, 0x13,
 					       0x14, 0x15, 0x16, 0x17,
@@ -204,6 +217,8 @@ VECT_VAR_DECL(expected_vld3_2,poly,8,16) [] = { 0x10, 0x11, 0x12, 0x13,
 						0x1c, 0x1d, 0x1e, 0x1f };
 VECT_VAR_DECL(expected_vld3_2,poly,16,8) [] = { 0x0, 0x1, 0x2, 0x3,
 						0x4, 0x5, 0x6, 0x7 };
+VECT_VAR_DECL(expected_vld3_2,hfloat,16,8) [] = { 0x0000, 0x3c00, 0x4000, 0x4200,
+						  0x4400, 0x4500, 0x4600, 0x4700 };
 VECT_VAR_DECL(expected_vld3_2,hfloat,32,4) [] = { 0xc1000000, 0xc0e00000,
 						  0xc0c00000, 0xc0a00000 };
 
@@ -223,6 +238,7 @@ VECT_VAR_DECL(expected_vld4_0,uint,64,1) [] = { 0xfffffffffffffff0 };
 VECT_VAR_DECL(expected_vld4_0,poly,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
 					       0xf4, 0xf5, 0xf6, 0xf7 };
 VECT_VAR_DECL(expected_vld4_0,poly,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected_vld4_0,hfloat,16,4) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80 };
 VECT_VAR_DECL(expected_vld4_0,hfloat,32,2) [] = { 0xc1800000, 0xc1700000 };
 VECT_VAR_DECL(expected_vld4_0,int,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
 					       0xf4, 0xf5, 0xf6, 0xf7,
@@ -246,6 +262,8 @@ VECT_VAR_DECL(expected_vld4_0,poly,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
 						0xfc, 0xfd, 0xfe, 0xff };
 VECT_VAR_DECL(expected_vld4_0,poly,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3,
 						0xfff4, 0xfff5, 0xfff6, 0xfff7 };
+VECT_VAR_DECL(expected_vld4_0,hfloat,16,8) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80,
+						  0xca00, 0xc980, 0xc900, 0xc880 };
 VECT_VAR_DECL(expected_vld4_0,hfloat,32,4) [] = { 0xc1800000, 0xc1700000,
 						  0xc1600000, 0xc1500000 };
 
@@ -263,6 +281,7 @@ VECT_VAR_DECL(expected_vld4_1,uint,64,1) [] = { 0xfffffffffffffff1 };
 VECT_VAR_DECL(expected_vld4_1,poly,8,8) [] = { 0xf8, 0xf9, 0xfa, 0xfb,
 					       0xfc, 0xfd, 0xfe, 0xff };
 VECT_VAR_DECL(expected_vld4_1,poly,16,4) [] = { 0xfff4, 0xfff5, 0xfff6, 0xfff7 };
+VECT_VAR_DECL(expected_vld4_1,hfloat,16,4) [] = { 0xca00, 0xc980, 0xc900, 0xc880 };
 VECT_VAR_DECL(expected_vld4_1,hfloat,32,2) [] = { 0xc1600000, 0xc1500000 };
 VECT_VAR_DECL(expected_vld4_1,int,8,16) [] = { 0x0, 0x1, 0x2, 0x3,
 					       0x4, 0x5, 0x6, 0x7,
@@ -286,6 +305,8 @@ VECT_VAR_DECL(expected_vld4_1,poly,8,16) [] = { 0x0, 0x1, 0x2, 0x3,
 						0xc, 0xd, 0xe, 0xf };
 VECT_VAR_DECL(expected_vld4_1,poly,16,8) [] = { 0xfff8, 0xfff9, 0xfffa, 0xfffb,
 						0xfffc, 0xfffd, 0xfffe, 0xffff };
+VECT_VAR_DECL(expected_vld4_1,hfloat,16,8) [] = { 0xc800, 0xc700, 0xc600, 0xc500,
+						  0xc400, 0xc200, 0xc000, 0xbc00 };
 VECT_VAR_DECL(expected_vld4_1,hfloat,32,4) [] = { 0xc1400000, 0xc1300000,
 						  0xc1200000, 0xc1100000 };
 
@@ -303,6 +324,7 @@ VECT_VAR_DECL(expected_vld4_2,uint,64,1) [] = { 0xfffffffffffffff2 };
 VECT_VAR_DECL(expected_vld4_2,poly,8,8) [] = { 0x0, 0x1, 0x2, 0x3,
 					       0x4, 0x5, 0x6, 0x7 };
 VECT_VAR_DECL(expected_vld4_2,poly,16,4) [] = { 0xfff8, 0xfff9, 0xfffa, 0xfffb };
+VECT_VAR_DECL(expected_vld4_2,hfloat,16,4) [] = { 0xc800, 0xc700, 0xc600, 0xc500 };
 VECT_VAR_DECL(expected_vld4_2,hfloat,32,2) [] = { 0xc1400000, 0xc1300000 };
 VECT_VAR_DECL(expected_vld4_2,int,8,16) [] = { 0x10, 0x11, 0x12, 0x13,
 					       0x14, 0x15, 0x16, 0x17,
@@ -326,6 +348,8 @@ VECT_VAR_DECL(expected_vld4_2,poly,8,16) [] = { 0x10, 0x11, 0x12, 0x13,
 						0x1c, 0x1d, 0x1e, 0x1f };
 VECT_VAR_DECL(expected_vld4_2,poly,16,8) [] = { 0x0, 0x1, 0x2, 0x3,
 						0x4, 0x5, 0x6, 0x7 };
+VECT_VAR_DECL(expected_vld4_2,hfloat,16,8) [] = { 0x0000, 0x3c00, 0x4000, 0x4200,
+						  0x4400, 0x4500, 0x4600, 0x4700 };
 VECT_VAR_DECL(expected_vld4_2,hfloat,32,4) [] = { 0xc1000000, 0xc0e00000,
 						  0xc0c00000, 0xc0a00000 };
 
@@ -343,6 +367,7 @@ VECT_VAR_DECL(expected_vld4_3,uint,64,1) [] = { 0xfffffffffffffff3 };
 VECT_VAR_DECL(expected_vld4_3,poly,8,8) [] = { 0x8, 0x9, 0xa, 0xb,
 					       0xc, 0xd, 0xe, 0xf };
 VECT_VAR_DECL(expected_vld4_3,poly,16,4) [] = { 0xfffc, 0xfffd, 0xfffe, 0xffff };
+VECT_VAR_DECL(expected_vld4_3,hfloat,16,4) [] = { 0xc400, 0xc200, 0xc000, 0xbc00 };
 VECT_VAR_DECL(expected_vld4_3,hfloat,32,2) [] = { 0xc1200000, 0xc1100000 };
 VECT_VAR_DECL(expected_vld4_3,int,8,16) [] = { 0x20, 0x21, 0x22, 0x23,
 					       0x24, 0x25, 0x26, 0x27,
@@ -366,6 +391,8 @@ VECT_VAR_DECL(expected_vld4_3,poly,8,16) [] = { 0x20, 0x21, 0x22, 0x23,
 						0x2c, 0x2d, 0x2e, 0x2f };
 VECT_VAR_DECL(expected_vld4_3,poly,16,8) [] = { 0x8, 0x9, 0xa, 0xb,
 						0xc, 0xd, 0xe, 0xf };
+VECT_VAR_DECL(expected_vld4_3,hfloat,16,8) [] = { 0x4800, 0x4880, 0x4900, 0x4980,
+						  0x4a00, 0x4a80, 0x4b00, 0x4b80 };
 VECT_VAR_DECL(expected_vld4_3,hfloat,32,4) [] = { 0xc0800000, 0xc0400000,
 						  0xc0000000, 0xbf800000 };
 
@@ -398,7 +425,7 @@ void exec_vldX (void)
 	 sizeof(VECT_VAR(result, T1, W, N)));
 
   /* We need all variants in 64 bits, but there is no 64x2 variant.  */
-#define DECL_ALL_VLDX(X)			\
+#define DECL_ALL_VLDX_NO_FP16(X)		\
   DECL_VLDX(int, 8, 8, X);			\
   DECL_VLDX(int, 16, 4, X);			\
   DECL_VLDX(int, 32, 2, X);			\
@@ -420,7 +447,16 @@ void exec_vldX (void)
   DECL_VLDX(poly, 16, 8, X);			\
   DECL_VLDX(float, 32, 4, X)
 
-#define TEST_ALL_VLDX(X)			\
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+#define DECL_ALL_VLDX(X)	\
+  DECL_ALL_VLDX_NO_FP16(X);	\
+  DECL_VLDX(float, 16, 4, X);	\
+  DECL_VLDX(float, 16, 8, X)
+#else
+#define DECL_ALL_VLDX(X) DECL_ALL_VLDX_NO_FP16(X)
+#endif
+
+#define TEST_ALL_VLDX_NO_FP16(X)		\
   TEST_VLDX(, int, s, 8, 8, X);			\
   TEST_VLDX(, int, s, 16, 4, X);		\
   TEST_VLDX(, int, s, 32, 2, X);		\
@@ -442,7 +478,16 @@ void exec_vldX (void)
   TEST_VLDX(q, poly, p, 16, 8, X);		\
   TEST_VLDX(q, float, f, 32, 4, X)
 
-#define TEST_ALL_EXTRA_CHUNKS(X, Y)		\
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+#define TEST_ALL_VLDX(X)		\
+  TEST_ALL_VLDX_NO_FP16(X);		\
+  TEST_VLDX(, float, f, 16, 4, X);	\
+  TEST_VLDX(q, float, f, 16, 8, X)
+#else
+#define TEST_ALL_VLDX(X) TEST_ALL_VLDX_NO_FP16(X)
+#endif
+
+#define TEST_ALL_EXTRA_CHUNKS_NO_FP16(X, Y)	\
   TEST_EXTRA_CHUNK(int, 8, 8, X, Y);		\
   TEST_EXTRA_CHUNK(int, 16, 4, X, Y);		\
   TEST_EXTRA_CHUNK(int, 32, 2, X, Y);		\
@@ -464,9 +509,17 @@ void exec_vldX (void)
   TEST_EXTRA_CHUNK(poly, 16, 8, X, Y);		\
   TEST_EXTRA_CHUNK(float, 32, 4, X, Y)
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+#define TEST_ALL_EXTRA_CHUNKS(X, Y)		\
+  TEST_ALL_EXTRA_CHUNKS_NO_FP16(X, Y)		\
+  TEST_EXTRA_CHUNK(float, 16, 4, X, Y);		\
+  TEST_EXTRA_CHUNK(float, 16, 8, X, Y);
+#else
+#define TEST_ALL_EXTRA_CHUNKS(X, Y) TEST_ALL_EXTRA_CHUNKS_NO_FP16(X, Y)
+#endif
+
   /* vldX supports all vector types except [u]int64x2.  */
-#define CHECK_RESULTS_VLDX(test_name,EXPECTED,comment)			\
-  {									\
+#define CHECK_RESULTS_VLDX_NO_FP16(test_name,EXPECTED,comment)		\
     CHECK(test_name, int, 8, 8, PRIx8, EXPECTED, comment);		\
     CHECK(test_name, int, 16, 4, PRIx16, EXPECTED, comment);		\
     CHECK(test_name, int, 32, 2, PRIx32, EXPECTED, comment);		\
@@ -487,8 +540,19 @@ void exec_vldX (void)
     CHECK(test_name, uint, 32, 4, PRIx32, EXPECTED, comment);		\
     CHECK(test_name, poly, 8, 16, PRIx8, EXPECTED, comment);		\
     CHECK(test_name, poly, 16, 8, PRIx16, EXPECTED, comment);		\
-    CHECK_FP(test_name, float, 32, 4, PRIx32, EXPECTED, comment);	\
-  }									\
+    CHECK_FP(test_name, float, 32, 4, PRIx32, EXPECTED, comment)
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+#define CHECK_RESULTS_VLDX(test_name,EXPECTED,comment)			\
+  {									\
+    CHECK_RESULTS_VLDX_NO_FP16(test_name, EXPECTED, comment);		\
+    CHECK_FP(test_name, float, 16, 4, PRIx16, EXPECTED, comment);	\
+    CHECK_FP(test_name, float, 16, 8, PRIx16, EXPECTED, comment);	\
+  }
+#else
+#define CHECK_RESULTS_VLDX(test_name, EXPECTED, comment)		\
+  { CHECK_RESULTS_VLDX_NO_FP16(test_name, EXPECTED, comment); }
+#endif
 
   DECL_ALL_VLDX(2);
   DECL_ALL_VLDX(3);
@@ -516,6 +580,10 @@ void exec_vldX (void)
   PAD(buffer_vld2_pad, poly, 8, 8);
   VECT_ARRAY_INIT2(buffer_vld2, poly, 16, 4);
   PAD(buffer_vld2_pad, poly, 16, 4);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  VECT_ARRAY_INIT2(buffer_vld2, float, 16, 4);
+  PAD(buffer_vld2_pad, float, 16, 4);
+#endif
   VECT_ARRAY_INIT2(buffer_vld2, float, 32, 2);
   PAD(buffer_vld2_pad, float, 32, 2);
 
@@ -539,6 +607,10 @@ void exec_vldX (void)
   PAD(buffer_vld2_pad, poly, 8, 16);
   VECT_ARRAY_INIT2(buffer_vld2, poly, 16, 8);
   PAD(buffer_vld2_pad, poly, 16, 8);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  VECT_ARRAY_INIT2(buffer_vld2, float, 16, 8);
+  PAD(buffer_vld2_pad, float, 16, 8);
+#endif
   VECT_ARRAY_INIT2(buffer_vld2, float, 32, 4);
   PAD(buffer_vld2_pad, float, 32, 4);
 
@@ -563,6 +635,10 @@ void exec_vldX (void)
   PAD(buffer_vld3_pad, poly, 8, 8);
   VECT_ARRAY_INIT3(buffer_vld3, poly, 16, 4);
   PAD(buffer_vld3_pad, poly, 16, 4);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  VECT_ARRAY_INIT3(buffer_vld3, float, 16, 4);
+  PAD(buffer_vld3_pad, float, 16, 4);
+#endif
   VECT_ARRAY_INIT3(buffer_vld3, float, 32, 2);
   PAD(buffer_vld3_pad, float, 32, 2);
 
@@ -586,6 +662,10 @@ void exec_vldX (void)
   PAD(buffer_vld3_pad, poly, 8, 16);
   VECT_ARRAY_INIT3(buffer_vld3, poly, 16, 8);
   PAD(buffer_vld3_pad, poly, 16, 8);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  VECT_ARRAY_INIT3(buffer_vld3, float, 16, 8);
+  PAD(buffer_vld3_pad, float, 16, 8);
+#endif
   VECT_ARRAY_INIT3(buffer_vld3, float, 32, 4);
   PAD(buffer_vld3_pad, float, 32, 4);
 
@@ -610,6 +690,10 @@ void exec_vldX (void)
   PAD(buffer_vld4_pad, poly, 8, 8);
   VECT_ARRAY_INIT4(buffer_vld4, poly, 16, 4);
   PAD(buffer_vld4_pad, poly, 16, 4);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  VECT_ARRAY_INIT4(buffer_vld4, float, 16, 4);
+  PAD(buffer_vld4_pad, float, 16, 4);
+#endif
   VECT_ARRAY_INIT4(buffer_vld4, float, 32, 2);
   PAD(buffer_vld4_pad, float, 32, 2);
 
@@ -633,6 +717,10 @@ void exec_vldX (void)
   PAD(buffer_vld4_pad, poly, 8, 16);
   VECT_ARRAY_INIT4(buffer_vld4, poly, 16, 8);
   PAD(buffer_vld4_pad, poly, 16, 8);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  VECT_ARRAY_INIT4(buffer_vld4, float, 16, 8);
+  PAD(buffer_vld4_pad, float, 16, 8);
+#endif
   VECT_ARRAY_INIT4(buffer_vld4, float, 32, 4);
   PAD(buffer_vld4_pad, float, 32, 4);
 
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX_dup.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX_dup.c
index c66dade8e45..e4cde46725f 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX_dup.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX_dup.c
@@ -18,6 +18,7 @@ VECT_VAR_DECL(expected_vld2_0,uint,64,1) [] = { 0xfffffffffffffff0 };
 VECT_VAR_DECL(expected_vld2_0,poly,8,8) [] = { 0xf0, 0xf1, 0xf0, 0xf1,
 					0xf0, 0xf1, 0xf0, 0xf1 };
 VECT_VAR_DECL(expected_vld2_0,poly,16,4) [] = { 0xfff0, 0xfff1, 0xfff0, 0xfff1 };
+VECT_VAR_DECL(expected_vld2_0,hfloat,16,4) [] = {0xcc00, 0xcb80, 0xcc00, 0xcb80 };
 VECT_VAR_DECL(expected_vld2_0,hfloat,32,2) [] = { 0xc1800000, 0xc1700000 };
 
 /* vld2_dup/chunk 1.  */
@@ -35,6 +36,7 @@ VECT_VAR_DECL(expected_vld2_1,poly,8,8) [] = { 0xf0, 0xf1, 0xf0, 0xf1,
 					       0xf0, 0xf1, 0xf0, 0xf1 };
 VECT_VAR_DECL(expected_vld2_1,poly,16,4) [] = { 0xfff0, 0xfff1,
 						0xfff0, 0xfff1 };
+VECT_VAR_DECL(expected_vld2_1,hfloat,16,4) [] = { 0xcc00, 0xcb80, 0xcc00, 0xcb80 };
 VECT_VAR_DECL(expected_vld2_1,hfloat,32,2) [] = { 0xc1800000, 0xc1700000 };
 
 /* vld3_dup/chunk 0.  */
@@ -54,6 +56,7 @@ VECT_VAR_DECL(expected_vld3_0,poly,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf0,
 					       0xf1, 0xf2, 0xf0, 0xf1 };
 VECT_VAR_DECL(expected_vld3_0,poly,16,4) [] = { 0xfff0, 0xfff1,
 						0xfff2, 0xfff0 };
+VECT_VAR_DECL(expected_vld3_0,hfloat,16,4) [] = { 0xcc00, 0xcb80, 0xcb00, 0xcc00 };
 VECT_VAR_DECL(expected_vld3_0,hfloat,32,2) [] = { 0xc1800000, 0xc1700000 };
 
 /* vld3_dup/chunk 1.  */
@@ -73,6 +76,7 @@ VECT_VAR_DECL(expected_vld3_1,poly,8,8) [] = { 0xf2, 0xf0, 0xf1, 0xf2,
 					       0xf0, 0xf1, 0xf2, 0xf0 };
 VECT_VAR_DECL(expected_vld3_1,poly,16,4) [] = { 0xfff1, 0xfff2,
 						0xfff0, 0xfff1 };
+VECT_VAR_DECL(expected_vld3_1,hfloat,16,4) [] = { 0xcb80, 0xcb00, 0xcc00, 0xcb80 };
 VECT_VAR_DECL(expected_vld3_1,hfloat,32,2) [] = { 0xc1600000, 0xc1800000 };
 
 /* vld3_dup/chunk 2.  */
@@ -92,6 +96,7 @@ VECT_VAR_DECL(expected_vld3_2,poly,8,8) [] = { 0xf1, 0xf2, 0xf0, 0xf1,
 					       0xf2, 0xf0, 0xf1, 0xf2 };
 VECT_VAR_DECL(expected_vld3_2,poly,16,4) [] = { 0xfff2, 0xfff0,
 						0xfff1, 0xfff2 };
+VECT_VAR_DECL(expected_vld3_2,hfloat,16,4) [] = { 0xcb00, 0xcc00, 0xcb80, 0xcb00 };
 VECT_VAR_DECL(expected_vld3_2,hfloat,32,2) [] = { 0xc1700000, 0xc1600000 };
 
 /* vld4_dup/chunk 0.  */
@@ -109,6 +114,7 @@ VECT_VAR_DECL(expected_vld4_0,uint,64,1) [] = { 0xfffffffffffffff0 };
 VECT_VAR_DECL(expected_vld4_0,poly,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
 					       0xf0, 0xf1, 0xf2, 0xf3 };
 VECT_VAR_DECL(expected_vld4_0,poly,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected_vld4_0,hfloat,16,4) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80 };
 VECT_VAR_DECL(expected_vld4_0,hfloat,32,2) [] = { 0xc1800000, 0xc1700000 };
 
 /* vld4_dup/chunk 1.  */
@@ -125,6 +131,7 @@ VECT_VAR_DECL(expected_vld4_1,uint,64,1) [] = { 0xfffffffffffffff1 };
 VECT_VAR_DECL(expected_vld4_1,poly,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
 					       0xf0, 0xf1, 0xf2, 0xf3 };
 VECT_VAR_DECL(expected_vld4_1,poly,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected_vld4_1,hfloat,16,4) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80 };
 VECT_VAR_DECL(expected_vld4_1,hfloat,32,2) [] = { 0xc1600000, 0xc1500000 };
 
 /* vld4_dup/chunk 2.  */
@@ -141,6 +148,7 @@ VECT_VAR_DECL(expected_vld4_2,uint,64,1) [] = { 0xfffffffffffffff2 };
 VECT_VAR_DECL(expected_vld4_2,poly,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
 					       0xf0, 0xf1, 0xf2, 0xf3 };
 VECT_VAR_DECL(expected_vld4_2,poly,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected_vld4_2,hfloat,16,4) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80 };
 VECT_VAR_DECL(expected_vld4_2,hfloat,32,2) [] = { 0xc1800000, 0xc1700000 };
 
 /* vld4_dup/chunk3.  */
@@ -157,6 +165,7 @@ VECT_VAR_DECL(expected_vld4_3,uint,64,1) [] = { 0xfffffffffffffff3 };
 VECT_VAR_DECL(expected_vld4_3,poly,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
 					       0xf0, 0xf1, 0xf2, 0xf3 };
 VECT_VAR_DECL(expected_vld4_3,poly,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected_vld4_3,hfloat,16,4) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80 };
 VECT_VAR_DECL(expected_vld4_3,hfloat,32,2) [] = { 0xc1600000, 0xc1500000 };
 
 void exec_vldX_dup (void)
@@ -188,7 +197,7 @@ void exec_vldX_dup (void)
 	 &(VECT_VAR(result_bis_##X, T1, W, N)[Y*N]),	\
 	 sizeof(VECT_VAR(result, T1, W, N)));
 
-#define DECL_ALL_VLDX_DUP(X)			\
+#define DECL_ALL_VLDX_DUP_NO_FP16(X)		\
   DECL_VLDX_DUP(int, 8, 8, X);			\
   DECL_VLDX_DUP(int, 16, 4, X);			\
   DECL_VLDX_DUP(int, 32, 2, X);			\
@@ -201,7 +210,15 @@ void exec_vldX_dup (void)
   DECL_VLDX_DUP(poly, 16, 4, X);		\
   DECL_VLDX_DUP(float, 32, 2, X)
 
-#define TEST_ALL_VLDX_DUP(X)			\
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+#define DECL_ALL_VLDX_DUP(X)		\
+  DECL_ALL_VLDX_DUP_NO_FP16(X);		\
+  DECL_VLDX_DUP(float, 16, 4, X)
+#else
+#define DECL_ALL_VLDX_DUP(X) DECL_ALL_VLDX_DUP_NO_FP16(X)
+#endif
+
+#define TEST_ALL_VLDX_DUP_NO_FP16(X)		\
   TEST_VLDX_DUP(, int, s, 8, 8, X);		\
   TEST_VLDX_DUP(, int, s, 16, 4, X);		\
   TEST_VLDX_DUP(, int, s, 32, 2, X);		\
@@ -214,7 +231,15 @@ void exec_vldX_dup (void)
   TEST_VLDX_DUP(, poly, p, 16, 4, X);		\
   TEST_VLDX_DUP(, float, f, 32, 2, X)
 
-#define TEST_ALL_EXTRA_CHUNKS(X, Y)		\
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+#define TEST_ALL_VLDX_DUP(X)		\
+  TEST_ALL_VLDX_DUP_NO_FP16(X);		\
+  TEST_VLDX_DUP(, float, f, 16, 4, X)
+#else
+#define TEST_ALL_VLDX_DUP(X) TEST_ALL_VLDX_DUP_NO_FP16(X)
+#endif
+
+#define TEST_ALL_EXTRA_CHUNKS_NO_FP16(X, Y)	\
   TEST_EXTRA_CHUNK(int, 8, 8, X, Y);		\
   TEST_EXTRA_CHUNK(int, 16, 4, X, Y);		\
   TEST_EXTRA_CHUNK(int, 32, 2, X, Y);		\
@@ -227,9 +252,16 @@ void exec_vldX_dup (void)
   TEST_EXTRA_CHUNK(poly, 16, 4, X, Y);		\
   TEST_EXTRA_CHUNK(float, 32, 2, X, Y)
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+#define TEST_ALL_EXTRA_CHUNKS(X, Y)	\
+  TEST_ALL_EXTRA_CHUNKS_NO_FP16(X, Y);	\
+  TEST_EXTRA_CHUNK(float, 16, 4, X, Y)
+#else
+#define TEST_ALL_EXTRA_CHUNKS(X, Y) TEST_ALL_EXTRA_CHUNKS_NO_FP16(X, Y)
+#endif
+
   /* vldX_dup supports only 64-bit inputs.  */
-#define CHECK_RESULTS_VLDX_DUP(test_name,EXPECTED,comment)		\
-  {									\
+#define CHECK_RESULTS_VLDX_DUP_NO_FP16(test_name,EXPECTED,comment)	\
     CHECK(test_name, int, 8, 8, PRIx8, EXPECTED, comment);		\
     CHECK(test_name, int, 16, 4, PRIx16, EXPECTED, comment);		\
     CHECK(test_name, int, 32, 2, PRIx32, EXPECTED, comment);		\
@@ -240,8 +272,20 @@ void exec_vldX_dup (void)
     CHECK(test_name, uint, 64, 1, PRIx64, EXPECTED, comment);		\
     CHECK(test_name, poly, 8, 8, PRIx8, EXPECTED, comment);		\
     CHECK(test_name, poly, 16, 4, PRIx16, EXPECTED, comment);		\
-    CHECK_FP(test_name, float, 32, 2, PRIx32, EXPECTED, comment);	\
-  }									\
+    CHECK_FP(test_name, float, 32, 2, PRIx32, EXPECTED, comment)
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+#define CHECK_RESULTS_VLDX_DUP(test_name,EXPECTED,comment)		\
+  {									\
+    CHECK_RESULTS_VLDX_DUP_NO_FP16(test_name,EXPECTED,comment);		\
+    CHECK_FP(test_name, float, 16, 4, PRIx16, EXPECTED, comment);	\
+  }
+#else
+#define CHECK_RESULTS_VLDX_DUP(test_name,EXPECTED,comment)		\
+  {									\
+    CHECK_RESULTS_VLDX_DUP_NO_FP16(test_name,EXPECTED,comment);		\
+  }
+#endif
 
   DECL_ALL_VLDX_DUP(2);
   DECL_ALL_VLDX_DUP(3);
@@ -269,6 +313,10 @@ void exec_vldX_dup (void)
   PAD(buffer_vld2_pad, poly, 8, 8);
   VECT_ARRAY_INIT2(buffer_vld2, poly, 16, 4);
   PAD(buffer_vld2_pad, poly, 16, 4);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  VECT_ARRAY_INIT2(buffer_vld2, float, 16, 4);
+  PAD(buffer_vld2_pad, float, 16, 4);
+#endif
   VECT_ARRAY_INIT2(buffer_vld2, float, 32, 2);
   PAD(buffer_vld2_pad, float, 32, 2);
 
@@ -292,6 +340,10 @@ void exec_vldX_dup (void)
   PAD(buffer_vld2_pad, poly, 8, 16);
   VECT_ARRAY_INIT2(buffer_vld2, poly, 16, 8);
   PAD(buffer_vld2_pad, poly, 16, 8);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  VECT_ARRAY_INIT2(buffer_vld2, float, 16, 8);
+  PAD(buffer_vld2_pad, float, 16, 8);
+#endif
   VECT_ARRAY_INIT2(buffer_vld2, float, 32, 4);
   PAD(buffer_vld2_pad, float, 32, 4);
 
@@ -316,6 +368,10 @@ void exec_vldX_dup (void)
   PAD(buffer_vld3_pad, poly, 8, 8);
   VECT_ARRAY_INIT3(buffer_vld3, poly, 16, 4);
   PAD(buffer_vld3_pad, poly, 16, 4);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  VECT_ARRAY_INIT3(buffer_vld3, float, 16, 4);
+  PAD(buffer_vld3_pad, float, 16, 4);
+#endif
   VECT_ARRAY_INIT3(buffer_vld3, float, 32, 2);
   PAD(buffer_vld3_pad, float, 32, 2);
 
@@ -339,6 +395,10 @@ void exec_vldX_dup (void)
   PAD(buffer_vld3_pad, poly, 8, 16);
   VECT_ARRAY_INIT3(buffer_vld3, poly, 16, 8);
   PAD(buffer_vld3_pad, poly, 16, 8);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  VECT_ARRAY_INIT3(buffer_vld3, float, 16, 8);
+  PAD(buffer_vld3_pad, float, 16, 8);
+#endif
   VECT_ARRAY_INIT3(buffer_vld3, float, 32, 4);
   PAD(buffer_vld3_pad, float, 32, 4);
 
@@ -363,6 +423,10 @@ void exec_vldX_dup (void)
   PAD(buffer_vld4_pad, poly, 8, 8);
   VECT_ARRAY_INIT4(buffer_vld4, poly, 16, 4);
   PAD(buffer_vld4_pad, poly, 16, 4);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  VECT_ARRAY_INIT4(buffer_vld4, float, 16, 4);
+  PAD(buffer_vld4_pad, float, 16, 4);
+#endif
   VECT_ARRAY_INIT4(buffer_vld4, float, 32, 2);
   PAD(buffer_vld4_pad, float, 32, 2);
 
@@ -386,6 +450,10 @@ void exec_vldX_dup (void)
   PAD(buffer_vld4_pad, poly, 8, 16);
   VECT_ARRAY_INIT4(buffer_vld4, poly, 16, 8);
   PAD(buffer_vld4_pad, poly, 16, 8);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  VECT_ARRAY_INIT4(buffer_vld4, float, 16, 8);
+  PAD(buffer_vld4_pad, float, 16, 8);
+#endif
   VECT_ARRAY_INIT4(buffer_vld4, float, 32, 4);
   PAD(buffer_vld4_pad, float, 32, 4);
 
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX_lane.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX_lane.c
index 2f2e62f0e3e..33b0eafbadb 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX_lane.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX_lane.c
@@ -18,6 +18,7 @@ VECT_VAR_DECL(expected_vld2_0,poly,8,8) [] = { 0xaa, 0xaa, 0xaa, 0xaa,
 					       0xaa, 0xaa, 0xaa, 0xaa };
 VECT_VAR_DECL(expected_vld2_0,poly,16,4) [] = { 0xaaaa, 0xaaaa,
 						0xaaaa, 0xaaaa };
+VECT_VAR_DECL(expected_vld2_0,hfloat,16,4) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa };
 VECT_VAR_DECL(expected_vld2_0,hfloat,32,2) [] = { 0xc1800000, 0xc1700000 };
 VECT_VAR_DECL(expected_vld2_0,int,16,8) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa,
 					       0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa };
@@ -29,6 +30,8 @@ VECT_VAR_DECL(expected_vld2_0,uint,32,4) [] = { 0xfffffff0, 0xfffffff1,
 						0xaaaaaaaa, 0xaaaaaaaa };
 VECT_VAR_DECL(expected_vld2_0,poly,16,8) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa,
 						0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa };
+VECT_VAR_DECL(expected_vld2_0,hfloat,16,8) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa,
+						  0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa } ;
 VECT_VAR_DECL(expected_vld2_0,hfloat,32,4) [] = { 0xaaaaaaaa, 0xaaaaaaaa,
 						  0xaaaaaaaa, 0xaaaaaaaa };
 
@@ -44,6 +47,7 @@ VECT_VAR_DECL(expected_vld2_1,uint,32,2) [] = { 0xfffffff0, 0xfffffff1 };
 VECT_VAR_DECL(expected_vld2_1,poly,8,8) [] = { 0xf0, 0xf1, 0xaa, 0xaa,
 					       0xaa, 0xaa, 0xaa, 0xaa };
 VECT_VAR_DECL(expected_vld2_1,poly,16,4) [] = { 0xaaaa, 0xaaaa, 0xfff0, 0xfff1 };
+VECT_VAR_DECL(expected_vld2_1,hfloat,16,4) [] = { 0xcc00, 0xcb80, 0xaaaa, 0xaaaa };
 VECT_VAR_DECL(expected_vld2_1,hfloat,32,2) [] = { 0xaaaaaaaa, 0xaaaaaaaa };
 VECT_VAR_DECL(expected_vld2_1,int,16,8) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa,
 					       0xfff0, 0xfff1, 0xaaaa, 0xaaaa };
@@ -55,6 +59,8 @@ VECT_VAR_DECL(expected_vld2_1,uint,32,4) [] = { 0xaaaaaaaa, 0xaaaaaaaa,
 						0xaaaaaaaa, 0xaaaaaaaa };
 VECT_VAR_DECL(expected_vld2_1,poly,16,8) [] = { 0xaaaa, 0xaaaa, 0xfff0, 0xfff1,
 						0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa };
+VECT_VAR_DECL(expected_vld2_1,hfloat,16,8) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa,
+						  0xcc00, 0xcb80, 0xaaaa, 0xaaaa };
 VECT_VAR_DECL(expected_vld2_1,hfloat,32,4) [] = { 0xc1800000, 0xc1700000,
 						  0xaaaaaaaa, 0xaaaaaaaa };
 
@@ -70,6 +76,7 @@ VECT_VAR_DECL(expected_vld3_0,uint,32,2) [] = { 0xaaaaaaaa, 0xaaaaaaaa };
 VECT_VAR_DECL(expected_vld3_0,poly,8,8) [] = { 0xaa, 0xaa, 0xaa, 0xaa,
 					       0xaa, 0xaa, 0xaa, 0xaa };
 VECT_VAR_DECL(expected_vld3_0,poly,16,4) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa };
+VECT_VAR_DECL(expected_vld3_0,hfloat,16,4) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa };
 VECT_VAR_DECL(expected_vld3_0,hfloat,32,2) [] = { 0xc1800000, 0xc1700000 };
 VECT_VAR_DECL(expected_vld3_0,int,16,8) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa,
 					       0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa };
@@ -81,6 +88,8 @@ VECT_VAR_DECL(expected_vld3_0,uint,32,4) [] = { 0xfffffff0, 0xfffffff1,
 						0xfffffff2, 0xaaaaaaaa };
 VECT_VAR_DECL(expected_vld3_0,poly,16,8) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa,
 						0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa };
+VECT_VAR_DECL(expected_vld3_0,hfloat,16,8) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa,
+						  0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa };
 VECT_VAR_DECL(expected_vld3_0,hfloat,32,4) [] = { 0xaaaaaaaa, 0xaaaaaaaa,
 						  0xaaaaaaaa, 0xaaaaaaaa };
 
@@ -96,6 +105,7 @@ VECT_VAR_DECL(expected_vld3_1,uint,32,2) [] = { 0xaaaaaaaa, 0xfffffff0 };
 VECT_VAR_DECL(expected_vld3_1,poly,8,8) [] = { 0xaa, 0xaa, 0xaa, 0xaa,
 					       0xf0, 0xf1, 0xf2, 0xaa };
 VECT_VAR_DECL(expected_vld3_1,poly,16,4) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa };
+VECT_VAR_DECL(expected_vld3_1,hfloat,16,4) [] = { 0xaaaa, 0xaaaa, 0xcc00, 0xcb80 };
 VECT_VAR_DECL(expected_vld3_1,hfloat,32,2) [] = { 0xc1600000, 0xaaaaaaaa };
 VECT_VAR_DECL(expected_vld3_1,int,16,8) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa,
 					       0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa };
@@ -107,6 +117,8 @@ VECT_VAR_DECL(expected_vld3_1,uint,32,4) [] = { 0xaaaaaaaa, 0xaaaaaaaa,
 						0xaaaaaaaa, 0xaaaaaaaa };
 VECT_VAR_DECL(expected_vld3_1,poly,16,8) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa,
 						0xaaaa, 0xaaaa, 0xaaaa, 0xfff0 };
+VECT_VAR_DECL(expected_vld3_1,hfloat,16,8) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa,
+						  0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa };
 VECT_VAR_DECL(expected_vld3_1,hfloat,32,4) [] = { 0xaaaaaaaa, 0xaaaaaaaa,
 						  0xc1800000, 0xc1700000 };
 
@@ -122,6 +134,7 @@ VECT_VAR_DECL(expected_vld3_2,uint,32,2) [] = { 0xfffffff1, 0xfffffff2 };
 VECT_VAR_DECL(expected_vld3_2,poly,8,8) [] = { 0xaa, 0xaa, 0xaa, 0xaa,
 					       0xaa, 0xaa, 0xaa, 0xaa };
 VECT_VAR_DECL(expected_vld3_2,poly,16,4) [] = { 0xaaaa, 0xfff0, 0xfff1, 0xfff2 };
+VECT_VAR_DECL(expected_vld3_2,hfloat,16,4) [] = { 0xcb00, 0xaaaa, 0xaaaa, 0xaaaa };
 VECT_VAR_DECL(expected_vld3_2,hfloat,32,2) [] = { 0xaaaaaaaa, 0xaaaaaaaa };
 VECT_VAR_DECL(expected_vld3_2,int,16,8) [] = { 0xaaaa, 0xaaaa, 0xfff0, 0xfff1,
 					       0xfff2, 0xaaaa, 0xaaaa, 0xaaaa };
@@ -133,6 +146,8 @@ VECT_VAR_DECL(expected_vld3_2,uint,32,4) [] = { 0xaaaaaaaa, 0xaaaaaaaa,
 						0xaaaaaaaa, 0xaaaaaaaa };
 VECT_VAR_DECL(expected_vld3_2,poly,16,8) [] = { 0xfff1, 0xfff2, 0xaaaa, 0xaaaa,
 						0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa };
+VECT_VAR_DECL(expected_vld3_2,hfloat,16,8) [] = { 0xaaaa, 0xaaaa, 0xcc00, 0xcb80,
+						  0xcb00, 0xaaaa, 0xaaaa, 0xaaaa };
 VECT_VAR_DECL(expected_vld3_2,hfloat,32,4) [] = { 0xc1600000, 0xaaaaaaaa,
 						  0xaaaaaaaa, 0xaaaaaaaa };
 
@@ -148,6 +163,7 @@ VECT_VAR_DECL(expected_vld4_0,uint,32,2) [] = { 0xaaaaaaaa, 0xaaaaaaaa };
 VECT_VAR_DECL(expected_vld4_0,poly,8,8) [] = { 0xaa, 0xaa, 0xaa, 0xaa,
 					       0xaa, 0xaa, 0xaa, 0xaa };
 VECT_VAR_DECL(expected_vld4_0,poly,16,4) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa };
+VECT_VAR_DECL(expected_vld4_0,hfloat,16,4) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa };
 VECT_VAR_DECL(expected_vld4_0,hfloat,32,2) [] = { 0xc1800000, 0xc1700000 };
 VECT_VAR_DECL(expected_vld4_0,int,16,8) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa,
 					       0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa };
@@ -159,6 +175,8 @@ VECT_VAR_DECL(expected_vld4_0,uint,32,4) [] = { 0xfffffff0, 0xfffffff1,
 						0xfffffff2, 0xfffffff3 };
 VECT_VAR_DECL(expected_vld4_0,poly,16,8) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa,
 						0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa };
+VECT_VAR_DECL(expected_vld4_0,hfloat,16,8) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa,
+						  0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa };
 VECT_VAR_DECL(expected_vld4_0,hfloat,32,4) [] = { 0xaaaaaaaa, 0xaaaaaaaa,
 						  0xaaaaaaaa, 0xaaaaaaaa };
 
@@ -174,6 +192,7 @@ VECT_VAR_DECL(expected_vld4_1,uint,32,2) [] = { 0xaaaaaaaa, 0xaaaaaaaa };
 VECT_VAR_DECL(expected_vld4_1,poly,8,8) [] = { 0xaa, 0xaa, 0xaa, 0xaa,
 					       0xaa, 0xaa, 0xaa, 0xaa };
 VECT_VAR_DECL(expected_vld4_1,poly,16,4) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa };
+VECT_VAR_DECL(expected_vld4_1,hfloat,16,4) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa };
 VECT_VAR_DECL(expected_vld4_1,hfloat,32,2) [] = { 0xc1600000, 0xc1500000 };
 VECT_VAR_DECL(expected_vld4_1,int,16,8) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa,
 					       0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa };
@@ -185,6 +204,8 @@ VECT_VAR_DECL(expected_vld4_1,uint,32,4) [] = { 0xaaaaaaaa, 0xaaaaaaaa,
 						0xaaaaaaaa, 0xaaaaaaaa };
 VECT_VAR_DECL(expected_vld4_1,poly,16,8) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa,
 						0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa };
+VECT_VAR_DECL(expected_vld4_1,hfloat,16,8) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa,
+						  0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa };
 VECT_VAR_DECL(expected_vld4_1,hfloat,32,4) [] = { 0xaaaaaaaa, 0xaaaaaaaa,
 						  0xaaaaaaaa, 0xaaaaaaaa };
 
@@ -200,6 +221,7 @@ VECT_VAR_DECL(expected_vld4_2,uint,32,2) [] = { 0xfffffff0, 0xfffffff1 };
 VECT_VAR_DECL(expected_vld4_2,poly,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
 					       0xaa, 0xaa, 0xaa, 0xaa };
 VECT_VAR_DECL(expected_vld4_2,poly,16,4) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa };
+VECT_VAR_DECL(expected_vld4_2,hfloat,16,4) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80 };
 VECT_VAR_DECL(expected_vld4_2,hfloat,32,2) [] = { 0xaaaaaaaa, 0xaaaaaaaa };
 VECT_VAR_DECL(expected_vld4_2,int,16,8) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa,
 					       0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa };
@@ -211,6 +233,8 @@ VECT_VAR_DECL(expected_vld4_2,uint,32,4) [] = { 0xaaaaaaaa, 0xaaaaaaaa,
 						0xaaaaaaaa, 0xaaaaaaaa };
 VECT_VAR_DECL(expected_vld4_2,poly,16,8) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa,
 						0xfff0, 0xfff1, 0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected_vld4_2,hfloat,16,8) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa,
+						  0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa };
 VECT_VAR_DECL(expected_vld4_2,hfloat,32,4) [] = { 0xc1800000, 0xc1700000,
 						  0xc1600000, 0xc1500000 };
 
@@ -226,6 +250,7 @@ VECT_VAR_DECL(expected_vld4_3,uint,32,2) [] = { 0xfffffff2, 0xfffffff3 };
 VECT_VAR_DECL(expected_vld4_3,poly,8,8) [] = { 0xaa, 0xaa, 0xaa, 0xaa,
 					       0xaa, 0xaa, 0xaa, 0xaa };
 VECT_VAR_DECL(expected_vld4_3,poly,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected_vld4_3,hfloat,16,4) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa };
 VECT_VAR_DECL(expected_vld4_3,hfloat,32,2) [] = { 0xaaaaaaaa, 0xaaaaaaaa };
 VECT_VAR_DECL(expected_vld4_3,int,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3,
 					       0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa };
@@ -237,6 +262,8 @@ VECT_VAR_DECL(expected_vld4_3,uint,32,4) [] = { 0xaaaaaaaa, 0xaaaaaaaa,
 						0xaaaaaaaa, 0xaaaaaaaa };
 VECT_VAR_DECL(expected_vld4_3,poly,16,8) [] = { 0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa,
 						0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa };
+VECT_VAR_DECL(expected_vld4_3,hfloat,16,8) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80,
+						  0xaaaa, 0xaaaa, 0xaaaa, 0xaaaa };
 VECT_VAR_DECL(expected_vld4_3,hfloat,32,4) [] = { 0xaaaaaaaa, 0xaaaaaaaa,
 						  0xaaaaaaaa, 0xaaaaaaaa };
 
@@ -252,6 +279,9 @@ VECT_VAR_DECL_INIT(buffer_vld2_lane, uint, 32, 2);
 VECT_VAR_DECL_INIT(buffer_vld2_lane, uint, 64, 2);
 VECT_VAR_DECL_INIT(buffer_vld2_lane, poly, 8, 2);
 VECT_VAR_DECL_INIT(buffer_vld2_lane, poly, 16, 2);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+VECT_VAR_DECL_INIT(buffer_vld2_lane, float, 16, 2);
+#endif
 VECT_VAR_DECL_INIT(buffer_vld2_lane, float, 32, 2);
 
 /* Input buffers for vld3_lane */
@@ -265,6 +295,9 @@ VECT_VAR_DECL_INIT(buffer_vld3_lane, uint, 32, 3);
 VECT_VAR_DECL_INIT(buffer_vld3_lane, uint, 64, 3);
 VECT_VAR_DECL_INIT(buffer_vld3_lane, poly, 8, 3);
 VECT_VAR_DECL_INIT(buffer_vld3_lane, poly, 16, 3);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+VECT_VAR_DECL_INIT(buffer_vld3_lane, float, 16, 3);
+#endif
 VECT_VAR_DECL_INIT(buffer_vld3_lane, float, 32, 3);
 
 /* Input buffers for vld4_lane */
@@ -278,6 +311,9 @@ VECT_VAR_DECL_INIT(buffer_vld4_lane, uint, 32, 4);
 VECT_VAR_DECL_INIT(buffer_vld4_lane, uint, 64, 4);
 VECT_VAR_DECL_INIT(buffer_vld4_lane, poly, 8, 4);
 VECT_VAR_DECL_INIT(buffer_vld4_lane, poly, 16, 4);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+VECT_VAR_DECL_INIT(buffer_vld4_lane, float, 16, 4);
+#endif
 VECT_VAR_DECL_INIT(buffer_vld4_lane, float, 32, 4);
 
 void exec_vldX_lane (void)
@@ -321,7 +357,7 @@ void exec_vldX_lane (void)
 	 sizeof(VECT_VAR(result, T1, W, N)));
 
   /* We need all variants in 64 bits, but there is no 64x2 variant.  */
-#define DECL_ALL_VLDX_LANE(X)			\
+#define DECL_ALL_VLDX_LANE_NO_FP16(X)		\
   DECL_VLDX_LANE(int, 8, 8, X);			\
   DECL_VLDX_LANE(int, 16, 4, X);		\
   DECL_VLDX_LANE(int, 32, 2, X);		\
@@ -338,6 +374,15 @@ void exec_vldX_lane (void)
   DECL_VLDX_LANE(float, 32, 2, X);		\
   DECL_VLDX_LANE(float, 32, 4, X)
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+#define DECL_ALL_VLDX_LANE(X)		\
+  DECL_ALL_VLDX_LANE_NO_FP16(X);	\
+  DECL_VLDX_LANE(float, 16, 4, X);	\
+  DECL_VLDX_LANE(float, 16, 8, X)
+#else
+#define DECL_ALL_VLDX_LANE(X) DECL_ALL_VLDX_LANE_NO_FP16(X)
+#endif
+
   /* Add some padding to try to catch out of bound accesses.  */
 #define ARRAY1(V, T, W, N) VECT_VAR_DECL(V,T,W,N)[1]={42}
 #define DUMMY_ARRAY(V, T, W, N, L) \
@@ -346,7 +391,7 @@ void exec_vldX_lane (void)
 
   /* Use the same lanes regardless of the size of the array (X), for
      simplicity.  */
-#define TEST_ALL_VLDX_LANE(X)			\
+#define TEST_ALL_VLDX_LANE_NO_FP16(X)		\
   TEST_VLDX_LANE(, int, s, 8, 8, X, 7);		\
   TEST_VLDX_LANE(, int, s, 16, 4, X, 2);	\
   TEST_VLDX_LANE(, int, s, 32, 2, X, 0);	\
@@ -363,7 +408,16 @@ void exec_vldX_lane (void)
   TEST_VLDX_LANE(, float, f, 32, 2, X, 0);	\
   TEST_VLDX_LANE(q, float, f, 32, 4, X, 2)
 
-#define TEST_ALL_EXTRA_CHUNKS(X, Y)		\
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+#define TEST_ALL_VLDX_LANE(X)			\
+  TEST_ALL_VLDX_LANE_NO_FP16(X);		\
+  TEST_VLDX_LANE(, float, f, 16, 4, X, 2);	\
+  TEST_VLDX_LANE(q, float, f, 16, 8, X, 6)
+#else
+#define TEST_ALL_VLDX_LANE(X) TEST_ALL_VLDX_LANE_NO_FP16(X)
+#endif
+
+#define TEST_ALL_EXTRA_CHUNKS_NO_FP16(X,Y)	\
   TEST_EXTRA_CHUNK(int, 8, 8, X, Y);		\
   TEST_EXTRA_CHUNK(int, 16, 4, X, Y);		\
   TEST_EXTRA_CHUNK(int, 32, 2, X, Y);		\
@@ -380,9 +434,17 @@ void exec_vldX_lane (void)
   TEST_EXTRA_CHUNK(float, 32, 2, X, Y);		\
   TEST_EXTRA_CHUNK(float, 32, 4, X, Y)
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+#define TEST_ALL_EXTRA_CHUNKS(X,Y)		\
+  TEST_ALL_EXTRA_CHUNKS_NO_FP16(X, Y);		\
+  TEST_EXTRA_CHUNK(float, 16, 4, X, Y);		\
+  TEST_EXTRA_CHUNK(float, 16, 8, X, Y)
+#else
+#define TEST_ALL_EXTRA_CHUNKS(X,Y) TEST_ALL_EXTRA_CHUNKS_NO_FP16(X, Y)
+#endif
+
   /* vldX_lane supports only a subset of all variants.  */
-#define CHECK_RESULTS_VLDX_LANE(test_name,EXPECTED,comment)		\
-  {									\
+#define CHECK_RESULTS_VLDX_LANE_NO_FP16(test_name,EXPECTED,comment)	\
     CHECK(test_name, int, 8, 8, PRIx8, EXPECTED, comment);		\
     CHECK(test_name, int, 16, 4, PRIx16, EXPECTED, comment);		\
     CHECK(test_name, int, 32, 2, PRIx32, EXPECTED, comment);		\
@@ -397,8 +459,21 @@ void exec_vldX_lane (void)
     CHECK(test_name, uint, 16, 8, PRIx16, EXPECTED, comment);		\
     CHECK(test_name, uint, 32, 4, PRIx32, EXPECTED, comment);		\
     CHECK(test_name, poly, 16, 8, PRIx16, EXPECTED, comment);		\
-    CHECK_FP(test_name, float, 32, 4, PRIx32, EXPECTED, comment);	\
-  }									\
+    CHECK_FP(test_name, float, 32, 4, PRIx32, EXPECTED, comment)
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+#define CHECK_RESULTS_VLDX_LANE(test_name,EXPECTED,comment)		\
+  {									\
+    CHECK_RESULTS_VLDX_LANE_NO_FP16(test_name,EXPECTED,comment);	\
+    CHECK_FP(test_name, float, 16, 4, PRIx16, EXPECTED, comment);	\
+    CHECK_FP(test_name, float, 16, 8, PRIx16, EXPECTED, comment);	\
+  }
+#else
+#define CHECK_RESULTS_VLDX_LANE(test_name,EXPECTED,comment)		\
+  {									\
+    CHECK_RESULTS_VLDX_LANE_NO_FP16(test_name,EXPECTED,comment);	\
+  }
+#endif
 
   /* Declare the temporary buffers / variables.  */
   DECL_ALL_VLDX_LANE(2);
@@ -419,6 +494,10 @@ void exec_vldX_lane (void)
   DUMMY_ARRAY(buffer_src, uint, 16, 8, 4);
   DUMMY_ARRAY(buffer_src, uint, 32, 4, 4);
   DUMMY_ARRAY(buffer_src, poly, 16, 8, 4);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  DUMMY_ARRAY(buffer_src, float, 16, 4, 4);
+  DUMMY_ARRAY(buffer_src, float, 16, 8, 4);
+#endif
   DUMMY_ARRAY(buffer_src, float, 32, 2, 4);
   DUMMY_ARRAY(buffer_src, float, 32, 4, 4);
 
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vset_lane.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vset_lane.c
index 51594068364..e0499df5170 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vset_lane.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vset_lane.c
@@ -16,6 +16,7 @@ VECT_VAR_DECL(expected,uint,64,1) [] = { 0x88 };
 VECT_VAR_DECL(expected,poly,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
 					0xf4, 0xf5, 0x55, 0xf7 };
 VECT_VAR_DECL(expected,poly,16,4) [] = { 0xfff0, 0xfff1, 0x66, 0xfff3 };
+VECT_VAR_DECL(expected,hfloat,16,4) [] = { 0xcc00, 0xcb80, 0x4840, 0xca80 };
 VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1800000, 0x4204cccd };
 VECT_VAR_DECL(expected,int,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
 					0xf4, 0xf5, 0xf6, 0xf7,
@@ -41,6 +42,8 @@ VECT_VAR_DECL(expected,poly,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
 					 0xfc, 0xfd, 0xdd, 0xff };
 VECT_VAR_DECL(expected,poly,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3,
 					 0xfff4, 0xfff5, 0xee, 0xfff7 };
+VECT_VAR_DECL(expected,hfloat,16,8) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80,
+					   0xca00, 0x4480, 0xc900, 0xc880 };
 VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc1800000, 0xc1700000,
 					   0xc1600000, 0x41333333 };
 
@@ -61,6 +64,10 @@ void exec_vset_lane (void)
 
   /* Initialize input "vector" from "buffer".  */
   TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  VLOAD(vector, buffer, , float, f, 16, 4);
+  VLOAD(vector, buffer, q, float, f, 16, 8);
+#endif
   VLOAD(vector, buffer, , float, f, 32, 2);
   VLOAD(vector, buffer, q, float, f, 32, 4);
 
@@ -75,6 +82,9 @@ void exec_vset_lane (void)
   TEST_VSET_LANE(, uint, u, 64, 1, 0x88, 0);
   TEST_VSET_LANE(, poly, p, 8, 8, 0x55, 6);
   TEST_VSET_LANE(, poly, p, 16, 4, 0x66, 2);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  TEST_VSET_LANE(, float, f, 16, 4, 8.5f, 2);
+#endif
   TEST_VSET_LANE(, float, f, 32, 2, 33.2f, 1);
 
   TEST_VSET_LANE(q, int, s, 8, 16, 0x99, 15);
@@ -87,6 +97,9 @@ void exec_vset_lane (void)
   TEST_VSET_LANE(q, uint, u, 64, 2, 0x11, 1);
   TEST_VSET_LANE(q, poly, p, 8, 16, 0xDD, 14);
   TEST_VSET_LANE(q, poly, p, 16, 8, 0xEE, 6);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  TEST_VSET_LANE(q, float, f, 16, 8, 4.5f, 5);
+#endif
   TEST_VSET_LANE(q, float, f, 32, 4, 11.2f, 3);
 
   CHECK_RESULTS(TEST_MSG, "");
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst1_lane.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst1_lane.c
index 08583b88cf3..825d07dbf77 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst1_lane.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst1_lane.c
@@ -16,6 +16,7 @@ VECT_VAR_DECL(expected,uint,64,1) [] = { 0xfffffffffffffff0 };
 VECT_VAR_DECL(expected,poly,8,8) [] = { 0xf6, 0x33, 0x33, 0x33,
 					0x33, 0x33, 0x33, 0x33 };
 VECT_VAR_DECL(expected,poly,16,4) [] = { 0xfff2, 0x3333, 0x3333, 0x3333 };
+VECT_VAR_DECL(expected,hfloat,16,4) [] = { 0xcb80, 0x3333, 0x3333, 0x3333 };
 VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1700000, 0x33333333 };
 VECT_VAR_DECL(expected,int,8,16) [] = { 0xff, 0x33, 0x33, 0x33,
 					0x33, 0x33, 0x33, 0x33,
@@ -42,6 +43,8 @@ VECT_VAR_DECL(expected,poly,8,16) [] = { 0xfa, 0x33, 0x33, 0x33,
 					 0x33, 0x33, 0x33, 0x33 };
 VECT_VAR_DECL(expected,poly,16,8) [] = { 0xfff4, 0x3333, 0x3333, 0x3333,
 					 0x3333, 0x3333, 0x3333, 0x3333 };
+VECT_VAR_DECL(expected,hfloat,16,8) [] = { 0xc900, 0x3333, 0x3333, 0x3333,
+					   0x3333, 0x3333, 0x3333, 0x3333 };
 VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc1700000, 0x33333333,
 					   0x33333333, 0x33333333 };
 
@@ -69,6 +72,9 @@ void exec_vst1_lane (void)
   TEST_VST1_LANE(, uint, u, 64, 1, 0);
   TEST_VST1_LANE(, poly, p, 8, 8, 6);
   TEST_VST1_LANE(, poly, p, 16, 4, 2);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  TEST_VST1_LANE(, float, f, 16, 4, 1);
+#endif
   TEST_VST1_LANE(, float, f, 32, 2, 1);
 
   TEST_VST1_LANE(q, int, s, 8, 16, 15);
@@ -81,6 +87,9 @@ void exec_vst1_lane (void)
   TEST_VST1_LANE(q, uint, u, 64, 2, 0);
   TEST_VST1_LANE(q, poly, p, 8, 16, 10);
   TEST_VST1_LANE(q, poly, p, 16, 8, 4);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+  TEST_VST1_LANE(q, float, f, 16, 8, 6);
+#endif
   TEST_VST1_LANE(q, float, f, 32, 4, 1);
 
   CHECK_RESULTS(TEST_MSG, "");
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst2_lane_f16_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst2_lane_f16_indices_1.c
new file mode 100644
index 00000000000..dbf5241b591
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst2_lane_f16_indices_1.c
@@ -0,0 +1,15 @@
+#include <arm_neon.h>
+
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
+/* { dg-excess-errors "" { xfail arm*-*-* } } */
+
+void
+f_vst2_lane_f16 (float16_t * p, float16x4x2_t v)
+{
+  /* { dg-error "lane 4 out of range 0 - 3" "" { xfail arm*-*-* } 0 } */
+  vst2_lane_f16 (p, v, 4);
+  /* { dg-error "lane -1 out of range 0 - 3" "" { xfail arm*-*-* } 0 } */
+  vst2_lane_f16 (p, v, -1);
+  return;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst2q_lane_f16_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst2q_lane_f16_indices_1.c
new file mode 100644
index 00000000000..e3c0296534b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst2q_lane_f16_indices_1.c
@@ -0,0 +1,15 @@
+#include <arm_neon.h>
+
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
+/* { dg-excess-errors "" { xfail arm*-*-* } } */
+
+void
+f_vst2q_lane_f16 (float16_t * p, float16x8x2_t v)
+{
+  /* { dg-error "lane 8 out of range 0 - 7" "" { xfail arm*-*-* } 0 } */
+  vst2q_lane_f16 (p, v, 8);
+  /* { dg-error "lane -1 out of range 0 - 7" "" { xfail arm*-*-* } 0 } */
+  vst2q_lane_f16 (p, v, -1);
+  return;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst3_lane_f16_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst3_lane_f16_indices_1.c
new file mode 100644
index 00000000000..406dfd410a1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst3_lane_f16_indices_1.c
@@ -0,0 +1,15 @@
+#include <arm_neon.h>
+
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
+/* { dg-excess-errors "" { xfail arm*-*-* } } */
+
+void
+f_vst3_lane_f16 (float16_t * p, float16x4x3_t v)
+{
+  /* { dg-error "lane 4 out of range 0 - 3" "" { xfail arm*-*-* } 0 } */
+  vst3_lane_f16 (p, v, 4);
+  /* { dg-error "lane -1 out of range 0 - 3" "" { xfail arm*-*-* } 0 } */
+  vst3_lane_f16 (p, v, -1);
+  return;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst3q_lane_f16_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst3q_lane_f16_indices_1.c
new file mode 100644
index 00000000000..4e8b24cff8a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst3q_lane_f16_indices_1.c
@@ -0,0 +1,15 @@
+#include <arm_neon.h>
+
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
+/* { dg-excess-errors "" { xfail arm*-*-* } } */
+
+void
+f_vst3q_lane_f16 (float16_t * p, float16x8x3_t v)
+{
+  /* { dg-error "lane 8 out of range 0 - 7" "" { xfail arm*-*-* } 0 } */
+  vst3q_lane_f16 (p, v, 8);
+  /* { dg-error "lane -1 out of range 0 - 7" "" { xfail arm*-*-* } 0 } */
+  vst3q_lane_f16 (p, v, -1);
+  return;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst4_lane_f16_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst4_lane_f16_indices_1.c
new file mode 100644
index 00000000000..0fe65116712
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst4_lane_f16_indices_1.c
@@ -0,0 +1,15 @@
+#include <arm_neon.h>
+
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
+/* { dg-excess-errors "" { xfail arm*-*-* } } */
+
+void
+f_vst4_lane_f16 (float16_t * p, float16x4x4_t v)
+{
+  /* { dg-error "lane 4 out of range 0 - 3" "" { xfail arm*-*-* } 0 } */
+  vst4_lane_f16 (p, v, 4);
+  /* { dg-error "lane -1 out of range 0 - 3" "" { xfail arm*-*-* } 0 } */
+  vst4_lane_f16 (p, v, -1);
+  return;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst4q_lane_f16_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst4q_lane_f16_indices_1.c
new file mode 100644
index 00000000000..9a5f09aa5fa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst4q_lane_f16_indices_1.c
@@ -0,0 +1,15 @@
+#include <arm_neon.h>
+
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
+/* { dg-excess-errors "" { xfail arm*-*-* } } */
+
+void
+f_vst4q_lane_f16 (float16_t * p, float16x8x4_t v)
+{
+  /* { dg-error "lane 8 out of range 0 - 7" "" { xfail arm*-*-* } 0 } */
+  vst4q_lane_f16 (p, v, 8);
+  /* { dg-error "lane -1 out of range 0 - 7" "" { xfail arm*-*-* } 0 } */
+  vst4q_lane_f16 (p, v, -1);
+  return;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/arm_align_max_pwr.c b/gcc/testsuite/gcc.target/aarch64/arm_align_max_pwr.c
index bbb4c6f9d04..ffa4d229922 100644
--- a/gcc/testsuite/gcc.target/aarch64/arm_align_max_pwr.c
+++ b/gcc/testsuite/gcc.target/aarch64/arm_align_max_pwr.c
@@ -1,15 +1,23 @@
-/* { dg-do run } */
-
-#include <stdio.h>
-#include <assert.h>
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
 
 #define align (1ul << __ARM_ALIGN_MAX_PWR)
 static int x __attribute__ ((aligned (align)));
+static int y __attribute__ ((aligned (align)));
+
+extern void foo (int *x, int *y);
+extern int bar (int x, int y);
 
 int
-main ()
+dummy ()
 {
-  assert ((((unsigned long)&x) & (align - 1)) == 0);
+  int result;
 
-  return 0;
+  foo (&x, &y);
+  result = bar (x, y);
+
+  return result;
 }
+
+/* { dg-final { scan-assembler-times "zero\t4" 2 } } */
+/* { dg-final { scan-assembler "zero\t268435452" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/arm_align_max_stack_pwr.c b/gcc/testsuite/gcc.target/aarch64/arm_align_max_stack_pwr.c
index 7a6355b054e..7f356fe300a 100644
--- a/gcc/testsuite/gcc.target/aarch64/arm_align_max_stack_pwr.c
+++ b/gcc/testsuite/gcc.target/aarch64/arm_align_max_stack_pwr.c
@@ -1,15 +1,20 @@
-/* { dg-do run } */
-
-#include <stdio.h>
-#include <assert.h>
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
 
 #define align (1ul << __ARM_ALIGN_MAX_STACK_PWR)
+extern void foo (int *x);
+extern int bar (int x);
 
 int
-main ()
+dummy ()
 {
   int x __attribute__ ((aligned (align)));
+  int result;
+
+  foo (&x);
+  result = bar (x);
 
-  assert ((((unsigned long)&x) & (align - 1)) == 0);
-  return 0;
+  return result;
 }
+
+/* { dg-final { scan-assembler "and\tx\[0-9\]+, x\[0-9\]+, -65536" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/mod_2.c b/gcc/testsuite/gcc.target/aarch64/mod_2.c
new file mode 100644
index 00000000000..2645c18e741
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/mod_2.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcpu=cortex-a57 -save-temps" } */
+
+#include "mod_2.x"
+
+/* { dg-final { scan-assembler "csneg\t\[wx\]\[0-9\]*" } } */
+/* { dg-final { scan-assembler-times "and\t\[wx\]\[0-9\]*" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/mod_2.x b/gcc/testsuite/gcc.target/aarch64/mod_2.x
new file mode 100644
index 00000000000..2b079a4b883
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/mod_2.x
@@ -0,0 +1,5 @@
+int
+f (int x)
+{
+  return x % 2;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/mod_256.c b/gcc/testsuite/gcc.target/aarch64/mod_256.c
new file mode 100644
index 00000000000..567332c04e1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/mod_256.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcpu=cortex-a57 -save-temps" } */
+
+#include "mod_256.x"
+
+/* { dg-final { scan-assembler "csneg\t\[wx\]\[0-9\]*" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/mod_256.x b/gcc/testsuite/gcc.target/aarch64/mod_256.x
new file mode 100644
index 00000000000..c1de42ce389
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/mod_256.x
@@ -0,0 +1,5 @@
+int
+f (int x)
+{
+  return x % 256;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/pic-small.c b/gcc/testsuite/gcc.target/aarch64/pic-small.c
index 282e4d073c0..2ea056af27d 100644
--- a/gcc/testsuite/gcc.target/aarch64/pic-small.c
+++ b/gcc/testsuite/gcc.target/aarch64/pic-small.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target aarch64_small_fpic } */
 /* { dg-options "-O2 -fpic -fno-inline --save-temps" } */
+/* { dg-skip-if "-fpic for AArch64 small code model" { aarch64*-*-* }  { "-mcmodel=tiny" "-mcmodel=large" } { "" } } */
 
 void abort ();
 int global_a;
diff --git a/gcc/testsuite/gcc.target/aarch64/vget_high_1.c b/gcc/testsuite/gcc.target/aarch64/vget_high_1.c
index 4cb872da2cd..b6b57e0c546 100644
--- a/gcc/testsuite/gcc.target/aarch64/vget_high_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/vget_high_1.c
@@ -14,6 +14,7 @@ VARIANT (int8_t, 8, int8x8_t, int8x16_t, s8)		\
 VARIANT (int16_t, 4, int16x4_t, int16x8_t, s16)		\
 VARIANT (int32_t, 2, int32x2_t, int32x4_t, s32)		\
 VARIANT (int64_t, 1, int64x1_t, int64x2_t, s64)		\
+VARIANT (float16_t, 4, float16x4_t, float16x8_t, f16)	\
 VARIANT (float32_t, 2, float32x2_t, float32x4_t, f32)	\
 VARIANT (float64_t, 1, float64x1_t, float64x2_t, f64)
 
@@ -51,6 +52,8 @@ main (int argc, char **argv)
   int16_t int16_t_data[8] = { -17, 19, 3, -999, 44048, 505, 9999, 1000};
   int32_t int32_t_data[4] = { 123456789, -987654321, -135792468, 975318642 };
   int64_t int64_t_data[2] = {0xfedcba9876543210LL, 0xdeadbabecafebeefLL };
+  float16_t float16_t_data[8] = { 1.25, 4.5, 7.875, 2.3125, 5.675, 8.875,
+      3.6875, 6.75};
   float32_t float32_t_data[4] = { 3.14159, 2.718, 1.414, 100.0 };
   float64_t float64_t_data[2] = { 1.01001000100001, 12345.6789 };
 
diff --git a/gcc/testsuite/gcc.target/aarch64/vget_low_1.c b/gcc/testsuite/gcc.target/aarch64/vget_low_1.c
index f8016ef7312..2223676521c 100644
--- a/gcc/testsuite/gcc.target/aarch64/vget_low_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/vget_low_1.c
@@ -14,6 +14,7 @@ VARIANT (int8_t, 8, int8x8_t, int8x16_t, s8)		\
 VARIANT (int16_t, 4, int16x4_t, int16x8_t, s16)		\
 VARIANT (int32_t, 2, int32x2_t, int32x4_t, s32)		\
 VARIANT (int64_t, 1, int64x1_t, int64x2_t, s64)		\
+VARIANT (float16_t, 4, float16x4_t, float16x8_t, f16)	\
 VARIANT (float32_t, 2, float32x2_t, float32x4_t, f32)	\
 VARIANT (float64_t, 1, float64x1_t, float64x2_t, f64)
 
@@ -51,6 +52,8 @@ main (int argc, char **argv)
   int16_t int16_t_data[8] = { -17, 19, 3, -999, 44048, 505, 9999, 1000};
   int32_t int32_t_data[4] = { 123456789, -987654321, -135792468, 975318642 };
   int64_t int64_t_data[2] = {0xfedcba9876543210LL, 0xdeadbabecafebeefLL };
+  float16_t float16_t_data[8] = { 1.25, 4.5, 7.875, 2.3125, 5.675, 8.875,
+      3.6875, 6.75};
   float32_t float32_t_data[4] = { 3.14159, 2.718, 1.414, 100.0 };
   float64_t float64_t_data[2] = { 1.01001000100001, 12345.6789 };
 
diff --git a/gcc/testsuite/gcc.target/aarch64/vld1-vst1_1.c b/gcc/testsuite/gcc.target/aarch64/vld1-vst1_1.c
index f8c6edb3bcf..fa9ef0f4e43 100644
--- a/gcc/testsuite/gcc.target/aarch64/vld1-vst1_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/vld1-vst1_1.c
@@ -31,6 +31,7 @@ THING (int8x8_t, 8, int8_t, _s8)	\
 THING (uint8x8_t, 8, uint8_t, _u8)	\
 THING (int16x4_t, 4, int16_t, _s16)	\
 THING (uint16x4_t, 4, uint16_t, _u16)	\
+THING (float16x4_t, 4, float16_t, _f16)	\
 THING (int32x2_t, 2, int32_t, _s32)	\
 THING (uint32x2_t, 2, uint32_t, _u32)	\
 THING (float32x2_t, 2, float32_t, _f32) \
@@ -38,6 +39,7 @@ THING (int8x16_t, 16, int8_t, q_s8)	\
 THING (uint8x16_t, 16, uint8_t, q_u8)	\
 THING (int16x8_t, 8, int16_t, q_s16)	\
 THING (uint16x8_t, 8, uint16_t, q_u16)	\
+THING (float16x8_t, 8, float16_t, q_f16)\
 THING (int32x4_t, 4, int32_t, q_s32)	\
 THING (uint32x4_t, 4, uint32_t, q_u32)	\
 THING (float32x4_t, 4, float32_t, q_f32)\
diff --git a/gcc/testsuite/gcc.target/aarch64/vld1_lane.c b/gcc/testsuite/gcc.target/aarch64/vld1_lane.c
index 463c88c0a5f..c70df7135c1 100644
--- a/gcc/testsuite/gcc.target/aarch64/vld1_lane.c
+++ b/gcc/testsuite/gcc.target/aarch64/vld1_lane.c
@@ -16,6 +16,7 @@ VARIANT (int32, , 2, _s32, 0)	\
 VARIANT (int64, , 1, _s64, 0)	\
 VARIANT (poly8, , 8, _p8, 7)	\
 VARIANT (poly16, , 4, _p16, 2)	\
+VARIANT (float16, , 4, _f16, 3)	\
 VARIANT (float32, , 2, _f32, 1)	\
 VARIANT (float64, , 1, _f64, 0)	\
 VARIANT (uint8, q, 16, _u8, 13)	\
@@ -28,6 +29,7 @@ VARIANT (int32, q, 4, _s32, 1)	\
 VARIANT (int64, q, 2, _s64, 1)	\
 VARIANT (poly8, q, 16, _p8, 7)	\
 VARIANT (poly16, q, 8, _p16, 4)	\
+VARIANT (float16, q, 8, _f16, 3)\
 VARIANT (float32, q, 4, _f32, 2)\
 VARIANT (float64, q, 2, _f64, 1)
 
@@ -76,6 +78,7 @@ main (int argc, char **argv)
   int64_t int64_data = 0x1234567890abcdefLL;
   poly8_t poly8_data = 13;
   poly16_t poly16_data = 11111;
+  float16_t float16_data = 8.75;
   float32_t float32_data = 3.14159;
   float64_t float64_data = 1.010010001;
 
diff --git a/gcc/testsuite/gcc.target/aarch64/vldN_1.c b/gcc/testsuite/gcc.target/aarch64/vldN_1.c
index b64de16a165..caac94f86ce 100644
--- a/gcc/testsuite/gcc.target/aarch64/vldN_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/vldN_1.c
@@ -39,6 +39,7 @@ VARIANT (int32, 2, STRUCT, _s32)	\
 VARIANT (int64, 1, STRUCT, _s64)	\
 VARIANT (poly8, 8, STRUCT, _p8)		\
 VARIANT (poly16, 4, STRUCT, _p16)	\
+VARIANT (float16, 4, STRUCT, _f16)	\
 VARIANT (float32, 2, STRUCT, _f32)	\
 VARIANT (float64, 1, STRUCT, _f64)	\
 VARIANT (uint8, 16, STRUCT, q_u8)	\
@@ -51,6 +52,7 @@ VARIANT (int32, 4, STRUCT, q_s32)	\
 VARIANT (int64, 2, STRUCT, q_s64)	\
 VARIANT (poly8, 16, STRUCT, q_p8)	\
 VARIANT (poly16, 8, STRUCT, q_p16)	\
+VARIANT (float16, 8, STRUCT, q_f16)	\
 VARIANT (float32, 4, STRUCT, q_f32)	\
 VARIANT (float64, 2, STRUCT, q_f64)
 
diff --git a/gcc/testsuite/gcc.target/aarch64/vldN_dup_1.c b/gcc/testsuite/gcc.target/aarch64/vldN_dup_1.c
index 9af0565d617..68c3fc34f5a 100644
--- a/gcc/testsuite/gcc.target/aarch64/vldN_dup_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/vldN_dup_1.c
@@ -16,6 +16,7 @@ VARIANT (int32, , 2, _s32, STRUCT)	\
 VARIANT (int64, , 1, _s64, STRUCT)	\
 VARIANT (poly8, , 8, _p8, STRUCT)	\
 VARIANT (poly16, , 4, _p16, STRUCT)	\
+VARIANT (float16, , 4, _f16, STRUCT)	\
 VARIANT (float32, , 2, _f32, STRUCT)	\
 VARIANT (float64, , 1, _f64, STRUCT)	\
 VARIANT (uint8, q, 16, _u8, STRUCT)	\
@@ -28,6 +29,7 @@ VARIANT (int32, q, 4, _s32, STRUCT)	\
 VARIANT (int64, q, 2, _s64, STRUCT)	\
 VARIANT (poly8, q, 16, _p8, STRUCT)	\
 VARIANT (poly16, q, 8, _p16, STRUCT)	\
+VARIANT (float16, q, 8, _f16, STRUCT)	\
 VARIANT (float32, q, 4, _f32, STRUCT)	\
 VARIANT (float64, q, 2, _f64, STRUCT)
 
@@ -74,6 +76,7 @@ main (int argc, char **argv)
   int64_t *int64_data = (int64_t *)uint64_data;
   poly8_t poly8_data[4] = { 0, 7, 13, 18, };
   poly16_t poly16_data[4] = { 11111, 2222, 333, 44 };
+  float16_t float16_data[4] = { 1.0625, 3.125, 0.03125, 7.75 };
   float32_t float32_data[4] = { 3.14159, 2.718, 1.414, 100.0 };
   float64_t float64_data[4] = { 1.010010001, 12345.6789, -9876.54321, 1.618 };
 
diff --git a/gcc/testsuite/gcc.target/aarch64/vldN_lane_1.c b/gcc/testsuite/gcc.target/aarch64/vldN_lane_1.c
index 13ab45459f4..6837a116117 100644
--- a/gcc/testsuite/gcc.target/aarch64/vldN_lane_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/vldN_lane_1.c
@@ -16,6 +16,7 @@ VARIANT (int32, , 2, _s32, 0, STRUCT)	\
 VARIANT (int64, , 1, _s64, 0, STRUCT)	\
 VARIANT (poly8, , 8, _p8, 7, STRUCT)	\
 VARIANT (poly16, , 4, _p16, 1, STRUCT)	\
+VARIANT (float16, , 4, _f16, 3, STRUCT)	\
 VARIANT (float32, , 2, _f32, 1, STRUCT)	\
 VARIANT (float64, , 1, _f64, 0, STRUCT)	\
 VARIANT (uint8, q, 16, _u8, 14, STRUCT)	\
@@ -28,6 +29,7 @@ VARIANT (int32, q, 4, _s32, 2, STRUCT)	\
 VARIANT (int64, q, 2, _s64, 1, STRUCT)	\
 VARIANT (poly8, q, 16, _p8, 12, STRUCT)	\
 VARIANT (poly16, q, 8, _p16, 5, STRUCT)	\
+VARIANT (float16, q, 8, _f16, 7, STRUCT)\
 VARIANT (float32, q, 4, _f32, 1, STRUCT)\
 VARIANT (float64, q, 2, _f64, 0, STRUCT)
 
@@ -71,7 +73,7 @@ main (int argc, char **argv)
 {
   /* Original data for all vector formats.  */
   uint64_t orig_data[8] = {0x1234567890abcdefULL, 0x13579bdf02468aceULL,
-			   0x012389ab4567cdefULL, 0xfeeddadacafe0431ULL,
+			   0x012389ab4567cdefULL, 0xdeeddadacafe0431ULL,
 			   0x1032547698badcfeULL, 0xbadbadbadbad0badULL,
 			   0x0102030405060708ULL, 0x0f0e0d0c0b0a0908ULL};
 
@@ -87,6 +89,7 @@ main (int argc, char **argv)
   int64_t *int64_data = (int64_t *)uint64_data;
   poly8_t poly8_data[4] = { 0, 7, 13, 18, };
   poly16_t poly16_data[4] = { 11111, 2222, 333, 44 };
+  float16_t float16_data[4] = { 0.8125, 7.5, 19, 0.046875 };
   float32_t float32_data[4] = { 3.14159, 2.718, 1.414, 100.0 };
   float64_t float64_data[4] = { 1.010010001, 12345.6789, -9876.54321, 1.618 };
 
diff --git a/gcc/testsuite/gcc.target/aarch64/vset_lane_1.c b/gcc/testsuite/gcc.target/aarch64/vset_lane_1.c
index 5fb11399f20..bc0132c20a7 100644
--- a/gcc/testsuite/gcc.target/aarch64/vset_lane_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/vset_lane_1.c
@@ -16,6 +16,7 @@ VARIANT (int32_t, , 2, int32x2_t, _s32, 0)	\
 VARIANT (int64_t, , 1, int64x1_t, _s64, 0)	\
 VARIANT (poly8_t, , 8, poly8x8_t, _p8, 6)	\
 VARIANT (poly16_t, , 4, poly16x4_t, _p16, 2)	\
+VARIANT (float16_t, , 4, float16x4_t, _f16, 3)	\
 VARIANT (float32_t, , 2, float32x2_t, _f32, 1)	\
 VARIANT (float64_t, , 1, float64x1_t, _f64, 0)	\
 VARIANT (uint8_t, q, 16, uint8x16_t, _u8, 11)	\
@@ -28,6 +29,7 @@ VARIANT (int32_t, q, 4, int32x4_t, _s32, 3)	\
 VARIANT (int64_t, q, 2, int64x2_t, _s64, 0)	\
 VARIANT (poly8_t, q, 16, poly8x16_t, _p8, 14)	\
 VARIANT (poly16_t, q, 8, poly16x8_t, _p16, 6)	\
+VARIANT (float16_t, q, 8, float16x8_t, _f16, 6)	\
 VARIANT (float32_t, q, 4, float32x4_t, _f32, 2) \
 VARIANT (float64_t, q, 2, float64x2_t, _f64, 1)
 
@@ -76,6 +78,9 @@ main (int argc, char **argv)
   poly8_t poly8_t_data[16] =
       { 0, 7, 13, 18, 22, 25, 27, 28, 29, 31, 34, 38, 43, 49, 56, 64 };
   poly16_t poly16_t_data[8] = { 11111, 2222, 333, 44, 5, 65432, 54321, 43210 };
+  float16_t float16_t_data[8] = { 1.25, 4.5, 7.875, 2.3125, 5.675, 8.875,
+      3.6875, 6.75};
+
   float32_t float32_t_data[4] = { 3.14159, 2.718, 1.414, 100.0 };
   float64_t float64_t_data[2] = { 1.01001000100001, 12345.6789 };
 
diff --git a/gcc/testsuite/gcc.target/arm/mod_2.c b/gcc/testsuite/gcc.target/arm/mod_2.c
new file mode 100644
index 00000000000..93017a10683
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mod_2.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm32 } */
+/* { dg-options "-O2 -mcpu=cortex-a57 -save-temps" } */
+
+#include "../aarch64/mod_2.x"
+
+/* { dg-final { scan-assembler "rsblt\tr\[0-9\]*" } } */
+/* { dg-final { scan-assembler-times "and\tr\[0-9\].*1" 1 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mod_256.c b/gcc/testsuite/gcc.target/arm/mod_256.c
new file mode 100644
index 00000000000..ccb7f3cf68d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mod_256.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm32 } */
+/* { dg-options "-O2 -mcpu=cortex-a57 -save-temps" } */
+
+#include "../aarch64/mod_256.x"
+
+/* { dg-final { scan-assembler "rsbpl\tr\[0-9\]*" } } */
+
diff --git a/gcc/testsuite/gcc.target/arm/pr63210.c b/gcc/testsuite/gcc.target/arm/pr63210.c
index c3ae92801f5..9b63a67d3f0 100644
--- a/gcc/testsuite/gcc.target/arm/pr63210.c
+++ b/gcc/testsuite/gcc.target/arm/pr63210.c
@@ -1,6 +1,8 @@
 /* { dg-do assemble } */
 /* { dg-options "-mthumb -Os " }  */
 /* { dg-require-effective-target arm_thumb1_ok } */
+/* { dg-skip-if "do not test on armv4t" { *-*-* } { "-march=armv4t" } } */
+/* { dg-additional-options "-march=armv5t" {target arm_arch_v5t_ok} } */
 
 int foo1 (int c);
 int foo2 (int c);
diff --git a/gcc/testsuite/gcc.target/arm/pr67439_1.c b/gcc/testsuite/gcc.target/arm/pr67439_1.c
new file mode 100644
index 00000000000..f7a6128758a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr67439_1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_thumb2_ok } */
+/* { dg-options "-O1 -mfp16-format=ieee -march=armv7-a -mfpu=neon -mthumb -mrestrict-it" } */
+
+__fp16 h0 = -1.0;
+
+void
+f (__fp16 *p)
+{
+  h0 = 1.0;
+}
diff --git a/gcc/testsuite/gcc.target/avr/pr65210.c b/gcc/testsuite/gcc.target/avr/pr65210.c
new file mode 100644
index 00000000000..1aed4417c1f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/pr65210.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+
+/* This testcase exposes PR65210. Usage of the io_low attribute
+   causes assertion failure because code only looks for the io
+   attribute if SYMBOL_FLAG_IO is set. */
+
+volatile char q __attribute__((io_low,address(0x81)));
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-scatter-1.c b/gcc/testsuite/gcc.target/i386/avx512f-scatter-1.c
new file mode 100644
index 00000000000..575f3bfa70c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-scatter-1.c
@@ -0,0 +1,218 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512f } */
+/* { dg-options "-O3 -mavx512f" } */
+
+#define AVX512F
+
+#include "avx512f-check.h"
+
+#define N 1024
+float vf1[N], vf2[2*N+16];
+double vd1[N], vd2[2*N+16];
+int vi1[N], vi2[2*N+16], k[N];
+long vl1[N], vl2[2*N+16], l[N];
+
+__attribute__((noinline, noclone)) void
+f1 (void)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    vf2[k[i]] = vf1[i];
+}
+
+__attribute__((noinline, noclone)) void
+f2 (void)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    vi2[k[i]] = vi1[i];
+}
+
+__attribute__((noinline, noclone)) void
+f3 (int x)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    vf2[k[i] + x] = vf1[i];
+}
+
+__attribute__((noinline, noclone)) void
+f4 (int x)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    vi2[k[i] + x] = vi1[i];
+}
+
+__attribute__((noinline, noclone)) void
+f5 (void)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    vd2[k[i]] = vd1[i];
+}
+
+__attribute__((noinline, noclone)) void
+f6 (void)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    vl2[k[i]] = vl1[i];
+}
+
+__attribute__((noinline, noclone)) void
+f7 (int x)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    vd2[k[i] + x] = vd1[i];
+}
+
+__attribute__((noinline, noclone)) void
+f8 (int x)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    vl2[k[i] + x] = vl1[i];
+}
+
+__attribute__((noinline, noclone)) void
+f9 (void)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    vf2[l[i]] = vf1[i];
+}
+
+__attribute__((noinline, noclone)) void
+f10 (void)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    vi2[l[i]] = vi1[i];
+}
+
+__attribute__((noinline, noclone)) void
+f11 (long x)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    vf2[l[i] + x] = vf1[i];
+}
+
+__attribute__((noinline, noclone)) void
+f12 (long x)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    vi2[l[i] + x] = vi1[i];
+}
+
+__attribute__((noinline, noclone)) void
+f13 (void)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    vd2[l[i]] = vd1[i];
+}
+
+__attribute__((noinline, noclone)) void
+f14 (void)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    vl2[l[i]] = vl1[i];
+}
+
+__attribute__((noinline, noclone)) void
+f15 (long x)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    vd2[l[i] + x] = vd1[i];
+}
+
+__attribute__((noinline, noclone)) void
+f16 (long x)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    vl2[l[i] + x] = vl1[i];
+}
+
+static void
+avx512f_test (void)
+{
+  int i;
+
+  for (i = 0; i < N; i++)
+    {
+      asm ("");
+      vf1[i] = 17.0f + i;
+      vd1[i] = 19.0 + i;
+      vi1[i] = 21 + i;
+      vl1[i] = 23L + i;
+    }
+  for (i = 0; i < N; i++)
+    {
+      asm ("");
+      k[i] = (i % 2) ? (N / 2 + i) : (N / 2 - i / 2);
+      l[i] = 2 * i + i % 2;
+    }
+
+  f1 ();
+  f2 ();
+  for (i = 0; i < N; i++)
+    if (vf2[(i % 2) ? (N / 2 + i) : (N / 2 - i / 2)] != i + 17
+	|| vi2[(i % 2) ? (N / 2 + i) : (N / 2 - i / 2)] != i + 21)
+      abort ();
+
+  f3 (12);
+  f4 (14);
+  for (i = 0; i < N; i++)
+    if (vf2[((i % 2) ? (N / 2 + i) : (N / 2 - i / 2)) + 12] != i + 17
+	|| vi2[((i % 2) ? (N / 2 + i) : (N / 2 - i / 2)) + 14] != i + 21)
+      abort ();
+
+  f5 ();
+  f6 ();
+  for (i = 0; i < N; i++)
+    if (vd2[(i % 2) ? (N / 2 + i) : (N / 2 - i / 2)] != i + 19
+	|| vl2[(i % 2) ? (N / 2 + i) : (N / 2 - i / 2)] != i + 23)
+      abort ();
+
+  f7 (7);
+  f8 (9);
+  for (i = 0; i < N; i++)
+    if (vd2[((i % 2) ? (N / 2 + i) : (N / 2 - i / 2)) + 7] != i + 19
+	|| vl2[((i % 2) ? (N / 2 + i) : (N / 2 - i / 2)) + 9] != i + 23)
+      abort ();
+
+  f9 ();
+  f10 ();
+  for (i = 0; i < N; i++)
+    if (vf2[2 * i + i % 2] != i + 17
+	|| vi2[2 * i + i % 2] != i + 21)
+      abort ();
+
+  f11 (2);
+  f12 (4);
+  for (i = 0; i < N; i++)
+    if (vf2[2 * i + i % 2 + 2] != i + 17
+	|| vi2[2 * i + i % 2 + 4] != i + 21)
+      abort ();
+
+  f13 ();
+  f14 ();
+  for (i = 0; i < N; i++)
+    if (vd2[2 * i + i % 2] != i + 19
+	|| vl2[2 * i + i % 2] != i + 23)
+      abort ();
+
+  f15 (13);
+  f16 (15);
+  for (i = 0; i < N; i++)
+    if (vd2[2 * i + i % 2 + 13] != i + 19
+	|| vl2[2 * i + i % 2 + 15] != i + 23)
+      abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-scatter-2.c b/gcc/testsuite/gcc.target/i386/avx512f-scatter-2.c
new file mode 100644
index 00000000000..c59ce234f17
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-scatter-2.c
@@ -0,0 +1,217 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512f } */
+/* { dg-options "-O3 -mavx512f" } */
+
+#define AVX512F
+
+#include "avx512f-check.h"
+
+#define N 1024
+float vf1[N], vf2[2*N+16];
+double vd1[N], vd2[2*N+16];
+int k[N];
+long l[N];
+short n[2*N+16];
+
+__attribute__((noinline, noclone)) void
+f1 (void)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    vf2[k[i]] = vf1[i];
+}
+
+__attribute__((noinline, noclone)) void
+f2 (void)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    n[k[i]] = (int) vf1[i];
+}
+
+__attribute__((noinline, noclone)) void
+f3 (int x)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    vf2[k[i] + x] = vf1[i];
+}
+
+__attribute__((noinline, noclone)) void
+f4 (int x)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    n[k[i] + x] = (int) vf1[i];
+}
+
+__attribute__((noinline, noclone)) void
+f5 (void)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    vd2[k[i]] = vd1[i];
+}
+
+__attribute__((noinline, noclone)) void
+f6 (void)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    n[k[i]] = (int) vd1[i];
+}
+
+__attribute__((noinline, noclone)) void
+f7 (int x)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    vd2[k[i] + x] = vd1[i];
+}
+
+__attribute__((noinline, noclone)) void
+f8 (int x)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    n[k[i] + x] = vd1[i];
+}
+
+__attribute__((noinline, noclone)) void
+f9 (void)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    vf2[l[i]] = vf1[i];
+}
+
+__attribute__((noinline, noclone)) void
+f10 (void)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    n[l[i]] = (int) vf1[i];
+}
+
+__attribute__((noinline, noclone)) void
+f11 (long x)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    vf2[l[i] + x] = vf1[i];
+}
+
+__attribute__((noinline, noclone)) void
+f12 (long x)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    n[l[i] + x] = (int) vf1[i];
+}
+
+__attribute__((noinline, noclone)) void
+f13 (void)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    vd2[l[i]] = vd1[i];
+}
+
+__attribute__((noinline, noclone)) void
+f14 (void)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    n[l[i]] = (int) vd1[i];
+}
+
+__attribute__((noinline, noclone)) void
+f15 (long x)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    vd2[l[i] + x] = vd1[i];
+}
+
+__attribute__((noinline, noclone)) void
+f16 (long x)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    n[l[i] + x] = (int) vd1[i];
+}
+
+static void
+avx512f_test (void)
+{
+  int i;
+
+  for (i = 0; i < N; i++)
+    {
+      asm ("");
+      vf1[i] = 17.0f + i;
+      vd1[i] = 19.0 + i;
+    }
+  for (i = 0; i < N; i++)
+    {
+      asm ("");
+      k[i] = (i % 2) ? (N / 2 + i) : (N / 2 - i / 2);
+      l[i] = 2 * i + i % 2;
+    }
+
+  f1 ();
+  f2 ();
+  for (i = 0; i < N; i++)
+    if (vf2[(i % 2) ? (N / 2 + i) : (N / 2 - i / 2)] != i + 17
+	|| n[(i % 2) ? (N / 2 + i) : (N / 2 - i / 2)] != i + 17)
+      abort ();
+
+  f3 (12);
+  f4 (14);
+  for (i = 0; i < N; i++)
+    if (vf2[((i % 2) ? (N / 2 + i) : (N / 2 - i / 2)) + 12] != i + 17
+	|| n[((i % 2) ? (N / 2 + i) : (N / 2 - i / 2)) + 14] != i + 17)
+      abort ();
+
+  f5 ();
+  f6 ();
+  for (i = 0; i < N; i++)
+    if (vd2[(i % 2) ? (N / 2 + i) : (N / 2 - i / 2)] != i + 19
+	|| n[(i % 2) ? (N / 2 + i) : (N / 2 - i / 2)] != i + 19)
+      abort ();
+
+  f7 (7);
+  f8 (9);
+  for (i = 0; i < N; i++)
+    if (vd2[((i % 2) ? (N / 2 + i) : (N / 2 - i / 2)) + 7] != i + 19
+	|| n[((i % 2) ? (N / 2 + i) : (N / 2 - i / 2)) + 9] != i + 19)
+      abort ();
+
+  f9 ();
+  f10 ();
+  for (i = 0; i < N; i++)
+    if (vf2[2 * i + i % 2] != i + 17
+	|| n[2 * i + i % 2] != i + 17)
+      abort ();
+
+  f11 (2);
+  f12 (4);
+  for (i = 0; i < N; i++)
+    if (vf2[2 * i + i % 2 + 2] != i + 17
+	|| n[2 * i + i % 2 + 4] != i + 17)
+      abort ();
+
+  f13 ();
+  f14 ();
+  for (i = 0; i < N; i++)
+    if (vd2[2 * i + i % 2] != i + 19
+	|| n[2 * i + i % 2] != i + 19)
+      abort ();
+
+  f15 (13);
+  f16 (15);
+  for (i = 0; i < N; i++)
+    if (vd2[2 * i + i % 2 + 13] != i + 19
+	|| n[2 * i + i % 2 + 15] != i + 19)
+      abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-scatter-3.c b/gcc/testsuite/gcc.target/i386/avx512f-scatter-3.c
new file mode 100644
index 00000000000..37137fd268a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-scatter-3.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512f } */
+/* { dg-options "-O3 -mavx512f" } */
+
+#define AVX512F
+
+#include "avx512f-check.h"
+
+#define N 1024
+int a[N], b[N];
+
+__attribute__((noinline, noclone)) void
+foo (float *__restrict p, float *__restrict q,
+     int s1, int s2, int s3)
+{
+  int i;
+  for (i = 0; i < (N / 8); i++)
+    p[a[i] * s1 + b[i] * s2 + s3] = q[i];
+}
+
+static void
+avx512f_test (void)
+{
+  int i;
+  float c[N], d[N];
+  for (i = 0; i < N; i++)
+    {
+      a[i] = (i * 7) & (N / 8 - 1);
+      b[i] = (i * 13) & (N / 8 - 1);
+      c[i] = 179.13 + i;
+    }
+  foo (d, c, 3, 2, 4);
+  for (i = 0; i < (N / 8); i++)
+    if (d[a[i] * 3 + b[i] * 2 + 4] != (float) (179.13 + i))
+      abort ();
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/swaps-p8-20.c b/gcc/testsuite/gcc.target/powerpc/swaps-p8-20.c
new file mode 100644
index 00000000000..7463781281e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/swaps-p8-20.c
@@ -0,0 +1,29 @@
+/* { dg-do run { target { powerpc64le-*-* } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } }
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-O2 -mcpu=power8 -maltivec" } */
+
+/* The expansion for vector character multiply introduces a vperm operation.
+   This tests that the swap optimization to remove swaps by changing the
+   vperm mask results in correct code.  */
+
+#include <altivec.h>
+
+void abort ();
+
+vector unsigned char r;
+vector unsigned char v =
+  { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
+vector unsigned char i =
+  { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 };
+vector unsigned char e =
+  {0, 2, 6, 12, 20, 30, 42, 56, 72, 90, 110, 132, 156, 182, 210, 240};
+
+int main ()
+{
+  int j;
+  r = v * i;
+  if (!vec_all_eq (r, e))
+    abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/swaps-p8-21.c b/gcc/testsuite/gcc.target/powerpc/swaps-p8-21.c
new file mode 100644
index 00000000000..b981fcf2f36
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/swaps-p8-21.c
@@ -0,0 +1,27 @@
+/* { dg-do compile { target { powerpc64le-*-* } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } }
+/* { dg-options "-O2 -mcpu=power8 -maltivec" } */
+
+/* The expansion for vector character multiply introduces a vperm operation.
+   This tests that changing the vperm mask allows us to remove all swaps
+   from the generated code.  */
+
+#include <altivec.h>
+
+void abort ();
+
+vector unsigned char r;
+vector unsigned char v =
+  { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
+vector unsigned char i =
+  { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 };
+
+int main ()
+{
+  int j;
+  r = v * i;
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "vperm" 1 } } */
+/* { dg-final { scan-assembler-not "xxpermdi" } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-mult-char-1.c b/gcc/testsuite/gcc.target/powerpc/vec-mult-char-1.c
new file mode 100644
index 00000000000..4c9dbdc8188
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-mult-char-1.c
@@ -0,0 +1,53 @@
+/* { dg-do run { target { powerpc*-*-* && vmx_hw } } } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-maltivec" } */
+
+#include <altivec.h>
+
+extern void abort (void);
+
+vector unsigned char vmului(vector unsigned char v,
+			    vector unsigned char i)
+{
+	return v * i;
+}
+
+vector signed char vmulsi(vector signed char v,
+			  vector signed char i)
+{
+	return v * i;
+}
+
+int main ()
+{
+  vector unsigned char a = {2, 4, 6, 8, 10, 12, 14, 16,
+			    18, 20, 22, 24, 26, 28, 30, 32};
+  vector unsigned char b = {3, 6, 9, 12, 15, 18, 21, 24,
+			    27, 30, 33, 36, 39, 42, 45, 48};
+  vector unsigned char c = vmului (a, b);
+  vector unsigned char expect_c = {6, 24, 54, 96, 150, 216, 38, 128,
+				   230, 88, 214, 96, 246, 152, 70, 0};
+
+  vector signed char d = {2, -4, 6, -8, 10, -12, 14, -16,
+			  18, -20, 22, -24, 26, -28, 30, -32};
+  vector signed char e = {3, 6, -9, -12, 15, 18, -21, -24,
+			  27, 30, -33, -36, 39, 42, -45, -48};
+  vector signed char f = vmulsi (d, e);
+  vector signed char expect_f = {6, -24, -54, 96, -106, 40, -38, -128,
+				 -26, -88, 42, 96, -10, 104, -70, 0};
+
+  vector signed char g = {127, -128, 126, -126, 125, -125,  124, -124,
+			  123, -123, 122, -122, 121, -121,  120, -120};
+  vector signed char h = {  2,    2,  -2,   -2, 127,  127, -128, -128,
+			    10,  10, -10,  -10,  64,   65,  -64,  -65};
+  vector signed char i = vmulsi (g, h);
+  vector signed char expect_i = {-2, 0, 4, -4, 3, -3, 0, 0,
+				 -50, 50, 60, -60, 64, 71, 0, 120};
+
+  if (!vec_all_eq (c, expect_c))
+    abort ();
+  if (!vec_all_eq (f, expect_f))
+    abort ();
+  if (!vec_all_eq (i, expect_i))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-mult-char-2.c b/gcc/testsuite/gcc.target/powerpc/vec-mult-char-2.c
new file mode 100644
index 00000000000..04c67109bef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-mult-char-2.c
@@ -0,0 +1,21 @@
+/* { dg-do compile { target { powerpc*-*-* && vmx_hw } } } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-maltivec" } */
+
+#include <altivec.h>
+
+vector unsigned char vmului(vector unsigned char v,
+			    vector unsigned char i)
+{
+	return v * i;
+}
+
+vector signed char vmulsi(vector signed char v,
+			  vector signed char i)
+{
+	return v * i;
+}
+
+/* { dg-final { scan-assembler-times "vmulesb" 2 } } */
+/* { dg-final { scan-assembler-times "vmulosb" 2 } } */
+/* { dg-final { scan-assembler-times "vperm" 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-shift.c b/gcc/testsuite/gcc.target/powerpc/vec-shift.c
new file mode 100644
index 00000000000..80b59a2d3e7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-shift.c
@@ -0,0 +1,20 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
+/* { dg-options "-mcpu=power7 -O2" } */
+
+/* This used to ICE.  During gimplification, "i" is widened to an unsigned
+   int.  We used to fail at expand time as we tried to cram an SImode item
+   into a QImode memory slot.  This has been fixed to properly truncate the
+   shift amount when splatting it into a vector.  */
+
+typedef unsigned char v16ui __attribute__((vector_size(16)));
+
+v16ui vslb(v16ui v, unsigned char i)
+{
+	return v << i;
+}
+
+/* { dg-final { scan-assembler "vspltb" } } */
+/* { dg-final { scan-assembler "vslb" } } */
diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-genbytemask-1.c b/gcc/testsuite/gcc.target/s390/vector/vec-genbytemask-1.c
index b8cf3140bd7..26c189af15d 100644
--- a/gcc/testsuite/gcc.target/s390/vector/vec-genbytemask-1.c
+++ b/gcc/testsuite/gcc.target/s390/vector/vec-genbytemask-1.c
@@ -1,11 +1,13 @@
 /* { dg-do run } */
 /* { dg-options "-O3 -mzarch -march=z13 --save-temps" } */
 /* { dg-require-effective-target vector } */
+/* { dg-require-effective-target int128 } */
 
 typedef unsigned char     uv16qi __attribute__((vector_size(16)));
 typedef unsigned short     uv8hi __attribute__((vector_size(16)));
 typedef unsigned int       uv4si __attribute__((vector_size(16)));
 typedef unsigned long long uv2di __attribute__((vector_size(16)));
+typedef unsigned __int128  uv1ti __attribute__((vector_size(16)));
 
 uv2di __attribute__((noinline))
 foo1 ()
@@ -45,6 +47,13 @@ foo4 ()
       0xff, 0, 0xff, 0,
       0, 0xff, 0, 0xff };
 }
+
+uv1ti __attribute__((noinline))
+foo5 ()
+{
+  return (uv1ti){ 0xff00ff00ff00ff00ULL };
+}
+
 /* { dg-final { scan-assembler-times "vgbm\t%v24,61605" 1 } } */
 
 int
@@ -64,6 +73,10 @@ main ()
 
   if (foo4()[1] != 0xff)
     __builtin_abort ();
+
+  if (foo5()[0] != 0xff00ff00ff00ff00ULL)
+    __builtin_abort ();
+
   return 0;
 }
 
diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-genmask-1.c b/gcc/testsuite/gcc.target/s390/vector/vec-genmask-1.c
index b0747f713bb..6093422fd0d 100644
--- a/gcc/testsuite/gcc.target/s390/vector/vec-genmask-1.c
+++ b/gcc/testsuite/gcc.target/s390/vector/vec-genmask-1.c
@@ -66,4 +66,3 @@ main ()
     __builtin_abort ();
   return 0;
 }
-
diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-genmask-2.c b/gcc/testsuite/gcc.target/s390/vector/vec-genmask-2.c
index e3ae34154ca..46256e92531 100644
--- a/gcc/testsuite/gcc.target/s390/vector/vec-genmask-2.c
+++ b/gcc/testsuite/gcc.target/s390/vector/vec-genmask-2.c
@@ -1,10 +1,12 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -mzarch -march=z13" } */
+/* { dg-require-effective-target int128 } */
 
 typedef unsigned char     uv16qi __attribute__((vector_size(16)));
 typedef unsigned short     uv8hi __attribute__((vector_size(16)));
 typedef unsigned int       uv4si __attribute__((vector_size(16)));
 typedef unsigned long long uv2di __attribute__((vector_size(16)));
+typedef unsigned __int128  uv1ti __attribute__((vector_size(16)));
 
 /* The elements differ.  */
 uv2di __attribute__((noinline))
@@ -43,4 +45,11 @@ foo4 ()
       0x82, 0x82, 0x82, 0x82,
       0x82, 0x82, 0x82, 0x82 };
 }
+
+/* We do not have vgmq.  */
+uv1ti
+foo5()
+{
+  return (uv1ti){ ((unsigned __int128)1 << 53) - 1 };
+}
 /* { dg-final { scan-assembler-not "vgm" } } */
diff --git a/gcc/testsuite/gfortran.dg/graphite/interchange-3.f90 b/gcc/testsuite/gfortran.dg/graphite/interchange-3.f90
index a99cf153d9e..d401638ccc8 100644
--- a/gcc/testsuite/gfortran.dg/graphite/interchange-3.f90
+++ b/gcc/testsuite/gfortran.dg/graphite/interchange-3.f90
@@ -24,4 +24,4 @@ Program FOO
 
 end Program FOO
 
-! { dg-final { scan-tree-dump-times "tiled by" 2 "graphite" } }
+! { dg-final { scan-tree-dump-times "tiled by" 5 "graphite" } }
diff --git a/gcc/testsuite/gfortran.dg/pr67526.f90 b/gcc/testsuite/gfortran.dg/pr67526.f90
new file mode 100644
index 00000000000..3c0834f28dc
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr67526.f90
@@ -0,0 +1,9 @@
+! { dg-do compile }
+! Original code from gerhard dot steinmetz dot fortran at t-online dot de
+! PR fortran/67526
+program p
+   character :: c1 = 'abc'(:     ! { dg-error "error in SUBSTRING" }
+   character :: c2 = 'abc'(3:    ! { dg-error "error in SUBSTRING" }
+   character :: c3 = 'abc'(:1    ! { dg-error "error in SUBSTRING" }
+   character :: c4 = 'abc'(2:2   ! { dg-error "error in SUBSTRING" }
+end
diff --git a/gcc/testsuite/gfortran.dg/read_dir.f90 b/gcc/testsuite/gfortran.dg/read_dir.f90
index 0e28f9f497e..4009ed69e63 100644
--- a/gcc/testsuite/gfortran.dg/read_dir.f90
+++ b/gcc/testsuite/gfortran.dg/read_dir.f90
@@ -7,13 +7,14 @@ program bug
    integer ios
    call system('[ -d junko.dir ] || mkdir junko.dir')
    open(unit=10, file='junko.dir',iostat=ios,action='read',access='stream')
-   if (ios.ne.0) call abort
+   if (ios.ne.0) then
+      call system('rmdir junko.dir')
+      call abort
+   end if
    read(10, iostat=ios) c
    if (ios.ne.21) then 
-      close(10)
-      call system('rmdir junko.dir')
+      close(10, status='delete')
       call abort
    end if
-   close(10)
-   call system('rmdir junko.dir')
+   close(10, status='delete')
 end program bug
diff --git a/gcc/testsuite/gfortran.dg/submodule_11.f08 b/gcc/testsuite/gfortran.dg/submodule_11.f08
new file mode 100644
index 00000000000..20367a9d19d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/submodule_11.f08
@@ -0,0 +1,45 @@
+! { dg-do run }
+! Test the fix for PR66993, in which the use associated version of 'i'
+! was incorrectly determined to be ambiguous with the 'i', host associated
+! in submodule 'sm' from the module 'm'. The principle has been tested with
+! the function 'time_two' in addition.
+!
+! Contributed by Mikael Morin  <mikael.morin@sfr.fr>
+!
+module m
+  integer, parameter :: i = -1
+  interface
+    module subroutine show_i
+    end subroutine show_i
+  end interface
+contains
+  integer function times_two (arg)
+    integer :: arg
+    times_two = -2*arg
+  end function
+end module m
+
+module n
+  integer, parameter :: i = 2
+contains
+  integer function times_two (arg)
+    integer :: arg
+    times_two = 2*arg
+  end function
+end module n
+
+submodule (m) sm
+  use n
+contains
+  module subroutine show_i
+    if (i .ne. 2) call abort
+    if (times_two (i) .ne. 4) call abort
+  end subroutine show_i
+end submodule sm
+
+program p
+  use m
+  call show_i
+  if (i .ne. -1) call abort
+  if (times_two (i) .ne. 2) call abort
+end program
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 363f7fe3877..84aebc9a798 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -124,6 +124,11 @@ proc check_cached_effective_target { prop args } {
 	verbose "check_cached_effective_target $prop: checking $target" 2
 	set et_cache($prop,target) $target
 	set et_cache($prop,value) [uplevel eval $args]
+	if {![info exists et_prop_list]
+	    || [lsearch $et_prop_list $prop] < 0} {
+	    lappend et_prop_list $prop
+	}
+	verbose "check_cached_effective_target cached list is now: $et_prop_list" 2
     }
     set value $et_cache($prop,value)
     verbose "check_cached_effective_target $prop: returning $value for $target" 2
@@ -2778,6 +2783,21 @@ proc check_effective_target_arm_neon_fp16_ok { } {
 		check_effective_target_arm_neon_fp16_ok_nocache]
 }
 
+proc check_effective_target_arm_neon_fp16_hw { } {
+    if {! [check_effective_target_arm_neon_fp16_ok] } {
+	return 0
+    }
+    global et_arm_neon_fp16_flags
+    check_runtime_nocache arm_neon_fp16_hw {
+	int
+	main (int argc, char **argv)
+	{
+	  asm ("vcvt.f32.f16 q1, d0");
+	  return 0;
+	}
+    } $et_arm_neon_fp16_flags
+}
+
 proc add_options_for_arm_neon_fp16 { flags } {
     if { ! [check_effective_target_arm_neon_fp16_ok] } {
 	return "$flags"
@@ -4551,7 +4571,8 @@ proc check_effective_target_vect_char_mult { } {
 	if { [istarget aarch64*-*-*]
 	     || [istarget ia64-*-*]
 	     || [istarget i?86-*-*] || [istarget x86_64-*-*]
-            || [check_effective_target_arm32] } {
+             || [check_effective_target_arm32]
+	     || [check_effective_target_powerpc_altivec] } {
 	   set et_vect_char_mult_saved 1
 	}
     }
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index ebf9dd23f77..c0eab40b139 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -3927,6 +3927,49 @@ get_references_in_stmt (gimple stmt, vec<data_ref_loc, va_heap> *references)
   return clobbers_memory;
 }
 
+
+/* Returns true if the loop-nest has any data reference.  */
+
+bool
+loop_nest_has_data_refs (loop_p loop)
+{
+  basic_block *bbs = get_loop_body (loop);
+  vec<data_ref_loc> references;
+  references.create (3);
+
+  for (unsigned i = 0; i < loop->num_nodes; i++)
+    {
+      basic_block bb = bbs[i];
+      gimple_stmt_iterator bsi;
+
+      for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (&bsi))
+	{
+	  gimple stmt = gsi_stmt (bsi);
+	  get_references_in_stmt (stmt, &references);
+	  if (references.length ())
+	    {
+	      free (bbs);
+	      references.release ();
+	      return true;
+	    }
+	}
+    }
+  free (bbs);
+  references.release ();
+
+  if (loop->inner)
+    {
+      loop = loop->inner;
+      while (loop)
+	{
+	  if (loop_nest_has_data_refs (loop))
+	    return true;
+	  loop = loop->next;
+	}
+    }
+  return false;
+}
+
 /* Stores the data references in STMT to DATAREFS.  If there is an unanalyzable
    reference, returns false, otherwise returns true.  NEST is the outermost
    loop of the loop nest in which the references should be analyzed.  */
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index 18bcc5c89bb..064843933f3 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -322,6 +322,7 @@ extern bool find_data_references_in_stmt (struct loop *, gimple,
 extern bool graphite_find_data_references_in_stmt (loop_p, loop_p, gimple,
 						   vec<data_reference_p> *);
 tree find_data_references_in_loop (struct loop *, vec<data_reference_p> *);
+bool loop_nest_has_data_refs (loop_p loop);
 struct data_reference *create_data_ref (loop_p, loop_p, tree, gimple, bool);
 extern bool find_loop_nest (struct loop *, vec<loop_p> *);
 extern struct data_dependence_relation *initialize_data_dependence_relation
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index d017479ec2e..c164121fdb4 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -57,6 +57,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-nested.h"
 #include "cgraph.h"
 #include "tree-ssa.h"
+#include "params.h"
 
 /* This pass tries to distribute iterations of loops into several threads.
    The implementation is straightforward -- for each loop we test whether its
@@ -2092,6 +2093,10 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
   type = TREE_TYPE (cvar);
   t = build_omp_clause (loc, OMP_CLAUSE_SCHEDULE);
   OMP_CLAUSE_SCHEDULE_KIND (t) = OMP_CLAUSE_SCHEDULE_STATIC;
+  int chunk_size = PARAM_VALUE (PARAM_PARLOOPS_CHUNK_SIZE);
+  if (chunk_size != 0)
+    OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (t)
+      = build_int_cst (integer_type_node, chunk_size);
 
   for_stmt = gimple_build_omp_for (NULL, GF_OMP_FOR_KIND_FOR, t, 1, NULL);
   gimple_set_location (for_stmt, loc);
diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index 9be65de0cd2..10ae5f86d41 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -92,8 +92,7 @@ struct cond_equivalence
 };
 
 
-/* Structure for recording edge equivalences as well as any pending
-   edge redirections during the dominator optimizer.
+/* Structure for recording edge equivalences.
 
    Computing and storing the edge equivalences instead of creating
    them on-demand can save significant amounts of time, particularly
@@ -101,10 +100,7 @@ struct cond_equivalence
 
    These structures live for a single iteration of the dominator
    optimizer in the edge's AUX field.  At the end of an iteration we
-   free each of these structures and update the AUX field to point
-   to any requested redirection target (the code for updating the
-   CFG and SSA graph for edge redirection expects redirection edge
-   targets to be in the AUX field for each edge.  */
+   free each of these structures.  */
 
 struct edge_info
 {
@@ -1168,7 +1164,7 @@ pass_dominator::execute (function *fun)
   /* Create our hash tables.  */
   avail_exprs = new hash_table<expr_elt_hasher> (1024);
   avail_exprs_stack.create (20);
-  const_and_copies = new class const_and_copies (dump_file, dump_flags);
+  const_and_copies = new class const_and_copies ();
   need_eh_cleanup = BITMAP_ALLOC (NULL);
   need_noreturn_fixup.create (0);
 
diff --git a/gcc/tree-ssa-live.c b/gcc/tree-ssa-live.c
index 47725585be9..e944a9ac191 100644
--- a/gcc/tree-ssa-live.c
+++ b/gcc/tree-ssa-live.c
@@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa.h"
 #include "cgraph.h"
 #include "ipa-utils.h"
+#include "cfgloop.h"
 
 #ifdef ENABLE_CHECKING
 static void  verify_live_on_entry (tree_live_info_p);
@@ -820,6 +821,14 @@ remove_unused_locals (void)
 	  }
       }
 
+  if (cfun->has_simduid_loops)
+    {
+      struct loop *loop;
+      FOR_EACH_LOOP (loop, 0)
+	if (loop->simduid && !is_used_p (loop->simduid))
+	  loop->simduid = NULL_TREE;
+    }
+
   cfun->has_local_explicit_reg_vars = false;
 
   /* Remove unmarked local and global vars from local_decls.  */
diff --git a/gcc/tree-ssa-scopedtables.c b/gcc/tree-ssa-scopedtables.c
index 41bf8879448..fedd92a850a 100644
--- a/gcc/tree-ssa-scopedtables.c
+++ b/gcc/tree-ssa-scopedtables.c
@@ -28,13 +28,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-scopedtables.h"
 #include "tree-ssa-threadedge.h"
 
-const_and_copies::const_and_copies (FILE *file, int flags)
-{
-  stack.create (20);
-  dump_file = file;
-  dump_flags = flags;
-}
-
 /* Pop entries off the stack until we hit the NULL marker.
    For each entry popped, use the SRC/DEST pair to restore
    SRC to its prior value.  */
@@ -42,11 +35,11 @@ const_and_copies::const_and_copies (FILE *file, int flags)
 void
 const_and_copies::pop_to_marker (void)
 {
-  while (stack.length () > 0)
+  while (m_stack.length () > 0)
     {
       tree prev_value, dest;
 
-      dest = stack.pop ();
+      dest = m_stack.pop ();
 
       /* A NULL value indicates we should stop unwinding, otherwise
 	 pop off the next entry as they're recorded in pairs.  */
@@ -62,7 +55,7 @@ const_and_copies::pop_to_marker (void)
 	  fprintf (dump_file, "\n");
 	}
 
-      prev_value = stack.pop ();
+      prev_value = m_stack.pop ();
       set_ssa_name_value (dest, prev_value);
     }
 }
@@ -97,9 +90,9 @@ const_and_copies::record_const_or_copy (tree x, tree y, tree prev_x)
     }
 
   set_ssa_name_value (x, y);
-  stack.reserve (2);
-  stack.quick_push (prev_x);
-  stack.quick_push (x);
+  m_stack.reserve (2);
+  m_stack.quick_push (prev_x);
+  m_stack.quick_push (x);
 }
 
 /* A new value has been assigned to LHS.  If necessary, invalidate any
@@ -121,16 +114,16 @@ const_and_copies::invalidate (tree lhs)
      then it's a "stop unwinding" marker.  Else the current marker is
      the SSA_NAME with an equivalence and the prior entry in the stack
      is what the current element is equivalent to.  */
-  for (int i = stack.length() - 1; i >= 0; i--)
+  for (int i = m_stack.length() - 1; i >= 0; i--)
     {
       /* Ignore the stop unwinding markers.  */
-      if ((stack)[i] == NULL)
+      if ((m_stack)[i] == NULL)
 	continue;
 
       /* We want to check the current value of stack[i] to see if
 	 it matches LHS.  If so, then invalidate.  */
-      if (SSA_NAME_VALUE ((stack)[i]) == lhs)
-	record_const_or_copy ((stack)[i], NULL_TREE);
+      if (SSA_NAME_VALUE ((m_stack)[i]) == lhs)
+	record_const_or_copy ((m_stack)[i], NULL_TREE);
 
       /* Remember, we're dealing with two elements in this case.  */
       i--;
diff --git a/gcc/tree-ssa-scopedtables.h b/gcc/tree-ssa-scopedtables.h
index bc30ee67583..f7d9ca4320c 100644
--- a/gcc/tree-ssa-scopedtables.h
+++ b/gcc/tree-ssa-scopedtables.h
@@ -20,21 +20,40 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_TREE_SSA_SCOPED_TABLES_H
 #define GCC_TREE_SSA_SCOPED_TABLES_H
 
+/* This class defines an unwindable const/copy equivalence table
+   layered on top of SSA_NAME_VALUE/set_ssa_name_value.
+
+   Essentially it's just a stack of name,prev value pairs with a
+   special marker (NULL) to indicate unwind points.  */
+
 class const_and_copies
 {
  public:
-  const_and_copies (FILE *, int);
-  ~const_and_copies (void) { stack.release (); }
-  void push_marker (void) { stack.safe_push (NULL_TREE); }
+  const_and_copies (void) { m_stack.create (20); };
+  ~const_and_copies (void) { m_stack.release (); }
+
+  /* Push the unwinding marker onto the stack.  */
+  void push_marker (void) { m_stack.safe_push (NULL_TREE); }
+
+  /* Restore the const/copies table to its state when the last marker
+     was pushed.  */
   void pop_to_marker (void);
+
+  /* Record a single const/copy pair that can be unwound.  */
   void record_const_or_copy (tree, tree);
+
+  /* Special entry point when we want to provide an explicit previous
+     value for the first argument.  Try to get rid of this in the future.  */
   void record_const_or_copy (tree, tree, tree);
+
+  /* When threading we need to invalidate certain equivalences after
+     following a loop backedge.  The entries we need to invalidate will
+     always be in this unwindable stack.  This entry point handles
+     finding and invalidating those entries.  */
   void invalidate (tree);
 
  private:
-  vec<tree> stack;
-  FILE *dump_file;
-  int dump_flags;
+  vec<tree> m_stack;
 };
 
 #endif /* GCC_TREE_SSA_SCOPED_TABLES_H */
diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index c1f3c32b526..615eb9f104a 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -5619,7 +5619,6 @@ create_variable_info_for_1 (tree decl, const char *name)
   auto_vec<fieldoff_s> fieldstack;
   fieldoff_s *fo;
   unsigned int i;
-  varpool_node *vnode;
 
   if (!declsize
       || !tree_fits_uhwi_p (declsize))
@@ -5637,12 +5636,10 @@ create_variable_info_for_1 (tree decl, const char *name)
   /* Collect field information.  */
   if (use_field_sensitive
       && var_can_have_subvars (decl)
-      /* ???  Force us to not use subfields for global initializers
-	 in IPA mode.  Else we'd have to parse arbitrary initializers.  */
+      /* ???  Force us to not use subfields for globals in IPA mode.
+	 Else we'd have to parse arbitrary initializers.  */
       && !(in_ipa_mode
-	   && is_global_var (decl)
-	   && (vnode = varpool_node::get (decl))
-	   && vnode->get_constructor ()))
+	   && is_global_var (decl)))
     {
       fieldoff_s *fo = NULL;
       bool notokay = false;
@@ -5774,13 +5771,13 @@ create_variable_info_for (tree decl, const char *name)
 
 	  /* If this is a global variable with an initializer and we are in
 	     IPA mode generate constraints for it.  */
-	  if (vnode->get_constructor ()
-	      && vnode->definition)
+	  ipa_ref *ref;
+	  for (unsigned idx = 0; vnode->iterate_reference (idx, ref); ++idx)
 	    {
 	      auto_vec<ce_s> rhsc;
 	      struct constraint_expr lhs, *rhsp;
 	      unsigned i;
-	      get_constraint_for_rhs (vnode->get_constructor (), &rhsc);
+	      get_constraint_for_address_of (ref->referred->decl, &rhsc);
 	      lhs.var = vi->id;
 	      lhs.offset = 0;
 	      lhs.type = SCALAR;
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 451dc2e1c2e..0ad24834feb 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -432,7 +432,8 @@ record_temporary_equivalences_from_stmts_at_dest (edge e,
       if (cached_lhs
 	  && (TREE_CODE (cached_lhs) == SSA_NAME
 	      || is_gimple_min_invariant (cached_lhs)))
-	const_and_copies->record_const_or_copy (gimple_get_lhs (stmt), cached_lhs);
+	const_and_copies->record_const_or_copy (gimple_get_lhs (stmt),
+						cached_lhs);
       else if (backedge_seen)
 	const_and_copies->invalidate (gimple_get_lhs (stmt));
     }
@@ -1208,7 +1209,8 @@ thread_through_normal_block (edge e,
   /* Now walk each statement recording any context sensitive
      temporary equivalences we can detect.  */
   gimple stmt
-    = record_temporary_equivalences_from_stmts_at_dest (e, const_and_copies, simplify,
+    = record_temporary_equivalences_from_stmts_at_dest (e, const_and_copies,
+							simplify,
 							*backedge_seen_p);
 
   /* There's two reasons STMT might be null, and distinguishing
@@ -1474,8 +1476,8 @@ thread_across_edge (gcond *dummy_cond,
 	if (!found)
 	  found = thread_through_normal_block (path->last ()->e, dummy_cond,
 					       handle_dominating_asserts,
-					       const_and_copies, simplify, path, visited,
-					       &backedge_seen) > 0;
+					       const_and_copies, simplify, path,
+					       visited, &backedge_seen) > 0;
 
 	/* If we were able to thread through a successor of E->dest, then
 	   record the jump threading opportunity.  */
diff --git a/gcc/tree-ssa-uninit.c b/gcc/tree-ssa-uninit.c
index 3f007b5d3b1..c08479f0b77 100644
--- a/gcc/tree-ssa-uninit.c
+++ b/gcc/tree-ssa-uninit.c
@@ -1296,7 +1296,8 @@ pred_equal_p (pred_info x1, pred_info x2)
     return false;
 
   c1 = x1.cond_code;
-  if (x1.invert != x2.invert)
+  if (x1.invert != x2.invert
+      && TREE_CODE_CLASS (x2.cond_code) == tcc_comparison)
     c2 = invert_tree_comparison (x2.cond_code, false);
   else
     c2 = x2.cond_code;
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index f1eaef40048..2439bd6390b 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -267,8 +267,8 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr,
 	  return false;
 	}
 
-      if (STMT_VINFO_GATHER_P (stmtinfo_a)
-	  || STMT_VINFO_GATHER_P (stmtinfo_b))
+      if (STMT_VINFO_GATHER_SCATTER_P (stmtinfo_a)
+	  || STMT_VINFO_GATHER_SCATTER_P (stmtinfo_b))
 	{
 	  if (dump_enabled_p ())
 	    {
@@ -315,8 +315,8 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr,
 	  return false;
 	}
 
-      if (STMT_VINFO_GATHER_P (stmtinfo_a)
-	  || STMT_VINFO_GATHER_P (stmtinfo_b))
+      if (STMT_VINFO_GATHER_SCATTER_P (stmtinfo_a)
+	  || STMT_VINFO_GATHER_SCATTER_P (stmtinfo_b))
 	{
 	  if (dump_enabled_p ())
 	    {
@@ -2344,10 +2344,7 @@ vect_analyze_data_ref_access (struct data_reference *dr)
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_NOTE, vect_location,
 	                     "zero step in outer loop.\n");
-	  if (DR_IS_READ (dr))
-  	    return true;
-	  else
-	    return false;
+	  return DR_IS_READ (dr);
 	}
     }
 
@@ -2997,12 +2994,12 @@ vect_prune_runtime_alias_test_list (loop_vec_info loop_vinfo)
   return true;
 }
 
-/* Check whether a non-affine read in stmt is suitable for gather load
-   and if so, return a builtin decl for that operation.  */
+/* Check whether a non-affine read or write in stmt is suitable for gather load
+   or scatter store and if so, return a builtin decl for that operation.  */
 
 tree
-vect_check_gather (gimple stmt, loop_vec_info loop_vinfo, tree *basep,
-		   tree *offp, int *scalep)
+vect_check_gather_scatter (gimple stmt, loop_vec_info loop_vinfo, tree *basep,
+			   tree *offp, int *scalep)
 {
   HOST_WIDE_INT scale = 1, pbitpos, pbitsize;
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
@@ -3031,7 +3028,7 @@ vect_check_gather (gimple stmt, loop_vec_info loop_vinfo, tree *basep,
 	base = TREE_OPERAND (gimple_assign_rhs1 (def_stmt), 0);
     }
 
-  /* The gather builtins need address of the form
+  /* The gather and scatter builtins need address of the form
      loop_invariant + vector * {1, 2, 4, 8}
      or
      loop_invariant + sign_extend (vector) * { 1, 2, 4, 8 }.
@@ -3194,8 +3191,13 @@ vect_check_gather (gimple stmt, loop_vec_info loop_vinfo, tree *basep,
   if (offtype == NULL_TREE)
     offtype = TREE_TYPE (off);
 
-  decl = targetm.vectorize.builtin_gather (STMT_VINFO_VECTYPE (stmt_info),
-					   offtype, scale);
+  if (DR_IS_READ (dr))
+    decl = targetm.vectorize.builtin_gather (STMT_VINFO_VECTYPE (stmt_info),
+					     offtype, scale);
+  else
+    decl = targetm.vectorize.builtin_scatter (STMT_VINFO_VECTYPE (stmt_info),
+					      offtype, scale);
+
   if (decl == NULL_TREE)
     return NULL_TREE;
 
@@ -3344,7 +3346,7 @@ vect_analyze_data_refs (loop_vec_info loop_vinfo,
       gimple stmt;
       stmt_vec_info stmt_info;
       tree base, offset, init;
-      bool gather = false;
+      enum { SG_NONE, GATHER, SCATTER } gatherscatter = SG_NONE;
       bool simd_lane_access = false;
       int vf;
 
@@ -3383,18 +3385,22 @@ again:
 	    = DR_IS_READ (dr)
 	      && !TREE_THIS_VOLATILE (DR_REF (dr))
 	      && targetm.vectorize.builtin_gather != NULL;
+	  bool maybe_scatter
+	    = DR_IS_WRITE (dr)
+	      && !TREE_THIS_VOLATILE (DR_REF (dr))
+	      && targetm.vectorize.builtin_scatter != NULL;
 	  bool maybe_simd_lane_access
 	    = loop_vinfo && loop->simduid;
 
-	  /* If target supports vector gather loads, or if this might be
-	     a SIMD lane access, see if they can't be used.  */
+	  /* If target supports vector gather loads or scatter stores, or if
+	     this might be a SIMD lane access, see if they can't be used.  */
 	  if (loop_vinfo
-	      && (maybe_gather || maybe_simd_lane_access)
+	      && (maybe_gather || maybe_scatter || maybe_simd_lane_access)
 	      && !nested_in_vect_loop_p (loop, stmt))
 	    {
 	      struct data_reference *newdr
 		= create_data_ref (NULL, loop_containing_stmt (stmt),
-				   DR_REF (dr), stmt, true);
+				   DR_REF (dr), stmt, maybe_scatter ? false : true);
 	      gcc_assert (newdr != NULL && DR_REF (newdr));
 	      if (DR_BASE_ADDRESS (newdr)
 		  && DR_OFFSET (newdr)
@@ -3447,17 +3453,20 @@ again:
 			    }
 			}
 		    }
-		  if (!simd_lane_access && maybe_gather)
+		  if (!simd_lane_access && (maybe_gather || maybe_scatter))
 		    {
 		      dr = newdr;
-		      gather = true;
+		      if (maybe_gather)
+			gatherscatter = GATHER;
+		      else
+			gatherscatter = SCATTER;
 		    }
 		}
-	      if (!gather && !simd_lane_access)
+	      if (gatherscatter == SG_NONE && !simd_lane_access)
 		free_data_ref (newdr);
 	    }
 
-	  if (!gather && !simd_lane_access)
+	  if (gatherscatter == SG_NONE && !simd_lane_access)
 	    {
 	      if (dump_enabled_p ())
 		{
@@ -3485,7 +3494,7 @@ again:
           if (bb_vinfo)
 	    break;
 
-	  if (gather || simd_lane_access)
+	  if (gatherscatter != SG_NONE || simd_lane_access)
 	    free_data_ref (dr);
 	  return false;
         }
@@ -3520,7 +3529,7 @@ again:
           if (bb_vinfo)
 	    break;
 
-	  if (gather || simd_lane_access)
+	  if (gatherscatter != SG_NONE || simd_lane_access)
 	    free_data_ref (dr);
           return false;
         }
@@ -3540,7 +3549,7 @@ again:
           if (bb_vinfo)
 	    break;
 
-	  if (gather || simd_lane_access)
+	  if (gatherscatter != SG_NONE || simd_lane_access)
 	    free_data_ref (dr);
           return false;
 	}
@@ -3565,7 +3574,7 @@ again:
 	  if (bb_vinfo)
 	    break;
 
-	  if (gather || simd_lane_access)
+	  if (gatherscatter != SG_NONE || simd_lane_access)
 	    free_data_ref (dr);
 	  return false;
 	}
@@ -3703,7 +3712,7 @@ again:
           if (bb_vinfo)
 	    break;
 
-	  if (gather || simd_lane_access)
+	  if (gatherscatter != SG_NONE || simd_lane_access)
 	    free_data_ref (dr);
           return false;
         }
@@ -3736,10 +3745,10 @@ again:
           if (bb_vinfo)
 	    break;
 
-	  if (gather || simd_lane_access)
+	  if (gatherscatter != SG_NONE || simd_lane_access)
 	    {
 	      STMT_VINFO_DATA_REF (stmt_info) = NULL;
-	      if (gather)
+	      if (gatherscatter != SG_NONE)
 		free_data_ref (dr);
 	    }
 	  return false;
@@ -3763,23 +3772,22 @@ again:
       if (vf > *min_vf)
 	*min_vf = vf;
 
-      if (gather)
+      if (gatherscatter != SG_NONE)
 	{
 	  tree off;
-
-	  gather = 0 != vect_check_gather (stmt, loop_vinfo, NULL, &off, NULL);
-	  if (gather
-	      && get_vectype_for_scalar_type (TREE_TYPE (off)) == NULL_TREE)
-	    gather = false;
-	  if (!gather)
+	  if (!vect_check_gather_scatter (stmt, loop_vinfo, NULL, &off, NULL)
+	      || get_vectype_for_scalar_type (TREE_TYPE (off)) == NULL_TREE)
 	    {
 	      STMT_VINFO_DATA_REF (stmt_info) = NULL;
 	      free_data_ref (dr);
 	      if (dump_enabled_p ())
 		{
-		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, 
-                                   "not vectorized: not suitable for gather "
-                                   "load ");
+		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				   (gatherscatter == GATHER) ?
+				   "not vectorized: not suitable for gather "
+				   "load " :
+				   "not vectorized: not suitable for scatter "
+				   "store ");
 		  dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
                   dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
 		}
@@ -3787,8 +3795,9 @@ again:
 	    }
 
 	  datarefs[i] = dr;
-	  STMT_VINFO_GATHER_P (stmt_info) = true;
+	  STMT_VINFO_GATHER_SCATTER_P (stmt_info) = gatherscatter;
 	}
+
       else if (loop_vinfo
 	       && TREE_CODE (DR_STEP (dr)) != INTEGER_CST)
 	{
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index f87c0664e5b..359e010f7f9 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -810,10 +810,10 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo)
               return false;
           }
 
-      if (STMT_VINFO_GATHER_P (stmt_vinfo))
+      if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo))
 	{
 	  tree off;
-	  tree decl = vect_check_gather (stmt, loop_vinfo, NULL, &off, NULL);
+	  tree decl = vect_check_gather_scatter (stmt, loop_vinfo, NULL, &off, NULL);
 	  gcc_assert (decl);
 	  if (!process_use (stmt, off, loop_vinfo, live_p, relevant,
 			    &worklist, true))
@@ -1815,11 +1815,11 @@ vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi,
   if (STMT_VINFO_STRIDED_P (stmt_info))
     return false;
 
-  if (STMT_VINFO_GATHER_P (stmt_info))
+  if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
     {
       gimple def_stmt;
       tree def;
-      gather_decl = vect_check_gather (stmt, loop_vinfo, &gather_base,
+      gather_decl = vect_check_gather_scatter (stmt, loop_vinfo, &gather_base,
 				       &gather_off, &gather_scale);
       gcc_assert (gather_decl);
       if (!vect_is_simple_use_1 (gather_off, NULL, loop_vinfo, NULL,
@@ -1879,7 +1879,7 @@ vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi,
 
   /** Transform.  **/
 
-  if (STMT_VINFO_GATHER_P (stmt_info))
+  if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
     {
       tree vec_oprnd0 = NULL_TREE, op;
       tree arglist = TYPE_ARG_TYPES (TREE_TYPE (gather_decl));
@@ -5140,6 +5140,12 @@ vectorizable_store (gimple stmt, gimple_stmt_iterator *gsi, gimple *vec_stmt,
   unsigned int vec_num;
   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
   tree aggr_type;
+  tree scatter_base = NULL_TREE, scatter_off = NULL_TREE;
+  tree scatter_off_vectype = NULL_TREE, scatter_decl = NULL_TREE;
+  int scatter_scale = 1;
+  enum vect_def_type scatter_idx_dt = vect_unknown_def_type;
+  enum vect_def_type scatter_src_dt = vect_unknown_def_type;
+  gimple new_stmt;
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
     return false;
@@ -5297,6 +5303,24 @@ vectorizable_store (gimple stmt, gimple_stmt_iterator *gsi, gimple *vec_stmt,
         }
     }
 
+  if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
+    {
+      gimple def_stmt;
+      tree def;
+      scatter_decl = vect_check_gather_scatter (stmt, loop_vinfo, &scatter_base,
+						&scatter_off, &scatter_scale);
+      gcc_assert (scatter_decl);
+      if (!vect_is_simple_use_1 (scatter_off, NULL, loop_vinfo, bb_vinfo,
+				 &def_stmt, &def, &scatter_idx_dt,
+				 &scatter_off_vectype))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                             "scatter index use not simple.");
+	  return false;
+	}
+    }
+
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = store_vec_info_type;
@@ -5311,6 +5335,146 @@ vectorizable_store (gimple stmt, gimple_stmt_iterator *gsi, gimple *vec_stmt,
 
   ensure_base_align (stmt_info, dr);
 
+  if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
+    {
+      tree vec_oprnd0 = NULL_TREE, vec_oprnd1 = NULL_TREE, op, src;
+      tree arglist = TYPE_ARG_TYPES (TREE_TYPE (scatter_decl));
+      tree rettype, srctype, ptrtype, idxtype, masktype, scaletype;
+      tree ptr, mask, var, scale, perm_mask = NULL_TREE;
+      edge pe = loop_preheader_edge (loop);
+      gimple_seq seq;
+      basic_block new_bb;
+      enum { NARROW, NONE, WIDEN } modifier;
+      int scatter_off_nunits = TYPE_VECTOR_SUBPARTS (scatter_off_vectype);
+
+      if (nunits == (unsigned int) scatter_off_nunits)
+	modifier = NONE;
+      else if (nunits == (unsigned int) scatter_off_nunits / 2)
+	{
+	  unsigned char *sel = XALLOCAVEC (unsigned char, scatter_off_nunits);
+	  modifier = WIDEN;
+
+	  for (i = 0; i < (unsigned int) scatter_off_nunits; ++i)
+	    sel[i] = i | nunits;
+
+	  perm_mask = vect_gen_perm_mask_checked (scatter_off_vectype, sel);
+	  gcc_assert (perm_mask != NULL_TREE);
+	}
+      else if (nunits == (unsigned int) scatter_off_nunits * 2)
+	{
+	  unsigned char *sel = XALLOCAVEC (unsigned char, nunits);
+	  modifier = NARROW;
+
+	  for (i = 0; i < (unsigned int) nunits; ++i)
+	    sel[i] = i | scatter_off_nunits;
+
+	  perm_mask = vect_gen_perm_mask_checked (vectype, sel);
+	  gcc_assert (perm_mask != NULL_TREE);
+	  ncopies *= 2;
+	}
+      else
+	gcc_unreachable ();
+
+      rettype = TREE_TYPE (TREE_TYPE (scatter_decl));
+      ptrtype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
+      masktype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
+      idxtype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
+      srctype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
+      scaletype = TREE_VALUE (arglist);
+
+      gcc_checking_assert (TREE_CODE (masktype) == INTEGER_TYPE
+			   && TREE_CODE (rettype) == VOID_TYPE);
+
+      ptr = fold_convert (ptrtype, scatter_base);
+      if (!is_gimple_min_invariant (ptr))
+	{
+	  ptr = force_gimple_operand (ptr, &seq, true, NULL_TREE);
+	  new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
+	  gcc_assert (!new_bb);
+	}
+
+      /* Currently we support only unconditional scatter stores,
+	 so mask should be all ones.  */
+      mask = build_int_cst (masktype, -1);
+      mask = vect_init_vector (stmt, mask, masktype, NULL);
+
+      scale = build_int_cst (scaletype, scatter_scale);
+
+      prev_stmt_info = NULL;
+      for (j = 0; j < ncopies; ++j)
+	{
+	  if (j == 0)
+	    {
+	      src = vec_oprnd1
+		= vect_get_vec_def_for_operand (gimple_assign_rhs1 (stmt), stmt, NULL);
+	      op = vec_oprnd0
+		= vect_get_vec_def_for_operand (scatter_off, stmt, NULL);
+	    }
+	  else if (modifier != NONE && (j & 1))
+	    {
+	      if (modifier == WIDEN)
+		{
+		  src = vec_oprnd1
+		    = vect_get_vec_def_for_stmt_copy (scatter_src_dt, vec_oprnd1);
+		  op = permute_vec_elements (vec_oprnd0, vec_oprnd0, perm_mask,
+					     stmt, gsi);
+		}
+	      else if (modifier == NARROW)
+		{
+		  src = permute_vec_elements (vec_oprnd1, vec_oprnd1, perm_mask,
+					      stmt, gsi);
+		  op = vec_oprnd0
+		    = vect_get_vec_def_for_stmt_copy (scatter_idx_dt, vec_oprnd0);
+		}
+	      else
+		gcc_unreachable ();
+	    }
+	  else
+	    {
+	      src = vec_oprnd1
+		= vect_get_vec_def_for_stmt_copy (scatter_src_dt, vec_oprnd1);
+	      op = vec_oprnd0
+		= vect_get_vec_def_for_stmt_copy (scatter_idx_dt, vec_oprnd0);
+	    }
+
+	  if (!useless_type_conversion_p (srctype, TREE_TYPE (src)))
+	    {
+	      gcc_assert (TYPE_VECTOR_SUBPARTS (TREE_TYPE (src))
+			  == TYPE_VECTOR_SUBPARTS (srctype));
+	      var = vect_get_new_vect_var (srctype, vect_simple_var, NULL);
+	      var = make_ssa_name (var);
+	      src = build1 (VIEW_CONVERT_EXPR, srctype, src);
+	      new_stmt = gimple_build_assign (var, VIEW_CONVERT_EXPR, src);
+	      vect_finish_stmt_generation (stmt, new_stmt, gsi);
+	      src = var;
+	    }
+
+	  if (!useless_type_conversion_p (idxtype, TREE_TYPE (op)))
+	    {
+	      gcc_assert (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op))
+			  == TYPE_VECTOR_SUBPARTS (idxtype));
+	      var = vect_get_new_vect_var (idxtype, vect_simple_var, NULL);
+	      var = make_ssa_name (var);
+	      op = build1 (VIEW_CONVERT_EXPR, idxtype, op);
+	      new_stmt = gimple_build_assign (var, VIEW_CONVERT_EXPR, op);
+	      vect_finish_stmt_generation (stmt, new_stmt, gsi);
+	      op = var;
+	    }
+
+	  new_stmt
+	    = gimple_build_call (scatter_decl, 5, ptr, mask, op, src, scale);
+
+	  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+
+	  if (prev_stmt_info == NULL)
+	    STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+	  else
+	    STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+	  prev_stmt_info = vinfo_for_stmt (new_stmt);
+	}
+      return true;
+    }
+
   if (grouped_store)
     {
       first_dr = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt));
@@ -5584,7 +5748,6 @@ vectorizable_store (gimple stmt, gimple_stmt_iterator *gsi, gimple *vec_stmt,
   prev_stmt_info = NULL;
   for (j = 0; j < ncopies; j++)
     {
-      gimple new_stmt;
 
       if (j == 0)
 	{
@@ -6071,7 +6234,7 @@ vectorizable_load (gimple stmt, gimple_stmt_iterator *gsi, gimple *vec_stmt,
     {
       grouped_load = true;
       /* FORNOW */
-      gcc_assert (! nested_in_vect_loop && !STMT_VINFO_GATHER_P (stmt_info));
+      gcc_assert (!nested_in_vect_loop && !STMT_VINFO_GATHER_SCATTER_P (stmt_info));
 
       first_stmt = GROUP_FIRST_ELEMENT (stmt_info);
 
@@ -6134,12 +6297,12 @@ vectorizable_load (gimple stmt, gimple_stmt_iterator *gsi, gimple *vec_stmt,
     }
 
 
-  if (STMT_VINFO_GATHER_P (stmt_info))
+  if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
     {
       gimple def_stmt;
       tree def;
-      gather_decl = vect_check_gather (stmt, loop_vinfo, &gather_base,
-				       &gather_off, &gather_scale);
+      gather_decl = vect_check_gather_scatter (stmt, loop_vinfo, &gather_base,
+					       &gather_off, &gather_scale);
       gcc_assert (gather_decl);
       if (!vect_is_simple_use_1 (gather_off, NULL, loop_vinfo, bb_vinfo,
 				 &def_stmt, &def, &gather_dt,
@@ -6225,7 +6388,7 @@ vectorizable_load (gimple stmt, gimple_stmt_iterator *gsi, gimple *vec_stmt,
 
   ensure_base_align (stmt_info, dr);
 
-  if (STMT_VINFO_GATHER_P (stmt_info))
+  if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
     {
       tree vec_oprnd0 = NULL_TREE, op;
       tree arglist = TYPE_ARG_TYPES (TREE_TYPE (gather_decl));
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 4b9837861a6..5145371217e 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -646,8 +646,8 @@ typedef struct _stmt_vec_info {
      vectorization.  */
   bool vectorizable;
 
-  /* For loads only, true if this is a gather load.  */
-  bool gather_p;
+  /* For loads if this is a gather, for stores if this is a scatter.  */
+  bool gather_scatter_p;
 
   /* True if this is an access with loop-invariant stride.  */
   bool strided_p;
@@ -667,7 +667,7 @@ typedef struct _stmt_vec_info {
 #define STMT_VINFO_VEC_STMT(S)             (S)->vectorized_stmt
 #define STMT_VINFO_VECTORIZABLE(S)         (S)->vectorizable
 #define STMT_VINFO_DATA_REF(S)             (S)->data_ref_info
-#define STMT_VINFO_GATHER_P(S)		   (S)->gather_p
+#define STMT_VINFO_GATHER_SCATTER_P(S)	   (S)->gather_scatter_p
 #define STMT_VINFO_STRIDED_P(S)	   	   (S)->strided_p
 #define STMT_VINFO_SIMD_LANE_ACCESS_P(S)   (S)->simd_lane_access_p
 
@@ -1063,8 +1063,8 @@ extern bool vect_analyze_data_refs_alignment (loop_vec_info, bb_vec_info);
 extern bool vect_verify_datarefs_alignment (loop_vec_info, bb_vec_info);
 extern bool vect_analyze_data_ref_accesses (loop_vec_info, bb_vec_info);
 extern bool vect_prune_runtime_alias_test_list (loop_vec_info);
-extern tree vect_check_gather (gimple, loop_vec_info, tree *, tree *,
-			       int *);
+extern tree vect_check_gather_scatter (gimple, loop_vec_info, tree *, tree *,
+				       int *);
 extern bool vect_analyze_data_refs (loop_vec_info, bb_vec_info, int *,
 				    unsigned *);
 extern tree vect_create_data_ref_ptr (gimple, tree, struct loop *, tree,
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 685eb726a0d..d85481e0103 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -10149,7 +10149,7 @@ identify_jump_threads (void)
 
   /* Allocate our unwinder stack to unwind any temporary equivalences
      that might be recorded.  */
-  equiv_stack = new const_and_copies (dump_file, dump_flags);
+  equiv_stack = new const_and_copies ();
 
   /* To avoid lots of silly node creation, we create a single
      conditional and just modify it in-place when attempting to
diff --git a/gcc/varasm.c b/gcc/varasm.c
index d9290a17cbb..706e6527cfd 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -4699,7 +4699,7 @@ output_constant (tree exp, unsigned HOST_WIDE_INT size, unsigned int align)
 	exp = build1 (ADDR_EXPR, saved_type, TREE_OPERAND (exp, 0));
       /* Likewise for constant ints.  */
       else if (TREE_CODE (exp) == INTEGER_CST)
-	exp = wide_int_to_tree (saved_type, exp);
+	exp = fold_convert (saved_type, exp);
 
     }
author	bstarynk <bstarynk@138bc75d-0d04-0410-961f-82ee72b054a4>	2016-02-10 17:45:52 +0000
committer	bstarynk <bstarynk@138bc75d-0d04-0410-961f-82ee72b054a4>	2016-02-10 17:45:52 +0000
commit	eb76579392e0d61b9f33c90fdd8b620e563d0a12 (patch)
tree	4307edacc6d226e528b690b44a329d430fe03a61 /gcc
parent	2d9d01985a7a7866916fafa19c5c296702e69714 (diff)
download	gcc-eb76579392e0d61b9f33c90fdd8b620e563d0a12.tar.gz