Add AVX vectorized vp9_diamond_search_sad

This function now has an AVX intrinsics version which is about 80% faster compared to the C implementation. This provides a 2-4% total speed-up for encode, depending on encoding parameters.

The function utilizes 3 properties of the cost function lookup table, constructed in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.

For the joint cost:
  - mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]

For the component costs:
  - For all i: mvsadcost[0][i] == mvsadcost[1][i]
    (equal per component cost)
  - For all i: mvsadcost[0][i] == mvsadcost[0][-i]
    (cost function is even)

These must hold, otherwise the AVX version of the function cannot be used.

Change-Id: I6c2791d43022822a9e6ab43cd124a773946d0bdc
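
The three table properties named in the commit message could be checked with a small standalone helper along the following lines. This is only a sketch: the function name `sad_cost_tables_are_symmetric`, its parameters, and the `max_abs` bound are hypothetical and do not appear in libvpx. It assumes each component table pointer refers to the centre of its array, so that indices from `-max_abs` to `max_abs` are valid.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of libvpx. Returns true when the cost
 * tables satisfy the three properties listed in the commit message.
 * Assumption: comp_cost[c] points at the centre of its table, so every
 * index in [-max_abs, max_abs] may be read. */
static bool sad_cost_tables_are_symmetric(const int joint_cost[4],
                                          const int *const comp_cost[2],
                                          int max_abs) {
  /* Property 1: all non-zero joint types share a single cost. */
  if (joint_cost[1] != joint_cost[2] || joint_cost[2] != joint_cost[3]) {
    return false;
  }
  for (int i = 0; i <= max_abs; ++i) {
    /* Property 2: both components use the same cost, for +i and -i. */
    if (comp_cost[0][i] != comp_cost[1][i]) return false;
    if (comp_cost[0][-i] != comp_cost[1][-i]) return false;
    /* Property 3: the cost function is even. */
    if (comp_cost[0][i] != comp_cost[0][-i]) return false;
  }
  return true;
}
```

A caller might run such a check once after the tables are built and fall back to the C implementation if it fails. Whether the encoder performs any check of this kind is not stated in this commit.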
author: Geza Lore <gezalore@gmail.com> 2015-10-28 14:35:04 +0000
committer: Geza Lore <gezalore@gmail.com> 2015-11-11 14:03:47 +0000
commit: 5eefd3ebfdf61f76676de4f86e128e3d101311a2 (patch)
tree: a763404e3e9890907b57fc522408fa2d63fd9ce1 /vp9/vp9cx.mk
parent: 420e8d6d039c2224e00c13aba7f8908b68868359 (diff)
download: libvpx-5eefd3ebfdf61f76676de4f86e128e3d101311a2.tar.gz
1 files changed, 1 insertions, 0 deletions
diff --git a/vp9/vp9cx.mk b/vp9/vp9cx.mk
index 3f3bdef96..5918240e2 100644
--- a/vp9/vp9cx.mk
+++ b/vp9/vp9cx.mk
@@ -96,6 +96,7 @@ VP9_CX_SRCS-yes += encoder/vp9_mbgraph.h
 VP9_CX_SRCS-$(HAVE_SSE2) += encoder/x86/vp9_avg_intrin_sse2.c
 VP9_CX_SRCS-$(HAVE_SSE2) += encoder/x86/vp9_temporal_filter_apply_sse2.asm
 VP9_CX_SRCS-$(HAVE_SSE2) += encoder/x86/vp9_quantize_sse2.c
+VP9_CX_SRCS-$(HAVE_AVX) += encoder/x86/vp9_diamond_search_sad_avx.c
 ifeq ($(CONFIG_VP9_HIGHBITDEPTH),yes)
 VP9_CX_SRCS-$(HAVE_SSE2) += encoder/x86/vp9_highbd_block_error_intrin_sse2.c
 endif