delta/ffmpeg.git - git.ffmpeg.org: ffmpeg.git

	Commit message	Author	Age	Files	Lines
*	x86/vf_bwdif_init: limit AVX2 functions using 256bit vectors to cpus known to be fast with it	James Almer	2023-03-25	1	-2/+2
	Signed-off-by: James Almer <jamrial@gmail.com>
*	avfilter/bwdif: add avx2 filter_line function	James Darnley	2023-03-25	2	-5/+36
	8-bit: 2.24x faster (1925±1.3 vs. 859±2.2 decicycles) compared with ssse3
	10-bit: 2.00x faster (1703±1.7 vs. 853±2.0 decicycles) compared with ssse3
*	avfilter/bwdif: move filter_line init to a dedicated function	James Darnley	2023-03-25	1	-3/+1
*	x86: replace explicit REP_RETs with RETs	Lynne	2023-02-01	11	-22/+22
	From x86inc:
	> On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either
	> a branch or a branch target. So switch to a 2-byte form of ret in that case.
	> We can automatically detect "follows a branch", but not a branch target.
	> (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.)
	x86inc can automatically determine whether to use REP_RET rather than RET in most of these cases, so impact is minimal. Additionally, a few REP_RETs were used unnecessarily, despite the return being nowhere near a branch.
	The only CPUs affected were AMD K10s, made between 2007 and 2011, 16 years ago and 12 years ago, respectively.
	In the future, everyone involved with x86inc should consider dropping REP_RETs altogether.
*	libavfilter/x86/vf_convolution: fix sobel swap issue on WIN64	Wang, Bin	2022-11-21	1	-5/+6
	Reviewed by: James Almer <jamrial@gmail.com>
	Signed-off-by: Wang, Bin <bin.wang@intel.com>
*	libavfilter/x86/vf_convolution: add sobel filter optimization and unit test with intel AVX512 VNNI	bwang30	2022-11-14	2	-0/+165
	This commit enabled assembly code with intel AVX512 VNNI and added unit test for sobel filter
	sobel_c: 4537
	sobel_avx512icl 2136
	Signed-off-by: bwang30 <bin.wang@intel.com>
	Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
*	avfilter/vf_threshold: fix handling of zero threshold	Paul B Mahol	2022-10-27	1	-15/+8
*	avfilter/x86/vf_bwdif: Remove obsolete MMXEXT functions	Andreas Rheinhardt	2022-06-22	2	-20/+0
	The only systems which benefit from these are truly ancient 32bit x86s, as all other systems use at least the SSE2 versions (this includes all x64 cpus, which is why this code is restricted to x86-32).
	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	avfilter/x86/vf_idet: Remove obsolete MMX(EXT) functions	Andreas Rheinhardt	2022-06-22	2	-73/+1
	The only systems which benefit from these are truly ancient 32bit x86s, as all other systems use at least the SSE2 versions (this includes all x64 cpus, which is why this code is restricted to x86-32).
	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	avfilter/x86/vf_yadif: Remove obsolete MMXEXT functions	Andreas Rheinhardt	2022-06-22	4	-37/+0
	The only systems which benefit from these are truly ancient 32bit x86s, as all other systems use at least the SSE2 versions (this includes all x64 cpus, which is why this code is restricted to x86-32).
	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	avfilter/x86/vf_eq_init: Remove obsolete MMXEXT function	Andreas Rheinhardt	2022-06-22	2	-30/+2
	x64 always has MMX, MMXEXT, SSE and SSE2, and this means that some functions for MMX, MMXEXT and 3dnow are always overridden by other functions (unless one e.g. explicitly disables SSE2) for x64. So given that the only systems that benefit from process_mmxext are truly ancient 32bit x86s, it is removed.
	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	avfilter/x86/vf_noise: Remove obsolete MMX function	Andreas Rheinhardt	2022-06-22	1	-29/+0
	x64 always has MMX, MMXEXT, SSE and SSE2, and this means that some functions for MMX, MMXEXT and 3dnow are always overridden by other functions (unless one e.g. explicitly disables SSE2) for x64. So given that the only systems that benefit from line_noise_mmx are truly ancient 32bit x86s, it is removed.
	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	avfilter/af_afir: Only keep DSP stuff in header	Andreas Rheinhardt	2022-05-06	1	-1/+1
	Only the AudioFIRDSPContext and the functions for its initialization are needed outside of lavfi/af_afir.c. Also rename the header to af_afirdsp.h to reflect the change.
	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	avfilter/x86/vf_limiter: use movu, dst may not be always aligned	Paul B Mahol	2022-03-24	1	-2/+2
	Happens with pad filter after limiter.
*	avfilter/x86/vf_blend: use unaligned movs for output	Marton Balint	2022-03-21	1	-11/+11
	Fixes crashes with:
	ffmpeg -f lavfi -i allyuv=d=1 -vf tblend=difference128,pad=5000:ih:1 -f null x
	Signed-off-by: Marton Balint <cus@passwd.hu>
*	avfilter/vf_maskedmerge: fix rounding when masking	Paul B Mahol	2022-03-03	1	-7/+10
*	avfilter/vf_nlmeans: add x86 SIMD	Paul B Mahol	2021-11-11	3	-0/+139
*	x86/vf_lut3d: use three operand form for some instructions	James Almer	2021-10-14	1	-4/+4
	Fixes compilation with old yasm.
	Signed-off-by: James Almer <jamrial@gmail.com>
*	avfilter/vf_lut3d: fix building with --disable-optimizations	Mark Reid	2021-10-13	1	-0/+4
*	avfilter/vf_lut3d: add x86-optimized tetrahedral interpolation	Mark Reid	2021-10-10	3	-0/+752
	I spotted an interesting pattern that I didn't see before that leads to the implementation being faster. The bit shifting table I was using before is no longer needed, and was able to remove quite a few lines. I also add use of FMA on the AVX2 version.
	f32 1920x1080 1 thread with prelut
	c impl
	1434012700 UNITS in lut3d->interp, 1 runs, 0 skips
	1434035335 UNITS in lut3d->interp, 2 runs, 0 skips
	1423615347 UNITS in lut3d->interp, 4 runs, 0 skips
	1426268863 UNITS in lut3d->interp, 8 runs, 0 skips
	sse2
	905484420 UNITS in lut3d->interp, 1 runs, 0 skips
	905659010 UNITS in lut3d->interp, 2 runs, 0 skips
	915167140 UNITS in lut3d->interp, 4 runs, 0 skips
	915834222 UNITS in lut3d->interp, 8 runs, 0 skips
	avx
	574794860 UNITS in lut3d->interp, 1 runs, 0 skips
	581035090 UNITS in lut3d->interp, 2 runs, 0 skips
	584116720 UNITS in lut3d->interp, 4 runs, 0 skips
	581460290 UNITS in lut3d->interp, 8 runs, 0 skips
	avx2
	301698880 UNITS in lut3d->interp, 1 runs, 0 skips
	301982880 UNITS in lut3d->interp, 2 runs, 0 skips
	306962430 UNITS in lut3d->interp, 4 runs, 0 skips
	305472025 UNITS in lut3d->interp, 8 runs, 0 skips
	gbrap16 1920x1080 1 thread with prelut
	c impl
	1480894840 UNITS in lut3d->interp, 1 runs, 0 skips
	1502922990 UNITS in lut3d->interp, 2 runs, 0 skips
	1496114307 UNITS in lut3d->interp, 4 runs, 0 skips
	1492554551 UNITS in lut3d->interp, 8 runs, 0 skips
	sse2
	980777180 UNITS in lut3d->interp, 1 runs, 0 skips
	986121520 UNITS in lut3d->interp, 2 runs, 0 skips
	986489840 UNITS in lut3d->interp, 4 runs, 0 skips
	998832248 UNITS in lut3d->interp, 8 runs, 0 skips
	avx
	622212360 UNITS in lut3d->interp, 1 runs, 0 skips
	622981160 UNITS in lut3d->interp, 2 runs, 0 skips
	645396315 UNITS in lut3d->interp, 4 runs, 0 skips
	641057075 UNITS in lut3d->interp, 8 runs, 0 skips
	avx2
	321336400 UNITS in lut3d->interp, 1 runs, 0 skips
	321268920 UNITS in lut3d->interp, 2 runs, 0 skips
	323459895 UNITS in lut3d->interp, 4 runs, 0 skips
	324949967 UNITS in lut3d->interp, 8 runs, 0 skips
*	avfilter/x86/vf_blend: unify indentation format	Wu Jianhua	2021-10-03	1	-20/+20
	Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
*	libavfilter/x86/vf_gblur: correct the order of loop step	Wu Jianhua	2021-09-18	1	-2/+1
	The problem occurred when the width of the processed block minus 1 is a multiple of the aligned number: the instruction jle .bscale_scalar would skip the Optimized Loop Step, leading to incorrect sampling when specifying steps more than 1.
	Move the Optimized Loop Step after .bscale_scalar to ensure the loop step is enabled.
	Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
*	libavfilter/x86/vf_gblur: fixed the fate-test failed on MacOS	Wu Jianhua	2021-09-18	1	-2/+4
	Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
*	libavfilter/x86/vf_gblur: add localbuf and ff_horiz_slice_avx2/512()	Wu Jianhua	2021-08-29	2	-7/+589
	We introduced a ff_horiz_slice_avx2/512() implemented on a new algorithm. In a nutshell, the new algorithm does three things, gathering data from 8/16 rows, blurring data, and scattering data back to the image buffer. Here we used a customized transpose 8x8/16x16 to avoid the huge overhead brought by gather and scatter instructions, which is dependent on the temporary buffer called localbuf added newly.
	Performance data:
	ff_horiz_slice_avx2(old): 109.89
	ff_horiz_slice_avx2(new): 666.67
	ff_horiz_slice_avx512: 1000
	Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com>
	Co-authored-by: Jin Jun <jun.i.jin@intel.com>
	Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
*	libavfilter/x86/vf_gblur: add ff_verti_slice_avx2/512()	Wu Jianhua	2021-08-29	2	-0/+196
	The new vertical slice with AVX2/512 acceleration can significantly improve the performance of Gaussian Filter 2D.
	Performance data:
	ff_verti_slice_c: 32.57
	ff_verti_slice_avx2: 476.19
	ff_verti_slice_avx512: 833.33
	Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com>
	Co-authored-by: Jin Jun <jun.i.jin@intel.com>
	Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
*	libavfilter/x86/vf_gblur: add ff_postscale_slice_avx512()	Wu Jianhua	2021-08-29	2	-9/+16
	Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com>
	Co-authored-by: Jin Jun <jun.i.jin@intel.com>
	Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
*	avfilter/avf_showcqt: switch to TX FFT from avutil	Paul B Mahol	2021-07-27	1	-1/+1
*	Remove unnecessary mem.h inclusions	Andreas Rheinhardt	2021-07-22	12	-12/+0
	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	x86/vf_gblur: fix reg name in UNIX64 prologue	James Almer	2021-02-17	1	-1/+1
	Signed-off-by: James Almer <jamrial@gmail.com>
*	x86/vf_gblur: fix postscale_slice prologue	James Almer	2021-02-17	1	-16/+13
	x86_32 ABI does not pass float arguments directly on xmm regs, and the Win64 ABI uses only the first four regs for this purpose.
	Signed-off-by: James Almer <jamrial@gmail.com>
*	avfilter/x86/vf_gblur: add postscale SIMD	Paul B Mahol	2021-02-16	2	-3/+63
*	avfilter/vf_convolution: add 16-column operation for filter_column()	Paul B Mahol	2021-02-13	1	-1/+1
	Based on patch by Xu Jun <xujunzz@sjtu.edu.cn>
*	avfilter/vf_atadenoise: add sigma options	Paul B Mahol	2021-01-22	1	-8/+10
*	avfilter/vf_v360: add mitchell interpolation	Paul B Mahol	2020-10-04	1	-1/+2
*	avfilter/x86/vf_convolution_init: there is asm only for 8bit depth	Paul B Mahol	2020-09-15	1	-1/+1
*	Revert "avfilter/yadif: simplify the code for better readability"	Limin Wang	2020-08-27	2	-2/+3
	This reverts commit 2a9b934675b9e2d3850b46f8a618c19b03f02551.
*	avfilter/yadif: simplify the code for better readability	Limin Wang	2020-08-26	2	-3/+2
	Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
*	x86/vf_blend: fix warnings about trailing empty parameters	James Almer	2020-07-12	1	-17/+17
	Finishes fixing ticket #8771
	Signed-off-by: James Almer <jamrial@gmail.com>
*	avfilter/x86/vf_v360_init: add missing cases	Paul B Mahol	2020-04-02	1	-1/+3
*	avfilter/vf_v360: add SIMD for lagrange9 interpolation	Paul B Mahol	2020-04-02	2	-0/+49
*	vf_ssim: Fix loading doubles to float registers on i386	Martin Storsjö	2020-02-05	1	-1/+1
	This fixes the tests filter-refcmp-ssim-yuv and filter-refcmp-ssim-rgb on i386 after breaking in fcc0424c933742c8fc852371e985d16b6eb4bfe9.
	Signed-off-by: Martin Storsjö <martin@martin.st>
*	avfilter/vf_ssim: improve precision	Paul B Mahol	2020-02-04	2	-13/+26
	Use doubles for accumulating floats.
*	avfilter/vf_v360: change remaps to int16_t type	Paul B Mahol	2020-01-19	1	-5/+5
*	avfilter/x86/vf_interlace: always use unaligned movs	Marton Balint	2019-12-15	1	-6/+6
	Fixes crashes in command lines such as:
	ffmpeg -f lavfi -i testsrc2=704x576:r=50,interlace,pad=720:576:8 -f null none
	Related to ticket #6491.
	Signed-off-by: Marton Balint <cus@passwd.hu>
*	avfilter/vf_maskedclamp: add x86 SIMD	Paul B Mahol	2019-10-23	3	-0/+144
*	x86/vf_transpose: make ff_transpose_8x8_16_sse2 work on x86_32	James Almer	2019-10-22	2	-7/+6
	Reviewed-by: Paul B Mahol <onemda@gmail.com>
	Signed-off-by: James Almer <jamrial@gmail.com>
*	x86/vf_transpose: fix cpuflags check	James Almer	2019-10-21	1	-2/+2
	Signed-off-by: James Almer <jamrial@gmail.com>
*	avfilter/vf_transpose: add x86 SIMD	Paul B Mahol	2019-10-21	3	-0/+139
*	avfilter/x86/vf_atadenoise: fix comment	Paul B Mahol	2019-10-21	1	-1/+1
*	avfilter/x86/vf_atadenoise: add SIMD for serial too	Paul B Mahol	2019-10-17	2	-0/+134