path: root/libavfilter/x86
Commit message [Author; Age; Files; Lines]
* x86/vf_bwdif_init: limit AVX2 functions using 256bit vectors to cpus known to be fast with it [James Almer; 2023-03-25; 1 file; -2/+2]
| Signed-off-by: James Almer <jamrial@gmail.com>
* avfilter/bwdif: add avx2 filter_line function [James Darnley; 2023-03-25; 2 files; -5/+36]
| 8-bit: 2.24x faster (1925±1.3 vs. 859±2.2 decicycles) compared with ssse3
| 10-bit: 2.00x faster (1703±1.7 vs. 853±2.0 decicycles) compared with ssse3
* avfilter/bwdif: move filter_line init to a dedicated function [James Darnley; 2023-03-25; 1 file; -3/+1]
|
* x86: replace explicit REP_RETs with RETs [Lynne; 2023-02-01; 11 files; -22/+22]
| From x86inc:
| > On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either
| > a branch or a branch target. So switch to a 2-byte form of ret in that case.
| > We can automatically detect "follows a branch", but not a branch target.
| > (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.)
|
| x86inc can automatically determine whether to use REP_RET rather than RET in most of these cases, so impact is minimal. Additionally, a few REP_RETs were used unnecessarily, despite the return being nowhere near a branch.
|
| The only CPUs affected were AMD K10s, made between 2007 and 2011, 16 years ago and 12 years ago, respectively.
|
| In the future, everyone involved with x86inc should consider dropping REP_RETs altogether.
* libavfilter/x86/vf_convolution: fix sobel swap issue on WIN64 [Wang, Bin; 2022-11-21; 1 file; -5/+6]
| Reviewed by: James Almer <jamrial@gmail.com>
| Signed-off-by: Wang, Bin <bin.wang@intel.com>
* libavfilter/x86/vf_convolution: add sobel filter optimization and unit test with intel AVX512 VNNI [bwang30; 2022-11-14; 2 files; -0/+165]
| This commit enabled assembly code with intel AVX512 VNNI and added unit test for sobel filter
|
| sobel_c: 4537
| sobel_avx512icl 2136
|
| Signed-off-by: bwang30 <bin.wang@intel.com>
| Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
* avfilter/vf_threshold: fix handling of zero threshold [Paul B Mahol; 2022-10-27; 1 file; -15/+8]
|
* avfilter/x86/vf_bwdif: Remove obsolete MMXEXT functions [Andreas Rheinhardt; 2022-06-22; 2 files; -20/+0]
| The only systems which benefit from these are truly ancient 32bit x86s as all other systems use at least the SSE2 versions (this includes all x64 cpus (which is why this code is restricted to x86-32)).
|
| Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avfilter/x86/vf_idet: Remove obsolete MMX(EXT) functions [Andreas Rheinhardt; 2022-06-22; 2 files; -73/+1]
| The only systems which benefit from these are truly ancient 32bit x86s as all other systems use at least the SSE2 versions (this includes all x64 cpus (which is why this code is restricted to x86-32)).
|
| Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avfilter/x86/vf_yadif: Remove obsolete MMXEXT functions [Andreas Rheinhardt; 2022-06-22; 4 files; -37/+0]
| The only systems which benefit from these are truly ancient 32bit x86s as all other systems use at least the SSE2 versions (this includes all x64 cpus (which is why this code is restricted to x86-32)).
|
| Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avfilter/x86/vf_eq_init: Remove obsolete MMXEXT function [Andreas Rheinhardt; 2022-06-22; 2 files; -30/+2]
| x64 always has MMX, MMXEXT, SSE and SSE2 and this means that some functions for MMX, MMXEXT and 3dnow are always overridden by other functions (unless one e.g. explicitly disables SSE2) for x64. So given that the only systems that benefit from process_mmxext are truly ancient 32bit x86s it is removed.
|
| Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avfilter/x86/vf_noise: Remove obsolete MMX function [Andreas Rheinhardt; 2022-06-22; 1 file; -29/+0]
| x64 always has MMX, MMXEXT, SSE and SSE2 and this means that some functions for MMX, MMXEXT and 3dnow are always overridden by other functions (unless one e.g. explicitly disables SSE2) for x64. So given that the only systems that benefit from line_noise_mmx are truly ancient 32bit x86s it is removed.
|
| Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avfilter/af_afir: Only keep DSP stuff in header [Andreas Rheinhardt; 2022-05-06; 1 file; -1/+1]
| Only the AudioFIRDSPContext and the functions for its initialization are needed outside of lavfi/af_afir.c. Also rename the header to af_afirdsp.h to reflect the change.
|
| Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avfilter/x86/vf_limiter: use movu, dst may not be always aligned [Paul B Mahol; 2022-03-24; 1 file; -2/+2]
| Happens with pad filter after limiter.
* avfilter/x86/vf_blend: use unaligned movs for output [Marton Balint; 2022-03-21; 1 file; -11/+11]
| Fixes crashes with:
| ffmpeg -f lavfi -i allyuv=d=1 -vf tblend=difference128,pad=5000:ih:1 -f null x
|
| Signed-off-by: Marton Balint <cus@passwd.hu>
* avfilter/vf_maskedmerge: fix rounding when masking [Paul B Mahol; 2022-03-03; 1 file; -7/+10]
|
* avfilter/vf_nlmeans: add x86 SIMD [Paul B Mahol; 2021-11-11; 3 files; -0/+139]
|
* x86/vf_lut3d: use three operand form for some instructions [James Almer; 2021-10-14; 1 file; -4/+4]
| Fixes compilation with old yasm.
|
| Signed-off-by: James Almer <jamrial@gmail.com>
* avfilter/vf_lut3d: fix building with --disable-optimizations [Mark Reid; 2021-10-13; 1 file; -0/+4]
|
* avfilter/vf_lut3d: add x86-optimized tetrahedral interpolation [Mark Reid; 2021-10-10; 3 files; -0/+752]
| I spotted an interesting pattern that I didn't see before that leads to the implementation being faster. The bit shifting table I was using before is no longer needed, and I was able to remove quite a few lines. I also added use of FMA on the AVX2 version.
|
| f32 1920x1080 1 thread with prelut
| c impl
| 1434012700 UNITS in lut3d->interp, 1 runs, 0 skips
| 1434035335 UNITS in lut3d->interp, 2 runs, 0 skips
| 1423615347 UNITS in lut3d->interp, 4 runs, 0 skips
| 1426268863 UNITS in lut3d->interp, 8 runs, 0 skips
| sse2
| 905484420 UNITS in lut3d->interp, 1 runs, 0 skips
| 905659010 UNITS in lut3d->interp, 2 runs, 0 skips
| 915167140 UNITS in lut3d->interp, 4 runs, 0 skips
| 915834222 UNITS in lut3d->interp, 8 runs, 0 skips
| avx
| 574794860 UNITS in lut3d->interp, 1 runs, 0 skips
| 581035090 UNITS in lut3d->interp, 2 runs, 0 skips
| 584116720 UNITS in lut3d->interp, 4 runs, 0 skips
| 581460290 UNITS in lut3d->interp, 8 runs, 0 skips
| avx2
| 301698880 UNITS in lut3d->interp, 1 runs, 0 skips
| 301982880 UNITS in lut3d->interp, 2 runs, 0 skips
| 306962430 UNITS in lut3d->interp, 4 runs, 0 skips
| 305472025 UNITS in lut3d->interp, 8 runs, 0 skips
|
| gbrap16 1920x1080 1 thread with prelut
| c impl
| 1480894840 UNITS in lut3d->interp, 1 runs, 0 skips
| 1502922990 UNITS in lut3d->interp, 2 runs, 0 skips
| 1496114307 UNITS in lut3d->interp, 4 runs, 0 skips
| 1492554551 UNITS in lut3d->interp, 8 runs, 0 skips
| sse2
| 980777180 UNITS in lut3d->interp, 1 runs, 0 skips
| 986121520 UNITS in lut3d->interp, 2 runs, 0 skips
| 986489840 UNITS in lut3d->interp, 4 runs, 0 skips
| 998832248 UNITS in lut3d->interp, 8 runs, 0 skips
| avx
| 622212360 UNITS in lut3d->interp, 1 runs, 0 skips
| 622981160 UNITS in lut3d->interp, 2 runs, 0 skips
| 645396315 UNITS in lut3d->interp, 4 runs, 0 skips
| 641057075 UNITS in lut3d->interp, 8 runs, 0 skips
| avx2
| 321336400 UNITS in lut3d->interp, 1 runs, 0 skips
| 321268920 UNITS in lut3d->interp, 2 runs, 0 skips
| 323459895 UNITS in lut3d->interp, 4 runs, 0 skips
| 324949967 UNITS in lut3d->interp, 8 runs, 0 skips
* avfilter/x86/vf_blend: unify indentation format [Wu Jianhua; 2021-10-03; 1 file; -20/+20]
| Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
* libavfilter/x86/vf_gblur: correct the order of loop step [Wu Jianhua; 2021-09-18; 1 file; -2/+1]
| The problem occurred when the width of the processed block minus 1 is a multiple of the aligned number: the instruction jle .bscale_scalar would then skip the Optimized Loop Step, leading to incorrect sampling when specifying steps greater than 1.
|
| Move the Optimized Loop Step after .bscale_scalar to ensure the loop step is enabled.
|
| Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
* libavfilter/x86/vf_gblur: fixed the fate-test failed on MacOS [Wu Jianhua; 2021-09-18; 1 file; -2/+4]
| Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
* libavfilter/x86/vf_gblur: add localbuf and ff_horiz_slice_avx2/512() [Wu Jianhua; 2021-08-29; 2 files; -7/+589]
| We introduced ff_horiz_slice_avx2/512(), implemented with a new algorithm. In a nutshell, the new algorithm does three things: gathering data from 8/16 rows, blurring the data, and scattering the data back to the image buffer. Here we used a customized 8x8/16x16 transpose to avoid the huge overhead of gather and scatter instructions; the transpose depends on a newly added temporary buffer called localbuf.
|
| Performance data:
| ff_horiz_slice_avx2(old): 109.89
| ff_horiz_slice_avx2(new): 666.67
| ff_horiz_slice_avx512: 1000
|
| Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com>
| Co-authored-by: Jin Jun <jun.i.jin@intel.com>
| Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
* libavfilter/x86/vf_gblur: add ff_verti_slice_avx2/512() [Wu Jianhua; 2021-08-29; 2 files; -0/+196]
| The new vertical slice with AVX2/512 acceleration can significantly improve the performance of Gaussian Filter 2D.
|
| Performance data:
| ff_verti_slice_c: 32.57
| ff_verti_slice_avx2: 476.19
| ff_verti_slice_avx512: 833.33
|
| Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com>
| Co-authored-by: Jin Jun <jun.i.jin@intel.com>
| Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
* libavfilter/x86/vf_gblur: add ff_postscale_slice_avx512() [Wu Jianhua; 2021-08-29; 2 files; -9/+16]
| Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com>
| Co-authored-by: Jin Jun <jun.i.jin@intel.com>
| Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
* avfilter/avf_showcqt: switch to TX FFT from avutil [Paul B Mahol; 2021-07-27; 1 file; -1/+1]
|
* Remove unnecessary mem.h inclusions [Andreas Rheinhardt; 2021-07-22; 12 files; -12/+0]
| Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* x86/vf_gblur: fix reg name in UNIX64 prologue [James Almer; 2021-02-17; 1 file; -1/+1]
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/vf_gblur: fix postscale_slice prologue [James Almer; 2021-02-17; 1 file; -16/+13]
| x86_32 ABI does not pass float arguments directly on xmm regs, and the Win64 ABI uses only the first four regs for this purpose.
|
| Signed-off-by: James Almer <jamrial@gmail.com>
* avfilter/x86/vf_gblur: add postscale SIMD [Paul B Mahol; 2021-02-16; 2 files; -3/+63]
|
* avfilter/vf_convolution: add 16-column operation for filter_column() [Paul B Mahol; 2021-02-13; 1 file; -1/+1]
| Based on patch by Xu Jun <xujunzz@sjtu.edu.cn>
* avfilter/vf_atadenoise: add sigma options [Paul B Mahol; 2021-01-22; 1 file; -8/+10]
|
* avfilter/vf_v360: add mitchell interpolation [Paul B Mahol; 2020-10-04; 1 file; -1/+2]
|
* avfilter/x86/vf_convolution_init: there is asm only for 8bit depth [Paul B Mahol; 2020-09-15; 1 file; -1/+1]
|
* Revert "avfilter/yadif: simplify the code for better readability" [Limin Wang; 2020-08-27; 2 files; -2/+3]
| This reverts commit 2a9b934675b9e2d3850b46f8a618c19b03f02551.
* avfilter/yadif: simplify the code for better readability [Limin Wang; 2020-08-26; 2 files; -3/+2]
| Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
* x86/vf_blend: fix warnings about trailing empty parameters [James Almer; 2020-07-12; 1 file; -17/+17]
| Finishes fixing ticket #8771
|
| Signed-off-by: James Almer <jamrial@gmail.com>
* avfilter/x86/vf_v360_init: add missing cases [Paul B Mahol; 2020-04-02; 1 file; -1/+3]
|
* avfilter/vf_v360: add SIMD for lagrange9 interpolation [Paul B Mahol; 2020-04-02; 2 files; -0/+49]
|
* vf_ssim: Fix loading doubles to float registers on i386 [Martin Storsjö; 2020-02-05; 1 file; -1/+1]
| This fixes the tests filter-refcmp-ssim-yuv and filter-refcmp-ssim-rgb on i386 after breaking in fcc0424c933742c8fc852371e985d16b6eb4bfe9.
|
| Signed-off-by: Martin Storsjö <martin@martin.st>
* avfilter/vf_ssim: improve precision [Paul B Mahol; 2020-02-04; 2 files; -13/+26]
| Use doubles for accumulating floats.
* avfilter/vf_v360: change remaps to int16_t type [Paul B Mahol; 2020-01-19; 1 file; -5/+5]
|
* avfilter/x86/vf_interlace: always use unaligned movs [Marton Balint; 2019-12-15; 1 file; -6/+6]
| Fixes crashes in command lines such as:
| ffmpeg -f lavfi -i testsrc2=704x576:r=50,interlace,pad=720:576:8 -f null none
|
| Related to ticket #6491.
|
| Signed-off-by: Marton Balint <cus@passwd.hu>
* avfilter/vf_maskedclamp: add x86 SIMD [Paul B Mahol; 2019-10-23; 3 files; -0/+144]
|
* x86/vf_transpose: make ff_transpose_8x8_16_sse2 work on x86_32 [James Almer; 2019-10-22; 2 files; -7/+6]
| Reviewed-by: Paul B Mahol <onemda@gmail.com>
| Signed-off-by: James Almer <jamrial@gmail.com>
* x86/vf_transpose: fix cpuflags check [James Almer; 2019-10-21; 1 file; -2/+2]
| Signed-off-by: James Almer <jamrial@gmail.com>
* avfilter/vf_transpose: add x86 SIMD [Paul B Mahol; 2019-10-21; 3 files; -0/+139]
|
* avfilter/x86/vf_atadenoise: fix comment [Paul B Mahol; 2019-10-21; 1 file; -1/+1]
|
* avfilter/x86/vf_atadenoise: add SIMD for serial too [Paul B Mahol; 2019-10-17; 2 files; -0/+134]
|