path: root/libavcodec
Commit message (author, date, files changed, lines removed/added)
* arm: vp9mc: Minor adjustments from review of the aarch64 version (Martin Storsjö, 2016-11-10, 2 files, -91/+44)
| This work is sponsored by, and copyright, Google.
|
| The speedup for the large horizontal filters is surprisingly big on A7 and A53, while there's a minor slowdown (almost within measurement noise) on A8 and A9.
|
| Cortex                                      A7        A8        A9       A53
| orig: vp9_put_8tap_smooth_64h_neon:    20270.0   14447.3   19723.9   10910.9
| new: vp9_put_8tap_smooth_64h_neon:     20165.8   14466.5   19730.2   10668.8
|
| Signed-off-by: Martin Storsjö <martin@martin.st>
* aarch64: vp9: Add NEON optimizations of VP9 MC functions (Martin Storsjö, 2016-11-10, 6 files, -2/+839)
| This work is sponsored by, and copyright, Google.
|
| These are ported from the ARM version; it is essentially a 1:1 port with no extra added features, but with some hand tuning (especially for the plain copy/avg functions). The ARM version isn't very register starved to begin with, so there's not much to be gained from having more spare registers here - we only avoid having to clobber callee-saved registers.
|
| Examples of runtimes vs the 32 bit version, on a Cortex A53:
|
|                                       ARM   AArch64
| vp9_avg4_neon:                       27.2      23.7
| vp9_avg8_neon:                       56.5      54.7
| vp9_avg16_neon:                     169.9     167.4
| vp9_avg32_neon:                     585.8     585.2
| vp9_avg64_neon:                    2460.3    2294.7
| vp9_avg_8tap_smooth_4h_neon:        132.7     125.2
| vp9_avg_8tap_smooth_4hv_neon:       478.8     442.0
| vp9_avg_8tap_smooth_4v_neon:        126.0      93.7
| vp9_avg_8tap_smooth_8h_neon:        241.7     234.2
| vp9_avg_8tap_smooth_8hv_neon:       690.9     646.5
| vp9_avg_8tap_smooth_8v_neon:        245.0     205.5
| vp9_avg_8tap_smooth_64h_neon:     11273.2   11280.1
| vp9_avg_8tap_smooth_64hv_neon:    22980.6   22184.1
| vp9_avg_8tap_smooth_64v_neon:     11549.7   10781.1
| vp9_put4_neon:                       18.0      17.2
| vp9_put8_neon:                       40.2      37.7
| vp9_put16_neon:                      97.4      99.5
| vp9_put32_neon/armv8:               346.0     307.4
| vp9_put64_neon/armv8:              1319.0    1107.5
| vp9_put_8tap_smooth_4h_neon:        126.7     118.2
| vp9_put_8tap_smooth_4hv_neon:       465.7     434.0
| vp9_put_8tap_smooth_4v_neon:        113.0      86.5
| vp9_put_8tap_smooth_8h_neon:        229.7     221.6
| vp9_put_8tap_smooth_8hv_neon:       658.9     621.3
| vp9_put_8tap_smooth_8v_neon:        215.0     187.5
| vp9_put_8tap_smooth_64h_neon:     10636.7   10627.8
| vp9_put_8tap_smooth_64hv_neon:    21076.8   21026.9
| vp9_put_8tap_smooth_64v_neon:      9635.0    9632.4
|
| These are generally about as fast as the corresponding ARM routines on the same CPU (at least on the A53), in most cases marginally faster.
|
| The speedup vs C code is pretty much the same as for the 32 bit case; on the A53 it's around 6-13x for the larger 8tap filters. The exact speedup varies a little, since the C versions generally don't end up exactly as slow/fast as on 32 bit.
|
| Signed-off-by: Martin Storsjö <martin@martin.st>
* vp9: Make the subpel filters non-static (Martin Storsjö, 2016-11-10, 2 files, -4/+6)
| Make them aligned, to allow efficient access to them from simd.
|
| Signed-off-by: Martin Storsjö <martin@martin.st>
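
To illustrate what "non-static and aligned" buys, here is a minimal sketch using C11 `alignas`. The table name `example_subpel_filters`, its size and its coefficients are hypothetical and are not the VP9 tables; the actual code may well use the project's own alignment macro instead of `alignas`.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical table: 2 filters x 8 taps of 16-bit coefficients.
 * It has external linkage so that other object files (for example
 * hand-written assembly) can reference the symbol, and it is 16-byte
 * aligned so that one 8-tap row (16 bytes) can be fetched with a single
 * aligned 128-bit load. The coefficients are illustrative only. */
alignas(16) const int16_t example_subpel_filters[2][8] = {
    {  0, 0,   0, 128,  0,  0, 0, 0 },
    { -1, 3, -10, 122, 18, -6, 2, 0 },
};

/* Returns nonzero if ptr is aligned to `align` bytes.
 * `align` must be a nonzero power of two. */
int is_aligned(const void *ptr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)ptr & (align - 1)) == 0;
}
```

Because each row is exactly 16 bytes, aligning the start of the table also aligns every row.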
* pthread_frame: properly propagate the hw frame context across frame threads (Anton Khirnov, 2016-11-10, 1 file, -0/+11)
|
* mpegaudiodsp: aarch64: Adjust function prototype after 2caa93b813adc5dbb7771dfe615da826a2947d18 (Diego Biurrun, 2016-11-10, 1 file, -2/+3)
* Use avpriv_report_missing_feature() where appropriate (Diego Biurrun, 2016-11-08, 18 files, -44/+43)
|
* hevc: Support extradata changes from multiple stsd (Vittorio Giovara, 2016-11-08, 1 file, -0/+10)
| Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
* hevc: Allow parsing external extradata buffers (Vittorio Giovara, 2016-11-08, 1 file, -7/+5)
|
* hevc: Move hevc_decode_extradata before frame decoding (Vittorio Giovara, 2016-11-08, 1 file, -74/+74)
| Avoids a forward-declaration in the following commit.
|
| Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
* lavc: Add hevc main10 profile to avconv cli (Vittorio Giovara, 2016-11-08, 2 files, -1/+2)
|
* lavu: Rename ycgco color space appropriately (Vittorio Giovara, 2016-11-08, 2 files, -2/+3)
| Planes are ordered as the name suggests now.
|
| Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
* h264_qpel: x86: Move function with only one instance out of template macro (Diego Biurrun, 2016-11-08, 1 file, -9/+11)
| libavcodec/x86/h264_qpel.c:392:785: warning: unused function 'ff_avg_h264_qpel8or16_hv1_lowpass_mmxext' [-Wunused-function]
* lzf: update pointer p after realloc (Andreas Cadhalpun, 2016-11-07, 1 file, -0/+2)
| This fixes heap-use-after-free detected by AddressSanitizer.
|
| Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
| Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
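
The class of bug named here can be sketched as follows: a write pointer `p` into a heap buffer becomes dangling if the buffer is moved by a reallocation and `p` is not re-derived. This is a generic sketch with standard `realloc`; the function `buf_append` and its parameters are hypothetical and are not the lzf.c code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Append n bytes from src at write pointer *p inside the heap buffer *buf
 * of capacity *cap, growing the buffer when needed.
 * Returns 0 on success, -1 on allocation failure or size overflow.
 * Precondition: *buf was obtained from malloc/realloc and
 * *buf <= *p <= *buf + *cap. */
int buf_append(uint8_t **buf, size_t *cap, uint8_t **p,
               const uint8_t *src, size_t n)
{
    assert(buf && *buf && cap && p && *p);

    /* Remember the position as an offset: offsets survive a realloc,
     * raw pointers into the old block do not. */
    size_t used = (size_t)(*p - *buf);

    if (n > *cap - used) {
        if (n > SIZE_MAX - used)
            return -1;
        size_t new_cap = used + n;
        uint8_t *tmp = realloc(*buf, new_cap);
        if (!tmp)
            return -1;      /* the old buffer is still valid here */
        *buf = tmp;
        *cap = new_cap;
        *p   = tmp + used;  /* re-derive p; skipping this is the use-after-free */
    }

    memcpy(*p, src, n);
    *p += n;
    return 0;
}
```

Keeping an offset instead of a pointer across the reallocation is one way to make this mistake harder to reintroduce.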
* qsv{enc,dec}: extend the internal frame allocator (Anton Khirnov, 2016-11-07, 4 files, -35/+252)
| Handle the internal frame requests, which is required by the HEVC encoding plugin.
|
| Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>
* qsv{dec,enc}: use a struct as a memory id with internal memory allocator (Anton Khirnov, 2016-11-07, 4 files, -5/+47)
| This will allow implementing the allocator more fully, which is needed by the HEVC encoder plugin with video memory input.
|
| Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>
* qsv{dec,enc}: always use an internal mfxFrameSurface1 (Anton Khirnov, 2016-11-07, 3 files, -33/+36)
| For encoding, this avoids modifying the input surface, which we are not allowed to do.
|
| This will also be useful in the following commits.
|
| Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>
* dxva2: fix surface selection when compiled with both d3d11va and dxva2 (Hendrik Leppkes, 2016-11-07, 1 file, -1/+2)
| Fixes a regression introduced in be630b1e08ebe8f766b1798accd6b8e5e096f5aa
|
| Signed-off-by: Anton Khirnov <anton@khirnov.net>
* libx265: Add option to force IDR frames (Derek Buitenhuis, 2016-11-07, 2 files, -2/+5)
| This is in the same vein as 380146924ecad2e05e9dcc5c3c2e1b5ba47c51e8.
|
| Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
| Signed-off-by: Martin Storsjö <martin@martin.st>
* x86: Drop stray semicolons after function definitions (Diego Biurrun, 2016-11-05, 2 files, -11/+11)
| libavcodec/x86/rv40dsp_init.c:97:2: warning: ISO C does not allow extra ‘;’ outside of a function [-Wpedantic]
| libavcodec/x86/vp9dsp_init.c:94:40: warning: ISO C does not allow extra ‘;’ outside of a function [-Wpedantic]
* arm: vp9mc: Insert a literal pool at the middle of the file (Martin Storsjö, 2016-11-04, 1 file, -0/+1)
| This fixes errors like this when building non-pic binaries with armv6 as baseline:
|
| Error: invalid literal constant: pool needs to be closer
|
| Signed-off-by: Martin Storsjö <martin@martin.st>
* Drop unreachable break and return statements (Diego Biurrun, 2016-11-03, 4 files, -7/+0)
|
* dnxhdenc: Have function pointer prototype match implementation (Diego Biurrun, 2016-11-03, 1 file, -2/+4)
| libavcodec/dnxhdenc.c(326) : warning C4028: formal parameter 1 different from declaration
| libavcodec/dnxhdenc.c(329) : warning C4028: formal parameter 1 different from declaration
* pixblockdsp: Have function pointer prototype match implementation (Diego Biurrun, 2016-11-03, 1 file, -2/+4)
| libavcodec/pixblockdsp.c(58) : warning C4028: formal parameter 1 different from declaration
| libavcodec/pixblockdsp.c(63) : warning C4028: formal parameter 1 different from declaration
| libavcodec/pixblockdsp.c(66) : warning C4028: formal parameter 1 different from declaration
* ituh263dec: Have function signature match across declaration and definition (Diego Biurrun, 2016-11-03, 1 file, -1/+5)
| libavcodec/ituh263dec.c(215) : warning C4028: formal parameter 1 different from declaration
| libavcodec/ituh263dec.c(215) : warning C4028: formal parameter 2 different from declaration
* svq3: Drop unused function dctcoef_get() (Diego Biurrun, 2016-11-03, 1 file, -5/+0)
| libavcodec/svq3.c:627:29: warning: unused function 'dctcoef_get' [-Wunused-function]
* intrax8: Have function signature match across declaration and definition (Diego Biurrun, 2016-11-03, 1 file, -1/+1)
| libavcodec/intrax8.c(776) : warning C4028: formal parameter 1 different from declaration
* options_table: Remove a now unnecessary include of config.h (Martin Storsjö, 2016-11-03, 1 file, -1/+0)
| The include of config.h was added in 2012 in 1d9c2dc8, due to the use of CONFIG_SNOW_ENCODER ifdefs within options_table.h. When the snow codec was dropped later (in a0c5917f8 in 2013), this include no longer served any purpose.
|
| options_table.h is included in builds for the host as well, when building documentation. config.h should not be included in code that is built for the host, since it can contain workarounds for the target compiler/environment, like adding a missing define of restrict, or defining getenv(x) to NULL for environments that lack getenv.
|
| The seemingly innocent include reordering in 2025d37871 broke builds that have getenv(x) defined to NULL in config.h (Windows CE and Windows Phone/RT), since libavcodec/options_table.h includes config.h, while libavformat/options_table.h ends up bringing in more system headers, and those system headers can contain a proper definition of getenv, which clashes with the getenv define in config.h. This was avoided earlier as long as libavformat/options_table.h (or avformat.h) was included before libavcodec/options_table.h.
|
| This fixes builds for Windows Phone/RT and CE.
|
| Signed-off-by: Martin Storsjö <martin@martin.st>
* arm: vp9: Add NEON optimizations of VP9 MC functions (Martin Storsjö, 2016-11-03, 6 files, -3/+913)
| This work is sponsored by, and copyright, Google.
|
| The filter coefficients are signed values, where the product of the multiplication with one individual filter coefficient doesn't overflow a 16 bit signed value (the largest filter coefficient is 127). But when the products are accumulated, the resulting sum can overflow the 16 bit signed range. Instead of accumulating in 32 bit, we accumulate the largest product (either index 3 or 4) last with a saturated addition.
|
| (The VP8 MC asm does something similar, but slightly simpler, by accumulating each half of the filter separately. In the VP9 MC filters, each half of the filter can also overflow though, so the largest component has to be handled individually.)
|
| Examples of relative speedup compared to the C version, from checkasm:
|
| Cortex                               A7      A8      A9     A53
| vp9_avg4_neon:                     1.71    1.15    1.42    1.49
| vp9_avg8_neon:                     2.51    3.63    3.14    2.58
| vp9_avg16_neon:                    2.95    6.76    3.01    2.84
| vp9_avg32_neon:                    3.29    6.64    2.85    3.00
| vp9_avg64_neon:                    3.47    6.67    3.14    2.80
| vp9_avg_8tap_smooth_4h_neon:       3.22    4.73    2.76    4.67
| vp9_avg_8tap_smooth_4hv_neon:      3.67    4.76    3.28    4.71
| vp9_avg_8tap_smooth_4v_neon:       5.52    7.60    4.60    6.31
| vp9_avg_8tap_smooth_8h_neon:       6.22    9.04    5.12    9.32
| vp9_avg_8tap_smooth_8hv_neon:      6.38    8.21    5.72    8.17
| vp9_avg_8tap_smooth_8v_neon:       9.22   12.66    8.15   11.10
| vp9_avg_8tap_smooth_64h_neon:      7.02   10.23    5.54   11.58
| vp9_avg_8tap_smooth_64hv_neon:     6.76    9.46    5.93    9.40
| vp9_avg_8tap_smooth_64v_neon:     10.76   14.13    9.46   13.37
| vp9_put4_neon:                     1.11    1.47    1.00    1.21
| vp9_put8_neon:                     1.23    2.17    1.94    1.48
| vp9_put16_neon:                    1.63    4.02    1.73    1.97
| vp9_put32_neon:                    1.56    4.92    2.00    1.96
| vp9_put64_neon:                    2.10    5.28    2.03    2.35
| vp9_put_8tap_smooth_4h_neon:       3.11    4.35    2.63    4.35
| vp9_put_8tap_smooth_4hv_neon:      3.67    4.69    3.25    4.71
| vp9_put_8tap_smooth_4v_neon:       5.45    7.27    4.49    6.52
| vp9_put_8tap_smooth_8h_neon:       5.97    8.18    4.81    8.56
| vp9_put_8tap_smooth_8hv_neon:      6.39    7.90    5.64    8.15
| vp9_put_8tap_smooth_8v_neon:       9.03   11.84    8.07   11.51
| vp9_put_8tap_smooth_64h_neon:      6.78    9.48    4.88   10.89
| vp9_put_8tap_smooth_64hv_neon:     6.99    8.87    5.94    9.56
| vp9_put_8tap_smooth_64v_neon:     10.69   13.30    9.43   14.34
|
| For the larger 8tap filters, the speedup vs C code is around 5-14x.
|
| This is significantly faster than libvpx's implementation of the same functions, at least when comparing the put_8tap_smooth_64 functions (compared to vpx_convolve8_horiz_neon and vpx_convolve8_vert_neon from libvpx).
|
| Absolute runtimes from checkasm:
|
| Cortex                                   A7        A8        A9       A53
| vp9_put_8tap_smooth_64h_neon:       20150.3   14489.4   19733.6   10863.7
| libvpx vpx_convolve8_horiz_neon:    52623.3   19736.4   21907.7   25027.7
| vp9_put_8tap_smooth_64v_neon:       14455.0   12303.9   13746.4    9628.9
| libvpx vpx_convolve8_vert_neon:     42090.0   17706.2   17659.9   16941.2
|
| Thus, on the A9, the horizontal filter is only marginally faster than libvpx, while our version is significantly faster on the other cores, and the vertical filter is significantly faster on all cores. The difference is especially large on the A7.
|
| The libvpx implementation does the accumulation in 32 bit, which probably explains most of the differences.
|
| Signed-off-by: Martin Storsjö <martin@martin.st>
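
The accumulation trick described in this commit message can be modelled in scalar C. This is only a sketch of the idea, not the NEON code: the helper names, the coefficient values used in the usage note, and the final rounding (add 64, shift right by 7, clamp to 8 bits) are assumptions made for illustration. It also assumes that the sum of the seven smaller products fits in 16 bits, so that any intermediate wraparound cancels out.

```c
#include <assert.h>
#include <stdint.h>

/* Saturating signed 16-bit addition (scalar model of a saturating SIMD add). */
int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Wrapping signed 16-bit addition (scalar model of a plain SIMD add).
 * The final conversion to int16_t is implementation-defined in ISO C;
 * it wraps on two's complement targets, which is assumed here. */
int16_t wrap_add16(int16_t a, int16_t b)
{
    return (int16_t)(uint16_t)((unsigned)a + (unsigned)b);
}

/* Apply an 8-tap filter to 8 pixels using only 16-bit accumulation.
 * `big` is the index of the largest coefficient (3 or 4 in the commit
 * message); its product is added last, with saturation. */
uint8_t filter8_sat16(const uint8_t src[8], const int16_t coef[8], int big)
{
    assert(big >= 0 && big < 8);

    int16_t acc = 0;
    for (int i = 0; i < 8; i++) {
        if (i == big)
            continue;
        /* Each single product fits in 16 bits: 255 * 127 = 32385. */
        acc = wrap_add16(acc, (int16_t)(src[i] * coef[i]));
    }
    /* Only this last addition can exceed the 16-bit range. */
    acc = sat_add16(acc, (int16_t)(src[big] * coef[big]));

    /* Assumed rounding: coefficients sum to 128, so shift by 7. */
    int32_t v = (int32_t)acc + 64;
    if (v < 0)
        return 0;
    v >>= 7;
    return v > 255 ? 255 : (uint8_t)v;
}
```

With the illustrative coefficients `{ -1, 3, -10, 122, 18, -6, 2, 0 }` and pixels `{ 0, 0, 0, 255, 255, 0, 0, 0 }`, the exact sum is 35700, which does not fit in 16 bits. The saturated value 32767 still yields 255 after rounding and clamping, the same as a 32-bit accumulation would, because any sum that saturates is clamped to 255 anyway.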
* vp9: Flip the order of arguments in MC functions (Martin Storsjö, 2016-11-03, 5 files, -75/+69)
| This makes it match the pattern already used for VP8 MC functions.
|
| This also makes the signature match ffmpeg's version of these functions, easing porting of code in both directions.
|
| Signed-off-by: Martin Storsjö <martin@martin.st>
* bink: Have function pointer prototype match implementation (Diego Biurrun, 2016-11-02, 1 file, -1/+3)
| libavcodec/binkdsp.c(156) : warning C4028: formal parameter 1 different from declaration
* idct: Have function pointer prototype match implementation (Diego Biurrun, 2016-11-02, 1 file, -3/+5)
| libavcodec/idctdsp.c(175) : warning C4028: formal parameter 2 different from declaration
* aactab: Move extern keyword to the front of array declarations (Diego Biurrun, 2016-11-02, 1 file, -2/+2)
| libavcodec/aactab.h:49:1: warning: ‘extern’ is not at beginning of declaration [-Wold-style-declaration]
* qsv: Be informative when reporting that no data has been consumed (Luca Barbato, 2016-10-30, 1 file, -1/+1)
|
* Use avpriv_request_sample() where appropriate (Diego Biurrun, 2016-10-29, 2 files, -13/+4)
|
* srt: Adjust signedness of sscanf format strings (Diego Biurrun, 2016-10-28, 1 file, -1/+1)
| Fixes several warnings from -Wformat.
* dxtory: Drop nonsense ISO C printf conversion specifiers for standard types (Diego Biurrun, 2016-10-28, 1 file, -3/+3)
|
* Use ISO C printf conversion specifiers where appropriate (Diego Biurrun, 2016-10-28, 5 files, -12/+14)
|
* hap: Adjust printf length modifiers to match variable types (Diego Biurrun, 2016-10-28, 1 file, -1/+1)
| libavcodec/hapenc.c:121:20: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘size_t {aka unsigned int}’ [-Wformat=]
| libavcodec/hapenc.c:121:20: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘size_t {aka unsigned int}’ [-Wformat=]
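
The warning arises because `size_t` is not `unsigned long` on every target (here it is `unsigned int`). A minimal sketch of a portable form uses the ISO C `z` length modifier. The function `format_sizes` and its message text are hypothetical and are not the hapenc.c code; whether the commit writes `%zu` directly or goes through a project macro is not shown in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format two size_t values into out.
 * "%zu" matches size_t regardless of whether it is unsigned int,
 * unsigned long or unsigned long long on the target.
 * Returns the snprintf result: the number of characters that the full
 * message needs, excluding the terminating NUL. */
int format_sizes(char *out, size_t out_size, size_t have, size_t need)
{
    assert(out != NULL && out_size > 0);
    return snprintf(out, out_size, "have %zu bytes, need %zu", have, need);
}
```

Casting the arguments to `unsigned long` and keeping `%lu` would also silence the warning, but it can truncate on platforms where `size_t` is wider than `unsigned long`.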
* Adjust printf conversion specifiers to match variable signedness (Diego Biurrun, 2016-10-28, 5 files, -10/+10)
|
* dnxhdenc: Drop pointless, commented-out debug output (Diego Biurrun, 2016-10-27, 1 file, -9/+0)
|
* h264_loopfilter: Do not print value of uninitialized variable (Diego Biurrun, 2016-10-27, 1 file, -1/+1)
| libavcodec/h264_loopfilter.c:531:111: warning: variable 'edge' is uninitialized when used here [-Wuninitialized]
* mpegaudio: Do not print value of uninitialized variable (Diego Biurrun, 2016-10-27, 1 file, -2/+2)
| libavcodec/mpegaudiodec_template.c:885:97: warning: variable 'x' is uninitialized when used here [-Wuninitialized]
* vaapi_decode: Remove vestigial unmap code (Mark Thompson, 2016-10-24, 1 file, -4/+1)
| The buffer map/unmap code was in an early version of this before it was committed, but the unmap was never removed. While wrong, this was harmless (and therefore unnoticed) because the buffers can't be mapped at this point - all drivers just did nothing with the call.
* vaapi_decode: Clear parameter buffers to fix picture reuse (Mark Thompson, 2016-10-24, 1 file, -0/+1)
| When decoding interlaced pictures, the structure is reused to render to the same surface twice. The parameter buffers were not being cleared, which caused the i965 driver to error out.
* vaapi_h264: fix RefPicList[] field flags. (Gwenole Beauchesne, 2016-10-24, 1 file, -1/+2)
| Use new H264Ref.reference field to track field picture flags. The H264Picture.reference flag in DPB is now irrelevant here.
|
| This is a regression from git commit a12d3188, and that affected multiple interlaced video streams.
|
| Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
| Signed-off-by: Mark Thompson <sw@jkqxz.net>
* hevc: x86: Add add_residual() SIMD optimizations (Pierre Edouard Lepere, 2016-10-22, 4 files, -4/+416)
| Initially written by Pierre Edouard Lepere <Pierre-Edouard.Lepere@insa-rennes.fr>, extended by James Almer <jamrial@gmail.com>.
|
| Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
* lavu: Add JEDEC P22 color primaries (Vittorio Giovara, 2016-10-21, 2 files, -1/+2)
|
* hevc: factor out a repeated condition (Anton Khirnov, 2016-10-21, 1 file, -12/+8)
|
* hevc: move the SliceType enum to hevc.h (Anton Khirnov, 2016-10-21, 7 files, -52/+54)
| Those values are decoder-independent and are also used by the VA-API encoder.
* audiodsp: x86: Remove pointless header file (Diego Biurrun, 2016-10-19, 2 files, -26/+3)
| Its single forward declaration can be moved to the only place it is used, as is done for all other dsp init files.