path: root/libavutil/intmath.h
Commit message [Author, Age, Files, Lines]
* lavu/riscv: add <intmath.h> optimisations [Rémi Denis-Courmont, 2022-09-13, 1 file, -2/+3]
|   This provides some micro-optimisations for signed integer clipping, and support for bit weight with the Zbb extension.
* lavu: rename and move ff_parity to av_parity [James Almer, 2016-01-07, 1 file, -9/+3]
|   av_popcount is not defined in intmath.h.
|
|   Reviewed-by: ubitux
|   Signed-off-by: James Almer <jamrial@gmail.com>
* lavu: add ff_parity() [Clément Bœsch, 2016-01-07, 1 file, -0/+12]
|
* lavu/intmath: add faster clz support [Ganesh Ajjanagadde, 2015-12-19, 1 file, -0/+18]
|   This should be useful for the sofalizer filter.
|
|   Reviewed-by: Kieran Kunhya <kierank@ob-encoder.com>
|   Reviewed-by: Clément Bœsch <u@pkh.me>
|   Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
* avutil/intmath: fix undefined behavior in ff_ctzll_c() [Michael Niedermayer, 2015-10-22, 1 file, -1/+1]
|   Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* lavu/intmath.h: Move x86 only msvc/icl functions to x86 specific header. [Matt Oliver, 2015-10-19, 1 file, -32/+2]
|   Signed-off-by: Matt Oliver <protogonoi@gmail.com>
* avutil/intmath: use de Bruijn based ff_ctz [Ganesh Ajjanagadde, 2015-10-14, 1 file, -25/+7]
|   It has already been demonstrated that the de Bruijn method has benefits over the current implementation: commit 971d12b7f9d7be3ca8eb98e6c04ed521f83cbd3c. That commit implemented it for long long, this extends it to the int version.
|
|   Tested with FATE.
|
|   Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
|   Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
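The de Bruijn method referred to in this commit can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the commit: the function name `debruijn_ctz32` and its unsigned signature are hypothetical, and the multiplier 0x077CB531 with its lookup table is the widely published 32-bit de Bruijn pair. The result is only meaningful for a nonzero input.

```c
#include <stdint.h>

/* Count trailing zero bits of a nonzero 32-bit value (hypothetical helper). */
static int debruijn_ctz32(uint32_t v)
{
    /* table[i] holds the bit position whose isolated bit, multiplied by
     * the de Bruijn constant, leaves i in the top five bits. */
    static const uint8_t table[32] = {
         0,  1, 28,  2, 29, 14, 24,  3, 30, 22, 20, 15, 25, 17,  4,  8,
        31, 27, 13, 23, 21, 19, 16,  7, 26, 12, 18,  6, 11,  5, 10,  9
    };
    uint32_t lowest = v & (0u - v);          /* isolate the lowest set bit */
    return table[(uint32_t)(lowest * 0x077CB531u) >> 27];
}
```

Working on an unsigned value keeps the negation well defined; the multiplication acts as a shift because the isolated bit is a power of two.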
* intmath: remove av_ctz. [Ronald S. Bultje, 2015-10-11, 1 file, -8/+6]
|   It's a non-installed header and only used in one place (flacenc). Since ff_ctz is static inline, it's fine to use that instead.
* avutil/intmath: Change debruijn_ctz64 to use 8bit elements [Michael Niedermayer, 2015-10-11, 1 file, -1/+1]
|   This reduces the memory & cache need from 256 to 64 bytes. The code also seems faster with this change.
|
|   Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avutil/mathematics: speed up av_gcd by using Stein's binary GCD algorithm [Ganesh Ajjanagadde, 2015-10-11, 1 file, -0/+19]
|   This uses Stein's binary GCD algorithm: https://en.wikipedia.org/wiki/Binary_GCD_algorithm to get a roughly 4x speedup over Euclidean GCD on standard architectures with a compiler intrinsic for ctzll, and a roughly 2x speedup otherwise. At the moment, the compiler intrinsic is used on GCC and Clang due to its easy availability.
|
|   Quick note regarding overflow: yes, subtractions on int64_t can overflow, but the llabs takes care of that. The llabs is also guaranteed to be safe, with no annoying INT64_MIN business, since INT64_MIN, being a power of 2, is shifted down before being sent to llabs.
|
|   The binary GCD needs ff_ctzll, an extension of ff_ctz for long long (int64_t). On GCC, this is provided by a built-in. On Microsoft, there is a BitScanForward64 analog of BitScanForward that should work; but I can't confirm. Apparently it is not available on 32 bit builds; so this may or may not work correctly. On Intel, per the documentation there is only an intrinsic for _bit_scan_forward and people have posted on forums regarding _bit_scan_forward64, but often their documentation is woeful. Again, I don't have it, so I can't test.
|
|   As such, to be safe, for now only the GCC/Clang intrinsic is added, the rest use a compiled version based on the De-Bruijn method of Leiserson et al: http://supertech.csail.mit.edu/papers/debruijn.pdf.
|
|   Tested with FATE, sample benchmark (x86-64, GCC 5.2.0, Haswell) with a START_TIMER and STOP_TIMER in libavutil/rational.c, followed by a make fate.
|
|   aac-am00_88.err:
|       builtin:   714 decicycles in av_gcd, 4095 runs, 1 skips
|       de-bruijn: 1440 decicycles in av_gcd, 4096 runs, 0 skips
|       previous:  2889 decicycles in av_gcd, 4096 runs, 0 skips
|
|   Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
|   Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
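The binary GCD described in this commit message can be sketched roughly as below. This is an illustration only, not the committed av_gcd: the names `ctz64` and `binary_gcd` are hypothetical, and the sketch works on uint64_t so that the llabs and INT64_MIN considerations discussed above do not arise. The built-in is used on GCC and Clang, with a simple loop standing in for the de Bruijn fallback elsewhere.

```c
#include <stdint.h>

/* Count trailing zeros of a nonzero 64-bit value (hypothetical helper). */
static int ctz64(uint64_t v)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_ctzll(v);
#else
    int n = 0;                 /* portable stand-in for the fallback path */
    while (!(v & 1)) {
        v >>= 1;
        n++;
    }
    return n;
#endif
}

/* Stein's binary GCD on unsigned 64-bit values. */
static uint64_t binary_gcd(uint64_t a, uint64_t b)
{
    if (a == 0) return b;
    if (b == 0) return a;

    int za = ctz64(a);
    int zb = ctz64(b);
    int k  = za < zb ? za : zb;    /* shared power of two */
    uint64_t u = a >> za;          /* both u and v are now odd */
    uint64_t v = b >> zb;

    while (u != v) {
        if (u > v) {               /* keep u <= v */
            uint64_t t = u;
            u = v;
            v = t;
        }
        v -= u;                    /* odd - odd is even and nonzero */
        v >>= ctz64(v);            /* strip the factors of two again */
    }
    return u << k;
}
```

For example, `binary_gcd(48, 18)` strips four and one trailing zeros respectively, reduces 3 and 9 to 3, and restores the single shared factor of two.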
* doxygen: Remove lavu_internal group [Timothy Gu, 2015-08-22, 1 file, -9/+0]
|   There is no use in an internal group for a public API documentation.
* avutil/intmath: check for ICC before GCC [James Almer, 2015-07-18, 1 file, -9/+9]
|   Intel compiler also defines __GNUC__, so the Intel specific intrinsics were not really being used.
|
|   Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
|   Signed-off-by: James Almer <jamrial@gmail.com>
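The ordering issue this commit describes can be illustrated with a small preprocessor sketch. The function name `compiler_family` and the returned strings are hypothetical; only the macro ordering is the point. Because the Intel compiler also defines `__GNUC__`, its own macro has to be tested first, otherwise the Intel branch is never reached.

```c
#include <stddef.h>

/* Report which branch the preprocessor selected (hypothetical helper). */
static const char *compiler_family(void)
{
#if defined(__INTEL_COMPILER)
    return "icc";             /* must precede the __GNUC__ test */
#elif defined(__GNUC__)
    return "gcc-compatible";  /* GCC, and other compilers defining __GNUC__ */
#elif defined(_MSC_VER)
    return "msvc";
#else
    return "unknown";
#endif
}
```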
* libavutil: add x86 optimized av_popcount [James Almer, 2015-02-25, 1 file, -0/+3]
|   Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
|   Signed-off-by: James Almer <jamrial@gmail.com>
* avutil/intmath: Add () to protect the ff_log2() argument [Michael Niedermayer, 2015-02-17, 1 file, -1/+1]
|   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* avutil/intmath: enable builtin intrinsics for icl and msvc. [Matthew Oliver, 2014-10-26, 1 file, -4/+36]
|   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* intmath.h: Remove duplicated ARM include. [Reimar Döffinger, 2014-08-31, 1 file, -4/+0]
|   Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
* Merge commit '5ff998a233d759d0de83ea6f95c383d03d25d88e' [Michael Niedermayer, 2012-11-05, 1 file, -0/+55]
|\
| |   * commit '5ff998a233d759d0de83ea6f95c383d03d25d88e':
| |       flacenc: use uint64_t for bit counts
| |       flacenc: remove wasted trailing 0 bits
| |       lavu: add av_ctz() for trailing zero bit count
| |       flacenc: use a separate buffer for byte-swapping for MD5 checksum on big-endian
| |       fate: aac: Place LATM tests and general AAC tests in different groups
| |       build: The A64 muxer depends on rawenc.o for ff_raw_write_packet()
| |
| |   Conflicts:
| |       doc/APIchanges
| |       libavutil/version.h
| |       tests/fate/aac.mak
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * lavu: add av_ctz() for trailing zero bit count [Justin Ruggles, 2012-11-05, 1 file, -0/+55]
| |
* | Merge commit '2d09b36c0379fcda8f984bc8ad8816c8326fd7bd' [Michael Niedermayer, 2012-10-21, 1 file, -0/+4]
|\ \
| |/
| |   * commit '2d09b36c0379fcda8f984bc8ad8816c8326fd7bd':
| |       doc/platform: Add info on shared builds with MSVC
| |       doc/platform: Move a caveat down to the notes section
| |       ARM: reinstate optimised intmath.h
| |       ffv1: update to ffv1 version 3
| |
| |   Conflicts:
| |       doc/platform.texi
| |       libavcodec/ffv1.c
| |       libavcodec/ffv1.h
| |       libavcodec/ffv1dec.c
| |       libavcodec/ffv1enc.c
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * ARM: reinstate optimised intmath.h [Mans Rullgard, 2012-10-20, 1 file, -0/+4]
| |   Use of the ARM optimised intmath.h was accidentally dropped in 9734b8b.
| |
| |   Signed-off-by: Mans Rullgard <mans@mansr.com>
* | Merge commit 'd15c21e5fa3961f10026da1a3080a3aa3cf4cec9' [Michael Niedermayer, 2012-10-21, 1 file, -5/+46]
|\ \
| |/
| |   * commit 'd15c21e5fa3961f10026da1a3080a3aa3cf4cec9':
| |       avutil: Add a copy of ff_sqrt_tab back into avutil to restore ABI compatibility
| |       avutil: make some tables visible again
| |       avutil: remove inline av_log2 from public API
| |       celp_math: rename ff_log2 to ff_log2_q15
| |
| |   Conflicts:
| |       libavutil/libavutil.v
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * avutil: remove inline av_log2 from public API [Mans Rullgard, 2012-10-20, 1 file, -5/+46]
| |   This removes inline av_log2 and av_log2_16bit from the public API, instead exporting them as regular functions. In-tree code still gets the inline and otherwise optimised variants.
| |
| |   Signed-off-by: Mans Rullgard <mans@mansr.com>
* | Merge commit '9734b8ba56d05e970c353dfd5baafa43fdb08024' [Michael Niedermayer, 2012-10-12, 1 file, -33/+0]
|\ \
| |/
| |   * commit '9734b8ba56d05e970c353dfd5baafa43fdb08024':
| |       Move avutil tables only used in libavcodec to libavcodec.
| |
| |   Conflicts:
| |       libavcodec/mathtables.c
| |       libavutil/intmath.h
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * Move avutil tables only used in libavcodec to libavcodec. [Diego Biurrun, 2012-10-11, 1 file, -35/+0]
| |
| * x86: remove FASTDIV inline asm [Mans Rullgard, 2012-08-22, 1 file, -2/+0]
| |   GCC 4.3 and later do the right thing with the plain C code. Earlier versions in 32-bit mode generate one extra instruction, needlessly zeroing what would be the high half of the shifted value. At least two gcc configurations miscompile the inline asm in some situations.
| |
| |   In 64-bit mode, all gcc versions generate imul r64, r64 followed by shr. On Intel i7 and later, this imul is faster than 32-bit mul. On older Intel and all AMD, it is slightly slower. On Atom it is much slower.
| |
| |   Considering where the FASTDIV macro is used, any overall negative performance impact of this change should be negligible. If anyone cares, they should file a bug against gcc and get the instruction selection fixed.
| |
| |   Signed-off-by: Mans Rullgard <mans@mansr.com>
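The multiply-and-shift pattern that the commit message refers to (a 64-bit multiply followed by a right shift) can be sketched in plain C as below. This is not the FASTDIV macro itself: the names `reciprocal32` and `fastdiv_sketch` are hypothetical, and the reciprocal is computed on the fly here purely for illustration, where real code would normally read it from a precomputed table. The sketch assumes a divisor between 2 and 256 and a dividend below 2^24, a range in which the rounded-up reciprocal gives the exact quotient.

```c
#include <stdint.h>

/* ceil(2^32 / b) for 2 <= b <= 256 (hypothetical helper). */
static uint32_t reciprocal32(uint32_t b)
{
    return (uint32_t)(((UINT64_C(1) << 32) + b - 1) / b);
}

/* Divide a by b using one widening multiply and one shift.
 * Assumes 2 <= b <= 256 and a < 2^24. */
static uint32_t fastdiv_sketch(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * reciprocal32(b)) >> 32);
}
```

On a 64-bit target a compiler can lower the widening multiply and shift to a single multiply and a shift, which matches the instruction sequence discussed in the message.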
* | Merge remote-tracking branch 'qatar/master' [Michael Niedermayer, 2012-08-22, 1 file, -5/+1]
|\ \
| |/
| |   * qatar/master:
| |       build: x86: Only compile mpegvideo optimizations when necessary
| |       configure: Drop fastdiv option
| |       build: Make the E-AC-3 encoder select the AC-3 encoder
| |       fate: flac: Only run tests requiring samples when samples are available
| |
| |   Conflicts:
| |       configure
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * configure: Drop fastdiv option [Diego Biurrun, 2012-08-22, 1 file, -5/+1]
| |   There is no point in having the user disable any fastdiv macros. Besides, the conditional implementation was broken and only disabled the C implementation, but not the platform-specific assembly versions.
* | Merge remote-tracking branch 'qatar/master' [Michael Niedermayer, 2011-11-23, 1 file, -0/+8]
|\ \
| |/
| |   * qatar/master: (22 commits)
| |       aacdec: Fix PS in ADTS.
| |       avconv: Consistently use PIX_FMT_NONE.
| |       dsputil: use cpuflags in x86 emu_edge_core
| |       dsputil: use movups instead of movdqu in ff_emu_edge_core_sse()
| |       wma: initialize prev_block_len_bits, next_block_len_bits, and block_len_bits.
| |       mov: Remove some redundant and obsolete comments.
| |       Add libavutil/mathematics.h #includes for INFINITY
| |       doxy: structure libavformat groups
| |       doxy: introduce an empty structure in libavcodec
| |       doxy: provide a start page and document libavutil
| |       doxy: cleanup pixfmt.h
| |       regtest: split video encode/decode tests into individual targets
| |       ARM: add explicit .arch and .fpu directives to asm.S
| |       pthread: do not touch has_b_frames
| |       avconv: cleanup the transcoding loop in output_packet().
| |       avconv: split subtitle transcoding out of output_packet().
| |       avconv: split video transcoding out of output_packet().
| |       avconv: split audio transcoding out of output_packet().
| |       avconv: reindent.
| |       avconv: move streamcopy-only code out of decoding loop.
| |       ...
| |
| |   Conflicts:
| |       avconv.c
| |       libavcodec/aaccoder.c
| |       libavcodec/pthread.c
| |       libavcodec/version.h
| |       libavutil/audioconvert.h
| |       libavutil/avutil.h
| |       libavutil/mem.h
| |       tests/ref/vsynth1/dv
| |       tests/ref/vsynth1/mpeg2thread
| |       tests/ref/vsynth2/dv
| |       tests/ref/vsynth2/mpeg2thread
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * doxy: provide a start page and document libavutil [Luca Barbato, 2011-11-22, 1 file, -0/+8]
| |   Introduce a basic layout; the subpages are currently left empty. Split libavutil into multiple groups as an example of the structure.
| * Replace FFmpeg with Libav in licence headers [Mans Rullgard, 2011-03-19, 1 file, -4/+4]
|/
|   Signed-off-by: Mans Rullgard <mans@mansr.com>
* Remove macro duplication between common.h and intmath.h [Måns Rullgård, 2010-07-07, 1 file, -14/+0]
|   Originally committed as revision 24086 to svn://svn.ffmpeg.org/ffmpeg/trunk
* intmath: whitespace cosmetics [Måns Rullgård, 2010-07-07, 1 file, -14/+9]
|   Originally committed as revision 24085 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Fix build on configurations without fast av_log2() [Måns Rullgård, 2010-03-09, 1 file, -1/+18]
|   This is a bit hackish. I will try to think of something nicer, but this will do for now.
|
|   Originally committed as revision 22366 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Move ff_sqrt() to libavutil/intmath.h [Måns Rullgård, 2010-03-08, 1 file, -0/+22]
|   Originally committed as revision 22345 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Move FASTDIV macro to intmath.h [Måns Rullgård, 2010-01-19, 1 file, -0/+18]
|   Originally committed as revision 21335 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Optimise av_log2 with clz when available [Måns Rullgård, 2010-01-14, 1 file, -0/+41]
    10% faster flac decoding on x86 and ARM.

    Originally committed as revision 21217 to svn://svn.ffmpeg.org/ffmpeg/trunk
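The idea of computing an integer base-2 logarithm from a count-leading-zeros operation can be sketched as follows. The function name `ilog2_sketch` is hypothetical and this is not the committed av_log2; it assumes a 32-bit unsigned int, uses the GCC/Clang built-in where available, and falls back to a shift loop otherwise. OR-ing in the low bit keeps the built-in away from a zero argument, for which its result is undefined, so an input of 0 yields 0.

```c
#include <stdint.h>

/* floor(log2(v)) for v >= 1, and 0 for v == 0 (hypothetical helper). */
static int ilog2_sketch(uint32_t v)
{
#if defined(__GNUC__) || defined(__clang__)
    /* Position of the highest set bit: 31 minus the leading zero count. */
    return 31 - __builtin_clz(v | 1);
#else
    int n = 0;
    while (v >>= 1)   /* count how many times v can be halved */
        n++;
    return n;
#endif
}
```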