| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
| |
Add a POWER8 and POWER9 version of the autocorrelation functions.
flac --best is about 3.3x faster on POWER9 with this patch.
Signed-off-by: Anton Blanchard <anton@ozlabs.org>
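For orientation, the scalar computation that such vectorised versions accelerate can be sketched as below. This is a minimal reference under assumptions: the name autocorr_ref and its signature are hypothetical and are not the libFLAC API, and the POWER8/POWER9 code itself is not reproduced here.

```c
#include <stddef.h>

/* Hypothetical reference routine (not the libFLAC signature).
 * autoc[l] = sum over i of data[i] * data[i - l], for l = 0 .. lag-1. */
static void autocorr_ref(const float *data, size_t data_len,
                         size_t lag, double *autoc)
{
    for (size_t l = 0; l < lag; l++) {
        double sum = 0.0;
        /* Start at i = l so that data[i - l] never reads before data[0]. */
        for (size_t i = l; i < data_len; i++)
            sum += (double)data[i] * (double)data[i - l];
        autoc[l] = sum;
    }
}
```

A vector implementation computes several lags per iteration instead of one; the result should match this loop up to floating-point rounding.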
|
|
|
|
|
|
|
|
|
| |
Removes FLAC__lpc_restore_signal_16_intrin_sse2(), which was faster
than C code, but not faster than MMX-accelerated ASM functions.
It's also slower than the new SSE4.1 functions that were added by
the previous patch.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch removes FLAC__lpc_restore_signal_16_intrin_sse2().
It's faster than C code, but not faster than MMX-accelerated
ASM functions. It's also slower than the new SSE4.1 functions
that were added by the previous patch.
So this function wasn't very useful before, and now it's
even less useful. I don't see a reason to keep it.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
|
|
|
|
|
|
|
| |
As pointed out by Ozkan Sezer, on some platforms `int32_t` is actually
a typedef for `long`, so `unsigned` cannot be used interchangeably with
`FLAC__uint32`. The fix is to switch from `unsigned` to the explicitly
sized ISO C types defined in <stdint.h>.
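A minimal sketch of the portability issue, assuming hypothetical helper names (store_u32 and format_u32 are illustrative and do not come from libFLAC): a pointer to `unsigned` is not compatible with `uint32_t *` when `uint32_t` is `unsigned long`, and printf-style formatting needs the <inttypes.h> macros rather than "%u".

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper that writes through a pointer to a 32-bit value.
 * Calling it with the address of a plain `unsigned` compiles only where
 * uint32_t happens to be `unsigned int`; where it is `unsigned long`,
 * the pointer types are incompatible. */
static void store_u32(uint32_t *out, uint32_t value)
{
    *out = value;
}

/* Hypothetical helper: PRIu32 expands to the right conversion specifier
 * for uint32_t on every platform. */
static int format_u32(char *buf, size_t len, uint32_t v)
{
    return snprintf(buf, len, "%" PRIu32, v);
}
```

Declaring the variable as `uint32_t` at the call site keeps both helpers correct regardless of the underlying typedef.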
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Usage of internal aliases for float and double does not provide
substantial value. For integer-only libs, the macro
FLAC__INTEGER_ONLY_LIBRARY is used in the appropriate places
already.
Also, adapt copyright messages to include 2016.
Signed-off-by: Erik de Castro Lopo <erikd@mega-nerd.com>
Closes: https://github.com/xiph/flac/pull/10
|
|
|
|
|
|
|
|
|
|
|
|
| |
The commit http://git.xiph.org/?p=flac.git;a=commit;h=e9d805dd4374
changed the code that calculates autocorrelation. The new code
ran slightly (about 4%) slower on Core 2, but with the new
presets the slowdown can reach ~25%.
This patch enables both old and new functions and chooses between
them at runtime.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
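Choosing between two implementations at runtime is commonly done through a function pointer. The sketch below assumes hypothetical names (autocorr_old, autocorr_new, select_autocorr) and a hypothetical selection flag; the actual criterion used by the patch is not stated in this message.

```c
#include <stddef.h>

typedef void (*autocorr_fn)(const float *data, size_t data_len,
                            size_t lag, float *autoc);

/* Hypothetical "old" variant: outer loop over lags. */
static void autocorr_old(const float *data, size_t data_len,
                         size_t lag, float *autoc)
{
    for (size_t l = 0; l < lag; l++) {
        float sum = 0.0f;
        for (size_t i = l; i < data_len; i++)
            sum += data[i] * data[i - l];
        autoc[l] = sum;
    }
}

/* Hypothetical "new" variant: outer loop over samples. */
static void autocorr_new(const float *data, size_t data_len,
                         size_t lag, float *autoc)
{
    for (size_t l = 0; l < lag; l++)
        autoc[l] = 0.0f;
    for (size_t i = 0; i < data_len; i++) {
        /* Only lags l <= i have a valid data[i - l]. */
        size_t max_l = (i + 1 < lag) ? i + 1 : lag;
        for (size_t l = 0; l < max_l; l++)
            autoc[l] += data[i] * data[i - l];
    }
}

/* The flag is an assumption standing in for whatever runtime
 * information (CPU model, preset) drives the real choice. */
static autocorr_fn select_autocorr(int prefer_old)
{
    return prefer_old ? autocorr_old : autocorr_new;
}
```

The pointer is resolved once, so the hot loop pays no per-call branching cost for the choice.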
|
| |
|
|
|
|
| |
Patch-from: lvqcl <lvqcl.mail@gmail.com>
|
|
|
|
|
|
| |
AMD stopped releasing new chips with 3DNow in 2010.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
|
|
|
|
|
|
|
| |
Which in turn simplifies FLAC__lpc_restore_signal_16_intrin_sse2()
function.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Removes FLAC__lpc_restore_signal_asm_ppc_altivec_16*
from lpc.h and stream_decoder.c
* Removes PPC-specific code from cpu.c and cpu.h
* Removes PPC stuff from libFLAC/Makefile.lite and build/*.mk
* Removes as/gas/PPC-specific stuff from configure.ac and
libFLAC/Makefile.am*
* Removes libFLAC/ppc folder and removes "src/libFLAC/ppc*/Makefile"
lines from configure.ac
Patch-from: lvqcl <lvqcl.mail@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add new function:
FLAC__lpc_compute_residual_from_qlp_coefficients_intrin_sse41()
and rewrite function:
FLAC__lpc_compute_residual_from_qlp_coefficients_16_intrin_sse2()
Testing shows a noticeable speed increase on Intel Core i3/5/7 (up to 30%
for -8 mode), AMD Athlon64, Phenom, Bulldozer/Piledriver, but no increase,
or even a very small speed decrease (~2% for -8 mode), on Intel Core2.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
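As a reminder of what these routines compute, here is a minimal scalar sketch of residual computation from quantised LP coefficients. The name compute_residual and its buffer layout are assumptions made for illustration and differ from the libFLAC signature.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: samples[0 .. order-1] are warm-up history and
 * residual[k] corresponds to samples[order + k].
 * The right shift of a negative sum is assumed to be arithmetic, as it
 * is on mainstream compilers (the C standard leaves it
 * implementation-defined). */
static void compute_residual(const int32_t *samples, size_t n,
                             const int32_t *qlp, size_t order,
                             int shift, int32_t *residual)
{
    for (size_t i = order; i < n; i++) {
        int32_t sum = 0;
        /* Prediction: weighted sum of the `order` previous samples. */
        for (size_t j = 0; j < order; j++)
            sum += qlp[j] * samples[i - j - 1];
        residual[i - order] = samples[i] - (sum >> shift);
    }
}
```

The 32-bit accumulator matches the narrow (16-bit-oriented) variants; wider inputs need a 64-bit sum.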
|
|
|
|
|
|
|
| |
This reverts commit 151739921b74fbf31420358a5fbeb094efa017ed.
This patch only went part way to replacing all FLAC_* with FLaC_*
and it's really not worth going all the way.
|
|
|
|
|
|
|
| |
Previous autoconf versions had problems with variables beginning with
'FLAC_' (autoconf uses 'AC_').
Reported-by: lvqcl <lvqcl.mail@gmail.com>
|
|
|
|
|
|
|
| |
The new functions are analogous to FLAC__lpc_restore_signal_asm_ia32_mmx.
FLAC uses them for x86-64 arch and also for ia32 if NASM is not available.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
|
|
|
|
|
|
|
| |
* Allow compiling using GCC w/o SSE support.
* Allow SSE4.1 intrinsic functions to be enabled.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
|
|
|
|
|
|
|
|
|
| |
GCC generates slow ia32 code for FLAC__lpc_restore_signal_wide() and
FLAC__lpc_compute_residual_from_qlp_coefficients_wide(), so 24-bit
encoding/decoding is slower in GCC builds than in MSVS or ICC
builds. This patch adds ia32 asm versions of these functions.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
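The "wide" variants accumulate the prediction in 64 bits, which is what makes them costly on 32-bit x86. A minimal sketch of the decoding side, assuming a hypothetical name and buffer layout (restore_signal_wide here is not the libFLAC signature):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: samples[0 .. order-1] already hold the warm-up
 * samples; the function fills samples[order .. order+n-1].
 * The right shift of a negative sum is assumed to be arithmetic. */
static void restore_signal_wide(const int32_t *residual, size_t n,
                                const int32_t *qlp, size_t order,
                                int shift, int32_t *samples)
{
    for (size_t i = 0; i < n; i++) {
        int64_t sum = 0; /* 64-bit: products of 24-bit data can exceed 32 bits */
        for (size_t j = 0; j < order; j++)
            sum += (int64_t)qlp[j] * (int64_t)samples[order + i - j - 1];
        samples[order + i] = residual[i] + (int32_t)(sum >> shift);
    }
}
```

Each multiply-accumulate needs a 32x32 to 64-bit product plus a 64-bit add, which an ia32 compiler has to synthesise from 32-bit register pairs.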
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Splits lpc_x86intrin.c into lpc_intrin_sse.c and lpc_intrin_sse2.c
* Adds FLAC__lpc_compute_residual_from_qlp_coefficients_intrin_sse2()
function to lpc_intrin_sse2.c
* Adds lpc_intrin_sse41.c with two ..._wide_intrin_sse41() functions
(useful for 24-bit en-/decoding)
* Adds precompute_partition_info_sums_intrin_sse2() / ...ssse3() and
disables precompute_partition_info_sums_32bit_asm_ia32_().
The SSE2 version uses 4 SSE2 instructions instead of the single SSSE3
instruction PABSD, so it is slightly slower.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
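One common way to take an absolute value without PABSD is the sign-mask trick. The sketch below is a scalar model of that idea, not the patch's actual instruction sequence, and the names abs32_signmask and partition_abs_sum are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a branch-free absolute value:
 * mask is all ones for negative x (what an arithmetic shift right by 31
 * would produce per lane), then (x XOR mask) - mask negates x only when
 * the mask is set. Unsigned arithmetic keeps every step well defined,
 * including for INT32_MIN, which maps to 2147483648. */
static uint32_t abs32_signmask(int32_t x)
{
    uint32_t ux = (uint32_t)x;
    uint32_t mask = (x < 0) ? UINT32_MAX : 0u;
    return (ux ^ mask) - mask;
}

/* Sum of absolute residuals over one partition (hypothetical helper). */
static uint64_t partition_abs_sum(const int32_t *residual, size_t len)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += abs32_signmask(residual[i]);
    return sum;
}
```

With SSSE3 the three arithmetic steps collapse into one instruction per vector, which accounts for the small speed difference described above.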
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
New functions are:
FLAC__lpc_compute_autocorrelation_intrin_sse_lag_4()
FLAC__lpc_compute_autocorrelation_intrin_sse_lag_8()
FLAC__lpc_compute_autocorrelation_intrin_sse_lag_12()
FLAC__lpc_compute_autocorrelation_intrin_sse_lag_16()
FLAC__lpc_compute_residual_from_qlp_coefficients_16_intrin_sse2()
Patch-from: lvqcl <lvqcl.mail@gmail.com>
|
|
|
|
|
|
|
|
|
|
| |
For the 32-bit x86 ASM functions there were already versions of this
function for lags (N = 4, 8, 12). They require lpc_order less than N.
The best compression preset (flac -8) uses lpc_order up to 12, so
during encoding FLAC also falls back to the unaccelerated C function.
Patch-from: lvqcl <lvqcl.mail@gmail.com>
Signed-off-by: Erik de Castro Lopo <erikd@mega-nerd.com>
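The selection rule described above (a lag-N routine is usable only when lpc_order is less than N) can be sketched as follows. The function name and the idea of passing the available lags as a table are assumptions made for illustration; the second table in the usage note, which includes 16, is likewise an assumption.

```c
#include <stddef.h>

/* Returns the smallest available lag N with max_lpc_order < N,
 * or 0 to signal "use the generic C function".
 * `lags` must be sorted in ascending order. */
static unsigned pick_lag_kernel(unsigned max_lpc_order,
                                const unsigned *lags, size_t n_lags)
{
    for (size_t k = 0; k < n_lags; k++) {
        if (max_lpc_order < lags[k])
            return lags[k];
    }
    return 0; /* no specialised routine covers this order */
}
```

With lags {4, 8, 12}, an order of 12 returns 0, which is the fallback the message describes; with a hypothetical table {4, 8, 12, 16}, the same order selects 16.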
|
| |
|
| |
|
| |
|
|
|
|
| |
from datapath
|
| |
|
| |
|
|
|
|
| |
FLAC__lpc_compute_lp_coefficients() could cause an infinite loop later in FLAC__lpc_quantize_coefficients()
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
before lpc analysis
|
| |
|
| |
|
|
|
|
| |
either #ifdef'd out or written in fixed-point
|
|
|
|
| |
FLAC__real out of ordinals.h to src/libFLAC/include/private/float.h, add FLAC__double and FLAC__float and use these everywhere instead of double and float, and don't typedef FLAC__real/float/double when building in integer-only mode. still need to provide integer substitutes in several places.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
unused arg from quantizing routine