| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
autocorrelate_c: 644.0
autocorrelate_neon: 420.0
hf_apply_noise_0_c: 1688.5
hf_apply_noise_0_neon: 1498.6
hf_apply_noise_1_c: 1691.2
hf_apply_noise_1_neon: 1500.6
hf_apply_noise_2_c: 1688.1
hf_apply_noise_2_neon: 1500.3
hf_apply_noise_3_c: 1696.6
hf_apply_noise_3_neon: 1502.2
hf_g_filt_c: 2117.8
hf_g_filt_neon: 1218.7
hf_gen_c: 4573.4
hf_gen_neon: 2461.0
neg_odd_64_c: 72.0
neg_odd_64_neon: 64.7
qmf_deint_bfly_c: 1107.6
qmf_deint_bfly_neon: 291.6
qmf_deint_neg_c: 210.4
qmf_deint_neg_neon: 107.4
qmf_post_shuffle_c: 163.0
qmf_post_shuffle_neon: 107.7
qmf_pre_shuffle_c: 120.5
qmf_pre_shuffle_neon: 110.7
sum64x5_c: 1361.6
sum64x5_neon: 435.4
sum_square_c: 1686.4
sum_square_neon: 787.2
|
|
|
|
|
|
|
| |
Add fixed poind code.
Signed-off-by: Nedeljko Babic <nedeljko.babic@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
| |
|
|
|
|
|
| |
Signed-off-by: Mirjana Vulin <mvulin@mips.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The 32bits targets have been compiled with -mfpmath=sse for proper reference.
sbr_sum_square C /32bits: 82c (unrolled)/102c
C /64bits: 69c (unrolled)/82c
SSE/32bits: 42c
SSE/64bits: 31c
Use of SSE4.1 dpps to perform the final sum is slower.
Not unrolling to perform 8 operations in a loop yields 10 more cycles.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
|
|
|
|
| |
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
|
|
|
|
|
|
| |
Overall speedup of HE-AAC decoding 2.3x on Cortex-A8, 1.2x on A9.
Signed-off-by: Mans Rullgard <mans@mansr.com>
|
|
This prepares for assembly optimisations by moving the most
time-consuming loops to functions called through pointers
in a new context.
Signed-off-by: Mans Rullgard <mans@mansr.com>
|