Faster MULT32_32_Q31 for ARM.

Uses a C implementation with a 32*32 => 64 multiplication, which ARM has. Speeds up decoding of a 64 kbps test file by 0.5 MHz on an ARM7TDMI and 1.0 MHz on an ARM9TDMI. 0.2% speedup on a 96 kbps enc+dec test on a Cortex A8.

Signed-off-by: Timothy B. Terriberry <tterribe@xiph.org>
author: Nils Wallménius <nils@rockbox.org> 2013-05-22 23:05:07 +0200
committer: Timothy B. Terriberry <tterribe@xiph.org> 2013-05-22 15:33:22 -0700
commit: 70485d895487563b0558ff5c7e52fd2f3d4ee2ef (patch)
tree: 0f1fd33825f2a7cbf4a5c4a05099793a339007a7 /celt/arm
parent: 85ede2c6aa066da29fce5186394f46927358be3b (diff)
download: opus-70485d895487563b0558ff5c7e52fd2f3d4ee2ef.tar.gz
1 files changed, 5 insertions, 0 deletions
diff --git a/celt/arm/fixed_armv4.h b/celt/arm/fixed_armv4.h
index 73e4f434..bcacc343 100644
--- a/celt/arm/fixed_armv4.h
+++ b/celt/arm/fixed_armv4.h
@@ -68,4 +68,9 @@ static inline opus_val32 MULT16_32_Q15_armv4(opus_val16 a, opus_val32 b)
 #undef MAC16_32_Q15
 #define MAC16_32_Q15(c, a, b) ADD32(c, MULT16_32_Q15(a, b))
 
+
+/** 32x32 multiplication, followed by a 31-bit shift right. Results fits in 32 bits */
+#undef MULT32_32_Q31
+#define MULT32_32_Q31(a,b) (opus_val32)((((opus_int64)(a)) * ((opus_int64)(b)))>>31)
+
 #endif
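
For readers who want to try the macro outside the Opus tree, here is a minimal standalone sketch. The two typedefs are stand-ins assumed for illustration (the real ones come from the Opus headers), and `q31_mul` and `q31_self_check` are hypothetical helpers that are not part of the patch. It only demonstrates the arithmetic: widen both operands to 64 bits, multiply, then shift right by 31.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in typedefs, assumed for this sketch; Opus defines its own. */
typedef int32_t opus_val32;
typedef int64_t opus_int64;

/* Same expression as the macro added by the patch. */
#define MULT32_32_Q31(a,b) (opus_val32)((((opus_int64)(a)) * ((opus_int64)(b)))>>31)

/* Hypothetical wrapper so the macro can be called like a function. */
static opus_val32 q31_mul(opus_val32 a, opus_val32 b)
{
    return MULT32_32_Q31(a, b);
}

/* Hypothetical self-check. In Q31, 0x40000000 represents 0.5.
   Right-shifting a negative value is implementation-defined in C;
   this sketch assumes an arithmetic shift, as common ARM and x86
   compilers provide. */
static void q31_self_check(void)
{
    /* 0.5 * 0.5 = 0.25 */
    assert(q31_mul(0x40000000, 0x40000000) == 0x20000000);
    /* -0.5 * 0.5 = -0.25 */
    assert(q31_mul(-0x40000000, 0x40000000) == -0x20000000);
    /* The largest positive value squared truncates to just below itself. */
    assert(q31_mul(0x7FFFFFFF, 0x7FFFFFFF) == 0x7FFFFFFE);
}
```

The one input pair whose true product does not fit in 32 bits is `INT32_MIN * INT32_MIN` (that is, -1.0 times -1.0), so the "fits in 32 bits" remark in the patch comment relies on callers avoiding that case.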