avutil/mathematics: speed up av_gcd by using Stein's binary GCD algorithm

This uses Stein's binary GCD algorithm:
https://en.wikipedia.org/wiki/Binary_GCD_algorithm
to get a roughly 4x speedup over Euclidean GCD on standard architectures with a compiler intrinsic for ctzll, and a roughly 2x speedup otherwise. At the moment, the compiler intrinsic is used only on GCC and Clang, where it is easily available.

Quick note regarding overflow: yes, subtractions on int64_t can overflow, but the llabs takes care of that. The llabs is also guaranteed to be safe, with no annoying INT64_MIN business, since INT64_MIN, being a power of 2, is shifted down before being passed to llabs.

The binary GCD needs ff_ctzll, an extension of ff_ctz for long long (int64_t). On GCC, this is provided by a built-in. On Microsoft, there is a BitScanForward64 analog of BitScanForward that should work, but I can't confirm. Apparently it is not available on 32-bit builds, so this may or may not work correctly. On Intel, per the documentation there is only an intrinsic for _bit_scan_forward; people have posted on forums regarding _bit_scan_forward64, but the documentation is often woeful. Again, I don't have it, so I can't test.

As such, to be safe, for now only the GCC/Clang intrinsic is added; the rest use a compiled version based on the De Bruijn method of Leiserson et al.:
http://supertech.csail.mit.edu/papers/debruijn.pdf

Tested with FATE. Sample benchmark (x86-64, GCC 5.2.0, Haswell) with a START_TIMER and STOP_TIMER in libavutil/rational.c, followed by a make fate.

aac-am00_88.err:
builtin:   714 decicycles in av_gcd, 4095 runs, 1 skips
de-bruijn: 1440 decicycles in av_gcd, 4096 runs, 0 skips
previous:  2889 decicycles in av_gcd, 4096 runs, 0 skips

Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
author: Ganesh Ajjanagadde <gajjanagadde@gmail.com> 2015-10-10 21:58:47 -0400
committer: Michael Niedermayer <michael@niedermayer.cc> 2015-10-11 04:08:41 +0200
commit: 971d12b7f9d7be3ca8eb98e6c04ed521f83cbd3c (patch)
tree: 68b3c2a368a21f02fc06dde5ab5f75d3d7b44296 /libavutil/mathematics.c
parent: 1e7e4f13f95227d79bc8ab9a2167f02f7a3e063f (diff)
download: ffmpeg-971d12b7f9d7be3ca8eb98e6c04ed521f83cbd3c.tar.gz
1 files changed, 21 insertions, 5 deletions
diff --git a/libavutil/mathematics.c b/libavutil/mathematics.c
index 252794e460..16e4eba5b9 100644
--- a/libavutil/mathematics.c
+++ b/libavutil/mathematics.c
@@ -27,16 +27,32 @@
 #include <limits.h>
 
 #include "mathematics.h"
+#include "libavutil/intmath.h"
 #include "libavutil/common.h"
 #include "avassert.h"
 #include "version.h"
 
-int64_t av_gcd(int64_t a, int64_t b)
-{
-    if (b)
-        return av_gcd(b, a % b);
-    else
+/* Stein's binary GCD algorithm:
+ * https://en.wikipedia.org/wiki/Binary_GCD_algorithm */
+int64_t av_gcd(int64_t a, int64_t b) {
+    int za, zb, k;
+    int64_t u, v;
+    if (a == 0)
+        return b;
+    if (b == 0)
         return a;
+    za = ff_ctzll(a);
+    zb = ff_ctzll(b);
+    k  = FFMIN(za, zb);
+    u = llabs(a >> za);
+    v = llabs(b >> zb);
+    while (u != v) {
+        if (u > v)
+            FFSWAP(int64_t, v, u);
+        v -= u;
+        v >>= ff_ctzll(v);
+    }
+    return u << k;
 }
 
 int64_t av_rescale_rnd(int64_t a, int64_t b, int64_t c, enum AVRounding rnd)
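
For readers without the intmath.h side of this change at hand, the De Bruijn fallback mentioned in the commit message can be sketched roughly as follows. This is only an illustration under stated assumptions, not the ff_ctzll code from FFmpeg: the names sketch_ctzll and sketch_gcd are hypothetical, the multiplier is one commonly used 64-bit De Bruijn constant (not necessarily the one FFmpeg picked), and the lookup table is built lazily at run time, which is not thread-safe. The GCD here also works on unsigned magnitudes, so unlike the patch it returns a non-negative value even when one argument is zero.

```c
#include <stdint.h>

/* Assumption: a well-known 64-bit De Bruijn sequence; every 6-bit window
 * of it is distinct, which is what makes the table lookup below work. */
#define SKETCH_DEBRUIJN64 UINT64_C(0x03f79d71b4cb0a89)

static int sketch_table[64];
static int sketch_table_ready;

/* Fill the table: shifting the constant left by i leaves a unique
 * 6-bit pattern in the top bits, which maps back to i. */
static void sketch_table_init(void)
{
    for (int i = 0; i < 64; i++)
        sketch_table[(SKETCH_DEBRUIJN64 << i) >> 58] = i;
    sketch_table_ready = 1;
}

/* Count trailing zeros of a nonzero 64-bit value (hypothetical helper). */
static int sketch_ctzll(uint64_t v)
{
    if (!sketch_table_ready)
        sketch_table_init();
    /* v & (~v + 1) isolates the lowest set bit, i.e. 1 << ctz(v);
     * multiplying by it is the same as shifting the constant left. */
    return sketch_table[((v & (~v + 1)) * SKETCH_DEBRUIJN64) >> 58];
}

/* Stein's binary GCD on magnitudes, using the helper above. */
static uint64_t sketch_gcd(int64_t a, int64_t b)
{
    /* Unsigned negation avoids overflow for INT64_MIN. */
    uint64_t u = a < 0 ? 0 - (uint64_t)a : (uint64_t)a;
    uint64_t v = b < 0 ? 0 - (uint64_t)b : (uint64_t)b;
    int k;

    if (u == 0)
        return v;
    if (v == 0)
        return u;

    k = sketch_ctzll(u | v);      /* power of two shared by both inputs */
    u >>= sketch_ctzll(u);        /* make u odd */
    v >>= sketch_ctzll(v);        /* make v odd */
    while (u != v) {
        if (u > v) {              /* keep u as the smaller value */
            uint64_t t = u;
            u = v;
            v = t;
        }
        v -= u;                   /* odd - odd is even and nonzero here */
        v >>= sketch_ctzll(v);    /* strip the factors of two again */
    }
    return u << k;
}
```

For example, sketch_gcd(12, 18) strips one shared factor of two, reduces 3 and 9 to 3, and returns 3 << 1. Building the table from the constant rather than hard-coding it keeps the sketch short and makes a wrong constant easy to detect.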