author     Lucas De Marchi <lucas.demarchi@intel.com>  2013-08-22 01:10:13 -0300
committer  Lucas De Marchi <lucas.demarchi@intel.com>  2013-09-20 01:08:46 -0500
commit     3ba7f59e84857eb4dbe56a68fc7a3ffe8a650393 (patch)
tree       14f75fb0dcf0469fdf2aaeecd658ff97fb97cac5
parent     6506ddf5a37849049509324eeff72697f94584e3 (diff)
util: Add ALIGN_POWER2
Add a static inline function to align a value to its next power of 2.
This is commonly done by a SWAR like the one in:
http://aggregate.org/MAGIC/#Next Largest Power of 2
However, a microbenchmark shows that the implementation here is faster.
It doesn't really impact the possible users of this function, but it's
interesting nonetheless.
On an x86_64 i7 Ivy Bridge, using CLZ instead of the OR and SHL chain
shows a ~4% advantage, and that is with BSR, since Ivy Bridge doesn't
have LZCNT. Newer Haswell processors have the LZCNT instruction, which
can make this even better. ARM also has a CLZ instruction, so it should
be faster there, too.
Code used to test:
...
v = val[i];
t1 = get_cycles(0);
a = ALIGN_POWER2(v);
t1 = get_cycles(t1);
t2 = get_cycles(0);
v = nlpo2(v);
t2 = get_cycles(t2);
printf("%u\t%llu\t%llu\t%d\n", v, t1, t2, v == a);
...
In which val is an array of 20 random unsigned ints, nlpo2 is the SWAR
implementation and get_cycles uses RDTSC to measure the elapsed cycles.
Averages:
ALIGN_POWER2: 30 cycles
nlpo2: 31.4 cycles
-rw-r--r--  libkmod/libkmod-util.h | 5 +++++
1 file changed, 5 insertions, 0 deletions
diff --git a/libkmod/libkmod-util.h b/libkmod/libkmod-util.h
index f7f3e90..8a70aeb 100644
--- a/libkmod/libkmod-util.h
+++ b/libkmod/libkmod-util.h
@@ -51,3 +51,8 @@ do { \
 	} *__p = (typeof(__p)) (ptr); \
 	__p->__v = (val); \
 } while(0)
+
+static _always_inline_ unsigned int ALIGN_POWER2(unsigned int u)
+{
+	return 1 << ((sizeof(u) * 8) - __builtin_clz(u - 1));
+}