path: root/lib/ovs-atomic.h
Commit message (Author, Age, Files, Lines)
* ovs-atomic: Prefer Clang intrinsics over <stdatomic.h>. (Ben Pfaff, 2014-11-11, 1 file, -2/+2)

    On my Debian "jessie" system, <stdatomic.h> provided by GCC 4.9 is busted when Clang 3.5 tries to use it. Even a trivial program like this:

        #include <stdatomic.h>

        void
        foo(void)
        {
            _Atomic(int) x;
            atomic_fetch_add(&x, 1);
        }

    yields:

        atomic.c:7:5: error: address argument to atomic operation must be a pointer to integer or pointer ('_Atomic(int) *' invalid)

    The Clang-specific version of ovs-atomic.h still works, though, so this commit works around the problem.

    Signed-off-by: Ben Pfaff <blp@nicira.com>
    Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
* ovs-atomics: Add atomic support Windows. (Gurucharan Shetty, 2014-09-04, 1 file, -0/+2)

    Before this change (i.e., with pthread locks for atomics on Windows), the benchmark for cmap and hmap was as follows:

        $ ./tests/ovstest.exe test-cmap benchmark 10000000 3 1
        Benchmarking with n=10000000, 3 threads, 1.00% mutations:
        cmap insert:  61070 ms
        cmap iterate:  2750 ms
        cmap search:  14238 ms
        cmap destroy:  8354 ms

        hmap insert:   1701 ms
        hmap iterate:   985 ms
        hmap search:   3755 ms
        hmap destroy:  1052 ms

    After this change, the benchmark is as follows:

        $ ./tests/ovstest.exe test-cmap benchmark 10000000 3 1
        Benchmarking with n=10000000, 3 threads, 1.00% mutations:
        cmap insert:   3666 ms
        cmap iterate:   365 ms
        cmap search:   2016 ms
        cmap destroy:  1331 ms

        hmap insert:   1495 ms
        hmap iterate:  1026 ms
        hmap search:   4167 ms
        hmap destroy:  1046 ms

    So there is clearly a big improvement for cmap.

    But the corresponding test on Linux (with gcc 4.6) yields the following:

        $ ./tests/ovstest test-cmap benchmark 10000000 3 1
        Benchmarking with n=10000000, 3 threads, 1.00% mutations:
        cmap insert:   3917 ms
        cmap iterate:   355 ms
        cmap search:    871 ms
        cmap destroy:  1158 ms

        hmap insert:   1988 ms
        hmap iterate:  1005 ms
        hmap search:   5428 ms
        hmap destroy:   980 ms

    So for this particular test, except for "cmap search", Windows and Linux have similar performance. Windows is around 2.5x slower in "cmap search" compared to Linux. This has to be investigated.

    Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
    [With a lot of inputs and help from Jarno]
    Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
* lib/ovs-atomic: Add atomic_count. (Jarno Rajahalme, 2014-08-29, 1 file, -0/+57)

    Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
    Acked-by: Ben Pfaff <blp@nicira.com>
* lib/ovs-atomic: Add helpers for relaxed atomic access. (Jarno Rajahalme, 2014-08-29, 1 file, -0/+33)

    When an atomic variable is not serving to synchronize threads about the state of other (atomic or non-atomic) variables, no memory barrier is needed with the atomic operation. However, the default memory order for an atomic operation is memory_order_seq_cst, which always causes a system-wide locking of the memory bus and prevents both the CPU and the compiler from reordering memory accesses across the atomic operation. This can add considerable stalls as each atomic operation (regardless of memory order) always includes a memory access. In most cases we can let the compiler reorder memory accesses to minimize the time we spend waiting for the completion of the atomic memory accesses by using the relaxed memory order.

    This patch adds helpers to make such accesses a little easier on the eye (and the fingers :-), but does not try to hide them completely.

    Following patches make use of these and remove all the (implied) memory_order_seq_cst use from the OVS code base.

    Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
    Acked-by: Ben Pfaff <blp@nicira.com>
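As an illustration of the relaxed-access idea in the commit message above, here is a minimal sketch of a counter that is read and updated only with memory_order_relaxed. It uses plain C11 <stdatomic.h> rather than the OVS wrappers, and every name in it (struct count_sketch and its functions) is hypothetical, not taken from ovs-atomic.h. It is appropriate only when the counter does not publish any other data between threads.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical counter type for this sketch; not the OVS definition. */
struct count_sketch {
    atomic_uint value;
};

static inline void
count_sketch_init(struct count_sketch *c, unsigned int initial)
{
    atomic_init(&c->value, initial);
}

/* Atomically increments the counter without ordering any other memory
 * accesses.  Returns the value before the increment. */
static inline unsigned int
count_sketch_inc(struct count_sketch *c)
{
    return atomic_fetch_add_explicit(&c->value, 1, memory_order_relaxed);
}

/* Atomically decrements the counter, again with relaxed ordering.
 * Returns the value before the decrement, which must have been nonzero. */
static inline unsigned int
count_sketch_dec(struct count_sketch *c)
{
    unsigned int old = atomic_fetch_sub_explicit(&c->value, 1,
                                                 memory_order_relaxed);
    assert(old > 0);
    return old;
}

/* Relaxed read: the result may be stale by the time the caller uses it. */
static inline unsigned int
count_sketch_get(struct count_sketch *c)
{
    return atomic_load_explicit(&c->value, memory_order_relaxed);
}
```

Each operation is still indivisible; what relaxed ordering gives up is any guarantee about how the operation is ordered relative to neighboring loads and stores.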
* lib/ovs-atomic: Clarified comments on ovs_refcount_unref(). (Jarno Rajahalme, 2014-08-29, 1 file, -3/+5)

    ovs_refcount_unref() needs to synchronize with the other instances of itself rather than with ovs_refcount_ref().

    Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
    Acked-by: Ben Pfaff <blp@nicira.com>
* lib/ovs-atomic: Native support for 32-bit 586 with GCC. (Jarno Rajahalme, 2014-08-05, 1 file, -0/+2)

    XenServer runs OVS in dom0, which is a 32-bit VM. As the build environment lacks support for atomics, locked pthread atomics were used, with a considerable performance hit.

    This patch adds native support for ovs-atomic with 32-bit Pentium and higher CPUs, when compiled with an older GCC. We use inline asm with the cmpxchg8b instruction, which was first introduced with the Intel Pentium processors. We do not expect anyone to run OVS on a 486 or older processor.

    cmap benchmark before the patch on 32-bit XenServer build (uses ovs-atomic-pthread):

        $ tests/ovstest test-cmap benchmark 2000000 8 0.1
        Benchmarking with n=2000000, 8 threads, 0.10% mutations:
        cmap insert:   8835 ms
        cmap iterate:   379 ms
        cmap search:   6242 ms
        cmap destroy:  1145 ms

    After:

        $ tests/ovstest test-cmap benchmark 2000000 8 0.1
        Benchmarking with n=2000000, 8 threads, 0.10% mutations:
        cmap insert:    711 ms
        cmap iterate:    68 ms
        cmap search:    353 ms
        cmap destroy:   209 ms

    Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
    Acked-by: Ben Pfaff <blp@nicira.com>
* lib/ovs-atomic: Native support for x86_64 with GCC. (Jarno Rajahalme, 2014-08-05, 1 file, -0/+2)

    Some supported XenServer build environments lack compiler support for atomic operations. This patch provides native support for x86_64 on GCC, which covers possible future 64-bit builds on XenServer.

    Since this implementation is faster than the existing support prior to GCC 4.7, especially for cmap inserts, we use this with GCC < 4.7 on x86_64.

    Example numbers with "tests/test-cmap benchmark 2000000 8 0.1" on quad-core hyperthreaded laptop, built with GCC 4.6 -O2:

    Using ovs-atomic-pthreads on x86_64:

        Benchmarking with n=2000000, 8 threads, 0.10% mutations:
        cmap insert:   4725 ms
        cmap iterate:   329 ms
        cmap search:   5945 ms
        cmap destroy:   911 ms

    Using ovs-atomic-gcc4+ on x86_64:

        Benchmarking with n=2000000, 8 threads, 0.10% mutations:
        cmap insert:    845 ms
        cmap iterate:    58 ms
        cmap search:    308 ms
        cmap destroy:   295 ms

    With the native support provided by this patch:

        Benchmarking with n=2000000, 8 threads, 0.10% mutations:
        cmap insert:    530 ms
        cmap iterate:    59 ms
        cmap search:    305 ms
        cmap destroy:   232 ms

    Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
    Acked-by: Ben Pfaff <blp@nicira.com>
* lib/ovs-atomic: Require memory_order be constant. (Jarno Rajahalme, 2014-08-05, 1 file, -0/+4)

    Compiler implementations may provide sub-optimal support for a memory_order passed in as a run-time value (ref. https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html).

    Document that OVS atomics require the memory order to be passed in as a compile-time constant. It should be noted, however, that when inlining is disabled (i.e., compiling without optimization) even compile-time constants may be passed as run-time values to (non-inlined) functions.

    Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
    Acked-by: Ben Pfaff <blp@nicira.com>
* lib/ovs-atomic: Elaborate memory_order documentation. (Jarno Rajahalme, 2014-08-05, 1 file, -7/+50)

    The definition of memory_order_relaxed included a compiler barrier, although none is necessary, and indeed the following text on atomic_thread_fence and atomic_signal_fence contradicted that.

    memory_order_consume and memory_order_acq_rel are also more thoroughly described.

    Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
    Acked-by: Ben Pfaff <blp@nicira.com>
* lib/ovs-atomic: Add ovs_refcount_unref_relaxed(), ovs_refcount_try_ref_rcu(). (Jarno Rajahalme, 2014-07-07, 1 file, -0/+74)

    When a reference counted object is also RCU protected, the deletion of the object's memory is always postponed. This allows memory_order_relaxed to be used also for unreferencing, as RCU quiescing provides a full memory barrier (it has to, or otherwise there could be lingering accesses to objects after they are recycled).

    Also, when access to the reference counted object is protected via a mutex or a lock, the locking primitives provide the required memory barrier functionality.

    Also, add ovs_refcount_try_ref_rcu(), which takes a reference only if the refcount is non-zero and returns true if a reference was taken, false otherwise. This can be used in combined RCU/refcount scenarios where we have an RCU protected reference to a refcounted object, but which may be unref'ed at any time. If ovs_refcount_try_ref_rcu() fails, the object may still be safely used until the current thread quiesces.

    Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
    Acked-by: Ben Pfaff <blp@nicira.com>
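The "take a reference only if the count is non-zero" behavior described for ovs_refcount_try_ref_rcu() can be sketched with a compare-exchange loop. The code below is an assumption-laden illustration in plain C11, not the OVS source: try_ref_sketch() and unref_relaxed_sketch() are hypothetical names, and the relaxed ordering is only safe under the conditions the commit message states (the object's memory is reclaimed through RCU or access is protected by a lock).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Tries to take a reference.  Returns true and increments '*count' if it
 * was nonzero; returns false and leaves it untouched if it was zero. */
static inline bool
try_ref_sketch(atomic_uint *count)
{
    unsigned int old = atomic_load_explicit(count, memory_order_relaxed);

    do {
        if (old == 0) {
            return false;       /* Last reference is already gone. */
        }
        /* On failure, 'old' is reloaded with the current value and the
         * zero check above runs again. */
    } while (!atomic_compare_exchange_weak_explicit(count, &old, old + 1,
                                                    memory_order_relaxed,
                                                    memory_order_relaxed));
    return true;
}

/* Drops a reference with relaxed ordering.  Returns the count before the
 * decrement; a return value of 1 means the caller held the last one. */
static inline unsigned int
unref_relaxed_sketch(atomic_uint *count)
{
    unsigned int old = atomic_fetch_sub_explicit(count, 1,
                                                 memory_order_relaxed);
    assert(old > 0);
    return old;
}
```

A plain fetch-and-add would not work for the "try" operation, because it could resurrect a count that had already dropped to zero; the loop makes the zero check and the increment a single atomic step.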
* lib/ovs-atomic: Add atomic compare_exchange. (Jarno Rajahalme, 2014-07-07, 1 file, -1/+27)

    Add support for atomic compare_exchange operations.

    Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
    Acked-by: Ben Pfaff <blp@nicira.com>
* ovs-atomic: Use explicit memory order for ovs_refcount. (Jarno Rajahalme, 2014-07-07, 1 file, -6/+24)

    Use explicit variants of atomic operations for the ovs_refcount to avoid the overhead of the default memory_order_seq_cst.

    Adding a reference requires no memory ordering, as the calling thread is already assumed to have protected access to the object being reference counted. Hence, memory_order_relaxed is used for ovs_refcount_ref(). ovs_refcount_read() does not change the reference count, so it can also use memory_order_relaxed.

    Unreferencing an object needs a release barrier, so that none of the accesses to the protected object are reordered after the atomic decrement operation. Additionally, an explicit acquire barrier is needed before the object is recycled, to keep the subsequent accesses to the object's memory from being reordered before the atomic decrement operation.

    This patch follows the memory ordering and argumentation discussed here:

    http://www.chaoticmind.net/~hcb/projects/boost.atomic/doc/atomic/usage_examples.html

    Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
    Acked-by: Ben Pfaff <blp@nicira.com>
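The ordering argument in the commit message above (relaxed increment, release decrement, acquire fence before recycling) can be sketched as follows. This is a standalone C11 illustration of that pattern, not the OVS implementation: struct refcount_sketch and its functions are hypothetical names, and the boolean return value of the unref function is a choice made for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference count for this sketch. */
struct refcount_sketch {
    atomic_uint count;
};

/* A new object starts with one reference, held by its creator. */
static inline void
refcount_sketch_init(struct refcount_sketch *rc)
{
    atomic_init(&rc->count, 1);
}

/* The caller already holds a reference, so no ordering is needed. */
static inline void
refcount_sketch_ref(struct refcount_sketch *rc)
{
    unsigned int old = atomic_fetch_add_explicit(&rc->count, 1,
                                                 memory_order_relaxed);
    assert(old > 0);
    (void) old;                 /* Unused when NDEBUG is defined. */
}

/* Returns true if the caller dropped the last reference and may now
 * recycle the object. */
static inline bool
refcount_sketch_unref(struct refcount_sketch *rc)
{
    /* Release: this thread's earlier accesses to the object cannot be
     * reordered after the decrement. */
    unsigned int old = atomic_fetch_sub_explicit(&rc->count, 1,
                                                 memory_order_release);
    assert(old > 0);
    if (old == 1) {
        /* Acquire: the recycling code below cannot be reordered before
         * the decrement, and it sees the other threads' released
         * accesses. */
        atomic_thread_fence(memory_order_acquire);
        return true;
    }
    return false;
}
```

Placing the acquire fence only on the path that drops the last reference keeps the common decrement as cheap as the release ordering allows.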
* INSTALL: Note about compiler atomics support. (Jarno Rajahalme, 2014-06-04, 1 file, -0/+3)

    OVS is slow when compiled with pthreads atomics. Add a generic note in INSTALL, with a reference to lib/ovs-atomic.h, where a new comment provides additional detail.

    Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
* ovs-atomic: Remove atomic_uint64_t and atomic_int64_t. (Simon Horman, 2014-05-16, 1 file, -4/+0)

    Some concern has been raised by Ben Pfaff that atomic_uint64_t may not be portable, in particular on 32-bit platforms that do not have atomic 64-bit integers.

    Now that there are no longer any users of atomic_uint64_t, remove it entirely. Also remove atomic_int64_t, which has no users.

    Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
    Signed-off-by: Simon Horman <horms@verge.net.au>
    Signed-off-by: Ben Pfaff <blp@nicira.com>
* ovs-atomic: Delete atomic, atomic_flag, ovs_refcount destroy functions. (Ben Pfaff, 2014-03-13, 1 file, -29/+1)

    None of the atomic implementations need a destroy function anymore, so it's "more standard" and more convenient for users to get rid of them.

    Signed-off-by: Ben Pfaff <blp@nicira.com>
    Acked-by: Andy Zhou <azhou@nicira.com>
* ovs-atomic-types: Move into ovs-atomic.h. (Ben Pfaff, 2014-03-13, 1 file, -1/+41)

    Every implementation used this same code, so it makes sense to centralize it.

    Signed-off-by: Ben Pfaff <blp@nicira.com>
    Acked-by: Andy Zhou <azhou@nicira.com>
* ovs-atomic: Use raw types, not structs, when locks are required. (Ben Pfaff, 2014-03-13, 1 file, -2/+6)

    Until now, the GCC 4+ and pthreads implementations of atomics have used struct wrappers for their atomic types. This had the advantage of allowing a mutex to be wrapped in, in some cases, and of better type-checking by preventing stray uses of atomic variables other than through one of the atomic_*() functions or macros. However, the mutex meant that an atomic_destroy() function-like macro needed to be used. The struct wrapper also made it impossible to define new atomic types that were compatible with each other without using a typedef. For example, one could not simply define a macro like

        #define ATOMIC(TYPE) struct { TYPE value; }

    and then have two declarations like:

        ATOMIC(void *) x;
        ATOMIC(void *) y;

    and do anything with these objects that requires type-compatibility, even "&x == &y", because the two structs are not compatible. One can do it through a typedef:

        typedef ATOMIC(void *) atomic_voidp;
        atomic_voidp x, y;

    but that is inconvenient, especially because of the need to invent a name for the type.

    This commit aims to ease the problem by getting rid of the wrapper structs in the cases where the atomic library used them. It gets rid of the mutexes, in the cases where they are still needed, by using a global array of mutexes instead.

    This commit also defines the ATOMIC macro described above and documents its use in ovs-atomic.h.

    Signed-off-by: Ben Pfaff <blp@nicira.com>
    Acked-by: Andy Zhou <azhou@nicira.com>
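One way to picture the "global array of mutexes" approach from the commit message above is the sketch below, in which the atomic type is the raw type and the lock guarding a variable is chosen by hashing its address. Everything here is an assumption made for illustration: the ATOMIC_SKETCH macro, the array size, the hash, and the function names are hypothetical and are not taken from the OVS sources.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Assumed for this sketch: the "atomic" type is simply the raw type, so
 * two ATOMIC_SKETCH(void *) objects are type-compatible. */
#define ATOMIC_SKETCH(TYPE) TYPE

#define N_ATOMIC_LOCKS 16

static pthread_mutex_t atomic_locks[N_ATOMIC_LOCKS];
static pthread_once_t atomic_locks_once = PTHREAD_ONCE_INIT;

static void
atomic_locks_init(void)
{
    for (int i = 0; i < N_ATOMIC_LOCKS; i++) {
        int error = pthread_mutex_init(&atomic_locks[i], NULL);
        assert(!error);
        (void) error;           /* Unused when NDEBUG is defined. */
    }
}

/* Maps the address of an atomic variable to one of the shared mutexes.
 * Unrelated variables may share a lock, which costs some contention but
 * means no per-variable mutex has to be created or destroyed. */
static pthread_mutex_t *
atomic_lock_for(const volatile void *addr)
{
    pthread_once(&atomic_locks_once, atomic_locks_init);
    return &atomic_locks[((uintptr_t) addr >> 4) % N_ATOMIC_LOCKS];
}

/* Lock-based fetch-and-add on a raw int.  Returns the previous value. */
static int
locked_fetch_add_int(int *var, int delta)
{
    pthread_mutex_t *mutex = atomic_lock_for(var);
    int old;

    pthread_mutex_lock(mutex);
    old = *var;
    *var += delta;
    pthread_mutex_unlock(mutex);
    return old;
}
```

Because the lock lives outside the variable, nothing needs to be torn down when the variable goes away, which matches the commit's goal of eliminating the per-object mutex.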
* ovs-atomic: Introduce a new 'struct ovs_refcount'. (Ben Pfaff, 2014-01-08, 1 file, -0/+61)

    This is a thin wrapper around an atomic_uint. It is useful anyhow because each ovs_refcount_ref() or ovs_refcount_unref() call saves a few lines of code.

    This commit also changes all the potential direct users over to use the new data structure.

    Signed-off-by: Ben Pfaff <blp@nicira.com>
* ovs-atomic: Add atomic_destroy() and use everywhere it is needed. (Ben Pfaff, 2014-01-08, 1 file, -2/+11)

    C11 is able to require that atomics don't need to be destroyed, but some of the OVS implementations do.

    Signed-off-by: Ben Pfaff <blp@nicira.com>
* ovs-atomic: New functions atomic_flag_init(), atomic_flag_destroy(). (Ben Pfaff, 2014-01-08, 1 file, -0/+22)

    Standard C11 doesn't need these functions because it is able to require implementations not to need them. But we can't construct a portable implementation that does not need them in every case, so this commit adds them.

    These functions are only needed for atomic_flag objects that are dynamically allocated (because statically allocated objects can use ATOMIC_FLAG_INIT). So far there aren't any of those, but an upcoming commit will introduce one.

    Signed-off-by: Ben Pfaff <blp@nicira.com>
    Acked-by: Ethan Jackson <ethan@nicira.com>
* ovs-atomic: Add native Clang implementation. (Ben Pfaff, 2013-08-26, 1 file, -0/+2)

    With this implementation I get warnings with Clang on GNU/Linux when the previous patch is not applied. This ought to make it easier to avoid introducing new problems in the future even without building on FreeBSD.

    Signed-off-by: Ben Pfaff <blp@nicira.com>
    Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
* ovs-atomic: Fix typo in comment. (Ben Pfaff, 2013-08-21, 1 file, -1/+1)

    Signed-off-by: Ben Pfaff <blp@nicira.com>
    Acked-by: Ed Maste <emaste@freebsd.org>
* configure: Add configure-time check for GCC 4.0+ atomic built-ins. (Ben Pfaff, 2013-07-31, 1 file, -1/+1)

    We found out earlier that GCC sometimes produces an error only at link time for atomic built-ins that are not supported on a platform. This actually tries the link at configure time and should thus reliably detect whether the atomic built-ins are really supported.

    Signed-off-by: Ben Pfaff <blp@nicira.com>
    Acked-by: Ethan Jackson <ethan@nicira.com>
* ovs-atomic: New library for atomic operations. (Ben Pfaff, 2013-06-28, 1 file, -0/+250)

    This library should prove useful for the threading changes coming up. The following commit introduces one (very simple) user.

    Signed-off-by: Ben Pfaff <blp@nicira.com>
    Acked-by: Ethan Jackson <ethan@nicira.com>