| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On my Debian "jessie" system, <stdatomic.h> provided by GCC 4.9 is busted
when Clang 3.5 tries to use it. Even a trivial program like this:
#include <stdatomic.h>
void
foo(void)
{
_Atomic(int) x;
atomic_fetch_add(&x, 1);
}
yields:
atomic.c:7:5: error: address argument to atomic operation must be a
pointer to integer or pointer ('_Atomic(int) *' invalid)
The Clang-specific version of ovs-atomic.h stills works, though, so this
commit works around the problem.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before this change (i.e., with pthread locks for atomics on Windows),
the benchmark for cmap and hmap was as follows:
$ ./tests/ovstest.exe test-cmap benchmark 10000000 3 1
Benchmarking with n=10000000, 3 threads, 1.00% mutations:
cmap insert: 61070 ms
cmap iterate: 2750 ms
cmap search: 14238 ms
cmap destroy: 8354 ms
hmap insert: 1701 ms
hmap iterate: 985 ms
hmap search: 3755 ms
hmap destroy: 1052 ms
After this change, the benchmark is as follows:
$ ./tests/ovstest.exe test-cmap benchmark 10000000 3 1
Benchmarking with n=10000000, 3 threads, 1.00% mutations:
cmap insert: 3666 ms
cmap iterate: 365 ms
cmap search: 2016 ms
cmap destroy: 1331 ms
hmap insert: 1495 ms
hmap iterate: 1026 ms
hmap search: 4167 ms
hmap destroy: 1046 ms
So there is clearly a big improvement for cmap.
But the correspondig test on Linux (with gcc 4.6) yeilds the following:
./tests/ovstest test-cmap benchmark 10000000 3 1
Benchmarking with n=10000000, 3 threads, 1.00% mutations:
cmap insert: 3917 ms
cmap iterate: 355 ms
cmap search: 871 ms
cmap destroy: 1158 ms
hmap insert: 1988 ms
hmap iterate: 1005 ms
hmap search: 5428 ms
hmap destroy: 980 ms
So for this particular test, except for "cmap search", Windows and
Linux have similar performance. Windows is around 2.5x slower in "cmap search"
compared to Linux. This has to be investigated.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
[With a lot of inputs and help from Jarno]
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
|
|
|
|
|
| |
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When an atomic variable is not serving to synchronize threads about
the state of other (atomic or non-atomic) variables, no memory barrier
is needed with the atomic operation. However, the default memory
order for an atomic operation is memory_order_seq_cst, which always
causes a system-wide locking of the memory bus and prevents both the
CPU and the compiler from reordering memory accesses accross the
atomic operation. This can add considerable stalls as each atomic
operation (regardless of memory order) always includes a memory
access.
In most cases we can let the compiler reorder memory accesses to
minimize the time we spend waiting for the completion of the atomic
memory accesses by using the relaxed memory order. This patch adds
helpers to make such accesses a little easier on the eye (and the
fingers :-), but does not try to hide them completely.
Following patches make use of these and remove all the (implied)
memory_order_seq_cst use from the OVS code base.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
|
|
|
|
| |
ovs_refcount_unref() needs to syncronize with the other instances of
itself rather than with ovs_refcount_ref().
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
XenServer runs OVS in dom0, which is a 32-bit VM. As the build
environment lacks support for atomics, locked pthread atomics were
used with considerable performance hit.
This patch adds native support for ovs-atomic with 32-bit Pentium and
higher CPUs, when compiled with an older GCC. We use inline asm with
the cmpxchg8b instruction, which was a new instruction to Intel
Pentium processors. We do not expect anyone to run OVS on 486 or older
processor.
cmap benchmark before the patch on 32-bit XenServer build (uses
ovs-atomic-pthread):
$ tests/ovstest test-cmap benchmark 2000000 8 0.1
Benchmarking with n=2000000, 8 threads, 0.10% mutations:
cmap insert: 8835 ms
cmap iterate: 379 ms
cmap search: 6242 ms
cmap destroy: 1145 ms
After:
$ tests/ovstest test-cmap benchmark 2000000 8 0.1
Benchmarking with n=2000000, 8 threads, 0.10% mutations:
cmap insert: 711 ms
cmap iterate: 68 ms
cmap search: 353 ms
cmap destroy: 209 ms
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some supported XenServer build environments lack compiler support for
atomic operations. This patch provides native support for x86_64 on
GCC, which covers possible future 64-bit builds on XenServer.
Since this implementation is faster than the existing support prior to
GCC 4.7, especially for cmap inserts, we use this with GCC < 4.7 on
x86_64.
Example numbers with "tests/test-cmap benchmark 2000000 8 0.1" on
quad-core hyperthreaded laptop, built with GCC 4.6 -O2:
Using ovs-atomic-pthreads on x86_64:
Benchmarking with n=2000000, 8 threads, 0.10% mutations:
cmap insert: 4725 ms
cmap iterate: 329 ms
cmap search: 5945 ms
cmap destroy: 911 ms
Using ovs-atomic-gcc4+ on x86_64:
Benchmarking with n=2000000, 8 threads, 0.10% mutations:
cmap insert: 845 ms
cmap iterate: 58 ms
cmap search: 308 ms
cmap destroy: 295 ms
With the native support provided by this patch:
Benchmarking with n=2000000, 8 threads, 0.10% mutations:
cmap insert: 530 ms
cmap iterate: 59 ms
cmap search: 305 ms
cmap destroy: 232 ms
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Compiler implementations may provide sub-optimal support for a
memory_order passed in as a run-time value (ref.
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html).
Document that OVS atomics require the memory order to be passed in as
a compile-time constant.
It should be noted, however, that when inlining is disabled (i.e.,
compiling without optimization) even compile-time constants may be
passed as run-time values to (non-inlined) functions.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The definition of memory_order_relaxed included a compiler barrier,
while it is not necessary, and indeed the following text on
atomic_thread_fence and atomic_signal_fence contradicted that.
memory_order_consume and memory_order_acq_rel are also more thoroughly
described.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a reference counted object is also RCU protected the deletion of
the object's memory is always postponed. This allows
memory_order_relaxed to be used also for unreferencing, as RCU
quiescing provides a full memory barrier (it has to, or otherwise
there could be lingering accesses to objects after they are recycled).
Also, when access to the reference counted object is protected via a
mutex or a lock, the locking primitives provide the required memory
barrier functionality.
Also, add ovs_refcount_try_ref_rcu(), which takes a reference only if
the refcount is non-zero and returns true if a reference was taken,
false otherwise. This can be used in combined RCU/refcount scenarios
where we have an RCU protected reference to an refcounted object, but
which may be unref'ed at any time. If ovs_refcount_try_ref_rcu()
fails, the object may still be safely used until the current thread
quiesces.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
|
|
| |
Add support for atomic compare_exchange operations.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use explicit variants of atomic operations for the ovs_refcount to
avoid the overhead of the default memory_order_seq_cst.
Adding a reference requires no memory ordering, as the calling thread
is already assumed to have protected access to the object being
reference counted. Hence, memory_order_relaxed is used for
ovs_refcount_ref(). ovs_refcount_read() does not change the reference
count, so it can also use memory_order_relaxed.
Unreferencing an object needs a release barrier, so that none of the
accesses to the protected object are reordered after the atomic
decrement operation. Additionally, an explicit acquire barrier is
needed before the object is recycled, to keep the subsequent accesses
to the object's memory from being reordered before the atomic
decrement operation.
This patch follows the memory ordering and argumentation discussed
here:
http://www.chaoticmind.net/~hcb/projects/boost.atomic/doc/atomic/usage_examples.html
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
|
|
|
|
| |
OVS is slow when compiled with pthreads atomics. Add a generic note
in INSTALL, with a reference to lib/ovs-atomic.h, where a new comment
provides additional detail.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some concern has been raised by Ben Pfaff that atomic_uint64_t may not
be portable. In particular on 32bit platforms that do not have atomic
64bit integers.
Now that there are no longer any users of atomic_uint64_t remove it
entirely. Also remove atomic_int64_t which has no users.
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
|
|
|
|
| |
None of the atomic implementations need a destroy function anymore, so it's
"more standard" and more convenient for users to get rid of them.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
|
|
|
|
|
|
|
|
| |
Every implementation used this same code, so it makes sense to centralize
it.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Until now, the GCC 4+ and pthreads implementations of atomics have used
struct wrappers for their atomic types. This had the advantage of allowing
a mutex to be wrapped in, in some cases, and of better type-checking by
preventing stray uses of atomic variables other than through one of the
atomic_*() functions or macros. However, the mutex meant that an
atomic_destroy() function-like macro needed to be used. The struct wrapper
also made it impossible to define new atomic types that were compatible
with each other without using a typedef. For example, one could not simply
define a macro like
#define ATOMIC(TYPE) struct { TYPE value; }
and then have two declarations like:
ATOMIC(void *) x;
ATOMIC(void *) y;
and do anything with these objects that require type-compatibility, even
"&x == &y", because the two structs are not compatible. One can do it
through a typedef:
typedef ATOMIC(void *) atomic_voidp;
atomic_voidp x, y;
but that is inconvenient, especially because of the need to invent a name
for the type.
This commit aims to ease the problem by getting rid of the wrapper structs
in the cases where the atomic library used them. It gets rid of the
mutexes, in the cases where they are still needed, by using a global
array of mutexes instead.
This commit also defines the ATOMIC macro described above and documents
its use in ovs-atomic.h.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
|
|
|
|
|
|
|
|
|
|
|
| |
This is a thin wrapper around an atomic_uint. It is useful anyhow because
each ovs_refcount_ref() or ovs_refcount_unref() call saves a few lines of
code.
This commit also changes all the potential direct users over to use the new
data structure.
Signed-off-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
|
|
|
| |
C11 is able to require that atomics don't need to be destroyed, but some
of the OVS implementations do.
Signed-off-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Standard C11 doesn't need these functions because it is able to require
implementations not to need them. But we can't construct a portable
implementation that does not need them in every case, so this commit adds
them.
These functions are only needed for atomic_flag objects that are
dynamically allocated (because statically allocated objects can use
ATOMIC_FLAG_INIT). So far there aren't any of those, but an upcoming
commit will introduce one.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
|
|
|
|
|
|
|
|
|
| |
With this implementation I get warnings with Clang on GNU/Linux when the
previous patch is not applied. This ought to make it easier to avoid
introducing new problems in the future even without building on FreeBSD.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
|
|
|
|
|
| |
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ed Maste <emaste@freebsd.org>
|
|
|
|
|
|
|
|
|
|
| |
We found out earlier that GCC sometimes produces an error only at link time
for atomic built-ins that are not supported on a platform. This actually
tries the link at configure time and should thus reliably detect whether
the atomic built-ins are really supported.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
|
|
This library should prove useful for the threading changes coming up.
The following commit introduces one (very simple) user.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
|