author: Alan Conway <aconway@apache.org> 2014-12-19 03:18:57 +0000
committer: Alan Conway <aconway@apache.org> 2014-12-19 03:18:57 +0000
commit: 40e74eaa3f8a345e7bc888e36de79717b7c761d0
tree: 4d9a08cb40caf897b9d73c55deac60374d97eb0c /qpid/cpp/examples/messaging
parent: aa51ac52f3bd77d92acf585699bc7429666ad785
QPID-6278: HA broker abort in TXN soak test

The crash appears to be a race condition in async completion, exposed by the HA TX code as follows:

1. Message received and placed on the tx-replication queue; completion is delayed until the backups ack. The completion count goes up once for each backup, then down as each backup acks.
2. Prepare received; message placed on the primary's local persistent queue. The completion count goes up by one, then down by one for local store completion (null store in this case).

The race is something like this:

- The last backup ack arrives (on a backup IO thread) and drops the completion count to 0.
- Prepare arrives (on the client thread); the null store bumps the count to 1 and immediately drops it back to 0.
- Both threads try to invoke the completion callback; one deletes it while the other is still invoking it.

The old completion logic assumed that only one thread can see the atomic counter go to 0. It does not handle the count going to 0 in one thread while it is concurrently increased and decreased back to 0 in another. This case is introduced by HA transactions because the same message is put onto a tx-replication queue and then put again onto another persistent local queue, so there are two cycles of completion.

The new logic fixes this: only one call to the completion callback is possible in all cases.

Also fixed a missing lock in ha/Primary.cpp.

git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1646618 13f79535-47bb-0310-9956-ffa450edef68
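
The "at most one callback" guarantee described above can be illustrated with a minimal sketch. This is not the code from this commit: the class OnceCompletion, its begin()/end() methods, and the helper countCallbacks() are hypothetical names chosen for illustration. The sketch also swaps the atomic counter for a mutex-protected count plus a "fired" flag, which is one simple way to make sure that a second cycle of the count returning to 0 cannot invoke the callback again.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical illustration only; not Qpid's actual completion class.
class OnceCompletion {
  public:
    explicit OnceCompletion(std::function<void()> cb) : callback(std::move(cb)) {}

    // One more outstanding operation (e.g. a backup ack or a store write).
    void begin() {
        std::lock_guard<std::mutex> l(lock);
        ++pending;
    }

    // One operation finished. The callback runs at most once, even if the
    // count reaches 0 several times or in several threads.
    void end() {
        std::function<void()> toRun;
        {
            std::lock_guard<std::mutex> l(lock);
            assert(pending > 0);
            if (--pending == 0 && !fired) {
                fired = true;                  // later zero crossings are ignored
                toRun = std::move(callback);
            }
        }
        if (toRun) toRun();                    // invoked outside the lock
    }

  private:
    std::mutex lock;
    int pending = 0;
    bool fired = false;
    std::function<void()> callback;
};

// Run begin()/end() cycles from several threads and return how many times
// the callback was invoked.
int countCallbacks(int threads, int cycles) {
    std::atomic<int> calls(0);
    OnceCompletion c([&calls] { ++calls; });
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&c, cycles] {
            for (int i = 0; i < cycles; ++i) { c.begin(); c.end(); }
        });
    for (auto& th : pool) th.join();
    return calls.load();
}
```

Calling countCallbacks(4, 1000) drives many overlapping up/down cycles, similar in spirit to the backup-ack and null-store cycles in the race above, and the result stays at 1. Taking the callback out under the lock and invoking it afterwards means no thread can delete or re-enter it while another thread is still using it.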
Diffstat (limited to 'qpid/cpp/examples/messaging')
0 files changed, 0 insertions, 0 deletions