QPID-2920: Updates to new-cluster-plan.txt

git-svn-id: https://svn.apache.org/repos/asf/qpid/branches/qpid-2920-active@1195515 13f79535-47bb-0310-9956-ffa450edef68
author: Alan Conway <aconway@apache.org> 2011-10-31 15:22:38 +0000
committer: Alan Conway <aconway@apache.org> 2011-10-31 15:22:38 +0000
commit: ced3bf2deb520d5b55a1e491b62fcfc384a07584 (patch)
tree: 7008b9c7cf1bea17d4b13e8b81c24060e776cf3d
parent: adcaecbdb26674008dab4df11b15db5032115ce1 (diff)
download: qpid-python-ced3bf2deb520d5b55a1e491b62fcfc384a07584.tar.gz
1 files changed, 32 insertions, 11 deletions
diff --git a/qpid/cpp/design_docs/new-cluster-plan.txt b/qpid/cpp/design_docs/new-cluster-plan.txt
index a6689dc6bd..757e421f4e 100644
--- a/qpid/cpp/design_docs/new-cluster-plan.txt
+++ b/qpid/cpp/design_docs/new-cluster-plan.txt
@@ -78,7 +78,7 @@ Implements multiple CPG groups for better concurrency.
 Multicast using fixed-size (64k) buffers, allow fragmentation of messages across buffers (frame by frame)
 
 * Design Questions
-** [[Queue sequence numbers vs. independant message IDs]] 
+** [[Queue sequence numbers vs. independant message IDs]]
 
 Current prototype uses queue+sequence number to identify message. This
 is tricky for updating new members as the sequence numbers are only
@@ -94,12 +94,21 @@ Throughput worse by 30% in contented case, 10% in uncontended.
 
 * Tasks to match existing cluster
 ** TODO [#A] Review old cluster code for more tasks. 1
+** TODO [#A] Put cluster enqueue after all policy & other checks.
+   SCHEDULED: <2011-10-31 Mon>
+
+gsim points out that we do the policy check after multicasting the
+enqueue, so brokers could become inconsistent. Multicast should be after
+enqueue and any other code that may decide to send/not send the message.
+
 ** TODO [#A] Defer and async completion of wiring commands. 5
 Testing requirement: Many tests assume wiring changes are visible
-across the cluster once the command completes.
+across the cluster once the wiring command completes.
 
-Name clashes: need to avoid race if same name queue/exchange declared
-on 2 brokers simultaneously
+Name clashes: avoid race if same name queue/exchange declared on 2
+brokers simultaneously.
+
+Clashes with non-replicated: see [[Allow non-replicated]] below.
 
 ** TODO [#A] Update to new members joining. 10.
 
@@ -152,12 +161,15 @@ Status includes
 - persistent store state (clean, dirty)
 - make it extensible, so additional state can be added in new protocols
 
+Clean store if last man standing or clean shutdown.
+Need to add multicast controls for shutdown.
+
 ** TODO [#B] Persistent cluster startup. 4
 
 Based on existing code:
 - Exchange dirty/clean exchanged in initial status.
 - Only one broker recovers from store, others update.
-** TODO [#B] Replace boost::hash with our own hash function. 1  
+** TODO [#B] Replace boost::hash with our own hash function. 1
 The hash function is effectively part of the interface so
 we need to be sure it doesn't change underneath us.
 
@@ -165,13 +177,13 @@ we need to be sure it doesn't change underneath us.
 Alerts for inconsistent message loss.
 
 ** TODO [#B] Management methods that modify queues. 5
+
 Replicate management methods that modify queues - e.g. move, purge.
 Target broker may not have all messages on other brokers for purge/destroy.
-- Queue::move() - need to wait for lock? Replicate?
+- Queue::purge() - wait for lock, purge local, mcast dequeues.
+- Queue::move() - wait for lock, move msgs (mcasts enqueues), mcast dequeues.
+- Queue::destroy() - messages to alternate exchange on all brokers.
 - Queue::get() - ???
-- Queue::purge() - replicate purge? or just delete what's on broker ?
-- Queue::destroy() - messages to alternate exchange on all brokers.?
-
 Need to add callpoints & mcast messages to replicate these?
 ** TODO [#B] TX transaction support. 5
 Extend broker::Cluster interface to capture transaction context and completion.
@@ -195,6 +207,11 @@ Extend broker::Cluster interface to capture transaction context and completion.
 Running brokers exchange DTX information.
 New broker update includes DTX information.
 
+** TODO [#B] Replicate state for Fairshare?
+gsim: fairshare would need explicit code to keep it in sync across
+nodes; that may not be required however.
+** TODO [#B] Timed auto-delete queues?
+gsim: may need specific attention?
 ** TODO [#B] Async completion of accept. 4
 When this is fixed in the standalone broker, it should be fixed for cluster.
 
@@ -212,7 +229,7 @@ but fails on other(s) - e.g. due to store limits?
 - fail on non-local broker = possible duplication.
 
 We have more flexibility now, we don't *have* to crash
-- but we've lost some of our redundancy guarantee, how to inform user? 
+- but we've lost some of our redundancy guarantee, how to inform user?
 
 Options to respond to inconsistent error:
 - stop broker
@@ -242,10 +259,14 @@ Need to
 - save replicated status to stored (in arguments).
 - support in management tools.
 
+Avoid name clashes between replicated/non-replicated: multicast
+local-only names as well, all brokers keep a map and refuse to create
+clashes.
+
 ** TODO [#C] Handling immediate messages in a cluster. 2
 Include remote consumers in decision to deliver an immediate message.
 * Improvements over existing cluster
-** TODO [#C] Remove old cluster hacks and workarounds. 
+** TODO [#C] Remove old cluster hacks and workarounds.
 The old cluster has workarounds in the broker code that can be removed.
 - [ ] drop code to replicate management model.
 - [ ] drop timer workarounds for TTL, management, heartbeats.
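
Note on the "Replace boost::hash with our own hash function" task in the diff above: the plan says only that the hash is effectively part of the interface and must not change underneath the cluster. It does not say which function to use. As a minimal sketch, assuming 64-bit FNV-1a is an acceptable choice (an assumption for illustration, not something the plan states), a hash with a fixed, fully specified definition could look like this. The name stableHash is hypothetical and is not part of the qpid source tree.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper, not part of qpid. 64-bit FNV-1a is used here only
// as an example of a hash whose output is fixed by its definition, so it
// cannot change when a library such as boost is upgraded.
inline std::uint64_t stableHash(const std::string& s) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        h ^= static_cast<unsigned char>(s[i]);  // mix in one byte
        h *= 1099511628211ULL;                  // FNV 64-bit prime
    }
    return h;
}
```

Because every broker computes the same value for the same name regardless of compiler or library version, a result such as stableHash(queueName) % groupCount would pick the same group on all members. Here queueName and groupCount are placeholder names.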
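
Note on the name-clash rule added under "Allow non-replicated" in the diff above: the plan proposes that local-only names are multicast as well, that every broker keeps a map, and that brokers refuse to create clashes. A minimal sketch of such a map follows. NameRegistry, Scope and declare are hypothetical names, not qpid classes. The sketch assumes that every broker applies declarations in the same multicast delivery order, so that all brokers reach the same accept/refuse decision. It also treats only a replicated/non-replicated mismatch as a clash, which is one reading of the plan.

```cpp
#include <map>
#include <string>

// Hypothetical sketch, not qpid code.
enum Scope { REPLICATED, LOCAL_ONLY };

class NameRegistry {
  public:
    // Called for every declaration delivered by multicast, including
    // local-only ones. Returns false if the name is already known with
    // the other scope, in which case the caller should refuse to create it.
    bool declare(const std::string& name, Scope scope) {
        std::map<std::string, Scope>::iterator i = names.find(name);
        if (i == names.end()) {
            names[name] = scope;   // first declaration wins
            return true;
        }
        return i->second == scope; // same scope: no replicated/local clash
    }

    // Forget a name once the queue or exchange is deleted.
    void remove(const std::string& name) { names.erase(name); }

  private:
    std::map<std::string, Scope> names;
};
```

The separate race in which the same replicated name is declared on two brokers at once is not handled by this sketch.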