From a7c403f8f6d5622166f0c5027b0ceb972b06d6df Mon Sep 17 00:00:00 2001
From: Alan Conway
Date: Wed, 12 Oct 2011 19:55:34 +0000
Subject: QPID-2920: Update new-cluster-plan.txt and new-cluster-design.txt.

Filled out outstanding tasks in plan.
Added comments on live upgrades to design and plan.

git-svn-id: https://svn.apache.org/repos/asf/qpid/branches/qpid-2920-active@1182558 13f79535-47bb-0310-9956-ffa450edef68
---
 qpid/cpp/design_docs/new-cluster-design.txt | 32 +++++++++--
 qpid/cpp/design_docs/new-cluster-plan.txt   | 86 ++++++++++++++++++-----------
 2 files changed, 79 insertions(+), 39 deletions(-)

diff --git a/qpid/cpp/design_docs/new-cluster-design.txt b/qpid/cpp/design_docs/new-cluster-design.txt
index f2063524de..a162ea68ec 100644
--- a/qpid/cpp/design_docs/new-cluster-design.txt
+++ b/qpid/cpp/design_docs/new-cluster-design.txt
@@ -234,10 +234,10 @@ Note:
 - Updatee stalls clients until the update completes.
   (Note: May be possible to avoid updatee stall as well, needs thought)

-** Cluster API
+** Internal cluster interface

-The new cluster API is similar to the MessageStore interface, but
-provides more detail (message positions) qand some additional call
+The new cluster interface is similar to the MessageStore interface, but
+provides more detail (message positions) and some additional call
 points (e.g. acquire)

 The cluster interface captures these events:
@@ -284,13 +284,33 @@ cluster uses the same messagea allocation threading/logic as a standalone
 broker, with a little extra asynchronous book-keeping.

 If a queue has multiple consumers connected to multiple brokers, the
-new cluster time shares the queue which is less efficient than having
-all consumers connected to the same broker.
+new cluster time-shares the queue which is less efficient than having
+all consumers on a queue connected to the same broker.

 ** Flow control
 New design does not queue up CPG delivered messages, they are
 processed immediately in the CPG deliver thread. This means that CPG's
- flow control is sufficient for qpid.
+flow control is sufficient for qpid.
+
+** Live upgrades
+
+Live upgrades refers to the ability to upgrade a cluster while it is
+running, with no downtime. Each brokers in the cluster is shut down,
+and then re-started with a new version of the broker code.
+
+To achieve this
+- Cluster protocl XML file has a new element attached
+  to each method. This is the version at which the method was added.
+- New versions can only add methods, existing methods cannot be changed.
+- The cluster handshake for new members includes the protocol version
+  at each member.
+- The cluster's version is the lowest version among its members.
+- A newer broker can join and older cluster. When it does, it must restrict
+  itself to speaking the older version protocol.
+- When the cluster version increases (because the lowest version member has left)
+  the remaining members may move up to the new version.
+
+

 * Design debates
 ** Active/active vs. active passive
diff --git a/qpid/cpp/design_docs/new-cluster-plan.txt b/qpid/cpp/design_docs/new-cluster-plan.txt
index 6fb9d3fd9f..32e3f710e7 100644
--- a/qpid/cpp/design_docs/new-cluster-plan.txt
+++ b/qpid/cpp/design_docs/new-cluster-plan.txt
@@ -25,8 +25,8 @@ Meaning of priorities:
 [#C] Can be addressed in a later release.

 The existig prototype is bare bones to do performance benchmarks:
-- Implement publish and consumer locking protocol.
-- Defered delivery and asynchronous completion of message till self-delivered.
+- Implements publish and consumer locking protocol.
+- Defered delivery and asynchronous completion of message.
 - Optimize the case all consumers are on the same node.
 - No new member updates, no failover updates, no transactions, no persistence etc.

@@ -79,7 +79,6 @@ Multicast using fixed-size (64k) buffers, allow fragmetation of messages across

 * Open questions

-
 ** TODO [#A] Queue sequence numbers vs. independant message IDs.
 SCHEDULED: <2011-10-07 Fri>

@@ -94,18 +93,18 @@ However they imply additional map lookups that might hurt performance.
 - [ ] Prototype independent message IDs, check performance.

 * Outstanding Tasks
-** TODO [#A] Defer and async complete wiring commands.
+** TODO [#A] Defer and async completion of wiring commands.

 Testing requirement: Many tests assume wiring changes are visible
 across the cluster once the commad completes.

-Name clashes: avoid race if same name queue/exchange declared on 2
-brokers simultaneously
+Name clashes: need to avoid race if same name queue/exchange declared
+on 2 brokers simultaneously

-** TODO [#B] Management support
+** TODO [#A] Passing all existing cluster tests.

-- Replicate management methods that modify queues - e.g. move, purge.
-- Report connections - local only or cluster-wide?
+The new cluster should be a drop-in replacement for the old, so it
+should be able to pass all the existing tests.

 ** TODO [#A] Update to new members joining.

@@ -140,29 +139,28 @@ Exchange updatee:
 Updater remains active throughout.
 Updatee stalls clients until the update completes.

-** TODO [#B] TX transaction support.
-Extend broker::Cluster interface to capture transaction context and completion.
-Running brokers exchange TX information.
-New broker update includes TX information.
+Updating queue/exchange/binding objects is via the same encode/decode
+that is used by the store. Updatee to use recovery interfaces to
+recover?

-** TODO [#B] DTX transaction support.
-Extend broker::Cluster interface to capture transaction context and completion.
-Running brokers exchange DTX information.
-New broker update includes DTX information.
-** TODO [#B] Async completion of accept.
-When this is fixed in the standalone broker, it should be fixed for cluster.
+** TODO [#A] Failover updates to client.
+Implement the amq.failover exchange to notify clients of membership.

-** TODO [#B] Persistence support.
-InitialStatus protoocl etc. to support persistent start-up (existing code)
+** TODO [#B] Initial status protocol.
+Handshake to give status of each broker member to new members joining.
+Status includes
+- persistent store state (clean, dirty)
+- cluster protocol version.
+
+** TODO [#B] Persistent cluster support.
+Initial status protoocl to support persistent start-up (see existing code)

 Only one broker recovers from store, update to others.

 Assign cluster IDs to messages recovered from store, don't replicate. See Queue::recover.

-** TODO [#B] Handle other ways that messages can leave a queue.
-
-Ways other than a consumer that messages are taken off a queue.
-
+** TODO [#B] Management support
+Replicate management methods that modify queues - e.g. move, purge.
 Target broker may not have all messages on other brokers for purge/destroy.
 - Queue::move() - need to wait for lock? Replicate?
 - Queue::get() - ???
@@ -171,6 +169,26 @@ Target broker may not have all messages on other brokers for purge/destroy.

 Need to add callpoints & mcast messages to replicate these?

+** TODO [#B] TX transaction support.
+Extend broker::Cluster interface to capture transaction context and completion.
+Running brokers exchange TX information.
+New broker update includes TX information.
+
+    // FIXME aconway 2010-10-18: As things stand the cluster is not
+    // compatible with transactions
+    // - enqueues occur after routing is complete
+    // - no call to Cluster::enqueue, should be in Queue::process?
+    // - no transaction context associated with messages in the Cluster interface.
+    // - no call to Cluster::accept in Queue::dequeueCommitted
+
+** TODO [#B] DTX transaction support.
+Extend broker::Cluster interface to capture transaction context and completion.
+Running brokers exchange DTX information.
+New broker update includes DTX information.
+
+** TODO [#B] Async completion of accept.
+When this is fixed in the standalone broker, it should be fixed for cluster.
+
 ** TODO [#B] Network partitions and quorum.
 Re-use existing implementation.

@@ -209,25 +227,27 @@ The old cluster has workarounds in the broker code that can be removed.
 - [ ] drop security workarounds: cluster code now operates after message decoding.
 - [ ] drop connection tracking in cluster code.
 - [ ] simper inconsistent-error handling code, no need to stall.
-** TODO [#C] Support for live updates.
+** TODO [#C] Support for live upgrades.
+
 Allow brokers in a running cluster to be replaced one-by-one with a new version.

 The old cluster protocol was unstable because any changes in broker
 state caused changes to the cluster protocol.The new design should be
 much more stable.

-TODO: think about strategies for allowing live updates while extending
-the cluster protocol
-
-
-
-
+Points to implement:
+- Brokers should ignore unknown controls (with a warning) rather than an error.
+- Limit logging frequency for unknown control warnings.
+- Add a version number at front of every CPG message. Determines how the
+  rest of the message is decoded. (allows for entirely new encodings e.g. AMQP 1.0)
+- Protocol version XML element in cluster.xml, on each control.
+- Initial status protocol to include protocol version number.

 ** TODO [#C] Support for AMQP 1.0.

 * Testing
 ** TODO [#A] Pass all existing cluster tests.
-Requires [[Defer and async complete wiring commands.]]
+Requires [[Defer and async completion of wiring commands.]]
 ** TODO [#A] New cluster tests.
 Stress tests & performance benchmarks focused on changes in new cluster:
 - concurrency by queues rather than connections.
-- 
cgit v1.2.1
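
Reviewer note (not part of the patch): the live-upgrade rules added to new-cluster-design.txt can be read as a small state model. The sketch below is only an illustration under stated assumptions. The class `Cluster` and its methods `join`, `leave`, `version` and `can_send` are hypothetical names invented here; they do not exist in the qpid code base, and the real handshake is not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Cluster:
    """Hypothetical model of the version rule; not the qpid implementation."""

    # broker name -> protocol version reported in the join handshake
    members: Dict[str, int] = field(default_factory=dict)

    def join(self, name: str, version: int) -> None:
        # The handshake for a new member includes its protocol version.
        self.members[name] = version

    def leave(self, name: str) -> None:
        self.members.pop(name, None)

    @property
    def version(self) -> int:
        # The cluster's version is the lowest version among its members.
        if not self.members:
            raise ValueError("an empty cluster has no version")
        return min(self.members.values())

    def can_send(self, method_added_in: int) -> bool:
        # Each method is tagged with the version at which it was added.
        # A member may only use methods that every other member understands.
        return method_added_in <= self.version


cluster = Cluster()
cluster.join("old-broker", 1)
cluster.join("new-broker", 2)            # newer broker joins an older cluster
restricted = cluster.can_send(2)         # False while old-broker is a member
cluster.leave("old-broker")              # lowest-version member leaves
upgraded = cluster.can_send(2)           # True once the cluster version rises
```

Because new versions may only add methods, a single integer comparison per method is enough in this model; changing an existing method would break that property.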