From 328786af8f28a792045ef7f7b08df7a7778d2179 Mon Sep 17 00:00:00 2001
From: Alan Conway
Date: Mon, 31 Oct 2011 11:28:33 +0000
Subject: QPID-2920: Updates to new-cluster-plan.txt and new-cluster-design.txt

git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1195415 13f79535-47bb-0310-9956-ffa450edef68
---
 qpid/cpp/design_docs/new-cluster-design.txt | 3 +
 qpid/cpp/design_docs/new-cluster-plan.txt | 152 ++++++++++++++++------
 2 files changed, 93 insertions(+), 62 deletions(-)

diff --git a/qpid/cpp/design_docs/new-cluster-design.txt b/qpid/cpp/design_docs/new-cluster-design.txt
index 936530a39a..63632f9265 100644
--- a/qpid/cpp/design_docs/new-cluster-design.txt
+++ b/qpid/cpp/design_docs/new-cluster-design.txt
@@ -305,6 +305,9 @@ To achieve this
- New versions can only add methods, existing methods cannot be changed.
- The cluster handshake for new members includes the protocol version at each member.
+- Each cpg message starts with a header: version,size. Allows new encodings later.
+- Brokers ignore messages of higher version.
+
- The cluster's version is the lowest version among its members.
- A newer broker can join an older cluster. When it does, it must restrict itself to speaking the older version protocol.
diff --git a/qpid/cpp/design_docs/new-cluster-plan.txt b/qpid/cpp/design_docs/new-cluster-plan.txt
index 626e443be7..a6689dc6bd 100644
--- a/qpid/cpp/design_docs/new-cluster-plan.txt
+++ b/qpid/cpp/design_docs/new-cluster-plan.txt
@@ -77,38 +77,31 @@ Implements multiple CPG groups for better concurrency.
CLOSED: [2011-10-05 Wed 17:22]
Multicast using fixed-size (64k) buffers, allow fragmentation of messages across buffers (frame by frame)
-* Open questions
+* Design Questions
+** [[Queue sequence numbers vs. independent message IDs]]
-** TODO [#A] Queue sequence numbers vs. independant message IDs.
- SCHEDULED: <2011-10-07 Fri>
+Current prototype uses queue+sequence number to identify message. 
This
+is tricky for updating new members as the sequence numbers are only
+known on delivery.
-Current prototype uses queue sequence numbers to identify
-message. This is tricky for updating new members as the sequence
-numbers are only known on delivery.
-
-Independent message IDs that can be generated and sent with the message simplify
-this and potentially allow performance benefits by relaxing total ordering.
-However they imply additional map lookups that might hurt performance.
+Independent message IDs that can be generated and sent as part of the
+message simplify this and potentially allow performance benefits by
+relaxing total ordering. However they require additional map lookups
+that hurt performance.
- [X] Prototype independent message IDs, check performance.
Throughput worse by 30% in contended case, 10% in uncontended.
-Sticking with queue sequence numbers.
-
-* Outstanding Tasks
-** TODO [#A] Defer and async completion of wiring commands.
+* Tasks to match existing cluster
+** TODO [#A] Review old cluster code for more tasks. 1
+** TODO [#A] Defer and async completion of wiring commands. 5
Testing requirement: Many tests assume wiring changes are visible across the cluster once the command completes.
Name clashes: need to avoid race if same name queue/exchange declared on 2 brokers simultaneously
-** TODO [#A] Passing all existing cluster tests.
-
-The new cluster should be a drop-in replacement for the old, so it
-should be able to pass all the existing tests.
-
-** TODO [#A] Update to new members joining.
+** TODO [#A] Update to new members joining. 10.
Need to resolve [[Queue sequence numbers vs. independent message IDs]] first.
- implicit sequence numbers are more tricky to replicate to new member.
@@ -145,27 +138,33 @@ Updating queue/exchange/binding objects is via the same encode/decode that is used by the store.
Updatee to use recovery interfaces to recover?
-** TODO [#A] Failover updates to client. 
2
Implement the amq.failover exchange to notify clients of membership.
+** TODO [#A] Passing all existing cluster tests. 5
-** TODO [#B] Initial status protocol.
+The new cluster should be a drop-in replacement for the old, so it
+should be able to pass all the existing tests.
+
+** TODO [#B] Initial status protocol. 3
Handshake to give status of each broker member to new members joining.
Status includes
-- persistent store state (clean, dirty)
- cluster protocol version.
+- persistent store state (clean, dirty)
+- make it extensible, so additional state can be added in new protocols
-** TODO [#B] Replace boost::hash with our own hash function.
+** TODO [#B] Persistent cluster startup. 4
+
+Based on existing code:
+- Exchange dirty/clean exchanged in initial status.
+- Only one broker recovers from store, others update.
+** TODO [#B] Replace boost::hash with our own hash function. 1
The hash function is effectively part of the interface so we need to be sure it doesn't change underneath us.
-** TODO [#B] Persistent cluster support.
-Initial status protoocl to support persistent start-up (see existing code)
-
-Only one broker recovers from store, update to others.
-
-Assign cluster IDs to messages recovered from store, don't replicate. See Queue::recover.
+** TODO [#B] Management model. 3
+Alerts for inconsistent message loss.
-** TODO [#B] Management support
+** TODO [#B] Management methods that modify queues. 5
Replicate management methods that modify queues - e.g. move, purge.
Target broker may not have all messages on other brokers for purge/destroy.
- Queue::move() - need to wait for lock? Replicate?
@@ -174,8 +173,7 @@ Target broker may not have all messages on other brokers for purge/destroy.
- Queue::destroy() - messages to alternate exchange on all brokers.?
Need to add callpoints & mcast messages to replicate these?
-
-** TODO [#B] TX transaction support.
+** TODO [#B] TX transaction support. 
5
Extend broker::Cluster interface to capture transaction context and completion.
Running brokers exchange TX information.
New broker update includes TX information.
@@ -187,54 +185,67 @@ New broker update includes TX information.
// - no transaction context associated with messages in the Cluster interface.
// - no call to Cluster::accept in Queue::dequeueCommitted
-** TODO [#B] DTX transaction support.
+Injecting holes into a queue:
+- Multicast a 'non-message' that just consumes one queue position.
+- Used to reserve a message ID (position) for a non-committed message.
+- Also could allow non-replicated messages on a replicated queue if required.
+
+** TODO [#B] DTX transaction support. 5
Extend broker::Cluster interface to capture transaction context and completion.
Running brokers exchange DTX information.
New broker update includes DTX information.
-** TODO [#B] Async completion of accept.
+** TODO [#B] Async completion of accept. 4
When this is fixed in the standalone broker, it should be fixed for cluster.
-** TODO [#B] Network partitions and quorum.
+** TODO [#B] Network partitions and quorum. 2
Re-use existing implementation.
-** TODO [#B] Review error handling, put in a consitent model.
+** TODO [#B] Review error handling, put in a consistent model. 4.
- [ ] Review all asserts, for possible throw.
- [ ] Decide on fatal vs. non-fatal errors.
-** TODO [#B] Implement inconsistent error handling policy.
-What to do if a message is enqueued sucessfully on the local broker,
-but fails on one or more backups - e.g. due to store limits?
-- we have more flexibility, we don't *have* to crash
-- but we've loste some of our redundancy guarantee, how should we inform client?
+** TODO [#B] Implement inconsistent error handling policy. 5
+What to do if a message is enqueued successfully on some broker(s),
+but fails on other(s) - e.g. due to store limits?
+- fail on local broker = possible message loss.
+- fail on non-local broker = possible duplication. 
-** TODO [#C] Allow non-replicated exchanges, queues.
+We have more flexibility now, we don't *have* to crash
+- but we've lost some of our redundancy guarantee, how to inform user?
-Set qpid.replicate=false in declare arguments, set flag on Exchange, Queue objects.
-- save replicated status to store.
-- support in management tools.
-Replicated queue: replicate all messages.
-Replicated exchange: replicate bindings to replicated queues only.
+Options to respond to inconsistent error:
+- stop broker
+- reset broker (exec a new qpidd)
+- reset queue
+- log critical
+- send management event
-Configurable default? Defaults to true.
+Most important is to inform of the risk of message loss.
+Current favourite: reset queue + log critical + management event.
+Configurable choices?
-** TODO [#C] Refactoring of common concerns.
+** TODO [#C] Allow non-replicated exchanges, queues. 5
-There are a bunch of things that act as "Queue observers" with intercept
-points in similar places.
-- QueuePolicy
-- QueuedEvents (async replication)
-- MessageStore
-- Cluster
+3 levels set in declare arguments:
+- qpid.replicate=no - nothing is replicated.
+- qpid.replicate=wiring - queues/exchanges are replicated but not messages.
+- qpid.replicate=yes - queues, exchanges and messages are replicated.
-Look for ways to capitalize on the similarity & simplify the code.
+Wiring use case: it's OK to lose some messages (up to the max depth of
+the queue) but the queue/exchange structure must be highly available
+so clients can resume communication after failover.
-In particular QueuedEvents (async replication) strongly resembles
-cluster replication, but over TCP rather than multicast.
+Configurable default? Default same as old cluster?
-** TODO [#C] Handling immediate messages in a cluster
-Include remote consumers in descision to deliver an immediate message?
-** TODO [#C] Remove old cluster hacks and workarounds
+Need to
+- save replicated status to store (in arguments). 
+- support in management tools.
+
+** TODO [#C] Handling immediate messages in a cluster. 2
+Include remote consumers in decision to deliver an immediate message.
+* Improvements over existing cluster
+** TODO [#C] Remove old cluster hacks and workarounds.
The old cluster has workarounds in the broker code that can be removed.
- [ ] drop code to replicate management model.
- [ ] drop timer workarounds for TTL, management, heartbeats.
@@ -265,6 +276,23 @@ same backward compatibility strategy as the store. This allows for adding
new elements to the end of structures but not changing or removing new elements.
+NOTE: Any change to the association of CPG group names and queues will
+break compatibility. How to work around this?
+
+** TODO [#C] Refactoring of common concerns.
+
+There are a bunch of things that act as "Queue observers" with intercept
+points in similar places.
+- QueuePolicy
+- QueuedEvents (async replication)
+- MessageStore
+- Cluster
+
+Look for ways to capitalize on the similarity & simplify the code.
+
+In particular QueuedEvents (async replication) strongly resembles
+cluster replication, but over TCP rather than multicast.
+
** TODO [#C] Support for AMQP 1.0.
* Testing
--
cgit v1.2.1