path: root/qpid/cpp/src/qpid/ha/Primary.cpp
Commit message, Author, Age, Files, Lines
* QPID-7207: remove cpp and python subdirs from svn trunk, they have migrated ↵Robert Gemmell2016-07-051-439/+0
| | | | | | to their own git repositories git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1751566 13f79535-47bb-0310-9956-ffa450edef68
* QPID-7326: Memory bloat on HA primary brokerAlan Conway2016-06-271-7/+0
| | | | | | Removed left-over code that was keeping queues in an unused map. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1750417 13f79535-47bb-0310-9956-ffa450edef68
* QPID-5855 - Simplified HA transaction logic.Alan Conway2015-09-031-53/+1
| | | | | | | | | | | | | | | | | | | | | | | Removed complex and incorrect HA+TX logic, reverted to the following limitation: You can use transactions in a HA cluster, but there are limitations on the transactional guarantees. Transactions function normally with the *primary* broker but replication to the backups is not covered by the atomic guarantee. The following situations are all safe: - Client rolls back a transaction. - Client successfully commits a transaction. - Primary fails during a transaction *before* the client sends a commit. - Transaction contains only one message. The problem case is when all of the following occur: - transaction contains multiple actions (enqueues or dequeues) - primary fails between client sending commit and receiving commit-complete. In this case it is possible that only part of the transaction was replicated to the backups, so on fail-over partial transaction results may be visible. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1701109 13f79535-47bb-0310-9956-ffa450edef68
* QPID-6278: HA broker abort in TXN soak testAlan Conway2014-12-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The crash appears to be a race condition in async completion exposed by the HA TX code as follows: 1. Message received and placed on tx-replication queue, completion delayed till backups ack. Completion count goes up for each backup then down as each backup acks. 2. Prepare received, message placed on primary's local persistent queue. Completion count goes up one then down one for local store completion (null store in this case). The race is something like this: - last backup ack arrives (on backup IO thread) and drops completion count to 0. - prepare arrives (on client thread) null store bumps count to 1 and immediately drops to 0. - both threads try to invoke the completion callback, one deletes it while the other is still invoking. The old completion logic assumed that only one thread can see the atomic counter go to 0. It does not handle the count going to 0 in one thread and concurrently being increased and decreased back to 0 in another. This case is introduced by HA transactions because the same message is put onto a tx-replication queue and then put again onto another persistent local queue, so there are two cycles of completion. The new logic fixes this so that only one call to the completion callback is possible in all cases. Also fixed missing lock in ha/Primary.cpp. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1646618 13f79535-47bb-0310-9956-ffa450edef68
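The commit message does not show the new completion logic. The sketch below is therefore only one possible way to guarantee a single callback invocation, assuming an atomic "done" flag; `OnceCompletion` and `complete()` are hypothetical names and are not the Qpid `AsyncCompletion` API.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical type for illustration, not the Qpid AsyncCompletion class.
class OnceCompletion {
public:
    explicit OnceCompletion(std::function<void()> cb) : callback(std::move(cb)) {}

    // May be called by every thread that sees the pending count reach zero.
    // exchange() atomically sets the flag and returns its previous value, so
    // exactly one caller sees "false" and runs the callback.
    bool complete() {
        if (done.exchange(true)) return false;  // another caller already ran it
        callback();
        return true;
    }

private:
    std::function<void()> callback;
    std::atomic<bool> done{false};
};
```

With this guard, a second thread that also observes the count at zero returns without touching the callback, which avoids the "one deletes it while the other is still invoking" case described above.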
* QPID-6020: HA logging improvements - log prefix with status and ID.Alan Conway2014-08-211-2/+2
| | | | | | | Fix log prefix for RemoteBackup and PrimaryTxObserver objects. Use short UUIDs for showing UUID sets in logs. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1619581 13f79535-47bb-0310-9956-ffa450edef68
* QPID-6020: HA logging improvements - log prefix with status and ID.Alan Conway2014-08-191-23/+41
| | | | | | | | | | | | | | | | | Include broker status and ID in (almost) all logging messages. Makes it much easier to track broker state and interactions. Sundry other logging improvements including: - Demote noisy messages to trace - connections from rgmanager status checks, searching for primary. - Rationalise start-up messages. - Improved queue state detail replicating subscription and queue guard initialization. - Fail to prepare TX is error. - Collect all primary TX errors into one. - Fix status of catchup brokers in primary membership for logging. - Add process name/PID info to client connection messages. - Various minor message tweaks. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1619003 13f79535-47bb-0310-9956-ffa450edef68
* QPID-5974: HA qpid-txtest2 can bring down a cluster (JERR_MAP_LOCKED))Alan Conway2014-08-081-2/+12
| | | | | | | | | | | Problem: transactional dequeues can be sent via two paths: as part of the transaction and via the normal queue replication. If journal is involved this can result in store errors if the normal replication path attempts to dequeue before the transaction. Solution: this is also the case for enqueues, and we already have code in place to skip replication of tx enqueues via the normal route. Copied the same logic for dequeues. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1616703 13f79535-47bb-0310-9956-ffa450edef68
* QPID-5666: HA fails with resource-limit-exceeded: Exceeded replicated queue ↵Alan Conway2014-04-071-1/+1
| | | | | | | | | | | | limit This is a regression introduced in r1561206: CommitDate: Fri Jan 24 21:54:59 2014 +0000 QPID-5513: HA backup fails if number of replicated queues exceeds number of channels. Fixed by the current commit. PrimaryQueueLimits was not taking account of queues already on the broker prior to promotion. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1585507 13f79535-47bb-0310-9956-ffa450edef68
* QPID-5528: HA Clean up error messages around rolled-back transactions.Alan Conway2014-02-031-0/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A simple transaction test on a 3 node cluster generates a lot of errors and rollback messages in the broker logs even though the test code never rolls back a transaction. E.g. qpid-cluster-benchmark -b 20.0.20.200 -n1 -m 1000 -q3 -s2 -r2 --send-arg=--tx --send-arg=10 --receive-arg=--tx --receive-arg=10 The errors are caused by queues being deleted while backup brokers are using them. This happens a lot in the transaction test because a transactional session must create a new transaction when the previous one closes. When the session closes the open transaction is rolled back automatically. Thus there is almost always an empty transaction that is created then immediately rolled back at the end of the session. Backup brokers may still be in the process of subscribing to the transaction's replication queue at this point, causing (harmless) errors. This commit takes the following steps to clean up the unwanted error and rollback messages: HA TX messages cleaned up: - Remove log messages about rolling back/destroying empty transactions. - Remove misleading "backup disconnected" message for cancelled transactions. - Remove spurious warning about ignored unreplicated dequeues. - Include TxReplicator destroy in QueueReplicator mutex, idempotence check before destroy. Allow HA to suppress/modify broker exception logging: - Move broker exception logging into ErrorListener - Every SessionHandler has DefaultErrorListener that does the same logging as before. - Added SessionHandlerObserver to allow plugins to change the error listener. - HA plugin set ErrorListeners to log harmless exceptions as HA debug messages. Unrelated cleanup: - Broker now logs "incoming execution exceptions" as debug messages rather than ignoring. - Exception prefixes: don't add the prefix if already present. The exception test above should now pass without errors or rollback messages in the logs.
git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1564010 13f79535-47bb-0310-9956-ffa450edef68
* NO-JIRA: Minor rationalization of log statement priorities.Alan Conway2014-01-271-7/+5
| | | | | | | | | Demote "backup of queue x connected to y" from info to debug. Tighten up redundant 'notice' messages around promotion of primary. Promote 'DTX not implemented' to warning Misc. other minor adjustments. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1561833 13f79535-47bb-0310-9956-ffa450edef68
* NO-JIRA: Minor refactor to improve code safety: calling shared_from_this on ↵Alan Conway2014-01-271-3/+2
| | | | | | | | | | | | | | creation. Previous anti-pattern: Classes need to call shared_from_this during creation, but can't call it in the ctor so had a separate initialize function that the user was required to call immediately after the constructor. Possible for user to forget. Improved pattern: Introduce public static create() functions to call constructor and initialize, make constructor and initialize private. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1561828 13f79535-47bb-0310-9956-ffa450edef68
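The improved pattern can be sketched roughly as follows. This is a minimal illustration that assumes `std::enable_shared_from_this` (the Qpid code of that period used the Boost equivalent); the `Observer` class and its members are hypothetical and do not come from the Qpid sources.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical class for illustration, not taken from the Qpid sources.
class Observer : public std::enable_shared_from_this<Observer> {
public:
    // The only public way to build an Observer: construct, then initialize.
    static std::shared_ptr<Observer> create(const std::string& name) {
        // std::make_shared cannot reach a private constructor, so use new.
        std::shared_ptr<Observer> p(new Observer(name));
        p->initialize();  // safe here: a shared_ptr already owns the object
        return p;
    }

    bool initialized() const { return !self.expired(); }
    const std::string& name() const { return name_; }

private:
    explicit Observer(const std::string& n) : name_(n) {}

    // Needs shared_from_this(), which must not be called from the constructor.
    void initialize() { self = shared_from_this(); }

    std::string name_;
    std::weak_ptr<Observer> self;  // weak, so the object does not own itself
};
```

Because the constructor and `initialize()` are private, a caller cannot obtain an `Observer` that skipped initialization.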
* QPID-5513: HA backup fails if number of replicated queues exceeds number of ↵Alan Conway2014-01-241-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | channels. The problem: - create cluster of 2 brokers. - create more than 32768 queues (exceeds number of channels on a connection) - backup exits with critical error but - client creating queues receives no error, primary continues with unreplicated queue. The solution: Primary raises an error to the client if it attempts to create queues in excess of the channel limit. The queue is not created on primary or backup, primary and backup continue as normal. In addition: raised the channel limit from 32k to 64k. There was no reason for the smaller limit. See discussion: http://qpid.2158936.n2.nabble.com/CHANNEL-MAX-and-CHANNEL-HIGH-BIT-question-tp7603121p7603138.html New unit test to reproduce the issue, must create > 64k queues. Other minor improvements: - brokertest framework doesn't override --log options in the arguments. - increased default heartbeat in test framework for tests that have busy brokers. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1561206 13f79535-47bb-0310-9956-ffa450edef68
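The limit check described above might look something like the sketch below. `QueueLimit` is a hypothetical class, not the actual `PrimaryQueueLimits` code; seeding the counter with the queues that already exist is an assumption based on the QPID-5666 entry in this log.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical class for illustration, not the Qpid PrimaryQueueLimits code.
class QueueLimit {
public:
    // 'existing' is the number of replicated queues already on the broker.
    QueueLimit(std::size_t maxQueues, std::size_t existing)
        : max_(maxQueues), count_(existing) {}

    // Reject the new queue before it is created, so neither the primary nor
    // the backup ends up with an unreplicated queue.
    void addQueue() {
        if (count_ >= max_)
            throw std::runtime_error("Exceeded replicated queue limit");
        ++count_;
    }

    void removeQueue() { if (count_ > 0) --count_; }
    std::size_t count() const { return count_; }

private:
    std::size_t max_;
    std::size_t count_;
};
```

The exception stands in for the error that the primary raises to the client; the real broker reports a resource-limit-exceeded error instead.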
* QPID-5489: Uuid code improvementsAndrew Stitcher2014-01-171-1/+1
| | | | | | | | | | | | | - Don't use uuid_compare() as it will get the wrong version of the function under FreeBSD which has a uuid library built into libc with different function signatures from libuuid but some overlapping names. - Reorganise the uuid code to limit the used external symbols to uuid_generate(), uuid_parse(), uuid_unparse() - Minimise the framing::Uuid code so that it is a simple wrapper around types::Uuid - Use uuid_generate() as the symbol to search in CMake (uuid_compare() isn't used in qpid anymore). git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1559017 13f79535-47bb-0310-9956-ffa450edef68
* QPID-5430: HA primary broker does not go active if there are no replicated ↵Alan Conway2013-12-181-0/+1
| | | | | | | | | | queues. Primary::opened was not checking if the primary was ready after a known backup reconnected, only when a replicated queue became ready. Thus if there were no replicated queues the primary never became ready. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1552025 13f79535-47bb-0310-9956-ffa450edef68
* QPID-5421: HA replication error in stand-alone replicationAlan Conway2013-12-131-12/+0
| | | | | | | | | | | | | | | | | There were replication errors because with stand-alone replication an IdSetter was not set on the original queue until queue replication was set up. Any messages on the queue *before* replication was setup had 0 replication IDs. When one of those messages was dequeued on the source queue, an incorrect message was dequeued on the replica queue. The fix is to add an IdSetter to every queue when replication is enabled. The unit test ha_tests.ReplicationTests.test_standalone_queue_replica has been updated to test for this issue. This commit also has some general tidy-up work around IdSetter and QueueSnapshot. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1550819 13f79535-47bb-0310-9956-ffa450edef68
* QPID-5404: HA broker message duplication when deleting a queue with an ↵Alan Conway2013-12-101-0/+9
| | | | | | | | | | | | | | | alt-exchange The old code ran auto-delete on the backup on disconnect. This reroutes messages onto the alt queue with incorrect replication IDs from the original queue, and then replicates duplicate rerouted messages from the primary. The solution is to process auto deletes on the new primary and let them replicate to the backups. - Move all auto-delete logic into QueueReplicator - Primary process auto-delete on QueueReplicator as part of promotion. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1549844 13f79535-47bb-0310-9956-ffa450edef68
* QPID-5139: HA transactions block a thread, can deadlock the brokerAlan Conway2013-10-291-13/+17
| | | | | | | | | | | | | | | | | | | | | PrimaryTxObserver::prepare used to block pending responses from each backup. With concurrent transactions this can deadlock the broker: once all worker threads are blocked in prepare, responses from backups cannot be received. This commit generalizes the async completion mechanism for messages to allow async completion of arbitrary commands. It leaves the special-case code for messages undisturbed but adds a second path (starting from SessionState::handleCommand) for async completion of other commands. In particular it implements tx.commit to allow async completion. TxBuffer is now an AsyncCompletion and commitLocal() is split into - startCommit() called by SemanticState::commit() - endCommit() called when the commit command completes TxAccept no longer holds pre-computed ranges, compute fresh each time. - Avoid range iterators going out of date during a delayed commit. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1536754 13f79535-47bb-0310-9956-ffa450edef68
* QPID-5139: Make TxBuffer inherit from AsyncCompletion.Alan Conway2013-10-291-4/+6
| | | | | | Switched from shared_ptr to intrusive_ptr for TxBuffer and DtxBuffer. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1536752 13f79535-47bb-0310-9956-ffa450edef68
* NO-JIRA: HA Primary should not log messages for unreplicated queues and ↵Alan Conway2013-09-181-13/+17
| | | | | | exchanges. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1524570 13f79535-47bb-0310-9956-ffa450edef68
* QPID-4327: HA clean up transaction artifacts at end of TX.Alan Conway2013-08-301-5/+16
| | | | | | | | | | - Backups delete transactions on failover. - TxReplicator cancel subscriptions when transaction is finished. - TxReplicator rollback if destroyed prematurely. - Handle special case of no backups for a tx. - ha_tests.py: new and modified tests to cover the new functionality. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1518982 13f79535-47bb-0310-9956-ffa450edef68
* QPID-4327: HA Handle brokers joining and leaving during a transaction.Alan Conway2013-08-051-2/+1
| | | | | | | | | | | During a transaction: - A broker leaving aborts the transaction. - A broker joining does not participate in the transaction - but does receive the results of the TX via normal replication. Clean up tx-queues when the transaction completes. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1510678 13f79535-47bb-0310-9956-ffa450edef68
* NO-JIRA: Remove use of boost::make_shared, not available on some older versions.Alan Conway2013-08-051-2/+0
| | | | git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1510596 13f79535-47bb-0310-9956-ffa450edef68
* QPID-4327: HA logging fixes.Alan Conway2013-08-011-2/+2
| | | | | | | - Removed "FIXME" log statements inadvertently left in code. - Changed some trace statements to debug to facilitate debugging. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1509428 13f79535-47bb-0310-9956-ffa450edef68
* QPID-4327: HA TX transactions, blocking wait for prepareAlan Conway2013-08-011-2/+6
| | | | | | | | | | | Backups send prepare messages to primary, primary delays completion of prepare till all are prepared (or there is a failure). This is NOT the production solution - blocking could cause a deadlock. We need to introduce asynchronous completion of prepare without blocking. This interim solution allows testing on other aspects of TX support. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1509424 13f79535-47bb-0310-9956-ffa450edef68
* QPID-4327: HA TX transactions: basic replication.Alan Conway2013-08-011-2/+39
| | | | | | | | | | | | | | | On primary a PrimaryTxObserver observes a transaction's TxBuffer and generates transaction events on a tx-replication-queue. On the backup a TxReplicator receives the events and constructs a TxBuffer equivalent to the one in the primary. Unfinished: - Primary does not wait for backups to prepare() before committing. - All connected backups are assumed to be in the transaction, there are race conditions around brokers joining/leaving where this assumption is invalid. - Need more tests. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1509423 13f79535-47bb-0310-9956-ffa450edef68
* QPID-4327: HA get rid of Primary::get() singleton.Alan Conway2013-08-011-4/+0
| | | | git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1509422 13f79535-47bb-0310-9956-ffa450edef68
* QPID-4327: Renamed ConfigurationObserver as BrokerObserver.Alan Conway2013-08-011-7/+7
| | | | | | | | This class really was intended as an observer for broker-level events which includes configuration but may in future include other non-configuration events such as transactions. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1509420 13f79535-47bb-0310-9956-ffa450edef68
* QPID-4944: HA Sporadic failure - logging improvements used to investigate.Alan Conway2013-07-041-1/+1
| | | | git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1499788 13f79535-47bb-0310-9956-ffa450edef68
* QPID-4348: HA Use independent sequence numbers for identifying messagesAlan Conway2013-06-171-70/+133
| | | | | | | | | | | | | | | Previously HA code used queue sequence numbers to identify messages. This assumes that message sequence is identical on primary and backup. Implementing new features (for example transactions) requires that we tolerate ordering differences between primary and backups. This patch introduces a new, queue-scoped HA sequence number managed by the HA plugin. The HA ID is set *before* the message is enqueued and assigned a queue sequence number. This means it is possible to identify messages before they are enqueued, e.g. messages in an open transaction. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1493771 13f79535-47bb-0310-9956-ffa450edef68
* QPID-4748: Consistent handling of durations in broker configuration, ↵Alan Conway2013-04-191-2/+1
| | | | | | | | | | | | | | | | | | | | allowing sub-second intervals. Provides string conversion for sys::Duration, allowing intervals to be expressed like this: 10.5 - value in seconds, backward compatible. 10.5s - value in seconds 10.5ms - value in milliseconds 10.5us - value in microseconds 10.5ns - value in nanoseconds Converted the following broker options to Duration: mgmtPubInterval, queueCleanInterval, linkMaintenanceInterval, linkHeartbeatInterval Did not convert: maxNegotiateTime. This is expressed in milliseconds so it would not be backward compatible to make it a Duration. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1469661 13f79535-47bb-0310-9956-ffa450edef68
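The accepted formats can be illustrated with a small parser. This is a sketch only: `parseDurationNs` is a hypothetical helper and not the `sys::Duration` conversion itself, and returning nanoseconds as a 64-bit integer is an assumption made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper for illustration, not the sys::Duration implementation.
// Parses "10.5", "10.5s", "10.5ms", "10.5us" or "10.5ns" into nanoseconds.
inline std::int64_t parseDurationNs(const std::string& text) {
    const char* begin = text.c_str();
    char* end = nullptr;
    const double value = std::strtod(begin, &end);  // numeric prefix
    if (end == begin) throw std::invalid_argument("no number in: " + text);

    const std::string unit(end);  // whatever follows the number
    double scale = 0.0;
    if (unit.empty() || unit == "s") scale = 1e9;  // bare number means seconds
    else if (unit == "ms") scale = 1e6;
    else if (unit == "us") scale = 1e3;
    else if (unit == "ns") scale = 1.0;
    else throw std::invalid_argument("unknown unit in: " + text);

    return static_cast<std::int64_t>(std::llround(value * scale));
}
```

A bare number is treated as seconds, which is what keeps existing configuration values backward compatible.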
* QPID-4555: HA Primary sets explicit qpid.replicate in Queue and Exchange ↵Alan Conway2013-02-071-15/+25
| | | | | | | | | | | | | | | arguments. Previously both Primary and Backup would calculate the qpid.replicate value independently, assuming the result would be the same. In the case of exclusive queues, the exclusivity can change over time so it's possible that primary and backup won't agree. Now only Primary does the calculation with exclusive, auto-delete etc. and puts an explicit qpid.replicate in the queue or event arguments. Backup uses the value set by primary. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1443678 13f79535-47bb-0310-9956-ffa450edef68
* QPID-4555: HA Check for backup ready when new backup joins.Alan Conway2013-02-071-3/+6
| | | | | | | | | This test was missing so if there were no backed-up queues the backup would never be marked ready. It was working because of a separate bug: auto-delete/exclusive queues were being replicated incorrectly so there were always replicated queues (temp queues created by qpid-ha) git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1443677 13f79535-47bb-0310-9956-ffa450edef68
* NO-JIRA: HA refactor, re-organise code for clarity and thread safety.Alan Conway2013-01-231-6/+20
| | | | | | | | | | | | | | | | Introduce Role base class. Primary and Backup are now subclasses of Role. Moved backup/primary specific code from HaBroker to the Backup and Primary roles. HaBroker always holds a single Role, via a thread-safe RoleHolder. RoleHolder ensures atomic transition between roles: the old role is deleted before the new role is created. Membership is now independently thread safe, breaking the potential deadlock between HaBroker and the Roles. Logging improvements and other minor cleanup. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1437771 13f79535-47bb-0310-9956-ffa450edef68
* QPID-4516: Sporadic failure in ha_tests test_failover_send_receiveAlan Conway2013-01-111-0/+4
| | | | | | | | | | | | | | Several fixes were required in the code to correct this problem: - Missing break statement in switch. - Remove unused function HaBroker::resetMembership - Abort connection of timed-out backups so they can attempt to reconnect. - New primary resets membership before allowing backups to connect. - Test for and ignore double-promotion. - HaBroker: dynamic logPrefix() shows status. Made status atomic for efficient access for log messages. - Update primary status in membership. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1432273 13f79535-47bb-0310-9956-ffa450edef68
* QPID-4516: Sporadic failure in ha_tests test_failover_send_receiveAlan Conway2012-12-191-6/+3
| | | | | | | | | | | | | | | | Sporadic failures in ha_tests.py test_failover_send_receive. Two types of failure observed: - core dumps in a debug build at a C++ assertion - python test assertion like: AssertionError: Broker<137:cluster0-0.log qpidd-157 :35273> expected='ready', actual='catchup' The following fixes were made to correct the problem: - Missing break statement in switch. - Remove unused function HaBroker::resetMembership - Abort connection of timed-out backups so they can attempt to reconnect. - New primary resets membership before allowing backups to connect. - Remove incorrect demotion ready->catchup on timeout. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1424169 13f79535-47bb-0310-9956-ffa450edef68
* QPID-4428: HA add UUID tag to avoid using an out of date queue/exchange.Alan Conway2012-11-141-2/+27
| | | | | | | | | | | | | | | | | Imagine a cluster with primary A and backups B and C. A queue Q is created on A and replicated to B, C. Now A dies and B takes over as primary. Before C can connect to B, a client destroys Q and creates a new queue with the same name. When C connects it sees Q and incorrectly assumes it is the same Q that it has already replicated. Now C has an inconsistent replica of Q. The fix is to tag queues/exchanges with a UUID so a backup can tell if a queue is not the same as the one it has already replicated, even if the names are the same. This all also applies to exchanges. - Minor improvements to printing UUIDs in a FieldTable. - Fix comparison of void Variants, added operator != git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1409241 13f79535-47bb-0310-9956-ffa450edef68
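The check that the UUID tag makes possible can be sketched as below. The `ObjectTag` struct and `isStale` function are hypothetical names for illustration; the commit message does not show how the tag is stored or compared in the HA plugin, and UUIDs are represented as plain strings here for brevity.

```cpp
#include <cassert>
#include <string>

// Hypothetical record for illustration: a queue or exchange name plus the
// UUID tag assigned when the object was created.
struct ObjectTag {
    std::string name;
    std::string uuid;
};

// A local replica is stale when the primary has an object with the same name
// but a different UUID, meaning the original was deleted and re-created.
inline bool isStale(const ObjectTag& local, const ObjectTag& primary) {
    return local.name == primary.name && local.uuid != primary.uuid;
}
```

In the scenario above, C's replica of Q and the re-created Q on B share a name but not a UUID, so a check like this would tell C to discard its replica and replicate the new queue from scratch.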
* QPID-4360: Fix test bug: Non-ready HA broker can be incorrectly promoted to ↵Alan Conway2012-10-091-2/+2
| | | | | | | | | | | primary. Test test_delete_missing_response was failing with "cluster active, cannot promote". - Fixed test bug: "fake" primary triggered "cannot promote". - Backup: always create QueueReplicator if not already existing. - Terminology change: "initial" queues -> "catch-up" queues. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1396244 13f79535-47bb-0310-9956-ffa450edef68
* NO-JIRA: HA minor log message improvement.Alan Conway2012-09-141-2/+2
| | | | git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1384886 13f79535-47bb-0310-9956-ffa450edef68
* QPID-4223: HA Completion isn't sent when queue that has acquired but ↵Alan Conway2012-09-141-10/+11
| | | | | | | | | | | unacknowledged messages is deleted - Extended ha_test.py test_failover_send_receive to kill backup as well as primary - QueueRegistry::destroy was not calling observer. - Primary removes disconnected brokers from backups and expectedBackups - Primary calls checkReady in all cases where broker is removed from expectedBackups git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1384882 13f79535-47bb-0310-9956-ffa450edef68
* NO-JIRA: HA improved logging messages.Alan Conway2012-09-141-1/+1
| | | | git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1384881 13f79535-47bb-0310-9956-ffa450edef68
* QPID-4178: broker refactoringGordon Sim2012-08-101-1/+1
| | | | git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1371676 13f79535-47bb-0310-9956-ffa450edef68
* NO-JIRA: HA only expect READY backups in recovery.Alan Conway2012-08-071-1/+1
| | | | | | | | | | | Don't wait for un-ready backups to become ready in recover, they weren't ready before the failure so don't wait for them to become ready after a failure. Waiting for READY backups gives us equivalent safety to before the failure. Minor test & log improvements. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1370325 13f79535-47bb-0310-9956-ffa450edef68
* QPID-4191: HA removing self address breaks if a VIP is used.Alan Conway2012-08-061-6/+6
| | | | | | | | | | | | | | | | | | | Pre this patch the HA broker removed its own address from the set of cluster addresses to form the set of failover addresses. The goal was to avoid useless self-connection attempts. However this was broken with a Virtual IP address where a single address is used for the entire cluster. The remove-self is not essential, self-connection attempts are prevented elsewhere. Backup brokers will be prevented from connecting to self by the same connection-observer as normal clients. This patch - removes the code to remove self-addresses - adds self-connection checks in ConnectionObserver - adds & reorders some log statements & comments for greater clarity. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1370002 13f79535-47bb-0310-9956-ffa450edef68
* QPID-4176: HA Error handlingAlan Conway2012-07-311-18/+24
| | | | | | | Additional error handling and logging for ConnectionObserver, Primary and ReplicatingSubscription. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1367649 13f79535-47bb-0310-9956-ffa450edef68
* QPID-4175: HA code rationalize loggingAlan Conway2012-07-301-3/+3
| | | | | | | | | | | Clean up and rationalize log messages and levels. notice: Major broker-level events: connecting, failing-over, primary active, backup ready. info: Major queue level events: subscriptions ready, replicators created etc. debug: Detailed replication events: accept/reject connections, details of queue replication protocol. trace: dumping raw QMF messages git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1367231 13f79535-47bb-0310-9956-ffa450edef68
* QPID-4159: HA missing messages in failover test.Alan Conway2012-07-261-1/+1
| | | | | | | | | | | | Fix test_failover_send_receive showing missing messages. With this fix, ran with -DDURATION=2 overnight with no failures. - Primary, RemoteBackup: Only report "ready" once per remote backup. - HaBroker: Put membership updates under mutex. - ReplicatingSubscription: Check for backup missing messages at the front. - ha_tests.py: Added assertion to test_priority_ring, verify primary queue as expected. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1366179 13f79535-47bb-0310-9956-ffa450edef68
* NO-JIRA: Fix typos, update comments, update log messages.Alan Conway2012-07-231-3/+7
| | | | git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1364806 13f79535-47bb-0310-9956-ffa450edef68
* NO-JIRA: HA Minor logging improvements.Alan Conway2012-07-181-3/+2
| | | | git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1363047 13f79535-47bb-0310-9956-ffa450edef68
* QPID-4148: HA Not setting initial queues for new RemoteBackups Alan Conway2012-07-181-5/+10
| | | | | | | | Fix bug introduced by r1362584: "QPID-4144 HA broker deadlocks on broker::QueueRegistry lock and ha::Primary lock" Stopped setting initial queues on new (i.e. not expected) RemoteBackups. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1363014 13f79535-47bb-0310-9956-ffa450edef68
* QPID-4145: HA Minor fixes to recovery Alan Conway2012-07-171-4/+17
| | | | | | | | | - Demote timed-out backups from ready to catch-up. - Don't cancel connected backups on timeout, only disconnected ones. - Don't allow promotion of a catch-up broker. - Minor logging improvement. git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1362658 13f79535-47bb-0310-9956-ffa450edef68