summaryrefslogtreecommitdiff
path: root/qpid/cpp/src/qpid/ha/README.h
blob: 9e8eda7e0d717a75dbad589e4c7d67e079ef78fc (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
// This header file provides an overview of the ha module for doxygen.

namespace qpid {
namespace ha { // Auto-links for names in the ha namespace.

/** \defgroup ha-module
    \brief High Availability Module

This is a brief overview of how HA replication works, intended as a starting
point to understand the code. You should _first_ read the HA chapter of the "C++
broker book" at http://qpid.apache.org/documentation.html for a general
understanding of the active/passive, hot-standby strategy.

Class names without a prefix are in the qpid::ha namespace. See the
documentation of individual classes for more detail.

## Overview of HA classes. ##

HaBroker is the main container for HA code, it holds a Role which can be Primary
or Backup. It is responsible for implementing QMF management functions.

Backup implements the back-up Role. It holds a BrokerReplicator which subscribes
to management events on the primary e.g. create/delete queue/exchange/binding.
It uses a ReplicationTest to check if an object is configured for replication.
For replicated objects it creates corresponding broker::Queue, broker::Exchange
etc. on the local broker.

For replicated queues the BrokerReplicator also creates a QueueReplicator.  The
QueueReplicator subscribes to the primary queue with arguments to tell the
primary to use a ReplicatingSubscription.  See "Queue Replication" below for
more on the ReplicatingSubscription/QueueReplicator interaction.

Primary implements the primary Role. For each replicated queue it creates an
IdSetter to set replication IDs on messages delivered to the queue. Each backup
that connects is tracked by a RemoteBackup.

For each (queue,backup) pair, the Primary creates a ReplicatingSubscription and
a QueueGuard. The QueueGuard delays completion of messages delivered to the
queue until they are replicated to and acknowledged by the backup.

When the primary fails, one of the backups is promoted by an external cluster
resource manager. The other backups fail-over to the new primary. See "Queue
Fail Over" below.

## Locating the Primary ##

Clients connect to the cluster via one of:
- a virtual IP address that is assigned to the primary broker
- a list of addresses for all brokers. 

If the connection fails the client re-tries its address or list of addresses
till it re-connects.  Backup brokers have a ConnectionObserver that rejects
client connections by aborting the connection, causing the client to re-try.

## Message Identifiers ##

Replication IDs are sequence numbers assigned to a message *before* it is
enqueued. We use the IDs to:
- identify messages to dequeue on a backup.
- remove extra messages from backup on failover.
- avoid downloading messages already on backup on failover.

Aside: Originally the queue position number was used as the ID, but that was
insufficient for two reasons:
-  We sometimes need to identify messages that are not yet enqueued, for example messages in an open transaction.
- We don't want to require maintaining identical message sequences on every broker e.g. so transactions can be committed independently by each broker.

## At-least-once and delayed completion. ##

We guarantee no message loss, aka at-least-once delivery. Any message received
_and acknowledged_ by the primary will not be lost. Clients must keep sent
messages until they are acknowledged. In a failure the client re-connects and
re-sends all unacknowledged (or "in-doubt") messages. 

This ensures no messages are lost, but it is possible for in-doubt messages to
be duplicated if the broker did receive them but failed before sending the
acknowledgment. It is up to the application to handle duplicates correctly.

We implement the broker's side of the bargain using _delayed completion_.  When
the primary receives a message it does not complete (acknowledge) the message
until the message is replicated to and acknowledged by all the backups.  In the
case of persistent messages, that includes writing the message to persistent
store.

## Queue Replication ##

Life-cycle of a message on a replicated queue, on the primary:

Record: Received for primary queue.
- IdSetter sets a replication ID.

Enqueue: Published to primary queue.
- QueueGuard delays completion (increments completion count).

Deliver: Backup is ready to receive data.
- ReplicatingSubscription sends message to backup QueueReplicator

Acknowledge: Received from backup QueueReplicator.
- ReplicatingSubscription completes (decrements completion count).

Dequeue: Removed from queue
- QueueGuard completes if not already completed by ReplicatingSubscription.
- ReplicatingSubscription sends dequeued message ID to backup QueueReplicator.

There is a simple protocol between ReplicatingSubscription and QueueReplicator
(see Event sub-classes) so ReplicatingSubscription can send
- replication IDs of messages as they are replicated.
- replication IDs of messages that have been dequeued on the primary.

## Queue Failover ##

On failover, for each queue, the backup gets the QueueSnapshot of messages on
the queue and sends it with the subscription to the primary queue. The primary
responds with its own snapshot.

Both parties use this information to determine:

- Messages that are on the backup and don't need to be replicated again.
- Messages that are dequeued on the primary and can be discarded by the backup.
- Queue position for the QueueGuard to start delaying completion.
- Queue position for the ReplicatingSubscription to start replicating from.

After that handshake queue replication as described above begins.

## Standalone replication ##

The HA module supports ad-hoc replication between standalone brokers that are
not in a cluster, using the same basic queue replication techniques.

## Queue and Backup "ready" ##

A QueueGuard is set on the queue when a backup first subscribes or when a backup
fails over to a new primary. Messages before the _first guarded position_ cannot
be delayed because they may have already been acknowledged to clients.

A backup sends a set of acknowledged message ids when subscribing, messages that
are already on the backup and therefore safe.

A ReplicatingSubscription is _ready_ when all messages are safe or delayed.  We
know this is the case when both the following conditions hold:

- The ReplicatingSubscription has reached the position preceding the first guarded position AND
- All messages prior to the first guarded position are safe.


*/
/** \namespace qpid::ha \brief High Availability \ingroup ha-module */
}} // namespace qpid::ha