<?xml version="1.0" encoding="utf-8"?>
<!--

Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.

-->

<section id="chap-Messaging_User_Guide-Active_Passive_Cluster">

  <title>Active-passive Messaging Clusters (Preview)</title>

  <section>
    <title>Overview</title>
    <para>
      This release provides a preview of a new module for High Availability (HA). The new
      module is not yet complete or ready for production use; it is being made available so
      that users can experiment with the new approach and provide feedback early in the
      development process.  Feedback should go to <ulink
      url="mailto:user@qpid.apache.org">user@qpid.apache.org</ulink>.
    </para>
    <para>
      The old cluster module takes an <firstterm>active-active</firstterm> approach,
      i.e. all the brokers in a cluster are able to handle client requests
      simultaneously. The new HA module takes an <firstterm>active-passive</firstterm>,
      <firstterm>hot-standby</firstterm> approach.
    </para>
    <para>
      In an active-passive cluster, only one broker, known as the
      <firstterm>primary</firstterm>, is active and serving clients at a time. The other
      brokers are standing by as <firstterm>backups</firstterm>. Changes on the primary
      are immediately replicated to all the backups so they are always up-to-date or
      "hot".  If the primary fails, one of the backups is promoted to be the new
      primary. Clients fail over to the new primary automatically. If there are multiple
      backups, the backups also fail over to become backups of the new primary.
    </para>
    <para>
      The new approach depends on an external <firstterm>cluster resource
      manager</firstterm> to detect failure of the primary and choose the new primary. The
      first supported resource manager will be <ulink
      url="https://fedorahosted.org/cluster/wiki/RGManager">rgmanager</ulink>, but it will
      be possible to add integration with other resource managers in the future. The
      preview version is not integrated with any resource manager; you can use the
      <command>qpid-ha</command> tool to simulate the actions of a resource manager or do
      your own integration.
    </para>
    <section>
      <title>Why the new approach?</title>
      The new active-passive approach has several advantages compared to the
      existing active-active cluster module.
      <itemizedlist>
	<listitem>
	  It does not depend directly on openais or corosync. It does not use multicast,
	  which simplifies deployment.
	</listitem>
	<listitem>
	  It is more portable: in environments that don't support corosync, it can be
	  integrated with a resource manager available in that environment.
	</listitem>
	<listitem>
	  Replication to a <firstterm>disaster recovery</firstterm> site can be handled as
	  simply another node in the cluster; it does not require a separate replication
	  mechanism.
	</listitem>
	<listitem>
	  It can take advantage of features provided by the resource manager, for example
	  virtual IP addresses.
	</listitem>
	<listitem>
	  Improved performance and scalability due to better use of multiple CPUs.
	</listitem>
      </itemizedlist>
    </section>
    <section>

      <title>Limitations</title>

      <para>
	There are a number of known limitations in the current preview implementation. These
	will be fixed in the production version.
      </para>

      <itemizedlist>
	<listitem>
	  Transactional changes to queue state are not replicated atomically. If the
	  primary crashes during a transaction, it is possible that the backup could
	  contain only part of the changes introduced by a transaction.
	</listitem>
	<listitem>
	  During a fail-over one backup is promoted to primary and any other backups switch to
	  the new primary. Messages sent to the new primary before all the backups have
	  switched could be lost if the new primary itself fails before the switch is
	  complete.
	</listitem>
	<listitem>
	  When used with a persistent store: if the entire cluster fails, there are no tools
	  to help identify the most recent store.
	</listitem>
	<listitem>
	  Acknowledgments are confirmed to clients before the message has been dequeued
	  from replicas or indeed from the local store if that is asynchronous.
	</listitem>
	<listitem>
	  A persistent broker must have its store erased before joining an existing cluster.
	  In the production version a persistent broker will be able to load its store and
	  avoid downloading messages that are in the store from the primary.
	</listitem>
	<listitem>
	  Configuration changes (creating or deleting queues, exchanges and bindings) are
	  replicated asynchronously. Management tools used to make changes will consider the
	  change complete when it is complete on the primary; it may not yet be replicated
	  to all the backups.
	</listitem>
	<listitem>
	  Deletions made immediately after a failure (before all the backups are ready) may
	  be lost on a backup. Queues, exchanges or bindings that were deleted on the primary could
	  re-appear if that backup is promoted to primary on a subsequent failure.
	</listitem>
	<listitem>
	  Better control is needed over which queues/exchanges are replicated and which are not.
	</listitem>
	<listitem>
	  There are some known issues affecting performance, both the throughput of
	  replication and the time taken for backups to fail-over. Performance will improve
	  in the production version.
	</listitem>
	<listitem>
	  Federated links from the primary will be lost in fail-over; they will not be
	  re-connected on the new primary. Federation links to the primary can fail over.
	</listitem>
	<listitem>
	  Only plain FIFO queues can be replicated. LVQ and ring queues are not yet supported.
	</listitem>
      </itemizedlist>
    </section>
  </section>


  <section>
    <title>Configuring the Brokers</title>
    <para>
      The broker must load the <filename>ha</filename> module; it is loaded by default
      when you start a broker. The following broker options are available for the HA module.
    </para>
    <table frame="all" id="ha-broker-options">
      <title>Options for High Availability Messaging Cluster</title>
      <tgroup align="left" cols="2" colsep="1" rowsep="1">
	<colspec colname="c1" colwidth="1*"/>
	<colspec colname="c2" colwidth="4*"/>
	<thead>
	  <row>
	    <entry align="center" nameend="c2" namest="c1">
	      Options for High Availability Messaging Cluster
	    </entry>
	  </row>
	</thead>
	<tbody>
	  <row>
	    <entry>
	      <command>--ha-cluster <replaceable>yes|no</replaceable></command>
	    </entry>
	    <entry>
	      Set to "yes" to have the broker join a cluster.
	    </entry>
	  </row>
	  <row>
	    <entry>
	      <command>--ha-brokers <replaceable>URL</replaceable></command>
	    </entry>
	    <entry>
	      URL used by brokers to connect to each other. The URL lists the addresses of
	      all the brokers in the cluster
	      <footnote>
		<para>
		  If the resource manager supports virtual IP addresses then the URL can
		  contain just the single virtual IP.
		</para>
	      </footnote>
	      in the following form:
	      <programlisting>
		url = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)*
		addr = tcp_addr / rdma_addr / ssl_addr / ...
		tcp_addr = ["tcp:"] host [":" port]
		rdma_addr = "rdma:" host [":" port]
		ssl_addr = "ssl:" host [":" port]'
	      </programlisting>
	    </entry>
	  </row>
	  <row>
	    <entry> <command>--ha-public-brokers <replaceable>URL</replaceable></command> </entry>
	    <entry>
	      URL used by clients to connect to the brokers in the same format as
	      <command>--ha-brokers</command> above. Use this option if you want client
	      traffic on a different network from broker replication traffic. If this
	      option is not set, clients will use the same URL as brokers.
	    </entry>
	  </row>
	  <row>
	    <entry>
	      <para><command>--ha-username <replaceable>USER</replaceable></command></para>
	      <para><command>--ha-password <replaceable>PASS</replaceable></command></para>
	      <para><command>--ha-mechanism <replaceable>MECH</replaceable></command></para>
	    </entry>
	    <entry>
	      Brokers use <replaceable>USER</replaceable>,
	      <replaceable>PASS</replaceable>, <replaceable>MECH</replaceable> to
	      authenticate when connecting to each other.
	    </entry>
	  </row>
	</tbody>
      </tgroup>
    </table>
    <para>
      To configure a cluster you must set at least <command>ha-cluster</command> and <command>ha-brokers</command>.
    </para>
  </section>


  <section>
    <title>Creating replicated queues and exchanges</title>
    <para>
      To create a replicated queue or exchange, pass the argument
      <command>qpid.replicate</command> when creating the queue or exchange. It should
      have one of the following three values:
      <itemizedlist>
	<listitem>
	  <firstterm>all</firstterm>: Replicate the queue or exchange, messages and bindings.
	</listitem>
	<listitem>
	  <firstterm>configuration</firstterm>: Replicate the existence of the queue or
	  exchange and bindings but don't replicate messages.
	</listitem>
	<listitem>
	  <firstterm>none</firstterm>: Don't replicate, this is the default.
	</listitem>
      </itemizedlist>
    </para>
    Bindings are automatically replicated if the queue and exchange being bound both have
    a replication argument of <command>all</command> or <command>configuration</command>;
    they are not replicated otherwise.

    You can create replicated queues and exchanges with the <command>qpid-config</command>
    management tool like this:
    <programlisting>
      qpid-config add queue myqueue --replicate all
    </programlisting>

    To create replicated queues and exchanges via the client API, add a <command>node</command> entry to the address like this:
    <programlisting>
      "myqueue;{create:always,node:{x-declare:{arguments:{'qpid.replicate':all}}}}"
    </programlisting>
  </section>



  <section>
    <title>Client Fail-over</title>
    <para>
      Clients can only connect to the single primary broker. All other brokers in the
      cluster are backups, and they automatically reject any attempt by a client to
      connect.
    </para>
    <para>
      Clients are configured with the addresses of all of the brokers in the cluster.
      <footnote>
	<para>
	  If the resource manager supports virtual IP addresses then the clients
	  can be configured with a single virtual IP address.
	</para>
      </footnote>
      When the client tries to connect initially, it will try all of its addresses until it
      successfully connects to the primary. If the primary fails, clients will try to
      re-connect to all the known brokers until they find the new primary.
    </para>
    <para>
      Suppose your cluster has 3 nodes: <command>node1</command>, <command>node2</command> and <command>node3</command> all using the default AMQP port.
    </para>
    <para>
      With the C++ client, you specify all the cluster addresses in a single URL, for example:
      <programlisting>
	qpid::messaging::Connection c("node1,node2,node3");
      </programlisting>
    </para>
    <para>
      With the python client, you specify <command>reconnect=True</command> and a list of <replaceable>host:port</replaceable> addresses as <command>reconnect_urls</command> when calling <command>establish</command> or <command>open</command>:
      <programlisting>
	connection = qpid.messaging.Connection.establish("node1", reconnect=True, reconnect_urls=["node1", "node2", "node3"])
      </programlisting>
    </para>
  </section>

  <section>
    <title>Broker fail-over</title>
    <para>
      Broker fail-over is managed by a <firstterm>cluster resource
      manager</firstterm>. The initial preview version of HA is not integrated with a
      resource manager; the production version will be integrated with <ulink
      url="https://fedorahosted.org/cluster/wiki/RGManager">rgmanager</ulink> and it may
      be integrated with other resource managers in the future.
    </para>
    <para>
      The resource manager is responsible for ensuring that there is exactly one broker
      acting as primary at all times. It selects the initial primary broker when the
      cluster is started, detects failure of the primary, and chooses the backup to
      promote as the new primary.
    </para>
    <para>
      You can simulate the actions of a resource manager, or indeed do your own
      integration with a resource manager using the <command>qpid-ha</command> tool.  The
      command
      <programlisting>
	qpid-ha promote -b <replaceable>host</replaceable>:<replaceable>port</replaceable>
      </programlisting>
      will promote the broker listening on
      <replaceable>host</replaceable>:<replaceable>port</replaceable> to be the primary.
      You should only promote a broker to primary when there is no other primary in the
      cluster. The brokers will not detect multiple primaries; they rely on the resource
      manager to do that.
    </para>
    <para>
      A clustered broker always starts initially in <firstterm>discovery</firstterm>
      mode. It uses the addresses configured in the <command>ha-brokers</command>
      configuration option and tries to connect to each in turn until it finds the
      primary. The resource manager is responsible for choosing one of the backups to
      promote as the initial primary.
    </para>
    <para>
      If the primary fails, all the backups are disconnected and return to discovery mode.
      The resource manager chooses one to promote as the new primary. The other backups
      will eventually discover the new primary and reconnect.
    </para>
  </section>
  <section>
    <title>Broker Administration</title>
    <para>
      You can connect to a backup broker with the administrative tool
      <command>qpid-ha</command>. You can also connect with the tools
      <command>qpid-config</command>, <command>qpid-route</command> and
      <command>qpid-stat</command> if you pass the flag <command>--ha-admin</command> on the
      command line.  If you do connect to a backup you should not modify any of the
      replicated queues, as this will disrupt the replication and may result in
      message loss.
    </para>
  </section>
</section>
<!-- LocalWords:  scalability rgmanager multicast RGManager mailto LVQ
-->