author    Alan Conway <aconway@apache.org>    2012-03-27 14:49:47 +0000
committer Alan Conway <aconway@apache.org>    2012-03-27 14:49:47 +0000
commit    012f33fd105fb0838898bb66a25823aaf07a9704 (patch)
tree      cf3728250f631f59113de088de4a348d0162bafc
parent    6fee1f87a9258d962b9bc3de7afad343ff1838b9 (diff)
download  qpid-python-012f33fd105fb0838898bb66a25823aaf07a9704.tar.gz
QPID-3603: Update new HA docs with information on rgmanager, more detail about client connections.
git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1305855 13f79535-47bb-0310-9956-ffa450edef68
-rw-r--r--  qpid/doc/book/src/Active-Passive-Cluster.xml  497
1 file changed, 353 insertions, 144 deletions
diff --git a/qpid/doc/book/src/Active-Passive-Cluster.xml b/qpid/doc/book/src/Active-Passive-Cluster.xml
index 3eaadad51e..266fd3551d 100644
--- a/qpid/doc/book/src/Active-Passive-Cluster.xml
+++ b/qpid/doc/book/src/Active-Passive-Cluster.xml
@@ -27,66 +27,62 @@ under the License.
<section>
<title>Overview</title>
<para>
- This release provides a preview of a new module for High Availability (HA). The new
- module is not yet complete or ready for production use, it being made available so
- that users can experiment with the new approach and provide feedback early in the
- development process. Feedback should go to <ulink
- url="mailto:user@qpid.apache.org">user@qpid.apache.org</ulink>.
+ This release provides a preview of a new module for High Availability (HA). The new module is
+ not yet complete or ready for production use. It is being made available so that users can
+ experiment with the new approach and provide feedback early in the development process.
+ Feedback should go to <ulink url="mailto:user@qpid.apache.org">user@qpid.apache.org</ulink>.
</para>
<para>
- The old cluster module takes an <firstterm>active-active</firstterm> approach,
- i.e. all the brokers in a cluster are able to handle client requests
- simultaneously. The new HA module takes an <firstterm>active-passive</firstterm>,
- <firstterm>hot-standby</firstterm> approach.
+ The old cluster module takes an <firstterm>active-active</firstterm> approach, i.e. all the
+ brokers in a cluster are able to handle client requests simultaneously. The new HA module
+ takes an <firstterm>active-passive</firstterm>, <firstterm>hot-standby</firstterm> approach.
</para>
<para>
- In an active-passive cluster, only one broker, known as the
- <firstterm>primary</firstterm>, is active and serving clients at a time. The other
- brokers are standing by as <firstterm>backups</firstterm>. Changes on the primary
- are immediately replicated to all the backups so they are always up-to-date or
- "hot". If the primary fails, one of the backups is promoted to be the new
- primary. Clients fail-over to the new primary automatically. If there are multiple
- backups, the backups also fail-over to become backups of the new primary.
+ In an active-passive cluster, only one broker, known as the <firstterm>primary</firstterm>, is
+ active and serving clients at a time. The other brokers are standing by as
+ <firstterm>backups</firstterm>. Changes on the primary are immediately replicated to all the
+ backups so they are always up-to-date or "hot". If the primary fails, one of the backups is
+ promoted to take over as the new primary. Clients fail-over to the new primary
+ automatically. If there are multiple backups, the backups also fail-over to become backups of
+ the new primary. Backup brokers reject connection attempts, to enforce the requirement that
+ only the primary be active.
</para>
<para>
- The new approach depends on an external <firstterm>cluster resource
- manager</firstterm> to detect failure of the primary and choose the new primary. The
- first supported resource manager will be <ulink
- url="https://fedorahosted.org/cluster/wiki/RGManager">rgmanager</ulink>, but it will
- be possible to add integration with other resource managers in the future. The
- preview version is not integrated with any resource manager, you can use the
- <command>qpid-ha</command> tool to simulate the actions of a resource manager or do
- your own integration.
+ This approach depends on an external <firstterm>cluster resource manager</firstterm> to detect
+ failures and choose the primary. <ulink
+ url="https://fedorahosted.org/cluster/wiki/RGManager">Rgmanager</ulink> is supported
+ initially, but others may be supported in the future.
</para>
<section>
<title>Why the new approach?</title>
- The new active-passive approach has several advantages compared to the
- existing active-active cluster module.
- <itemizedlist>
- <listitem>
- It does not depend directly on openais or corosync. It does not use multicast
- which simplifies deployment.
- </listitem>
- <listitem>
- It is more portable: in environments that don't support corosync, it can be
- integrated with a resource manager available in that environment.
- </listitem>
- <listitem>
- Replication to a <firstterm>disaster recovery</firstterm> site can be handled as
- simply another node in the cluster, it does not require a separate replication
- mechanism.
- </listitem>
- <listitem>
- It can take advantage of features provided by the resource manager, for example
- virtual IP addresses.
- </listitem>
- <listitem>
- Improved performance and scalability due to better use of multiple CPU s
- </listitem>
- </itemizedlist>
+ <para>
+ The new active-passive approach has several advantages compared to the
+ existing active-active cluster module.
+ <itemizedlist>
+ <listitem>
+ It does not depend directly on openais or corosync. It does not use multicast,
+ which simplifies deployment.
+ </listitem>
+ <listitem>
+ It is more portable: in environments that don't support corosync, it can be
+ integrated with a resource manager available in that environment.
+ </listitem>
+ <listitem>
+ Replication to a <firstterm>disaster recovery</firstterm> site can be handled as
+ simply another node in the cluster; it does not require a separate replication
+ mechanism.
+ </listitem>
+ <listitem>
+ It can take advantage of features provided by the resource manager, for example
+ virtual IP addresses.
+ </listitem>
+ <listitem>
+ Improved performance and scalability due to better use of multiple CPUs.
+ </listitem>
+ </itemizedlist>
+ </para>
</section>
<section>
-
<title>Limitations</title>
<para>
@@ -96,9 +92,9 @@ under the License.
<itemizedlist>
<listitem>
- Transactional changes to queue state are not replicated atomically. If the
- primary crashes during a transaction, it is possible that the backup could
- contain only part of the changes introduced by a transaction.
+ Transactional changes to queue state are not replicated atomically. If the primary crashes
+ during a transaction, it is possible that the backup could contain only part of the
+ changes introduced by a transaction.
</listitem>
<listitem>
During a fail-over one backup is promoted to primary and any other backups switch to
@@ -107,14 +103,14 @@ under the License.
switched.
</listitem>
<listitem>
- When used with a persistent store: if the entire cluster fails, there are no tools
- to help identify the most recent store.
- </listitem>
- <listitem>
Acknowledgments are confirmed to clients before the message has been dequeued
from replicas or indeed from the local store if that is asynchronous.
</listitem>
<listitem>
+ When used with a persistent store: if the entire cluster fails, there are no tools to help
+ identify the most recent store.
+ </listitem>
+ <listitem>
A persistent broker must have its store erased before joining an existing cluster.
In the production version a persistent broker will be able to load its store and
avoid downloading messages that are in the store from the primary.
@@ -149,18 +145,32 @@ under the License.
</section>
</section>
-
+ <section>
+ <title>Virtual IP Addresses</title>
+ <para>
+ Some resource managers (including <command>rgmanager</command>) support <firstterm>virtual IP
+ addresses</firstterm>. A virtual IP address is an IP address that can be relocated to any of
+ the nodes in a cluster. The resource manager associates this address with the primary node in
+ the cluster, and relocates it to the new primary when there is a failure. This simplifies
+ configuration as you can publish a single IP address rather than a list.
+ </para>
+ <para>
+ A virtual IP address can be used by clients to connect to the primary, and also by backup
+ brokers when they connect to the primary. The following sections will explain how to configure
+ virtual IP addresses for clients or brokers.
+ </para>
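+ <para>
+ For example, a C++ client can connect through a virtual IP address exactly as it would
+ to a single broker. This sketch uses the address assigned to the primary in the
+ <command>rgmanager</command> example later in this document; substitute whatever
+ address your resource manager manages:
+ <programlisting>
+ // 20.0.20.200 is the virtual IP address that the resource manager
+ // relocates to whichever node is currently the primary.
+ qpid::messaging::Connection c("20.0.20.200", "{reconnect:true}");
+ c.open();
+ </programlisting>
+ </para>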
+ </section>
<section>
<title>Configuring the Brokers</title>
<para>
- The broker must load the <filename>ha</filename> module, it is loaded by default
- when you start a broker. The following broker options are available for the HA module.
+ The broker must load the <filename>ha</filename> module; it is loaded by default. The
+ following broker options are available for the HA module.
</para>
<table frame="all" id="ha-broker-options">
<title>Options for High Availability Messaging Cluster</title>
<tgroup align="left" cols="2" colsep="1" rowsep="1">
<colspec colname="c1" colwidth="1*"/>
- <colspec colname="c2" colwidth="4*"/>
+ <colspec colname="c2" colwidth="3*"/>
<thead>
<row>
<entry align="center" nameend="c2" namest="c1">
@@ -171,7 +181,7 @@ under the License.
<tbody>
<row>
<entry>
- <command>--ha-cluster <replaceable>yes|no</replaceable></command>
+ <literal>--ha-cluster <replaceable>yes|no</replaceable></literal>
</entry>
<entry>
Set to "yes" to have the broker join a cluster.
@@ -179,7 +189,7 @@ under the License.
</row>
<row>
<entry>
- <command>--ha-brokers <replaceable>URL</replaceable></command>
+ <literal>--ha-brokers <replaceable>URL</replaceable></literal>
</entry>
<entry>
URL used by brokers to connect to each other. The URL lists the addresses of
@@ -201,19 +211,19 @@ under the License.
</entry>
</row>
<row>
- <entry> <command>--ha-public-brokers <replaceable>URL</replaceable></command> </entry>
+ <entry> <literal>--ha-public-brokers <replaceable>URL</replaceable></literal> </entry>
<entry>
URL used by clients to connect to the brokers in the same format as
- <command>--ha-brokers</command> above. Use this option if you want client
+ <literal>--ha-brokers</literal> above. Use this option if you want client
traffic on a different network from broker replication traffic. If this
option is not set, clients will use the same URL as brokers.
</entry>
</row>
<row>
<entry>
- <para><command>--ha-username <replaceable>USER</replaceable></command></para>
- <para><command>--ha-password <replaceable>PASS</replaceable></command></para>
- <para><command>--ha-mechanism <replaceable>MECH</replaceable></command></para>
+ <para><literal>--ha-username <replaceable>USER</replaceable></literal></para>
+ <para><literal>--ha-password <replaceable>PASS</replaceable></literal></para>
+ <para><literal>--ha-mechanism <replaceable>MECH</replaceable></literal></para>
</entry>
<entry>
Brokers use <replaceable>USER</replaceable>,
@@ -225,16 +235,15 @@ under the License.
</tgroup>
</table>
<para>
- To configure a cluster you must set at least <command>ha-cluster</command> and <command>ha-brokers</command>
+ To configure a cluster you must set at least <literal>ha-cluster</literal> and <literal>ha-brokers</literal>.
</para>
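+ <para>
+ For example, a minimal command line for a cluster member might look like this
+ (a sketch; the host names are placeholders for your own nodes):
+ <programlisting>
+ qpidd --ha-cluster=yes --ha-brokers=node1,node2,node3
+ </programlisting>
+ </para>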
</section>
-
<section>
<title>Creating replicated queues and exchanges</title>
<para>
To create a replicated queue or exchange, pass the argument
- <command>qpid.replicate</command> when creating the queue or exchange. It should
+ <literal>qpid.replicate</literal> when creating the queue or exchange. It should
have one of the following three values:
<itemizedlist>
<listitem>
@@ -249,113 +258,313 @@ under the License.
</listitem>
</itemizedlist>
</para>
- Bindings are automatically replicated if the queue and exchange being bound both have
- replication argument of <command>all</command> or <command>confguration</command>, they are
- not replicated otherwise.
+ <para>
+ Bindings are automatically replicated if the queue and exchange being bound both have a
+ replication argument of <literal>all</literal> or <literal>configuration</literal>;
+ otherwise they are not replicated.
+ </para>
+ <para>
+ You can create replicated queues and exchanges with the <command>qpid-config</command>
+ management tool like this:
+ <programlisting>
+ qpid-config add queue myqueue --replicate all
+ </programlisting>
+ </para>
+ <para>
+ To create replicated queues and exchanges via the client API, add a <literal>node</literal> entry to the address like this:
+ <programlisting>
+ "myqueue;{create:always,node:{x-declare:{arguments:{'qpid.replicate':all}}}}"
+ </programlisting>
+ </para>
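+ <para>
+ A short C++ sketch using that address string (the queue and node names are
+ illustrative):
+ <programlisting>
+ qpid::messaging::Connection c("node1,node2,node3", "{reconnect:true}");
+ c.open();
+ qpid::messaging::Session session = c.createSession();
+ // Creating the sender declares myqueue with qpid.replicate=all,
+ // so the queue and its messages are replicated to the backups.
+ qpid::messaging::Sender sender = session.createSender(
+ "myqueue;{create:always,node:{x-declare:{arguments:{'qpid.replicate':all}}}}");
+ </programlisting>
+ </para>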
+ </section>
- You can create replicated queues and exchanges with the <command>qpid-config</command>
- management tool like this:
- <programlisting>
- qpid-config add queue myqueue --replicate all
- </programlisting>
+ <section>
+ <title>Client Connection and Fail-over</title>
+ <para>
+ Clients can only connect to the primary broker. Backup brokers automatically reject any
+ connection attempt by a client.
+ </para>
+ <para>
+ Clients are configured with the URL for the cluster. There are two possibilities:
+ <itemizedlist>
+ <listitem> The URL contains multiple addresses, one for each broker in the cluster.</listitem>
+ <listitem>
+ The URL contains a single <firstterm>virtual IP address</firstterm> that is assigned to the primary broker by the resource manager.
+ <footnote><para>Only if the resource manager supports virtual IP addresses.</para></footnote>
+ </listitem>
+ </itemizedlist>
+ In the first case, clients will repeatedly re-try each address in the URL until they
+ successfully connect to the primary. In the second case, the resource manager assigns the
+ virtual IP address to the primary broker, so clients only need to re-try a single address.
+ </para>
+ <para>
+ When the primary broker fails, all clients are disconnected. They go back to re-trying until
+ they connect to the new primary. Any messages that have been sent by the client, but not yet
+ acknowledged as delivered, are resent. Similarly, messages that have been sent by the broker,
+ but not acknowledged, are re-queued.
+ </para>
+ <para>
+ Suppose your cluster has 3 nodes: <literal>node1</literal>, <literal>node2</literal>
+ and <literal>node3</literal>, all using the default AMQP port. To connect a client you
+ need to specify the address(es) and set the <literal>reconnect</literal> property to
+ <literal>true</literal>. Here's how to connect each type of client:
+ </para>
+ <section>
+ <title>C++ clients</title>
+ <para>
+ With the C++ client, you specify multiple cluster addresses in a single URL
+ <footnote>
+ <para>
+ The full grammar for the URL is:
+ <programlisting>
+ url = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)*
+ addr = tcp_addr / rdma_addr / ssl_addr / ...
+ tcp_addr = ["tcp:"] host [":" port]
+ rdma_addr = "rdma:" host [":" port]
+ ssl_addr = "ssl:" host [":" port]
+ </programlisting>
+ </para>
+ </footnote>. You also
+ need to set the connection option <literal>reconnect</literal> to true. For
+ example:
+ <programlisting>
+ qpid::messaging::Connection c("node1,node2,node3","{reconnect:true}");
+ </programlisting>
+ </para>
+ </section>
+ <section>
+ <title>Python clients</title>
+ <para>
+ With the Python client, you specify <literal>reconnect=True</literal> and a list of
+ <replaceable>host:port</replaceable> addresses as <literal>reconnect_urls</literal>
+ when calling <literal>Connection.establish</literal> or <literal>Connection.open</literal>:
+ <programlisting>
+ connection = qpid.messaging.Connection.establish("node1", reconnect=True, reconnect_urls=["node1", "node2", "node3"])
+ </programlisting>
+ </para>
+ </section>
+ <section>
+ <title>Java JMS Clients</title>
+ <para>
+ In Java JMS clients, client fail-over is handled automatically if it is enabled in the
+ connection. You can configure a connection to use fail-over using the
+ <command>failover</command> property:
+ </para>
- To create replicated queues and exchangs via the client API, add a <command>node</command> entry to the address like this:
- <programlisting>
- "myqueue;{create:always,node:{x-declare:{arguments:{'qpid.replicate':all}}}}"
- </programlisting>
- </section>
+ <screen>
+ connectionfactory.qpidConnectionfactory = amqp://guest:guest@clientid/test?brokerlist=&#39;tcp://localhost:5672&#39;&amp;failover=&#39;failover_exchange&#39;
+ </screen>
+ <para>
+ This property can take three values:
+ </para>
+ <variablelist>
+ <title>Fail-over Modes</title>
+ <varlistentry>
+ <term>failover_exchange</term>
+ <listitem>
+ <para>
+ If the connection fails, fail over to any other broker in the cluster.
+ </para>
+
+ </listitem>
+
+ </varlistentry>
+ <varlistentry>
+ <term>roundrobin</term>
+ <listitem>
+ <para>
+ If the connection fails, fail over to one of the brokers specified in the <command>brokerlist</command>.
+ </para>
+
+ </listitem>
+
+ </varlistentry>
+ <varlistentry>
+ <term>singlebroker</term>
+ <listitem>
+ <para>
+ Fail-over is not supported; the connection is to a single broker only.
+ </para>
+
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ <para>
+ In a Connection URL, heartbeat is set using the <command>idle_timeout</command> property, which is an integer corresponding to the heartbeat period in seconds. For instance, the following line from a JNDI properties file sets the heartbeat timeout to 3 seconds:
+ </para>
+
+ <screen>
+ connectionfactory.qpidConnectionfactory = amqp://guest:guest@clientid/test?brokerlist=&#39;tcp://localhost:5672&#39;,idle_timeout=3
+ </screen>
+
+ </section>
+ </section>
<section>
- <title>Client Fail-over</title>
+ <title>The Cluster Resource Manager</title>
<para>
- Clients can only connect to the single primary broker. All other brokers in the
- cluster are backups, and they automatically reject any attempt by a client to
- connect.
+ Broker fail-over is managed by a <firstterm>cluster resource manager</firstterm>. An
+ integration with <ulink
+ url="https://fedorahosted.org/cluster/wiki/RGManager">rgmanager</ulink> is provided, but it is
+ possible to integrate with other resource managers.
</para>
<para>
- Clients are configured with the addreses of all of the brokers in the cluster.
- <footnote>
- <para>
- If the resource manager supports virtual IP addresses then the clients
- can be configured with a single virtual IP address.
- </para>
- </footnote>
- When the client tries to connect initially, it will try all of its addresses until it
- successfully connects to the primary. If the primary fails, clients will try to
- try to re-connect to all the known brokers until they find the new primary.
+ The resource manager is responsible for starting an appropriately-configured broker on each
+ node in the cluster. The resource manager then <firstterm>promotes</firstterm> one of the
+ brokers to be the primary. The other brokers connect to the primary as backups, using the URL
+ provided in the <literal>ha-brokers</literal> configuration option.
</para>
<para>
- Suppose your cluster has 3 nodes: <command>node1</command>, <command>node2</command> and <command>node3</command> all using the default AMQP port.
+ Once connected, the backup brokers synchronize their state with the primary. When a backup is
+ synchronized, or "hot", it is ready to take over if the primary fails. Backup brokers
+ continually receive updates from the primary in order to stay synchronized.
</para>
<para>
- With the C++ client, you specify all the cluster addresses in a single URL, for example:
- <programlisting>
- qpid::messaging::Connection c("node1:node2:node3");
- </programlisting>
+ If the primary fails, backup brokers go into fail-over mode. The resource manager must detect
+ the failure and promote one of the backups to be the new primary. The other backups connect
+ to the new primary and synchronize their state so they can be backups for it.
</para>
<para>
- With the python client, you specify <command>reconnect=True</command> and a list of <replaceable>host:port</replaceable> addresses as <command>reconnect_urls</command> when calling <command>establish</command> or <command>open</command>
- <programlisting>
- connection = qpid.messaging.Connection.establish("node1", reconnect=True, "reconnect_urls=["node1", "node2", "node3"])
- </programlisting>
+ The resource manager is also responsible for protecting the cluster from
+ <firstterm>split-brain</firstterm> conditions resulting from a network partition.
+ A network partition divides a cluster into two sub-groups which cannot see each other.
+ Usually a <firstterm>quorum</firstterm> voting algorithm is used that disables
+ nodes in the inquorate sub-group.
</para>
</section>
-
<section>
- <title>Broker fail-over</title>
+ <title>Configuring <command>rgmanager</command> as resource manager</title>
+ <para>
+ This section assumes that you are already familiar with setting up and configuring
+ clustered services using <command>cman</command> and <command>rgmanager</command>. It
+ will show you how to configure an active-passive, hot-standby <command>qpidd</command>
+ HA cluster.
+ </para>
<para>
- Broker fail-over is managed by a <firstterm>cluster resource
- manager</firstterm>. The initial preview version of HA is not integrated with a
- resource manager, the production version will be integrated with <ulink
- url="https://fedorahosted.org/cluster/wiki/RGManager">rgmanager</ulink> and it may
- be integrated with other resource managers in the future.
+ Here is an example <literal>cluster.conf</literal> file for a cluster of 3 nodes named
+ mrg32, mrg34 and mrg35. We will go through the configuration step-by-step.
</para>
+ <programlisting>
+<![CDATA[
+<?xml version="1.0"?>
+<cluster alias="qpid-hot-standby" config_version="4" name="qpid-hot-standby">
+ <clusternodes>
+ <clusternode name="mrg32" nodeid="1">
+ <fence/>
+ </clusternode>
+ <clusternode name="mrg34" nodeid="2">
+ <fence/>
+ </clusternode>
+ <clusternode name="mrg35" nodeid="3">
+ <fence/>
+ </clusternode>
+ </clusternodes>
+ <cman/>
+ <rm>
+ <failoverdomains>
+ <failoverdomain name="mrg32-domain" restricted="1">
+ <failoverdomainnode name="mrg32"/>
+ </failoverdomain>
+ <failoverdomain name="mrg34-domain" restricted="1">
+ <failoverdomainnode name="mrg34"/>
+ </failoverdomain>
+ <failoverdomain name="mrg35-domain" restricted="1">
+ <failoverdomainnode name="mrg35"/>
+ </failoverdomain>
+ </failoverdomains>
+ <resources>
+ <script file="/etc/init.d/qpidd" name="qpidd"/>
+ <script file="/etc/init.d/qpidd-primary" name="qpidd-primary"/>
+ <ip address="20.0.10.200" monitor_link="1"/>
+ <ip address="20.0.20.200" monitor_link="1"/>
+ </resources>
+
+ <!-- There is a qpidd service on each node, it should be restarted if it fails. -->
+ <service name="mrg32-qpidd-service" domain="mrg32-domain" recovery="restart">
+ <script ref="qpidd"/>
+ </service>
+ <service name="mrg34-qpidd-service" domain="mrg34-domain" recovery="restart">
+ <script ref="qpidd"/>
+ </service>
+ <service name="mrg35-qpidd-service" domain="mrg35-domain" recovery="restart">
+ <script ref="qpidd"/>
+ </service>
+
+ <!-- There should always be a single qpidd-primary service, it can run on any node. -->
+ <service name="qpidd-primary-service" autostart="1" exclusive="0" recovery="relocate">
+ <script ref="qpidd-primary"/>
+ <ip ref="20.0.10.200"/>
+ <ip ref="20.0.20.200"/>
+ </service>
+ </rm>
+ <fencedevices/>
+ <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
+</cluster>
+]]>
+ </programlisting>
<para>
- The resource manager is responsible for ensuring that there is exactly one broker
- is acting as primary at all times. It selects the initial primary broker when the
- cluster is started, detects failure of the primary, and chooses the backup to
- promote as the new primary.
+ There is a <literal>failoverdomain</literal> for each node, containing just that
+ one node. This lets us stipulate that a <command>qpidd</command> service should
+ always run on each node.
</para>
<para>
- You can simulate the actions of a resource manager, or indeed do your own
- integration with a resource manager using the <command>qpid-ha</command> tool. The
- command
- <programlisting>
- qpid-ha promote -b <replaceable>host</replaceable>:<replaceable>port</replaceable>
- </programlisting>
- will promote the broker listening on
- <replaceable>host</replaceable>:<replaceable>port</replaceable> to be the primary.
- You should only promote a broker to primary when there is no other primary in the
- cluster. The brokers will not detect multiple primaries, they rely on the resource
- manager to do that.
+ The <literal>resources</literal> section defines the usual initialization script to
+ start the <command>qpidd</command> service. It also
+ defines the <command>qpidd-primary</command> script. Starting this script does not
+ actually start a new service; rather, it promotes the existing
+ <command>qpidd</command> broker to primary status.
</para>
<para>
- A clustered broker always starts initially in <firstterm>discovery</firstterm>
- mode. It uses the addresses configured in the <command>ha-brokers</command>
- configuration option and tries to connect to each in turn until it finds to the
- primary. The resource manager is responsible for choosing on of the backups to
- promote as the initial primary.
+ The <literal>resources</literal> section also defines a pair of virtual IP
+ addresses on different sub-nets. One will be used for broker-to-broker
+ communication, the other for client-to-broker.
</para>
<para>
- If the primary fails, all the backups are disconnected and return to discovery mode.
- The resource manager chooses one to promote as the new primary. The other backups
- will eventually discover the new primary and reconnect.
+ The <literal>service</literal> section defines 3 <command>qpidd</command> services,
+ one for each node. Each service is in a restricted fail-over domain containing just
+ that node, and has the <literal>restart</literal> recovery policy. The effect of
+ this is that rgmanager will run <command>qpidd</command> on each node, restarting it
+ if it fails.
+ </para>
+ <para>
+ There is a single <literal>qpidd-primary-service</literal> running the
+ <command>qpidd-primary</command> script, which is not restricted to a domain and has
+ the <literal>relocate</literal> recovery policy. This means rgmanager will start
+ <command>qpidd-primary</command> on one of the nodes when the cluster starts and
+ will relocate it to another node if the original node fails. Running the
+ <literal>qpidd-primary</literal> script does not actually start a new process;
+ rather, it promotes the existing broker to become the primary.
</para>
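+ <para>
+ In effect, the start action of the <command>qpidd-primary</command> script reduces to
+ the promotion command described in the Broker Administration section below, for
+ example (a sketch, assuming the broker listens on the default local port):
+ <programlisting>
+ qpid-ha promote -b localhost:5672
+ </programlisting>
+ </para>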
</section>
+
<section>
<title>Broker Administration</title>
<para>
- You can connect to a backup broker with the administrative tool
- <command>qpid-ha</command>. You can also connect with the tools
- <command>qpid-config</command>, <command>qpid-route</command> and
- <command>qpid-stat</command> if you pass the flag <command>--ha-admin</command> on the
- command line. If you do connect to a backup you should not modify any of the
- replicated queues, as this will disrupt the replication and may result in
- message loss.
+ Normally, clients are not allowed to connect to a backup broker. However, management tools
+ are allowed to connect to backup brokers. If you use these tools you <emphasis>must
+ not</emphasis> add or remove messages from replicated queues, or delete replicated queues or
+ exchanges, as this will corrupt the replication process and may cause message loss.
+ </para>
+ <para>
+ <command>qpid-ha</command> allows you to view and change HA configuration settings.
+ </para>
+ <para>
+ The tools <command>qpid-config</command>, <command>qpid-route</command> and
+ <command>qpid-stat</command> will connect to a backup if you pass the flag <command>--ha-admin</command> on the
+ command line.
+ </para>
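+ <para>
+ For example, to list the queues on a backup broker (a sketch, assuming a backup runs
+ on <literal>node2</literal> at the default port):
+ <programlisting>
+ qpid-config --ha-admin -a node2 queues
+ </programlisting>
+ </para>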
+ <para>
+ To promote a broker to primary use the following command:
+ <programlisting>
+ qpid-ha promote -b <replaceable>host</replaceable>:<replaceable>port</replaceable>
+ </programlisting>
+ The resource manager must ensure that it does not promote a broker to primary when
+ there is already a primary in the cluster.
</para>
</section>
</section>
-<!-- LocalWords: scalability rgmanager multicast RGManager mailto LVQ
+
+<!-- LocalWords: scalability rgmanager multicast RGManager mailto LVQ qpidd IP dequeued Transactional username
-->