author     Alan Conway <aconway@apache.org>  2012-03-28 16:24:52 +0000
committer  Alan Conway <aconway@apache.org>  2012-03-28 16:24:52 +0000
commit     cb0c0dbd13a27b535623ca0fdd6ef59f6f13622a (patch)
tree       a09a3060f577b006720c740bf2c733937ce5e8f8
parent     ed41b499bbe7f2a51afb204799724227b8fca4f3 (diff)
download   qpid-python-cb0c0dbd13a27b535623ca0fdd6ef59f6f13622a.tar.gz
QPID-3603: Update HA documentation: example of virtual IP addresses
git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1306454 13f79535-47bb-0310-9956-ffa450edef68
-rw-r--r--  qpid/cpp/etc/cluster.conf-example.xml.in       20
-rw-r--r--  qpid/doc/book/src/Active-Passive-Cluster.xml  282
2 files changed, 187 insertions, 115 deletions
diff --git a/qpid/cpp/etc/cluster.conf-example.xml.in b/qpid/cpp/etc/cluster.conf-example.xml.in
index dbeb3af537..eb70ebbb1e 100644
--- a/qpid/cpp/etc/cluster.conf-example.xml.in
+++ b/qpid/cpp/etc/cluster.conf-example.xml.in
@@ -7,14 +7,18 @@ This example assumes a 3 node cluster, with nodes named node1, node2 and node3.
<cluster name="qpid-test" config_version="18">
 <!-- The cluster has 3 nodes. Each has a unique nodeid and one vote for quorum. -->
<clusternodes>
- <clusternode name="node1" nodeid="1"/>
- <clusternode name="node2" nodeid="2"/>
- <clusternode name="node3" nodeid="3"/>
+ <clusternode name="node1" nodeid="1">
+ <fence/>
+ </clusternode>
+ <clusternode name="node2" nodeid="2">
+ <fence/>
+ </clusternode>
+ <clusternode name="node3" nodeid="3">
+ <fence/>
+ </clusternode>
</clusternodes>
- <!-- Resouce Manager configuration.
- TODO explain central_processing="1"
- -->
- <rm log_level="7" central_processing="1">
+ <!-- Resource Manager configuration. -->
+ <rm log_level="7"> <!-- Verbose logging -->
<!--
There is a failoverdomain for each node containing just that node.
This lets us stipulate that the qpidd service should always run on all nodes.
@@ -59,7 +63,7 @@ This example assumes a 3 node cluster, with nodes named node1, node2 and node3.
<!-- There should always be a single qpidd-primary service, it can run on any node. -->
<service name="qpidd-primary-service" autostart="1" exclusive="0" recovery="relocate">
<script ref="qpidd-primary"/>
- <!-- The primary has the IP addresses for brokers and clients. -->
+ <!-- The primary has the IP addresses that brokers and clients use to connect. -->
<ip ref="20.0.10.200"/>
<ip ref="20.0.20.200"/>
</service>
diff --git a/qpid/doc/book/src/Active-Passive-Cluster.xml b/qpid/doc/book/src/Active-Passive-Cluster.xml
index 5ab515f235..52748b2570 100644
--- a/qpid/doc/book/src/Active-Passive-Cluster.xml
+++ b/qpid/doc/book/src/Active-Passive-Cluster.xml
@@ -13,7 +13,7 @@ http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
@@ -148,15 +148,17 @@ under the License.
<section>
<title>Virtual IP Addresses</title>
<para>
- Some resource managers (including <command>rgmanager</command>) support <firstterm>virtual IP
- addresses</firstterm>. A virtual IP address is an IP address that can be relocated to any of
- the nodes in a cluster. The resource manager associates this address with the primary node in
- the cluster, and relocates it to the new primary when there is a failure. This simplifies
- configuration as you can publish a single IP address rather than a list.
+ Some resource managers (including <command>rgmanager</command>) support
+ <firstterm>virtual IP addresses</firstterm>. A virtual IP address is an IP
+ address that can be relocated to any of the nodes in a cluster. The
+ resource manager associates this address with the primary node in the
+ cluster, and relocates it to the new primary when there is a failure. This
+ simplifies configuration as you can publish a single IP address rather
+ than a list.
</para>
<para>
- A virtual IP address can be used by clients to connect to the primary, and also by backup
- brokers when they connect to the primary. The following sections will explain how to configure
+ A virtual IP address can be used by clients and backup brokers to connect
+ to the primary. The following sections will explain how to configure
virtual IP addresses for clients or brokers.
</para>
</section>
@@ -266,42 +268,61 @@ under the License.
<para>
You can create replicated queues and exchanges with the <command>qpid-config</command>
management tool like this:
- <programlisting>
- qpid-config add queue myqueue --replicate all
- </programlisting>
</para>
+ <programlisting>
+ qpid-config add queue myqueue --replicate all
+ </programlisting>
<para>
To create replicated queues and exchanges via the client API, add a <literal>node</literal> entry to the address like this:
- <programlisting>
- "myqueue;{create:always,node:{x-declare:{arguments:{'qpid.replicate':all}}}}"
- </programlisting>
</para>
+ <programlisting>
+ "myqueue;{create:always,node:{x-declare:{arguments:{'qpid.replicate':all}}}}"
+ </programlisting>
</section>
<section>
<title>Client Connection and Fail-over</title>
<para>
- Clients can only connect to the primary broker. Backup brokers automatically reject any
- connection attempt by a client.
+ Clients can only connect to the primary broker. Backup brokers
+ automatically reject any connection attempt by a client.
</para>
<para>
- Clients are configured with the URL for the cluster. There are two possibilities
+ Clients are configured with the URL for the cluster (details below for
+ each type of client). There are two possibilities:
<itemizedlist>
- <listitem> The URL contains multiple addresses, one for each broker in the cluster.</listitem>
<listitem>
- The URL contains a single <firstterm>virtual IP address</firstterm> that is assigned to the primary broker by the resource manager.
+ The URL contains multiple addresses, one for each broker in the cluster.
+ </listitem>
+ <listitem>
+ The URL contains a single <firstterm>virtual IP address</firstterm>
+ that is assigned to the primary broker by the resource manager.
<footnote><para>Only if the resource manager supports virtual IP addresses</para></footnote>
</listitem>
</itemizedlist>
- In the first case, clients will repeatedly re-try each address in the URL until they
- successfully connect to the primary. In the second case the resource manager will assign the
- virtual IP address to the primary broker, so clients only need to re-try on a single address.
+ In the first case, clients will repeatedly re-try each address in the URL
+ until they successfully connect to the primary. In the second case the
+ resource manager will assign the virtual IP address to the primary broker,
+ so clients only need to re-try on a single address.
</para>
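The re-try behaviour described above can be sketched as a simple loop. This is an illustrative Python sketch, not the actual client implementation; `try_connect` is a hypothetical stand-in for a real connection attempt:

```python
import itertools

def connect_to_primary(addresses, try_connect, max_attempts=100):
    """Cycle through the cluster addresses until one accepts the connection.

    Backups reject client connections, so only the primary succeeds.
    `try_connect` returns a connection or raises ConnectionError.
    """
    for attempt, addr in enumerate(itertools.cycle(addresses)):
        if attempt >= max_attempts:
            raise ConnectionError("no primary found in %d attempts" % max_attempts)
        try:
            return try_connect(addr)
        except ConnectionError:
            continue  # This node is a backup or unreachable; try the next one.

# Example: node3 is the current primary; node1 and node2 are backups.
def fake_connect(addr):
    if addr != "node3":
        raise ConnectionError(addr + " is a backup")
    return "connection-to-" + addr

assert connect_to_primary(["node1", "node2", "node3"], fake_connect) == "connection-to-node3"
```

With a single virtual IP address the loop degenerates to re-trying one address, which is exactly why virtual IPs simplify client configuration.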
<para>
- When the primary broker fails all clients are disconnected. They go back to re-trying until
- they connect to the new primary. Any messages that have been sent by the client, but not yet
- acknowledged as delivered, are resent. Similarly messages that have been sent by the broker,
- but not acknowledged, are re-queued.
+ When the primary broker fails, clients re-try all known cluster addresses
+ until they connect to the new primary. The client re-sends any messages
+ that were previously sent but not acknowledged by the broker at the time
+ of the failure. Similarly messages that have been sent by the broker, but
+ not acknowledged by the client, are re-queued.
+ </para>
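The at-least-once re-send behaviour described above can be sketched as follows. This is illustrative Python only; the real client tracks unacknowledged messages internally:

```python
class ResendingSender:
    """Track sent-but-unacknowledged messages so they can be re-sent
    after a fail-over. A message is dropped only once acknowledged."""

    def __init__(self):
        self.unacked = {}   # message id -> message body
        self.next_id = 0

    def send(self, body):
        msg_id = self.next_id
        self.next_id += 1
        self.unacked[msg_id] = body   # kept until the broker acknowledges
        return msg_id

    def acknowledged(self, msg_id):
        del self.unacked[msg_id]      # safely delivered, no need to re-send

    def on_reconnect(self):
        """After connecting to the new primary, re-send everything unacked."""
        return [self.unacked[i] for i in sorted(self.unacked)]

s = ResendingSender()
a = s.send("m1"); s.send("m2"); s.send("m3")
s.acknowledged(a)                        # m1 was confirmed before the failure
assert s.on_reconnect() == ["m2", "m3"]  # m2 and m3 are re-sent
```

The broker side is symmetric: deliveries not yet acknowledged by the client at the time of the failure are re-queued, so a message may be delivered more than once but is not lost.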
+ <para>
+ TCP can be slow to detect connection failures. A client can configure a
+ connection to use a <firstterm>heartbeat</firstterm> to detect connection
+ failure, and can specify a time interval for the heartbeat. If heartbeats
+ are in use, failures will be detected no later than twice the heartbeat
+ interval. The following sections explain how to enable heartbeats in each
+ client.
+ </para>
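The detection bound stated above (no later than twice the heartbeat interval) can be sketched like this; illustrative Python only, the real check happens inside the client connection:

```python
def connection_failed(last_heartbeat, now, interval):
    """A peer is presumed dead once two heartbeat intervals pass without
    a heartbeat, so failure is detected within 2 * interval seconds."""
    return (now - last_heartbeat) > 2 * interval

# With a 10-second heartbeat interval:
assert not connection_failed(last_heartbeat=100.0, now=115.0, interval=10)  # still within bound
assert connection_failed(last_heartbeat=100.0, now=121.0, interval=10)      # > 20s silence: failed
```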
+ <para>
+ See &#34;Cluster Failover&#34; in <citetitle>Programming in Apache
+ Qpid</citetitle> for details on how to keep the client aware of cluster
+ membership.
</para>
<para>
Suppose your cluster has 3 nodes: <literal>node1</literal>, <literal>node2</literal>
@@ -316,39 +337,57 @@ under the License.
<footnote>
<para>
The full grammar for the URL is:
- <programlisting>
- url = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)*
- addr = tcp_addr / rmda_addr / ssl_addr / ...
- tcp_addr = ["tcp:"] host [":" port]
- rdma_addr = "rdma:" host [":" port]
- ssl_addr = "ssl:" host [":" port]'
- </programlisting>
</para>
- </footnote>. You also
- need to specify the connection option <literal>reconnect</literal> to be true. For
- example:
<programlisting>
- qpid::messaging::Connection c("node1,node2,node3","{reconnect:true}");
+ url = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)*
+ addr = tcp_addr / rdma_addr / ssl_addr / ...
+ tcp_addr = ["tcp:"] host [":" port]
+ rdma_addr = "rdma:" host [":" port]
+ ssl_addr = "ssl:" host [":" port]
</programlisting>
+ </footnote>
+ You also need to specify the connection option
+ <literal>reconnect</literal> to be true. For example:
+ </para>
+ <programlisting>
+ qpid::messaging::Connection c("node1,node2,node3","{reconnect:true}");
+ </programlisting>
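The URL grammar in the footnote above can be sketched as a small parser. This is an illustrative Python sketch under simplifying assumptions (it ignores user/password beyond stripping them, and assumes port 5672 as the default):

```python
def parse_url(url):
    """Parse a cluster URL per the grammar into (protocol, host, port)
    tuples. Default protocol is tcp, default port 5672 (assumed here)."""
    if url.startswith("amqp:"):
        url = url[len("amqp:"):]
    if "@" in url:                        # drop the optional user["/"password]
        url = url.split("@", 1)[1]
    addrs = []
    for addr in url.split(","):
        protocol = "tcp"
        for p in ("tcp", "rdma", "ssl"):  # explicit transport prefix, if any
            if addr.startswith(p + ":"):
                protocol, addr = p, addr[len(p) + 1:]
                break
        host, _, port = addr.partition(":")
        addrs.append((protocol, host, int(port) if port else 5672))
    return addrs

assert parse_url("amqp:node1,node2:5673,ssl:node3") == [
    ("tcp", "node1", 5672), ("tcp", "node2", 5673), ("ssl", "node3", 5672)]
```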
+ <para>
+ Heartbeats are disabled by default. You can enable them by specifying a
+ heartbeat interval (in seconds) for the connection via the
+ <literal>heartbeat</literal> option. For example:
+ <programlisting>
+ qpid::messaging::Connection c("node1,node2,node3","{reconnect:true,heartbeat:10}");
+ </programlisting>
</para>
</section>
<section>
<title>Python clients</title>
<para>
- With the python client, you specify <literal>reconnect=True</literal> and a list of
- <replaceable>host:port</replaceable> addresses as <literal>reconnect_urls</literal>
- when calling <literal>Connection.establish</literal> or <literal>Connection.open</literal>
+ With the python client, you specify <literal>reconnect=True</literal>
+ and a list of <replaceable>host:port</replaceable> addresses as
+ <literal>reconnect_urls</literal> when calling
+ <literal>Connection.establish</literal> or
+ <literal>Connection.open</literal>.
+ </para>
<programlisting>
connection = qpid.messaging.Connection.establish("node1", reconnect=True, reconnect_urls=["node1", "node2", "node3"])
</programlisting>
+ <para>
+ Heartbeats are disabled by default. You can
+ enable them by specifying a heartbeat interval (in seconds) for the
+ connection via the <literal>heartbeat</literal> option. For example:
</para>
+ <programlisting>
+ connection = qpid.messaging.Connection.establish("node1", reconnect=True, reconnect_urls=["node1", "node2", "node3"], heartbeat=10)
+ </programlisting>
</section>
<section>
<title>Java JMS Clients</title>
<para>
- In Java JMS clients, client fail-over is handled automatically if it is enabled in the
- connection. You can configure a connection to use fail-over using the
- <command>failover</command> property:
+ In Java JMS clients, client fail-over is handled automatically if it is
+ enabled in the connection. You can configure a connection to use
+ fail-over using the <command>failover</command> property:
</para>
<screen>
@@ -398,33 +437,35 @@ under the License.
<screen>
connectionfactory.qpidConnectionfactory = amqp://guest:guest@clientid/test?brokerlist=&#39;tcp://localhost:5672&#39;,idle_timeout=3
</screen>
-
</section>
</section>
<section>
<title>The Cluster Resource Manager</title>
<para>
- Broker fail-over is managed by a <firstterm>cluster resource manager</firstterm>. An
- integration with <ulink
- url="https://fedorahosted.org/cluster/wiki/RGManager">rgmanager</ulink> is provided, but it is
- possible to integrate with other resource managers.
+ Broker fail-over is managed by a <firstterm>cluster resource
+ manager</firstterm>. An integration with <ulink
+ url="https://fedorahosted.org/cluster/wiki/RGManager">rgmanager</ulink> is
+ provided, but it is possible to integrate with other resource managers.
</para>
<para>
- The resource manager is responsible for starting an appropriately-configured broker on each
- node in the cluster. The resource manager then <firstterm>promotes</firstterm> one of the
- brokers to be the primary. The other brokers connect to the primary as backups, using the URL
- provided in the <literal>ha-brokers</literal> configuration option.
+ The resource manager is responsible for starting an
+ appropriately-configured broker on each node in the
+ cluster. The resource manager then <firstterm>promotes</firstterm> one of
+ the brokers to be the primary. The other brokers connect to the primary as
+ backups, using the URL provided in the <literal>ha-brokers</literal>
+ configuration option.
</para>
<para>
- Once connected, the backup brokers synchronize their state with the primary. When a backup is
- synchronized, or "hot", it is ready to take over if the primary fails. Backup brokers
- continually receive updates from the primary in order to stay synchronized.
+ Once connected, the backup brokers synchronize their state with the
+ primary. When a backup is synchronized, or "hot", it is ready to take
+ over if the primary fails. Backup brokers continually receive updates
+ from the primary in order to stay synchronized.
</para>
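The backup life-cycle described above can be sketched as a small state machine. The state and event names here are this sketch's own, purely illustrative labels, not qpidd's actual internal states:

```python
# Illustrative transitions for a backup broker; anything else leaves the state unchanged.
TRANSITIONS = {
    ("connecting", "connected"): "catching-up",      # start receiving updates from primary
    ("catching-up", "synchronized"): "hot",          # ready to take over
    ("hot", "primary-failed"): "failing-over",       # wait for the resource manager
    ("failing-over", "promoted"): "primary",         # this broker becomes primary
    ("failing-over", "new-primary-found"): "connecting",  # back up the new primary
}

def next_state(state, event):
    return TRANSITIONS.get((state, event), state)  # ignore irrelevant events

s = "connecting"
for e in ["connected", "synchronized", "primary-failed", "promoted"]:
    s = next_state(s, e)
assert s == "primary"
```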
<para>
- If the primary fails, backup brokers go into fail-over mode. The resource manager must detect
- the failure and promote one of the backups to be the new primary. The other backups connect
- to the new primary and synchronize their state so they can be backups for it.
+ If the primary fails, backup brokers go into fail-over mode. The resource
+ manager must detect the failure and promote one of the backups to be the
+ new primary. The other backups connect to the new primary and synchronize
+ their state so they can be backups for it.
</para>
<para>
The resource manager is also responsible for protecting the cluster from
@@ -437,65 +478,84 @@ under the License.
<section>
<title>Configuring <command>rgmanager</command> as resource manager</title>
<para>
- This section assumes that you are already familiar with setting up and configuring
- clustered services using <command>cman</command> and <command>rgmanager</command>. It
- will show you how to configure an active-passive, hot-standby <command>qpidd</command>
- HA cluster.
+ This section assumes that you are already familiar with setting up and
+ configuring clustered services using <command>cman</command> and
+ <command>rgmanager</command>. It will show you how to configure an
+ active-passive, hot-standby <command>qpidd</command> HA cluster.
</para>
<para>
- Here is an example <literal>cluster.conf</literal> file for a cluster of 3 nodes named
- mrg32, mrg34 and mrg35. We will go through the configuration step-by-step.
+ Here is an example <literal>cluster.conf</literal> file for a cluster of 3
+ nodes named node1, node2 and node3. We will go through the configuration
+ step-by-step.
</para>
<programlisting>
-<![CDATA[
+ <![CDATA[
<?xml version="1.0"?>
-<cluster alias="qpid-hot-standby" config_version="4" name="qpid-hot-standby">
+<!--
+This is an example of a cluster.conf file to run qpidd HA under rgmanager.
+This example assumes a 3 node cluster, with nodes named node1, node2 and node3.
+-->
+
+<cluster name="qpid-test" config_version="18">
+ <!-- The cluster has 3 nodes. Each has a unique nodeid and one vote for quorum. -->
<clusternodes>
- <clusternode name="mrg32" nodeid="1">
+ <clusternode name="node1" nodeid="1">
<fence/>
</clusternode>
- <clusternode name="mrg34" nodeid="2">
+ <clusternode name="node2" nodeid="2">
<fence/>
</clusternode>
- <clusternode name="mrg35" nodeid="3">
+ <clusternode name="node3" nodeid="3">
<fence/>
</clusternode>
</clusternodes>
- <cman/>
- <rm log_level="7" <!-- Verbose logging -->
- central_processing="1"> <!-- TODO explain-->
+ <!-- Resource Manager configuration. -->
+ <rm log_level="7"> <!-- Verbose logging -->
+ <!--
+ There is a failoverdomain for each node containing just that node.
+ This lets us stipulate that the qpidd service should always run on all nodes.
+ -->
<failoverdomains>
- <failoverdomain name="mrg32-domain" restricted="1">
- <failoverdomainnode name="mrg32"/>
+ <failoverdomain name="node1-domain" restricted="1">
+ <failoverdomainnode name="node1"/>
</failoverdomain>
- <failoverdomain name="mrg34-domain" restricted="1">
- <failoverdomainnode name="mrg34"/>
+ <failoverdomain name="node2-domain" restricted="1">
+ <failoverdomainnode name="node2"/>
</failoverdomain>
- <failoverdomain name="mrg35-domain" restricted="1">
- <failoverdomainnode name="mrg35"/>
+ <failoverdomain name="node3-domain" restricted="1">
+ <failoverdomainnode name="node3"/>
</failoverdomain>
</failoverdomains>
+
<resources>
- <script file="/etc/init.d/qpidd" name="qpidd"/>
- <script file="/etc/init.d/qpidd-primary" name="qpidd-primary"/>
+ <!-- This script starts a qpidd broker acting as a backup. -->
+ <script file="!!sysconfdir!!/init.d/qpidd" name="qpidd"/>
+
+ <!-- This script promotes the qpidd broker on this node to primary. -->
+ <script file="!!sysconfdir!!/init.d/qpidd-primary" name="qpidd-primary"/>
+
+ <!-- This is a virtual IP address for client traffic. -->
 <ip address="20.0.10.200" monitor_link="1"/>
+
+ <!-- This is a virtual IP address on a separate network for broker replication traffic. -->
 <ip address="20.0.20.200" monitor_link="1"/>
</resources>
<!-- There is a qpidd service on each node, it should be restarted if it fails. -->
- <service name="mrg32-qpidd-service" domain="mrg32-domain" recovery="restart">
+ <service name="node1-qpidd-service" domain="node1-domain" recovery="restart">
<script ref="qpidd"/>
</service>
- <service name="mrg34-qpidd-service" domain="mrg34-domain" recovery="restart">
+ <service name="node2-qpidd-service" domain="node2-domain" recovery="restart">
<script ref="qpidd"/>
</service>
- <service name="mrg35-qpidd-service" domain="mrg35-domain" recovery="restart">
+ <service name="node3-qpidd-service" domain="node3-domain" recovery="restart">
<script ref="qpidd"/>
</service>
<!-- There should always be a single qpidd-primary service, it can run on any node. -->
<service name="qpidd-primary-service" autostart="1" exclusive="0" recovery="relocate">
<script ref="qpidd-primary"/>
+ <!-- The primary has the IP addresses that brokers and clients use to connect. -->
<ip ref="20.0.10.200"/>
<ip ref="20.0.20.200"/>
</service>
@@ -503,7 +563,7 @@ under the License.
<fencedevices/>
<fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
</cluster>
-]]>
+ ]]>
</programlisting>
<para>
There is a <literal>failoverdomain</literal> for each node containing just that
@@ -511,32 +571,48 @@ under the License.
nodes.
</para>
<para>
- The <literal>resources</literal> section defines the usual initialization script to
- start the <command>qpidd</command> service. <command>qpidd</command>. It also
- defines the <command>qpid-primary</command> script. Starting this script does not
+ The <literal>resources</literal> section defines the usual initialization
+ script to start the <command>qpidd</command> service. It also defines
+ the <command>qpidd-primary</command> script. Starting this script does not
actually start a new service, rather it promotes the existing
<command>qpidd</command> broker to primary status.
</para>
<para>
The <literal>resources</literal> section also defines a pair of virtual IP
addresses on different sub-nets. One will be used for broker-to-broker
- communication, the other for client-to-broker.
+ communication, the other for client-to-broker.
</para>
<para>
- The <literal>service</literal> section defines 3 <command>qpidd</command> services,
- one for each node. Each service is in a restricted fail-over domain containing just
- that node, and has the <literal>restart</literal> recovery policy. The effect of
- this is that rgmanager will run <command>qpidd</command> on each node, restarting if
- it fails.
+ To take advantage of the virtual IP addresses, <filename>qpidd.conf</filename>
+ should contain these lines:
+ </para>
+ <programlisting>
+ ha-cluster=yes
+ ha-brokers=20.0.20.200
+ ha-public-brokers=20.0.10.200
+ </programlisting>
+ <para>
+ This configuration specifies that backup brokers will use 20.0.20.200
+ to connect to the primary and will advertise 20.0.10.200 to clients.
+ Clients should connect to 20.0.10.200.
+ </para>
+ <para>
+ The <literal>service</literal> section defines 3 <command>qpidd</command>
+ services, one for each node. Each service is in a restricted fail-over
+ domain containing just that node, and has the <literal>restart</literal>
+ recovery policy. The effect of this is that rgmanager will run
+ <command>qpidd</command> on each node, restarting if it fails.
</para>
<para>
There is a single <literal>qpidd-primary-service</literal> running the
- <command>qpidd-primary</command> script which is not restricted to a domain and has
- the <literal>relocate</literal> recovery policy. This means rgmanager will start
- <command>qpidd-primary</command> on one of the nodes when the cluster starts and
- will relocate it to another node if the original node fails. Running the
- <literal>qpidd-primary</literal> script does not actually start a new process,
- rather it promotes the existing broker to become the primary.
+ <command>qpidd-primary</command> script which is not restricted to a
+ domain and has the <literal>relocate</literal> recovery policy. This means
+ rgmanager will start <command>qpidd-primary</command> on one of the nodes
+ when the cluster starts and will relocate it to another node if the
+ original node fails. Running the <literal>qpidd-primary</literal> script
+ does not start a new broker process; it promotes the existing broker to
+ become the primary.
</para>
</section>
@@ -556,14 +632,6 @@ under the License.
<command>qpid-stat</command> will connect to a backup if you pass the flag <command>--ha-admin</command> on the
command line.
</para>
- <para>
- To promote a broker to primary use the following command:
- <programlisting>
- qpid-ha promote -b <replaceable>host</replaceable>:<replaceable>port</replaceable>
- </programlisting>
- The resource manager must ensure that it does not promote a broker to primary when
- there is already a primary in the cluster.
- </para>
</section>
</section>