author | Alan Conway <aconway@apache.org> | 2012-03-27 14:49:47 +0000 |
---|---|---|
committer | Alan Conway <aconway@apache.org> | 2012-03-27 14:49:47 +0000 |
commit | 012f33fd105fb0838898bb66a25823aaf07a9704 (patch) | |
tree | cf3728250f631f59113de088de4a348d0162bafc | |
parent | 6fee1f87a9258d962b9bc3de7afad343ff1838b9 (diff) | |
download | qpid-python-012f33fd105fb0838898bb66a25823aaf07a9704.tar.gz |
QPID-3603: Update new HA docs with information on rgmanager, more detail about client connections.
git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1305855 13f79535-47bb-0310-9956-ffa450edef68
-rw-r--r-- | qpid/doc/book/src/Active-Passive-Cluster.xml | 497 |
1 files changed, 353 insertions, 144 deletions
diff --git a/qpid/doc/book/src/Active-Passive-Cluster.xml b/qpid/doc/book/src/Active-Passive-Cluster.xml index 3eaadad51e..266fd3551d 100644 --- a/qpid/doc/book/src/Active-Passive-Cluster.xml +++ b/qpid/doc/book/src/Active-Passive-Cluster.xml @@ -27,66 +27,62 @@ under the License. <section> <title>Overview</title> <para> - This release provides a preview of a new module for High Availability (HA). The new - module is not yet complete or ready for production use, it being made available so - that users can experiment with the new approach and provide feedback early in the - development process. Feedback should go to <ulink - url="mailto:user@qpid.apache.org">user@qpid.apache.org</ulink>. + This release provides a preview of a new module for High Availability (HA). The new module is + not yet complete or ready for production use. It is being made available so that users can + experiment with the new approach and provide feedback early in the development process. + Feedback should go to <ulink url="mailto:user@qpid.apache.org">user@qpid.apache.org</ulink>. </para> <para> - The old cluster module takes an <firstterm>active-active</firstterm> approach, - i.e. all the brokers in a cluster are able to handle client requests - simultaneously. The new HA module takes an <firstterm>active-passive</firstterm>, - <firstterm>hot-standby</firstterm> approach. + The old cluster module takes an <firstterm>active-active</firstterm> approach, i.e. all the + brokers in a cluster are able to handle client requests simultaneously. The new HA module + takes an <firstterm>active-passive</firstterm>, <firstterm>hot-standby</firstterm> approach. </para> <para> - In an active-passive cluster, only one broker, known as the - <firstterm>primary</firstterm>, is active and serving clients at a time. The other - brokers are standing by as <firstterm>backups</firstterm>. Changes on the primary - are immediately replicated to all the backups so they are always up-to-date or - "hot". If the primary fails, one of the backups is promoted to be the new - primary. Clients fail-over to the new primary automatically. If there are multiple - backups, the backups also fail-over to become backups of the new primary. + In an active-passive cluster, only one broker, known as the <firstterm>primary</firstterm>, is + active and serving clients at a time. The other brokers are standing by as + <firstterm>backups</firstterm>. Changes on the primary are immediately replicated to all the + backups so they are always up-to-date or "hot". If the primary fails, one of the backups is + promoted to take over as the new primary. Clients fail-over to the new primary + automatically. If there are multiple backups, the backups also fail-over to become backups of + the new primary. Backup brokers reject connection attempts to enforce the requirement that + only the primary be active. </para> <para> - The new approach depends on an external <firstterm>cluster resource - manager</firstterm> to detect failure of the primary and choose the new primary. The - first supported resource manager will be <ulink - url="https://fedorahosted.org/cluster/wiki/RGManager">rgmanager</ulink>, but it will - be possible to add integration with other resource managers in the future. The - preview version is not integrated with any resource manager, you can use the - <command>qpid-ha</command> tool to simulate the actions of a resource manager or do - your own integration.
+ This approach depends on an external <firstterm>cluster resource manager</firstterm> to detect + failures and choose the primary. <ulink + url="https://fedorahosted.org/cluster/wiki/RGManager">Rgmanager</ulink> is supported + initially, but others may be supported in the future. </para> <section> <title>Why the new approach?</title> - The new active-passive approach has several advantages compared to the - existing active-active cluster module. - <itemizedlist> - <listitem> - It does not depend directly on openais or corosync. It does not use multicast - which simplifies deployment. - </listitem> - <listitem> - It is more portable: in environments that don't support corosync, it can be - integrated with a resource manager available in that environment. - </listitem> - <listitem> - Replication to a <firstterm>disaster recovery</firstterm> site can be handled as - simply another node in the cluster, it does not require a separate replication - mechanism. - </listitem> - <listitem> - It can take advantage of features provided by the resource manager, for example - virtual IP addresses. - </listitem> - <listitem> - Improved performance and scalability due to better use of multiple CPU s - </listitem> - </itemizedlist> + <para> + The new active-passive approach has several advantages compared to the + existing active-active cluster module. + <itemizedlist> + <listitem> + It does not depend directly on openais or corosync. It does not use multicast, + which simplifies deployment. + </listitem> + <listitem> + It is more portable: in environments that don't support corosync, it can be + integrated with a resource manager available in that environment. + </listitem> + <listitem> + Replication to a <firstterm>disaster recovery</firstterm> site can be handled as + simply another node in the cluster; it does not require a separate replication + mechanism. + </listitem> + <listitem> + It can take advantage of features provided by the resource manager, for example + virtual IP addresses. + </listitem> + <listitem> + Improved performance and scalability due to better use of multiple CPUs. + </listitem> + </itemizedlist> + </para> </section> <section> - <title>Limitations</title> <para> @@ -96,9 +92,9 @@ under the License. <itemizedlist> <listitem> - Transactional changes to queue state are not replicated atomically. If the - primary crashes during a transaction, it is possible that the backup could - contain only part of the changes introduced by a transaction. + Transactional changes to queue state are not replicated atomically. If the primary crashes + during a transaction, it is possible that the backup could contain only part of the + changes introduced by a transaction. </listitem> <listitem> During a fail-over one backup is promoted to primary and any other backups switch to @@ -107,14 +103,14 @@ under the License. switched. </listitem> <listitem> - When used with a persistent store: if the entire cluster fails, there are no tools - to help identify the most recent store. - </listitem> - <listitem> Acknowledgments are confirmed to clients before the message has been dequeued from replicas or indeed from the local store if that is asynchronous. </listitem> <listitem> + When used with a persistent store: if the entire cluster fails, there are no tools to help + identify the most recent store. + </listitem> + <listitem> A persistent broker must have its store erased before joining an existing cluster.
In the production version a persistent broker will be able to load its store and avoid downloading messages that are in the store from the primary. @@ -149,18 +145,32 @@ under the License. </section> </section> - + <section> + <title>Virtual IP Addresses</title> + <para> + Some resource managers (including <command>rgmanager</command>) support <firstterm>virtual IP + addresses</firstterm>. A virtual IP address is an IP address that can be relocated to any of + the nodes in a cluster. The resource manager associates this address with the primary node in + the cluster, and relocates it to the new primary when there is a failure. This simplifies + configuration as you can publish a single IP address rather than a list. + </para> + <para> + A virtual IP address can be used by clients to connect to the primary, and also by backup + brokers when they connect to the primary. The following sections will explain how to configure + virtual IP addresses for clients or brokers. + </para> + </section> <section> <title>Configuring the Brokers</title> <para> - The broker must load the <filename>ha</filename> module, it is loaded by default - when you start a broker. The following broker options are available for the HA module. + The broker must load the <filename>ha</filename> module; it is loaded by default. The + following broker options are available for the HA module. </para> <table frame="all" id="ha-broker-options"> <title>Options for High Availability Messaging Cluster</title> <tgroup align="left" cols="2" colsep="1" rowsep="1"> <colspec colname="c1" colwidth="1*"/> - <colspec colname="c2" colwidth="4*"/> + <colspec colname="c2" colwidth="3*"/> <thead> <row> <entry align="center" nameend="c2" namest="c1"> @@ -171,7 +181,7 @@ under the License. <tbody> <row> <entry> - <command>--ha-cluster <replaceable>yes|no</replaceable></command> + <literal>--ha-cluster <replaceable>yes|no</replaceable></literal> </entry> <entry> Set to "yes" to have the broker join a cluster. @@ -179,7 +189,7 @@ </row> <row> <entry> - <command>--ha-brokers <replaceable>URL</replaceable></command> + <literal>--ha-brokers <replaceable>URL</replaceable></literal> </entry> <entry> URL used by brokers to connect to each other. The URL lists the addresses of @@ -201,19 +211,19 @@ under the License. </entry> </row> <row> - <entry> <command>--ha-public-brokers <replaceable>URL</replaceable></command> </entry> + <entry> <literal>--ha-public-brokers <replaceable>URL</replaceable></literal> </entry> <entry> URL used by clients to connect to the brokers in the same format as - <command>--ha-brokers</command> above. Use this option if you want client + <literal>--ha-brokers</literal> above. Use this option if you want client traffic on a different network from broker replication traffic. If this option is not set, clients will use the same URL as brokers. </entry> </row> <row> <entry> - <para><command>--ha-username <replaceable>USER</replaceable></command></para> - <para><command>--ha-password <replaceable>PASS</replaceable></command></para> - <para><command>--ha-mechanism <replaceable>MECH</replaceable></command></para> + <para><literal>--ha-username <replaceable>USER</replaceable></literal></para> + <para><literal>--ha-password <replaceable>PASS</replaceable></literal></para> + <para><literal>--ha-mechanism <replaceable>MECH</replaceable></literal></para> </entry> <entry> Brokers use <replaceable>USER</replaceable>,
</tgroup> </table> <para> - To configure a cluster you must set at least <command>ha-cluster</command> and <command>ha-brokers</command> + To configure a cluster, you must set at least <literal>ha-cluster</literal> and <literal>ha-brokers</literal>. </para>
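+ <para> + For example, the brokers in a three-node cluster might be started like this (an illustrative sketch only; the node names and virtual IP address are borrowed from the <literal>cluster.conf</literal> example later in this document, and in practice the resource manager starts the brokers): + </para> + <programlisting> + # Start an HA broker on each of the three nodes. + qpidd --ha-cluster=yes --ha-brokers="mrg32,mrg34,mrg35" + # If the resource manager assigns a virtual IP address to the primary, + # backups can instead be pointed at that single address: + qpidd --ha-cluster=yes --ha-brokers="20.0.10.200" + </programlisting>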
</section> - <section> <title>Creating replicated queues and exchanges</title> <para> To create a replicated queue or exchange, pass the argument - <command>qpid.replicate</command> when creating the queue or exchange. It should + <literal>qpid.replicate</literal> when creating the queue or exchange. It should have one of the following three values: <itemizedlist> <listitem> @@ -249,113 +258,313 @@ under the License. </listitem> </itemizedlist> </para> - Bindings are automatically replicated if the queue and exchange being bound both have - replication argument of <command>all</command> or <command>confguration</command>, they are - not replicated otherwise. + <para> + Bindings are automatically replicated if the queue and exchange being bound both have a + replication argument of <literal>all</literal> or <literal>configuration</literal>; they are + not replicated otherwise. + </para> + <para> + You can create replicated queues and exchanges with the <command>qpid-config</command> + management tool like this: + <programlisting> + qpid-config add queue myqueue --replicate all + </programlisting> + </para> + <para> + To create replicated queues and exchanges via the client API, add a <literal>node</literal> entry to the address like this: + <programlisting> + "myqueue;{create:always,node:{x-declare:{arguments:{'qpid.replicate':all}}}}" + </programlisting> + </para> + </section> - You can create replicated queues and exchanges with the <command>qpid-config</command> - management tool like this: - <programlisting> - qpid-config add queue myqueue --replicate all - </programlisting> + <section> + <title>Client Connection and Fail-over</title> + <para> + Clients can only connect to the primary broker. Backup brokers automatically reject any + connection attempt by a client. + </para> + <para> + Clients are configured with the URL for the cluster. There are two possibilities: + <itemizedlist> + <listitem> The URL contains multiple addresses, one for each broker in the cluster.</listitem> + <listitem> + The URL contains a single <firstterm>virtual IP address</firstterm> that is assigned to the primary broker by the resource manager. + <footnote><para>Only if the resource manager supports virtual IP addresses.</para></footnote> + </listitem> + </itemizedlist> + In the first case, clients will repeatedly re-try each address in the URL until they + successfully connect to the primary. In the second case, the resource manager will assign the + virtual IP address to the primary broker, so clients only need to re-try on a single address. + </para> + <para> + When the primary broker fails, all clients are disconnected. They go back to re-trying until + they connect to the new primary. Any messages that have been sent by the client, but not yet + acknowledged as delivered, are resent. Similarly, messages that have been sent by the broker, + but not acknowledged, are re-queued. + </para> + <para> + Suppose your cluster has 3 nodes: <literal>node1</literal>, <literal>node2</literal> + and <literal>node3</literal>, all using the default AMQP port. To connect a client, you + need to specify the address(es) and set the <literal>reconnect</literal> property to + <literal>true</literal>. Here's how to connect each type of client: + </para> + <section> + <title>C++ clients</title> + <para> + With the C++ client, you specify multiple cluster addresses in a single URL + <footnote> + <para> + The full grammar for the URL is: + <programlisting> + url = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)* + addr = tcp_addr / rdma_addr / ssl_addr / ... + tcp_addr = ["tcp:"] host [":" port] + rdma_addr = "rdma:" host [":" port] + ssl_addr = "ssl:" host [":" port] + </programlisting> + </para> + </footnote>. You also + need to set the connection option <literal>reconnect</literal> to true. For + example: + <programlisting> + qpid::messaging::Connection c("node1,node2,node3","{reconnect:true}"); + </programlisting> + </para> + </section> + <section> + <title>Python clients</title> + <para> + With the Python client, you specify <literal>reconnect=True</literal> and a list of + <replaceable>host:port</replaceable> addresses as <literal>reconnect_urls</literal> + when calling <literal>Connection.establish</literal> or <literal>Connection.open</literal>: + <programlisting> + connection = qpid.messaging.Connection.establish("node1", reconnect=True, reconnect_urls=["node1", "node2", "node3"]) + </programlisting> + </para> + </section> + <section> + <title>Java JMS Clients</title> + <para> + In Java JMS clients, client fail-over is handled automatically if it is enabled in the + connection. You can configure a connection to use fail-over using the + <command>failover</command> property: + </para> - To create replicated queues and exchangs via the client API, add a <command>node</command> entry to the address like this: - <programlisting> - "myqueue;{create:always,node:{x-declare:{arguments:{'qpid.replicate':all}}}}" - </programlisting> - </section> + <screen> + connectionfactory.qpidConnectionfactory = amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672'&failover='failover_exchange' + </screen> + <para> + This property can take three values: + </para> + <variablelist> + <title>Fail-over Modes</title> + <varlistentry> + <term>failover_exchange</term> + <listitem> + <para> + If the connection fails, fail over to any other broker in the cluster. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term>roundrobin</term> + <listitem> + <para> + If the connection fails, fail over to one of the brokers specified in the <command>brokerlist</command>. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term>singlebroker</term> + <listitem> + <para> + Fail-over is not supported; the connection is to a single broker only. + </para> + </listitem> + </varlistentry> + </variablelist> + <para> + In a Connection URL, heartbeat is set using the <command>idle_timeout</command> property, which is an integer corresponding to the heartbeat period in seconds. For instance, the following line from a JNDI properties file sets the heartbeat timeout to 3 seconds: + </para> + <screen> + connectionfactory.qpidConnectionfactory = amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672',idle_timeout=3 + </screen> + </section> + </section> <section> - <title>Client Fail-over</title> + <title>The Cluster Resource Manager</title> <para> - Clients can only connect to the single primary broker. All other brokers in the - cluster are backups, and they automatically reject any attempt by a client to - connect. + Broker fail-over is managed by a <firstterm>cluster resource manager</firstterm>.
An + integration with <ulink + url="https://fedorahosted.org/cluster/wiki/RGManager">rgmanager</ulink> is provided, but it is + possible to integrate with other resource managers. </para> <para> - Clients are configured with the addreses of all of the brokers in the cluster. - <footnote> - <para> - If the resource manager supports virtual IP addresses then the clients - can be configured with a single virtual IP address. - </para> - </footnote> - When the client tries to connect initially, it will try all of its addresses until it - successfully connects to the primary. If the primary fails, clients will try to - try to re-connect to all the known brokers until they find the new primary. + The resource manager is responsible for starting an appropriately configured broker on each + node in the cluster. The resource manager then <firstterm>promotes</firstterm> one of the + brokers to be the primary. The other brokers connect to the primary as backups, using the URL + provided in the <literal>ha-brokers</literal> configuration option. </para> <para> - Suppose your cluster has 3 nodes: <command>node1</command>, <command>node2</command> and <command>node3</command> all using the default AMQP port. + Once connected, the backup brokers synchronize their state with the primary. When a backup is + synchronized, or "hot", it is ready to take over if the primary fails. Backup brokers + continually receive updates from the primary in order to stay synchronized. </para> <para> - With the C++ client, you specify all the cluster addresses in a single URL, for example: - <programlisting> - qpid::messaging::Connection c("node1:node2:node3"); - </programlisting> + If the primary fails, backup brokers go into fail-over mode. The resource manager must detect + the failure and promote one of the backups to be the new primary. The other backups connect + to the new primary and synchronize their state so they can be backups for it. </para> <para> - With the python client, you specify <command>reconnect=True</command> and a list of <replaceable>host:port</replaceable> addresses as <command>reconnect_urls</command> when calling <command>establish</command> or <command>open</command> - <programlisting> - connection = qpid.messaging.Connection.establish("node1", reconnect=True, "reconnect_urls=["node1", "node2", "node3"]) - </programlisting> + The resource manager is also responsible for protecting the cluster from + <firstterm>split-brain</firstterm> conditions resulting from a network partition. + A network partition divides a cluster into two sub-groups which cannot see each other. + Usually a <firstterm>quorum</firstterm> voting algorithm is used that disables + nodes in the inquorate sub-group. </para> </section> - <section> - <title>Broker fail-over</title> + <title>Configuring <command>rgmanager</command> as resource manager</title> + <para> + This section assumes that you are already familiar with setting up and configuring + clustered services using <command>cman</command> and <command>rgmanager</command>. It + will show you how to configure an active-passive, hot-standby <command>qpidd</command> + HA cluster. + </para> <para> - Broker fail-over is managed by a <firstterm>cluster resource - manager</firstterm>. The initial preview version of HA is not integrated with a - resource manager, the production version will be integrated with <ulink - url="https://fedorahosted.org/cluster/wiki/RGManager">rgmanager</ulink> and it may - be integrated with other resource managers in the future.
+ Here is an example <literal>cluster.conf</literal> file for a cluster of 3 nodes named + mrg32, mrg34 and mrg35. We will go through the configuration step-by-step. </para> <programlisting> +<![CDATA[ +<?xml version="1.0"?> +<cluster alias="qpid-hot-standby" config_version="4" name="qpid-hot-standby"> + <clusternodes> + <clusternode name="mrg32" nodeid="1"> + <fence/> + </clusternode> + <clusternode name="mrg34" nodeid="2"> + <fence/> + </clusternode> + <clusternode name="mrg35" nodeid="3"> + <fence/> + </clusternode> + </clusternodes> + <cman/> + <rm> + <failoverdomains> + <failoverdomain name="mrg32-domain" restricted="1"> + <failoverdomainnode name="mrg32"/> + </failoverdomain> + <failoverdomain name="mrg34-domain" restricted="1"> + <failoverdomainnode name="mrg34"/> + </failoverdomain> + <failoverdomain name="mrg35-domain" restricted="1"> + <failoverdomainnode name="mrg35"/> + </failoverdomain> + </failoverdomains> + <resources> + <script file="/etc/init.d/qpidd" name="qpidd"/> + <script file="/etc/init.d/qpidd-primary" name="qpidd-primary"/> + <ip address="20.0.10.200" monitor_link="1"/> + <ip address="20.0.20.200" monitor_link="1"/> + </resources> + + <!-- There is a qpidd service on each node; it should be restarted if it fails. --> + <service name="mrg32-qpidd-service" domain="mrg32-domain" recovery="restart"> + <script ref="qpidd"/> + </service> + <service name="mrg34-qpidd-service" domain="mrg34-domain" recovery="restart"> + <script ref="qpidd"/> + </service> + <service name="mrg35-qpidd-service" domain="mrg35-domain" recovery="restart"> + <script ref="qpidd"/> + </service> + + <!-- There should always be a single qpidd-primary service; it can run on any node. --> + <service name="qpidd-primary-service" autostart="1" exclusive="0" recovery="relocate"> + <script ref="qpidd-primary"/> + <ip ref="20.0.10.200"/> + <ip ref="20.0.20.200"/> + </service> + </rm> + <fencedevices/> + <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/> +</cluster> +]]> </programlisting> <para> - The resource manager is responsible for ensuring that there is exactly one broker - is acting as primary at all times. It selects the initial primary broker when the - cluster is started, detects failure of the primary, and chooses the backup to - promote as the new primary. + There is a <literal>failoverdomain</literal> for each node containing just that + one node. This lets us stipulate that the qpidd service should always run on all + nodes. </para> <para> - You can simulate the actions of a resource manager, or indeed do your own - integration with a resource manager using the <command>qpid-ha</command> tool. The - command - <programlisting> - qpid-ha promote -b <replaceable>host</replaceable>:<replaceable>port</replaceable> - </programlisting> - will promote the broker listening on - <replaceable>host</replaceable>:<replaceable>port</replaceable> to be the primary. - You should only promote a broker to primary when there is no other primary in the - cluster. The brokers will not detect multiple primaries, they rely on the resource - manager to do that. + The <literal>resources</literal> section defines the usual initialization script to + start the <command>qpidd</command> service. It also + defines the <command>qpidd-primary</command> script. Starting this script does not + actually start a new service; rather, it promotes the existing + <command>qpidd</command> broker to primary status. </para>
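+ <para> + For illustration, the promotion performed by the <command>qpidd-primary</command> start action + could be as simple as the following sketch (an assumption for clarity only; the script actually + shipped may differ, and the local broker is assumed to listen on the default port): + </para> + <programlisting> + #!/bin/sh + # Sketch of a qpidd-primary "start" action: promote the already-running local broker. + case "$1" in + start) exec qpid-ha promote -b localhost:5672 ;; # address assumed + stop) exit 0 ;; # demotion/stop handling omitted in this sketch + esac + </programlisting>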
<para> - A clustered broker always starts initially in <firstterm>discovery</firstterm> - mode. It uses the addresses configured in the <command>ha-brokers</command> - configuration option and tries to connect to each in turn until it finds to the - primary. The resource manager is responsible for choosing on of the backups to - promote as the initial primary. + The <literal>resources</literal> section also defines a pair of virtual IP + addresses on different sub-nets. One will be used for broker-to-broker + communication, the other for client-to-broker. </para> <para> - If the primary fails, all the backups are disconnected and return to discovery mode. - The resource manager chooses one to promote as the new primary. The other backups - will eventually discover the new primary and reconnect. + The <literal>service</literal> section defines 3 <command>qpidd</command> services, + one for each node. Each service is in a restricted fail-over domain containing just + that node, and has the <literal>restart</literal> recovery policy. The effect of + this is that rgmanager will run <command>qpidd</command> on each node, restarting it if + it fails. + </para> + <para> + There is a single <literal>qpidd-primary-service</literal> running the + <command>qpidd-primary</command> script, which is not restricted to a domain and has + the <literal>relocate</literal> recovery policy. This means rgmanager will start + <command>qpidd-primary</command> on one of the nodes when the cluster starts and + will relocate it to another node if the original node fails. Running the + <literal>qpidd-primary</literal> script does not actually start a new process; + rather, it promotes the existing broker to become the primary. </para> </section> + <section> <title>Broker Administration</title> <para> - You can connect to a backup broker with the administrative tool - <command>qpid-ha</command>. You can also connect with the tools - <command>qpid-config</command>, <command>qpid-route</command> and - <command>qpid-stat</command> if you pass the flag <command>--ha-admin</command> on the - command line. If you do connect to a backup you should not modify any of the - replicated queues, as this will disrupt the replication and may result in - message loss. + Normally, clients are not allowed to connect to a backup broker. However, management tools are + allowed to connect to backup brokers. If you use these tools you <emphasis>must + not</emphasis> add or remove messages from replicated queues, or delete replicated queues or + exchanges, as this will corrupt the replication process and may cause message loss. + </para> + <para> + <command>qpid-ha</command> allows you to view and change HA configuration settings. + </para> + <para> + The tools <command>qpid-config</command>, <command>qpid-route</command> and + <command>qpid-stat</command> will connect to a backup if you pass the flag <command>--ha-admin</command> on the + command line. + </para> + <para> + To promote a broker to primary, use the following command: + <programlisting> + qpid-ha promote -b <replaceable>host</replaceable>:<replaceable>port</replaceable> + </programlisting> + The resource manager must ensure that it does not promote a broker to primary when + there is already a primary in the cluster. </para> </section> </section> -<!-- LocalWords: scalability rgmanager multicast RGManager mailto LVQ + +<!-- LocalWords: scalability rgmanager multicast RGManager mailto LVQ qpidd IP dequeued Transactional username -->
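+ <para> + As a usage illustration for the administration commands above (a sketch; the host names come + from the earlier <literal>cluster.conf</literal> example, and the exact broker-address syntax + may vary with the tool version): + </para> + <programlisting> + # Inspect queues on a backup broker; ordinary clients would be rejected. + qpid-stat -q --ha-admin mrg34 + # Promote a broker to primary, only when no other primary exists. + qpid-ha promote -b mrg35:5672 + </programlisting>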