author    Alan Conway <aconway@apache.org>  2012-03-28 16:24:52 +0000
committer Alan Conway <aconway@apache.org>  2012-03-28 16:24:52 +0000
commit    cb0c0dbd13a27b535623ca0fdd6ef59f6f13622a (patch)
tree      a09a3060f577b006720c740bf2c733937ce5e8f8
parent    ed41b499bbe7f2a51afb204799724227b8fca4f3 (diff)
QPID-3603: Update HA documentation: example of virtual IP addresses
git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1306454 13f79535-47bb-0310-9956-ffa450edef68
 qpid/cpp/etc/cluster.conf-example.xml.in     |  20
 qpid/doc/book/src/Active-Passive-Cluster.xml | 282
 2 files changed, 187 insertions, 115 deletions
diff --git a/qpid/cpp/etc/cluster.conf-example.xml.in b/qpid/cpp/etc/cluster.conf-example.xml.in
index dbeb3af537..eb70ebbb1e 100644
--- a/qpid/cpp/etc/cluster.conf-example.xml.in
+++ b/qpid/cpp/etc/cluster.conf-example.xml.in
@@ -7,14 +7,18 @@ This example assumes a 3 node cluster, with nodes named node1, node2 and node3.
 <cluster name="qpid-test" config_version="18">
   <!-- The cluster has 3 nodes. Each has a unique nodid and one vote for quorum. -->
   <clusternodes>
-    <clusternode name="node1" nodeid="1"/>
-    <clusternode name="node2" nodeid="2"/>
-    <clusternode name="node3" nodeid="3"/>
+    <clusternode name="node1" nodeid="1">
+      <fence/>
+    </clusternode>
+    <clusternode name="node2" nodeid="2">
+      <fence/>
+    </clusternode>
+    <clusternode name="node3" nodeid="3">
+      <fence/>
+    </clusternode>
   </clusternodes>
-  <!-- Resouce Manager configuration.
-       TODO explain central_processing="1"
-  -->
-  <rm log_level="7" central_processing="1">
+  <!-- Resouce Manager configuration. -->
+  <rm log_level="7"> <!-- Verbose logging -->
     <!--
       There is a failoverdomain for each node containing just that node.
       This lets us stipulate that the qpidd service should always run on all nodes.
@@ -59,7 +63,7 @@ This example assumes a 3 node cluster, with nodes named node1, node2 and node3.
   <!-- There should always be a single qpidd-primary service, it can run on any node. -->
   <service name="qpidd-primary-service" autostart="1" exclusive="0" recovery="relocate">
     <script ref="qpidd-primary"/>
-    <!-- The primary has the IP addresses for brokers and clients. -->
+    <!-- The primary has the IP addresses for brokers and clients to connect.
+    -->
     <ip ref="20.0.10.200"/>
     <ip ref="20.0.20.200"/>
   </service>
diff --git a/qpid/doc/book/src/Active-Passive-Cluster.xml b/qpid/doc/book/src/Active-Passive-Cluster.xml
index 5ab515f235..52748b2570 100644
--- a/qpid/doc/book/src/Active-Passive-Cluster.xml
+++ b/qpid/doc/book/src/Active-Passive-Cluster.xml
@@ -13,7 +13,7 @@ http://www.apache.org/licenses/LICENSE-2.0
 Unless required by applicable law or agreed to in writing,
 software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+h"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 KIND, either express or implied. See the License for the
 specific language governing permissions and limitations
 under the License.
@@ -148,15 +148,17 @@ under the License.
   <section>
     <title>Virtual IP Addresses</title>
     <para>
-      Some resource managers (including <command>rgmanager</command>) support <firstterm>virtual IP
-      addresses</firstterm>. A virtual IP address is an IP address that can be relocated to any of
-      the nodes in a cluster. The resource manager associates this address with the primary node in
-      the cluster, and relocates it to the new primary when there is a failure. This simplifies
-      configuration as you can publish a single IP address rather than a list.
+      Some resource managers (including <command>rgmanager</command>) support
+      <firstterm>virtual IP addresses</firstterm>. A virtual IP address is an IP
+      address that can be relocated to any of the nodes in a cluster. The
+      resource manager associates this address with the primary node in the
+      cluster, and relocates it to the new primary when there is a failure. This
+      simplifies configuration as you can publish a single IP address rather
+      than a list.
     </para>
     <para>
-      A virtual IP address can be used by clients to connect to the primary, and also by backup
-      brokers when they connect to the primary.
-      The following sections will explain how to configure
+      A virtual IP address can be used by clients and backup brokers to connect
+      to the primary. The following sections will explain how to configure
       virtual IP addresses for clients or brokers.
     </para>
   </section>
@@ -266,42 +268,61 @@ under the License.
     <para>
       You can create replicated queues and exchanges with the <command>qpid-config</command>
       management tool like this:
-      <programlisting>
-        qpid-config add queue myqueue --replicate all
-      </programlisting>
     </para>
+    <programlisting>
+      qpid-config add queue myqueue --replicate all
+    </programlisting>
     <para>
       To create replicated queues and exchanges via the client API, add a <literal>node</literal>
       entry to the address like this:
-      <programlisting>
-        "myqueue;{create:always,node:{x-declare:{arguments:{'qpid.replicate':all}}}}"
-      </programlisting>
     </para>
+    <programlisting>
+      "myqueue;{create:always,node:{x-declare:{arguments:{'qpid.replicate':all}}}}"
+    </programlisting>
   </section>

   <section>
     <title>Client Connection and Fail-over</title>
     <para>
-      Clients can only connect to the primary broker. Backup brokers automatically reject any
-      connection attempt by a client.
+      Clients can only connect to the primary broker. Backup brokers
+      automatically reject any connection attempt by a client.
     </para>
     <para>
-      Clients are configured with the URL for the cluster. There are two possibilities
+      Clients are configured with the URL for the cluster (details below for
+      each type of client). There are two possibilities
       <itemizedlist>
-        <listitem> The URL contains multiple addresses, one for each broker in the cluster.</listitem>
         <listitem>
-          The URL contains a single <firstterm>virtual IP address</firstterm> that is assigned to the primary broker by the resource manager.
+          The URL contains multiple addresses, one for each broker in the cluster.
+        </listitem>
+        <listitem>
+          The URL contains a single <firstterm>virtual IP address</firstterm>
+          that is assigned to the primary broker by the resource manager.
           <footnote><para>Only if the resource manager supports virtual IP addresses</para></footnote>
         </listitem>
       </itemizedlist>
-      In the first case, clients will repeatedly re-try each address in the URL until they
-      successfully connect to the primary. In the second case the resource manager will assign the
-      virtual IP address to the primary broker, so clients only need to re-try on a single address.
+      In the first case, clients will repeatedly re-try each address in the URL
+      until they successfully connect to the primary. In the second case the
+      resource manager will assign the virtual IP address to the primary broker,
+      so clients only need to re-try on a single address.
     </para>
     <para>
-      When the primary broker fails all clients are disconnected. They go back to re-trying until
-      they connect to the new primary. Any messages that have been sent by the client, but not yet
-      acknowledged as delivered, are resent. Similarly messages that have been sent by the broker,
-      but not acknowledged, are re-queued.
+      When the primary broker fails, clients re-try all known cluster addresses
+      until they connect to the new primary. The client re-sends any messages
+      that were previously sent but not acknowledged by the broker at the time
+      of the failure. Similarly messages that have been sent by the broker, but
+      not acknowledged by the client, are re-queued.
+    </para>
+    <para>
+      TCP can be slow to detect connection failures. A client can configure a
+      connection to use a <firstterm>heartbeat</firstterm> to detect connection
+      failure, and can specify a time interval for the heartbeat. If heartbeats
+      are in use, failures will be detected no later than twice the heartbeat
+      interval. The following sections explain how to enable heartbeat in each
+      client.
+    </para>
+    <para>
+      See "Cluster Failover" in <citetitle>Programming in Apache
+      Qpid</citetitle> for details on how to keep the client aware of cluster
+      membership.
     </para>
     <para>
       Suppose your cluster has 3 nodes: <literal>node1</literal>, <literal>node2</literal>
@@ -316,39 +337,57 @@ under the License.
       <footnote>
         <para>
           The full grammar for the URL is:
-          <programlisting>
-            url = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)*
-            addr = tcp_addr / rmda_addr / ssl_addr / ...
-            tcp_addr = ["tcp:"] host [":" port]
-            rdma_addr = "rdma:" host [":" port]
-            ssl_addr = "ssl:" host [":" port]'
-          </programlisting>
         </para>
-      </footnote>. You also
-      need to specify the connection option <literal>reconnect</literal> to be true. For
-      example:
       <programlisting>
-        qpid::messaging::Connection c("node1,node2,node3","{reconnect:true}");
+        url = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)*
+        addr = tcp_addr / rmda_addr / ssl_addr / ...
+        tcp_addr = ["tcp:"] host [":" port]
+        rdma_addr = "rdma:" host [":" port]
+        ssl_addr = "ssl:" host [":" port]'
       </programlisting>
+      </footnote>
+      You also need to specify the connection option
+      <literal>reconnect</literal> to be true. For example:
+    </para>
+    <programlisting>
+      qpid::messaging::Connection c("node1,node2,node3","{reconnect:true}");
+    </programlisting>
+    <para>
+      Heartbeats are disabled by default. You can enable them by specifying a
+      heartbeat interval (in seconds) for the connection via the
+      <literal>heartbeat</literal> option.
+      For example:
+      <programlisting>
+        qpid::messaging::Connection c("node1,node2,node3","{reconnect:true,heartbeat:10}");
+      </programlisting>
     </para>
   </section>

   <section>
     <title>Python clients</title>
     <para>
-      With the python client, you specify <literal>reconnect=True</literal> and a list of
-      <replaceable>host:port</replaceable> addresses as <literal>reconnect_urls</literal>
-      when calling <literal>Connection.establish</literal> or <literal>Connection.open</literal>
+      With the python client, you specify <literal>reconnect=True</literal>
+      and a list of <replaceable>host:port</replaceable> addresses as
+      <literal>reconnect_urls</literal> when calling
+      <literal>Connection.establish</literal> or
+      <literal>Connection.open</literal>
+    </para>
     <programlisting>
       connection = qpid.messaging.Connection.establish("node1", reconnect=True, reconnect_urls=["node1", "node2", "node3"])
     </programlisting>
+    <para>
+      Heartbeats are disabled by default. You can
+      enable them by specifying a heartbeat interval (in seconds) for the
+      connection via the 'heartbeat' option. For example:
     </para>
+    <programlisting>
+      connection = qpid.messaging.Connection.establish("node1", reconnect=True, reconnect_urls=["node1", "node2", "node3"], heartbeat=10)
+    </programlisting>
   </section>

   <section>
     <title>Java JMS Clients</title>
     <para>
-      In Java JMS clients, client fail-over is handled automatically if it is enabled in the
-      connection. You can configure a connection to use fail-over using the
-      <command>failover</command> property:
+      In Java JMS clients, client fail-over is handled automatically if it is
+      enabled in the connection. You can configure a connection to use
+      fail-over using the <command>failover</command> property:
     </para>
     <screen>
@@ -398,33 +437,35 @@ under the License.
 <screen>
 connectionfactory.qpidConnectionfactory = amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672',idle_timeout=3
 </screen>
-    </section>
   </section>

   <section>
     <title>The Cluster Resource Manager</title>
     <para>
-      Broker fail-over is managed by a <firstterm>cluster resource manager</firstterm>. An
-      integration with <ulink
-      url="https://fedorahosted.org/cluster/wiki/RGManager">rgmanager</ulink> is provided, but it is
-      possible to integrate with other resource managers.
+      Broker fail-over is managed by a <firstterm>cluster resource
+      manager</firstterm>. An integration with <ulink
+      url="https://fedorahosted.org/cluster/wiki/RGManager">rgmanager</ulink> is
+      provided, but it is possible to integrate with other resource managers.
     </para>
     <para>
-      The resource manager is responsible for starting an appropriately-configured broker on each
-      node in the cluster. The resource manager then <firstterm>promotes</firstterm> one of the
-      brokers to be the primary. The other brokers connect to the primary as backups, using the URL
-      provided in the <literal>ha-brokers</literal> configuration option.
+      The resource manager is responsible for starting a broker on each node in the
+      cluster. The resource manager then <firstterm>promotes</firstterm> one of
+      the brokers to be the primary. The other brokers connect to the primary as
+      backups, using the URL provided in the <literal>ha-brokers</literal>
+      configuration option.
     </para>
     <para>
-      Once connected, the backup brokers synchronize their state with the primary. When a backup is
-      synchronized, or "hot", it is ready to take over if the primary fails. Backup brokers
-      continually receive updates from the primary in order to stay synchronized.
+      Once connected, the backup brokers synchronize their state with the
+      primary. When a backup is synchronized, or "hot", it is ready to take
+      over if the primary fails. Backup brokers continually receive updates
+      from the primary in order to stay synchronized.
     </para>
     <para>
-      If the primary fails, backup brokers go into fail-over mode. The resource manager must detect
-      the failure and promote one of the backups to be the new primary. The other backups connect
-      to the new primary and synchronize their state so they can be backups for it.
+      If the primary fails, backup brokers go into fail-over mode. The resource
+      manager must detect the failure and promote one of the backups to be the
+      new primary. The other backups connect to the new primary and synchronize
+      their state so they can be backups for it.
     </para>
     <para>
       The resource manager is also responsible for protecting the cluster from
@@ -437,65 +478,84 @@ under the License.
   <section>
     <title>Configuring <command>rgmanager</command> as resource manager</title>
     <para>
-      This section assumes that you are already familiar with setting up and configuring
-      clustered services using <command>cman</command> and <command>rgmanager</command>. It
-      will show you how to configure an active-passive, hot-standby <command>qpidd</command>
-      HA cluster.
+      This section assumes that you are already familiar with setting up and
+      configuring clustered services using <command>cman</command> and
+      <command>rgmanager</command>. It will show you how to configure an
+      active-passive, hot-standby <command>qpidd</command> HA cluster.
     </para>
     <para>
-      Here is an example <literal>cluster.conf</literal> file for a cluster of 3 nodes named
-      mrg32, mrg34 and mrg35. We will go through the configuration step-by-step.
+      Here is an example <literal>cluster.conf</literal> file for a cluster of 3
+      nodes named node1, node2 and node3. We will go through the configuration
+      step-by-step.
     </para>
     <programlisting>
-<![CDATA[
+      <![CDATA[
 <?xml version="1.0"?>
-<cluster alias="qpid-hot-standby" config_version="4" name="qpid-hot-standby">
+<!--
+This is an example of a cluster.conf file to run qpidd HA under rgmanager.
+This example assumes a 3 node cluster, with nodes named node1, node2 and node3.
+-->
+
+<cluster name="qpid-test" config_version="18">
+  <!-- The cluster has 3 nodes. Each has a unique nodid and one vote for quorum. -->
   <clusternodes>
-    <clusternode name="mrg32" nodeid="1">
+    <clusternode name="node1" nodeid="1">
       <fence/>
     </clusternode>
-    <clusternode name="mrg34" nodeid="2">
+    <clusternode name="node2" nodeid="2">
       <fence/>
     </clusternode>
-    <clusternode name="mrg35" nodeid="3">
+    <clusternode name="node3" nodeid="3">
       <fence/>
     </clusternode>
   </clusternodes>
-  <cman/>
-  <rm log_level="7" <!-- Verbose logging -->
-      central_processing="1"> <!-- TODO explain-->
+  <!-- Resouce Manager configuration. -->
+  <rm log_level="7"> <!-- Verbose logging -->
+  <!--
+    There is a failoverdomain for each node containing just that node.
+    This lets us stipulate that the qpidd service should always run on all nodes.
+  -->
   <failoverdomains>
-    <failoverdomain name="mrg32-domain" restricted="1">
-      <failoverdomainnode name="mrg32"/>
+    <failoverdomain name="node1-domain" restricted="1">
+      <failoverdomainnode name="node1"/>
     </failoverdomain>
-    <failoverdomain name="mrg34-domain" restricted="1">
-      <failoverdomainnode name="mrg34"/>
+    <failoverdomain name="node2-domain" restricted="1">
+      <failoverdomainnode name="node2"/>
     </failoverdomain>
-    <failoverdomain name="mrg35-domain" restricted="1">
-      <failoverdomainnode name="mrg35"/>
+    <failoverdomain name="node3-domain" restricted="1">
+      <failoverdomainnode name="node3"/>
    </failoverdomain>
   </failoverdomains>
+
   <resources>
-    <script file="/etc/init.d/qpidd" name="qpidd"/>
-    <script file="/etc/init.d/qpidd-primary" name="qpidd-primary"/>
+    <!-- This script starts a qpidd broker acting as a backup. -->
+    <script file="!!sysconfdir!!/init.d/qpidd" name="qpidd"/>
+
+    <!-- This script promotes the qpidd broker on this node to primary. -->
+    <script file="!!sysconfdir!!/init.d/qpidd-primary" name="qpidd-primary"/>
+
+    <!-- This is a virtual IP address for broker replication traffic.
+    -->
     <ip address="20.0.10.200" monitor_link="1"/>
+
+    <!-- This is a virtual IP address on a seprate network for client traffic. -->
     <ip address="20.0.20.200" monitor_link="1"/>
   </resources>

   <!-- There is a qpidd service on each node, it should be restarted if it fails. -->
-  <service name="mrg32-qpidd-service" domain="mrg32-domain" recovery="restart">
+  <service name="node1-qpidd-service" domain="node1-domain" recovery="restart">
     <script ref="qpidd"/>
   </service>
-  <service name="mrg34-qpidd-service" domain="mrg34-domain" recovery="restart">
+  <service name="node2-qpidd-service" domain="node2-domain" recovery="restart">
     <script ref="qpidd"/>
   </service>
-  <service name="mrg35-qpidd-service" domain="mrg35-domain" recovery="restart">
+  <service name="node3-qpidd-service" domain="node3-domain" recovery="restart">
     <script ref="qpidd"/>
   </service>

   <!-- There should always be a single qpidd-primary service, it can run on any node. -->
   <service name="qpidd-primary-service" autostart="1" exclusive="0" recovery="relocate">
     <script ref="qpidd-primary"/>
+    <!-- The primary has the IP addresses for brokers and clients to connect. -->
     <ip ref="20.0.10.200"/>
     <ip ref="20.0.20.200"/>
   </service>
@@ -503,7 +563,7 @@ under the License.
   <fencedevices/>
   <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
 </cluster>
-]]>
+      ]]>
     </programlisting>
     <para>
       There is a <literal>failoverdomain</literal> for each node containing just that
@@ -511,32 +571,48 @@ under the License.
       nodes.
     </para>
     <para>
-      The <literal>resources</literal> section defines the usual initialization script to
-      start the <command>qpidd</command> service. <command>qpidd</command>. It also
-      defines the <command>qpid-primary</command> script.
+      The <literal>resources</literal> section defines the usual initialization
+      script to start the <command>qpidd</command> service.
+      <command>qpidd</command>. It also defines the
+      <command>qpid-primary</command> script.
       Starting this script does not
       actually start a new service, rather it promotes the existing
       <command>qpidd</command> broker to primary status.
     </para>
     <para>
       The <literal>resources</literal> section also defines a pair of virtual IP
       addresses on different sub-nets. One will be used for broker-to-broker
-      communication, the other for client-to-broker.
+      communication, the other for client-to-broker.
     </para>
     <para>
-      The <literal>service</literal> section defines 3 <command>qpidd</command> services,
-      one for each node. Each service is in a restricted fail-over domain containing just
-      that node, and has the <literal>restart</literal> recovery policy. The effect of
-      this is that rgmanager will run <command>qpidd</command> on each node, restarting if
-      it fails.
+      To take advantage of the virtual IP addresses, <filename>qpidd.conf</filename>
+      should contain these lines:
+    </para>
+    <programlisting>
+      ha-cluster=yes
+      ha-brokers=20.0.20.200
+      ha-public-brokers=20.0.10.200
+    </programlisting>
+    <para>
+      This configuration specifies that backup brokers will use 20.0.20.200
+      to connect to the primary and will advertise 20.0.10.200 to clients.
+      Clients should connect to 20.0.10.200.
+    </para>
+    <para>
+      The <literal>service</literal> section defines 3 <command>qpidd</command>
+      services, one for each node. Each service is in a restricted fail-over
+      domain containing just that node, and has the <literal>restart</literal>
+      recovery policy. The effect of this is that rgmanager will run
+      <command>qpidd</command> on each node, restarting if it fails.
     </para>
     <para>
       There is a single <literal>qpidd-primary-service</literal> running the
-      <command>qpidd-primary</command> script which is not restricted to a domain and has
-      the <literal>relocate</literal> recovery policy. This means rgmanager will start
-      <command>qpidd-primary</command> on one of the nodes when the cluster starts and
-      will relocate it to another node if the original node fails.
-      Running the
-      <literal>qpidd-primary</literal> script does not actually start a new process,
-      rather it promotes the existing broker to become the primary.
+      <command>qpidd-primary</command> script which is not restricted to a
+      domain and has the <literal>relocate</literal> recovery policy. This means
+      rgmanager will start <command>qpidd-primary</command> on one of the nodes
+      when the cluster starts and will relocate it to another node if the
+      original node fails. Running the <literal>qpidd-primary</literal> script
+      does not start a new broker process, it promotes the existing broker to
+      become the primary.
     </para>
   </section>
@@ -556,14 +632,6 @@ under the License.
       <command>qpid-stat</command> will connect to a backup if you pass the flag
       <command>--ha-admin</command> on the command line.
     </para>
-    <para>
-      To promote a broker to primary use the following command:
-      <programlisting>
-        qpid-ha promote -b <replaceable>host</replaceable>:<replaceable>port</replaceable>
-      </programlisting>
-      The resource manager must ensure that it does not promote a broker to primary when
-      there is already a primary in the cluster.
-    </para>
   </section>
 </section>
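The client fail-over behaviour described in the documentation above (re-try each address in the URL until the primary accepts the connection, since backups reject clients) can be sketched in plain Python. This is a minimal illustrative sketch only: `connect_with_failover` and `fake_connect` are hypothetical stand-ins for a real client connect call such as `qpid.messaging.Connection.establish`, not part of the Qpid API.

```python
import itertools

def connect_with_failover(addresses, try_connect, max_attempts=12):
    """Cycle through broker addresses until one accepts the connection.

    try_connect stands in for a real client connect call; it should
    return a connection object or raise ConnectionError. Because backup
    brokers reject clients, only the current primary's address succeeds.
    """
    for attempt, addr in enumerate(itertools.cycle(addresses)):
        if attempt >= max_attempts:
            raise ConnectionError("no primary found in %d attempts" % max_attempts)
        try:
            return try_connect(addr)
        except ConnectionError:
            continue  # backup or dead node: move on to the next address

# Simulated cluster of three nodes where node3 is currently primary.
primary = "node3"

def fake_connect(addr):
    if addr != primary:
        raise ConnectionError("backup %s rejects clients" % addr)
    return "connected:" + addr

print(connect_with_failover(["node1", "node2", "node3"], fake_connect))
# prints: connected:node3
```

With a virtual IP address the list collapses to a single entry, but the same retry loop still applies while the resource manager is relocating the address to the new primary.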