author    Alan Conway <aconway@apache.org>  2012-03-28 16:24:52 +0000
committer Alan Conway <aconway@apache.org>  2012-03-28 16:24:52 +0000
commit    cb0c0dbd13a27b535623ca0fdd6ef59f6f13622a (patch)
tree      a09a3060f577b006720c740bf2c733937ce5e8f8
parent    ed41b499bbe7f2a51afb204799724227b8fca4f3 (diff)
QPID-3603: Update HA documentation: example of virtual IP addresses
git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1306454 13f79535-47bb-0310-9956-ffa450edef68
 qpid/cpp/etc/cluster.conf-example.xml.in     |  20
 qpid/doc/book/src/Active-Passive-Cluster.xml | 282
 2 files changed, 187 insertions, 115 deletions
diff --git a/qpid/cpp/etc/cluster.conf-example.xml.in b/qpid/cpp/etc/cluster.conf-example.xml.in
index dbeb3af537..eb70ebbb1e 100644
--- a/qpid/cpp/etc/cluster.conf-example.xml.in
+++ b/qpid/cpp/etc/cluster.conf-example.xml.in
@@ -7,14 +7,18 @@ This example assumes a 3 node cluster, with nodes named node1, node2 and node3.
 <cluster name="qpid-test" config_version="18">
   <!-- The cluster has 3 nodes. Each has a unique nodid and one vote for quorum. -->
   <clusternodes>
-    <clusternode name="node1" nodeid="1"/>
-    <clusternode name="node2" nodeid="2"/>
-    <clusternode name="node3" nodeid="3"/>
+    <clusternode name="node1" nodeid="1">
+      <fence/>
+    </clusternode>
+    <clusternode name="node2" nodeid="2">
+      <fence/>
+    </clusternode>
+    <clusternode name="node3" nodeid="3">
+      <fence/>
+    </clusternode>
   </clusternodes>
-  <!-- Resouce Manager configuration.
-       TODO explain central_processing="1"
-  -->
-  <rm log_level="7" central_processing="1">
+  <!-- Resouce Manager configuration. -->
+  <rm log_level="7"> <!-- Verbose logging -->
     <!--
       There is a failoverdomain for each node containing just that node.
       This lets us stipulate that the qpidd service should always run on all nodes.
@@ -59,7 +63,7 @@ This example assumes a 3 node cluster, with nodes named node1, node2 and node3.
   <!-- There should always be a single qpidd-primary service, it can run on any node. -->
   <service name="qpidd-primary-service" autostart="1" exclusive="0" recovery="relocate">
     <script ref="qpidd-primary"/>
-    <!-- The primary has the IP addresses for brokers and clients. -->
+    <!-- The primary has the IP addresses for brokers and clients to connect.
+    -->
     <ip ref="20.0.10.200"/>
     <ip ref="20.0.20.200"/>
   </service>
diff --git a/qpid/doc/book/src/Active-Passive-Cluster.xml b/qpid/doc/book/src/Active-Passive-Cluster.xml
index 5ab515f235..52748b2570 100644
--- a/qpid/doc/book/src/Active-Passive-Cluster.xml
+++ b/qpid/doc/book/src/Active-Passive-Cluster.xml
@@ -13,7 +13,7 @@ http://www.apache.org/licenses/LICENSE-2.0
 Unless required by applicable law or agreed to in writing,
 software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+h"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 KIND, either express or implied. See the License for the
 specific language governing permissions and limitations
 under the License.
@@ -148,15 +148,17 @@ under the License.
   <section>
     <title>Virtual IP Addresses</title>
     <para>
-      Some resource managers (including <command>rgmanager</command>) support <firstterm>virtual IP
-      addresses</firstterm>. A virtual IP address is an IP address that can be relocated to any of
-      the nodes in a cluster. The resource manager associates this address with the primary node in
-      the cluster, and relocates it to the new primary when there is a failure. This simplifies
-      configuration as you can publish a single IP address rather than a list.
+      Some resource managers (including <command>rgmanager</command>) support
+      <firstterm>virtual IP addresses</firstterm>. A virtual IP address is an IP
+      address that can be relocated to any of the nodes in a cluster. The
+      resource manager associates this address with the primary node in the
+      cluster, and relocates it to the new primary when there is a failure. This
+      simplifies configuration as you can publish a single IP address rather
+      than a list.
     </para>
     <para>
-      A virtual IP address can be used by clients to connect to the primary, and also by backup
-      brokers when they connect to the primary.
-      The following sections will explain how to configure
+      A virtual IP address can be used by clients and backup brokers to connect
+      to the primary. The following sections will explain how to configure
       virtual IP addresses for clients or brokers.
     </para>
   </section>
@@ -266,42 +268,61 @@ under the License.
     <para>
       You can create replicated queues and exchanges with the <command>qpid-config</command>
       management tool like this:
-      <programlisting>
-        qpid-config add queue myqueue --replicate all
-      </programlisting>
     </para>
+    <programlisting>
+      qpid-config add queue myqueue --replicate all
+    </programlisting>
     <para>
       To create replicated queues and exchanges via the client API, add a <literal>node</literal>
       entry to the address like this:
-      <programlisting>
-        "myqueue;{create:always,node:{x-declare:{arguments:{'qpid.replicate':all}}}}"
-      </programlisting>
     </para>
+    <programlisting>
+      "myqueue;{create:always,node:{x-declare:{arguments:{'qpid.replicate':all}}}}"
+    </programlisting>
   </section>

   <section>
     <title>Client Connection and Fail-over</title>
     <para>
-      Clients can only connect to the primary broker. Backup brokers automatically reject any
-      connection attempt by a client.
+      Clients can only connect to the primary broker. Backup brokers
+      automatically reject any connection attempt by a client.
     </para>
     <para>
-      Clients are configured with the URL for the cluster. There are two possibilities
+      Clients are configured with the URL for the cluster (details below for
+      each type of client). There are two possibilities
       <itemizedlist>
-        <listitem> The URL contains multiple addresses, one for each broker in the cluster.</listitem>
         <listitem>
-          The URL contains a single <firstterm>virtual IP address</firstterm> that is assigned to the primary broker by the resource manager.
+          The URL contains multiple addresses, one for each broker in the cluster.
+        </listitem>
+        <listitem>
+          The URL contains a single <firstterm>virtual IP address</firstterm>
+          that is assigned to the primary broker by the resource manager.
           <footnote><para>Only if the resource manager supports virtual IP addresses</para></footnote>
         </listitem>
       </itemizedlist>
-      In the first case, clients will repeatedly re-try each address in the URL until they
-      successfully connect to the primary. In the second case the resource manager will assign the
-      virtual IP address to the primary broker, so clients only need to re-try on a single address.
+      In the first case, clients will repeatedly re-try each address in the URL
+      until they successfully connect to the primary. In the second case the
+      resource manager will assign the virtual IP address to the primary broker,
+      so clients only need to re-try on a single address.
     </para>
     <para>
-      When the primary broker fails all clients are disconnected. They go back to re-trying until
-      they connect to the new primary. Any messages that have been sent by the client, but not yet
-      acknowledged as delivered, are resent. Similarly messages that have been sent by the broker,
-      but not acknowledged, are re-queued.
+      When the primary broker fails, clients re-try all known cluster addresses
+      until they connect to the new primary. The client re-sends any messages
+      that were previously sent but not acknowledged by the broker at the time
+      of the failure. Similarly messages that have been sent by the broker, but
+      not acknowledged by the client, are re-queued.
+    </para>
+    <para>
+      TCP can be slow to detect connection failures. A client can configure a
+      connection to use a <firstterm>heartbeat</firstterm> to detect connection
+      failure, and can specify a time interval for the heartbeat. If heartbeats
+      are in use, failures will be detected no later than twice the heartbeat
+      interval. The following sections explain how to enable heartbeat in each
+      client.
+    </para>
+    <para>
+      See "Cluster Failover" in <citetitle>Programming in Apache
+      Qpid</citetitle> for details on how to keep the client aware of cluster
+      membership.
     </para>
     <para>
       Suppose your cluster has 3 nodes: <literal>node1</literal>, <literal>node2</literal>
@@ -316,39 +337,57 @@ under the License.
       <footnote>
         <para>
           The full grammar for the URL is:
-          <programlisting>
-            url = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)*
-            addr = tcp_addr / rmda_addr / ssl_addr / ...
-            tcp_addr = ["tcp:"] host [":" port]
-            rdma_addr = "rdma:" host [":" port]
-            ssl_addr = "ssl:" host [":" port]'
-          </programlisting>
         </para>
-      </footnote>. You also
-      need to specify the connection option <literal>reconnect</literal> to be true. For
-      example:
       <programlisting>
-        qpid::messaging::Connection c("node1,node2,node3","{reconnect:true}");
+        url = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)*
+        addr = tcp_addr / rmda_addr / ssl_addr / ...
+        tcp_addr = ["tcp:"] host [":" port]
+        rdma_addr = "rdma:" host [":" port]
+        ssl_addr = "ssl:" host [":" port]'
       </programlisting>
+      </footnote>
+      You also need to specify the connection option
+      <literal>reconnect</literal> to be true. For example:
+    </para>
+    <programlisting>
+      qpid::messaging::Connection c("node1,node2,node3","{reconnect:true}");
+    </programlisting>
+    <para>
+      Heartbeats are disabled by default. You can enable them by specifying a
+      heartbeat interval (in seconds) for the connection via the
+      <literal>heartbeat</literal> option.
+      For example:
+      <programlisting>
+        qpid::messaging::Connection c("node1,node2,node3","{reconnect:true,heartbeat:10}");
+      </programlisting>
     </para>
   </section>

   <section>
     <title>Python clients</title>
     <para>
-      With the python client, you specify <literal>reconnect=True</literal> and a list of
-      <replaceable>host:port</replaceable> addresses as <literal>reconnect_urls</literal>
-      when calling <literal>Connection.establish</literal> or <literal>Connection.open</literal>
+      With the python client, you specify <literal>reconnect=True</literal>
+      and a list of <replaceable>host:port</replaceable> addresses as
+      <literal>reconnect_urls</literal> when calling
+      <literal>Connection.establish</literal> or
+      <literal>Connection.open</literal>
+    </para>
     <programlisting>
       connection = qpid.messaging.Connection.establish("node1", reconnect=True, reconnect_urls=["node1", "node2", "node3"])
     </programlisting>
+    <para>
+      Heartbeats are disabled by default. You can
+      enable them by specifying a heartbeat interval (in seconds) for the
+      connection via the 'heartbeat' option. For example:
     </para>
+    <programlisting>
+      connection = qpid.messaging.Connection.establish("node1", reconnect=True, reconnect_urls=["node1", "node2", "node3"], heartbeat=10)
+    </programlisting>
   </section>

   <section>
     <title>Java JMS Clients</title>
     <para>
-      In Java JMS clients, client fail-over is handled automatically if it is enabled in the
-      connection. You can configure a connection to use fail-over using the
-      <command>failover</command> property:
+      In Java JMS clients, client fail-over is handled automatically if it is
+      enabled in the connection. You can configure a connection to use
+      fail-over using the <command>failover</command> property:
     </para>
     <screen>
@@ -398,33 +437,35 @@ under the License.
 <screen>
 connectionfactory.qpidConnectionfactory = amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672',idle_timeout=3
 </screen>
-    </section>
   </section>

   <section>
     <title>The Cluster Resource Manager</title>
     <para>
-      Broker fail-over is managed by a <firstterm>cluster resource manager</firstterm>. An
-      integration with <ulink
-      url="https://fedorahosted.org/cluster/wiki/RGManager">rgmanager</ulink> is provided, but it is
-      possible to integrate with other resource managers.
+      Broker fail-over is managed by a <firstterm>cluster resource
+      manager</firstterm>. An integration with <ulink
+      url="https://fedorahosted.org/cluster/wiki/RGManager">rgmanager</ulink> is
+      provided, but it is possible to integrate with other resource managers.
     </para>
     <para>
-      The resource manager is responsible for starting an appropriately-configured broker on each
-      node in the cluster. The resource manager then <firstterm>promotes</firstterm> one of the
-      brokers to be the primary. The other brokers connect to the primary as backups, using the URL
-      provided in the <literal>ha-brokers</literal> configuration option.
+      The resource manager is responsible for starting a broker on each node in the
+      cluster. The resource manager then <firstterm>promotes</firstterm> one of
+      the brokers to be the primary. The other brokers connect to the primary as
+      backups, using the URL provided in the <literal>ha-brokers</literal>
+      configuration option.
     </para>
     <para>
-      Once connected, the backup brokers synchronize their state with the primary. When a backup is
-      synchronized, or "hot", it is ready to take over if the primary fails. Backup brokers
-      continually receive updates from the primary in order to stay synchronized.
+      Once connected, the backup brokers synchronize their state with the
+      primary. When a backup is synchronized, or "hot", it is ready to take
+      over if the primary fails. Backup brokers continually receive updates
+      from the primary in order to stay synchronized.
     </para>
     <para>
-      If the primary fails, backup brokers go into fail-over mode. The resource manager must detect
-      the failure and promote one of the backups to be the new primary. The other backups connect
-      to the new primary and synchronize their state so they can be backups for it.
+      If the primary fails, backup brokers go into fail-over mode. The resource
+      manager must detect the failure and promote one of the backups to be the
+      new primary. The other backups connect to the new primary and synchronize
+      their state so they can be backups for it.
     </para>
     <para>
       The resource manager is also responsible for protecting the cluster from
@@ -437,65 +478,84 @@ under the License.
   <section>
     <title>Configuring <command>rgmanager</command> as resource manager</title>
     <para>
-      This section assumes that you are already familiar with setting up and configuring
-      clustered services using <command>cman</command> and <command>rgmanager</command>. It
-      will show you how to configure an active-passive, hot-standby <command>qpidd</command>
-      HA cluster.
+      This section assumes that you are already familiar with setting up and
+      configuring clustered services using <command>cman</command> and
+      <command>rgmanager</command>. It will show you how to configure an
+      active-passive, hot-standby <command>qpidd</command> HA cluster.
     </para>
     <para>
-      Here is an example <literal>cluster.conf</literal> file for a cluster of 3 nodes named
-      mrg32, mrg34 and mrg35. We will go through the configuration step-by-step.
+      Here is an example <literal>cluster.conf</literal> file for a cluster of 3
+      nodes named node1, node2 and node3. We will go through the configuration
+      step-by-step.
     </para>
     <programlisting>
-<![CDATA[
+      <![CDATA[
 <?xml version="1.0"?>
-<cluster alias="qpid-hot-standby" config_version="4" name="qpid-hot-standby">
+<!--
+This is an example of a cluster.conf file to run qpidd HA under rgmanager.
+This example assumes a 3 node cluster, with nodes named node1, node2 and node3.
+-->
+
+<cluster name="qpid-test" config_version="18">
+  <!-- The cluster has 3 nodes. Each has a unique nodid and one vote for quorum. -->
   <clusternodes>
-    <clusternode name="mrg32" nodeid="1">
+    <clusternode name="node1" nodeid="1">
       <fence/>
     </clusternode>
-    <clusternode name="mrg34" nodeid="2">
+    <clusternode name="node2" nodeid="2">
       <fence/>
     </clusternode>
-    <clusternode name="mrg35" nodeid="3">
+    <clusternode name="node3" nodeid="3">
       <fence/>
     </clusternode>
   </clusternodes>
-  <cman/>
-  <rm log_level="7" <!-- Verbose logging -->
-      central_processing="1"> <!-- TODO explain-->
+  <!-- Resouce Manager configuration. -->
+  <rm log_level="7"> <!-- Verbose logging -->
+  <!--
+    There is a failoverdomain for each node containing just that node.
+    This lets us stipulate that the qpidd service should always run on all nodes.
+  -->
   <failoverdomains>
-    <failoverdomain name="mrg32-domain" restricted="1">
-      <failoverdomainnode name="mrg32"/>
+    <failoverdomain name="node1-domain" restricted="1">
+      <failoverdomainnode name="node1"/>
     </failoverdomain>
-    <failoverdomain name="mrg34-domain" restricted="1">
-      <failoverdomainnode name="mrg34"/>
+    <failoverdomain name="node2-domain" restricted="1">
+      <failoverdomainnode name="node2"/>
     </failoverdomain>
-    <failoverdomain name="mrg35-domain" restricted="1">
-      <failoverdomainnode name="mrg35"/>
+    <failoverdomain name="node3-domain" restricted="1">
+      <failoverdomainnode name="node3"/>
    </failoverdomain>
   </failoverdomains>
+
   <resources>
-    <script file="/etc/init.d/qpidd" name="qpidd"/>
-    <script file="/etc/init.d/qpidd-primary" name="qpidd-primary"/>
+    <!-- This script starts a qpidd broker acting as a backup. -->
+    <script file="!!sysconfdir!!/init.d/qpidd" name="qpidd"/>
+
+    <!-- This script promotes the qpidd broker on this node to primary. -->
+    <script file="!!sysconfdir!!/init.d/qpidd-primary" name="qpidd-primary"/>
+
+    <!-- This is a virtual IP address for broker replication traffic.
+    -->
     <ip address="20.0.10.200" monitor_link="1"/>
+
+    <!-- This is a virtual IP address on a seprate network for client traffic. -->
     <ip address="20.0.20.200" monitor_link="1"/>
   </resources>

   <!-- There is a qpidd service on each node, it should be restarted if it fails. -->
-  <service name="mrg32-qpidd-service" domain="mrg32-domain" recovery="restart">
+  <service name="node1-qpidd-service" domain="node1-domain" recovery="restart">
     <script ref="qpidd"/>
   </service>
-  <service name="mrg34-qpidd-service" domain="mrg34-domain" recovery="restart">
+  <service name="node2-qpidd-service" domain="node2-domain" recovery="restart">
     <script ref="qpidd"/>
   </service>
-  <service name="mrg35-qpidd-service" domain="mrg35-domain" recovery="restart">
+  <service name="node3-qpidd-service" domain="node3-domain" recovery="restart">
     <script ref="qpidd"/>
   </service>

   <!-- There should always be a single qpidd-primary service, it can run on any node. -->
   <service name="qpidd-primary-service" autostart="1" exclusive="0" recovery="relocate">
     <script ref="qpidd-primary"/>
+    <!-- The primary has the IP addresses for brokers and clients to connect. -->
     <ip ref="20.0.10.200"/>
     <ip ref="20.0.20.200"/>
   </service>
@@ -503,7 +563,7 @@ under the License.
   <fencedevices/>
   <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
 </cluster>
-]]>
+      ]]>
     </programlisting>
     <para>
       There is a <literal>failoverdomain</literal> for each node containing just that
@@ -511,32 +571,48 @@ under the License.
       nodes.
     </para>
     <para>
-      The <literal>resources</literal> section defines the usual initialization script to
-      start the <command>qpidd</command> service. <command>qpidd</command>. It also
-      defines the <command>qpid-primary</command> script.
+      The <literal>resources</literal> section defines the usual initialization
+      script to start the <command>qpidd</command> service.
+      <command>qpidd</command>. It also defines the
+      <command>qpid-primary</command> script.
       Starting this script does not
       actually start a new service, rather it promotes the existing
       <command>qpidd</command> broker to primary status.
     </para>
     <para>
       The <literal>resources</literal> section also defines a pair of virtual IP
       addresses on different sub-nets. One will be used for broker-to-broker
-      communication, the other for client-to-broker.
+      communication, the other for client-to-broker.
     </para>
     <para>
-      The <literal>service</literal> section defines 3 <command>qpidd</command> services,
-      one for each node. Each service is in a restricted fail-over domain containing just
-      that node, and has the <literal>restart</literal> recovery policy. The effect of
-      this is that rgmanager will run <command>qpidd</command> on each node, restarting if
-      it fails.
+      To take advantage of the virtual IP addresses, <filename>qpidd.conf</filename>
+      should contain these lines:
+    </para>
+    <programlisting>
+      ha-cluster=yes
+      ha-brokers=20.0.20.200
+      ha-public-brokers=20.0.10.200
+    </programlisting>
+    <para>
+      This configuration specifies that backup brokers will use 20.0.20.200
+      to connect to the primary and will advertise 20.0.10.200 to clients.
+      Clients should connect to 20.0.10.200.
+    </para>
+    <para>
+      The <literal>service</literal> section defines 3 <command>qpidd</command>
+      services, one for each node. Each service is in a restricted fail-over
+      domain containing just that node, and has the <literal>restart</literal>
+      recovery policy. The effect of this is that rgmanager will run
+      <command>qpidd</command> on each node, restarting if it fails.
     </para>
     <para>
       There is a single <literal>qpidd-primary-service</literal> running the
-      <command>qpidd-primary</command> script which is not restricted to a domain and has
-      the <literal>relocate</literal> recovery policy. This means rgmanager will start
-      <command>qpidd-primary</command> on one of the nodes when the cluster starts and
-      will relocate it to another node if the original node fails.
-      Running the
-      <literal>qpidd-primary</literal> script does not actually start a new process,
-      rather it promotes the existing broker to become the primary.
+      <command>qpidd-primary</command> script which is not restricted to a
+      domain and has the <literal>relocate</literal> recovery policy. This means
+      rgmanager will start <command>qpidd-primary</command> on one of the nodes
+      when the cluster starts and will relocate it to another node if the
+      original node fails. Running the <literal>qpidd-primary</literal> script
+      does not start a new broker process, it promotes the existing broker to
+      become the primary.
     </para>
   </section>
@@ -556,14 +632,6 @@ under the License.
       <command>qpid-stat</command> will connect to a backup if you pass the flag
       <command>--ha-admin</command> on the command line.
     </para>
-    <para>
-      To promote a broker to primary use the following command:
-      <programlisting>
-        qpid-ha promote -b <replaceable>host</replaceable>:<replaceable>port</replaceable>
-      </programlisting>
-      The resource manager must ensure that it does not promote a broker to primary when
-      there is already a primary in the cluster.
-    </para>
   </section>
 </section>
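The client fail-over behaviour described in the documentation above (re-try each address in the URL until the primary accepts the connection, since backups reject clients) can be sketched in plain Python. This is a minimal illustrative sketch only: `connect_with_failover` and `fake_connect` are hypothetical stand-ins for a real client connect call such as `qpid.messaging.Connection.establish`, not part of the Qpid API.

```python
import itertools

def connect_with_failover(addresses, try_connect, max_attempts=12):
    """Cycle through broker addresses until one accepts the connection.

    try_connect stands in for a real client connect call; it should
    return a connection object or raise ConnectionError. Because backup
    brokers reject clients, only the current primary's address succeeds.
    """
    for attempt, addr in enumerate(itertools.cycle(addresses)):
        if attempt >= max_attempts:
            raise ConnectionError("no primary found in %d attempts" % max_attempts)
        try:
            return try_connect(addr)
        except ConnectionError:
            continue  # backup or dead node: move on to the next address

# Simulated cluster of three nodes where node3 is currently primary.
primary = "node3"

def fake_connect(addr):
    if addr != primary:
        raise ConnectionError("backup %s rejects clients" % addr)
    return "connected:" + addr

print(connect_with_failover(["node1", "node2", "node3"], fake_connect))
# prints: connected:node3
```

With a virtual IP address the list collapses to a single entry, but the same retry loop still applies while the resource manager is relocating the address to the new primary.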