NO-JIRA: Updates to HA documentation.

git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1351501 13f79535-47bb-0310-9956-ffa450edef68
author: Alan Conway <aconway@apache.org> 2012-06-18 21:45:36 +0000
committer: Alan Conway <aconway@apache.org> 2012-06-18 21:45:36 +0000
commit: 5fbacc774744500e604d58d8904e1c3f8f09578a (patch)
tree: 1bd9a444309cc509e3e144e43f43e13ed54d85a6
parent: c45ee73853cb7c84bb2a7dd0c7f9fdecd7aa9286 (diff)
download: qpid-python-5fbacc774744500e604d58d8904e1c3f8f09578a.tar.gz
1 files changed, 64 insertions, 57 deletions
diff --git a/qpid/doc/book/src/cpp-broker/Active-Passive-Cluster.xml b/qpid/doc/book/src/cpp-broker/Active-Passive-Cluster.xml
index b2d82ad1f6..d00464c92c 100644
--- a/qpid/doc/book/src/cpp-broker/Active-Passive-Cluster.xml
+++ b/qpid/doc/book/src/cpp-broker/Active-Passive-Cluster.xml
@@ -22,15 +22,13 @@ under the License.
 
 <section id="chap-Messaging_User_Guide-Active_Passive_Cluster">
 
-  <title>Active-passive Messaging Clusters (Preview)</title>
+  <title>Active-passive Messaging Clusters</title>
 
   <section>
     <title>Overview</title>
     <para>
-      This release provides a preview of a new module for High Availability (HA). The new module is
-      not yet complete or ready for production use. It being made available so that users can
-      experiment with the new approach and provide feedback early in the development process.
-      Feedback should go to <ulink url="mailto:user@qpid.apache.org">dev@qpid.apache.org</ulink>.
+      This release provides a preview of a new module for High Availability (HA).
+      This module is intended to eventually replace the existing cluster module.
     </para>
     <para>
       The old cluster module takes an <firstterm>active-active</firstterm> approach, i.e. all the
@@ -45,13 +43,13 @@ under the License.
       promoted to take over as the new primary. Clients fail-over to the new primary
       automatically. If there are multiple backups, the backups also fail-over to become backups of
       the new primary.  Backup brokers reject connection attempts, to enforce the requirement that
-      only the primary be active.
+      only the primary be active. Clients fail-over until they successfully connect to the primary broker.
     </para>
     <para>
-      This approach depends on an external <firstterm>cluster resource manager</firstterm> to detect
-      failures and choose the primary. <ulink
-      url="https://fedorahosted.org/cluster/wiki/RGManager">Rgmanager</ulink> is supported
-      initially, but others may be supported in the future.
+      This approach requires an external <firstterm>cluster resource
+      manager</firstterm> to detect failures and choose the new primary. <ulink
+      url="https://fedorahosted.org/cluster/wiki/RGManager">Rgmanager</ulink> is
+      supported initially, but others may be supported in the future.
     </para>
     <section>
       <title>Why the new approach?</title>
@@ -77,19 +75,17 @@ under the License.
 	    virtual IP addresses.
 	  </listitem>
 	  <listitem>
-	    Improved performance and scalability due to better use of multiple CPU s
+	    Improved performance and scalability due to better use of multiple CPUs
 	  </listitem>
 	</itemizedlist>
       </para>
     </section>
     <section>
       <title>Limitations</title>
-
       <para>
 	There are a number of known limitations in the current preview implementation. These
 	will be fixed in the production versions.
       </para>
-
       <itemizedlist>
 	<listitem>
 	  Transactional changes to queue state are not replicated atomically. If the primary crashes
@@ -97,23 +93,11 @@ under the License.
 	  changes introduced by a transaction.
 	</listitem>
 	<listitem>
-	  During a fail-over one backup is promoted to primary and any other backups switch to
-	  the new primary. Messages sent to the new primary before all the backups have
-	  switched could be lost if the new primary itself fails before all the backups have
-	  switched.
-	</listitem>
-	<listitem>
-	  Acknowledgments are confirmed to clients before the message has been dequeued
-	  from replicas or indeed from the local store if that is asynchronous.
-	</listitem>
-	<listitem>
-	  When used with a persistent store: if the entire cluster fails, there are no tools to help
-	  identify the most recent store.
-	</listitem>
-	<listitem>
-	  A persistent broker must have its store erased before joining an existing cluster.
-	  In the production version a persistent broker will be able to load its store and
-	  avoid downloading messages that are in the store from the primary.
+	  Not yet integrated with the persistent store.  A persistent broker must have its
+	  store erased before joining an existing cluster.  If the entire cluster fails,
+	  there are no tools to help identify the most recent store. In the future a
+	  persistent broker will be able to use its stored messages to avoid downloading
+	  messages from the primary when joining a cluster.
 	</listitem>
 	<listitem>
 	  Configuration changes (creating or deleting queues, exchanges and bindings) are
@@ -127,20 +111,9 @@ under the License.
 	  re-appear if that backup is promoted to primary on a subsequent failure.
 	</listitem>
 	<listitem>
-	  Better control is needed over which queues/exchanges are replicated and which are not.
-	</listitem>
-	<listitem>
-	  There are some known issues affecting performance, both the throughput of
-	  replication and the time taken for backups to fail-over. Performance will improve
-	  in the production version.
-	</listitem>
-	<listitem>
 	  Federated links from the primary will be lost in fail over, they will not be
 	  re-connected on the new primary. Federation links to the primary can fail over.
 	</listitem>
-	<listitem>
-	  Only plain FIFO queues can be replicated. LVQ and ring queues are not yet supported.
-	</listitem>
       </itemizedlist>
     </section>
   </section>
@@ -196,7 +169,7 @@ under the License.
 	    </entry>
 	    <entry>
 	      <para>
-		A URL listing each broker in the cluster.
+		The URL
 		<footnote>
 		  <para>
 		  The full format of the URL is given by this grammar:
@@ -209,9 +182,10 @@ under the License.
 		  </programlisting>
 		  </para>
 		</footnote>
-		It is used by brokers to connect to each other. This URL must explicitly
-		list each broker, it cannot be a virtual IP address.
-		For example <literal>amqp:node1.exaple.com,node2.exaple.com,node3.exaple.com</literal>
+		used by cluster brokers to connect to each other. The URL can
+		contain a list of all the brokers' addresses or it can contain a single
+		virtual IP address.  If a list is used it is comma separated, for example
+		<literal>amqp:node1.example.com,node2.example.com,node3.example.com</literal>
 	      </para>
 	    </entry>
 	  </row>
@@ -219,18 +193,13 @@ under the License.
 	    <entry><literal>--ha-public-url <replaceable>URL</replaceable></literal> </entry>
 	    <entry>
 	      <para>
-		The URL that is advertized to clients. This has the same
-		format as the <literal>--ha-brokers-url</literal> URL above.
+		The URL that is advertized to clients. This defaults to the
+		<literal>--ha-brokers-url</literal> URL above, and has the same format.  A
+		virtual IP address is recommended for the public URL as it simplifies
+		deployment and hides changes to the cluster membership from clients.
 	      </para>
 	      <para>
-		This URL can contain a list of all the brokers'
-		addresses or a single virtual IP address.  A virtual
-		IP address is recommended as it simplifies deployment
-		and hides changes to the cluster membership from
-		clients.
-	      </para>
-	      <para>
-		You can use this option to put client traffic on a different network from
+		This option allows you to put client traffic on a different network from
 		broker traffic, which is recommended.
 	      </para>
 	    </entry>
@@ -274,7 +243,7 @@ under the License.
     </para>
     <para>
       The resource manager is responsible for starting the <command>qpidd</command> broker
-      on each node in the cluster. The resource manager <firstterm>promotes</firstterm>
+      on each node in the cluster. The resource manager then <firstterm>promotes</firstterm>
       one of the brokers to be the primary. The other brokers connect to the primary as
       backups, using the URL provided in the <literal>ha-brokers-url</literal> configuration
       option.
@@ -463,7 +432,8 @@ NOTE: fencing is not shown, you must configure fencing appropriately for your cl
       option. It has one of the following values:
       <itemizedlist>
 	<listitem>
-	  <firstterm>all</firstterm>: Replicate everything automatically: queues, exchanges, bindings and messages.
+	  <firstterm>all</firstterm>: Replicate everything automatically: queues,
+	  exchanges, bindings and messages.
 	</listitem>
 	<listitem>
 	  <firstterm>configuration</firstterm>: Replicate the existence of queues,
@@ -659,6 +629,43 @@ NOTE: fencing is not shown, you must configure fencing appropriately for your cl
       </screen>
     </section>
   </section>
+
+    <section>
+    <title>Integrating with other Cluster Resource Managers</title>
+    <para>
+      To integrate with a different resource manager you must configure it to:
+      <itemizedlist>
+	<listitem>Start a qpidd process on each node of the cluster.</listitem>
+	<listitem>Restart qpidd if it crashes.</listitem>
+	<listitem>Promote exactly one of the brokers to primary.</listitem>
+	<listitem>Detect a failure and promote a new primary.</listitem>
+      </itemizedlist>
+    </para>
+    <para>
+      The <command>qpid-ha</command> command allows you to check if a broker is primary,
+      and to promote a backup to primary.
+    </para>
+    <para>
+      To test if a broker is the primary:
+      <programlisting>
+	qpid-ha -b <replaceable>broker-address</replaceable> status --expect=primary
+      </programlisting>
+      This command will return 0 if the broker at <replaceable>broker-address</replaceable>
+      is the primary, non-0 otherwise.
+    </para>
+    <para>
+      To promote a broker to primary:
+      <programlisting>
+	qpid-ha -b <replaceable>broker-address</replaceable> promote
+      </programlisting>
+    </para>
+    <para>
+      <command>qpid-ha --help</command> gives information on other commands and options available.
+      You can also use <command>qpid-ha</command> to manually examine and promote brokers. This
+      can be useful for testing failover scenarios without having to set up a full resource manager,
+      or to simulate a cluster on a single node. For deployment, a resource manager is required.
+    </para>
+  </section>
 </section>
 
 <!-- LocalWords:  scalability rgmanager multicast RGManager mailto LVQ qpidd IP dequeued Transactional username
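
Note on the new "Integrating with other Cluster Resource Managers" section: the two qpid-ha invocations it documents are enough to sketch the check-and-promote step a custom resource manager would perform. The Python below is only an illustration, not part of this commit. The helper names (run_command, is_primary, promote, ensure_primary) are hypothetical, and it assumes that "qpid-ha ... promote" exits with status 0 on success, which the patch does not state. It also ignores fencing and cannot tell an unreachable broker from a backup, so it is not a substitute for a real resource manager.

```python
import subprocess
from typing import Callable, Optional, Sequence

# A runner takes an argv list and returns the command's exit status.
Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.call(list(argv))


def is_primary(broker: str, run: Runner = run_command) -> bool:
    """True if `qpid-ha -b BROKER status --expect=primary` exits with 0."""
    return run(["qpid-ha", "-b", broker, "status", "--expect=primary"]) == 0


def promote(broker: str, run: Runner = run_command) -> bool:
    """Ask a broker to become primary.

    Assumption: exit status 0 means the promotion succeeded.
    """
    return run(["qpid-ha", "-b", broker, "promote"]) == 0


def ensure_primary(brokers: Sequence[str],
                   run: Runner = run_command) -> Optional[str]:
    """Return the current primary, promoting the first broker that accepts
    promotion if none is primary. Returns None if no broker can be promoted.
    """
    for broker in brokers:
        if is_primary(broker, run):
            return broker
    for broker in brokers:
        if promote(broker, run):
            return broker
    return None
```

For manual failover testing, something like ensure_primary(["node1.example.com", "node2.example.com"]) could be called periodically. The run parameter exists so the logic can be exercised with a stub instead of a live cluster.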