path: root/docs/programmer_reference/rep_partition.html
author      Lorry Tar Creator <lorry-tar-importer@baserock.org>  2015-02-17 17:25:57 +0000
committer   <>  2015-03-17 16:26:24 +0000
commit      780b92ada9afcf1d58085a83a0b9e6bc982203d1 (patch)
tree        598f8b9fa431b228d29897e798de4ac0c1d3d970 /docs/programmer_reference/rep_partition.html
parent      7a2660ba9cc2dc03a69ddfcfd95369395cc87444 (diff)
download    berkeleydb-master.tar.gz
Imported from /home/lorry/working-area/delta_berkeleydb/db-6.1.23.tar.gz. (HEAD, db-6.1.23, master)
Diffstat (limited to 'docs/programmer_reference/rep_partition.html')
-rw-r--r--   docs/programmer_reference/rep_partition.html   220
1 file changed, 129 insertions, 91 deletions
diff --git a/docs/programmer_reference/rep_partition.html b/docs/programmer_reference/rep_partition.html
index e5740736..9b662f4b 100644
--- a/docs/programmer_reference/rep_partition.html
+++ b/docs/programmer_reference/rep_partition.html
@@ -14,7 +14,7 @@
<body>
<div xmlns="" class="navheader">
<div class="libver">
- <p>Library Version 11.2.5.3</p>
+ <p>Library Version 12.1.6.1</p>
</div>
<table width="100%" summary="Navigation header">
<tr>
@@ -22,9 +22,7 @@
</tr>
<tr>
<td width="20%" align="left"><a accesskey="p" href="rep_twosite.html">Prev</a> </td>
- <th width="60%" align="center">Chapter 12. 
- Berkeley DB Replication
- </th>
+ <th width="60%" align="center">Chapter 12.  Berkeley DB Replication </th>
<td width="20%" align="right"> <a accesskey="n" href="rep_faq.html">Next</a></td>
</tr>
</table>
@@ -38,93 +36,133 @@
</div>
</div>
</div>
- <p>The Berkeley DB replication implementation can be affected by network
-partitioning problems.</p>
- <p>For example, consider a replication group with N members. The network
-partitions with the master on one side and more than N/2 of the sites
-on the other side. The sites on the side with the master will continue
-forward, and the master will continue to accept write queries for the
-databases. Unfortunately, the sites on the other side of the partition,
-realizing they no longer have a master, will hold an election. The
-election will succeed as there are more than N/2 of the total sites
-participating, and there will then be two masters for the replication
-group. Since both masters are potentially accepting write queries, the
-databases could diverge in incompatible ways.</p>
- <p>If multiple masters are ever found to exist in a replication group, a
-master detecting the problem will return <a href="../api_reference/C/repmessage.html#repmsg_DB_REP_DUPMASTER" class="olink">DB_REP_DUPMASTER</a>. If
-the application sees this return, it should reconfigure itself as a
-client (by calling <a href="../api_reference/C/repstart.html" class="olink">DB_ENV-&gt;rep_start()</a>), and then call for an election
-(by calling <a href="../api_reference/C/repelect.html" class="olink">DB_ENV-&gt;rep_elect()</a>). The site that wins the election may be
-one of the two previous masters, or it may be another site entirely.
-Regardless, the winning system will bring all of the other systems into
-conformance.</p>
- <p>As another example, consider a replication group with a master
-environment and two clients A and B, where client A may upgrade to
-master status and client B cannot. Then, assume client A is partitioned
-from the other two database environments, and it becomes out-of-date
-with respect to the master. Then, assume the master crashes and does
-not come back on-line. Subsequently, the network partition is restored,
-and clients A and B hold an election. As client B cannot win the
-election, client A will win by default, and in order to get back into
-sync with client B, possibly committed transactions on client B will be
-unrolled until the two sites can once again move forward together.</p>
- <p>In both of these examples, there is a phase where a newly elected master
-brings the members of a replication group into conformance with itself
-so that it can start sending new information to them. This can result
-in the loss of information as previously committed transactions are
-unrolled.</p>
- <p>In architectures where network partitions are an issue, applications
-may want to implement a heart-beat protocol to minimize the consequences
-of a bad network partition. As long as a master is able to contact at
-least half of the sites in the replication group, it is impossible for
-there to be two masters. If the master can no longer contact a
-sufficient number of systems, it should reconfigure itself as a client,
-and hold an election. Replication Manager does not currently
-implement such a feature, so this technique is only available to Base API
-applications.</p>
- <p>There is another tool applications can use to minimize the damage in
-the case of a network partition. By specifying an <span class="bold"><strong>nsites</strong></span>
-argument to <a href="../api_reference/C/repelect.html" class="olink">DB_ENV-&gt;rep_elect()</a> that is larger than the actual number of
-database environments in the replication group, Base API applications can keep
-systems from declaring themselves the master unless they can talk to
-a large percentage of the sites in the system. For example, if there
-are 20 database environments in the replication group, and an argument
-of 30 is specified to the <a href="../api_reference/C/repelect.html" class="olink">DB_ENV-&gt;rep_elect()</a> method, then a system will have
-to be able to talk to at least 16 of the sites to declare itself the
-master.</p>
- <p>Replication Manager uses the value of <span class="bold"><strong>nsites</strong></span> (configured by
-the <a href="../api_reference/C/repnsites.html" class="olink">DB_ENV-&gt;rep_set_nsites()</a> method) for elections as well as in calculating how
-many acknowledgements to wait for when sending a
-<a href="../api_reference/C/reptransport.html#transport_DB_REP_PERMANENT" class="olink">DB_REP_PERMANENT</a> message. So this technique may be useful here
-as well, unless the application uses the <a href="../api_reference/C/repmgrset_ack_policy.html#ackspolicy_DB_REPMGR_ACKS_ALL" class="olink">DB_REPMGR_ACKS_ALL</a> or
-<a href="../api_reference/C/repmgrset_ack_policy.html#ackspolicy_DB_REPMGR_ACKS_ALL_PEERS" class="olink">DB_REPMGR_ACKS_ALL_PEERS</a> acknowledgement policies.</p>
- <p>Specifying a <span class="bold"><strong>nsites</strong></span> argument to <a href="../api_reference/C/repelect.html" class="olink">DB_ENV-&gt;rep_elect()</a> that is
-smaller than the actual number of database environments in the
-replication group has its uses as well. For example, consider a
-replication group with 2 environments. If they are partitioned from
-each other, neither of the sites could ever get enough votes to become
-the master. A reasonable alternative would be to specify a
-<span class="bold"><strong>nsites</strong></span> argument of 2 to one of the systems
-and a <span class="bold"><strong>nsites</strong></span>
-argument of 1 to the other. That way, one of the systems could win
-elections even when partitioned, while the other one could not. This
-would allow one of the systems to continue accepting write
-queries after the partition.</p>
- <p>In a 2-site group, Replication Manager by default reacts to the loss of
-communication with the master by observing a strict majority rule that
-prevents the survivor from taking over. Thus it avoids multiple masters and
-the need to unroll some transactions if both sites are running but cannot
-communicate. But it does leave the group in a read-only state until both
-sites are available. If application availability while one site is down is a
-priority and it is acceptable to risk unrolling some transactions, there
-is a configuration option to turn off the strict majority rule and allow
-the surviving client to declare itself to be master. See the <a href="../api_reference/C/repconfig.html" class="olink">DB_ENV-&gt;rep_set_config()</a>
-method <a href="../api_reference/C/repconfig.html#config_DB_REPMGR_CONF_2SITE_STRICT" class="olink">DB_REPMGR_CONF_2SITE_STRICT</a> flag for more information.</p>
- <p>These scenarios stress the importance of good network infrastructure in
-Berkeley DB replicated environments. When replicating database environments
-over sufficiently lossy networking, the best solution may well be to
-pick a single master, and only hold elections when human intervention
-has determined the selected master is unable to recover at all.</p>
+ <p>
+ The Berkeley DB replication implementation can be affected
+ by network partitioning problems.
+ </p>
+ <p>
+ For example, consider a replication group with N members.
+ The network partitions with the master on one side and more
+ than N/2 of the sites on the other side. The sites on the side
+ with the master will continue forward, and the master will
+ continue to accept write queries for the databases.
+ Unfortunately, the sites on the other side of the partition,
+ realizing they no longer have a master, will hold an election.
+ The election will succeed as there are more than N/2 of the
+ total sites participating, and there will then be two masters
+ for the replication group. Since both masters are potentially
+ accepting write queries, the databases could diverge in
+ incompatible ways.
+ </p>
+ <p>
+ If multiple masters are ever found to exist in a replication
+ group, a master detecting the problem will return
+ <a href="../api_reference/C/repmessage.html#repmsg_DB_REP_DUPMASTER" class="olink">DB_REP_DUPMASTER</a>. Replication Manager applications
+ automatically handle duplicate master situations. If a Base
+ API application sees this return, it should reconfigure itself
+ as a client (by calling <a href="../api_reference/C/repstart.html" class="olink">DB_ENV-&gt;rep_start()</a>), and then call for an
+ election (by calling <a href="../api_reference/C/repelect.html" class="olink">DB_ENV-&gt;rep_elect()</a>). The site that wins the
+ election may be one of the two previous masters, or it may be
+ another site entirely. Regardless, the winning system will
+ bring all of the other systems into conformance.
+ </p>
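
In Base API terms, the recovery sequence just described comes down to two calls against the environment handle. A minimal sketch in C (the handle_dupmaster name and the nsites/nvotes values are illustrative assumptions, not taken from the page above):

    #include <db.h>

    /*
     * Sketch: after seeing DB_REP_DUPMASTER, step down to client and
     * call for an election.  nsites and nvotes are application-specific.
     */
    static int
    handle_dupmaster(DB_ENV *dbenv, u_int32_t nsites, u_int32_t nvotes)
    {
        int ret;

        /* Reconfigure this environment as a replication client. */
        if ((ret = dbenv->rep_start(dbenv, NULL, DB_REP_CLIENT)) != 0)
            return (ret);

        /* Call for an election; the flags argument must be 0. */
        return (dbenv->rep_elect(dbenv, nsites, nvotes, 0));
    }
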
+ <p>
+ As another example, consider a replication group with a
+ master environment and two clients A and B, where client A may
+ upgrade to master status and client B cannot. Then, assume
+ client A is partitioned from the other two database
+ environments, and it becomes out-of-date with respect to the
+ master. Then, assume the master crashes and does not come back
+ on-line. Subsequently, the network partition is restored, and
+ clients A and B hold an election. As client B cannot win the
+ election, client A will win by default, and in order to get
+ back into sync with client B, possibly committed transactions
+ on client B will be unrolled until the two sites can once
+ again move forward together.
+ </p>
+ <p>
+ In both of these examples, there is a phase where a newly
+ elected master brings the members of a replication group into
+ conformance with itself so that it can start sending new
+ information to them. This can result in the loss of
+ information as previously committed transactions are
+ unrolled.
+ </p>
+ <p>
+ In architectures where network partitions are an issue,
+ applications may want to implement a heartbeat protocol to
+ minimize the consequences of a bad network partition. As long
+ as a master is able to contact at least half of the sites in
+ the replication group, it is impossible for there to be two
+ masters. If the master can no longer contact a sufficient
+ number of systems, it should reconfigure itself as a client,
+ and hold an election. Replication Manager does not currently
+ implement such a feature, so this technique is only available
+ to Base API applications.
+ </p>
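
A Base API application might wire such a heartbeat check to the same downgrade-and-elect sequence shown above; a sketch, where count_reachable_sites() stands in for a hypothetical application-supplied heartbeat layer:

    #include <db.h>

    /* Hypothetical helper supplied by the application's heartbeat code. */
    extern u_int32_t count_reachable_sites(void);

    static int
    check_master_quorum(DB_ENV *dbenv, u_int32_t nsites, u_int32_t nvotes)
    {
        /* Stay master only while at least half the group is reachable. */
        if (2 * count_reachable_sites() >= nsites)
            return (0);

        /* Otherwise step down and hold an election. */
        (void)dbenv->rep_start(dbenv, NULL, DB_REP_CLIENT);
        return (dbenv->rep_elect(dbenv, nsites, nvotes, 0));
    }
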
+ <p>
+ There is another tool applications can use to minimize the
+ damage in the case of a network partition. By specifying an
+ <span class="bold"><strong>nsites</strong></span> argument to
+ <a href="../api_reference/C/repelect.html" class="olink">DB_ENV-&gt;rep_elect()</a> that is larger than the actual number of database
+ environments in the replication group, Base API applications
+ can keep systems from declaring themselves the master unless
+ they can talk to a large percentage of the sites in the
+ system. For example, if there are 20 database environments in
+ the replication group, and an argument of 30 is specified to
+ the <a href="../api_reference/C/repelect.html" class="olink">DB_ENV-&gt;rep_elect()</a> method, then a system will have to be able to
+ talk to at least 16 of the sites to declare itself the master.
+ Replication Manager automatically maintains the number of
+ sites in the replication group, so this technique is only
+ available to Base API applications.
+ </p>
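
A sketch of that inflated-nsites technique, mirroring the numbers in the example above (dbenv is assumed to be an open, replication-configured DB_ENV handle; error handling is abbreviated):

    u_int32_t nsites = 30;    /* inflated count: only 20 environments exist */
    int ret;

    /*
     * An nvotes value of 0 requests the default, a simple majority of
     * nsites: 30 / 2 + 1 = 16 sites must take part before this
     * environment can declare itself master.
     */
    if ((ret = dbenv->rep_elect(dbenv, nsites, 0, 0)) != 0)
        dbenv->err(dbenv, ret, "DB_ENV->rep_elect");
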
+ <p>
+ Specifying a <span class="bold"><strong>nsites</strong></span>
+ argument to <a href="../api_reference/C/repelect.html" class="olink">DB_ENV-&gt;rep_elect()</a> that is smaller than the actual number
+ of database environments in the replication group has its uses
+ as well. For example, consider a replication group with 2
+ environments. If they are partitioned from each other, neither
+ of the sites could ever get enough votes to become the master.
+ A reasonable alternative would be to specify a <span class="bold"><strong>nsites</strong></span> argument of 2 to one of the
+ systems and a <span class="bold"><strong>nsites</strong></span> argument
+ of 1 to the other. That way, one of the systems could win
+ elections even when partitioned, while the other one could
+ not. This would allow one of the systems to continue accepting
+ write queries after the partition.
+ </p>
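
A sketch of that asymmetric two-site arrangement, one call per environment (dbenv again stands for each site's own open handle):

    int ret;

    /* On the environment that is allowed to keep running alone: */
    ret = dbenv->rep_elect(dbenv, 1, 0, 0);  /* majority of 1: its own vote suffices */

    /* On the other environment: */
    ret = dbenv->rep_elect(dbenv, 2, 0, 0);  /* majority of 2: needs both sites */
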
+ <p>
+ In a two-site group, Replication Manager by default reacts to
+ the loss of communication with the master by observing a
+ strict majority rule that prevents the survivor from taking
+ over. Thus it avoids multiple masters and the need to unroll
+ some transactions if both sites are running but cannot
+ communicate. But it does leave the group in a read-only state
+ until both sites are available. If application availability
+ while one site is down is a priority and it is acceptable to
+ risk unrolling some transactions, there is a configuration
+ option to turn off the strict majority rule and allow the
+ surviving client to declare itself to be master. See the
+ <a href="../api_reference/C/repconfig.html" class="olink">DB_ENV-&gt;rep_set_config()</a> method <a href="../api_reference/C/repconfig.html#config_DB_REPMGR_CONF_2SITE_STRICT" class="olink">DB_REPMGR_CONF_2SITE_STRICT</a> flag for more
+ information.
+ </p>
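
For a Replication Manager application that accepts that trade-off, turning the rule off is a single configuration call; a sketch, with error handling abbreviated:

    int ret;

    /* Allow the surviving site of a two-site group to take over as master. */
    if ((ret = dbenv->rep_set_config(dbenv,
        DB_REPMGR_CONF_2SITE_STRICT, 0)) != 0)
        dbenv->err(dbenv, ret, "DB_ENV->rep_set_config");
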
+ <p>
+ Preferred master mode is another alternative for two-site
+ Replication Manager replication groups. It allows the survivor
+ to take over after the loss of communication with the master.
+ When communications are restored, it always preserves the
+ transactions from the preferred master site. See
+ <a class="xref" href="rep_twosite.html#twosite_prefmas" title="Preferred master mode">Preferred master mode</a>
+ for more information.
+ </p>
+ <p>
+ These scenarios stress the importance of good network
+ infrastructure in Berkeley DB replicated environments. When
+ replicating database environments over sufficiently lossy
+ networking, the best solution may well be to pick a single
+ master, and only hold elections when human intervention has
+ determined the selected master is unable to recover at
+ all.
+ </p>
</div>
<div class="navfooter">
<hr />