| author | Lorry Tar Creator <lorry-tar-importer@baserock.org> | 2015-02-17 17:25:57 +0000 |
|---|---|---|
| committer | <> | 2015-03-17 16:26:24 +0000 |
| commit | 780b92ada9afcf1d58085a83a0b9e6bc982203d1 (patch) | |
| tree | 598f8b9fa431b228d29897e798de4ac0c1d3d970 /docs/programmer_reference/rep_partition.html | |
| parent | 7a2660ba9cc2dc03a69ddfcfd95369395cc87444 (diff) | |
Diffstat (limited to 'docs/programmer_reference/rep_partition.html')
| -rw-r--r-- | docs/programmer_reference/rep_partition.html | 220 |
1 file changed, 129 insertions, 91 deletions
diff --git a/docs/programmer_reference/rep_partition.html b/docs/programmer_reference/rep_partition.html
index e5740736..9b662f4b 100644
--- a/docs/programmer_reference/rep_partition.html
+++ b/docs/programmer_reference/rep_partition.html
@@ -14,7 +14,7 @@
 <body>
 <div xmlns="" class="navheader">
 <div class="libver">
- <p>Library Version 11.2.5.3</p>
+ <p>Library Version 12.1.6.1</p>
 </div>
 <table width="100%" summary="Navigation header">
 <tr>
@@ -22,9 +22,7 @@
 </tr>
 <tr>
 <td width="20%" align="left"><a accesskey="p" href="rep_twosite.html">Prev</a> </td>
- <th width="60%" align="center">Chapter 12.
- Berkeley DB Replication
- </th>
+ <th width="60%" align="center">Chapter 12. Berkeley DB Replication </th>
 <td width="20%" align="right"> <a accesskey="n" href="rep_faq.html">Next</a></td>
 </tr>
 </table>
@@ -38,93 +36,133 @@
 </div>
 </div>
 </div>
- <p>The Berkeley DB replication implementation can be affected by network
-partitioning problems.</p>
- <p>For example, consider a replication group with N members. The network
-partitions with the master on one side and more than N/2 of the sites
-on the other side. The sites on the side with the master will continue
-forward, and the master will continue to accept write queries for the
-databases. Unfortunately, the sites on the other side of the partition,
-realizing they no longer have a master, will hold an election. The
-election will succeed as there are more than N/2 of the total sites
-participating, and there will then be two masters for the replication
-group. Since both masters are potentially accepting write queries, the
-databases could diverge in incompatible ways.</p>
- <p>If multiple masters are ever found to exist in a replication group, a
-master detecting the problem will return <a href="../api_reference/C/repmessage.html#repmsg_DB_REP_DUPMASTER" class="olink">DB_REP_DUPMASTER</a>. If
-the application sees this return, it should reconfigure itself as a
-client (by calling <a href="../api_reference/C/repstart.html" class="olink">DB_ENV->rep_start()</a>), and then call for an election
-(by calling <a href="../api_reference/C/repelect.html" class="olink">DB_ENV->rep_elect()</a>). The site that wins the election may be
-one of the two previous masters, or it may be another site entirely.
-Regardless, the winning system will bring all of the other systems into
-conformance.</p>
- <p>As another example, consider a replication group with a master
-environment and two clients A and B, where client A may upgrade to
-master status and client B cannot. Then, assume client A is partitioned
-from the other two database environments, and it becomes out-of-date
-with respect to the master. Then, assume the master crashes and does
-not come back on-line. Subsequently, the network partition is restored,
-and clients A and B hold an election. As client B cannot win the
-election, client A will win by default, and in order to get back into
-sync with client B, possibly committed transactions on client B will be
-unrolled until the two sites can once again move forward together.</p>
- <p>In both of these examples, there is a phase where a newly elected master
-brings the members of a replication group into conformance with itself
-so that it can start sending new information to them. This can result
-in the loss of information as previously committed transactions are
-unrolled.</p>
- <p>In architectures where network partitions are an issue, applications
-may want to implement a heart-beat protocol to minimize the consequences
-of a bad network partition. As long as a master is able to contact at
-least half of the sites in the replication group, it is impossible for
-there to be two masters. If the master can no longer contact a
-sufficient number of systems, it should reconfigure itself as a client,
-and hold an election. Replication Manager does not currently
-implement such a feature, so this technique is only available to Base API
-applications.</p>
- <p>There is another tool applications can use to minimize the damage in
-the case of a network partition. By specifying an <span class="bold"><strong>nsites</strong></span>
-argument to <a href="../api_reference/C/repelect.html" class="olink">DB_ENV->rep_elect()</a> that is larger than the actual number of
-database environments in the replication group, Base API applications can keep
-systems from declaring themselves the master unless they can talk to
-a large percentage of the sites in the system. For example, if there
-are 20 database environments in the replication group, and an argument
-of 30 is specified to the <a href="../api_reference/C/repelect.html" class="olink">DB_ENV->rep_elect()</a> method, then a system will have
-to be able to talk to at least 16 of the sites to declare itself the
-master.</p>
- <p>Replication Manager uses the value of <span class="bold"><strong>nsites</strong></span> (configured by
-the <a href="../api_reference/C/repnsites.html" class="olink">DB_ENV->rep_set_nsites()</a> method) for elections as well as in calculating how
-many acknowledgements to wait for when sending a
-<a href="../api_reference/C/reptransport.html#transport_DB_REP_PERMANENT" class="olink">DB_REP_PERMANENT</a> message. So this technique may be useful here
-as well, unless the application uses the <a href="../api_reference/C/repmgrset_ack_policy.html#ackspolicy_DB_REPMGR_ACKS_ALL" class="olink">DB_REPMGR_ACKS_ALL</a> or
-<a href="../api_reference/C/repmgrset_ack_policy.html#ackspolicy_DB_REPMGR_ACKS_ALL_PEERS" class="olink">DB_REPMGR_ACKS_ALL_PEERS</a> acknowledgement policies.</p>
- <p>Specifying a <span class="bold"><strong>nsites</strong></span> argument to <a href="../api_reference/C/repelect.html" class="olink">DB_ENV->rep_elect()</a> that is
-smaller than the actual number of database environments in the
-replication group has its uses as well. For example, consider a
-replication group with 2 environments. If they are partitioned from
-each other, neither of the sites could ever get enough votes to become
-the master. A reasonable alternative would be to specify a
-<span class="bold"><strong>nsites</strong></span> argument of 2 to one of the systems
-and a <span class="bold"><strong>nsites</strong></span>
-argument of 1 to the other. That way, one of the systems could win
-elections even when partitioned, while the other one could not. This
-would allow one of the systems to continue accepting write
-queries after the partition.</p>
- <p>In a 2-site group, Replication Manager by default reacts to the loss of
-communication with the master by observing a strict majority rule that
-prevents the survivor from taking over. Thus it avoids multiple masters and
-the need to unroll some transactions if both sites are running but cannot
-communicate. But it does leave the group in a read-only state until both
-sites are available. If application availability while one site is down is a
-priority and it is acceptable to risk unrolling some transactions, there
-is a configuration option to turn off the strict majority rule and allow
-the surviving client to declare itself to be master. See the <a href="../api_reference/C/repconfig.html" class="olink">DB_ENV->rep_set_config()</a>
-method <a href="../api_reference/C/repconfig.html#config_DB_REPMGR_CONF_2SITE_STRICT" class="olink">DB_REPMGR_CONF_2SITE_STRICT</a> flag for more information.</p>
- <p>These scenarios stress the importance of good network infrastructure in
-Berkeley DB replicated environments. When replicating database environments
-over sufficiently lossy networking, the best solution may well be to
-pick a single master, and only hold elections when human intervention
-has determined the selected master is unable to recover at all.</p>
+ <p>
+ The Berkeley DB replication implementation can be affected
+ by network partitioning problems.
+ </p>
+ <p>
+ For example, consider a replication group with N members.
+ The network partitions with the master on one side and more
+ than N/2 of the sites on the other side. The sites on the side
+ with the master will continue forward, and the master will
+ continue to accept write queries for the databases.
+ Unfortunately, the sites on the other side of the partition,
+ realizing they no longer have a master, will hold an election.
+ The election will succeed as there are more than N/2 of the
+ total sites participating, and there will then be two masters
+ for the replication group. Since both masters are potentially
+ accepting write queries, the databases could diverge in
+ incompatible ways.
+ </p>
+ <p>
+ If multiple masters are ever found to exist in a replication
+ group, a master detecting the problem will return
+ <a href="../api_reference/C/repmessage.html#repmsg_DB_REP_DUPMASTER" class="olink">DB_REP_DUPMASTER</a>. Replication Manager applications
+ automatically handle duplicate master situations. If a Base
+ API application sees this return, it should reconfigure itself
+ as a client (by calling <a href="../api_reference/C/repstart.html" class="olink">DB_ENV->rep_start()</a>), and then call for an
+ election (by calling <a href="../api_reference/C/repelect.html" class="olink">DB_ENV->rep_elect()</a>). The site that wins the
+ election may be one of the two previous masters, or it may be
+ another site entirely. Regardless, the winning system will
+ bring all of the other systems into conformance.
+ </p>
+ <p>
+ As another example, consider a replication group with a
+ master environment and two clients A and B, where client A may
+ upgrade to master status and client B cannot. Then, assume
+ client A is partitioned from the other two database
+ environments, and it becomes out-of-date with respect to the
+ master. Then, assume the master crashes and does not come back
+ on-line. Subsequently, the network partition is restored, and
+ clients A and B hold an election. As client B cannot win the
+ election, client A will win by default, and in order to get
+ back into sync with client B, possibly committed transactions
+ on client B will be unrolled until the two sites can once
+ again move forward together.
+ </p>
+ <p>
+ In both of these examples, there is a phase where a newly
+ elected master brings the members of a replication group into
+ conformance with itself so that it can start sending new
+ information to them. This can result in the loss of
+ information as previously committed transactions are
+ unrolled.
+ </p>
+ <p>
+ In architectures where network partitions are an issue,
+ applications may want to implement a heartbeat protocol to
+ minimize the consequences of a bad network partition. As long
+ as a master is able to contact at least half of the sites in
+ the replication group, it is impossible for there to be two
+ masters. If the master can no longer contact a sufficient
+ number of systems, it should reconfigure itself as a client,
+ and hold an election. Replication Manager does not currently
+ implement such a feature, so this technique is only available
+ to Base API applications.
+ </p>
+ <p>
+ There is another tool applications can use to minimize the
+ damage in the case of a network partition. By specifying an
+ <span class="bold"><strong>nsites</strong></span> argument to
+ <a href="../api_reference/C/repelect.html" class="olink">DB_ENV->rep_elect()</a> that is larger than the actual number of database
+ environments in the replication group, Base API applications
+ can keep systems from declaring themselves the master unless
+ they can talk to a large percentage of the sites in the
+ system. For example, if there are 20 database environments in
+ the replication group, and an argument of 30 is specified to
+ the <a href="../api_reference/C/repelect.html" class="olink">DB_ENV->rep_elect()</a> method, then a system will have to be able to
+ talk to at least 16 of the sites to declare itself the master.
+ Replication Manager automatically maintains the number of
+ sites in the replication group, so this technique is only
+ available to Base API applications.
+ </p>
+ <p>
+ Specifying a <span class="bold"><strong>nsites</strong></span>
+ argument to <a href="../api_reference/C/repelect.html" class="olink">DB_ENV->rep_elect()</a> that is smaller than the actual number
+ of database environments in the replication group has its uses
+ as well. For example, consider a replication group with 2
+ environments. If they are partitioned from each other, neither
+ of the sites could ever get enough votes to become the master.
+ A reasonable alternative would be to specify a <span class="bold"><strong>nsites</strong></span> argument of 2 to one of the
+ systems and a <span class="bold"><strong>nsites</strong></span> argument
+ of 1 to the other. That way, one of the systems could win
+ elections even when partitioned, while the other one could
+ not. This would allow one of the systems to continue accepting
+ write queries after the partition.
+ </p>
+ <p>
+ In a two-site group, Replication Manager by default reacts to
+ the loss of communication with the master by observing a
+ strict majority rule that prevents the survivor from taking
+ over. Thus it avoids multiple masters and the need to unroll
+ some transactions if both sites are running but cannot
+ communicate. But it does leave the group in a read-only state
+ until both sites are available. If application availability
+ while one site is down is a priority and it is acceptable to
+ risk unrolling some transactions, there is a configuration
+ option to turn off the strict majority rule and allow the
+ surviving client to declare itself to be master. See the
+ <a href="../api_reference/C/repconfig.html" class="olink">DB_ENV->rep_set_config()</a> method <a href="../api_reference/C/repconfig.html#config_DB_REPMGR_CONF_2SITE_STRICT" class="olink">DB_REPMGR_CONF_2SITE_STRICT</a> flag for more
+ information.
+ </p>
+ <p>
+ Preferred master mode is another alternative for two-site
+ Replication Manager replication groups. It allows the survivor
+ to take over after the loss of communication with the master.
+ When communications are restored, it always preserves the
+ transactions from the preferred master site. See
+ <a class="xref" href="rep_twosite.html#twosite_prefmas" title="Preferred master mode">Preferred master mode</a>
+ for more information.
+ </p>
+ <p>
+ These scenarios stress the importance of good network
+ infrastructure in Berkeley DB replicated environments. When
+ replicating database environments over sufficiently lossy
+ networking, the best solution may well be to pick a single
+ master, and only hold elections when human intervention has
+ determined the selected master is unable to recover at
+ all.
+ </p>
 </div>
 <div class="navfooter">
 <hr />
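The rewritten text above tells a Base API application that sees DB_REP_DUPMASTER to downgrade itself with DB_ENV->rep_start() and then call DB_ENV->rep_elect(). A minimal C sketch of that pattern follows; it assumes an application that already drives DB_ENV->rep_process_message() from its own communications loop, and the helper name handle_rep_message() plus the GROUP_NSITES/GROUP_NVOTES placeholders are illustrative, not part of the Berkeley DB API.

```c
#include <db.h>

/* Illustrative placeholders: a real application tracks its own
 * replication group size and required vote count. */
#define GROUP_NSITES 5
#define GROUP_NVOTES 3

/* Called for each replication message received from another site. */
void
handle_rep_message(DB_ENV *dbenv, DBT *control, DBT *rec, int eid)
{
	DB_LSN ret_lsn;
	int ret;

	ret = dbenv->rep_process_message(dbenv, control, rec, eid, &ret_lsn);
	switch (ret) {
	case 0:
		break;
	case DB_REP_DUPMASTER:
		/*
		 * Another master exists: reconfigure as a client, then
		 * call for an election.  The winner is reported through
		 * the DB_EVENT_REP_ELECTED event callback.
		 */
		(void)dbenv->rep_start(dbenv, NULL, DB_REP_CLIENT);
		(void)dbenv->rep_elect(dbenv, GROUP_NSITES, GROUP_NVOTES, 0);
		break;
	default:
		/* DB_REP_HOLDELECTION, DB_REP_NEWSITE, etc. not shown. */
		break;
	}
}
```

As the new text notes, Replication Manager applications handle duplicate masters internally and need none of this.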
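The nsites discussion in the new text (passing 30 for a 20-site group so a winner needs 16 votes, or giving one site of a two-site pair an nsites of 1 so it can win alone) comes down to the first two arguments of DB_ENV->rep_elect(). A sketch under the assumption that dbenv is an already opened, replication-configured Base API environment handle; the wrapper function names are made up for illustration.

```c
#include <db.h>

/*
 * Sketch only: dbenv is assumed to be an open DB_ENV handle in a
 * Base API replication application.  Passing 0 for nvotes asks for
 * a simple majority of the nsites value supplied here.
 */
void
election_with_inflated_nsites(DB_ENV *dbenv)
{
	/*
	 * 20 environments actually exist, but nsites is given as 30,
	 * so a site must collect at least 16 votes (a majority of 30)
	 * before it may declare itself master.
	 */
	(void)dbenv->rep_elect(dbenv, 30, 0, 0);
}

void
election_for_two_site_survivor(DB_ENV *dbenv)
{
	/*
	 * Two-site group: the site allowed to win while partitioned
	 * passes an nsites of 1, so its own vote suffices; the other
	 * site would pass 2 and can never win alone.
	 */
	(void)dbenv->rep_elect(dbenv, 1, 0, 0);
}
```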
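Finally, the two-site Replication Manager behaviour described above hangs on a single DB_ENV->rep_set_config() flag. A sketch, assuming dbenv is an open environment handle configured before Replication Manager is started; whether relaxing the rule is acceptable depends on the application's tolerance for unrolled transactions.

```c
#include <db.h>

/*
 * Sketch: let the surviving site of a two-site Replication Manager
 * group elect itself master after losing contact with its peer, at
 * the cost of possibly unrolling transactions when the peer returns.
 */
int
allow_two_site_takeover(DB_ENV *dbenv)
{
	/* Turn off the default strict-majority rule for 2-site groups. */
	return (dbenv->rep_set_config(dbenv, DB_REPMGR_CONF_2SITE_STRICT, 0));
}
```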