| author | Lorry Tar Creator <lorry-tar-importer@baserock.org> | 2015-02-17 17:25:57 +0000 |
|---|---|---|
| committer | <> | 2015-03-17 16:26:24 +0000 |
| commit | 780b92ada9afcf1d58085a83a0b9e6bc982203d1 (patch) | |
| tree | 598f8b9fa431b228d29897e798de4ac0c1d3d970 /docs/programmer_reference/rep_elect.html | |
| parent | 7a2660ba9cc2dc03a69ddfcfd95369395cc87444 (diff) | |
Diffstat (limited to 'docs/programmer_reference/rep_elect.html')
| -rw-r--r-- | docs/programmer_reference/rep_elect.html | 336 |
1 files changed, 190 insertions, 146 deletions
diff --git a/docs/programmer_reference/rep_elect.html b/docs/programmer_reference/rep_elect.html index 33427c6a..bf3bd062 100644 --- a/docs/programmer_reference/rep_elect.html +++ b/docs/programmer_reference/rep_elect.html @@ -8,13 +8,13 @@ <meta name="generator" content="DocBook XSL Stylesheets V1.73.2" /> <link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" /> <link rel="up" href="rep.html" title="Chapter 12. Berkeley DB Replication" /> - <link rel="prev" href="rep_mgr_ack.html" title="Choosing a Replication Manager Ack Policy" /> + <link rel="prev" href="rep_mgr_ack.html" title="Choosing a Replication Manager acknowledgement policy" /> <link rel="next" href="rep_mastersync.html" title="Synchronizing with a master" /> </head> <body> <div xmlns="" class="navheader"> <div class="libver"> - <p>Library Version 11.2.5.3</p> + <p>Library Version 12.1.6.1</p> </div> <table width="100%" summary="Navigation header"> <tr> @@ -22,9 +22,7 @@ </tr> <tr> <td width="20%" align="left"><a accesskey="p" href="rep_mgr_ack.html">Prev</a> </td> - <th width="60%" align="center">Chapter 12. - Berkeley DB Replication - </th> + <th width="60%" align="center">Chapter 12. Berkeley DB Replication </th> <td width="20%" align="right"> <a accesskey="n" href="rep_mastersync.html">Next</a></td> </tr> </table> @@ -38,152 +36,197 @@ </div> </div> </div> - <p>Replication Manager automatically conducts elections when necessary, -based on configuration information supplied to the -<a href="../api_reference/C/reppriority.html" class="olink">DB_ENV->rep_set_priority()</a> method, unless the application turns off automatic -elections using the <a href="../api_reference/C/repconfig.html" class="olink">DB_ENV->rep_set_config()</a> method.</p> - <p>It is the responsibility of a Base API application -to initiate elections if desired. 
It is never dangerous -to hold an election, as the Berkeley DB election process ensures there is -never more than a single master database environment. Clients should -initiate an election whenever they lose contact with the master -environment, whenever they see a return of <a href="../api_reference/C/repmessage.html#repmsg_DB_REP_HOLDELECTION" class="olink">DB_REP_HOLDELECTION</a> -from the <a href="../api_reference/C/repmessage.html" class="olink">DB_ENV->rep_process_message()</a> method, or when, for whatever reason, they do -not know who the master is. It is not necessary for applications to -immediately hold elections when they start, as any existing master -will be discovered after calling <a href="../api_reference/C/repstart.html" class="olink">DB_ENV->rep_start()</a>. If no master has -been found after a short wait period, then the application should call -for an election.</p> - <p>For a client to win an election, the replication group must currently -have no master, and the client must have the most recent log records. -In the case of clients having equivalent log records, the priority of -the database environments participating in the election will determine -the winner. The application specifies the minimum number of replication -group members that must participate in an election for a winner to be -declared. We recommend at least ((N/2) + 1) members. If fewer than the -simple majority are specified, a warning will be given.</p> - <p>If an application's policy for what site should win an election can be -parameterized in terms of the database environment's information (that -is, the number of sites, available log records and a relative priority -are all that matter), then Berkeley DB can handle all elections transparently. -However, there are cases where the application has more complete -knowledge and needs to affect the outcome of elections. 
For example, -applications may choose to handle master selection, explicitly -designating master and client sites. Applications in these cases may -never need to call for an election. Alternatively, applications may -choose to use <a href="../api_reference/C/repelect.html" class="olink">DB_ENV->rep_elect()</a>'s arguments to force the correct outcome -to an election. That is, if an application has three sites, A, B, and -C, and after a failure of C determines that A must become the winner, -the application can guarantee an election's outcome by specifying -priorities appropriately after an election:</p> + <p> + Replication Manager automatically conducts elections when + necessary, based on configuration information supplied to the + <a href="../api_reference/C/reppriority.html" class="olink">DB_ENV->rep_set_priority()</a> method, unless the application turns off + automatic elections using the <a href="../api_reference/C/repconfig.html" class="olink">DB_ENV->rep_set_config()</a> method. + </p> + <p> + It is the responsibility of a Base API application to + initiate elections if desired. It is never dangerous to hold + an election, as the Berkeley DB election process ensures there + is never more than a single master database environment. + Clients should initiate an election whenever they lose contact + with the master environment, whenever they see a return of + <a href="../api_reference/C/repmessage.html#repmsg_DB_REP_HOLDELECTION" class="olink">DB_REP_HOLDELECTION</a> from the <a href="../api_reference/C/repmessage.html" class="olink">DB_ENV->rep_process_message()</a> method, or when, + for whatever reason, they do not know who the master is. It is + not necessary for applications to immediately hold elections + when they start, as any existing master will be discovered + after calling <a href="../api_reference/C/repstart.html" class="olink">DB_ENV->rep_start()</a>. 
If no master has been found after a + short wait period, then the application should call for an + election. + </p> + <p> + For a client to win an election, the replication group must + currently have no master, and the client must have the most + recent log records. In the case of clients having equivalent + log records, the priority of the database environments + participating in the election will determine the winner. The + application specifies the minimum number of replication group + members that must participate in an election for a winner to + be declared. We recommend at least ((N/2) + 1) members. If + fewer than the simple majority are specified, a warning will + be given. + </p> + <p> + If an application's policy for what site should win an + election can be parameterized in terms of the database + environment's information (that is, the number of sites, + available log records and a relative priority are all that + matter), then Berkeley DB can handle all elections + transparently. However, there are cases where the application + has more complete knowledge and needs to affect the outcome of + elections. For example, applications may choose to handle + master selection, explicitly designating master and client + sites. Applications in these cases may never need to call for + an election. Alternatively, applications may choose to use + <a href="../api_reference/C/repelect.html" class="olink">DB_ENV->rep_elect()</a>'s arguments to force the correct outcome to an + election. 
That is, if an application has three sites, A, B, + and C, and after a failure of C determines that A must become + the winner, the application can guarantee an election's + outcome by specifying priorities appropriately after an + election: + </p> <pre class="programlisting">on A: priority 100, nsites 2 on B: priority 0, nsites 2</pre> - <p>It is dangerous to configure more than one master environment using the -<a href="../api_reference/C/repstart.html" class="olink">DB_ENV->rep_start()</a> method, and applications should be careful not to do so. -Applications should only configure themselves as the master environment -if they are the only possible master, or if they have won an election. -An application knows it has won an election when it receives the -<a href="../api_reference/C/envevent_notify.html#event_notify_DB_EVENT_REP_ELECTED" class="olink">DB_EVENT_REP_ELECTED</a> event.</p> <p> - Normally, when a master failure is detected it is desired that an - election finish quickly so the application can continue to service - updates. Also, participating sites are already up and can participate. - However, in the case of restarting a whole group after an - administrative shutdown, it is possible that a slower booting site had - later logs than any other site. To cover that case, an application - would like to give the election more time to ensure all sites have a - chance to participate. Since it is intractable for a starting site to - determine which case the whole group is in, the use of a long timeout - gives all sites a reasonable chance to participate. If an application - wanting full participation sets the <a href="../api_reference/C/repelect.html" class="olink">DB_ENV->rep_elect()</a> method's - <span class="bold"><strong>nvotes</strong></span> argument to the number of sites - in the group and one site does not reboot, a master can never be elected - without manual intervention. 
-</p> + It is dangerous to configure more than one master + environment using the <a href="../api_reference/C/repstart.html" class="olink">DB_ENV->rep_start()</a> method, and applications + should be careful not to do so. Applications should only + configure themselves as the master environment if they are the + only possible master, or if they have won an election. An + application knows it has won an election when it receives the + <a href="../api_reference/C/envevent_notify.html#event_notify_DB_EVENT_REP_ELECTED" class="olink">DB_EVENT_REP_ELECTED</a> event. + </p> + <p> + Normally, when a master failure is detected it is desired + that an election finish quickly so the application can + continue to service updates. Also, participating sites are + already up and can participate. However, in the case of + restarting a whole group after an administrative shutdown, it + is possible that a slower booting site had later logs than any + other site. To cover that case, an application would like to + give the election more time to ensure all sites have a chance + to participate. Since it is intractable for a starting site to + determine which case the whole group is in, the use of a long + timeout gives all sites a reasonable chance to participate. If + an application wanting full participation sets the <a href="../api_reference/C/repelect.html" class="olink">DB_ENV->rep_elect()</a> + method's <span class="bold"><strong>nvotes</strong></span> argument to + the number of sites in the group and one site does not reboot, + a master can never be elected without manual intervention. + </p> + <p> + In those cases, the desired action at a group level is to + hold a full election if all sites crashed and a majority + election if a subset of sites crashed or rebooted. Since an + individual site cannot know which number of votes to require, + a mechanism is available to accomplish this using timeouts. 
By + setting a long timeout (perhaps on the order of minutes) using + the <span class="bold"><strong>DB_REP_FULL_ELECTION_TIMEOUT</strong></span> flag to the + <a href="../api_reference/C/repset_timeout.html" class="olink">DB_ENV->rep_set_timeout()</a> method, an application can allow Berkeley DB to + elect a master even without full participation. Sites may also + want to set a normal election timeout for majority based + elections using the <span class="bold"><strong>DB_REP_ELECTION_TIMEOUT</strong></span> + flag to the <a href="../api_reference/C/repset_timeout.html" class="olink">DB_ENV->rep_set_timeout()</a> method. + </p> + <p> + Consider 3 sites, A, B, and C where A is the master. In the + case where all three sites crash and all reboot, all sites + will set a timeout for a full election, say 10 minutes, but + only require a majority for <span class="bold"><strong>nvotes</strong></span> to + the <a href="../api_reference/C/repelect.html" class="olink">DB_ENV->rep_elect()</a> method. Once all + three sites are booted the election will complete immediately + if they reboot within 10 minutes of each other. Consider if + all three sites crash and only two reboot. The two sites will + enter the election, but after the 10 minute timeout they will + elect with the majority of two sites. Using the full election + timeout sets a threshold for allowing a site to reboot and + rejoin the group. + </p> + <p> + To add a database environment to the replication group with + the intent of it becoming the master, first add it as a + client. Since it may be out-of-date with respect to the + current master, allow it to update itself from the current + master. Then, shut the current master down. Presumably, the + added client will win the subsequent election. If the client + does not win the election, it is likely that it was not given + sufficient time to update itself with respect to the current + master. 
+ </p> + <p> + If a client is unable to find a master or win an election, + it means that the network has been partitioned and there are + not enough environments participating in the election for one + of the participants to win. In this case, the application + should repeatedly call <a href="../api_reference/C/repstart.html" class="olink">DB_ENV->rep_start()</a> and <a href="../api_reference/C/repelect.html" class="olink">DB_ENV->rep_elect()</a>, alternating + between attempting to discover an existing master, and holding + an election to declare a new one. In desperate circumstances, + an application could simply declare itself the master by + calling <a href="../api_reference/C/repstart.html" class="olink">DB_ENV->rep_start()</a>, or by reducing the number of participants + required to win an election until the election is won. Neither + of these solutions is recommended: in the case of a network + partition, either of these choices can result in there being + two masters in one replication group, and the databases in the + environment might irretrievably diverge as they are modified + in different ways by the masters. + </p> + <p> + Note that this presents a special problem for a replication + group consisting of only two environments. If a master site + fails, the remaining client can never comprise a majority of + sites in the group. If the client application can reach a + remote network site, or some other external tie-breaker, it + may be able to determine whether it is safe to declare itself + master. Otherwise it must choose between providing + availability of a writable master (at the risk of duplicate + masters), or strict protection against duplicate masters (but + no master when a failure occurs). 
Replication Manager offers + this choice via the <a href="../api_reference/C/repconfig.html" class="olink">DB_ENV->rep_set_config()</a> method + <a href="../api_reference/C/repconfig.html#config_DB_REPMGR_CONF_2SITE_STRICT" class="olink">DB_REPMGR_CONF_2SITE_STRICT</a> flag. Base API applications can + accomplish this by judicious setting of the <span class="bold"><strong>nvotes</strong></span> and <span class="bold"><strong>nsites</strong></span> + parameters to the <a href="../api_reference/C/repelect.html" class="olink">DB_ENV->rep_elect()</a> method. + </p> <p> -In those cases, the desired action at a group level is to hold -a full election if all sites crashed and a majority election if -a subset of sites crashed or rebooted. Since an individual site cannot know -which number of votes to require, a mechanism is available to -accomplish this using timeouts. By setting a long timeout (perhaps -on the order of minutes) using the <span class="bold"><strong>DB_REP_FULL_ELECTION_TIMEOUT</strong></span> -flag to the <a href="../api_reference/C/repset_timeout.html" class="olink">DB_ENV->rep_set_timeout()</a> method, an application can -allow Berkeley DB to elect a master even without full participation. -Sites may also want to set a normal election timeout for majority -based elections using the <span class="bold"><strong>DB_REP_ELECTION_TIMEOUT</strong></span> flag -to the <a href="../api_reference/C/repset_timeout.html" class="olink">DB_ENV->rep_set_timeout()</a> method.</p> + It is possible for a less-preferred database environment to + win an election if a number of systems crash at the same time. + Because an election winner is declared as soon as enough + environments participate in the election, the environment on a + slow booting but well-connected machine might lose to an + environment on a badly connected but faster booting machine. 
+ In the case of a number of environments crashing at the same + time (for example, a set of replicated servers in a single + machine room), applications should bring the database + environments on line as clients initially (which will allow + them to process read queries immediately), and then hold an + election after sufficient time has passed for the slower + booting machines to catch up. + </p> <p> -Consider 3 sites, A, B, and C where A is the master. In the -case where all three sites crash and all reboot, all sites -will set a timeout for a full election, say 10 minutes, but only -require a majority for <span class="bold"><strong>nvotes</strong></span> to the <a href="../api_reference/C/repelect.html" class="olink">DB_ENV->rep_elect()</a> method. -Once all three sites are booted the election will complete -immediately if they reboot within 10 minutes of each other. Consider -if all three sites crash and only two reboot. The two sites will -enter the election, but after the 10 minute timeout they will -elect with the majority of two sites. Using the full election -timeout sets a threshold for allowing a site to reboot and rejoin -the group.</p> - <p>To add a database environment to the replication group with the intent -of it becoming the master, first add it as a client. Since it may be -out-of-date with respect to the current master, allow it to update -itself from the current master. Then, shut the current master down. -Presumably, the added client will win the subsequent election. If the -client does not win the election, it is likely that it was not given -sufficient time to update itself with respect to the current master.</p> - <p>If a client is unable to find a master or win an election, it means that -the network has been partitioned and there are not enough environments -participating in the election for one of the participants to win. 
-In this case, the application should repeatedly call <a href="../api_reference/C/repstart.html" class="olink">DB_ENV->rep_start()</a> -and <a href="../api_reference/C/repelect.html" class="olink">DB_ENV->rep_elect()</a>, alternating between attempting to discover an -existing master, and holding an election to declare a new one. In -desperate circumstances, an application could simply declare itself the -master by calling <a href="../api_reference/C/repstart.html" class="olink">DB_ENV->rep_start()</a>, or by reducing the number of -participants required to win an election until the election is won. -Neither of these solutions is recommended: in the case of a network -partition, either of these choices can result in there being two masters -in one replication group, and the databases in the environment might -irretrievably diverge as they are modified in different ways by the -masters.</p> - <p>Note that this presents a special problem for a replication group -consisting of only two environments. If a master site fails, the -remaining client can never comprise a majority of sites in the group. -If the client application can reach a remote network site, or some other -external tie-breaker, it may be able to determine whether it is safe -to declare itself master. Otherwise it must choose between providing -availability of a writable master (at the risk of duplicate masters), -or strict protection against duplicate masters (but no master when a -failure occurs). Replication Manager offers this choice via the -<a href="../api_reference/C/repconfig.html" class="olink">DB_ENV->rep_set_config()</a> method <a href="../api_reference/C/repconfig.html#config_DB_REPMGR_CONF_2SITE_STRICT" class="olink">DB_REPMGR_CONF_2SITE_STRICT</a> flag. 
Base API -applications can accomplish this by judicious setting of the -<span class="bold"><strong>nvotes</strong></span> and -<span class="bold"><strong>nsites</strong></span> parameters to the -<a href="../api_reference/C/repelect.html" class="olink">DB_ENV->rep_elect()</a> method. </p> - <p>It is possible for a less-preferred database environment to win an -election if a number of systems crash at the same time. Because an -election winner is declared as soon as enough environments participate -in the election, the environment on a slow booting but well-connected -machine might lose to an environment on a badly connected but faster -booting machine. In the case of a number of environments crashing at -the same time (for example, a set of replicated servers in a single -machine room), applications should bring the database environments on -line as clients initially (which will allow them to process read queries -immediately), and then hold an election after sufficient time has passed -for the slower booting machines to catch up.</p> - <p>If, for any reason, a less-preferred database environment becomes the -master, it is possible to switch masters in a replicated environment. -For example, the preferred master crashes, and one of the replication -group clients becomes the group master. In order to restore the -preferred master to master status, take the following steps:</p> + If, for any reason, a less-preferred database environment + becomes the master, it is possible to switch masters in a + replicated environment. For example, the preferred master + crashes, and one of the replication group clients becomes the + group master. 
In order to restore the preferred master to + master status, take the following steps: + </p> <div class="orderedlist"> <ol type="1"> - <li>The preferred master should reboot and re-join the replication group -as a client.</li> - <li>Once the preferred master has caught up with the replication group, the -application on the current master should complete all active transactions -and reconfigure itself as a client using the <a href="../api_reference/C/repstart.html" class="olink">DB_ENV->rep_start()</a> method.</li> - <li>Then, the current or preferred master should call for an election using -the <a href="../api_reference/C/repelect.html" class="olink">DB_ENV->rep_elect()</a> method.</li> + <li> + The preferred master should reboot and re-join the + replication group as a client. + </li> + <li> + Once the preferred master has caught up with the + replication group, the application on the current master + should complete all active transactions and reconfigure + itself as a client using the <a href="../api_reference/C/repstart.html" class="olink">DB_ENV->rep_start()</a> method. + </li> + <li> + Then, the current or preferred master should call + for an election using the <a href="../api_reference/C/repelect.html" class="olink">DB_ENV->rep_elect()</a> method. + </li> </ol> </div> </div> @@ -198,11 +241,12 @@ the <a href="../api_reference/C/repelect.html" class="olink">DB_ENV->rep_elec <td width="40%" align="right"> <a accesskey="n" href="rep_mastersync.html">Next</a></td> </tr> <tr> - <td width="40%" align="left" valign="top">Choosing a Replication Manager Ack Policy </td> + <td width="40%" align="left" valign="top">Choosing a Replication Manager acknowledgement policy </td> <td width="20%" align="center"> <a accesskey="h" href="index.html">Home</a> </td> - <td width="40%" align="right" valign="top"> Synchronizing with a master</td> + <td width="40%" align="right" valign="top"> Synchronizing with a + master</td> </tr> </table> </div> |
