| author | Lorry Tar Creator <lorry-tar-importer@baserock.org> | 2015-02-17 17:25:57 +0000 |
|---|---|---|
| committer | <> | 2015-03-17 16:26:24 +0000 |
| commit | 780b92ada9afcf1d58085a83a0b9e6bc982203d1 (patch) | |
| tree | 598f8b9fa431b228d29897e798de4ac0c1d3d970 /docs/programmer_reference/rep_lease.html | |
| parent | 7a2660ba9cc2dc03a69ddfcfd95369395cc87444 (diff) | |
| download | berkeleydb-master.tar.gz | |
Diffstat (limited to 'docs/programmer_reference/rep_lease.html')
| -rw-r--r-- | docs/programmer_reference/rep_lease.html | 511 |
1 files changed, 277 insertions, 234 deletions
diff --git a/docs/programmer_reference/rep_lease.html b/docs/programmer_reference/rep_lease.html index 6db976d0..301e11af 100644 --- a/docs/programmer_reference/rep_lease.html +++ b/docs/programmer_reference/rep_lease.html @@ -3,7 +3,7 @@ <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> - <title>Master Leases</title> + <title>Master leases</title> <link rel="stylesheet" href="gettingStarted.css" type="text/css" /> <meta name="generator" content="DocBook XSL Stylesheets V1.73.2" /> <link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" /> @@ -14,17 +14,15 @@ <body> <div xmlns="" class="navheader"> <div class="libver"> - <p>Library Version 11.2.5.3</p> + <p>Library Version 12.1.6.1</p> </div> <table width="100%" summary="Navigation header"> <tr> - <th colspan="3" align="center">Master Leases</th> + <th colspan="3" align="center">Master leases</th> </tr> <tr> <td width="20%" align="left"><a accesskey="p" href="rep_trans.html">Prev</a> </td> - <th width="60%" align="center">Chapter 12. - Berkeley DB Replication - </th> + <th width="60%" align="center">Chapter 12. Berkeley DB Replication </th> <td width="20%" align="right"> <a accesskey="n" href="rep_ryw.html">Next</a></td> </tr> </table> @@ -34,7 +32,7 @@ <div class="titlepage"> <div> <div> - <h2 class="title" style="clear: both"><a id="rep_lease"></a>Master Leases</h2> + <h2 class="title" style="clear: both"><a id="rep_lease"></a>Master leases</h2> </div> </div> </div> @@ -42,343 +40,388 @@ <dl> <dt> <span class="sect2"> - <a href="rep_lease.html#masterlease_change_groupsize">Changing Group Size</a> + <a href="rep_lease.html#masterlease_change_groupsize">Changing group + size</a> </span> </dt> </dl> </div> - <p> - Some applications have strict requirements about the consistency of - data read on a master site. Berkeley DB provides a mechanism called - master leases to provide such consistency. 
Without master leases, it - is sometimes possible for Berkeley DB to return old data to an - application when newer data is available due to unfortunate scheduling - as illustrated below: -</p> + <p> + Some applications have strict requirements about the + consistency of data read on a master site. Berkeley DB + provides a mechanism called master leases to provide such + consistency. Without master leases, it is sometimes possible + for Berkeley DB to return old data to an application when + newer data is available due to unfortunate scheduling as + illustrated below: + </p> <div class="orderedlist"> <ol type="1"> - <li><span class="bold"><strong>Application on master site</strong></span>: Read data item - <span class="emphasis"><em>foo</em></span> via Berkeley DB <a href="../api_reference/C/dbget.html" class="olink">DB->get()</a> or <a href="../api_reference/C/dbcget.html" class="olink">DBC->get()</a> call. - </li> - <li><span class="bold"><strong>Application on master site</strong></span>: sleep, get descheduled, etc. - </li> - <li><span class="bold"><strong>System</strong></span>: Master changes role, becomes a client. - </li> - <li><span class="bold"><strong>System</strong></span>: New site is elected master. - </li> - <li><span class="bold"><strong>System</strong></span>: New master modifies data item - <span class="emphasis"><em>foo</em></span>. - </li> - <li><span class="bold"><strong>Application</strong></span>: Berkeley DB returns old data for - <span class="emphasis"><em>foo</em></span> to application. - </li> + <li><span class="bold"><strong>Application on master + site</strong></span>: Read data item + <span class="emphasis"><em>foo</em></span> via Berkeley DB <a href="../api_reference/C/dbget.html" class="olink">DB->get()</a> or + <a href="../api_reference/C/dbcget.html" class="olink">DBC->get()</a> call. + </li> + <li><span class="bold"><strong>Application on master + site</strong></span>: sleep, get descheduled, etc. 
+ </li> + <li><span class="bold"><strong>System</strong></span>: Master changes + role, becomes a client. + </li> + <li><span class="bold"><strong>System</strong></span>: New site is + elected master. + </li> + <li><span class="bold"><strong>System</strong></span>: New master + modifies data item <span class="emphasis"><em>foo</em></span>. + </li> + <li><span class="bold"><strong>Application</strong></span>: Berkeley DB + returns old data for <span class="emphasis"><em>foo</em></span> to + application. + </li> </ol> </div> - <p> - By using master leases, Berkeley DB can provide guarantees about the - consistency of data read on a master site. The master site can be - considered a recognized authority for the data and consequently can - provide authoritative reads. Clients grant master leases to a master - site. By doing so, clients acknowledge the right of that site to - retain the role of master for a period of time. During that period of - time, clients cannot elect a new master, become master, nor grant their - lease to another site. -</p> - <p> - By holding a collection of granted leases, a master site can guarantee - to the application that the data returned is the current, authoritative - value. As a master performs operations, it continually requests - updated grants from the clients. When a read operation is required, - the master guarantees that it holds a valid collection of lease grants - from clients before returning data to the application. By holding - leases, Berkeley DB provides several guarantees to the application: -</p> + <p> + By using master leases, Berkeley DB can provide guarantees + about the consistency of data read on a master site. The + master site can be considered a recognized authority for the + data and consequently can provide authoritative reads. Clients + grant master leases to a master site. By doing so, clients + acknowledge the right of that site to retain the role of + master for a period of time. 
During that period of time, + clients cannot elect a new master, become master, or grant + their lease to another site. + </p> + <p> + By holding a collection of granted leases, a master site + can guarantee to the application that the data returned is the + current, authoritative value. As a master performs operations, + it continually requests updated grants from the clients. When + a read operation is required, the master guarantees that it + holds a valid collection of lease grants from clients before + returning data to the application. By holding leases, Berkeley + DB provides several guarantees to the application: + </p> <div class="orderedlist"> <ol type="1"> <li> - Authoritative reads: A guarantee that the data being read by the - application is the current value. - </li> + Authoritative reads: A guarantee that the data + being read by the application is the current value. + </li> <li> <p> - Durability from rollback: A guarantee that the data being - written or read by the application is permanent across a - majority of client sites and will never be rolled back. - </p> - <p> - The rollback guarantee also depends on the <a href="../api_reference/C/envset_flags.html#envset_flags_DB_TXN_NOSYNC" class="olink">DB_TXN_NOSYNC</a> flag. The - guarantee is effective as long as there isn't total replication group - failure while clients have granted leases but are holding the updates - in their cache. The application must weigh the performance impact of - synchronous transactions against the risk of total replication group - failure. If clients grant a lease while holding updated data in cache, - and total failure occurs, then the data is no longer present on the - clients and rollback can occur if the master also crashes. - </p> + Durability from rollback: A guarantee that the data + being written or read by the application is permanent + across a majority of client sites and will never be + rolled back. 
+ </p> <p> - The guarantee that data will not be rolled back applies only to data - successfully committed on a master. Data read on a client, or read - while ignoring leases can be rolled back. - </p> + The rollback guarantee also depends on the + <a href="../api_reference/C/envset_flags.html#envset_flags_DB_TXN_NOSYNC" class="olink">DB_TXN_NOSYNC</a> flag. The guarantee is effective as + long as there isn't a failure of half of the + replication group while clients have granted leases + but are holding the updates in their cache. The + application must weigh the performance impact of + synchronous transactions against the risk of the + failure of at least half of the replication group. If + clients grant a lease while holding updated data in + cache, and failure occurs, then the data is no longer + present on the clients and rollback can occur if a + sufficient number of other sites also crash. + </p> + <p> + The guarantee that data will not be rolled back + applies only to data successfully committed on a + master. Data read on a client, or read while ignoring + leases can be rolled back. + </p> </li> <li> + <p> + Freshness: A guarantee that the data being read by + the application on the <span class="emphasis"><em>master</em></span> is + up-to-date and has not been modified or removed during + the read. + </p> <p> - Freshness: A guarantee that the data being read by the - application on the <span class="emphasis"><em>master</em></span> is up-to-date - and has not been modified or removed during the read. - </p> - <p> - The read authority is only on the master. Read operations on a - client always ignore leases and consequently, these operations - can return stale data. - </p> + The read authority is only on the master. Read + operations on a client always ignore leases and + consequently, these operations can return stale data. 
+ </p> </li> <li> - <p> - Master viability: A guarantee that a current master with valid - leases cannot encounter a duplicate master situation. - </p> - <p> - Leases remove the possibility of a duplicate master situation that - forces the current master to downgrade to a client. However, it is - still possible that old masters with expired leases can discover a - later master and return <a href="../api_reference/C/repmessage.html#repmsg_DB_REP_DUPMASTER" class="olink">DB_REP_DUPMASTER</a> to the application. - </p> + <p> + Master viability: A guarantee that a current master + with valid leases cannot encounter a duplicate master + situation. + </p> + <p> + Leases remove the possibility of a duplicate master + situation that forces the current master to downgrade + to a client. However, it is still possible that old + masters with expired leases can discover a later + master and return <a href="../api_reference/C/repmessage.html#repmsg_DB_REP_DUPMASTER" class="olink">DB_REP_DUPMASTER</a> to the + application. + </p> </li> </ol> </div> <p> - There are several requirements of the application using leases: -</p> + There are several requirements of the application using + leases: + </p> <div class="orderedlist"> <ol type="1"> + <li> + Replication Manager applications must configure a + majority (or larger) acknowledgement policy via the + <a href="../api_reference/C/repmgrset_ack_policy.html" class="olink">DB_ENV->repmgr_set_ack_policy()</a> method. Base API applications must + implement and enforce such a policy on their own. + </li> + <li> + Base API applications must return an error from the + send callback function when the majority acknowledgement + policy is not met for permanent records marked with + <a href="../api_reference/C/reptransport.html#transport_DB_REP_PERMANENT" class="olink">DB_REP_PERMANENT</a>. Note that the Replication Manager + automatically fulfills this requirement. 
+ </li> + <li> + Base API applications must set the number of sites + in the group using the <a href="../api_reference/C/repnsites.html" class="olink">DB_ENV->rep_set_nsites()</a> method before starting + replication and cannot change it during operation. + </li> + <li> + Using leases in a replication group is all or none. + Behavior is undefined when some sites configure leases and + others do not. Use the <a href="../api_reference/C/repconfig.html" class="olink">DB_ENV->rep_set_config()</a> method to turn on + leases. + </li> <li> - Replication Manager applications must configure a majority (or - larger) acknowledgement policy via the <a href="../api_reference/C/repmgrset_ack_policy.html" class="olink">DB_ENV->repmgr_set_ack_policy()</a> method. - Base API applications must implement and enforce such a policy on - their own. - </li> - <li> - Base API applications must return an error from the send callback - function when the majority acknowledgement policy is not met for - permanent records marked with <a href="../api_reference/C/reptransport.html#transport_DB_REP_PERMANENT" class="olink">DB_REP_PERMANENT</a>. Note that the - Replication Manager automatically fulfills this requirement. - </li> + The configured lease timeout value must be the same + on all sites in a replication group, set via the + <a href="../api_reference/C/repset_timeout.html" class="olink">DB_ENV->rep_set_timeout()</a> method. + </li> + <li> + The configured clock skew ratio must be the same on + all sites in a replication group. This value defaults to + no skew, but can be set via the <a href="../api_reference/C/repclockskew.html" class="olink">DB_ENV->rep_set_clockskew()</a> method. + </li> + <li> + Applications that care about read guarantees must + perform all read operations on the master. Reading on a + client does not guarantee freshness. 
+ </li> <li> - Base API applications must set the number of sites in the group - using the <a href="../api_reference/C/repnsites.html" class="olink">DB_ENV->rep_set_nsites()</a> method before starting replication and cannot - change it during operation. - </li> + The application must use elections to choose a + master site. It must never simply declare a master without + having won an election (as is allowed without Master + Leases). + </li> <li> - Using leases in a replication group is all or none. Behavior is - undefined when some sites configure leases and others do not. Use - the <a href="../api_reference/C/repconfig.html" class="olink">DB_ENV->rep_set_config()</a> method to turn on leases. - </li> - <li> - The configured lease timeout value must be the same on all sites - in a replication group, set via the <a href="../api_reference/C/repset_timeout.html" class="olink">DB_ENV->rep_set_timeout()</a> method. - </li> - <li> - The configured clock_scale_factor value must be the same on all - sites in a replication group. This value defaults to no skew, but - can be set via the <a href="../api_reference/C/repclockskew.html" class="olink">DB_ENV->rep_set_clockskew()</a> method. - </li> - <li> - Applications that care about read guarantees must perform all read - operations on the master. Reading on a client does not guarantee - freshness. - </li> - <li> - The application must use elections to choose a master site. It - must never simply declare a master without having won an election - (as is allowed without Master Leases). - </li> + Unelectable (zero priority) sites never grant + leases and cannot be used to guarantee data durability. A + majority of sites in the replication group must be + electable in order to meet the requirement of getting + lease grants from a majority of sites. Minimizing the + number of unelectable sites improves replication group + availability. + </li> </ol> </div> + <p> + Master leases are based on timeouts. 
Berkeley DB assumes + that time always runs forward. Users who change the system + clock on either client or master sites when leases are in use + void all guarantees and can get undefined behavior. See the + <a href="../api_reference/C/repset_timeout.html" class="olink">DB_ENV->rep_set_timeout()</a> method for more information. + </p> <p> - Master leases are based on timeouts. Berkeley DB assumes that time - always runs forward. Users who change the system clock on either - client or master sites when leases are in use void all guarantees and - can get undefined behavior. See the <a href="../api_reference/C/repset_timeout.html" class="olink">DB_ENV->rep_set_timeout()</a> method for more - information. -</p> - <p> - Applications using master leases should be prepared to handle - <code class="literal">DB_REP_LEASE_EXPIRED</code> errors from read operations - on a master and from the <a href="../api_reference/C/txncommit.html" class="olink">DB_TXN->commit()</a> method. -</p> - <p> - Read operations on a master that should not be subject to leases can - use the <a href="../api_reference/C/dbget.html#get_DB_IGNORE_LEASE" class="olink">DB_IGNORE_LEASE</a> flag to the <a href="../api_reference/C/dbget.html" class="olink">DB->get()</a> method. Read operations - on a client always imply leases are ignored. -</p> - <p> - Master lease checks cannot succeed until a majority of sites have - completed client synchronization. Read operations on a master performed - before this condition is met can use the <a href="../api_reference/C/dbget.html#get_DB_IGNORE_LEASE" class="olink">DB_IGNORE_LEASE</a> flag to - avoid errors. -</p> + Applications using master leases should be prepared to + handle <code class="literal">DB_REP_LEASE_EXPIRED</code> errors from + read operations on a master and from the <a href="../api_reference/C/txncommit.html" class="olink">DB_TXN->commit()</a> method. 
+ </p> + <p> + Read operations on a master that should not be subject to + leases can use the <a href="../api_reference/C/dbget.html#get_DB_IGNORE_LEASE" class="olink">DB_IGNORE_LEASE</a> flag to the <a href="../api_reference/C/dbget.html" class="olink">DB->get()</a> + method. Read operations on a client always imply leases are + ignored. + </p> + <p> + Master lease checks cannot succeed until a majority of + sites have completed client synchronization. Read operations + on a master performed before this condition is met can use the + <a href="../api_reference/C/dbget.html#get_DB_IGNORE_LEASE" class="olink">DB_IGNORE_LEASE</a> flag to avoid errors. + </p> <p> - Clients are forbidden from participating in elections while they have - an outstanding lease granted to a master. Therefore, if the <a href="../api_reference/C/repelect.html" class="olink">DB_ENV->rep_elect()</a> - method is called, then Berkeley DB will block, waiting until its lease - grant expires before participating in any election. While it waits, - the client attempts to contact the current master. If the client finds - a current master, then it returns from the <a href="../api_reference/C/repelect.html" class="olink">DB_ENV->rep_elect()</a> method. When - leases are configured and the lease has never yet been granted (on - start-up), clients must wait a full lease timeout before participating - in an election. -</p> + Clients are forbidden from participating in elections while + they have an outstanding lease granted to a master. Therefore, + if the <a href="../api_reference/C/repelect.html" class="olink">DB_ENV->rep_elect()</a> method is called, then Berkeley DB will + block, waiting until its lease grant expires before + participating in any election. While it waits, the client + attempts to contact the current master. If the client finds a + current master, then it returns from the <a href="../api_reference/C/repelect.html" class="olink">DB_ENV->rep_elect()</a> method. 
+ When leases are configured and the lease has never yet been + granted (on start-up), clients must wait a full lease timeout + before participating in an election. + </p> <div class="sect2" lang="en" xml:lang="en"> <div class="titlepage"> <div> <div> - <h3 class="title"><a id="masterlease_change_groupsize"></a>Changing Group Size</h3> + <h3 class="title"><a id="masterlease_change_groupsize"></a>Changing group + size</h3> </div> </div> </div> <p> - If you are using master leases and you change the size of your - replication group, there is a remote possibility that you can - lose some data previously thought to be durable. This is only - true for users of the Base API. + If you are using master leases and you change the size + of your replication group, there is a remote possibility + that you can lose some data previously thought to be + durable. This is only true for users of the Base API. </p> <p> - The problem can arise if you are removing sites from your - replication group. (You might be increasing the size of your - site overall, but if you remove all of the wrong sites you can - lose data.) + The problem can arise if you are removing sites from + your replication group. (You might be increasing the size + of your group overall, but if you remove all of the wrong + sites you can lose data.) </p> - <p> - Suppose you have a replication group with five sites; A, B, C, - D and E; and you are using a quorum acknowledgement policy. Then: + <p> + Suppose you have a replication group with five sites; + A, B, C, D and E; and you are using a quorum + acknowledgement policy. Then: </p> <div class="orderedlist"> <ol type="1"> <li> - <p> - Master A replicates a transaction to replicas B and C. - Those sites acknowledge the write activity. + <p> + Master A replicates a transaction to replicas B + and C. Those sites acknowledge the write activity. </p> </li> <li> - <p> - Sites D and E do not receive the transaction. 
However, - B and C have acknowledged the transaction, which means the - acknowledgement policy is met and so the transaction is - considered durable. + <p> + Sites D and E do not receive the transaction. + However, B and C have acknowledged the + transaction, which means the acknowledgement + policy is met and so the transaction is considered + durable. </p> </li> <li> <p> - You shutdown sites B and C. Now only A has the transaction. + You shut down sites B and C. Now only A has the + transaction. </p> </li> <li> <p> - You increase the size of your replication group to 3 - using <a href="../api_reference/C/repnsites.html" class="olink">DB_ENV->rep_set_nsites()</a>. + You decrease the size of your replication group + to 3 using <a href="../api_reference/C/repnsites.html" class="olink">DB_ENV->rep_set_nsites()</a>. </p> </li> <li> <p> - You shutdown or otherwise lose site A. + You shut down or otherwise lose site A. </p> </li> <li> - <p> - Sites D and E hold an election. Because the size of the - replication group is 3, they have enough sites to - successfully hold an election. However, neither site - has the transaction in question. In this way, the - transaction can become lost. + <p> + Sites D and E hold an election. Because the + size of the replication group is 3, they have + enough sites to successfully hold an election. + However, neither site has the transaction in + question. In this way, the transaction can become + lost. 
</p> </li> </ol> </div> <p> - An alternative scenario exists where you do not change the size - of your replication group, or you actually increase the size of - your replication group, but in the process you happen to remove - the exact wrong sites: + An alternative scenario exists where you do not change + the size of your replication group, or you actually + increase the size of your replication group, but in the + process you happen to remove the exact wrong sites: </p> <div class="orderedlist"> <ol type="1"> <li> <p> - Master A replicates a transaction to replicas B and C. - Those sites acknowledge the write activity. + Master A replicates a transaction to replicas B + and C. Those sites acknowledge the write activity. </p> </li> <li> - <p> - Sites D and E do not receive the transaction. However, - B and C have acknowledged the transaction, which means the - acknowledgement policy is met and so the transaction is - considered durable. + <p> + Sites D and E do not receive the transaction. + However, B and C have acknowledged the + transaction, which means the acknowledgement + policy is met and so the transaction is considered + durable. </p> </li> <li> - <p> - You shutdown sites B and C. Now only A has the transaction. + <p> + You shut down sites B and C. Now only A has the + transaction. </p> </li> <li> <p> - You add three new sites to your replication group: F, - G and H, increasing the size of your replication group - to 6 using <a href="../api_reference/C/repnsites.html" class="olink">DB_ENV->rep_set_nsites()</a>. + You add three new sites to your replication + group: F, G and H, increasing the size of your + replication group to 6 using <a href="../api_reference/C/repnsites.html" class="olink">DB_ENV->rep_set_nsites()</a>. </p> </li> <li> <p> - You shutdown or otherwise lose site A before F, G and H - can be fully populated with data. + You shut down or otherwise lose site A before + F, G and H can be fully populated with data. 
</p> </li> <li> <p> - Sites D, E, F, G and H hold an election. Because the size of the - replication group is 6, they have enough sites to - successfully hold an election. However, none of these sites - has the transaction in question. In this way, the - transaction can become lost. + Sites D, E, F, G and H hold an election. + Because the size of the replication group is 6, + they have enough sites to successfully hold an + election. However, none of these sites has the + transaction in question. In this way, the + transaction can become lost. </p> </li> </ol> </div> <p> - This scenario represents a race condition that would be highly - unlikely to be seen outside of a lab environment. To minimize - the chance of this race condition occurring to the absolute - minimum, do one or more of the following when using master - leases with the Base API: + This scenario represents a race condition that would be + highly unlikely to be seen outside of a lab environment. + To minimize the chance of this race condition occurring to + the absolute minimum, do one or more of the following when + using master leases with the Base API: </p> <div class="orderedlist"> <ol type="1"> <li> <p> - Require all sites to acknowledge transaction commits. + Require all sites to acknowledge transaction + commits. </p> </li> <li> <p> - Never change the size of your replication group unless - all sites in the group are running and communicating - normally with one another. + Never change the size of your replication group + unless all sites in the group are running and + communicating normally with one another. </p> </li> <li> - <p> - Don't remove (or replace) a large percentage of your - sites from your replication group unless all sites in - the group are running and communicating normally with - one another. 
If you are going to remove a large - percentage of your sites from your replication group, - try removing just one site at a time, pausing in - between each removal to give the replication group a - chance to fully distribute all writes before removing - the next site. + <p> + Don't remove (or replace) a large percentage of + your sites from your replication group unless all + sites in the group are running and communicating + normally with one another. If you are going to + remove a large percentage of your sites from your + replication group, try removing just one site at a + time, pausing in between each removal to give the + replication group a chance to fully distribute all + writes before removing the next site. </p> </li> </ol> |
