From 780b92ada9afcf1d58085a83a0b9e6bc982203d1 Mon Sep 17 00:00:00 2001
From: Lorry Tar Creator
- Some applications have strict requirements about the consistency of
- data read on a master site. Berkeley DB provides a mechanism called
- master leases to provide such consistency. Without master leases, it
- is sometimes possible for Berkeley DB to return old data to an
- application when newer data is available due to unfortunate scheduling
- as illustrated below:
-
+ Some applications have strict requirements about the
+ consistency of data read on a master site. Berkeley DB
+ provides a mechanism called master leases to provide such
+ consistency. Without master leases, it is sometimes possible
+ for Berkeley DB to return old data to an application when
+ newer data is available due to unfortunate scheduling as
+ illustrated below:
+
- By using master leases, Berkeley DB can provide guarantees about the
- consistency of data read on a master site. The master site can be
- considered a recognized authority for the data and consequently can
- provide authoritative reads. Clients grant master leases to a master
- site. By doing so, clients acknowledge the right of that site to
- retain the role of master for a period of time. During that period of
- time, clients cannot elect a new master, become master, nor grant their
- lease to another site.
-
- By holding a collection of granted leases, a master site can guarantee
- to the application that the data returned is the current, authoritative
- value. As a master performs operations, it continually requests
- updated grants from the clients. When a read operation is required,
- the master guarantees that it holds a valid collection of lease grants
- from clients before returning data to the application. By holding
- leases, Berkeley DB provides several guarantees to the application:
-
+ By using master leases, Berkeley DB can provide guarantees
+ about the consistency of data read on a master site. The
+ master site can be considered a recognized authority for the
+ data and consequently can provide authoritative reads. Clients
+ grant master leases to a master site. By doing so, clients
+ acknowledge the right of that site to retain the role of
+ master for a period of time. During that period of time,
+ clients cannot elect a new master, become master, or grant
+ their lease to another site.
+
+ By holding a collection of granted leases, a master site
+ can guarantee to the application that the data returned is the
+ current, authoritative value. As a master performs operations,
+ it continually requests updated grants from the clients. When
+ a read operation is required, the master guarantees that it
+ holds a valid collection of lease grants from clients before
+ returning data to the application. By holding leases, Berkeley
+ DB provides several guarantees to the application:
+
- Durability from rollback: A guarantee that the data being
- written or read by the application is permanent across a
- majority of client sites and will never be rolled back.
-
- The rollback guarantee also depends on the DB_TXN_NOSYNC flag. The
- guarantee is effective as long as there isn't total replication group
- failure while clients have granted leases but are holding the updates
- in their cache. The application must weigh the performance impact of
- synchronous transactions against the risk of total replication group
- failure. If clients grant a lease while holding updated data in cache,
- and total failure occurs, then the data is no longer present on the
- clients and rollback can occur if the master also crashes.
-
- The guarantee that data will not be rolled back applies only to data
- successfully committed on a master. Data read on a client, or read
- while ignoring leases can be rolled back.
-
+ The guarantee that data will not be rolled back
+ applies only to data successfully committed on a
+ master. Data read on a client, or read while ignoring
+ leases can be rolled back.
+
+ Freshness: A guarantee that the data being read by
+ the application on the master is
+ up-to-date and has not been modified or removed during
+ the read.
+
- Freshness: A guarantee that the data being read by the
- application on the master is up-to-date
- and has not been modified or removed during the read.
-
- The read authority is only on the master. Read operations on a
- client always ignore leases and consequently, these operations
- can return stale data.
-
- Master viability: A guarantee that a current master with valid
- leases cannot encounter a duplicate master situation.
-
- Leases remove the possibility of a duplicate master situation that
- forces the current master to downgrade to a client. However, it is
- still possible that old masters with expired leases can discover a
- later master and return DB_REP_DUPMASTER to the application.
-
+ Master viability: A guarantee that a current master
+ with valid leases cannot encounter a duplicate master
+ situation.
+
+ Leases remove the possibility of a duplicate master
+ situation that forces the current master to downgrade
+ to a client. However, it is still possible that old
+ masters with expired leases can discover a later
+ master and return DB_REP_DUPMASTER to the
+ application.
+
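The lease check described in the guarantees above can be sketched in plain C. This is a minimal illustration, not Berkeley DB code: the names `read_result`, `valid_grants` and `lease_checked_read` are hypothetical, and the rule that the master counts itself toward a strict majority is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch; not Berkeley DB constants. */
enum read_result { READ_OK = 0, READ_LEASE_EXPIRED = 1 };

/* Count the lease grants whose expiry time is still in the future. */
static size_t valid_grants(const long *expiry, size_t nclients, long now)
{
    size_t n = 0;

    assert(expiry != NULL || nclients == 0);
    for (size_t i = 0; i < nclients; i++)
        if (expiry[i] > now)
            n++;
    return n;
}

/*
 * Assumed rule: a read is authoritative when the master plus the clients
 * with unexpired grants form a strict majority of the group. A caller
 * that ignores leases always gets data, which may be stale.
 */
static enum read_result lease_checked_read(const long *expiry,
    size_t nclients, long now, bool ignore_lease)
{
    size_t nsites = nclients + 1; /* clients plus the master itself */

    if (ignore_lease)
        return READ_OK;
    if (valid_grants(expiry, nclients, now) + 1 > nsites / 2)
        return READ_OK;
    return READ_LEASE_EXPIRED;
}
```

In this sketch, a master with four clients whose grants expire at times 100, 100, 50 and 50 can still read at time 60, because two grants plus the master make three of five sites. At time 120 every grant has expired and the read fails unless leases are ignored.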
- There are several requirements of the application using leases:
-
-
+ Master leases are based on timeouts. Berkeley DB assumes
+ that time always runs forward. Users who change the system
+ clock on either client or master sites when leases are in use
+ void all guarantees and can get undefined behavior. See the
+ DB_ENV->rep_set_timeout() method for more information.
+
- Master leases are based on timeouts. Berkeley DB assumes that time
- always runs forward. Users who change the system clock on either
- client or master sites when leases are in use void all guarantees and
- can get undefined behavior. See the DB_ENV->rep_set_timeout() method for more
- information.
-
-
- Applications using master leases should be prepared to handle
- DB_REP_LEASE_EXPIRED errors from read operations
- on a master and from the DB_TXN->commit() method.
-
- Read operations on a master that should not be subject to leases can
- use the DB_IGNORE_LEASE flag to the DB->get() method. Read operations
- on a client always imply leases are ignored.
-
-
- Master lease checks cannot succeed until a majority of sites have
- completed client synchronization. Read operations on a master performed
- before this condition is met can use the DB_IGNORE_LEASE flag to
- avoid errors.
-
+ Applications using master leases should be prepared to
+ handle DB_REP_LEASE_EXPIRED errors from
+ read operations on a master and from the DB_TXN->commit() method.
+
+
+ Read operations on a master that should not be subject to
+ leases can use the DB_IGNORE_LEASE flag to the DB->get()
+ method. Read operations on a client always imply leases are
+ ignored.
+
+
+ Master lease checks cannot succeed until a majority of
+ sites have completed client synchronization. Read operations
+ on a master performed before this condition is met can use the
+ DB_IGNORE_LEASE flag to avoid errors.
+
- Clients are forbidden from participating in elections while they have
- an outstanding lease granted to a master. Therefore, if the DB_ENV->rep_elect()
- method is called, then Berkeley DB will block, waiting until its lease
- grant expires before participating in any election. While it waits,
- the client attempts to contact the current master. If the client finds
- a current master, then it returns from the DB_ENV->rep_elect() method. When
- leases are configured and the lease has never yet been granted (on
- start-up), clients must wait a full lease timeout before participating
- in an election.
-
+ Clients are forbidden from participating in elections while
+ they have an outstanding lease granted to a master. Therefore,
+ if the DB_ENV->rep_elect() method is called, then Berkeley DB will
+ block, waiting until its lease grant expires before
+ participating in any election. While it waits, the client
+ attempts to contact the current master. If the client finds a
+ current master, then it returns from the DB_ENV->rep_elect() method.
+ When leases are configured and the lease has never yet been
+ granted (on start-up), clients must wait a full lease timeout
+ before participating in an election.
+
- If you are using master leases and you change the size of your
- replication group, there is a remote possibility that you can
- lose some data previously thought to be durable. This is only
- true for users of the Base API.
+ If you are using master leases and you change the size
+ of your replication group, there is a remote possibility
+ that you can lose some data previously thought to be
+ durable. This is only true for users of the Base API.
- The problem can arise if you are removing sites from your
- replication group. (You might be increasing the size of your
- site overall, but if you remove all of the wrong sites you can
- lose data.)
+ The problem can arise if you are removing sites from
+ your replication group. (You might be increasing the size
+ of your group overall, but if you remove all of the wrong
+ sites you can lose data.)
-
- Suppose you have a replication group with five sites; A, B, C,
- D and E; and you are using a quorum acknowledgement policy. Then:
+
+ Suppose you have a replication group with five sites;
+ A, B, C, D and E; and you are using a quorum
+ acknowledgement policy. Then:
- Master A replicates a transaction to replicas B and C.
- Those sites acknowledge the write activity.
+
+ Master A replicates a transaction to replicas B
+ and C. Those sites acknowledge the write activity.
- Sites D and E do not receive the transaction. However,
- B and C have acknowledged the transaction, which means the
- acknowledgement policy is met and so the transaction is
- considered durable.
+
+ Sites D and E do not receive the transaction.
+ However, B and C have acknowledged the
+ transaction, which means the acknowledgement
+ policy is met and so the transaction is considered
+ durable.
- You shutdown sites B and C. Now only A has the transaction.
+ You shut down sites B and C. Now only A has the
+ transaction.
- You increase the size of your replication group to 3
- using DB_ENV->rep_set_nsites().
+ You decrease the size of your replication group
+ to 3 using DB_ENV->rep_set_nsites().
- You shutdown or otherwise lose site A.
+ You shut down or otherwise lose site A.
- Sites D and E hold an election. Because the size of the
- replication group is 3, they have enough sites to
- successfully hold an election. However, neither site
- has the transaction in question. In this way, the
- transaction can become lost.
+
+ Sites D and E hold an election. Because the
+ size of the replication group is 3, they have
+ enough sites to successfully hold an election.
+ However, neither site has the transaction in
+ question. In this way, the transaction can become
+ lost.
- An alternative scenario exists where you do not change the size
- of your replication group, or you actually increase the size of
- your replication group, but in the process you happen to remove
- the exact wrong sites:
+ An alternative scenario exists where you do not change
+ the size of your replication group, or you actually
+ increase the size of your replication group, but in the
+ process you happen to remove the exact wrong sites:
- Master A replicates a transaction to replicas B and C.
- Those sites acknowledge the write activity.
+ Master A replicates a transaction to replicas B
+ and C. Those sites acknowledge the write activity.
- Sites D and E do not receive the transaction. However,
- B and C have acknowledged the transaction, which means the
- acknowledgement policy is met and so the transaction is
- considered durable.
+
+ Sites D and E do not receive the transaction.
+ However, B and C have acknowledged the
+ transaction, which means the acknowledgement
+ policy is met and so the transaction is considered
+ durable.
- You shutdown sites B and C. Now only A has the transaction.
+
+ You shut down sites B and C. Now only A has the
+ transaction.
- You add three new sites to your replication group: F,
- G and H, increasing the size of your replication group
- to 6 using DB_ENV->rep_set_nsites().
+ You add three new sites to your replication
+ group: F, G and H, increasing the size of your
+ replication group to 6 using DB_ENV->rep_set_nsites().
- You shutdown or otherwise lose site A before F, G and H
- can be fully populated with data.
+ You shut down or otherwise lose site A before
+ F, G and H can be fully populated with data.
- Sites D, E, F, G and H hold an election. Because the size of the
- replication group is 6, they have enough sites to
- successfully hold an election. However, none of these sites
- has the transaction in question. In this way, the
- transaction can become lost.
+ Sites D, E, F, G and H hold an election.
+ Because the size of the replication group is 6,
+ they have enough sites to successfully hold an
+ election. However, none of these sites has the
+ transaction in question. In this way, the
+ transaction can become lost.
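The arithmetic behind both scenarios can be checked with a short C sketch. It assumes an election needs a strict majority of the configured group size, which is consistent with the text above but is not stated there. The helpers `majority`, `can_elect` and `txn_survives` are hypothetical names for illustration and are not part of the Berkeley DB API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed election rule for this sketch: a strict majority of the
 * configured group size must take part. */
static int majority(int nsites)
{
    assert(nsites > 0);
    return nsites / 2 + 1;
}

/* Hypothetical helper: can `voters` reachable sites hold an election? */
static bool can_elect(int nsites, int voters)
{
    return voters >= majority(nsites);
}

/* True if at least one voting site holds the transaction. */
static bool txn_survives(const bool *has_txn, size_t voters)
{
    for (size_t i = 0; i < voters; i++)
        if (has_txn[i])
            return true;
    return false;
}
```

Under this assumption, D and E cannot elect a master while the group size is 5, because 2 is less than 3. Once the group size is set to 3, their two votes are enough, and since neither holds the transaction it is lost. The same holds for D, E, F, G and H with a group size of 6: five voters exceed the required four, and none of them has the transaction.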
- This scenario represents a race condition that would be highly
- unlikely to be seen outside of a lab environment. To minimize
- the chance of this race condition occurring to the absolute
- minimum, do one or more of the following when using master
- leases with the Base API:
+ This scenario represents a race condition that would be
+ highly unlikely to be seen outside of a lab environment.
+ To minimize the chance of this race condition occurring to
+ the absolute minimum, do one or more of the following when
+ using master leases with the Base API:
- Require all sites to acknowledge transaction commits.
+ Require all sites to acknowledge transaction
+ commits.
- Never change the size of your replication group unless
- all sites in the group are running and communicating
- normally with one another.
+ Never change the size of your replication group
+ unless all sites in the group are running and
+ communicating normally with one another.
- Don't remove (or replace) a large percentage of your
- sites from your replication group unless all sites in
- the group are running and communicating normally with
- one another. If you are going to remove a large
- percentage of your sites from your replication group,
- try removing just one site at a time, pausing in
- between each removal to give the replication group a
- chance to fully distribute all writes before removing
- the next site.
+
+ Don't remove (or replace) a large percentage of
+ your sites from your replication group unless all
+ sites in the group are running and communicating
+ normally with one another. If you are going to
+ remove a large percentage of your sites from your
+ replication group, try removing just one site at a
+ time, pausing in between each removal to give the
+ replication group a chance to fully distribute all
+ writes before removing the next site.