From 780b92ada9afcf1d58085a83a0b9e6bc982203d1 Mon Sep 17 00:00:00 2001 From: Lorry Tar Creator Date: Tue, 17 Feb 2015 17:25:57 +0000 Subject: Imported from /home/lorry/working-area/delta_berkeleydb/db-6.1.23.tar.gz. --- docs/programmer_reference/rep_lease.html | 511 +++++++++++++++++-------------- 1 file changed, 277 insertions(+), 234 deletions(-) (limited to 'docs/programmer_reference/rep_lease.html') diff --git a/docs/programmer_reference/rep_lease.html b/docs/programmer_reference/rep_lease.html index 6db976d0..301e11af 100644 --- a/docs/programmer_reference/rep_lease.html +++ b/docs/programmer_reference/rep_lease.html @@ -3,7 +3,7 @@ - Master Leases + Master leases @@ -14,17 +14,15 @@ -

- Some applications have strict requirements about the consistency of - data read on a master site. Berkeley DB provides a mechanism called - master leases to provide such consistency. Without master leases, it - is sometimes possible for Berkeley DB to return old data to an - application when newer data is available due to unfortunate scheduling - as illustrated below: -

+

+ Some applications have strict requirements about the + consistency of data read on a master site. Berkeley DB + provides a mechanism called master leases to provide such + consistency. Without master leases, it is sometimes possible + for Berkeley DB to return old data to an application when + newer data is available due to unfortunate scheduling as + illustrated below: +

-  1. Application on master site: Read data item foo via Berkeley DB DB->get() or DBC->get() call.
-  2. Application on master site: sleep, get descheduled, etc.
-  3. System: Master changes role, becomes a client.
-  4. System: New site is elected master.
-  5. System: New master modifies data item foo.
-  6. Application: Berkeley DB returns old data for foo to application.
+  1. Application on master site: Read data item foo via Berkeley DB DB->get() or DBC->get() call.
+  2. Application on master site: sleep, get descheduled, etc.
+  3. System: Master changes role, becomes a client.
+  4. System: New site is elected master.
+  5. System: New master modifies data item foo.
+  6. Application: Berkeley DB returns old data for foo to application.
-

- By using master leases, Berkeley DB can provide guarantees about the - consistency of data read on a master site. The master site can be - considered a recognized authority for the data and consequently can - provide authoritative reads. Clients grant master leases to a master - site. By doing so, clients acknowledge the right of that site to - retain the role of master for a period of time. During that period of - time, clients cannot elect a new master, become master, nor grant their - lease to another site. -

-

- By holding a collection of granted leases, a master site can guarantee - to the application that the data returned is the current, authoritative - value. As a master performs operations, it continually requests - updated grants from the clients. When a read operation is required, - the master guarantees that it holds a valid collection of lease grants - from clients before returning data to the application. By holding - leases, Berkeley DB provides several guarantees to the application: -

+

+ By using master leases, Berkeley DB can provide guarantees + about the consistency of data read on a master site. The + master site can be considered a recognized authority for the + data and consequently can provide authoritative reads. Clients + grant master leases to a master site. By doing so, clients + acknowledge the right of that site to retain the role of + master for a period of time. During that period of time, + clients cannot elect a new master, become master, or grant + their lease to another site. +

+

+ By holding a collection of granted leases, a master site + can guarantee to the application that the data returned is the + current, authoritative value. As a master performs operations, + it continually requests updated grants from the clients. When + a read operation is required, the master guarantees that it + holds a valid collection of lease grants from clients before + returning data to the application. By holding leases, Berkeley + DB provides several guarantees to the application: +

  1. - Authoritative reads: A guarantee that the data being read by the - application is the current value. -
  2. + Authoritative reads: A guarantee that the data + being read by the application is the current value. +
  3. - Durability from rollback: A guarantee that the data being - written or read by the application is permanent across a - majority of client sites and will never be rolled back. -

    -

    - The rollback guarantee also depends on the DB_TXN_NOSYNC flag. The - guarantee is effective as long as there isn't total replication group - failure while clients have granted leases but are holding the updates - in their cache. The application must weigh the performance impact of - synchronous transactions against the risk of total replication group - failure. If clients grant a lease while holding updated data in cache, - and total failure occurs, then the data is no longer present on the - clients and rollback can occur if the master also crashes. -

    + Durability from rollback: A guarantee that the data + being written or read by the application is permanent + across a majority of client sites and will never be + rolled back. +

    - The guarantee that data will not be rolled back applies only to data - successfully committed on a master. Data read on a client, or read - while ignoring leases can be rolled back. -

    + The rollback guarantee also depends on the + DB_TXN_NOSYNC flag. The guarantee is effective as + long as there isn't a failure of half of the + replication group while clients have granted leases + but are holding the updates in their cache. The + application must weigh the performance impact of + synchronous transactions against the risk of the + failure of at least half of the replication group. If + clients grant a lease while holding updated data in + cache, and failure occurs, then the data is no longer + present on the clients and rollback can occur if a + sufficient number of other sites also crash. +

    +

    + The guarantee that data will not be rolled back + applies only to data successfully committed on a + master. Data read on a client, or read while ignoring + leases, can be rolled back. +

  4. +

    + Freshness: A guarantee that the data being read by + the application on the master is + up-to-date and has not been modified or removed during + the read. +

    - Freshness: A guarantee that the data being read by the - application on the master is up-to-date - and has not been modified or removed during the read. -

    -

    - The read authority is only on the master. Read operations on a - client always ignore leases and consequently, these operations - can return stale data. -

    + The read authority is only on the master. Read + operations on a client always ignore leases and, + consequently, can return stale data. +

  5. -

    - Master viability: A guarantee that a current master with valid - leases cannot encounter a duplicate master situation. -

    -

    - Leases remove the possibility of a duplicate master situation that - forces the current master to downgrade to a client. However, it is - still possible that old masters with expired leases can discover a - later master and return DB_REP_DUPMASTER to the application. -

    +

    + Master viability: A guarantee that a current master + with valid leases cannot encounter a duplicate master + situation. +

    +

    + Leases remove the possibility of a duplicate master + situation that forces the current master to downgrade + to a client. However, it is still possible that old + masters with expired leases can discover a later + master and return DB_REP_DUPMASTER to the + application. +
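The check a master performs before an authoritative read can be pictured with a small model. This is only a sketch under stated assumptions, not Berkeley DB code: the `lease_grant` type and the `leases_valid` function are hypothetical names, and the model assumes that the master counts itself toward the majority and that a grant is valid strictly before its expiry time.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one client's lease grant (not a Berkeley DB type). */
typedef struct {
    bool granted;          /* has this client granted a lease at all? */
    uint64_t expires_usec; /* time at which the grant expires */
} lease_grant;

/*
 * Returns true when the unexpired grants, plus the master itself (an
 * assumption of this model), form a strict majority of the nsites
 * members of the replication group.
 */
static bool
leases_valid(const lease_grant *grants, size_t nclients, size_t nsites,
    uint64_t now_usec)
{
    size_t holders = 1; /* the master counts itself in this model */

    for (size_t i = 0; i < nclients; i++)
        if (grants[i].granted && grants[i].expires_usec > now_usec)
            holders++;
    return holders > nsites / 2;
}
```

In this model a read that finds `leases_valid` false corresponds to the situation in which the application would see a lease-expired error rather than possibly stale data.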

- There are several requirements of the application using leases: -

+ There are several requirements of the application using + leases: +

-  1. Replication Manager applications must configure a majority (or larger) acknowledgement policy via the DB_ENV->repmgr_set_ack_policy() method. Base API applications must implement and enforce such a policy on their own.
-  2. Base API applications must return an error from the send callback function when the majority acknowledgement policy is not met for permanent records marked with DB_REP_PERMANENT. Note that the Replication Manager automatically fulfills this requirement.
-  3. Base API applications must set the number of sites in the group using the DB_ENV->rep_set_nsites() method before starting replication and cannot change it during operation.
-  4. Using leases in a replication group is all or none. Behavior is undefined when some sites configure leases and others do not. Use the DB_ENV->rep_set_config() method to turn on leases.
-  5. The configured lease timeout value must be the same on all sites in a replication group, set via the DB_ENV->rep_set_timeout() method.
-  6. The configured clock_scale_factor value must be the same on all sites in a replication group. This value defaults to no skew, but can be set via the DB_ENV->rep_set_clockskew() method.
-  7. Applications that care about read guarantees must perform all read operations on the master. Reading on a client does not guarantee freshness.
-  8. The application must use elections to choose a master site. It must never simply declare a master without having won an election (as is allowed without Master Leases).
+  1. Replication Manager applications must configure a majority (or larger) acknowledgement policy via the DB_ENV->repmgr_set_ack_policy() method. Base API applications must implement and enforce such a policy on their own.
+  2. Base API applications must return an error from the send callback function when the majority acknowledgement policy is not met for permanent records marked with DB_REP_PERMANENT. Note that the Replication Manager automatically fulfills this requirement.
+  3. Base API applications must set the number of sites in the group using the DB_ENV->rep_set_nsites() method before starting replication and cannot change it during operation.
+  4. Using leases in a replication group is all or none. Behavior is undefined when some sites configure leases and others do not. Use the DB_ENV->rep_set_config() method to turn on leases.
+  5. The configured lease timeout value must be the same on all sites in a replication group, set via the DB_ENV->rep_set_timeout() method.
+  6. The configured clock skew ratio must be the same on all sites in a replication group. This value defaults to no skew, but can be set via the DB_ENV->rep_set_clockskew() method.
+  7. Applications that care about read guarantees must perform all read operations on the master. Reading on a client does not guarantee freshness.
+  8. The application must use elections to choose a master site. It must never simply declare a master without having won an election (as is allowed without Master Leases).
+  9. Unelectable (zero priority) sites never grant leases and cannot be used to guarantee data durability. A majority of sites in the replication group must be electable in order to meet the requirement of getting lease grants from a majority of sites. Minimizing the number of unelectable sites improves replication group availability.

+ Master leases are based on timeouts. Berkeley DB assumes + that time always runs forward. Users who change the system + clock on either client or master sites when leases are in use + void all guarantees and can get undefined behavior. See the + DB_ENV->rep_set_timeout() method for more information. +

- Master leases are based on timeouts. Berkeley DB assumes that time - always runs forward. Users who change the system clock on either - client or master sites when leases are in use void all guarantees and - can get undefined behavior. See the DB_ENV->rep_set_timeout() method for more - information. -

-

- Applications using master leases should be prepared to handle - DB_REP_LEASE_EXPIRED errors from read operations - on a master and from the DB_TXN->commit() method. -

-

- Read operations on a master that should not be subject to leases can - use the DB_IGNORE_LEASE flag to the DB->get() method. Read operations - on a client always imply leases are ignored. -

-

- Master lease checks cannot succeed until a majority of sites have - completed client synchronization. Read operations on a master performed - before this condition is met can use the DB_IGNORE_LEASE flag to - avoid errors. -

+ Applications using master leases should be prepared to + handle DB_REP_LEASE_EXPIRED errors from + read operations on a master and from the DB_TXN->commit() method. +

+

+ Read operations on a master that should not be subject to + leases can use the DB_IGNORE_LEASE flag to the DB->get() + method. Read operations on a client always imply leases are + ignored. +

+

+ Master lease checks cannot succeed until a majority of + sites have completed client synchronization. Read operations + on a master performed before this condition is met can use the + DB_IGNORE_LEASE flag to avoid errors. +

- Clients are forbidden from participating in elections while they have - an outstanding lease granted to a master. Therefore, if the DB_ENV->rep_elect() - method is called, then Berkeley DB will block, waiting until its lease - grant expires before participating in any election. While it waits, - the client attempts to contact the current master. If the client finds - a current master, then it returns from the DB_ENV->rep_elect() method. When - leases are configured and the lease has never yet been granted (on - start-up), clients must wait a full lease timeout before participating - in an election. -

+ Clients are forbidden from participating in elections while + they have an outstanding lease granted to a master. Therefore, + if the DB_ENV->rep_elect() method is called, then Berkeley DB will + block, waiting until its lease grant expires before + participating in any election. While it waits, the client + attempts to contact the current master. If the client finds a + current master, then it returns from the DB_ENV->rep_elect() method. + When leases are configured and the lease has never yet been + granted (on start-up), clients must wait a full lease timeout + before participating in an election. +
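The waiting rule in the paragraph above can be sketched as a small function. This is an illustrative model only: `election_wait_usec` and its parameters are hypothetical names rather than part of the Berkeley DB API, times are assumed to be in microseconds, and the start-up case is simplified to "wait one full lease timeout from now".

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: how long a client must wait, in microseconds,
 * before it may take part in an election.
 */
static uint64_t
election_wait_usec(bool lease_ever_granted, uint64_t grant_expires_usec,
    uint64_t now_usec, uint64_t lease_timeout_usec)
{
    if (!lease_ever_granted)
        return lease_timeout_usec;            /* start-up: full timeout */
    if (grant_expires_usec > now_usec)
        return grant_expires_usec - now_usec; /* outstanding grant */
    return 0;                                 /* grant already expired */
}
```

The model leaves out the part of the described behavior in which the client, while waiting, contacts the current master and returns early if it finds one.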

-

Changing Group Size

+

Changing group + size

- If you are using master leases and you change the size of your - replication group, there is a remote possibility that you can - lose some data previously thought to be durable. This is only - true for users of the Base API. + If you are using master leases and you change the size + of your replication group, there is a remote possibility + that you can lose some data previously thought to be + durable. This is only true for users of the Base API.

- The problem can arise if you are removing sites from your - replication group. (You might be increasing the size of your - site overall, but if you remove all of the wrong sites you can - lose data.) + The problem can arise if you are removing sites from + your replication group. (You might be increasing the size + of your group overall, but if you remove all of the wrong + sites you can lose data.)

-

- Suppose you have a replication group with five sites; A, B, C, - D and E; and you are using a quorum acknowledgement policy. Then: +

+ Suppose you have a replication group with five sites (A, B, + C, D and E) and you are using a quorum + acknowledgement policy. Then:

  1. -

    - Master A replicates a transaction to replicas B and C. - Those sites acknowledge the write activity. +

    + Master A replicates a transaction to replicas B + and C. Those sites acknowledge the write activity.

  2. -

    - Sites D and E do not receive the transaction. However, - B and C have acknowledged the transaction, which means the - acknowledgement policy is met and so the transaction is - considered durable. +

    + Sites D and E do not receive the transaction. + However, B and C have acknowledged the + transaction, which means the acknowledgement + policy is met and so the transaction is considered + durable.

  3. - You shutdown sites B and C. Now only A has the transaction. + You shut down sites B and C. Now only A has the + transaction.

  4. - You increase the size of your replication group to 3 - using DB_ENV->rep_set_nsites(). + You decrease the size of your replication group + to 3 using DB_ENV->rep_set_nsites().

  5. - You shutdown or otherwise lose site A. + You shut down or otherwise lose site A.

  6. -

    - Sites D and E hold an election. Because the size of the - replication group is 3, they have enough sites to - successfully hold an election. However, neither site - has the transaction in question. In this way, the - transaction can become lost. +

    + Sites D and E hold an election. Because the + size of the replication group is 3, they have + enough sites to successfully hold an election. + However, neither site has the transaction in + question. In this way, the transaction can become + lost.

- An alternative scenario exists where you do not change the size - of your replication group, or you actually increase the size of - your replication group, but in the process you happen to remove - the exact wrong sites: + An alternative scenario exists where you do not change + the size of your replication group, or you actually + increase the size of your replication group, but in the + process you happen to remove exactly the wrong sites:

  1. - Master A replicates a transaction to replicas B and C. - Those sites acknowledge the write activity. + Master A replicates a transaction to replicas B + and C. Those sites acknowledge the write activity.

  2. -

    - Sites D and E do not receive the transaction. However, - B and C have acknowledged the transaction, which means the - acknowledgement policy is met and so the transaction is - considered durable. +

    + Sites D and E do not receive the transaction. + However, B and C have acknowledged the + transaction, which means the acknowledgement + policy is met and so the transaction is considered + durable.

  3. -

    - You shutdown sites B and C. Now only A has the transaction. +

    + You shut down sites B and C. Now only A has the + transaction.

  4. - You add three new sites to your replication group: F, - G and H, increasing the size of your replication group - to 6 using DB_ENV->rep_set_nsites(). + You add three new sites to your replication + group: F, G and H, increasing the size of your + replication group to 6 using DB_ENV->rep_set_nsites().

  5. - You shutdown or otherwise lose site A before F, G and H - can be fully populated with data. + You shut down or otherwise lose site A before + F, G and H can be fully populated with data.

  6. - Sites D, E, F, G and H hold an election. Because the size of the - replication group is 6, they have enough sites to - successfully hold an election. However, none of these sites - has the transaction in question. In this way, the - transaction can become lost. + Sites D, E, F, G and H hold an election. + Because the size of the replication group is 6, + they have enough sites to successfully hold an + election. However, none of these sites has the + transaction in question. In this way, the + transaction can become lost.
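The arithmetic behind both scenarios can be checked with a short sketch. It assumes that an election succeeds when a strict majority of the configured group size takes part; the function names `majority` and `can_elect` are hypothetical and are not Berkeley DB calls.

```c
#include <stdbool.h>

/* Strict majority of the configured group size. */
static unsigned
majority(unsigned nsites)
{
    return nsites / 2 + 1;
}

/* Can the surviving sites elect a master under the configured size? */
static bool
can_elect(unsigned live_sites, unsigned configured_nsites)
{
    return live_sites >= majority(configured_nsites);
}
```

Under this assumption, `can_elect(2, 3)` holds for sites D and E once the group size is set to 3, whereas `can_elect(2, 5)` would not have held at the original size of 5. Likewise `can_elect(5, 6)` holds for D, E, F, G and H in the second scenario. In both cases the electing sites form a quorum in which no member holds the transaction.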

- This scenario represents a race condition that would be highly - unlikely to be seen outside of a lab environment. To minimize - the chance of this race condition occurring to the absolute - minimum, do one or more of the following when using master - leases with the Base API: + This scenario represents a race condition that would be + highly unlikely to be seen outside of a lab environment. + To minimize the chance of this race condition occurring, + do one or more of the following when + using master leases with the Base API:

  1. - Require all sites to acknowledge transaction commits. + Require all sites to acknowledge transaction + commits.

  2. - Never change the size of your replication group unless - all sites in the group are running and communicating - normally with one another. + Never change the size of your replication group + unless all sites in the group are running and + communicating normally with one another.

  3. -

    - Don't remove (or replace) a large percentage of your - sites from your replication group unless all sites in - the group are running and communicating normally with - one another. If you are going to remove a large - percentage of your sites from your replication group, - try removing just one site at a time, pausing in - between each removal to give the replication group a - chance to fully distribute all writes before removing - the next site. +

    + Don't remove (or replace) a large percentage of + your sites from your replication group unless all + sites in the group are running and communicating + normally with one another. If you are going to + remove a large percentage of your sites from your + replication group, try removing just one site at a + time, pausing in between each removal to give the + replication group a chance to fully distribute all + writes before removing the next site.
