summaryrefslogtreecommitdiff
path: root/docs/programmer_reference/rep_trans.html
diff options
context:
space:
mode:
authorLorry Tar Creator <lorry-tar-importer@baserock.org>2015-02-17 17:25:57 +0000
committer <>2015-03-17 16:26:24 +0000
commit780b92ada9afcf1d58085a83a0b9e6bc982203d1 (patch)
tree598f8b9fa431b228d29897e798de4ac0c1d3d970 /docs/programmer_reference/rep_trans.html
parent7a2660ba9cc2dc03a69ddfcfd95369395cc87444 (diff)
downloadberkeleydb-master.tar.gz
Imported from /home/lorry/working-area/delta_berkeleydb/db-6.1.23.tar.gz.HEADdb-6.1.23master
Diffstat (limited to 'docs/programmer_reference/rep_trans.html')
-rw-r--r--docs/programmer_reference/rep_trans.html565
1 files changed, 322 insertions, 243 deletions
diff --git a/docs/programmer_reference/rep_trans.html b/docs/programmer_reference/rep_trans.html
index 059ad396..91c2a827 100644
--- a/docs/programmer_reference/rep_trans.html
+++ b/docs/programmer_reference/rep_trans.html
@@ -9,12 +9,12 @@
<link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" />
<link rel="up" href="rep.html" title="Chapter 12.  Berkeley DB Replication" />
<link rel="prev" href="rep_bulk.html" title="Bulk transfer" />
- <link rel="next" href="rep_lease.html" title="Master Leases" />
+ <link rel="next" href="rep_lease.html" title="Master leases" />
</head>
<body>
<div xmlns="" class="navheader">
<div class="libver">
- <p>Library Version 11.2.5.3</p>
+ <p>Library Version 12.1.6.1</p>
</div>
<table width="100%" summary="Navigation header">
<tr>
@@ -22,9 +22,7 @@
</tr>
<tr>
<td width="20%" align="left"><a accesskey="p" href="rep_bulk.html">Prev</a> </td>
- <th width="60%" align="center">Chapter 12. 
- Berkeley DB Replication
- </th>
+ <th width="60%" align="center">Chapter 12.  Berkeley DB Replication </th>
<td width="20%" align="right"> <a accesskey="n" href="rep_lease.html">Next</a></td>
</tr>
</table>
@@ -38,269 +36,350 @@
</div>
</div>
</div>
- <p>It is important to consider replication in the context of the overall
-database environment's transactional guarantees. To briefly review,
-transactional guarantees in a non-replicated application are based on
-the writing of log file records to "stable storage", usually a disk
-drive. If the application or system then fails, the Berkeley DB logging
-information is reviewed during recovery, and the databases are updated
-so that all changes made as part of committed transactions appear, and
-all changes made as part of uncommitted transactions do not appear. In
-this case, no information will have been lost.</p>
- <p>If a database environment does not require the log be flushed to
-stable storage on transaction commit (using the <a href="../api_reference/C/envset_flags.html#envset_flags_DB_TXN_NOSYNC" class="olink">DB_TXN_NOSYNC</a>
-flag to increase performance at the cost of sacrificing transactional
-durability), Berkeley DB recovery will only be able to restore the system to
-the state of the last commit found on stable storage. In this case,
-information may have been lost (for example, the changes made by some
-committed transactions may not appear in the databases after recovery).</p>
- <p>Further, if there is database or log file loss or corruption (for
-example, if a disk drive fails), then catastrophic recovery is
-necessary, and Berkeley DB recovery will only be able to restore the system
-to the state of the last archived log file. In this case, information
-may also have been lost.</p>
- <p>Replicating the database environment extends this model, by adding a
-new component to "stable storage": the client's replicated information.
-If a database environment is replicated, there is no lost information
-in the case of database or log file loss, because the replicated system
-can be configured to contain a complete set of databases and log records
-up to the point of failure. A database environment that loses a disk
-drive can have the drive replaced, and it can then rejoin the
-replication group.</p>
- <p>Because of this new component of stable storage, specifying
-<a href="../api_reference/C/envset_flags.html#envset_flags_DB_TXN_NOSYNC" class="olink">DB_TXN_NOSYNC</a> in a replicated environment no longer sacrifices
-durability, as long as one or more clients have acknowledged receipt of
-the messages sent by the master. Since network connections are often
-faster than local synchronous disk writes, replication becomes a way
-for applications to significantly improve their performance as well as
-their reliability.</p>
- <p>The return status from the application's <span class="bold"><strong>send</strong></span> function must be
-set by the application to ensure the transactional guarantees the
-application wants to provide. Whenever the <span class="bold"><strong>send</strong></span> function
-returns failure, the local database environment's log is flushed as
-necessary to ensure that any information critical to database integrity
-is not lost. Because this flush is an expensive operation in terms of
-database performance, applications should avoid returning an error from
-the <span class="bold"><strong>send</strong></span> function, if at all possible.</p>
- <p>The only interesting message type for replication transactional
-guarantees is when the application's <span class="bold"><strong>send</strong></span> function was called
-with the <a href="../api_reference/C/reptransport.html#transport_DB_REP_PERMANENT" class="olink">DB_REP_PERMANENT</a> flag specified. There is no reason
-for the <span class="bold"><strong>send</strong></span> function to ever return failure unless the
-<a href="../api_reference/C/reptransport.html#transport_DB_REP_PERMANENT" class="olink">DB_REP_PERMANENT</a> flag was specified -- messages without the
-<a href="../api_reference/C/reptransport.html#transport_DB_REP_PERMANENT" class="olink">DB_REP_PERMANENT</a> flag do not make visible changes to databases,
-and the <span class="bold"><strong>send</strong></span> function can return success to Berkeley DB as soon as
-the message has been sent to the client(s) or even just copied to local
-application memory in preparation for being sent.</p>
- <p>When a client receives a <a href="../api_reference/C/reptransport.html#transport_DB_REP_PERMANENT" class="olink">DB_REP_PERMANENT</a> message, the client
-will flush its log to stable storage before returning (unless the client
-environment has been configured with the <a href="../api_reference/C/envset_flags.html#envset_flags_DB_TXN_NOSYNC" class="olink">DB_TXN_NOSYNC</a> option).
-If the client is unable to flush a complete transactional record to disk
-for any reason (for example, there is a missing log record before the
-flagged message), the call to the <a href="../api_reference/C/repmessage.html" class="olink">DB_ENV-&gt;rep_process_message()</a> method on the client
-will return <a href="../api_reference/C/repmessage.html#repmsg_DB_REP_NOTPERM" class="olink">DB_REP_NOTPERM</a> and return the LSN of this record
-to the application in the <span class="bold"><strong>ret_lsnp</strong></span> parameter.
-The application's client or master
-message handling loops should take proper action to ensure the correct
-transactional guarantees in this case. When missing records arrive
-and allow subsequent processing of previously stored permanent
-records, the call to the <a href="../api_reference/C/repmessage.html" class="olink">DB_ENV-&gt;rep_process_message()</a> method on the client will
-return <a href="../api_reference/C/repmessage.html#repmsg_DB_REP_ISPERM" class="olink">DB_REP_ISPERM</a> and return the largest LSN of the
-permanent records that were flushed to disk. Client applications
-can use these LSNs to know definitively if any particular LSN is
-permanently stored or not.</p>
- <p>An application relying on a client's ability to become a master and
-guarantee that no data has been lost will need to write the <span class="bold"><strong>send</strong></span>
-function to return an error whenever it cannot guarantee the site that
-will win the next election has the record. Applications not requiring
-this level of transactional guarantees need not have the <span class="bold"><strong>send</strong></span>
-function return failure (unless the master's database environment has
-been configured with <a href="../api_reference/C/envset_flags.html#envset_flags_DB_TXN_NOSYNC" class="olink">DB_TXN_NOSYNC</a>), as any information critical
-to database integrity has already been flushed to the local log before
-<span class="bold"><strong>send</strong></span> was called.</p>
- <p>To sum up, the only reason for the <span class="bold"><strong>send</strong></span> function to return
-failure is when the master database environment has been configured to
-not synchronously flush the log on transaction commit (that is,
-<a href="../api_reference/C/envset_flags.html#envset_flags_DB_TXN_NOSYNC" class="olink">DB_TXN_NOSYNC</a> was configured on the master), the
-<a href="../api_reference/C/reptransport.html#transport_DB_REP_PERMANENT" class="olink">DB_REP_PERMANENT</a> flag is specified for the message, and the
-<span class="bold"><strong>send</strong></span> function was unable to determine that some number of
-clients have received the current message (and all messages preceding
-the current message). How many clients need to receive the message
-before the <span class="bold"><strong>send</strong></span> function can return success is an application
-choice (and may not depend as much on a specific number of clients
-reporting success as one or more geographically distributed clients).</p>
- <p>If, however, the application does require on-disk durability on the master,
-the master should be configured to synchronously flush the log on commit.
-If clients are not configured to synchronously flush the log,
-that is, if a client is running with <a href="../api_reference/C/envset_flags.html#envset_flags_DB_TXN_NOSYNC" class="olink">DB_TXN_NOSYNC</a> configured,
-then it is up to the application to reconfigure that client
-appropriately when it becomes a master. That is, the
-application must explicitly call <a href="../api_reference/C/envset_flags.html" class="olink">DB_ENV-&gt;set_flags()</a> to
-disable asynchronous log flushing as part of re-configuring
-the client as the new master.</p>
- <p>Of course, it is important to ensure that the replicated master and
-client environments are truly independent of each other. For example,
-it does not help matters that a client has acknowledged receipt of a
-message if both master and clients are on the same power supply, as the
-failure of the power supply will still potentially lose information.</p>
- <p>Configuring your replication-based application to achieve the proper
-mix of performance and transactional guarantees can be complex. In
-brief, there are a few controls an application can set to configure the
-guarantees it makes: specification of <a href="../api_reference/C/envset_flags.html#envset_flags_DB_TXN_NOSYNC" class="olink">DB_TXN_NOSYNC</a> for the
-master environment, specification of <a href="../api_reference/C/envset_flags.html#envset_flags_DB_TXN_NOSYNC" class="olink">DB_TXN_NOSYNC</a> for the
-client environment, the priorities of different sites participating in
-an election, and the behavior of the application's <span class="bold"><strong>send</strong></span>
-function.</p>
- <p>Applications using Replication Manager are free to use
-<a href="../api_reference/C/envset_flags.html#envset_flags_DB_TXN_NOSYNC" class="olink">DB_TXN_NOSYNC</a> at the master and/or clients as they see fit. The
-behavior of the <span class="bold"><strong>send</strong></span> function that Replication Manager provides
-on the application's behalf is determined by an "acknowledgement
-policy", which is configured by the <a href="../api_reference/C/repmgrset_ack_policy.html" class="olink">DB_ENV-&gt;repmgr_set_ack_policy()</a> method.
-Clients always send acknowledgements for <a href="../api_reference/C/reptransport.html#transport_DB_REP_PERMANENT" class="olink">DB_REP_PERMANENT</a>
-messages (unless the acknowledgement policy in effect indicates that the
-master doesn't care about them). For a <a href="../api_reference/C/reptransport.html#transport_DB_REP_PERMANENT" class="olink">DB_REP_PERMANENT</a>
-message, the master blocks the sending thread until either it receives
-the proper number of acknowledgements, or the <a href="../api_reference/C/repset_timeout.html#set_timeout_DB_REP_ACK_TIMEOUT" class="olink">DB_REP_ACK_TIMEOUT</a>
-expires. In the case of timeout, Replication Manager returns an error
-code from the <span class="bold"><strong>send</strong></span> function, causing Berkeley DB to flush the
-transaction log before returning to the application, as previously
-described. The default acknowledgement policy is
-<a href="../api_reference/C/repmgrset_ack_policy.html#ackspolicy_DB_REPMGR_ACKS_QUORUM" class="olink">DB_REPMGR_ACKS_QUORUM</a>, which ensures that the effect of a
-permanent record remains durable following an election.</p>
- <p>First, it is rarely useful to write and synchronously flush the log when
-a transaction commits on a replication client. It may be useful where
-systems share resources and multiple systems commonly fail at the same
-time. By default, all Berkeley DB database environments, whether master or
-client, synchronously flush the log on transaction commit or prepare.
-Generally, replication masters and clients turn log flush off for
-transaction commit using the <a href="../api_reference/C/envset_flags.html#envset_flags_DB_TXN_NOSYNC" class="olink">DB_TXN_NOSYNC</a> flag.</p>
- <p>Consider two systems connected by a network interface. One acts as the
-master, the other as a read-only client. The client takes over as
-master if the master crashes and the master rejoins the replication
-group after such a failure. Both master and client are configured to
-not synchronously flush the log on transaction commit (that is,
-<a href="../api_reference/C/envset_flags.html#envset_flags_DB_TXN_NOSYNC" class="olink">DB_TXN_NOSYNC</a> was configured on both systems). The
-application's <span class="bold"><strong>send</strong></span> function never returns failure to the Berkeley DB
-library, simply forwarding messages to the client (perhaps over a
-broadcast mechanism), and always returning success. On the client, any
-<a href="../api_reference/C/repmessage.html#repmsg_DB_REP_NOTPERM" class="olink">DB_REP_NOTPERM</a> returns from the client's <a href="../api_reference/C/repmessage.html" class="olink">DB_ENV-&gt;rep_process_message()</a> method
-are ignored, as well. This system configuration has excellent
-performance, but may lose data in some failure modes.</p>
- <p>If both the master and the client crash at once, it is possible to lose
-committed transactions, that is, transactional durability is not being
-maintained. Reliability can be increased by providing separate power
-supplies for the systems and placing them in separate physical locations.</p>
- <p>If the connection between the two machines fails (or just some number
-of messages are lost), and subsequently the master crashes, it is
-possible to lose committed transactions. Again, transactional
-durability is not being maintained. Reliability can be improved in a
-couple of ways:</p>
+ <p>
+ It is important to consider replication in the context of
+ the overall database environment's transactional guarantees.
+ To briefly review, transactional guarantees in a
+ non-replicated application are based on the writing of log
+ file records to "stable storage", usually a disk drive. If the
+ application or system then fails, the Berkeley DB logging
+ information is reviewed during recovery, and the databases are
+ updated so that all changes made as part of committed
+ transactions appear, and all changes made as part of
+ uncommitted transactions do not appear. In this case, no
+ information will have been lost.
+ </p>
+ <p>
+ If a database environment does not require the log be
+ flushed to stable storage on transaction commit (using the
+ <a href="../api_reference/C/envset_flags.html#envset_flags_DB_TXN_NOSYNC" class="olink">DB_TXN_NOSYNC</a> flag to increase performance at the cost of
+ sacrificing transactional durability), Berkeley DB recovery
+ will only be able to restore the system to the state of the
+ last commit found on stable storage. In this case, information
+ may have been lost (for example, the changes made by some
+ committed transactions may not appear in the databases after
+ recovery).
+ </p>
+ <p>
+ Further, if there is database or log file loss or corruption
+ (for example, if a disk drive fails), then catastrophic
+ recovery is necessary, and Berkeley DB recovery will only be
+ able to restore the system to the state of the last archived
+ log file. In this case, information may also have been
+ lost.
+ </p>
+ <p>
+ Replicating the database environment extends this model, by
+ adding a new component to "stable storage": the client's
+ replicated information. If a database environment is
+ replicated, there is no lost information in the case of
+ database or log file loss, because the replicated system can
+ be configured to contain a complete set of databases and log
+ records up to the point of failure. A database environment
+ that loses a disk drive can have the drive replaced, and it
+ can then rejoin the replication group.
+ </p>
+ <p>
+ Because of this new component of stable storage, specifying
+ <a href="../api_reference/C/envset_flags.html#envset_flags_DB_TXN_NOSYNC" class="olink">DB_TXN_NOSYNC</a> in a replicated environment no longer
+ sacrifices durability, as long as one or more clients have
+ acknowledged receipt of the messages sent by the master. Since
+ network connections are often faster than local synchronous
+ disk writes, replication becomes a way for applications to
+ significantly improve their performance as well as their
+ reliability.
+ </p>
+ <p>
+ Applications using Replication Manager are free to use
+ <a href="../api_reference/C/envset_flags.html#envset_flags_DB_TXN_NOSYNC" class="olink">DB_TXN_NOSYNC</a> at the master and/or clients as they see fit.
+ The behavior of the <span class="bold"><strong>send</strong></span>
+ function that Replication Manager provides on the
+ application's behalf is determined by an "acknowledgement
+ policy", which is configured by the <a href="../api_reference/C/repmgrset_ack_policy.html" class="olink">DB_ENV-&gt;repmgr_set_ack_policy()</a> method.
+ Clients always send acknowledgements for <a href="../api_reference/C/reptransport.html#transport_DB_REP_PERMANENT" class="olink">DB_REP_PERMANENT</a>
+ messages (unless the acknowledgement policy in effect
+ indicates that the master doesn't care about them). For a
+ <a href="../api_reference/C/reptransport.html#transport_DB_REP_PERMANENT" class="olink">DB_REP_PERMANENT</a> message, the master blocks the sending
+ thread until either it receives the proper number of
+ acknowledgements, or the <a href="../api_reference/C/repset_timeout.html#set_timeout_DB_REP_ACK_TIMEOUT" class="olink">DB_REP_ACK_TIMEOUT</a> expires. In the
+ case of timeout, Replication Manager returns an error code
+ from the <span class="bold"><strong>send</strong></span> function,
+ causing Berkeley DB to flush the transaction log before
+ returning to the application, as previously described. The
+ default acknowledgement policy is <a href="../api_reference/C/repmgrset_ack_policy.html#ackspolicy_DB_REPMGR_ACKS_QUORUM" class="olink">DB_REPMGR_ACKS_QUORUM</a>,
+ which ensures that the effect of a permanent record remains
+ durable following an election.
+ </p>
+ <p>
+ The rest of this section discusses transactional guarantee
+ considerations in Base API applications.
+ </p>
+ <p>
+ The return status from the application's <span class="bold"><strong>send</strong></span> function must be set by the
+ application to ensure the transactional guarantees the
+ application wants to provide. Whenever the <span class="bold"><strong>send</strong></span> function returns failure, the
+ local database environment's log is flushed as necessary to
+ ensure that any information critical to database integrity is
+ not lost. Because this flush is an expensive operation in
+ terms of database performance, applications should avoid
+ returning an error from the <span class="bold"><strong>send</strong></span> function,
+ if at all possible.
+ </p>
+ <p>
+ The only interesting message type for replication
+ transactional guarantees is when the application's <span class="bold"><strong>send</strong></span> function was called with the
+ <a href="../api_reference/C/reptransport.html#transport_DB_REP_PERMANENT" class="olink">DB_REP_PERMANENT</a> flag specified. There is no reason for the
+ <span class="bold"><strong>send</strong></span> function to ever
+ return failure unless the <a href="../api_reference/C/reptransport.html#transport_DB_REP_PERMANENT" class="olink">DB_REP_PERMANENT</a> flag was
+ specified -- messages without the <a href="../api_reference/C/reptransport.html#transport_DB_REP_PERMANENT" class="olink">DB_REP_PERMANENT</a> flag do
+ not make visible changes to databases, and the <span class="bold"><strong>send</strong></span> function can return success to
+ Berkeley DB as soon as the message has been sent to the
+ client(s) or even just copied to local application memory in
+ preparation for being sent.
+ </p>
+ <p>
+ When a client receives a <a href="../api_reference/C/reptransport.html#transport_DB_REP_PERMANENT" class="olink">DB_REP_PERMANENT</a> message, the
+ client will flush its log to stable storage before returning
+ (unless the client environment has been configured with the
+ <a href="../api_reference/C/envset_flags.html#envset_flags_DB_TXN_NOSYNC" class="olink">DB_TXN_NOSYNC</a> option). If the client is unable to flush a
+ complete transactional record to disk for any reason (for
+ example, there is a missing log record before the flagged
+ message), the call to the <a href="../api_reference/C/repmessage.html" class="olink">DB_ENV-&gt;rep_process_message()</a> method on the client
+ will return <a href="../api_reference/C/repmessage.html#repmsg_DB_REP_NOTPERM" class="olink">DB_REP_NOTPERM</a> and return the LSN of this record
+ to the application in the <span class="bold"><strong>ret_lsnp</strong></span> parameter.
+ The application's client or master message handling loops should take proper
+ action to ensure the correct transactional guarantees in this case. When
+ missing records arrive and allow subsequent processing of
+ previously stored permanent records, the call to the
+ <a href="../api_reference/C/repmessage.html" class="olink">DB_ENV-&gt;rep_process_message()</a> method on the client will return <a href="../api_reference/C/repmessage.html#repmsg_DB_REP_ISPERM" class="olink">DB_REP_ISPERM</a>
+ and return the largest LSN of the permanent records that were
+ flushed to disk. Client applications can use these LSNs to
+ know definitively if any particular LSN is permanently stored
+ or not.
+ </p>
+ <p>
+ An application relying on a client's ability to become a
+ master and guarantee that no data has been lost will need to
+ write the <span class="bold"><strong>send</strong></span> function to
+ return an error whenever it cannot guarantee the site that
+ will win the next election has the record. Applications not
+ requiring this level of transactional guarantees need not have
+ the <span class="bold"><strong>send</strong></span> function return
+ failure (unless the master's database environment has been
+ configured with <a href="../api_reference/C/envset_flags.html#envset_flags_DB_TXN_NOSYNC" class="olink">DB_TXN_NOSYNC</a>), as any information critical
+ to database integrity has already been flushed to the local
+ log before <span class="bold"><strong>send</strong></span> was
+ called.
+ </p>
+ <p>
+ To sum up, the only reason for the <span class="bold"><strong>send</strong></span> function
+ to return failure is when the
+ master database environment has been configured to not
+ synchronously flush the log on transaction commit (that is,
+ <a href="../api_reference/C/envset_flags.html#envset_flags_DB_TXN_NOSYNC" class="olink">DB_TXN_NOSYNC</a> was configured on the master), the
+ <a href="../api_reference/C/reptransport.html#transport_DB_REP_PERMANENT" class="olink">DB_REP_PERMANENT</a> flag is specified for the message, and the
+ <span class="bold"><strong>send</strong></span> function was unable
+ to determine that some number of clients have received the
+ current message (and all messages preceding the current
+ message). How many clients need to receive the message before
+ the <span class="bold"><strong>send</strong></span> function can return
+ success is an application choice (and may not depend as much
+ on a specific number of clients reporting success as one or
+ more geographically distributed clients).
+ </p>
+ <p>
+ If, however, the application does require on-disk durability
+ on the master, the master should be configured to
+ synchronously flush the log on commit. If clients are not
+ configured to synchronously flush the log, that is, if a
+ client is running with <a href="../api_reference/C/envset_flags.html#envset_flags_DB_TXN_NOSYNC" class="olink">DB_TXN_NOSYNC</a> configured, then it is
+ up to the application to reconfigure that client appropriately
+ when it becomes a master. That is, the application must
+ explicitly call <a href="../api_reference/C/envset_flags.html" class="olink">DB_ENV-&gt;set_flags()</a> to disable asynchronous log
+ flushing as part of re-configuring the client as the new
+ master.
+ </p>
+ <p>
+ Of course, it is important to ensure that the replicated
+ master and client environments are truly independent of each
+ other. For example, it does not help matters that a client has
+ acknowledged receipt of a message if both master and clients
+ are on the same power supply, as the failure of the power
+ supply will still potentially lose information.
+ </p>
+ <p>
+ Configuring a Base API application to achieve the proper mix
+ of performance and transactional guarantees can be complex. In
+ brief, there are a few controls an application can set to
+ configure the guarantees it makes: specification of
+ <a href="../api_reference/C/envset_flags.html#envset_flags_DB_TXN_NOSYNC" class="olink">DB_TXN_NOSYNC</a> for the master environment, specification of
+ <a href="../api_reference/C/envset_flags.html#envset_flags_DB_TXN_NOSYNC" class="olink">DB_TXN_NOSYNC</a> for the client environment, the priorities of
+ different sites participating in an election, and the behavior
+ of the application's <span class="bold"><strong>send</strong></span>
+ function.
+ </p>
+ <p>
+ First, it is rarely useful to write and synchronously flush
+ the log when a transaction commits on a replication client. It
+ may be useful where systems share resources and multiple
+ systems commonly fail at the same time. By default, all
+ Berkeley DB database environments, whether master or client,
+ synchronously flush the log on transaction commit or prepare.
+ Generally, replication masters and clients turn log flush off
+ for transaction commit using the <a href="../api_reference/C/envset_flags.html#envset_flags_DB_TXN_NOSYNC" class="olink">DB_TXN_NOSYNC</a> flag.
+ </p>
+ <p>
+ Consider two systems connected by a network interface. One
+ acts as the master, the other as a read-only client. The
+ client takes over as master if the master crashes and the
+ master rejoins the replication group after such a failure.
+ Both master and client are configured to not synchronously
+ flush the log on transaction commit (that is, <a href="../api_reference/C/envset_flags.html#envset_flags_DB_TXN_NOSYNC" class="olink">DB_TXN_NOSYNC</a>
+ was configured on both systems). The application's <span class="bold"><strong>send</strong></span> function never returns failure
+ to the Berkeley DB library, simply forwarding messages to the
+ client (perhaps over a broadcast mechanism), and always
+ returning success. On the client, any <a href="../api_reference/C/repmessage.html#repmsg_DB_REP_NOTPERM" class="olink">DB_REP_NOTPERM</a> returns
+ from the client's <a href="../api_reference/C/repmessage.html" class="olink">DB_ENV-&gt;rep_process_message()</a> method are ignored, as well.
+ This system configuration has excellent performance, but may
+ lose data in some failure modes.
+ </p>
+ <p>
+ If both the master and the client crash at once, it is
+ possible to lose committed transactions, that is,
+ transactional durability is not being maintained. Reliability
+ can be increased by providing separate power supplies for the
+ systems and placing them in separate physical
+ locations.
+ </p>
+ <p>
+ If the connection between the two machines fails (or just
+ some number of messages are lost), and subsequently the master
+ crashes, it is possible to lose committed transactions. Again,
+ transactional durability is not being maintained. Reliability
+ can be improved in a couple of ways:
+ </p>
<div class="orderedlist">
<ol type="1">
<li>
<p>
- Use a reliable network protocol (for example, TCP/IP instead of UDP).
- </p>
+ Use a reliable network protocol (for example,
+ TCP/IP instead of UDP).
+ </p>
</li>
<li>
<p>
- Increase the number of clients and network paths to make it
- less likely that a message will be lost. In this case, it is
- important to also make sure a client that did receive the
- message wins any subsequent election. If a client that did not
- receive the message wins a subsequent election, data can still
- be lost.
- </p>
+ Increase the number of clients and network paths to
+ make it less likely that a message will be lost. In
+ this case, it is important to also make sure a client
+ that did receive the message wins any subsequent
+ election. If a client that did not receive the message
+ wins a subsequent election, data can still be lost.
+ </p>
</li>
</ol>
</div>
- <p>Further, systems may want to guarantee message delivery to the client(s)
-(for example, to prevent a network connection from simply discarding
-messages). Some systems may want to ensure clients never return
-out-of-date information, that is, once a transaction commit returns
-success on the master, no client will return old information to a
-read-only query. Some of the following changes to a Base API application
-may be used to address these issues:</p>
+ <p>
+ Further, systems may want to guarantee message delivery to
+ the client(s) (for example, to prevent a network connection
+ from simply discarding messages). Some systems may want to
+ ensure clients never return out-of-date information, that is,
+ once a transaction commit returns success on the master, no
+ client will return old information to a read-only query. Some
+ of the following changes to a Base API application may be used
+ to address these issues:
+ </p>
<div class="orderedlist">
<ol type="1">
<li>
- <p>
- Write the application's <span class="bold"><strong>send</strong></span>
- function to not return to Berkeley DB until one or more clients
- have acknowledged receipt of the message. The number of
- clients chosen will be dependent on the application: you will
- want to consider likely network partitions (ensure that a
- client at each physical site receives the message) and
- geographical diversity (ensure that a client on each coast
- receives the message).
- </p>
+ <p>
+ Write the application's <span class="bold"><strong>send</strong></span> function
+ to not return to Berkeley DB until one or more clients have
+ acknowledged receipt of the message. The number of
+ clients chosen will be dependent on the application:
+ you will want to consider likely network partitions
+ (ensure that a client at each physical site receives
+ the message) and geographical diversity (ensure that a
+ client on each coast receives the message).
+ </p>
</li>
<li>
<p>
- Write the client's message processing loop to not acknowledge
- receipt of the message until a call to the <a href="../api_reference/C/repmessage.html" class="olink">DB_ENV-&gt;rep_process_message()</a> method
- has returned success. Messages resulting in a return of
- <a href="../api_reference/C/repmessage.html#repmsg_DB_REP_NOTPERM" class="olink">DB_REP_NOTPERM</a> from the <a href="../api_reference/C/repmessage.html" class="olink">DB_ENV-&gt;rep_process_message()</a> method mean the message
- could not be flushed to the client's disk. If the client does
- not acknowledge receipt of such messages to the master until a
- subsequent call to the <a href="../api_reference/C/repmessage.html" class="olink">DB_ENV-&gt;rep_process_message()</a> method returns
- <a href="../api_reference/C/repmessage.html#repmsg_DB_REP_ISPERM" class="olink">DB_REP_ISPERM</a> and the LSN returned is at least as large as
- this message's LSN, then the master's <span class="bold"><strong>send</strong></span> function will not return
- success to the Berkeley DB library. This means the thread
- committing the transaction on the master will not be allowed to
- proceed based on the transaction having committed until the
- selected set of clients have received the message and consider
- it complete.
- </p>
+ Write the client's message processing loop to not
+ acknowledge receipt of the message until a call to the
+ <a href="../api_reference/C/repmessage.html" class="olink">DB_ENV-&gt;rep_process_message()</a> method has returned success. Messages
+ resulting in a return of <a href="../api_reference/C/repmessage.html#repmsg_DB_REP_NOTPERM" class="olink">DB_REP_NOTPERM</a> from the
+ <a href="../api_reference/C/repmessage.html" class="olink">DB_ENV-&gt;rep_process_message()</a> method mean the message could not be
+ flushed to the client's disk. If the client does not
+ acknowledge receipt of such messages to the master
+ until a subsequent call to the <a href="../api_reference/C/repmessage.html" class="olink">DB_ENV-&gt;rep_process_message()</a> method
+ returns <a href="../api_reference/C/repmessage.html#repmsg_DB_REP_ISPERM" class="olink">DB_REP_ISPERM</a> and the LSN returned is at
+ least as large as this message's LSN, then the
+ master's <span class="bold"><strong>send</strong></span>
+ function will not return success to the Berkeley DB
+ library. This means the thread committing the
+ transaction on the master will not be allowed to
+ proceed based on the transaction having committed
+ until the selected set of clients have received the
+ message and consider it complete.
+ </p>
<p>
- Alternatively, the client's message processing loop could
- acknowledge the message to the master, but with an error code
- indicating that the application's <span class="bold"><strong>send</strong></span> function should not return to
- the Berkeley DB library until a subsequent acknowledgement from
- the same client indicates success.
- </p>
+ Alternatively, the client's message processing loop
+ could acknowledge the message to the master, but with
+ an error code indicating that the application's
+ <span class="bold"><strong>send</strong></span> function
+ should not return to the Berkeley DB library until a
+ subsequent acknowledgement from the same client
+ indicates success.
+ </p>
<p>
- The application send callback function invoked by Berkeley DB
- contains an LSN of the record being sent (if appropriate for
- that record). When <a href="../api_reference/C/repmessage.html" class="olink">DB_ENV-&gt;rep_process_message()</a> method returns indicators that
- a permanent record has been written then it also returns the
- maximum LSN of the permanent record written.
- </p>
+ The application send callback function invoked by
+ Berkeley DB contains an LSN of the record being sent
+ (if appropriate for that record). When <a href="../api_reference/C/repmessage.html" class="olink">DB_ENV-&gt;rep_process_message()</a>
+ method returns indicators that a permanent record has
+ been written then it also returns the maximum LSN of
+ the permanent record written.
+ </p>
</li>
</ol>
</div>
- <p>There is one final pair of failure scenarios to consider. First, it is
-not possible to abort transactions after the application's <span class="bold"><strong>send</strong></span>
-function has been called, as the master may have already written the
-commit log records to disk, and so abort is no longer an option.
-Second, a related problem is that even though the master will attempt
-to flush the local log if the <span class="bold"><strong>send</strong></span> function returns failure,
-that flush may fail (for example, when the local disk is full). Again,
-the transaction cannot be aborted as one or more clients may have
-committed the transaction even if <span class="bold"><strong>send</strong></span> returns failure. Rare
-applications may not be able to tolerate these unlikely failure modes.
-In that case the application may want to:</p>
+ <p>
+ There is one final pair of failure scenarios to consider.
+ First, it is not possible to abort transactions after the
+ application's <span class="bold"><strong>send</strong></span> function
+ has been called, as the master may have already written the
+ commit log records to disk, and so abort is no longer an
+ option. Second, a related problem is that even though the
+ master will attempt to flush the local log if the <span class="bold"><strong>send</strong></span> function returns failure, that
+ flush may fail (for example, when the local disk is full).
+ Again, the transaction cannot be aborted as one or more
+ clients may have committed the transaction even if <span class="bold"><strong>send</strong></span> returns failure. Rare
+ applications may not be able to tolerate these unlikely
+ failure modes. In that case the application may want
+ to:
+ </p>
<div class="orderedlist">
<ol type="1">
<li>
- <p>
- Configure the master to do always local synchronous commits
- (turning off the <a href="../api_reference/C/envset_flags.html#envset_flags_DB_TXN_NOSYNC" class="olink">DB_TXN_NOSYNC</a> configuration). This will
- decrease performance significantly, of course (one of the
- reasons to use replication is to avoid local disk writes.) In
- this configuration, failure to write the local log will cause
- the transaction to abort in all cases.
- </p>
+ <p>
+ Configure the master to do always local synchronous
+ commits (turning off the <a href="../api_reference/C/envset_flags.html#envset_flags_DB_TXN_NOSYNC" class="olink">DB_TXN_NOSYNC</a>
+ configuration). This will decrease performance
+ significantly, of course (one of the reasons to use
+ replication is to avoid local disk writes.) In this
+ configuration, failure to write the local log will
+ cause the transaction to abort in all cases.
+ </p>
</li>
<li>
- <p>
- Do not return from the application's <span class="bold"><strong>send</strong></span> function under any conditions,
- until the selected set of clients has acknowledged the message.
- Until the <span class="bold"><strong>send</strong></span> function
- returns to the Berkeley DB library, the thread committing the
- transaction on the master will wait, and so no application will
- be able to act on the knowledge that the transaction has
- committed.
- </p>
+ <p>
+ Do not return from the application's <span class="bold"><strong>send</strong></span> function under any
+ conditions, until the selected set of clients has
+ acknowledged the message. Until the <span class="bold"><strong>send</strong></span> function returns to
+ the Berkeley DB library, the thread committing the
+ transaction on the master will wait, and so no
+ application will be able to act on the knowledge that
+ the transaction has committed.
+ </p>
</li>
</ol>
</div>
@@ -320,7 +399,7 @@ In that case the application may want to:</p>
<td width="20%" align="center">
<a accesskey="h" href="index.html">Home</a>
</td>
- <td width="40%" align="right" valign="top"> Master Leases</td>
+ <td width="40%" align="right" valign="top"> Master leases</td>
</tr>
</table>
</div>