| author | Lorry Tar Creator <lorry-tar-importer@baserock.org> | 2015-02-17 17:25:57 +0000 |
|---|---|---|
| committer | <> | 2015-03-17 16:26:24 +0000 |
| commit | 780b92ada9afcf1d58085a83a0b9e6bc982203d1 (patch) | |
| tree | 598f8b9fa431b228d29897e798de4ac0c1d3d970 /docs/programmer_reference/transapp_app.html | |
| parent | 7a2660ba9cc2dc03a69ddfcfd95369395cc87444 (diff) | |
Diffstat (limited to 'docs/programmer_reference/transapp_app.html')
| -rw-r--r-- | docs/programmer_reference/transapp_app.html | 736 |
1 files changed, 429 insertions, 307 deletions
diff --git a/docs/programmer_reference/transapp_app.html b/docs/programmer_reference/transapp_app.html index abab1363..b5b09e82 100644 --- a/docs/programmer_reference/transapp_app.html +++ b/docs/programmer_reference/transapp_app.html @@ -14,17 +14,16 @@ <body> <div xmlns="" class="navheader"> <div class="libver"> - <p>Library Version 11.2.5.3</p> + <p>Library Version 12.1.6.1</p> </div> <table width="100%" summary="Navigation header"> <tr> - <th colspan="3" align="center">Architecting Transactional Data Store applications</th> + <th colspan="3" align="center">Architecting Transactional Data + Store applications</th> </tr> <tr> <td width="20%" align="left"><a accesskey="p" href="transapp_fail.html">Prev</a> </td> - <th width="60%" align="center">Chapter 11. - Berkeley DB Transactional Data Store Applications - </th> + <th width="60%" align="center">Chapter 11. Berkeley DB Transactional Data Store Applications </th> <td width="20%" align="right"> <a accesskey="n" href="transapp_env_open.html">Next</a></td> </tr> </table> @@ -34,350 +33,473 @@ <div class="titlepage"> <div> <div> - <h2 class="title" style="clear: both"><a id="transapp_app"></a>Architecting Transactional Data Store applications</h2> + <h2 class="title" style="clear: both"><a id="transapp_app"></a>Architecting Transactional Data + Store applications</h2> </div> </div> </div> + <p> + When building Transactional Data Store applications, the + architecture decisions involve application startup (running + recovery) and handling system or application failure. For + details on performing recovery, see the <a class="xref" href="transapp_recovery.html" title="Recovery procedures">Recovery procedures</a>. + </p> + <p> + Recovery in a database environment is a single-threaded + procedure, that is, one thread of control or process must + complete database environment recovery before any other thread + of control or process operates in the Berkeley DB environment. 
+ </p> + <p> + Performing recovery first marks any existing database + environment as "failed" and then removes it, causing threads + of control running in the database environment to fail and + return to the application. This feature allows applications to + recover environments without concern for threads of control + that might still be running in the removed environment. The + subsequent re-creation of the database environment is + serialized, so multiple threads of control attempting to + create a database environment will serialize behind a single + creating thread. + </p> + <p> + One consideration in removing (as part of recovering) a + database environment which may be in use by another thread, is + the type of mutex being used by the Berkeley DB library. In + the case of database environment failure when using + test-and-set mutexes, threads of control waiting on a mutex + when the environment is marked "failed" will quickly notice + the failure and will return an error from the Berkeley DB API. + In the case of environment failure when using blocking + mutexes, where the underlying system mutex implementation does + not unblock mutex waiters after the thread of control holding + the mutex dies, threads waiting on a mutex when an environment + is recovered might hang forever. Applications blocked on + events (for example, an application blocked on a network + socket, or a GUI event) may also fail to notice environment + recovery within a reasonable amount of time. Systems with such + mutex implementations are rare, but do exist; applications on + such systems should use an application architecture where the + thread recovering the database environment can explicitly + terminate any process using the failed environment, or + configure Berkeley DB for test-and-set mutexes, or incorporate + some form of long-running timer or watchdog process to wake or + kill blocked processes should they block for too long. 
+ </p> <p> - When building Transactional Data Store applications, the architecture - decisions involve application startup (running recovery) and handling - system or application failure. For details on performing recovery, see - the <a class="xref" href="transapp_recovery.html" title="Recovery procedures">Recovery procedures</a>. -</p> + Regardless, it makes little sense for multiple threads of + control to simultaneously attempt recovery of a database + environment, since the last one to run will remove all + database environments created by the threads of control that + ran before it. However, for some applications, it may make + sense for applications to have a single thread of control that + performs recovery and then removes the database environment, + after which the application launches a number of processes, + any of which will create the database environment and continue + forward. + </p> <p> - Recovery in a database environment is a single-threaded procedure, that - is, one thread of control or process must complete database environment - recovery before any other thread of control or process operates in the - Berkeley DB environment. -</p> - <p> - Performing recovery first marks any existing database environment as - "failed" and then removes it, causing threads of control running in the - database environment to fail and return to the application. This - feature allows applications to recover environments without concern for - threads of control that might still be running in the removed - environment. The subsequent re-creation of the database environment is - serialized, so multiple threads of control attempting to create a - database environment will serialize behind a single creating thread. -</p> - <p> - One consideration in removing (as part of recovering) a database - environment which may be in use by another thread, is the type of mutex - being used by the Berkeley DB library. 
In the case of database - environment failure when using test-and-set mutexes, threads of control - waiting on a mutex when the environment is marked "failed" will quickly - notice the failure and will return an error from the Berkeley DB API. - In the case of environment failure when using blocking mutexes, where - the underlying system mutex implementation does not unblock mutex - waiters after the thread of control holding the mutex dies, threads - waiting on a mutex when an environment is recovered might hang forever. - Applications blocked on events (for example, an application blocked on - a network socket, or a GUI event) may also fail to notice environment - recovery within a reasonable amount of time. Systems with such mutex - implementations are rare, but do exist; applications on such systems - should use an application architecture where the thread recovering the - database environment can explicitly terminate any process using the - failed environment, or configure Berkeley DB for test-and-set mutexes, - or incorporate some form of long-running timer or watchdog process to - wake or kill blocked processes should they block for too long. -</p> - <p> - Regardless, it makes little sense for multiple threads of control to - simultaneously attempt recovery of a database environment, since the - last one to run will remove all database environments created by the - threads of control that ran before it. However, for some applications, - it may make sense for applications to have a single thread of control - that performs recovery and then removes the database environment, after - which the application launches a number of processes, any of which will - create the database environment and continue forward. -</p> - <p> - There are three common ways to architect Berkeley DB Transactional Data - Store applications. 
The one chosen is usually based on whether or not - the application is comprised of a single process or group of processes - descended from a single process (for example, a server started when the - system first boots), or if the application is comprised of unrelated - processes (for example, processes started by web connections or users - logged into the system). -</p> + There are four ways to architect Berkeley DB + Transactional Data Store applications. The one chosen is + usually based on whether or not the application is comprised + of a single process or group of processes descended from a + single process (for example, a server started when the system + first boots), or if the application is comprised of unrelated + processes (for example, processes started by web connections + or users logged into the system). + </p> <div class="orderedlist"> <ol type="1"> <li> <p> - The first way to architect Transactional Data Store - applications is as a single process (the process may or may not - be multithreaded.) - </p> + The first way to architect Transactional Data Store + applications is as a single process (the process may + or may not be multithreaded.) + </p> + <p> + When this process starts, it runs recovery on the + database environment and then opens its databases. The + application can subsequently create new threads as it + chooses. Those threads can either share already open + Berkeley DB <a href="../api_reference/C/env.html" class="olink">DB_ENV</a> and <a href="../api_reference/C/db.html" class="olink">DB</a> handles, or create their + own. In this architecture, databases are rarely opened + or closed when more than a single thread of control is + running; that is, they are opened when only a single + thread is running, and closed after all threads but + one have exited. The last thread of control to exit + closes the databases and the database environment. 
+ </p> + <p> + This architecture is simplest to implement because + thread serialization is easy and failure detection + does not require monitoring multiple processes. + </p> <p> - When this process starts, it runs recovery on the database - environment and then opens its databases. The application can - subsequently create new threads as it chooses. Those threads - can either share already open Berkeley DB <a href="../api_reference/C/env.html" class="olink">DB_ENV</a> and <a href="../api_reference/C/db.html" class="olink">DB</a> - handles, or create their own. In this architecture, databases - are rarely opened or closed when more than a single thread of - control is running; that is, they are opened when only a single - thread is running, and closed after all threads but one have - exited. The last thread of control to exit closes the - databases and the database environment. - </p> + If the application's thread model allows processes + to continue after thread failure, the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> + method can be used to determine if the database + environment is usable after thread failure. If the + application does not call <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a>, or + <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> returns <a class="link" href="program_errorret.html#program_errorret.DB_RUNRECOVERY">DB_RUNRECOVERY</a>, + the application must + behave as if there has been a system failure, + performing recovery and re-creating the database + environment. Once these actions have been taken, other + threads of control can continue (as long as all + existing Berkeley DB handles are first discarded). + </p> <p> - This architecture is simplest to implement because thread - serialization is easy and failure detection does not require - monitoring multiple processes. 
- </p> - <p> - If the application's thread model allows processes to continue - after thread failure, the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> method can be used to - determine if the database environment is usable after thread - failure. If the application does not call <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a>, or - <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> returns - <a class="link" href="program_errorret.html#program_errorret.DB_RUNRECOVERY">DB_RUNRECOVERY</a>, - the application must - behave as if there has been a system failure, performing - recovery and re-creating the database environment. Once these - actions have been taken, other threads of control can continue - (as long as all existing Berkeley DB handles are first - discarded). - </p> + Note that by default <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> will only notify the + calling thread that the database environment is unusable. + However, you can optionally cause <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> to broadcast + this to other threads of control by using the + <code class="literal">--enable-failchk_broadcast</code> flag when you + compile your Berkeley DB library. If this option is turned + on, then all threads of control using the database + environment will return + <a class="link" href="program_errorret.html#program_errorret.DB_RUNRECOVERY">DB_RUNRECOVERY</a> + when they attempt to obtain a mutex lock. In this + situation, a <a href="../api_reference/C/envevent_notify.html#event_notify_DB_EVENT_FAILCHK_PANIC" class="olink">DB_EVENT_FAILCHK_PANIC</a> or + <a href="../api_reference/C/envevent_notify.html#event_notify_DB_EVENT_MUTEX_DIED" class="olink">DB_EVENT_MUTEX_DIED</a> event will also be + raised. 
(You use <a href="../api_reference/C/envevent_notify.html" class="olink">DB_ENV->set_event_notify()</a> to examine events). + </p> </li> <li> + <p> + The second way to architect Transactional Data + Store applications is as a group of related processes + (the processes may or may not be multithreaded). + </p> <p> - The second way to architect Transactional Data Store - applications is as a group of related processes (the processes - may or may not be multithreaded). - </p> - <p> - This architecture requires the order in which threads of control are - created be controlled to serialize database environment recovery. - </p> - <p> - In addition, this architecture requires that threads of control - be monitored. If any thread of control exits with open - Berkeley DB handles, the application may call the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> - method to detect lost mutexes and locks and determine if the - application can continue. If the application does not call - <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a>, or <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> returns that the database - environment can no longer be used, the application must behave - as if there has been a system failure, performing recovery and - creating a new database environment. Once these actions have - been taken, other threads of control can be continued (as long - as all existing Berkeley DB handles are first discarded), or - - </p> + This architecture requires the order in which + threads of control are created be controlled to + serialize database environment recovery. + </p> <p> - The easiest way to structure groups of related processes is to - first create a single "watcher" process (often a script) that - starts when the system first boots, runs recovery on the - database environment and then creates the processes or threads - that will actually perform work. 
The initial thread has no - further responsibilities other than to wait on the threads of - control it has started, to ensure none of them unexpectedly - exit. If a thread of control exits, the watcher process - optionally calls the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> method. If the application - does not call <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> or if <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> returns that the - environment can no longer be used, the watcher kills all of the - threads of control using the failed environment, runs recovery, - and starts new threads of control to perform work. - </p> + In addition, this architecture requires that + threads of control be monitored. If any thread of + control exits with open Berkeley DB handles, the + application may call the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> method to detect + lost mutexes and locks and determine if the + application can continue. If the application does not + call <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a>, or <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> returns that the + database environment can no longer be used, the + application must behave as if there has been a system + failure, performing recovery and creating a new + database environment. Once these actions have been + taken, other threads of control can be continued (as + long as all existing Berkeley DB handles are first + discarded). + </p> + <p> + The easiest way to structure groups of related + processes is to first create a single "watcher" + process (often a script) that starts when the system + first boots, runs recovery on the database environment + and then creates the processes or threads that will + actually perform work. 
The initial thread has no + further responsibilities other than to wait on the + threads of control it has started, to ensure none of + them unexpectedly exit. If a thread of control exits, + the watcher process optionally calls the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> + method. If the application does not call <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> + or if <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> returns that the environment can no + longer be used, the watcher kills all of the threads + of control using the failed environment, runs + recovery, and starts new threads of control to perform + work. + </p> </li> <li> <p> - The third way to architect Transactional Data Store - applications is as a group of unrelated processes (the - processes may or may not be multithreaded). This is the most - difficult architecture to implement because of the level of - difficulty in some systems of finding and monitoring unrelated - processes. There are several possible techniques to implement - this architecture. - </p> + The third way to architect Transactional Data Store + applications is as a group of related processes that rely + on <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> broadcasting to inform other threads and + processes that recovery is required. <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> + broadcasting is not enabled by default for the DB + library, but using broadcasting means that a watcher + process is not required. Instead, if <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> fails + then all other threads and processes operating in that + environment will also be notified of that failure so that + they can know to run recovery. + </p> <p> - One solution is to log a thread of control ID when a new - Berkeley DB handle is opened. 
For example, an initial - "watcher" process could run recovery on the database - environment and then create a sentinel file. Any "worker" - process wanting to use the environment would check for the - sentinel file. If the sentinel file does not exist, the worker - would fail or wait for the sentinel file to be created. Once - the sentinel file exists, the worker would register its process - ID with the watcher (via shared memory, IPC or some other - registry mechanism), and then the worker would open its - <a href="../api_reference/C/env.html" class="olink">DB_ENV</a> handles and proceed. When the worker finishes - using the environment, it would unregister its process ID with - the watcher. The watcher periodically checks to ensure that no - worker has failed while using the environment. If a worker - fails while using the environment, the watcher removes the - sentinel file, kills all of the workers currently using the - environment, runs recovery on the environment, and finally - creates a new sentinel file. - </p> + To enable <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> broadcasting use the + <code class="literal">--enable-failchk_broadcast</code> flag when you + configure the library. On Windows, use + <code class="literal">HAVE_FAILCHK_BROADCAST</code> when you compile + the library. + </p> <p> - The weakness of this approach is that, on some systems, it is - difficult to determine if an unrelated process is still - running. For example, POSIX systems generally disallow sending - signals to unrelated processes. The trick to monitoring - unrelated processes is to find a system resource held by the - process that will be modified if the process dies. On POSIX - systems, flock- or fcntl-style locking will work, as will - LockFile on Windows systems. Other systems may have to use - other process-related information such as file reference counts - or modification times. 
In the worst case, threads of control - can be required to periodically re-register with the watcher - process: if the watcher has not heard from a thread of control - in a specified period of time, the watcher will take action, - recovering the environment. - </p> - <p> - The Berkeley DB library includes one built-in implementation of this approach, - the <a href="../api_reference/C/envopen.html" class="olink">DB_ENV->open()</a> method's <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> flag: - </p> - <p> - If the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> flag is set, each process opening the - database environment first checks to see if recovery needs to - be performed. If recovery needs to be performed for any reason - (including the initial creation of the database environment), - and <a href="../api_reference/C/envopen.html#envopen_DB_RECOVER" class="olink">DB_RECOVER</a> is also specified, recovery will be performed - and then the open will proceed normally. If recovery needs to - be performed and <a href="../api_reference/C/envopen.html#envopen_DB_RECOVER" class="olink">DB_RECOVER</a> is not specified, - <a class="link" href="program_errorret.html#program_errorret.DB_RUNRECOVERY">DB_RUNRECOVERY</a> - will be returned. If recovery does not need to be performed, - <a href="../api_reference/C/envopen.html#envopen_DB_RECOVER" class="olink">DB_RECOVER</a> will be ignored. - </p> - <p> - Prior to the actual recovery beginning, the <a href="../api_reference/C/envevent_notify.html#event_notify_DB_EVENT_REG_PANIC" class="olink">DB_EVENT_REG_PANIC</a> - event is set for the environment. Processes in the application using - the <a href="../api_reference/C/envevent_notify.html" class="olink">DB_ENV->set_event_notify()</a> method will be notified when they do their next - operations in the environment. Processes receiving this event should - exit the environment. 
Also, the <a href="../api_reference/C/envevent_notify.html#event_notify_DB_EVENT_REG_ALIVE" class="olink">DB_EVENT_REG_ALIVE</a> event will be - triggered if there are other processes currently attached to the - environment. Only the process doing the recovery will receive this - event notification. It will receive this notification once for each - process still attached to the environment. The parameter of the - <a href="../api_reference/C/envevent_notify.html" class="olink">DB_ENV->set_event_notify()</a> callback will contain the process identifier of the - process still attached. The process doing the recovery can then - signal the attached process or perform some other operation prior to - recovery (i.e. kill the attached process). - </p> - <p> - The <a href="../api_reference/C/envset_timeout.html" class="olink">DB_ENV->set_timeout()</a> method's <a href="../api_reference/C/envset_timeout.html#set_timeout_DB_SET_REG_TIMEOUT" class="olink">DB_SET_REG_TIMEOUT</a> flag can be set to - establish a wait period before starting recovery. This creates a - window of time for other processes to receive the DB_EVENT_REG_PANIC - event and exit the environment. - </p> - <p> - There are three additional requirements for the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> - architecture to work: - </p> + If <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> broadcasting is enabled for your library + and a thread of control encounters a failure when + <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> is run, then all other threads and processes + operating in that environment will be notified. 
If a + failure is broadcast, then threads and processes will + receive + <a class="link" href="program_errorret.html#program_errorret.DB_RUNRECOVERY">DB_RUNRECOVERY</a> + when they attempt to preform any one of a range of + activities, including: + </p> <div class="itemizedlist"> <ul type="disc"> <li> <p> - First, all applications using the database environment must - specify the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> flag when opening the environment. - However, there is no additional requirement if the application - chooses a single process to recover the environment, as the - first process to open the database environment will know to - perform recovery. - </p> + When entering a DB API. + </p> </li> <li> <p> - Second, there can only be a single <a href="../api_reference/C/env.html" class="olink">DB_ENV</a> handle per database - environment in each process. As the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> locking is - per-process, not per-thread, multiple <a href="../api_reference/C/env.html" class="olink">DB_ENV</a> handles in a single - environment could race with each other, potentially causing - data corruption. - </p> + When locking a mutex. + </p> </li> <li> <p> - Third, the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> implementation does not explicitly - terminate processes using the database environment which is - being recovered. Instead, it relies on the processes - themselves noticing the database environment has been discarded - from underneath them. For this reason, the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> flag - should be used with a mutex implementation that does not block - in the operating system, as that risks a thread of control - blocking forever on a mutex which will never be granted. 
Using - any test-and-set mutex implementation ensures this cannot - happen, and for that reason the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> flag is generally - used with a test-and-set mutex implementation. - </p> + When performing disk or network I/O. + </p> </li> </ul> </div> <p> - A second solution for groups of unrelated processes is also - based on a "watcher process". This solution is intended for - systems where it is not practical to monitor the processes - sharing a database environment, but it is possible to monitor - the environment to detect if a thread of control has failed - holding open Berkeley DB handles. This would be done by having - a "watcher" process periodically call the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> method. - If <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> returns that the environment can no longer be - used, the watcher would then take action, recovering the - environment. - </p> + Threads and processes that are + monitoring events will also receive + <a href="../api_reference/C/envevent_notify.html#event_notify_DB_EVENT_FAILCHK_PANIC" class="olink">DB_EVENT_FAILCHK_PANIC</a> or + <a href="../api_reference/C/envevent_notify.html#event_notify_DB_EVENT_MUTEX_DIED" class="olink">DB_EVENT_MUTEX_DIED</a>. You use + <a href="../api_reference/C/envevent_notify.html" class="olink">DB_ENV->set_event_notify()</a> to examine events. + </p> + </li> + <li> + <p> + The fourth way to architect Transactional Data Store + applications is as a group of unrelated processes (the + processes may or may not be multithreaded). This is + the most difficult architecture to implement because + of the level of difficulty in some systems of finding + and monitoring unrelated processes. There are several + possible techniques to implement this architecture. 
+ </p> + <p> + One solution is to log a thread of control ID when + a new Berkeley DB handle is opened. For example, an + initial "watcher" process could run recovery on the + database environment and then create a sentinel file. + Any "worker" process wanting to use the environment + would check for the sentinel file. If the sentinel + file does not exist, the worker would fail or wait for + the sentinel file to be created. Once the sentinel + file exists, the worker would register its process ID + with the watcher (via shared memory, IPC or some other + registry mechanism), and then the worker would open + its <a href="../api_reference/C/env.html" class="olink">DB_ENV</a> handles and proceed. When the worker + finishes using the environment, it would unregister + its process ID with the watcher. The watcher + periodically checks to ensure that no worker has + failed while using the environment. If a worker fails + while using the environment, the watcher removes the + sentinel file, kills all of the workers currently + using the environment, runs recovery on the + environment, and finally creates a new sentinel file. + </p> + <p> + The weakness of this approach is that, on some + systems, it is difficult to determine if an unrelated + process is still running. For example, POSIX systems + generally disallow sending signals to unrelated + processes. The trick to monitoring unrelated processes + is to find a system resource held by the process that + will be modified if the process dies. On POSIX + systems, flock- or fcntl-style locking will work, as + will LockFile on Windows systems. Other systems may + have to use other process-related information such as + file reference counts or modification times. In the + worst case, threads of control can be required to + periodically re-register with the watcher process: if + the watcher has not heard from a thread of control in + a specified period of time, the watcher will take + action, recovering the environment. 
+ </p> + <p> + The Berkeley DB library includes one built-in + implementation of this approach, the <a href="../api_reference/C/envopen.html" class="olink">DB_ENV->open()</a> + method's <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> flag: + </p> <p> - The weakness of this approach is that all threads of control - using the environment must specify an "ID" function and an - "is-alive" function using the <a href="../api_reference/C/envset_thread_id.html" class="olink">DB_ENV->set_thread_id()</a> method. (In - other words, the Berkeley DB library must be able to assign a - unique ID to each thread of control, and additionally determine - if the thread of control is still running. It can be difficult - to portably provide that information in applications using a - variety of different programming languages and running on a - variety of different platforms.) - </p> + If the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> flag is set, each process + opening the database environment first checks to see + if recovery needs to be performed. If recovery needs + to be performed for any reason (including the initial + creation of the database environment), and + <a href="../api_reference/C/envopen.html#envopen_DB_RECOVER" class="olink">DB_RECOVER</a> is also specified, recovery will be + performed and then the open will proceed normally. If + recovery needs to be performed and <a href="../api_reference/C/envopen.html#envopen_DB_RECOVER" class="olink">DB_RECOVER</a> is not + specified, <a class="link" href="program_errorret.html#program_errorret.DB_RUNRECOVERY">DB_RUNRECOVERY</a> + will be returned. If recovery does not need to be performed, <a href="../api_reference/C/envopen.html#envopen_DB_RECOVER" class="olink">DB_RECOVER</a> + will be ignored. + </p> <p> - A third solution for groups of unrelated processes is a hybrid of the two - above. 
Along with implementing the built-in sentinel approach with the - the <a href="../api_reference/C/envopen.html" class="olink">DB_ENV->open()</a> methods <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> flag, the <a href="../api_reference/C/envopen.html#envopen_DB_FAILCHK" class="olink">DB_FAILCHK</a> flag can be specified. - When using both flags, each process opening the database environment first - checks to see if recocvery needs to be performed. If recovery needs to be - performed for any reason, it will first determine if a thread of control - exited while holding database read locks, and release those. Then it will - abort any unresolved transactions. If these steps are successful, the process - opening the environment will continue without the need for any - additional recocvery. If these steps are unsuccessful, then additional - recovery will be performed if <a href="../api_reference/C/envopen.html#envopen_DB_RECOVER" class="olink">DB_RECOVER</a> is specified and if <a href="../api_reference/C/envopen.html#envopen_DB_RECOVER" class="olink">DB_RECOVER</a> is not - specified, <a class="link" href="program_errorret.html#program_errorret.DB_RUNRECOVERY">DB_RUNRECOVERY</a>will be returned. - </p> + Prior to the actual recovery beginning, the + <a href="../api_reference/C/envevent_notify.html#event_notify_DB_EVENT_REG_PANIC" class="olink">DB_EVENT_REG_PANIC</a> event is set for the environment. + Processes in the application using the + <a href="../api_reference/C/envevent_notify.html" class="olink">DB_ENV->set_event_notify()</a> method will be notified when they do + their next operations in the environment. Processes + receiving this event should exit the environment. + Also, the <a href="../api_reference/C/envevent_notify.html#event_notify_DB_EVENT_REG_ALIVE" class="olink">DB_EVENT_REG_ALIVE</a> event will be triggered + if there are other processes currently attached to the + environment. 
Only the process doing the recovery will + receive this event notification. It will receive this + notification once for each process still attached to + the environment. The parameter of the + <a href="../api_reference/C/envevent_notify.html" class="olink">DB_ENV->set_event_notify()</a> callback will contain the process + identifier of the process still attached. The process + doing the recovery can then signal the attached + process or perform some other operation prior to + recovery (i.e. kill the attached process). + </p> <p> - Since this solution is hybrid of the first two, all of the requirements of both - of them must be implemented (will need "ID" function, "is-alive" function, - single <a href="../api_reference/C/env.html" class="olink">DB_ENV</a> handle per database, etc.) - </p> + The <a href="../api_reference/C/envset_timeout.html" class="olink">DB_ENV->set_timeout()</a> method's <a href="../api_reference/C/envset_timeout.html#set_timeout_DB_SET_REG_TIMEOUT" class="olink">DB_SET_REG_TIMEOUT</a> + flag can be set to establish a wait period before + starting recovery. This creates a window of time for + other processes to receive the DB_EVENT_REG_PANIC + event and exit the environment. + </p> <p> - The described approaches are different, and should not be - combined. Applications might use either the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> - approach, the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> or the hybrid approach, but not together in - the same application. 
For example, a POSIX application written - as a library underneath a wide variety of interfaces and - differing APIs might choose the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> approach for a - few reasons: first, it does not require making periodic calls - to the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> method; second, when implementing in a - variety of languages, is may be more difficult to specify - unique IDs for each thread of control; third, it may be more - difficult determine if a thread of control is still running, as - any particular thread of control is likely to lack sufficient - permissions to signal other processes. Alternatively, an - application with a dedicated watcher process, running with - appropriate permissions, might choose the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> approach - as supporting higher overall throughput and reliability, as - that approach allows the application to abort unresolved - transactions and continue forward without having to recover the - database environment. The hybrid approach is useful in situations - where running a dedicated watcher process is not practical but getting the - equivalent of <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> on the <a href="../api_reference/C/envopen.html" class="olink">DB_ENV->open()</a> is important. - </p> + There are three additional requirements for the + <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> architecture to work: + </p> + <div class="itemizedlist"> + <ul type="disc"> + <li> + <p> + First, all applications using the database + environment must specify the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> + flag when opening the environment. 
However, + there is no additional requirement if the + application chooses a single process to + recover the environment, as the first process + to open the database environment will know to + perform recovery. + </p> + </li> + <li> + <p> + Second, there can only be a single <a href="../api_reference/C/env.html" class="olink">DB_ENV</a> + handle per database environment in each + process. As the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> locking is + per-process, not per-thread, multiple <a href="../api_reference/C/env.html" class="olink">DB_ENV</a> + handles in a single environment could race + with each other, potentially causing data + corruption. + </p> + </li> + <li> + <p> + Third, the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> implementation + does not explicitly terminate processes using + the database environment which is being + recovered. Instead, it relies on the processes + themselves noticing the database environment + has been discarded from underneath them. For + this reason, the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> flag should be + used with a mutex implementation that does not + block in the operating system, as that risks a + thread of control blocking forever on a mutex + which will never be granted. Using any + test-and-set mutex implementation ensures this + cannot happen, and for that reason the + <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> flag is generally used with a + test-and-set mutex implementation. + </p> + </li> + </ul> + </div> + <p> + A second solution for groups of unrelated processes + is also based on a "watcher process". 
This solution is + intended for systems where it is not practical to + monitor the processes sharing a database environment, + but it is possible to monitor the environment to + detect if a thread of control has failed holding open + Berkeley DB handles. This would be done by having a + "watcher" process periodically call the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> + method. If <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> returns that the environment + can no longer be used, the watcher would then take + action, recovering the environment. + </p> + <p> + The weakness of this approach is that all threads + of control using the environment must specify an "ID" + function and an "is-alive" function using the + <a href="../api_reference/C/envset_thread_id.html" class="olink">DB_ENV->set_thread_id()</a> method. (In other words, the + Berkeley DB library must be able to assign a unique ID + to each thread of control, and additionally determine + if the thread of control is still running. It can be + difficult to portably provide that information in + applications using a variety of different programming + languages and running on a variety of different + platforms.) + </p> + <p> + A third solution for groups of unrelated processes + is a hybrid of the two above. Along with implementing + the built-in sentinel approach with the the <a href="../api_reference/C/envopen.html" class="olink">DB_ENV->open()</a> + methods <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> flag, the <a href="../api_reference/C/envopen.html#envopen_DB_FAILCHK" class="olink">DB_FAILCHK</a> flag can + be specified. When using both flags, each process + opening the database environment first checks to see + if recovery needs to be performed. 
If recovery needs + to be performed for any reason, it will first + determine if a thread of control exited while holding + database read locks, and release those. Then it will + abort any unresolved transactions. If these steps are + successful, the process opening the environment will + continue without the need for any additional recovery. + If these steps are unsuccessful, then additional + recovery will be performed if <a href="../api_reference/C/envopen.html#envopen_DB_RECOVER" class="olink">DB_RECOVER</a> is + specified and if <a href="../api_reference/C/envopen.html#envopen_DB_RECOVER" class="olink">DB_RECOVER</a> is not specified, <a class="link" href="program_errorret.html#program_errorret.DB_RUNRECOVERY">DB_RUNRECOVERY</a> + will be returned. + </p> + <p> + Since this solution is hybrid of the first two, all + of the requirements of both of them must be + implemented (will need "ID" function, "is-alive" + function, single <a href="../api_reference/C/env.html" class="olink">DB_ENV</a> handle per database, etc.) + </p> + <p> + The described approaches are different, and should + not be combined. Applications might use either the + <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> approach, the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> or the hybrid + approach, but not together in the same application. 
+ For example, a POSIX application written as a library + underneath a wide variety of interfaces and differing + APIs might choose the <a href="../api_reference/C/envopen.html#envopen_DB_REGISTER" class="olink">DB_REGISTER</a> approach for a few + reasons: first, it does not require making periodic + calls to the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> method; second, when + implementing in a variety of languages, is may be more + difficult to specify unique IDs for each thread of + control; third, it may be more difficult determine if + a thread of control is still running, as any + particular thread of control is likely to lack + sufficient permissions to signal other processes. + Alternatively, an application with a dedicated watcher + process, running with appropriate permissions, might + choose the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> approach as supporting higher + overall throughput and reliability, as that approach + allows the application to abort unresolved + transactions and continue forward without having to + recover the database environment. The hybrid approach + is useful in situations where running a dedicated + watcher process is not practical but getting the + equivalent of <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> on the <a href="../api_reference/C/envopen.html" class="olink">DB_ENV->open()</a> is + important. + </p> </li> </ol> </div> <p> - Obviously, when implementing a process to monitor other threads of - control, it is important the watcher process' code be as simple and - well-tested as possible, because the application may hang if it fails. -</p> + Obviously, when implementing a process to monitor other + threads of control, it is important the watcher process' code + be as simple and well-tested as possible, because the + application may hang if it fails. + </p> </div> <div class="navfooter"> <hr /> |
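The patch above notes that a watcher can detect a dead worker on POSIX systems by finding "a system resource held by the process that will be modified if the process dies", and names fcntl-style locking as one option. A minimal sketch of that idea follows. It is not Berkeley DB code and is not how DB_REGISTER is implemented; the function names (`worker_register`, `watcher_probe`, `demo_detect_worker_death`) and the lock-file layout (one write lock on byte 0) are assumptions made for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical layout: a worker is "registered" while it holds a
 * write lock on byte 0 of the lock file. */
static void marker_lock(struct flock *fl)
{
    fl->l_type = F_WRLCK;
    fl->l_whence = SEEK_SET;
    fl->l_start = 0;
    fl->l_len = 1;
}

/* Worker side: take the lock and return the descriptor, which must stay
 * open while the worker uses the environment. Returns -1 on failure.
 * The kernel drops the lock when the process exits or is killed. */
int worker_register(const char *path)
{
    struct flock fl = {0};
    int fd;

    assert(path != NULL);
    if ((fd = open(path, O_RDWR | O_CREAT, 0600)) < 0)
        return -1;
    marker_lock(&fl);
    if (fcntl(fd, F_SETLK, &fl) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Watcher side: return the PID holding the lock, 0 if nobody holds it,
 * or -1 on error. Must run in a different process than the worker,
 * because F_GETLK never reports the caller's own locks. */
pid_t watcher_probe(const char *path)
{
    struct flock fl = {0};
    int fd, ret;

    assert(path != NULL);
    if ((fd = open(path, O_RDWR)) < 0)
        return -1;
    marker_lock(&fl);
    ret = fcntl(fd, F_GETLK, &fl);
    close(fd);
    if (ret == -1)
        return -1;
    return fl.l_type == F_UNLCK ? 0 : fl.l_pid;
}

/* Usage example: fork a worker, confirm the watcher sees it, kill it,
 * and confirm the watcher sees the lock vanish. Returns 1 on success. */
int demo_detect_worker_death(const char *path)
{
    int pfd[2];
    char status = 'x';
    pid_t child, holder;

    if (pipe(pfd) == -1 || (child = fork()) == -1)
        return 0;
    if (child == 0) {                      /* worker process */
        close(pfd[0]);
        if (worker_register(path) >= 0)
            status = 'k';
        if (write(pfd[1], &status, 1) != 1)
            _exit(1);
        for (;;)
            pause();                       /* "work" until killed */
    }
    close(pfd[1]);                         /* watcher process */
    if (read(pfd[0], &status, 1) != 1)
        status = 'x';
    close(pfd[0]);
    holder = watcher_probe(path);          /* worker should be alive */
    kill(child, SIGKILL);                  /* simulate a worker failure */
    waitpid(child, NULL, 0);
    if (status != 'k' || holder != child) {
        unlink(path);
        return 0;
    }
    holder = watcher_probe(path);          /* lock should be gone now */
    unlink(path);
    return holder == 0;
}
```

In a real watcher, a probe result of 0 for a worker that never unregistered would be the trigger for the steps the text describes: remove the sentinel file, stop the remaining workers, run recovery, and recreate the sentinel. One caveat of fcntl locks is that closing any descriptor for the lock file releases all of that process's locks on it, so a worker should open the file only once.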
