Diffstat (limited to 'docs/programmer_reference/transapp_reclimit.html')

 -rw-r--r--  docs/programmer_reference/transapp_reclimit.html | 322
 1 file changed, 165 insertions(+), 157 deletions(-)
diff --git a/docs/programmer_reference/transapp_reclimit.html b/docs/programmer_reference/transapp_reclimit.html
index 47994027..c93d399b 100644

Library Version 12.1.6.1
Prev: transapp_filesys.html | Chapter 11. Berkeley DB Transactional Data Store Applications | Next: transapp_tune.html

Berkeley DB recovery is based on write-ahead logging. This means that when a change is made to a database page, a description of the change is written into a log file. This description in the log file is guaranteed to be written to stable storage before the database pages that were changed are written to stable storage. This is the fundamental feature of the logging system that makes durability and rollback work.

If the application or system crashes, the log is reviewed during recovery. Any database changes described in the log that were part of committed transactions but were never written to the database itself are written to the database as part of recovery. Any database changes described in the log that were never committed but were written to the database itself are backed out of the database as part of recovery. This design allows the database to be written lazily; only blocks from the log file have to be forced to disk as part of transaction commit.

There are two interfaces that are a concern when considering Berkeley DB recoverability:

1. The interface between Berkeley DB and the operating system/filesystem.
2. The interface between the operating system/filesystem and the underlying stable storage hardware.
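To make the commit-time guarantee above concrete, here is a minimal C sketch of a transactional update, assuming an environment already opened with DB_INIT_TXN and DB_INIT_LOG; the function name and the key/value handling are illustrative, not part of the Berkeley DB API.

#include <string.h>
#include <db.h>

/* Hypothetical helper: store one key/value pair transactionally. */
int
update_record(DB_ENV *env, DB *db, const char *k, const char *v)
{
    DB_TXN *txn;
    DBT key, data;
    int ret;

    /* Begin a transaction; the change below is logged first. */
    if ((ret = env->txn_begin(env, NULL, &txn, 0)) != 0)
        return (ret);

    memset(&key, 0, sizeof(key));
    memset(&data, 0, sizeof(data));
    key.data = (void *)k;
    key.size = (u_int32_t)strlen(k) + 1;
    data.data = (void *)v;
    data.size = (u_int32_t)strlen(v) + 1;

    if ((ret = db->put(db, txn, &key, &data, 0)) != 0) {
        (void)txn->abort(txn);
        return (ret);
    }

    /*
     * Commit forces the log records describing the change to stable
     * storage; the modified database pages may be written much later.
     */
    return (txn->commit(txn, 0));
}

Only the log records for the put must reach disk at commit; the data page itself can be flushed lazily, which is exactly the gap that recovery closes after a crash.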
Berkeley DB uses the operating system interfaces and its underlying filesystem when writing its files. This means that Berkeley DB can fail if the underlying filesystem fails in some unrecoverable way. Otherwise, the interface requirements here are simple: the system call that Berkeley DB uses to flush data to disk (normally fsync or fdatasync) must guarantee that all the information necessary for a file's recoverability has been written to stable storage before it returns to Berkeley DB, and that no possible application or system crash can cause that file to be unrecoverable.
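The contract required of that call can be stated as a small POSIX sketch; the helper below is hypothetical, but the ordering it enforces (write, then fsync, then proceed) is the one Berkeley DB depends on.

#include <fcntl.h>
#include <unistd.h>

/*
 * Illustration of the contract Berkeley DB relies on: once fsync()
 * returns 0, the preceding write() must survive any application or
 * system crash.
 */
int
durable_write(int fd, const void *buf, size_t len)
{
    if (write(fd, buf, len) != (ssize_t)len)
        return (-1);
    return (fsync(fd));
}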
In addition, Berkeley DB implicitly uses the interface between the operating system and the underlying hardware. The interface requirements here are not as simple.

First, it is necessary to consider the underlying page size of the Berkeley DB databases. The Berkeley DB library performs all database writes using the page size specified by the application, and Berkeley DB assumes pages are written atomically. This means that if the operating system performs filesystem I/O in blocks of a different size than the database page size, the possibility of database corruption increases. For example, assume that Berkeley DB is writing 32KB pages for a database, and the operating system does filesystem I/O in 16KB blocks. If the operating system writes the first 16KB of the database page successfully but crashes before it can write the second 16KB, the database has been corrupted, and this corruption may or may not be detected during recovery. For this reason, it may be important to select database page sizes that will be written as single block transfers by the underlying operating system. If you do not select such a page size, you may want to configure the database to use checksums (see DB->set_flags() for more information). By configuring checksums, you guarantee this kind of corruption will be detected, at the expense of the CPU required to generate the checksums. When such an error is detected, the only course of recovery is to perform catastrophic recovery to restore the database.
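Both mitigations are set on the database handle before it is opened. In the sketch below, the 16KB page size and the file name are assumptions chosen for illustration; DB->set_pagesize(), DB->set_flags(), and the DB_CHKSUM flag are the interfaces the text refers to.

#include <db.h>

int
open_with_checksums(DB_ENV *env, DB **dbp)
{
    DB *db;
    int ret;

    if ((ret = db_create(&db, env, 0)) != 0)
        return (ret);

    /* Match the page size to the filesystem's block transfer size. */
    if ((ret = db->set_pagesize(db, 16 * 1024)) != 0)
        goto err;

    /* Checksum every page so a partial write is detected on read. */
    if ((ret = db->set_flags(db, DB_CHKSUM)) != 0)
        goto err;

    if ((ret = db->open(db, NULL, "data.db", NULL, DB_BTREE,
        DB_CREATE | DB_AUTO_COMMIT, 0)) != 0)
        goto err;

    *dbp = db;
    return (0);

err:    (void)db->close(db, 0);
    return (ret);
}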
Second, if you are copying database files (either as part of doing a hot backup or creating a hot failover area), there is an additional question related to the page size of the Berkeley DB databases. You must copy databases atomically, in units of the database page size. In other words, the reads made by the copy program must not be interleaved with writes by other threads of control, and the copy program must read the databases in multiples of the underlying database page size. On Unix systems this is not a problem, as these operating systems already make this guarantee and system utilities normally read in power-of-2 sized chunks, which are larger than the largest possible Berkeley DB database page size. Other operating systems, particularly those based on Linux and Windows, do not provide this guarantee, and hot backups may not be performed on these systems by reading data from the file system; the db_hotbackup utility should be used on these systems.

An additional problem we have seen in this area was in some releases of Solaris, where the cp utility was implemented using the mmap system call rather than the read system call. Because Solaris' mmap system call did not make the same guarantee of read atomicity as the read system call, using the cp utility could create corrupted copies of the databases. Another problem we have seen is implementations of the tar utility doing 10KB block reads by default and, even when an output block size was specified, not reading from the underlying databases in multiples of that block size. Using the dd utility instead of cp or tar (and specifying an appropriate block size) fixes these problems. If you plan to use a system utility to copy database files, you may want to use a system call trace utility (for example, ktrace or truss) to check for an I/O size that is smaller than or not a multiple of the database page size, and for system calls other than read.
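A copy program that satisfies the page-size requirement behaves like dd with an explicit block size. The following fragment only illustrates that read pattern: the 32KB buffer assumes pages no larger than 32KB and no concurrent writers, and for real backups the db_hotbackup utility remains the safer choice.

#include <fcntl.h>
#include <unistd.h>

/*
 * Copy a database file using reads that are a multiple of the
 * database page size, mimicking dd with an explicit block size.
 */
int
copy_db_file(const char *src, const char *dst)
{
    char buf[32 * 1024];        /* >= the database page size */
    ssize_t nr;
    int in, out, ret;

    if ((in = open(src, O_RDONLY)) == -1)
        return (-1);
    if ((out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0600)) == -1) {
        (void)close(in);
        return (-1);
    }
    ret = 0;
    while ((nr = read(in, buf, sizeof(buf))) > 0)
        if (write(out, buf, (size_t)nr) != nr) {
            ret = -1;
            break;
        }
    if (nr == -1)
        ret = -1;
    (void)close(in);
    (void)close(out);
    return (ret);
}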
Third, it is necessary to consider the behavior of the system's underlying stable storage hardware. For example, consider a SCSI controller that has been configured to cache data and to report to the operating system that the data has been written to stable storage when, in fact, it has only been written into the controller's RAM cache. If power is lost before the controller is able to flush its cache to disk, and the controller cache is not stable (that is, the writes will not be flushed to disk when power returns), the writes will be lost. If the writes include database blocks, there is no loss, because recovery will correctly update the database. If the writes include log file blocks, it is possible that transactions that were already committed may not appear in the recovered database, although the recovered database will be coherent after a crash.

If the underlying hardware can fail in any way so that only part of a block is written, the failure conditions are the same as those described previously for an operating system failure that writes only part of a logical database block. In such cases, configuring the database for checksums will ensure the corruption is detected.

For these reasons, it may be important to select hardware that does not do partial writes and does not cache data writes (or does not report that the data has been written to stable storage until it has either been written to stable storage or the actual writing of all of the data is guaranteed, barring catastrophic hardware failure, that is, your disk drive exploding).

If the disk drive on which you are storing your databases explodes, you can perform normal Berkeley DB catastrophic recovery, because it requires only a snapshot of your databases plus the log files you have archived since those snapshots were taken. In this case, you should lose no database changes at all.
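Catastrophic recovery itself is invoked by opening the environment with the DB_RECOVER_FATAL flag (the db_recover utility's -c option does the equivalent). A minimal sketch, with a hypothetical home directory restored from the snapshot plus archived logs:

#include <db.h>

int
run_catastrophic_recovery(const char *home)
{
    DB_ENV *env;
    int ret;

    if ((ret = db_env_create(&env, 0)) != 0)
        return (ret);

    /*
     * DB_RECOVER_FATAL replays all available log files against the
     * restored database snapshots; DB_RECOVER replays only the logs
     * needed after a normal crash.
     */
    if ((ret = env->open(env, home,
        DB_CREATE | DB_INIT_LOCK | DB_INIT_LOG | DB_INIT_MPOOL |
        DB_INIT_TXN | DB_RECOVER_FATAL, 0)) != 0) {
        (void)env->close(env, 0);
        return (ret);
    }
    return (env->close(env, 0));
}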
If the disk drive on which you are storing your log files explodes, you can also perform catastrophic recovery, but you will lose any database changes made as part of transactions committed since your last archival of the log files. Alternatively, if your database environment and databases are still available after you lose the log file disk, you should be able to dump your databases. However, you may see an inconsistent snapshot of your data after doing the dump, because changes that were part of transactions that were not yet committed may appear in the database dump. Depending on the value of the data, a reasonable alternative may be to perform both the database dump and the catastrophic recovery, and then compare the databases created by the two methods.

Regardless, for these reasons, storing your databases and log files on different disks should be considered a safety measure as well as a performance enhancement.

Finally, you should be aware that Berkeley DB does not protect against all cases of stable storage hardware failure, nor does it protect against simple hardware misbehavior (for example, a disk controller writing incorrect data to the disk). However, configuring the database for checksums will ensure that any such corruption is detected.
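As noted above, the separation of databases and log files can be configured in the C API with DB_ENV->set_lg_dir() before the environment is opened, or with a set_lg_dir line in the environment's DB_CONFIG file; the paths in this sketch are hypothetical.

#include <db.h>

int
open_env_with_separate_logs(DB_ENV **envp)
{
    DB_ENV *env;
    int ret;

    if ((ret = db_env_create(&env, 0)) != 0)
        return (ret);

    /* Keep the write-ahead log on a different physical disk. */
    if ((ret = env->set_lg_dir(env, "/disk2/bdb-logs")) != 0) {
        (void)env->close(env, 0);
        return (ret);
    }

    if ((ret = env->open(env, "/disk1/bdb-home",
        DB_CREATE | DB_INIT_LOCK | DB_INIT_LOG | DB_INIT_MPOOL |
        DB_INIT_TXN | DB_RECOVER, 0)) != 0) {
        (void)env->close(env, 0);
        return (ret);
    }
    *envp = env;
    return (0);
}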