From 780b92ada9afcf1d58085a83a0b9e6bc982203d1 Mon Sep 17 00:00:00 2001 From: Lorry Tar Creator Date: Tue, 17 Feb 2015 17:25:57 +0000 Subject: Imported from /home/lorry/working-area/delta_berkeleydb/db-6.1.23.tar.gz. --- docs/programmer_reference/general_am_conf.html | 532 ++++++++++++++----------- 1 file changed, 296 insertions(+), 236 deletions(-) (limited to 'docs/programmer_reference/general_am_conf.html') diff --git a/docs/programmer_reference/general_am_conf.html b/docs/programmer_reference/general_am_conf.html index 729e1439..73f3909f 100644 --- a/docs/programmer_reference/general_am_conf.html +++ b/docs/programmer_reference/general_am_conf.html @@ -14,7 +14,7 @@ -

- There are a series of configuration tasks which are common to all - access methods. They are described in the following sections. -

+

+ There are a series of configuration tasks which are common + to all access methods. They are described in the following + sections. +

@@ -80,83 +80,95 @@

- The size of the pages used in the underlying database can be specified - by calling the DB->set_pagesize() method. The minimum page size is 512 - bytes and the maximum page size is 64K bytes, and must be a power of - two. If no page size is specified by the application, a page size is - selected based on the underlying filesystem I/O block size. (A page - size selected in this way has a lower limit of 512 bytes and an upper - limit of 16K bytes.) -

+ The size of the pages used in the underlying database can
+ be specified by calling the DB->set_pagesize() method. The
+ page size must be a power of two between 512 bytes and 64K
+ bytes. If no page size is specified by the application, a
+ page size is selected based on the underlying filesystem
+ I/O block size. (A page size selected in this way has a
+ lower limit of 512 bytes and an upper limit of 16K bytes.)
+
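+
+ For example, a minimal C sketch (the file name and the 16KB
+ size are illustrative only; the page size must be configured
+ before the database is created):
+
+#include <db.h>
+
+int
+create_with_pagesize(void)
+{
+    DB *dbp;
+    int ret;
+
+    if ((ret = db_create(&dbp, NULL, 0)) != 0)
+        return (ret);
+    /* Must be a power of two between 512 bytes and 64KB. */
+    if ((ret = dbp->set_pagesize(dbp, 16 * 1024)) != 0) {
+        dbp->err(dbp, ret, "DB->set_pagesize");
+        (void)dbp->close(dbp, 0);
+        return (ret);
+    }
+    return (dbp->open(dbp,
+        NULL, "example.db", NULL, DB_BTREE, DB_CREATE, 0644));
+}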

+

+ There are several issues to consider when selecting a + pagesize: overflow record sizes, locking, I/O efficiency, and + recoverability. +

- There are several issues to consider when selecting a pagesize: overflow - record sizes, locking, I/O efficiency, and recoverability. -

+ First, the page size implicitly sets the size of an + overflow record. Overflow records are key or data items that + are too large to fit on a normal database page because of + their size, and are therefore stored in overflow pages. + Overflow pages are pages that exist outside of the normal + database structure. For this reason, there is often a + significant performance penalty associated with retrieving or + modifying overflow records. Selecting a page size that is too + small, and which forces the creation of large numbers of + overflow pages, can seriously impact the performance of an + application. +

+

+ Second, in the Btree, Hash and Recno access methods, the + finest-grained lock that Berkeley DB acquires is for a page. + (The Queue access method generally acquires record-level locks + rather than page-level locks.) Selecting a page size that is + too large, and which causes threads or processes to wait + because other threads of control are accessing or modifying + records on the same page, can impact the performance of your + application. +

- First, the page size implicitly sets the size of an overflow record. - Overflow records are key or data items that are too large to fit on a - normal database page because of their size, and are therefore stored in - overflow pages. Overflow pages are pages that exist outside of the - normal database structure. For this reason, there is often a - significant performance penalty associated with retrieving or modifying - overflow records. Selecting a page size that is too small, and which - forces the creation of large numbers of overflow pages, can seriously - impact the performance of an application. -

+ Third, the page size specifies the granularity of I/O from + the database to the operating system. Berkeley DB will give a + page-sized unit of bytes to the operating system to be + scheduled for reading/writing from/to the disk. For many + operating systems, there is an internal + block size which is used as the granularity of + I/O from the operating system to the disk. Generally, it will + be more efficient for Berkeley DB to write filesystem-sized + blocks to the operating system and for the operating system to + write those same blocks to the disk. +

+

+ Selecting a database page size smaller than the filesystem + block size may cause the operating system to coalesce or + otherwise manipulate Berkeley DB pages and can impact the + performance of your application. When the page size is smaller + than the filesystem block size and a page written by Berkeley + DB is not found in the operating system's cache, the operating + system may be forced to read a block from the disk, copy the + page into the block it read, and then write out the block to + disk, rather than simply writing the page to disk. + Additionally, as the operating system is reading more data + into its buffer cache than is strictly necessary to satisfy + each Berkeley DB request for a page, the operating system + buffer cache may be wasting memory. +

- Second, in the Btree, Hash and Recno access methods, the finest-grained - lock that Berkeley DB acquires is for a page. (The Queue access method - generally acquires record-level locks rather than page-level locks.) - Selecting a page size that is too large, and which causes threads or - processes to wait because other threads of control are accessing or - modifying records on the same page, can impact the performance of your - application. -

-

- Third, the page size specifies the granularity of I/O from the database - to the operating system. Berkeley DB will give a page-sized unit of - bytes to the operating system to be scheduled for reading/writing - from/to the disk. For many operating systems, there is an internal - block size which is used as the - granularity of I/O from the operating system to the disk. Generally, - it will be more efficient for Berkeley DB to write filesystem-sized - blocks to the operating system and for the operating system to write - those same blocks to the disk. -

-

- Selecting a database page size smaller than the filesystem block size - may cause the operating system to coalesce or otherwise manipulate - Berkeley DB pages and can impact the performance of your application. - When the page size is smaller than the filesystem block size and a page - written by Berkeley DB is not found in the operating system's cache, - the operating system may be forced to read a block from the disk, copy - the page into the block it read, and then write out the block to disk, - rather than simply writing the page to disk. Additionally, as the - operating system is reading more data into its buffer cache than is - strictly necessary to satisfy each Berkeley DB request for a page, the - operating system buffer cache may be wasting memory. -

-

- Alternatively, selecting a page size larger than the filesystem block - size may cause the operating system to read more data than necessary. - On some systems, reading filesystem blocks sequentially may cause the - operating system to begin performing read-ahead. If requesting a - single database page implies reading enough filesystem blocks to - satisfy the operating system's criteria for read-ahead, the operating - system may do more I/O than is required. -

-

- Fourth, when using the Berkeley DB Transactional Data Store product, - the page size may affect the errors from which your database can - recover See Berkeley DB recoverability for more information. -

+ Alternatively, selecting a page size larger than the + filesystem block size may cause the operating system to read + more data than necessary. On some systems, reading filesystem + blocks sequentially may cause the operating system to begin + performing read-ahead. If requesting a single database page + implies reading enough filesystem blocks to satisfy the + operating system's criteria for read-ahead, the operating + system may do more I/O than is required. +

+

+ Fourth, when using the Berkeley DB Transactional Data Store
+ product, the page size may affect the errors from which your
+ database can recover. See Berkeley DB
+ recoverability for more information.
+

Note

-

-The db_tuner utility suggests a page size for btree databases that optimizes cache -efficiency and storage space requirements. This utility works only when given a pre-populated database. -So, it is useful when tuning an existing application and not when first implementing an application. -

+

+ The db_tuner utility suggests a page size for Btree databases
+ that optimizes cache efficiency and storage space
+ requirements. This utility works only when given a
+ pre-populated database, so it is useful when tuning an
+ existing application but not when first implementing one.
+
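+
+ For example, an invocation along these lines (mydb.db stands
+ in for your populated Btree database file):
+
+prompt: db_tuner -d mydb.db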

@@ -167,51 +179,74 @@ So, it is useful when tuning an existing application and not when first implemen
-

The size of the cache used for the underlying database can be specified -by calling the DB->set_cachesize() method. -Choosing a cache size is, unfortunately, an art. Your cache must be at -least large enough for your working set plus some overlap for unexpected -situations.

-

When using the Btree access method, you must have a cache big enough for -the minimum working set for a single access. This will include a root -page, one or more internal pages (depending on the depth of your tree), -and a leaf page. If your cache is any smaller than that, each new page -will force out the least-recently-used page, and Berkeley DB will re-read the -root page of the tree anew on each database request.

-

If your keys are of moderate size (a few tens of bytes) and your pages -are on the order of 4KB to 8KB, most Btree applications will be only -three levels. For example, using 20 byte keys with 20 bytes of data -associated with each key, a 8KB page can hold roughly 400 keys (or 200 -key/data pairs), so a fully populated three-level Btree will hold 32 -million key/data pairs, and a tree with only a 50% page-fill factor will -still hold 16 million key/data pairs. We rarely expect trees to exceed -five levels, although Berkeley DB will support trees up to 255 levels.

-

The rule-of-thumb is that cache is good, and more cache is better. -Generally, applications benefit from increasing the cache size up to a -point, at which the performance will stop improving as the cache size -increases. When this point is reached, one of two things have happened: -either the cache is large enough that the application is almost never -having to retrieve information from disk, or, your application is doing -truly random accesses, and therefore increasing size of the cache doesn't -significantly increase the odds of finding the next requested information -in the cache. The latter is fairly rare -- almost all applications show -some form of locality of reference.

-

That said, it is important not to increase your cache size beyond the -capabilities of your system, as that will result in reduced performance. -Under many operating systems, tying down enough virtual memory will cause -your memory and potentially your program to be swapped. This is -especially likely on systems without unified OS buffer caches and virtual -memory spaces, as the buffer cache was allocated at boot time and so -cannot be adjusted based on application requests for large amounts of -virtual memory.

-

For example, even if accesses are truly random within a Btree, your -access pattern will favor internal pages to leaf pages, so your cache -should be large enough to hold all internal pages. In the steady state, -this requires at most one I/O per operation to retrieve the appropriate -leaf page.

-

You can use the db_stat utility to monitor the effectiveness of -your cache. The following output is excerpted from the output of that -utility's -m option:

+

+ The size of the cache used for the underlying database can + be specified by calling the DB->set_cachesize() method. Choosing + a cache size is, unfortunately, an art. Your cache must be at + least large enough for your working set plus some overlap for + unexpected situations. +
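+
+ As an illustrative sketch (the 64MB figure is an assumption
+ for the example, not a recommendation), the cache might be
+ configured as follows:
+
+#include <db.h>
+
+/* Sketch: give the database a 64MB cache in a single region.
+ * Must be called before the database is opened. */
+int
+configure_cache(DB *dbp)
+{
+    return (dbp->set_cachesize(dbp, 0, 64 * 1024 * 1024, 1));
+}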

+

+ When using the Btree access method, you must have a cache
+ big enough for the minimum working set for a single access.
+ This will include a root page, one or more internal pages
+ (depending on the depth of your tree), and a leaf page. If
+ your cache is any smaller than that, each new page will force
+ out the least-recently-used page, and Berkeley DB will
+ re-read the root page of the tree on each database request.
+

+

+ If your keys are of moderate size (a few tens of bytes) and
+ your pages are on the order of 4KB to 8KB, most Btree
+ applications will be only three levels. For example, using
+ 20-byte keys with 20 bytes of data associated with each key,
+ an 8KB page can hold roughly 400 keys (or 200 key/data
+ pairs), so a fully populated three-level Btree will hold 32
+ million key/data pairs, and a tree with only a 50% page-fill
+ factor will still hold 16 million key/data pairs. We rarely
+ expect trees to exceed five levels, although Berkeley DB will
+ support trees up to 255 levels.
+
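+
+ The arithmetic behind those figures, using the example's own
+ numbers (roughly 400 key references per internal page and 200
+ key/data pairs per leaf page):
+
+400 (root fan-out) x 400 (internal fan-out)   = 160,000 leaf pages
+160,000 leaf pages x 200 key/data pairs each  = 32,000,000 pairs
+At a 50% page-fill factor, half of that       = 16,000,000 pairs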

+

+ The rule-of-thumb is that cache is good, and more cache is
+ better. Generally, applications benefit from increasing the
+ cache size up to a point, at which the performance will stop
+ improving as the cache size increases. When this point is
+ reached, one of two things has happened: either the cache is
+ large enough that the application is almost never having to
+ retrieve information from disk, or your application is doing
+ truly random accesses, and therefore increasing the size of
+ the cache doesn't significantly increase the odds of finding
+ the next requested information in the cache. The latter is
+ fairly rare -- almost all applications show some form of
+ locality of reference.
+

+

+ That said, it is important not to increase your cache size + beyond the capabilities of your system, as that will result in + reduced performance. Under many operating systems, tying down + enough virtual memory will cause your memory and potentially + your program to be swapped. This is especially likely on + systems without unified OS buffer caches and virtual memory + spaces, as the buffer cache was allocated at boot time and so + cannot be adjusted based on application requests for large + amounts of virtual memory. +

+

+ For example, even if accesses are truly random within a
+ Btree, your access pattern will favor internal pages over
+ leaf pages, so your cache should be large enough to hold all
+ internal pages. In the steady state, this requires at most
+ one I/O per operation to retrieve the appropriate leaf page.
+

+

+ You can use the db_stat utility to monitor the effectiveness of + your cache. The following output is excerpted from the output + of that utility's -m + option: +

prompt: db_stat -m
 131072  Cache size (128K).
 4273    Requested pages found in the cache (97%).
@@ -223,13 +258,15 @@ utility's -m option:

13      Dirty pages forced from the cache.
0       Dirty buffers written by trickle-sync thread.
130     Current clean buffer count.
-4      Current dirty buffer count.
-
-

The statistics for this cache say that there have been 4,273 requests of -the cache, and only 116 of those requests required an I/O from disk. This -means that the cache is working well, yielding a 97% cache hit rate. The -db_stat utility will present these statistics both for the cache -as a whole and for each file within the cache separately.

+4      Current dirty buffer count.
+

+ The statistics for this cache say that there have been 4,273 + requests of the cache, and only 116 of those requests required + an I/O from disk. This means that the cache is working well, + yielding a 97% cache hit rate. The db_stat utility will present + these statistics both for the cache as a whole and for each + file within the cache separately. +

@@ -239,25 +276,30 @@ as a whole and for each file within the cache separately.

-

Database files created by Berkeley DB can be created in either little- or -big-endian formats. The byte order used for the underlying database -is specified by calling the DB->set_lorder() method. If no order -is selected, the native format of the machine on which the database is -created will be used.

-

Berkeley DB databases are architecture independent, and any format database can -be used on a machine with a different native format. In this case, as -each page that is read into or written from the cache must be converted -to or from the host format, and databases with non-native formats will -incur a performance penalty for the run-time conversion.

- - It is important to note that the Berkeley DB access methods do no data -conversion for application specified data. Key/data pairs written on a -little-endian format architecture will be returned to the application -exactly as they were written when retrieved on a big-endian format -architecture. - -

+ Database files created by Berkeley DB can be created in + either little- or big-endian formats. The byte order used for + the underlying database is specified by calling the + DB->set_lorder() method. If no order is selected, the native + format of the machine on which the database is created will be + used. +
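+
+ For example, a sketch that creates a database in big-endian
+ byte order regardless of the creating machine (1234 selects
+ little-endian, 4321 big-endian; the call must precede
+ database creation):
+
+#include <db.h>
+
+int
+force_big_endian(DB *dbp)
+{
+    return (dbp->set_lorder(dbp, 4321));
+}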

+

+ Berkeley DB databases are architecture independent, and any
+ format database can be used on a machine with a different
+ native format. In this case, each page that is read into or
+ written from the cache must be converted to or from the host
+ format, so databases with non-native formats will incur a
+ performance penalty for the run-time conversion.
+

+

+ It is important to note that the Berkeley DB access methods
+ do no data conversion for application-specified data.
+ Key/data pairs written on a little-endian format architecture
+ will be returned to the application exactly as they were
+ written when retrieved on a big-endian format architecture.
+

@@ -268,105 +310,123 @@ architecture.

- The Btree and Hash access methods support the creation of multiple data - items for a single key item. By default, multiple data items are not - permitted, and each database store operation will overwrite any - previous data item for that key. To configure Berkeley DB for - duplicate data items, call the DB->set_flags() method with the DB_DUP - flag. Only one copy of the key will be stored for each set of - duplicate data items. If the Btree access method comparison routine - returns that two keys compare equally, it is undefined which of the two - keys will be stored and returned from future database operations. -

+ The Btree and Hash access methods support the creation of + multiple data items for a single key item. By default, + multiple data items are not permitted, and each database store + operation will overwrite any previous data item for that key. + To configure Berkeley DB for duplicate data items, call the + DB->set_flags() method with the DB_DUP flag. Only one copy of + the key will be stored for each set of duplicate data items. + If the Btree access method comparison routine returns that two + keys compare equally, it is undefined which of the two keys + will be stored and returned from future database operations. +
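+
+ For example, a sketch enabling (unsorted) duplicates; flags
+ must be set before the database is created:
+
+#include <db.h>
+
+int
+enable_duplicates(DB *dbp)
+{
+    return (dbp->set_flags(dbp, DB_DUP));
+}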

- By default, Berkeley DB stores duplicates in the order in which they - were added, that is, each new duplicate data item will be stored after - any already existing data items. This default behavior can be - overridden by using the DBC->put() method and one of the DB_AFTER, - DB_BEFORE, DB_KEYFIRST or DB_KEYLAST flags. Alternatively, - Berkeley DB may be configured to sort duplicate data items. -

+ By default, Berkeley DB stores duplicates in the order in + which they were added, that is, each new duplicate data item + will be stored after any already existing data items. This + default behavior can be overridden by using the DBC->put() + method and one of the DB_AFTER, DB_BEFORE, DB_KEYFIRST + or DB_KEYLAST flags. Alternatively, Berkeley DB may be + configured to sort duplicate data items. +
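+
+ For example, a sketch that places a new duplicate immediately
+ before the cursor's current item (it assumes an open cursor
+ already positioned on an existing duplicate; the key
+ parameter is ignored for DB_BEFORE):
+
+#include <db.h>
+
+int
+insert_before(DBC *dbc, DBT *key, DBT *data)
+{
+    return (dbc->put(dbc, key, data, DB_BEFORE));
+}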

- When stepping through the database sequentially, duplicate data items - will be returned individually, as a key/data pair, where the key item - only changes after the last duplicate data item has been returned. For - this reason, duplicate data items cannot be accessed using the DB->get() - method, as it always returns the first of the duplicate data items. - Duplicate data items should be retrieved using a Berkeley DB cursor - interface such as the DBC->get() method. -

+ When stepping through the database sequentially, duplicate + data items will be returned individually, as a key/data pair, + where the key item only changes after the last duplicate data + item has been returned. For this reason, duplicate data items + cannot be accessed using the DB->get() method, as it always + returns the first of the duplicate data items. Duplicate data + items should be retrieved using a Berkeley DB cursor interface + such as the DBC->get() method. +

+

+ There is a flag that permits applications to request the
+ following data item only if it is a duplicate data item of
+ the current entry; see DB_NEXT_DUP for more information.
+ There is also a flag that permits applications to request the
+ following data item only if it is not a duplicate data item
+ of the current entry; see DB_NEXT_NODUP and DB_PREV_NODUP
+ for more information.
+
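+
+ For example, a sketch that prints every data item stored for
+ a single key, positioning with DB_SET and stepping with
+ DB_NEXT_DUP (the key string is illustrative, and the data
+ items are assumed printable):
+
+#include <stdio.h>
+#include <string.h>
+#include <db.h>
+
+int
+dump_duplicates(DB *dbp, char *keystr)
+{
+    DBC *dbc;
+    DBT key, data;
+    int ret, t_ret;
+
+    memset(&key, 0, sizeof(key));
+    memset(&data, 0, sizeof(data));
+    key.data = keystr;
+    key.size = (u_int32_t)strlen(keystr);
+
+    if ((ret = dbp->cursor(dbp, NULL, &dbc, 0)) != 0)
+        return (ret);
+    for (ret = dbc->get(dbc, &key, &data, DB_SET); ret == 0;
+        ret = dbc->get(dbc, &key, &data, DB_NEXT_DUP))
+        printf("%.*s\n", (int)data.size, (char *)data.data);
+    if (ret == DB_NOTFOUND)   /* stepped past the last duplicate */
+        ret = 0;
+    if ((t_ret = dbc->close(dbc)) != 0 && ret == 0)
+        ret = t_ret;
+    return (ret);
+}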

+

+ It is also possible to maintain duplicate records in sorted + order. Sorting duplicates will significantly increase + performance when searching them and performing equality joins + — both of which are common operations when using + secondary indices. To configure Berkeley DB to sort duplicate + data items, the application must call the DB->set_flags() method + with the DB_DUPSORT flag. Note that DB_DUPSORT + automatically turns on the DB_DUP flag for you, so you do + not have to also set that flag; however, it is not an error to + also set DB_DUP when configuring for sorted duplicate + records. +

- There is a flag that permits applications to request the following data - item only if it is a duplicate data - item of the current entry, see DB_NEXT_DUP for more information. - There is a flag that permits applications to request the following data - item only if it is not a duplicate - data item of the current entry, see DB_NEXT_NODUP and DB_PREV_NODUP - for more information. -

+ When configuring sorted duplicate records, you can also + specify a custom comparison function using the + DB->set_dup_compare() method. If the DB_DUPSORT flag is given, + but no comparison routine is specified, then Berkeley DB + defaults to the same lexicographical sorting used for Btree + keys, with shorter items collating before longer items. +
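+
+ For example, a sketch combining the two calls; the comparison
+ routine shown is hypothetical and orders duplicates as native
+ ints (the trailing locp parameter of this release's callback
+ signature is unused and ignored here):
+
+#include <string.h>
+#include <db.h>
+
+/* Hypothetical ordering: compare duplicate data items as ints. */
+int
+compare_int_dups(DB *dbp, const DBT *a, const DBT *b, size_t *locp)
+{
+    int ai, bi;
+
+    memcpy(&ai, a->data, sizeof(int));
+    memcpy(&bi, b->data, sizeof(int));
+    return (ai < bi ? -1 : ai > bi ? 1 : 0);
+}
+
+int
+enable_sorted_duplicates(DB *dbp)
+{
+    int ret;
+
+    /* DB_DUPSORT implies DB_DUP; both must precede creation. */
+    if ((ret = dbp->set_flags(dbp, DB_DUPSORT)) != 0)
+        return (ret);
+    return (dbp->set_dup_compare(dbp, compare_int_dups));
+}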

- It is also possible to maintain duplicate records in sorted order. - Sorting duplicates will significantly increase performance when - searching them and performing equality joins — both of which are - common operations when using secondary indices. To configure Berkeley - DB to sort duplicate data items, the application must call the - DB->set_flags() method with the DB_DUPSORT flag. Note that DB_DUPSORT - automatically turns on the DB_DUP flag for you, so you do not - have to also set that flag; however, it is not an error to also set DB_DUP - when configuring for sorted duplicate records. -

+ If the duplicate data items are unsorted, applications may + store identical duplicate data items, or, for those that just + like the way it sounds, duplicate + duplicates. +

- When configuring sorted duplicate records, you can also specify a - custom comparison function using the DB->set_dup_compare() method. If - the DB_DUPSORT flag is given, but no comparison routine is specified, - then Berkeley DB defaults to the same lexicographical sorting used for - Btree keys, with shorter items collating before longer items. -

+ It is an error to attempt to store + identical duplicate data items when duplicates are being + stored in a sorted order. Any such attempt + results in the error message "Duplicate data items are not + supported with sorted data" with a + DB_KEYEXIST return code. +

- If the duplicate data items are unsorted, applications may store - identical duplicate data items, or, for those that just like the way it - sounds, duplicate duplicates. -

-

- It is an error to attempt to store identical - duplicate data items when duplicates are being stored in a sorted - order. Any such attempt results in the - error message "Duplicate data items are not supported with sorted - data" with a DB_KEYEXIST return code. -

-

- Note that you can suppress the error message "Duplicate data items are - not supported with sorted data" by using the DB_NODUPDATA flag. Use - of this flag does not change the database's basic behavior; storing - duplicate data items in a database configured for sorted duplicates is - still an error and so you will continue to receive the - DB_KEYEXIST return code if you try to do that. -

-

- For further information on how searching and insertion behaves in the - presence of duplicates (sorted or not), see the DB->get() DB->put(), - DBC->get() and DBC->put() documentation. -

+ Note that you can suppress the error message "Duplicate + data items are not supported with sorted data" by using the + DB_NODUPDATA flag. Use of this flag does not change the + database's basic behavior; storing duplicate data items in a + database configured for sorted duplicates is still an error + and so you will continue to receive the + DB_KEYEXIST return code if you try to + do that. +
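+
+ For example, a sketch of a store that tolerates an existing
+ identical duplicate without reporting an error:
+
+#include <db.h>
+
+/* Sketch: dbp is assumed open and configured with DB_DUPSORT. */
+int
+put_quietly(DB *dbp, DBT *key, DBT *data)
+{
+    int ret;
+
+    ret = dbp->put(dbp, NULL, key, data, DB_NODUPDATA);
+    return (ret == DB_KEYEXIST ? 0 : ret);
+}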

+

+ For further information on how searching and insertion
+ behave in the presence of duplicates (sorted or not), see the
+ DB->get(), DB->put(), DBC->get() and DBC->put() documentation.
+

-

Non-local memory allocation

+

Non-local memory + allocation

-

Berkeley DB allocates memory for returning key/data pairs and statistical -information which becomes the responsibility of the application. -There are also interfaces where an application will allocate memory -which becomes the responsibility of Berkeley DB.

-

On systems in which there may be multiple library versions of the -standard allocation routines (notably Windows NT), transferring memory -between the library and the application will fail because the Berkeley DB -library allocates memory from a different heap than the application -uses to free it, or vice versa. To avoid this problem, the -DB_ENV->set_alloc() and DB->set_alloc() methods can be used to -give Berkeley DB references to the application's allocation routines.

+

+ Berkeley DB allocates memory for returning key/data pairs
+ and statistical information, and that memory becomes the
+ responsibility of the application. There are also interfaces
+ where an application allocates memory which then becomes the
+ responsibility of Berkeley DB.
+

+

+ On systems in which there may be multiple library versions + of the standard allocation routines (notably Windows NT), + transferring memory between the library and the application + will fail because the Berkeley DB library allocates memory + from a different heap than the application uses to free it, or + vice versa. To avoid this problem, the DB_ENV->set_alloc() and + DB->set_alloc() methods can be used to give Berkeley DB + references to the application's allocation routines. +
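+
+ For example, a minimal sketch handing the C library's own
+ allocator to Berkeley DB (the call must precede opening the
+ environment):
+
+#include <stdlib.h>
+#include <db.h>
+
+int
+share_allocator(DB_ENV *dbenv)
+{
+    return (dbenv->set_alloc(dbenv, malloc, realloc, free));
+}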