From 780b92ada9afcf1d58085a83a0b9e6bc982203d1 Mon Sep 17 00:00:00 2001
From: Lorry Tar Creator
There are a series of configuration tasks which you can perform when using the Btree access method. They are described in the following sections.
The Btree data structure is a sorted, balanced tree structure storing associated key/data pairs. By default, the sort order is lexicographical, with shorter keys collating before longer keys. The user can specify the sort order for the Btree by using the DB->set_bt_compare() method.
Sort routines are passed pointers to keys as arguments. The keys are represented as DBT structures. The routine must return an integer less than, equal to, or greater than zero if the first argument is considered to be respectively less than, equal to, or greater than the second argument. The only fields that the routines may examine in the DBT structures are the data and size fields.
An example routine that might be used to sort integer keys in the database is as follows:
int
compare_int(DB *dbp, const DBT *a, const DBT *b, size_t *locp)
{
	int ai, bi;

	locp = NULL;
	/*
	 * Returns:
	 *	< 0 if a < b
	 *	= 0 if a = b
	 *	> 0 if a > b
	 */
	memcpy(&ai, a->data, sizeof(int));
	memcpy(&bi, b->data, sizeof(int));
	return (ai - bi);
}
Note that the data must first be copied into memory that is appropriately aligned, as Berkeley DB does not guarantee any kind of alignment of the underlying data, including for comparison routines. When writing comparison routines, remember that databases created on machines of different architectures may have different integer byte orders, for which your code may need to compensate.
An example routine that might be used to sort keys based on the first five bytes of the key (ignoring any subsequent bytes) is as follows:
int
compare_dbt(DB *dbp, const DBT *a, const DBT *b, size_t *locp)
{
	int len;
	u_char *p1, *p2;

	locp = NULL;
	/*
	 * Returns:
	 *	< 0 if a < b
	 *	= 0 if a = b
	 *	> 0 if a > b
	 */
	for (p1 = a->data, p2 = b->data, len = 5; len--; ++p1, ++p2)
		if (*p1 != *p2)
			return ((long)*p1 - (long)*p2);
	return (0);
}
All comparison functions must cause the keys in the database to be well-ordered. The most important implication of being well-ordered is that the key relations must be transitive; that is, if key A is less than key B, and key B is less than key C, then the comparison routine must also return that key A is less than key C.
It is reasonable for a comparison function to not examine an entire key in some applications, which implies partial keys may be specified to the Berkeley DB interfaces. When partial keys are specified to Berkeley DB, interfaces which retrieve data items based on a user-specified key (for example, DB->get() and DBC->get() with the DB_SET flag) will modify the user-specified key by returning the actual key stored in the database.
The Berkeley DB Btree implementation maximizes the number of keys that can be stored on an internal page by storing only as many bytes of each key as are necessary to distinguish it from adjacent keys. The prefix comparison routine is what determines this minimum number of bytes (that is, the length of the unique prefix) that must be stored. A prefix comparison function for the Btree can be specified by calling DB->set_bt_prefix().
The prefix comparison routine must be compatible with the overall comparison function of the Btree, since what distinguishes any two keys depends entirely on the function used to compare them. This means that if a prefix comparison routine is specified by the application, a compatible overall comparison routine must also have been specified.
Prefix comparison routines are passed pointers to keys as arguments. The keys are represented as DBT structures. The only fields the routines may examine in the DBT structures are the data and size fields.
The prefix comparison function must return the number of bytes necessary to distinguish the two keys. If the keys are identical (equal and equal in length), the length should be returned. If the keys are equal up to the smaller of the two lengths, then the length of the smaller key plus 1 should be returned.
An example prefix comparison routine follows:
size_t
compare_prefix(DB *dbp, const DBT *a, const DBT *b)
{
	size_t cnt, len;
	u_int8_t *p1, *p2;

	cnt = 1;
	len = a->size > b->size ? b->size : a->size;
	for (p1 = a->data, p2 = b->data; len--; ++p1, ++p2, ++cnt)
		if (*p1 != *p2)
			return (cnt);

	/*
	 * They match up to the smaller of the two sizes.
	 * Collate the longer key after the shorter key.
	 */
	if (a->size < b->size)
		return (a->size + 1);
	if (b->size < a->size)
		return (b->size + 1);
	return (b->size);
}
The usefulness of this functionality is data-dependent, but in some data sets can produce significantly reduced tree sizes and faster search times.
The number of keys stored on each page affects the size of a Btree and how it is maintained. Therefore, it also affects the retrieval and search performance of the tree. For each Btree, Berkeley DB computes a maximum key and data size. This size is a function of the page size and the fact that at least two key/data pairs must fit on any Btree page. Whenever key or data items exceed the calculated size, they are stored on overflow pages instead of in the standard Btree leaf pages.
Applications may use the DB->set_bt_minkey() method to change the minimum number of keys that must fit on a Btree page from two to another value. Altering this value in turn alters the on-page maximum size, and can be used to force key and data items which would normally be stored in the Btree leaf pages onto overflow pages.
Some data sets can benefit from this tuning. For example, consider an application using large page sizes, with a data set almost entirely consisting of small key and data items, but with a few large items. By setting the minimum number of keys that must fit on a page, the application can force the outsized items to be stored on overflow pages. That in turn can potentially keep the tree more compact, that is, with fewer internal levels to traverse during searches.
The following calculation is similar to the one performed by the Btree implementation. (The minimum_keys value is multiplied by 2 because each key/data pair requires 2 slots on a Btree page.)
maximum_size = page_size / (minimum_keys * 2)
Using this calculation, if the page size is 8KB and the default minimum_keys value of 2 is used, then any key or data items larger than 2KB will be forced to an overflow page. If an application were to specify a minimum_key value of 100, then any key or data items larger than roughly 40 bytes would be forced to overflow pages.
It is important to remember that accesses to overflow pages do not perform as well as accesses to the standard Btree leaf pages, and so setting the value incorrectly can result in overusing overflow pages and decreasing the application's overall performance.
The Btree access method optionally supports retrieval by logical record numbers. To configure a Btree to support record numbers, call the DB->set_flags() method with the DB_RECNUM flag.
Configuring a Btree for record numbers should not be done lightly. While often useful, it may significantly slow down the speed at which items can be stored into the database, and can severely impact application throughput. Generally it should be avoided in trees with a need for high write concurrency.
To retrieve by record number, use the DB_SET_RECNO flag to the DB->get() and DBC->get() methods. The following is an example of a routine that displays the data item for a Btree database created with the DB_RECNUM option.
int
rec_display(DB *dbp, db_recno_t recno)
{
	DBT key, data;
	int ret;

	memset(&key, 0, sizeof(key));
	key.data = &recno;
	key.size = sizeof(recno);
	memset(&data, 0, sizeof(data));

	if ((ret = dbp->get(dbp, NULL, &key, &data, DB_SET_RECNO)) != 0)
		return (ret);
	printf("data for %lu: %.*s\n",
	    (u_long)recno, (int)data.size, (char *)data.data);
	return (0);
}
To determine a key's record number, use the DB_GET_RECNO flag to the DBC->get() method. The following is an example of a routine that displays the record number associated with a specific key.
int
recno_display(DB *dbp, char *keyvalue)
{
	DBC *dbcp;
	DBT key, data;
	db_recno_t recno;
	int ret, t_ret;

	/* Acquire a cursor for the database. */
	if ((ret = dbp->cursor(dbp, NULL, &dbcp, 0)) != 0) {
		dbp->err(dbp, ret, "DB->cursor");
		return (ret);
	}

	/* Position the cursor. */
	memset(&key, 0, sizeof(key));
	key.data = keyvalue;
	key.size = strlen(keyvalue);
	memset(&data, 0, sizeof(data));
	if ((ret = dbcp->get(dbcp, &key, &data, DB_SET)) != 0) {
		dbp->err(dbp, ret, "DBC->get(DB_SET): %s", keyvalue);
		goto err;
	}

	/*
	 * Request the record number, and store it into appropriately
	 * sized and aligned local memory.
	 */
	memset(&data, 0, sizeof(data));
	data.data = &recno;
	data.ulen = sizeof(recno);
	data.flags = DB_DBT_USERMEM;
	if ((ret = dbcp->get(dbcp, &key, &data, DB_GET_RECNO)) != 0) {
		dbp->err(dbp, ret, "DBC->get(DB_GET_RECNO)");
		goto err;
	}

	printf("record number for \"%s\": %lu\n", keyvalue, (u_long)recno);

err:	/* Close the cursor. */
	if ((t_ret = dbcp->close(dbcp)) != 0 && ret == 0)
		ret = t_ret;
	return (ret);
}
The Btree access method supports the automatic compression of key/data pairs upon their insertion into the database. The key/data pairs are decompressed before they are returned to the application, making an application's interaction with a compressed database identical to that for a non-compressed database. To configure Berkeley DB for compression, call the DB->set_bt_compress() method and specify custom compression and decompression functions. If DB->set_bt_compress() is called with NULL compression and decompression functions, Berkeley DB will use its default compression functions.
Note
Compression only works with the Btree access method, and then only so long as your database is not configured for unsorted duplicates.
Note
The default compression function is not guaranteed to reduce the size of the on-disk database in every case. It has been tested and shown to work well with English-language text. Of course, in order to determine if the default compression algorithm is beneficial for your application, it is important to test both the final size and the performance using a representative set of data and access patterns.
The default compression function performs prefix compression on each key added to the database. This means that, for a key n bytes in length, the first i bytes that match the first i bytes of the previous key exactly are omitted and only the final n-i bytes are stored in the database. If the bytes of the key being stored match the bytes of the previous key exactly, then the same prefix compression algorithm is applied to the data value being stored. To use Berkeley DB's default compression behavior, both the default compression and decompression functions must be used.
For example, to configure your database for default compression:

DB *dbp = NULL;
DB_ENV *envp = NULL;
u_int32_t db_flags;
const char *file_name = "mydb.db";

/* ... create the database handle ... */

/* NULL functions select Berkeley DB's default compression. */
dbp->set_bt_compress(dbp, NULL, NULL);

/* ... open the database ... */
An application wishing to perform its own compression may supply a compression and decompression function which will be called instead of Berkeley DB's default functions. The compression function is passed five DBT structures:
- The key and data immediately preceding the key/data pair that is being stored.
- The key and data being stored in the tree.
- The buffer where the compressed data should be written.
The total size of the buffer used to store the compressed data is identified in the DBT's ulen field. If the compressed data cannot fit in the buffer, the compression function should store the amount of space needed in the DBT's size field and then return DB_BUFFER_SMALL. Berkeley DB will subsequently re-call the compression function with the required amount of space allocated in the compression data buffer.

Multiple compressed key/data pairs will likely be written to the same buffer and the compression function should take steps to ensure it does not overwrite data.
For example, the following code fragments illustrate the use of a custom compression routine. This code is actually a much simplified example of the default compression provided by Berkeley DB. It does simple prefix compression on the key part of the data.

int compress(DB *dbp, const DBT *prevKey, const DBT *prevData,
    const DBT *key, const DBT *data, DBT *dest)
{
	u_int8_t *dest_data_ptr;
	...
	return (0);
}
The corresponding decompression function is likewise passed five DBT structures:
- The key and data DBTs immediately preceding the decompressed key and data.
- The compressed data from the database.
- One to store the decompressed key and another one for the decompressed data.
Because the compression of record X relies upon record X-1, the decompression function can be called repeatedly to linearly decompress a set of records stored in the compressed buffer.

The total size of the buffer available to store the decompressed data is identified in the destination DBT's ulen field. If the decompressed data cannot fit in the buffer, the decompression function should store the amount of space needed in the destination DBT's size field and then return DB_BUFFER_SMALL. Berkeley DB will subsequently re-call the decompression function with the required amount of space allocated in the decompression data buffer.
For example, the decompression routine that corresponds to the example compression routine provided above is:

int decompress(DB *dbp, const DBT *prevKey, const DBT *prevData,
    DBT *compressed, DBT *destKey, DBT *destData)
{
	...
}
As you use compression with your databases, be aware of the following:
- Compression works by placing key/data pairs from a single database page into a single block of compressed data. This is true whether you use DB's default compression, or you write your own compression. Because all of the key/data pairs are placed in a single block of memory, you cannot decompress data unless you have decompressed everything that came before it in the block. That is, you cannot decompress item n in the data block unless you also decompress items 0 through n-1.
- If you increase the minimum number of key/data pairs placed on a Btree leaf page (using DB->set_bt_minkey()), you will decrease your seek times on a compressed database. However, this will also decrease the effectiveness of the compression.
- Compressed databases are fastest if bulk load is used to add data to them. See Retrieving and updating records in bulk for information on using bulk load.