From 780b92ada9afcf1d58085a83a0b9e6bc982203d1 Mon Sep 17 00:00:00 2001
From: Lorry Tar Creator
There are a series of configuration tasks which you can perform when using the Btree access method. They are described in the following sections.
The Btree data structure is a sorted, balanced tree structure storing associated key/data pairs. By default, the sort order is lexicographical, with shorter keys collating before longer keys. The user can specify the sort order for the Btree by using the DB->set_bt_compare() method.
Sort routines are passed pointers to keys as arguments. The keys are represented as DBT structures. The routine must return an integer less than, equal to, or greater than zero if the first argument is considered to be respectively less than, equal to, or greater than the second argument. The only fields that the routines may examine in the DBT structures are the data and size fields.
An example routine that might be used to sort integer keys in the database is as follows:
int
compare_int(DB *dbp, const DBT *a, const DBT *b, size_t *locp)
{
	int ai, bi;

	locp = NULL;
	/*
	 * Returns:
	 *	< 0 if a < b
	 *	= 0 if a = b
	 *	> 0 if a > b
	 */
	memcpy(&ai, a->data, sizeof(int));
	memcpy(&bi, b->data, sizeof(int));
	return (ai - bi);
}
Note that the data must first be copied into memory that is appropriately aligned, as Berkeley DB does not guarantee any kind of alignment of the underlying data, including for comparison routines. When writing comparison routines, remember that databases created on machines of different architectures may have different integer byte orders, for which your code may need to compensate.
An example routine that might be used to sort keys based on the first five bytes of the key (ignoring any subsequent bytes) is as follows:
int
compare_dbt(DB *dbp, const DBT *a, const DBT *b, size_t *locp)
{
	int len;
	u_char *p1, *p2;

	locp = NULL;
	/*
	 * Returns:
	 *	< 0 if a < b
	 *	= 0 if a = b
	 *	> 0 if a > b
	 */
	for (p1 = a->data, p2 = b->data, len = 5; len--; ++p1, ++p2)
		if (*p1 != *p2)
			return ((long)*p1 - (long)*p2);
	return (0);
}
All comparison functions must cause the keys in the database to be well-ordered. The most important implication of being well-ordered is that the key relations must be transitive; that is, if key A is less than key B, and key B is less than key C, then the comparison routine must also return that key A is less than key C.
It is reasonable for a comparison function to not examine an entire key in some applications, which implies partial keys may be specified to the Berkeley DB interfaces. When partial keys are specified to Berkeley DB, interfaces which retrieve data items based on a user-specified key (for example, DB->get() and DBC->get() with the DB_SET flag) will modify the user-specified key by returning the actual key stored in the database.
The Berkeley DB Btree implementation maximizes the number of keys that can be stored on an internal page by storing only as many bytes of each key as are necessary to distinguish it from adjacent keys. The prefix comparison routine is what determines this minimum number of bytes (that is, the length of the unique prefix) that must be stored. A prefix comparison function for the Btree can be specified by calling DB->set_bt_prefix().
The prefix comparison routine must be compatible with the overall comparison function of the Btree, since what distinguishes any two keys depends entirely on the function used to compare them. This means that if a prefix comparison routine is specified by the application, a compatible overall comparison routine must also have been specified.
Prefix comparison routines are passed pointers to keys as arguments. The keys are represented as DBT structures. The only fields the routines may examine in the DBT structures are the data and size fields.
The prefix comparison function must return the number of bytes necessary to distinguish the two keys. If the keys are identical (equal and equal in length), the length should be returned. If the keys are equal up to the smaller of the two lengths, then the length of the smaller key plus 1 should be returned.
An example prefix comparison routine follows:
size_t
compare_prefix(DB *dbp, const DBT *a, const DBT *b)
{
	size_t cnt, len;
	u_int8_t *p1, *p2;

	cnt = 1;
	len = a->size > b->size ? b->size : a->size;
	for (p1 = a->data, p2 = b->data; len--; ++p1, ++p2, ++cnt)
		if (*p1 != *p2)
			return (cnt);

	/*
	 * They match up to the smaller of the two sizes.
	 * Collate the longer key after the shorter key.
	 */
	if (a->size < b->size)
		return (a->size + 1);
	if (b->size < a->size)
		return (b->size + 1);
	return (b->size);
}
The usefulness of this functionality is data-dependent, but in some data sets can produce significantly reduced tree sizes and faster search times.
The number of keys stored on each page affects the size of a Btree and how it is maintained. Therefore, it also affects the retrieval and search performance of the tree. For each Btree, Berkeley DB computes a maximum key and data size. This size is a function of the page size and the fact that at least two key/data pairs must fit on any Btree page. Whenever key or data items exceed the calculated size, they are stored on overflow pages instead of in the standard Btree leaf pages.
Applications may use the DB->set_bt_minkey() method to change the minimum number of keys that must fit on a Btree page from two to another value. Altering this value in turn alters the on-page maximum size, and can be used to force key and data items which would normally be stored in the Btree leaf pages onto overflow pages.
Some data sets can benefit from this tuning. For example, consider an application using large page sizes, with a data set almost entirely consisting of small key and data items, but with a few large items. By setting the minimum number of keys that must fit on a page, the application can force the outsized items to be stored on overflow pages. That in turn can potentially keep the tree more compact, that is, with fewer internal levels to traverse during searches.
The following calculation is similar to the one performed by the Btree implementation. (The minimum_keys value is multiplied by 2 because each key/data pair requires 2 slots on a Btree page.)
maximum_size = page_size / (minimum_keys * 2)
Using this calculation, if the page size is 8KB and the default minimum_keys value of 2 is used, then any key or data items larger than 2KB will be forced to an overflow page. If an application were to specify a minimum_key value of 100, then any key or data items larger than roughly 40 bytes would be forced to overflow pages.
It is important to remember that accesses to overflow pages do not perform as well as accesses to the standard Btree leaf pages, and so setting the value incorrectly can result in overusing overflow pages and decreasing the application's overall performance.
The Btree access method optionally supports retrieval by logical record numbers. To configure a Btree to support record numbers, call the DB->set_flags() method with the DB_RECNUM flag.
Configuring a Btree for record numbers should not be done lightly. While often useful, it may significantly slow down the speed at which items can be stored into the database, and can severely impact application throughput. Generally it should be avoided in trees with a need for high write concurrency.
To retrieve by record number, use the DB_SET_RECNO flag to the DB->get() and DBC->get() methods. The following is an example of a routine that displays the data item for a Btree database created with the DB_RECNUM option.
int
rec_display(DB *dbp, db_recno_t recno)
{
	DBT key, data;
	int ret;

	memset(&key, 0, sizeof(key));
	key.data = &recno;
	key.size = sizeof(recno);
	memset(&data, 0, sizeof(data));

	if ((ret = dbp->get(dbp, NULL, &key, &data, DB_SET_RECNO)) != 0)
		return (ret);
	printf("data for %lu: %.*s\n",
	    (u_long)recno, (int)data.size, (char *)data.data);
	return (0);
}
To determine a key's record number, use the DB_GET_RECNO flag to the DBC->get() method. The following is an example of a routine that displays the record number associated with a specific key.
int
recno_display(DB *dbp, char *keyvalue)
{
	DBC *dbcp;
	DBT key, data;
	db_recno_t recno;
	int ret, t_ret;

	/* Acquire a cursor for the database. */
	if ((ret = dbp->cursor(dbp, NULL, &dbcp, 0)) != 0) {
		dbp->err(dbp, ret, "DB->cursor");
		return (ret);
	}

	/* Position the cursor. */
	memset(&key, 0, sizeof(key));
	key.data = keyvalue;
	key.size = strlen(keyvalue);
	memset(&data, 0, sizeof(data));
	if ((ret = dbcp->get(dbcp, &key, &data, DB_SET)) != 0) {
		dbp->err(dbp, ret, "DBC->get(DB_SET): %s", keyvalue);
		goto err;
	}

	/*
	 * Request the record number, and store it into appropriately
	 * sized and aligned local memory.
	 */
	memset(&data, 0, sizeof(data));
	data.data = &recno;
	data.ulen = sizeof(recno);
	data.flags = DB_DBT_USERMEM;
	if ((ret = dbcp->get(dbcp, &key, &data, DB_GET_RECNO)) != 0) {
		dbp->err(dbp, ret, "DBC->get(DB_GET_RECNO)");
		goto err;
	}

	printf("record number for \"%s\": %lu\n", keyvalue, (u_long)recno);

err:	/* Close the cursor. */
	if ((t_ret = dbcp->close(dbcp)) != 0 && ret == 0)
		ret = t_ret;
	return (ret);
}
The Btree access method supports the automatic compression of key/data pairs upon their insertion into the database. The key/data pairs are decompressed before they are returned to the application, making an application's interaction with a compressed database identical to that for a non-compressed database. To configure Berkeley DB for compression, call the DB->set_bt_compress() method and specify custom compression and decompression functions. If DB->set_bt_compress() is called with NULL compression and decompression functions, Berkeley DB will use its default compression functions.
Note
Compression only works with the Btree access method, and then only so long as your database is not configured for unsorted duplicates.
Note
The default compression function is not guaranteed to reduce the size of the on-disk database in every case. It has been tested and shown to work well with English-language text. Of course, in order to determine if the default compression algorithm is beneficial for your application, it is important to test both the final size and the performance using a representative set of data and access patterns.
The default compression function performs prefix compression on each key added to the database. This means that, for a key n bytes in length, the first i bytes that match the first i bytes of the previous key exactly are omitted and only the final n-i bytes are stored in the database. If the bytes of the key being stored match the bytes of the previous key exactly, then the same prefix compression algorithm is applied to the data value being stored. To use Berkeley DB's default compression behavior, both the default compression and decompression functions must be used.
For example, to configure your database for default compression:

DB *dbp = NULL;
DB_ENV *envp = NULL;
u_int32_t db_flags;
const char *file_name = "mydb.db";

/* ... create the database handle ... */

/* NULL functions select Berkeley DB's default compression. */
dbp->set_bt_compress(dbp, NULL, NULL);

/* ... open the database ... */
An application wishing to perform its own compression may supply a compression and decompression function which will be called instead of Berkeley DB's default functions. The compression function is passed five DBT structures:
- The key and data immediately preceding the key/data pair that is being stored.
- The key and data being stored in the tree.
- The buffer where the compressed data should be written.
The total size of the buffer used to store the compressed data is identified in the DBT's ulen field. If the compressed data cannot fit in the buffer, the compression function should store the amount of space needed in the DBT's size field and then return DB_BUFFER_SMALL. Berkeley DB will subsequently re-call the compression function with the required amount of space allocated in the compression data buffer.

Multiple compressed key/data pairs will likely be written to the same buffer and the compression function should take steps to ensure it does not overwrite data.
For example, the following code fragments illustrate the use of a custom compression routine. This code is actually a much simplified example of the default compression provided by Berkeley DB. It does simple prefix compression on the key part of the data.

int compress(DB *dbp, const DBT *prevKey, const DBT *prevData,
    const DBT *key, const DBT *data, DBT *dest)
{
	u_int8_t *dest_data_ptr;
	...
	return (0);
}
The corresponding decompression function is likewise passed five DBT structures:
- The key and data DBTs immediately preceding the decompressed key and data.
- The compressed data from the database.
- One to store the decompressed key and another one for the decompressed data.
Because the compression of record X relies upon record X-1, the decompression function can be called repeatedly to linearly decompress a set of records stored in the compressed buffer.

The total size of the buffer available to store the decompressed data is identified in the destination DBT's ulen field. If the decompressed data cannot fit in the buffer, the decompression function should store the amount of space needed in the destination DBT's size field and then return DB_BUFFER_SMALL. Berkeley DB will subsequently re-call the decompression function with the required amount of space allocated in the decompression data buffer.
For example, the decompression routine that corresponds to the example compression routine provided above is:

int decompress(DB *dbp, const DBT *prevKey, const DBT *prevData,
    DBT *compressed, DBT *destKey, DBT *destData)
{
	...
}
As you use compression with your databases, be aware of the following:
- Compression works by placing key/data pairs from a single database page into a single block of compressed data. This is true whether you use DB's default compression, or you write your own compression. Because all of the key/data pairs are placed in a single block of memory, you cannot decompress data unless you have decompressed everything that came before it in the block. That is, you cannot decompress item n in the data block unless you also decompress items 0 through n-1.
- If you increase the minimum number of key/data pairs placed on a Btree leaf page (using DB->set_bt_minkey()), you will decrease your seek times on a compressed database. However, this will also decrease the effectiveness of the compression.
- Compressed databases are fastest if bulk load is used to add data to them. See Retrieving and updating records in bulk for information on using bulk load.