From 780b92ada9afcf1d58085a83a0b9e6bc982203d1 Mon Sep 17 00:00:00 2001
From: Lorry Tar Creator
Date: Tue, 17 Feb 2015 17:25:57 +0000
Subject: Imported from /home/lorry/working-area/delta_berkeleydb/db-6.1.23.tar.gz.

---
 docs/programmer_reference/am_misc_tune.html | 258 +++++++++++++++++-----------
 1 file changed, 156 insertions(+), 102 deletions(-)

(limited to 'docs/programmer_reference/am_misc_tune.html')

diff --git a/docs/programmer_reference/am_misc_tune.html b/docs/programmer_reference/am_misc_tune.html
index 8e22aa03..b5f9ad51 100644
--- a/docs/programmer_reference/am_misc_tune.html
+++ b/docs/programmer_reference/am_misc_tune.html
@@ -14,7 +14,7 @@
-

There are a few different issues to consider when tuning the performance of Berkeley DB access method applications.

+

There are a few different issues to consider when tuning the performance of Berkeley DB access method applications.

access method
-
An application's choice of a database access method can significantly affect performance. Applications using fixed-length records and integer keys are likely to get better performance from the Queue access method. Applications using variable-length records are likely to get better performance from the Btree access method, as it tends to be faster for most applications than either the Hash or Recno access methods. Because the access method APIs are largely identical between the Berkeley DB access methods, it is easy for applications to benchmark the different access methods against each other. See Selecting an access method for more information.
+
An application's choice of a database access method can significantly affect performance. Applications using fixed-length records and integer keys are likely to get better performance from the Queue access method. Applications using variable-length records are likely to get better performance from the Btree access method, as it tends to be faster for most applications than either the Hash or Recno access methods. Because the access method APIs are largely identical between the Berkeley DB access methods, it is easy for applications to benchmark the different access methods against each other. See Selecting an access method for more information.
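
Because the APIs are so similar, switching access methods for a benchmark run is usually a one-line change. The following minimal sketch creates a benchmark database as either a Btree or a Queue database; the helper name, the file name "bench.db", and the 128-byte record length are illustrative assumptions, not part of the text above. Queue stores fixed-length records keyed by record number, so a record length must be configured before the open.

#include <db.h>

/*
 * Illustrative sketch: create a benchmark database as either a Btree
 * or a Queue database.  The file name and 128-byte record length are
 * example values only.
 */
int
open_bench_db(DB_ENV *env, DB **dbpp, int use_queue)
{
    DB *dbp;
    int ret;

    if ((ret = db_create(&dbp, env, 0)) != 0)
        return (ret);
    /* Queue requires fixed-length records; set the length before open. */
    if (use_queue && (ret = dbp->set_re_len(dbp, 128)) != 0)
        goto err;
    if ((ret = dbp->open(dbp, NULL, "bench.db", NULL,
        use_queue ? DB_QUEUE : DB_BTREE, DB_CREATE, 0664)) != 0)
        goto err;
    *dbpp = dbp;
    return (0);

err:
    (void)dbp->close(dbp, 0);
    return (ret);
}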
cache size
-
The Berkeley DB database cache defaults to a fairly small size, and most applications concerned with performance will want to set it explicitly. Using a too-small cache will result in horrible performance. The first step in tuning the cache size is to use the db_stat utility (or the statistics returned by the DB->stat() function) to measure the effectiveness of the cache. The goal is to maximize the cache's hit rate. Typically, increasing the size of the cache until the hit rate reaches 100% or levels off will yield the best performance. However, if your working set is sufficiently large, you will be limited by the system's available physical memory. Depending on the virtual memory and file system buffering policies of your system, and the requirements of other applications, the maximum cache size will be some amount smaller than the size of physical memory. If you find that the db_stat utility shows that increasing the cache size improves your hit rate, but performance is not improving (or is getting worse), then it's likely you've hit other system limitations. At this point, you should review the system's swapping/paging activity and limit the size of the cache to the maximum size possible without triggering paging activity. Finally, always remember to make your measurements under conditions as close as possible to the conditions your deployed application will run under, and to test your final choices under worst-case conditions.
+
The Berkeley DB database cache defaults to a fairly small size, and most applications concerned with performance will want to set it explicitly. Using a too-small cache will result in horrible performance. The first step in tuning the cache size is to use the db_stat utility (or the statistics returned by the DB->stat() function) to measure the effectiveness of the cache. The goal is to maximize the cache's hit rate. Typically, increasing the size of the cache until the hit rate reaches 100% or levels off will yield the best performance. However, if your working set is sufficiently large, you will be limited by the system's available physical memory. Depending on the virtual memory and file system buffering policies of your system, and the requirements of other applications, the maximum cache size will be some amount smaller than the size of physical memory. If you find that the db_stat utility shows that increasing the cache size improves your hit rate, but performance is not improving (or is getting worse), then it's likely you've hit other system limitations. At this point, you should review the system's swapping/paging activity and limit the size of the cache to the maximum size possible without triggering paging activity. Finally, always remember to make your measurements under conditions as close as possible to the conditions your deployed application will run under, and to test your final choices under worst-case conditions.
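
As a concrete illustration, a sketch along the following lines sets the cache size before the environment is opened and then computes the hit rate from the memory pool statistics (the same counters reported by "db_stat -m"). The 64MB figure and the helper names are illustrative assumptions, not recommendations from the text above.

#include <stdio.h>
#include <stdlib.h>
#include <db.h>

/*
 * Illustrative sketch: give the environment a 64MB cache (configured
 * before DB_ENV->open()), then compute the cache hit rate from the
 * memory pool statistics.  The 64MB figure is an example value only.
 */
int
configure_cache(DB_ENV *env)
{
    /* 0GB + 64MB, in a single cache region. */
    return (env->set_cachesize(env, 0, 64 * 1024 * 1024, 1));
}

int
report_cache_hit_rate(DB_ENV *env)
{
    DB_MPOOL_STAT *sp;
    double requests;
    int ret;

    if ((ret = env->memp_stat(env, &sp, NULL, 0)) != 0)
        return (ret);
    requests = (double)sp->st_cache_hit + (double)sp->st_cache_miss;
    if (requests > 0)
        printf("cache hit rate: %.2f%%\n",
            100.0 * (double)sp->st_cache_hit / requests);
    free(sp);
    return (0);
}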
shared memory
-
By default, Berkeley DB creates its database environment shared regions in filesystem-backed memory. Some systems do not distinguish between regular filesystem pages and memory-mapped pages backed by the filesystem when selecting dirty pages to be flushed back to disk. For this reason, dirtying pages in the Berkeley DB cache may cause intense filesystem activity, typically when the filesystem sync thread or process is run. In some cases, this can dramatically affect application throughput. The workaround to this problem is to create the shared regions in system shared memory (DB_SYSTEM_MEM) or application private memory (DB_PRIVATE), or, in cases where this behavior is configurable, to turn off the operating system's flushing of memory-mapped pages.
+
By default, Berkeley DB creates its database environment shared regions in filesystem-backed memory. Some systems do not distinguish between regular filesystem pages and memory-mapped pages backed by the filesystem when selecting dirty pages to be flushed back to disk. For this reason, dirtying pages in the Berkeley DB cache may cause intense filesystem activity, typically when the filesystem sync thread or process is run. In some cases, this can dramatically affect application throughput. The workaround to this problem is to create the shared regions in system shared memory (DB_SYSTEM_MEM) or application private memory (DB_PRIVATE), or, in cases where this behavior is configurable, to turn off the operating system's flushing of memory-mapped pages.
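
A minimal sketch of the first two workarounds follows; the shared memory base segment ID of 35, the flag combinations, and the helper names are illustrative assumptions, not requirements from the text above.

#include <db.h>

/*
 * Illustrative sketch of the two region-placement workarounds.
 * DB_PRIVATE keeps the regions in per-process memory (only a single
 * process, possibly multithreaded, may then use the environment);
 * DB_SYSTEM_MEM moves the regions into system shared memory and, on
 * System V systems, requires a base segment ID set with
 * DB_ENV->set_shm_key().  The segment ID of 35 is an example value.
 */
int
open_env_system_mem(DB_ENV *env, const char *home)
{
    int ret;

    if ((ret = env->set_shm_key(env, 35)) != 0)
        return (ret);
    return (env->open(env, home,
        DB_CREATE | DB_INIT_MPOOL | DB_SYSTEM_MEM, 0));
}

int
open_env_private(DB_ENV *env, const char *home)
{
    return (env->open(env, home,
        DB_CREATE | DB_INIT_MPOOL | DB_PRIVATE, 0));
}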
large key/data items
-
Storing large key/data items in a database can alter the performance characteristics of Btree, Hash and Recno databases. The first parameter to consider is the database page size. When a key/data item is too large to be placed on a database page, it is stored on "overflow" pages that are maintained outside of the normal database structure (typically, items that are larger than one-quarter of the page size are deemed to be too large). Accessing these overflow pages requires at least one additional page reference over a normal access, so it is usually better to increase the page size than to create a database with a large number of overflow pages. Use the db_stat utility (or the statistics returned by the DB->stat() method) to review the number of overflow pages in the database.

The second issue is using large key/data items instead of duplicate data items. While this can offer performance gains to some applications (because it is possible to retrieve several data items in a single get call), once the key/data items are large enough to be pushed off-page, they will slow the application down. Using duplicate data items is usually the better choice in the long run.

+
Storing large key/data items in a database can alter the performance characteristics of Btree, Hash and Recno databases. The first parameter to consider is the database page size. When a key/data item is too large to be placed on a database page, it is stored on "overflow" pages that are maintained outside of the normal database structure (typically, items that are larger than one-quarter of the page size are deemed to be too large). Accessing these overflow pages requires at least one additional page reference over a normal access, so it is usually better to increase the page size than to create a database with a large number of overflow pages. Use the db_stat utility (or the statistics returned by the DB->stat() method) to review the number of overflow pages in the database.

The second issue is using large key/data items instead of duplicate data items. While this can offer performance gains to some applications (because it is possible to retrieve several data items in a single get call), once the key/data items are large enough to be pushed off-page, they will slow the application down. Using duplicate data items is usually the better choice in the long run.
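
For illustration, a sketch along these lines chooses a larger page size before the database is created so that items stay on-page, and then reports the overflow page count from the Btree statistics. The 8KB page size and the helper names are assumptions for the example, not recommendations.

#include <stdio.h>
#include <stdlib.h>
#include <db.h>

/*
 * Illustrative sketch: set a larger page size before the database is
 * created, then report how many overflow pages a Btree database
 * actually contains.  The 8KB page size is an example; page sizes
 * must be a power of two between 512 bytes and 64KB.
 */
int
set_large_page_size(DB *dbp)
{
    /* Must be called before DB->open() creates the database. */
    return (dbp->set_pagesize(dbp, 8 * 1024));
}

int
report_overflow_pages(DB *dbp)
{
    DB_BTREE_STAT *sp;
    int ret;

    if ((ret = dbp->stat(dbp, NULL, &sp, 0)) != 0)
        return (ret);
    printf("overflow pages: %lu\n", (unsigned long)sp->bt_over_pg);
    free(sp);
    return (0);
}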

-

A common question when tuning Berkeley DB applications is scalability. For example, people will ask why, when adding additional threads or processes to an application, the overall database throughput decreases, even when all of the operations are read-only queries.

-

First, while read-only operations are logically concurrent, they still have to acquire mutexes on internal Berkeley DB data structures. For example, when searching a linked list and looking for a database page, the linked list has to be locked against other threads of control attempting to add or remove pages from the linked list. The more threads of control you add, the more contention there will be for those shared data structure resources.

-

Second, once contention starts happening, applications will also start to see threads of control convoy behind locks (especially on architectures supporting only test-and-set spin mutexes, rather than blocking mutexes). On test-and-set architectures, threads of control waiting for locks must attempt to acquire the mutex, sleep, check the mutex again, and so on. Each failed check of the mutex and subsequent sleep wastes CPU and decreases the overall throughput of the system.

-

Third, every time a thread acquires a shared mutex, it has to shoot down other references to that memory in every other CPU on the system. Many modern snoopy cache architectures have slow shoot-down characteristics.

-

Fourth, schedulers don't care what application-specific mutexes a thread of control might hold when descheduling a thread. If a thread of control is descheduled while holding a shared data structure mutex, other threads of control will be blocked until the scheduler decides to run the thread holding the mutex again. The more threads of control that are running, the smaller their quanta of CPU time, and the more likely they will be descheduled while holding a Berkeley DB mutex.

-

The effect that adding new threads of control has on an application's throughput is application and hardware specific, and depends almost entirely on the application's data access pattern. In general, using operating systems that support blocking mutexes will often make a tremendous difference, and limiting threads of control to some small multiple of the number of CPUs is usually the right choice to make.

+

A common question when tuning Berkeley DB applications is scalability. For example, people will ask why, when adding additional threads or processes to an application, the overall database throughput decreases, even when all of the operations are read-only queries.

+

First, while read-only operations are logically concurrent, they still have to acquire mutexes on internal Berkeley DB data structures. For example, when searching a linked list and looking for a database page, the linked list has to be locked against other threads of control attempting to add or remove pages from the linked list. The more threads of control you add, the more contention there will be for those shared data structure resources.

+

Second, once contention starts happening, applications will also start to see threads of control convoy behind locks (especially on architectures supporting only test-and-set spin mutexes, rather than blocking mutexes). On test-and-set architectures, threads of control waiting for locks must attempt to acquire the mutex, sleep, check the mutex again, and so on. Each failed check of the mutex and subsequent sleep wastes CPU and decreases the overall throughput of the system.
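
On such architectures it may also help to tune how many times a waiter spins on a mutex before sleeping. Assuming a Berkeley DB release that provides the DB_ENV->mutex_set_tas_spins() method, a minimal sketch would look like the following; the spin count of 50 is an illustrative assumption, not a recommendation.

#include <db.h>

/*
 * Illustrative sketch: adjust how many times a waiter spins on a
 * test-and-set mutex before sleeping.  The value of 50 is an example;
 * the library chooses its own default based on the number of CPUs.
 */
int
tune_spin_count(DB_ENV *env)
{
    return (env->mutex_set_tas_spins(env, 50));
}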

+

Third, every time a thread acquires a shared mutex, it has to shoot down other references to that memory in every other CPU on the system. Many modern snoopy cache architectures have slow shoot-down characteristics.

+

Fourth, schedulers don't care what application-specific mutexes a thread of control might hold when descheduling a thread. If a thread of control is descheduled while holding a shared data structure mutex, other threads of control will be blocked until the scheduler decides to run the thread holding the mutex again. The more threads of control that are running, the smaller their quanta of CPU time, and the more likely they will be descheduled while holding a Berkeley DB mutex.

+

The effect that adding new threads of control has on an application's throughput is application and hardware specific, and depends almost entirely on the application's data access pattern. In general, using operating systems that support blocking mutexes will often make a tremendous difference, and limiting threads of control to some small multiple of the number of CPUs is usually the right choice to make.
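
As a small sketch of that last point, a worker pool might be sized from the detected CPU count; the multiplier of 2, the cap of 64, and the helper name are illustrative assumptions, and _SC_NPROCESSORS_ONLN is assumed to be available on the target system.

#include <unistd.h>

/*
 * Illustrative sketch: size a worker-thread pool as a small multiple
 * of the number of CPUs rather than letting it grow without bound.
 * The multiplier of 2 and the cap of 64 are example values.
 */
int
choose_worker_count(void)
{
    long n;

    n = sysconf(_SC_NPROCESSORS_ONLN);  /* CPUs currently online */
    if (n < 1)
        n = 1;
    n *= 2;
    return (n > 64 ? 64 : (int)n);
}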