author    Lorry Tar Creator <lorry-tar-importer@baserock.org>  2015-02-17 17:25:57 +0000
committer <>  2015-03-17 16:26:24 +0000
commit    780b92ada9afcf1d58085a83a0b9e6bc982203d1 (patch)
tree      598f8b9fa431b228d29897e798de4ac0c1d3d970 /docs/programmer_reference/am_misc_tune.html
parent    7a2660ba9cc2dc03a69ddfcfd95369395cc87444 (diff)
download  berkeleydb-master.tar.gz
Imported from /home/lorry/working-area/delta_berkeleydb/db-6.1.23.tar.gz (HEAD, db-6.1.23, master)
Diffstat (limited to 'docs/programmer_reference/am_misc_tune.html')
-rw-r--r--  docs/programmer_reference/am_misc_tune.html  258
1 file changed, 156 insertions, 102 deletions
diff --git a/docs/programmer_reference/am_misc_tune.html b/docs/programmer_reference/am_misc_tune.html
index 8e22aa03..b5f9ad51 100644
--- a/docs/programmer_reference/am_misc_tune.html
+++ b/docs/programmer_reference/am_misc_tune.html
@@ -14,7 +14,7 @@
<body>
<div xmlns="" class="navheader">
<div class="libver">
- <p>Library Version 11.2.5.3</p>
+ <p>Library Version 12.1.6.1</p>
</div>
<table width="100%" summary="Navigation header">
<tr>
@@ -22,9 +22,7 @@
</tr>
<tr>
<td width="20%" align="left"><a accesskey="p" href="am_misc_db_sql.html">Prev</a> </td>
- <th width="60%" align="center">Chapter 4. 
- Access Method Wrapup
- </th>
+ <th width="60%" align="center">Chapter 4.  Access Method Wrapup </th>
<td width="20%" align="right"> <a accesskey="n" href="am_misc_faq.html">Next</a></td>
</tr>
</table>
@@ -38,119 +36,174 @@
</div>
</div>
</div>
- <p>There are a few different issues to consider when tuning the performance
-of Berkeley DB access method applications.</p>
+ <p>
+ There are a few different issues to consider when tuning the
+ performance of Berkeley DB access method applications.
+ </p>
<div class="variablelist">
<dl>
<dt>
<span class="term">access method</span>
</dt>
- <dd>An application's choice of a database access method can significantly
-affect performance. Applications using fixed-length records and integer
-keys are likely to get better performance from the Queue access method.
-Applications using variable-length records are likely to get better
-performance from the Btree access method, as it tends to be faster for
-most applications than either the Hash or Recno access methods. Because
-the access method APIs are largely identical between the Berkeley DB access
-methods, it is easy for applications to benchmark the different access
-methods against each other. See <a class="xref" href="am_conf_select.html" title="Selecting an access method">Selecting an access method</a> for more information.</dd>
+ <dd>
+ An application's choice of a database access
+ method can significantly affect performance.
+ Applications using fixed-length records and integer
+ keys are likely to get better performance from the
+ Queue access method. Applications using
+ variable-length records are likely to get better
+ performance from the Btree access method, as it tends
+ to be faster for most applications than either the
+ Hash or Recno access methods. Because the access
+ method APIs are largely identical between the Berkeley
+ DB access methods, it is easy for applications to
+ benchmark the different access methods against each
+ other. See <a class="xref" href="am_conf_select.html" title="Selecting an access method">Selecting an access method</a> for more
+ information.
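+          <p>
+            Because the access method is simply the DBTYPE argument
+            to the DB-&gt;open() method, a benchmark harness can switch
+            methods with a one-argument change. The following is a
+            minimal sketch rather than code from the library; the
+            file name, fixed record length, and error handling are
+            illustrative only:
+          </p>
+          <pre class="programlisting">#include &lt;db.h&gt;
+
+/*
+ * Open "bench.db" with the access method under test; switching
+ * between DB_BTREE and DB_QUEUE is a one-argument change.
+ */
+int
+open_db(DB **dbp, DBTYPE type)
+{
+    DB *db;
+    int ret;
+
+    if ((ret = db_create(&amp;db, NULL, 0)) != 0)
+        return (ret);
+    if (type == DB_QUEUE)      /* Queue requires fixed-length records. */
+        (void)db-&gt;set_re_len(db, 64);
+    if ((ret = db-&gt;open(db, NULL,
+        "bench.db", NULL, type, DB_CREATE, 0644)) != 0) {
+        (void)db-&gt;close(db, 0);
+        return (ret);
+    }
+    *dbp = db;
+    return (0);
+}</pre>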
+ </dd>
<dt>
<span class="term">cache size</span>
</dt>
- <dd>The Berkeley DB database cache defaults to a fairly small size, and most
-applications concerned with performance will want to set it explicitly.
-Using a too-small cache will result in horrible performance. The first
-step in tuning the cache size is to use the db_stat utility (or the
-statistics returned by the <a href="../api_reference/C/dbstat.html" class="olink">DB-&gt;stat()</a> function) to measure the
-effectiveness of the cache. The goal is to maximize the cache's hit
-rate. Typically, increasing the size of the cache until the hit rate
-reaches 100% or levels off will yield the best performance. However,
-if your working set is sufficiently large, you will be limited by the
-system's available physical memory. Depending on the virtual memory
-and file system buffering policies of your system, and the requirements
-of other applications, the maximum cache size will be some amount
-smaller than the size of physical memory. If you find that
-the <a href="../api_reference/C/db_stat.html" class="olink">db_stat</a> utility shows that increasing the cache size improves your hit
-rate, but performance is not improving (or is getting worse), then it's
-likely you've hit other system limitations. At this point, you should
-review the system's swapping/paging activity and limit the size of the
-cache to the maximum size possible without triggering paging activity.
-Finally, always remember to make your measurements under conditions as
-close as possible to the conditions your deployed application will run
-under, and to test your final choices under worst-case conditions.</dd>
+ <dd>
+ The Berkeley DB database cache defaults to a
+ fairly small size, and most applications concerned
+ with performance will want to set it explicitly. Using
+ a too-small cache will result in horrible performance.
+ The first step in tuning the cache size is to use the
+ db_stat utility (or the statistics returned by the
+ <a href="../api_reference/C/dbstat.html" class="olink">DB-&gt;stat()</a> function) to measure the effectiveness of the
+ cache. The goal is to maximize the cache's hit rate.
+ Typically, increasing the size of the cache until the
+ hit rate reaches 100% or levels off will yield the
+ best performance. However, if your working set is
+ sufficiently large, you will be limited by the
+ system's available physical memory. Depending on the
+ virtual memory and file system buffering policies of
+ your system, and the requirements of other
+ applications, the maximum cache size will be some
+ amount smaller than the size of physical memory. If
+ you find that the <a href="../api_reference/C/db_stat.html" class="olink">db_stat</a> utility shows that increasing the
+ cache size improves your hit rate, but performance is
+ not improving (or is getting worse), then it's likely
+ you've hit other system limitations. At this point,
+ you should review the system's swapping/paging
+ activity and limit the size of the cache to the
+ maximum size possible without triggering paging
+ activity. Finally, always remember to make your
+ measurements under conditions as close as possible to
+ the conditions your deployed application will run
+ under, and to test your final choices under worst-case
+ conditions.
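+          <p>
+            As a minimal sketch of the sizing and measurement steps
+            above (the 512MB figure is a placeholder only, and the
+            hit-rate arithmetic mirrors the counters db_stat prints),
+            the cache is sized before DB_ENV-&gt;open() and measured
+            afterward:
+          </p>
+          <pre class="programlisting">#include &lt;db.h&gt;
+#include &lt;stdio.h&gt;
+#include &lt;stdlib.h&gt;
+
+/* Request a 512MB cache in one region; call before DB_ENV-&gt;open(). */
+int
+size_cache(DB_ENV *dbenv)
+{
+    return (dbenv-&gt;set_cachesize(dbenv, 0, 512 * 1024 * 1024, 1));
+}
+
+/* Compute the cache hit rate from the counters db_stat reports. */
+int
+report_hit_rate(DB_ENV *dbenv)
+{
+    DB_MPOOL_STAT *sp;
+    int ret;
+
+    if ((ret = dbenv-&gt;memp_stat(dbenv, &amp;sp, NULL, 0)) != 0)
+        return (ret);
+    if (sp-&gt;st_cache_hit + sp-&gt;st_cache_miss &gt; 0)
+        printf("hit rate: %.2f%%\n",
+            100.0 * sp-&gt;st_cache_hit /
+            (sp-&gt;st_cache_hit + sp-&gt;st_cache_miss));
+    free(sp);
+    return (0);
+}</pre>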
+ </dd>
<dt>
<span class="term">shared memory</span>
</dt>
- <dd>By default, Berkeley DB creates its database environment shared regions in
-filesystem backed memory. Some systems do not distinguish between
-regular filesystem pages and memory-mapped pages backed by the
-filesystem, when selecting dirty pages to be flushed back to disk. For
-this reason, dirtying pages in the Berkeley DB cache may cause intense
-filesystem activity, typically when the filesystem sync thread or
-process is run. In some cases, this can dramatically affect application
-throughput. The workaround to this problem is to create the shared
-regions in system shared memory (<a href="../api_reference/C/envopen.html#envopen_DB_SYSTEM_MEM" class="olink">DB_SYSTEM_MEM</a>) or application
-private memory (<a href="../api_reference/C/envopen.html#envopen_DB_PRIVATE" class="olink">DB_PRIVATE</a>), or, in cases where this behavior
-is configurable, to turn off the operating system's flushing of
-memory-mapped pages.</dd>
+ <dd>
+ By default, Berkeley DB creates its database
+            environment shared regions in filesystem-backed
+            memory. Some systems do not distinguish between
+            regular filesystem pages and memory-mapped pages
+            backed by the filesystem when selecting dirty pages
+ to be flushed back to disk. For this reason, dirtying
+ pages in the Berkeley DB cache may cause intense
+ filesystem activity, typically when the filesystem
+ sync thread or process is run. In some cases, this can
+ dramatically affect application throughput. The
+ workaround to this problem is to create the shared
+ regions in system shared memory (<a href="../api_reference/C/envopen.html#envopen_DB_SYSTEM_MEM" class="olink">DB_SYSTEM_MEM</a>) or
+ application private memory (<a href="../api_reference/C/envopen.html#envopen_DB_PRIVATE" class="olink">DB_PRIVATE</a>), or, in
+ cases where this behavior is configurable, to turn off
+ the operating system's flushing of memory-mapped
+ pages.
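+          <p>
+            A sketch of the first workaround (the base segment ID of
+            664 and the flag set are illustrative; DB_PRIVATE would
+            replace DB_SYSTEM_MEM for application private memory):
+          </p>
+          <pre class="programlisting">#include &lt;db.h&gt;
+
+/*
+ * Place the environment's shared regions in system shared memory
+ * instead of filesystem-backed memory.
+ */
+int
+open_env_sysmem(DB_ENV *dbenv, const char *home)
+{
+    int ret;
+
+    /* DB_SYSTEM_MEM requires a base segment ID for the regions. */
+    if ((ret = dbenv-&gt;set_shm_key(dbenv, 664)) != 0)
+        return (ret);
+    return (dbenv-&gt;open(dbenv, home,
+        DB_CREATE | DB_INIT_MPOOL | DB_INIT_LOCK | DB_SYSTEM_MEM, 0));
+}</pre>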
+ </dd>
<dt>
<span class="term">large key/data items</span>
</dt>
- <dd>Storing large key/data items in a database can alter the performance
-characteristics of Btree, Hash and Recno databases. The first parameter
-to consider is the database page size. When a key/data item is too
-large to be placed on a database page, it is stored on "overflow" pages
-that are maintained outside of the normal database structure (typically,
-items that are larger than one-quarter of the page size are deemed to
-be too large). Accessing these overflow pages requires at least one
-additional page reference over a normal access, so it is usually better
-to increase the page size than to create a database with a large number
-of overflow pages. Use the <a href="../api_reference/C/db_stat.html" class="olink">db_stat</a> utility (or the statistics
-returned by the <a href="../api_reference/C/dbstat.html" class="olink">DB-&gt;stat()</a> method) to review the number of overflow
-pages in the database.
-<p>The second issue is using large key/data items instead of duplicate data
-items. While this can offer performance gains to some applications
-(because it is possible to retrieve several data items in a single get
-call), once the key/data items are large enough to be pushed off-page,
-they will slow the application down. Using duplicate data items is
-usually the better choice in the long run.</p></dd>
+ <dd>
+ Storing large key/data items in a database can
+ alter the performance characteristics of Btree, Hash
+ and Recno databases. The first parameter to consider
+ is the database page size. When a key/data item is too
+ large to be placed on a database page, it is stored on
+ "overflow" pages that are maintained outside of the
+ normal database structure (typically, items that are
+ larger than one-quarter of the page size are deemed to
+ be too large). Accessing these overflow pages requires
+ at least one additional page reference over a normal
+ access, so it is usually better to increase the page
+ size than to create a database with a large number of
+ overflow pages. Use the <a href="../api_reference/C/db_stat.html" class="olink">db_stat</a> utility (or the statistics
+ returned by the <a href="../api_reference/C/dbstat.html" class="olink">DB-&gt;stat()</a> method) to review the number
+ of overflow pages in the database.
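+            <p>
+              A sketch of both steps follows (the 32KB page size is an
+              example only; Berkeley DB page sizes are powers of two
+              between 512 bytes and 64KB):
+            </p>
+            <pre class="programlisting">#include &lt;db.h&gt;
+#include &lt;stdio.h&gt;
+#include &lt;stdlib.h&gt;
+
+/* Raise the page size; must be called before the database is created. */
+int
+set_large_pages(DB *dbp)
+{
+    return (dbp-&gt;set_pagesize(dbp, 32 * 1024));
+}
+
+/* Once the database is open, count its overflow pages (Btree shown). */
+int
+count_overflow(DB *dbp)
+{
+    DB_BTREE_STAT *sp;
+    int ret;
+
+    if ((ret = dbp-&gt;stat(dbp, NULL, &amp;sp, 0)) != 0)
+        return (ret);
+    printf("overflow pages: %lu\n", (unsigned long)sp-&gt;bt_over_pg);
+    free(sp);
+    return (0);
+}</pre>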
+ <p>
+ The second
+ issue is using large key/data items instead of
+ duplicate data items. While this can offer
+ performance gains to some applications (because it
+ is possible to retrieve several data items in a
+ single get call), once the key/data items are
+ large enough to be pushed off-page, they will slow
+ the application down. Using duplicate data items
+ is usually the better choice in the long
+ run.
+ </p></dd>
</dl>
</div>
- <p>A common question when tuning Berkeley DB applications is scalability. For
-example, people will ask why, when adding additional threads or
-processes to an application, the overall database throughput decreases,
-even when all of the operations are read-only queries.</p>
- <p>First, while read-only operations are logically concurrent, they still
-have to acquire mutexes on internal Berkeley DB data structures. For example,
-when searching a linked list and looking for a database page, the linked
-list has to be locked against other threads of control attempting to add
-or remove pages from the linked list. The more threads of control you
-add, the more contention there will be for those shared data structure
-resources.</p>
- <p>Second, once contention starts happening, applications will also start
-to see threads of control convoy behind locks (especially on
-architectures supporting only test-and-set spin mutexes, rather than
-blocking mutexes). On test-and-set architectures, threads of control
-waiting for locks must attempt to acquire the mutex, sleep, check the
-mutex again, and so on. Each failed check of the mutex and subsequent
-sleep wastes CPU and decreases the overall throughput of the system.</p>
- <p>Third, every time a thread acquires a shared mutex, it has to shoot down
-other references to that memory in every other CPU on the system. Many
-modern snoopy cache architectures have slow shoot down characteristics.</p>
- <p>Fourth, schedulers don't care what application-specific mutexes a thread
-of control might hold when de-scheduling a thread. If a thread of
-control is descheduled while holding a shared data structure mutex,
-other threads of control will be blocked until the scheduler decides to
-run the blocking thread of control again. The more threads of control
-that are running, the smaller their quanta of CPU time, and the more
-likely they will be descheduled while holding a Berkeley DB mutex.</p>
- <p>The results of adding new threads of control to an application, on the
-application's throughput, is application and hardware specific and
-almost entirely dependent on the application's data access pattern and
-hardware. In general, using operating systems that support blocking
-mutexes will often make a tremendous difference, and limiting threads
-of control to to some small multiple of the number of CPUs is usually
-the right choice to make.</p>
+ <p>
+ A common question when tuning Berkeley DB applications is
+ scalability. For example, people will ask why, when adding
+ additional threads or processes to an application, the overall
+ database throughput decreases, even when all of the operations
+ are read-only queries.
+ </p>
+ <p>
+ First, while read-only operations are logically concurrent,
+ they still have to acquire mutexes on internal Berkeley DB
+ data structures. For example, when searching a linked list and
+ looking for a database page, the linked list has to be locked
+ against other threads of control attempting to add or remove
+ pages from the linked list. The more threads of control you
+ add, the more contention there will be for those shared data
+ structure resources.
+ </p>
+ <p>
+ Second, once contention starts happening, applications will
+ also start to see threads of control convoy behind locks
+ (especially on architectures supporting only test-and-set spin
+ mutexes, rather than blocking mutexes). On test-and-set
+ architectures, threads of control waiting for locks must
+ attempt to acquire the mutex, sleep, check the mutex again,
+ and so on. Each failed check of the mutex and subsequent sleep
+ wastes CPU and decreases the overall throughput of the
+ system.
+ </p>
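+        <p>
+          On systems that offer only test-and-set mutexes, the number
+          of times a waiting thread spins before sleeping is at least
+          tunable. A sketch (the value 50 is arbitrary and
+          workload-dependent):
+        </p>
+        <pre class="programlisting">#include &lt;db.h&gt;
+
+/*
+ * Spin on a busy mutex 50 times before sleeping; call before
+ * DB_ENV-&gt;open().  The right value is workload-dependent.
+ */
+int
+tune_spins(DB_ENV *dbenv)
+{
+    return (dbenv-&gt;mutex_set_tas_spins(dbenv, 50));
+}</pre>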
+ <p>
+ Third, every time a thread acquires a shared mutex, it has
+ to shoot down other references to that memory in every other
+ CPU on the system. Many modern snoopy cache architectures have
+ slow shoot down characteristics.
+ </p>
+ <p>
+          Fourth, schedulers don't care what application-specific
+          mutexes a thread of control might hold when descheduling a
+          thread. If a thread of control is descheduled while holding a
+ shared data structure mutex, other threads of control will be
+ blocked until the scheduler decides to run the blocking thread
+ of control again. The more threads of control that are
+ running, the smaller their quanta of CPU time, and the more
+ likely they will be descheduled while holding a Berkeley DB
+ mutex.
+ </p>
+        <p>
+          The effect of adding new threads of control on an
+          application's throughput is therefore application and
+          hardware specific, and depends almost entirely on the
+          application's data access pattern. In general, using an
+          operating system that supports blocking mutexes will often
+          make a tremendous difference, and limiting threads of
+          control to some small multiple of the number of CPUs is
+          usually the right choice to make.
+        </p>
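+        <p>
+          A closing sketch of that last rule of thumb (the multiplier
+          of 2 is a starting point to tune, not a figure from this
+          manual, and _SC_NPROCESSORS_ONLN is assumed to be
+          available):
+        </p>
+        <pre class="programlisting">#include &lt;unistd.h&gt;
+
+/* Cap the worker pool at a small multiple of the CPU count. */
+long
+choose_worker_count(void)
+{
+    long ncpu = sysconf(_SC_NPROCESSORS_ONLN);
+
+    return (ncpu &gt; 0 ? 2 * ncpu : 2);
+}</pre>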
</div>
<div class="navfooter">
<hr />
@@ -163,7 +216,8 @@ the right choice to make.</p>
<td width="40%" align="right"> <a accesskey="n" href="am_misc_faq.html">Next</a></td>
</tr>
<tr>
- <td width="40%" align="left" valign="top">Specifying a Berkeley DB schema using SQL DDL </td>
+ <td width="40%" align="left" valign="top">Specifying a Berkeley DB schema
+ using SQL DDL </td>
<td width="20%" align="center">
<a accesskey="h" href="index.html">Home</a>
</td>