path: root/docs/programmer_reference/am_misc_diskspace.html
author     Lorry Tar Creator <lorry-tar-importer@baserock.org>    2015-02-17 17:25:57 +0000
committer  <>                                                     2015-03-17 16:26:24 +0000
commit     780b92ada9afcf1d58085a83a0b9e6bc982203d1 (patch)
tree       598f8b9fa431b228d29897e798de4ac0c1d3d970  /docs/programmer_reference/am_misc_diskspace.html
parent     7a2660ba9cc2dc03a69ddfcfd95369395cc87444 (diff)
download   berkeleydb-master.tar.gz
Imported from /home/lorry/working-area/delta_berkeleydb/db-6.1.23.tar.gz (HEAD, db-6.1.23, master)
Diffstat (limited to 'docs/programmer_reference/am_misc_diskspace.html')
-rw-r--r--    docs/programmer_reference/am_misc_diskspace.html    308
1 file changed, 187 insertions, 121 deletions
diff --git a/docs/programmer_reference/am_misc_diskspace.html b/docs/programmer_reference/am_misc_diskspace.html
index 8d33bd15..2f565f81 100644
--- a/docs/programmer_reference/am_misc_diskspace.html
+++ b/docs/programmer_reference/am_misc_diskspace.html
@@ -9,23 +9,22 @@
<link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" />
<link rel="up" href="am_misc.html" title="Chapter 4.  Access Method Wrapup" />
<link rel="prev" href="am_misc_dbsizes.html" title="Database limits" />
- <link rel="next" href="am_misc_db_sql.html" title="Specifying a Berkeley DB schema using SQL DDL" />
+ <link rel="next" href="blobs.html" title="BLOB support" />
</head>
<body>
<div xmlns="" class="navheader">
<div class="libver">
- <p>Library Version 11.2.5.3</p>
+ <p>Library Version 12.1.6.1</p>
</div>
<table width="100%" summary="Navigation header">
<tr>
- <th colspan="3" align="center">Disk space requirements</th>
+ <th colspan="3" align="center">Disk space
+ requirements</th>
</tr>
<tr>
<td width="20%" align="left"><a accesskey="p" href="am_misc_dbsizes.html">Prev</a> </td>
- <th width="60%" align="center">Chapter 4. 
- Access Method Wrapup
- </th>
- <td width="20%" align="right"> <a accesskey="n" href="am_misc_db_sql.html">Next</a></td>
+ <th width="60%" align="center">Chapter 4.  Access Method Wrapup </th>
+ <td width="20%" align="right"> <a accesskey="n" href="blobs.html">Next</a></td>
</tr>
</table>
<hr />
@@ -34,7 +33,8 @@
<div class="titlepage">
<div>
<div>
- <h2 class="title" style="clear: both"><a id="am_misc_diskspace"></a>Disk space requirements</h2>
+ <h2 class="title" style="clear: both"><a id="am_misc_diskspace"></a>Disk space
+ requirements</h2>
</div>
</div>
</div>
@@ -42,158 +42,224 @@
<dl>
<dt>
<span class="sect2">
- <a href="am_misc_diskspace.html#idp1074008">Btree</a>
+ <a href="am_misc_diskspace.html#idp595712">Btree</a>
</span>
</dt>
<dt>
<span class="sect2">
- <a href="am_misc_diskspace.html#idp1074072">Hash</a>
+ <a href="am_misc_diskspace.html#idp595776">Hash</a>
</span>
</dt>
</dl>
</div>
- <p>It is possible to estimate the total database size based on the size of
-the data. The following calculations are an estimate of how many bytes
-you will need to hold a set of data and then how many pages it will take
-to actually store it on disk.</p>
- <p>Space freed by deleting key/data pairs from a Btree or Hash database is
-never returned to the filesystem, although it is reused where possible.
-This means that the Btree and Hash databases are grow-only. If enough
-keys are deleted from a database that shrinking the underlying file is
-desirable, you should use the <a href="../api_reference/C/dbcompact.html" class="olink">DB-&gt;compact()</a> method to reclaim disk space. Alternatively,
-you can create a new database and copy the records from
-the old one into it.</p>
- <p>These are rough estimates at best. For example, they do not take into
-account overflow records, filesystem metadata information, large sets
-of duplicate data items (where the key is only stored once), or
-real-life situations where the sizes of key and data items are wildly
-variable, and the page-fill factor changes over time.</p>
+ <p>
+ It is possible to estimate the total database size based on
+ the size of the data. The following calculations are an
+ estimate of how many bytes you will need to hold a set of data
+ and then how many pages it will take to actually store it on
+ disk.
+ </p>
+ <p>
+ Space freed by deleting key/data pairs from a Btree or Hash
+ database is never returned to the filesystem, although it is
+ reused where possible. This means that the Btree and Hash
+ databases are grow-only. If enough keys are deleted from a
+ database that shrinking the underlying file is desirable, you
+ should use the <a href="../api_reference/C/dbcompact.html" class="olink">DB-&gt;compact()</a> method to reclaim disk space.
+ Alternatively, you can create a new database and copy the
+ records from the old one into it.
+ </p>
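                <p>
                    As a minimal sketch (illustrative only, not an excerpt from the
                    Berkeley DB distribution), the reclaim call might look as follows
                    in C. The handle dbp is assumed to be an already-open Btree or
                    Hash database, the transaction handle is left NULL, and real code
                    would check the return value:
                </p>
                <pre class="programlisting">#include &lt;string.h&gt;
#include &lt;db.h&gt;

/*
 * Illustrative sketch: compact an open database and return emptied
 * pages at the end of the file to the filesystem.
 */
int
reclaim_space(DB *dbp)
{
    DB_COMPACT c_data;

    memset(&amp;c_data, 0, sizeof(c_data));
    /*
     * DB_FREE_SPACE asks DB-&gt;compact() to truncate the file where
     * possible, rather than only moving freed pages to the
     * database's internal free list for reuse.
     */
    return (dbp-&gt;compact(dbp, NULL, NULL, NULL, &amp;c_data, DB_FREE_SPACE, NULL));
}</pre>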
+ <p>
+ These are rough estimates at best. For example, they do not
+ take into account overflow records, filesystem metadata
+ information, large sets of duplicate data items (where the key
+ is only stored once), or real-life situations where the sizes
+ of key and data items are wildly variable, and the page-fill
+ factor changes over time.
+ </p>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
- <h3 class="title"><a id="idp1074008"></a>Btree</h3>
+ <h3 class="title"><a id="idp595712"></a>Btree</h3>
</div>
</div>
</div>
- <p>The formulas for the Btree access method are as follows:</p>
+ <p>
+ The formulas for the Btree access method are as
+ follows:
+ </p>
<pre class="programlisting">useful-bytes-per-page = (page-size - page-overhead) * page-fill-factor
-<p></p>
+
bytes-of-data = n-records *
- (bytes-per-entry + page-overhead-for-two-entries)
-<p></p>
-n-pages-of-data = bytes-of-data / useful-bytes-per-page
-<p></p>
-total-bytes-on-disk = n-pages-of-data * page-size
-</pre>
- <p>The <span class="bold"><strong>useful-bytes-per-page</strong></span> is a measure of the bytes on each page
-that will actually hold the application data. It is computed as the total
-number of bytes on the page that are available to hold application data,
-corrected by the percentage of the page that is likely to contain data.
-The reason for this correction is that the percentage of a page that
-contains application data can vary from close to 50% after a page split
-to almost 100% if the entries in the database were inserted in sorted
-order. Obviously, the <span class="bold"><strong>page-fill-factor</strong></span> can drastically alter
-the amount of disk space required to hold any particular data set. The
-page-fill factor of any existing database can be displayed using the
-<a href="../api_reference/C/db_stat.html" class="olink">db_stat</a> utility.</p>
- <p>The page-overhead for Btree databases is 26 bytes. As an example, using
-an 8K page size, with an 85% page-fill factor, there are 6941 bytes of
-useful space on each page:</p>
+(bytes-per-entry + page-overhead-for-two-entries)
+
+n-pages-of-data = bytes-of-data / useful-bytes-per-page
+
+total-bytes-on-disk = n-pages-of-data * page-size</pre>
+ <p>
+ The <span class="bold"><strong>useful-bytes-per-page</strong></span> is a
+ measure of the bytes on each page that will actually hold the application
+ data. It is computed as the total number of bytes on the
+ page that are available to hold application data,
+ corrected by the percentage of the page that is likely to
+ contain data. The reason for this correction is that the
+ percentage of a page that contains application data can
+ vary from close to 50% after a page split to almost 100%
+ if the entries in the database were inserted in sorted
+ order. Obviously, the <span class="bold"><strong>page-fill-factor</strong></span>
+ can drastically alter the amount of disk space required to hold any
+ particular data set. The page-fill factor of any existing database can be
+ displayed using the <a href="../api_reference/C/db_stat.html" class="olink">db_stat</a> utility.
+ </p>
+ <p>
+ The page-overhead for Btree databases is 26 bytes. As an
+ example, using an 8K page size, with an 85% page-fill
+ factor, there are 6941 bytes of useful space on each
+ page:
+ </p>
<pre class="programlisting">6941 = (8192 - 26) * .85</pre>
- <p>The total <span class="bold"><strong>bytes-of-data</strong></span> is an easy calculation: It is the
-number of key or data items plus the overhead required to store each
-item on a page. The overhead to store a key or data item on a Btree
-page is 5 bytes. So, it would take 1560000000 bytes, or roughly 1.34GB
-of total data to store 60,000,000 key/data pairs, assuming each key or
-data item was 8 bytes long:</p>
+ <p>
+ The total <span class="bold"><strong>bytes-of-data</strong></span>
+ is an easy calculation: It is the number of key or data
+ items plus the overhead required to store each item on a
+ page. The overhead to store a key or data item on a Btree
+ page is 5 bytes. So, it would take 1560000000 bytes, or
+                    roughly 1.45GB of total data to store 60,000,000 key/data
+ pairs, assuming each key or data item was 8 bytes
+ long:
+ </p>
<pre class="programlisting">1560000000 = 60000000 * ((8 + 5) * 2)</pre>
- <p>The total pages of data, <span class="bold"><strong>n-pages-of-data</strong></span>, is the
-<span class="bold"><strong>bytes-of-data</strong></span> divided by the <span class="bold"><strong>useful-bytes-per-page</strong></span>. In
-the example, there are 224751 pages of data.</p>
+ <p>
+ The total pages of data, <span class="bold"><strong>n-pages-of-data</strong></span>,
+ is the <span class="bold"><strong>bytes-of-data</strong></span> divided by the
+ <span class="bold"><strong>useful-bytes-per-page</strong></span>. In the example,
+ there are 224751 pages of data.
+ </p>
<pre class="programlisting">224751 = 1560000000 / 6941</pre>
- <p>The total bytes of disk space for the database is <span class="bold"><strong>n-pages-of-data</strong></span>
-multiplied by the <span class="bold"><strong>page-size</strong></span>. In the example, the result is
-1841160192 bytes, or roughly 1.71GB.</p>
+ <p>
+ The total bytes of disk space for the database is
+ <span class="bold"><strong>n-pages-of-data</strong></span>
+ multiplied by the <span class="bold"><strong>page-size</strong></span>. In the
+ example, the result is 1841160192 bytes, or roughly 1.71GB.
+ </p>
<pre class="programlisting">1841160192 = 224751 * 8192</pre>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
- <h3 class="title"><a id="idp1074072"></a>Hash</h3>
+ <h3 class="title"><a id="idp595776"></a>Hash</h3>
</div>
</div>
</div>
- <p>The formulas for the Hash access method are as follows:</p>
+ <p>
+ The formulas for the Hash access method are as
+ follows:
+ </p>
<pre class="programlisting">useful-bytes-per-page = (page-size - page-overhead)
-<p></p>
+
bytes-of-data = n-records *
- (bytes-per-entry + page-overhead-for-two-entries)
-<p></p>
+(bytes-per-entry + page-overhead-for-two-entries)
+
n-pages-of-data = bytes-of-data / useful-bytes-per-page
-<p></p>
-total-bytes-on-disk = n-pages-of-data * page-size
-</pre>
- <p>The <span class="bold"><strong>useful-bytes-per-page</strong></span> is a measure of the bytes on each page
-that will actually hold the application data. It is computed as the total
-number of bytes on the page that are available to hold application data.
-If the application has explicitly set a page-fill factor, pages will
-not necessarily be kept full. For databases with a preset fill factor,
-see the calculation below. The page-overhead for Hash databases is 26
-bytes and the page-overhead-for-two-entries is 6 bytes.</p>
- <p>As an example, using an 8K page size, there are 8166 bytes of useful space
-on each page:</p>
+
+total-bytes-on-disk = n-pages-of-data * page-size</pre>
+ <p>
+ The <span class="bold"><strong>useful-bytes-per-page</strong></span> is a measure
+ of the bytes on each page that will actually hold the application
+ data. It is computed as the total number of bytes on the
+ page that are available to hold application data. If the
+ application has explicitly set a page-fill factor, pages
+ will not necessarily be kept full. For databases with a
+ preset fill factor, see the calculation below. The
+ page-overhead for Hash databases is 26 bytes and the
+ page-overhead-for-two-entries is 6 bytes.
+ </p>
+ <p>
+ As an example, using an 8K page size, there are 8166
+ bytes of useful space on each page:
+ </p>
<pre class="programlisting">8166 = (8192 - 26)</pre>
- <p>The total <span class="bold"><strong>bytes-of-data</strong></span> is an easy calculation: it is the number
-of key/data pairs plus the overhead required to store each pair on a page.
-In this case that's 6 bytes per pair. So, assuming 60,000,000 key/data
-pairs, each of which is 8 bytes long, there are 1320000000 bytes, or
-roughly 1.23GB of total data:</p>
+ <p>
+ The total <span class="bold"><strong>bytes-of-data</strong></span>
+ is an easy calculation: it is the number of key/data pairs
+ plus the overhead required to store each pair on a page.
+ In this case that's 6 bytes per pair. So, assuming
+ 60,000,000 key/data pairs, each of which is 8 bytes long,
+ there are 1320000000 bytes, or roughly 1.23GB of total
+ data:
+ </p>
<pre class="programlisting">1320000000 = 60000000 * (16 + 6)</pre>
- <p>The total pages of data, <span class="bold"><strong>n-pages-of-data</strong></span>, is the
-<span class="bold"><strong>bytes-of-data</strong></span> divided by the <span class="bold"><strong>useful-bytes-per-page</strong></span>. In
-this example, there are 161646 pages of data.</p>
+ <p>
+ The total pages of data, <span class="bold"><strong>n-pages-of-data</strong></span>,
+ is the <span class="bold"><strong>bytes-of-data</strong></span> divided by the
+ <span class="bold"><strong>useful-bytes-per-page</strong></span>. In this example,
+ there are 161646 pages of data.
+ </p>
<pre class="programlisting">161646 = 1320000000 / 8166</pre>
- <p>The total bytes of disk space for the database is <span class="bold"><strong>n-pages-of-data</strong></span>
-multiplied by the <span class="bold"><strong>page-size</strong></span>. In the example, the result is
-1324204032 bytes, or roughly 1.23GB.</p>
+ <p>
+ The total bytes of disk space for the database is
+ <span class="bold"><strong>n-pages-of-data</strong></span>
+ multiplied by the <span class="bold"><strong>page-size</strong></span>. In the
+ example, the result is 1324204032 bytes, or roughly 1.23GB.
+ </p>
<pre class="programlisting">1324204032 = 161646 * 8192</pre>
- <p>Now, let's assume that the application specified a fill factor explicitly.
-The fill factor indicates the target number of items to place on a single
-page (a fill factor might reduce the utilization of each page, but it can
-be useful in avoiding splits and preventing buckets from becoming too
-large). Using our estimates above, each item is 22 bytes (16 + 6), and
-there are 8166 useful bytes on a page (8192 - 26). That means that, on
-average, you can fit 371 pairs per page.</p>
+ <p>
+ Now, let's assume that the application specified a fill
+ factor explicitly. The fill factor indicates the target
+ number of items to place on a single page (a fill factor
+ might reduce the utilization of each page, but it can be
+ useful in avoiding splits and preventing buckets from
+ becoming too large). Using our estimates above, each item
+ is 22 bytes (16 + 6), and there are 8166 useful bytes on a
+ page (8192 - 26). That means that, on average, you can fit
+ 371 pairs per page.
+ </p>
<pre class="programlisting">371 = 8166 / 22</pre>
- <p>However, let's assume that the application designer knows that although
-most items are 8 bytes, they can sometimes be as large as 10, and it's
-very important to avoid overflowing buckets and splitting. Then, the
-application might specify a fill factor of 314.</p>
+ <p>
+ However, let's assume that the application designer
+ knows that although most items are 8 bytes, they can
+ sometimes be as large as 10, and it's very important to
+ avoid overflowing buckets and splitting. Then, the
+ application might specify a fill factor of 314.
+ </p>
<pre class="programlisting">314 = 8166 / 26</pre>
- <p>With a fill factor of 314, then the formula for computing database size
-is</p>
+ <p>
+                    With a fill factor of 314, the formula for
+ computing database size is
+ </p>
<pre class="programlisting">n-pages-of-data = npairs / pairs-per-page</pre>
- <p>or 191082.</p>
+ <p>
+ or 191082.
+ </p>
<pre class="programlisting">191082 = 60000000 / 314</pre>
- <p>At 191082 pages, the total database size would be 1565343744, or 1.46GB.</p>
+ <p>
+ At 191082 pages, the total database size would be
+ 1565343744, or 1.46GB.
+ </p>
<pre class="programlisting">1565343744 = 191082 * 8192</pre>
- <p>There are a few additional caveats with respect to Hash databases. This
-discussion assumes that the hash function does a good job of evenly
-distributing keys among hash buckets. If the function does not do this,
-you may find your table growing significantly larger than you expected.
-Secondly, in order to provide support for Hash databases coexisting with
-other databases in a single file, pages within a Hash database are
-allocated in power-of-two chunks. That means that a Hash database with 65
-buckets will take up as much space as a Hash database with 128 buckets;
-each time the Hash database grows beyond its current power-of-two number
-of buckets, it allocates space for the next power-of-two buckets. This
-space may be sparsely allocated in the file system, but the files will
-appear to be their full size. Finally, because of this need for
-contiguous allocation, overflow pages and duplicate pages can be allocated
-only at specific points in the file, and this too can lead to sparse hash
-tables.</p>
+ <p>
+ There are a few additional caveats with respect to Hash
+ databases. This discussion assumes that the hash function
+ does a good job of evenly distributing keys among hash
+ buckets. If the function does not do this, you may find
+ your table growing significantly larger than you expected.
+ Secondly, in order to provide support for Hash databases
+ coexisting with other databases in a single file, pages
+ within a Hash database are allocated in power-of-two
+ chunks. That means that a Hash database with 65 buckets
+ will take up as much space as a Hash database with 128
+ buckets; each time the Hash database grows beyond its
+ current power-of-two number of buckets, it allocates space
+ for the next power-of-two buckets. This space may be
+ sparsely allocated in the file system, but the files will
+ appear to be their full size. Finally, because of this
+ need for contiguous allocation, overflow pages and
+ duplicate pages can be allocated only at specific points
+ in the file, and this too can lead to sparse hash
+ tables.
+ </p>
</div>
</div>
<div class="navfooter">
@@ -204,14 +270,14 @@ tables.</p>
<td width="20%" align="center">
<a accesskey="u" href="am_misc.html">Up</a>
</td>
- <td width="40%" align="right"> <a accesskey="n" href="am_misc_db_sql.html">Next</a></td>
+ <td width="40%" align="right"> <a accesskey="n" href="blobs.html">Next</a></td>
</tr>
<tr>
<td width="40%" align="left" valign="top">Database limits </td>
<td width="20%" align="center">
<a accesskey="h" href="index.html">Home</a>
</td>
- <td width="40%" align="right" valign="top"> Specifying a Berkeley DB schema using SQL DDL</td>
+ <td width="40%" align="right" valign="top"> BLOB support</td>
</tr>
</table>
</div>