| field | value | date |
|---|---|---|
| author | Lorry Tar Creator <lorry-tar-importer@baserock.org> | 2015-02-17 17:25:57 +0000 |
| committer | <> | 2015-03-17 16:26:24 +0000 |
| commit | 780b92ada9afcf1d58085a83a0b9e6bc982203d1 (patch) | |
| tree | 598f8b9fa431b228d29897e798de4ac0c1d3d970 /docs/programmer_reference/am_misc_diskspace.html | |
| parent | 7a2660ba9cc2dc03a69ddfcfd95369395cc87444 (diff) | |
| download | berkeleydb-master.tar.gz | |
Diffstat (limited to 'docs/programmer_reference/am_misc_diskspace.html')
| mode | file | lines changed |
|---|---|---|
| -rw-r--r-- | docs/programmer_reference/am_misc_diskspace.html | 308 |

1 file changed, 187 insertions, 121 deletions
diff --git a/docs/programmer_reference/am_misc_diskspace.html b/docs/programmer_reference/am_misc_diskspace.html
index 8d33bd15..2f565f81 100644
--- a/docs/programmer_reference/am_misc_diskspace.html
+++ b/docs/programmer_reference/am_misc_diskspace.html
@@ -9,23 +9,22 @@
     <link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" />
     <link rel="up" href="am_misc.html" title="Chapter 4. Access Method Wrapup" />
     <link rel="prev" href="am_misc_dbsizes.html" title="Database limits" />
-    <link rel="next" href="am_misc_db_sql.html" title="Specifying a Berkeley DB schema using SQL DDL" />
+    <link rel="next" href="blobs.html" title="BLOB support" />
   </head>
   <body>
     <div xmlns="" class="navheader">
       <div class="libver">
-        <p>Library Version 11.2.5.3</p>
+        <p>Library Version 12.1.6.1</p>
       </div>
       <table width="100%" summary="Navigation header">
         <tr>
-          <th colspan="3" align="center">Disk space requirements</th>
+          <th colspan="3" align="center">Disk space
+          requirements</th>
         </tr>
         <tr>
           <td width="20%" align="left"><a accesskey="p" href="am_misc_dbsizes.html">Prev</a> </td>
-          <th width="60%" align="center">Chapter 4.
-            Access Method Wrapup
-          </th>
-          <td width="20%" align="right"> <a accesskey="n" href="am_misc_db_sql.html">Next</a></td>
+          <th width="60%" align="center">Chapter 4. Access Method Wrapup </th>
+          <td width="20%" align="right"> <a accesskey="n" href="blobs.html">Next</a></td>
         </tr>
       </table>
       <hr />
@@ -34,7 +33,8 @@
       <div class="titlepage">
         <div>
           <div>
-            <h2 class="title" style="clear: both"><a id="am_misc_diskspace"></a>Disk space requirements</h2>
+            <h2 class="title" style="clear: both"><a id="am_misc_diskspace"></a>Disk space
+            requirements</h2>
           </div>
         </div>
       </div>
@@ -42,158 +42,224 @@
       <dl>
         <dt>
           <span class="sect2">
-            <a href="am_misc_diskspace.html#idp1074008">Btree</a>
+            <a href="am_misc_diskspace.html#idp595712">Btree</a>
           </span>
         </dt>
         <dt>
           <span class="sect2">
-            <a href="am_misc_diskspace.html#idp1074072">Hash</a>
+            <a href="am_misc_diskspace.html#idp595776">Hash</a>
           </span>
         </dt>
       </dl>
     </div>
-    <p>It is possible to estimate the total database size based on the size of
-the data.  The following calculations are an estimate of how many bytes
-you will need to hold a set of data and then how many pages it will take
-to actually store it on disk.</p>
-    <p>Space freed by deleting key/data pairs from a Btree or Hash database is
-never returned to the filesystem, although it is reused where possible.
-This means that the Btree and Hash databases are grow-only.  If enough
-keys are deleted from a database that shrinking the underlying file is
-desirable, you should use the <a href="../api_reference/C/dbcompact.html" class="olink">DB->compact()</a> method to reclaim disk space. Alternatively,
-you can create a new database and copy the records from
-the old one into it.</p>
-    <p>These are rough estimates at best.  For example, they do not take into
-account overflow records, filesystem metadata information, large sets
-of duplicate data items (where the key is only stored once), or
-real-life situations where the sizes of key and data items are wildly
-variable, and the page-fill factor changes over time.</p>
+    <p>
+        It is possible to estimate the total database size based on
+        the size of the data. The following calculations are an
+        estimate of how many bytes you will need to hold a set of data
+        and then how many pages it will take to actually store it on
+        disk.
+    </p>
+    <p>
+        Space freed by deleting key/data pairs from a Btree or Hash
+        database is never returned to the filesystem, although it is
+        reused where possible. This means that the Btree and Hash
+        databases are grow-only. If enough keys are deleted from a
+        database that shrinking the underlying file is desirable, you
+        should use the <a href="../api_reference/C/dbcompact.html" class="olink">DB->compact()</a> method to reclaim disk space.
+        Alternatively, you can create a new database and copy the
+        records from the old one into it.
+    </p>
+    <p>
+        These are rough estimates at best. For example, they do not
+        take into account overflow records, filesystem metadata
+        information, large sets of duplicate data items (where the key
+        is only stored once), or real-life situations where the sizes
+        of key and data items are wildly variable, and the page-fill
+        factor changes over time.
+    </p>
     <div class="sect2" lang="en" xml:lang="en">
       <div class="titlepage">
         <div>
           <div>
-            <h3 class="title"><a id="idp1074008"></a>Btree</h3>
+            <h3 class="title"><a id="idp595712"></a>Btree</h3>
           </div>
         </div>
       </div>
-      <p>The formulas for the Btree access method are as follows:</p>
+      <p>
+        The formulas for the Btree access method are as
+        follows:
+      </p>
       <pre class="programlisting">useful-bytes-per-page = (page-size - page-overhead) * page-fill-factor
-<p></p>
+
 bytes-of-data = n-records *
-    (bytes-per-entry + page-overhead-for-two-entries)
-<p></p>
-n-pages-of-data = bytes-of-data / useful-bytes-per-page
-<p></p>
-total-bytes-on-disk = n-pages-of-data * page-size
-</pre>
-      <p>The <span class="bold"><strong>useful-bytes-per-page</strong></span> is a measure of the bytes on each page
-that will actually hold the application data.  It is computed as the total
-number of bytes on the page that are available to hold application data,
-corrected by the percentage of the page that is likely to contain data.
-The reason for this correction is that the percentage of a page that
-contains application data can vary from close to 50% after a page split
-to almost 100% if the entries in the database were inserted in sorted
-order.  Obviously, the <span class="bold"><strong>page-fill-factor</strong></span> can drastically alter
-the amount of disk space required to hold any particular data set.  The
-page-fill factor of any existing database can be displayed using the
-<a href="../api_reference/C/db_stat.html" class="olink">db_stat</a> utility.</p>
-      <p>The page-overhead for Btree databases is 26 bytes.  As an example, using
-an 8K page size, with an 85% page-fill factor, there are 6941 bytes of
-useful space on each page:</p>
+(bytes-per-entry + page-overhead-for-two-entries)
+
+n-pages-of-data = bytes-of-data / useful-bytes-per-page
+
+total-bytes-on-disk = n-pages-of-data * page-size</pre>
+      <p>
+        The <span class="bold"><strong>useful-bytes-per-page</strong></span> is a
+        measure of the bytes on each page that will actually hold the application
+        data. It is computed as the total number of bytes on the
+        page that are available to hold application data,
+        corrected by the percentage of the page that is likely to
+        contain data. The reason for this correction is that the
+        percentage of a page that contains application data can
+        vary from close to 50% after a page split to almost 100%
+        if the entries in the database were inserted in sorted
+        order. Obviously, the <span class="bold"><strong>page-fill-factor</strong></span>
+        can drastically alter the amount of disk space required to hold any
+        particular data set. The page-fill factor of any existing database can be
+        displayed using the <a href="../api_reference/C/db_stat.html" class="olink">db_stat</a> utility.
+      </p>
+      <p>
+        The page-overhead for Btree databases is 26 bytes. As an
+        example, using an 8K page size, with an 85% page-fill
+        factor, there are 6941 bytes of useful space on each
+        page:
+      </p>
       <pre class="programlisting">6941 = (8192 - 26) * .85</pre>
-      <p>The total <span class="bold"><strong>bytes-of-data</strong></span> is an easy calculation: It is the
-number of key or data items plus the overhead required to store each
-item on a page.  The overhead to store a key or data item on a Btree
-page is 5 bytes.  So, it would take 1560000000 bytes, or roughly 1.34GB
-of total data to store 60,000,000 key/data pairs, assuming each key or
-data item was 8 bytes long:</p>
+      <p>
+        The total <span class="bold"><strong>bytes-of-data</strong></span>
+        is an easy calculation: It is the number of key or data
+        items plus the overhead required to store each item on a
+        page. The overhead to store a key or data item on a Btree
+        page is 5 bytes. So, it would take 1560000000 bytes, or
+        roughly 1.34GB of total data to store 60,000,000 key/data
+        pairs, assuming each key or data item was 8 bytes
+        long:
+      </p>
       <pre class="programlisting">1560000000 = 60000000 * ((8 + 5) * 2)</pre>
-      <p>The total pages of data, <span class="bold"><strong>n-pages-of-data</strong></span>, is the
-<span class="bold"><strong>bytes-of-data</strong></span> divided by the <span class="bold"><strong>useful-bytes-per-page</strong></span>.  In
-the example, there are 224751 pages of data.</p>
+      <p>
+        The total pages of data, <span class="bold"><strong>n-pages-of-data</strong></span>,
+        is the <span class="bold"><strong>bytes-of-data</strong></span> divided by the
+        <span class="bold"><strong>useful-bytes-per-page</strong></span>. In the example,
+        there are 224751 pages of data.
+      </p>
       <pre class="programlisting">224751 = 1560000000 / 6941</pre>
-      <p>The total bytes of disk space for the database is <span class="bold"><strong>n-pages-of-data</strong></span>
-multiplied by the <span class="bold"><strong>page-size</strong></span>.  In the example, the result is
-1841160192 bytes, or roughly 1.71GB.</p>
+      <p>
+        The total bytes of disk space for the database is
+        <span class="bold"><strong>n-pages-of-data</strong></span>
+        multiplied by the <span class="bold"><strong>page-size</strong></span>. In the
+        example, the result is 1841160192 bytes, or roughly 1.71GB.
+      </p>
       <pre class="programlisting">1841160192 = 224751 * 8192</pre>
     </div>
     <div class="sect2" lang="en" xml:lang="en">
       <div class="titlepage">
         <div>
           <div>
-            <h3 class="title"><a id="idp1074072"></a>Hash</h3>
+            <h3 class="title"><a id="idp595776"></a>Hash</h3>
          </div>
        </div>
      </div>
-      <p>The formulas for the Hash access method are as follows:</p>
+      <p>
+        The formulas for the Hash access method are as
+        follows:
+      </p>
       <pre class="programlisting">useful-bytes-per-page = (page-size - page-overhead)
-<p></p>
+
 bytes-of-data = n-records *
-    (bytes-per-entry + page-overhead-for-two-entries)
-<p></p>
+(bytes-per-entry + page-overhead-for-two-entries)
+
 n-pages-of-data = bytes-of-data / useful-bytes-per-page
-<p></p>
-total-bytes-on-disk = n-pages-of-data * page-size
-</pre>
-      <p>The <span class="bold"><strong>useful-bytes-per-page</strong></span> is a measure of the bytes on each page
-that will actually hold the application data.  It is computed as the total
-number of bytes on the page that are available to hold application data.
-If the application has explicitly set a page-fill factor, pages will
-not necessarily be kept full.  For databases with a preset fill factor,
-see the calculation below.  The page-overhead for Hash databases is 26
-bytes and the page-overhead-for-two-entries is 6 bytes.</p>
-      <p>As an example, using an 8K page size, there are 8166 bytes of useful space
-on each page:</p>
+
+total-bytes-on-disk = n-pages-of-data * page-size</pre>
+      <p>
+        The <span class="bold"><strong>useful-bytes-per-page</strong></span> is a measure
+        of the bytes on each page that will actually hold the application
+        data. It is computed as the total number of bytes on the
+        page that are available to hold application data. If the
+        application has explicitly set a page-fill factor, pages
+        will not necessarily be kept full. For databases with a
+        preset fill factor, see the calculation below. The
+        page-overhead for Hash databases is 26 bytes and the
+        page-overhead-for-two-entries is 6 bytes.
+      </p>
+      <p>
+        As an example, using an 8K page size, there are 8166
+        bytes of useful space on each page:
+      </p>
       <pre class="programlisting">8166 = (8192 - 26)</pre>
-      <p>The total <span class="bold"><strong>bytes-of-data</strong></span> is an easy calculation: it is the number
-of key/data pairs plus the overhead required to store each pair on a page.
-In this case that's 6 bytes per pair.  So, assuming 60,000,000 key/data
-pairs, each of which is 8 bytes long, there are 1320000000 bytes, or
-roughly 1.23GB of total data:</p>
+      <p>
+        The total <span class="bold"><strong>bytes-of-data</strong></span>
+        is an easy calculation: it is the number of key/data pairs
+        plus the overhead required to store each pair on a page.
+        In this case that's 6 bytes per pair. So, assuming
+        60,000,000 key/data pairs, each of which is 8 bytes long,
+        there are 1320000000 bytes, or roughly 1.23GB of total
+        data:
+      </p>
       <pre class="programlisting">1320000000 = 60000000 * (16 + 6)</pre>
-      <p>The total pages of data, <span class="bold"><strong>n-pages-of-data</strong></span>, is the
-<span class="bold"><strong>bytes-of-data</strong></span> divided by the <span class="bold"><strong>useful-bytes-per-page</strong></span>.  In
-this example, there are 161646 pages of data.</p>
+      <p>
+        The total pages of data, <span class="bold"><strong>n-pages-of-data</strong></span>,
+        is the <span class="bold"><strong>bytes-of-data</strong></span> divided by the
+        <span class="bold"><strong>useful-bytes-per-page</strong></span>. In this example,
+        there are 161646 pages of data.
+      </p>
       <pre class="programlisting">161646 = 1320000000 / 8166</pre>
-      <p>The total bytes of disk space for the database is <span class="bold"><strong>n-pages-of-data</strong></span>
-multiplied by the <span class="bold"><strong>page-size</strong></span>.  In the example, the result is
-1324204032 bytes, or roughly 1.23GB.</p>
+      <p>
+        The total bytes of disk space for the database is
+        <span class="bold"><strong>n-pages-of-data</strong></span>
+        multiplied by the <span class="bold"><strong>page-size</strong></span>. In the
+        example, the result is 1324204032 bytes, or roughly 1.23GB.
+      </p>
       <pre class="programlisting">1324204032 = 161646 * 8192</pre>
-      <p>Now, let's assume that the application specified a fill factor explicitly.
-The fill factor indicates the target number of items to place on a single
-page (a fill factor might reduce the utilization of each page, but it can
-be useful in avoiding splits and preventing buckets from becoming too
-large).  Using our estimates above, each item is 22 bytes (16 + 6), and
-there are 8166 useful bytes on a page (8192 - 26).  That means that, on
-average, you can fit 371 pairs per page.</p>
+      <p>
+        Now, let's assume that the application specified a fill
+        factor explicitly. The fill factor indicates the target
+        number of items to place on a single page (a fill factor
+        might reduce the utilization of each page, but it can be
+        useful in avoiding splits and preventing buckets from
+        becoming too large). Using our estimates above, each item
+        is 22 bytes (16 + 6), and there are 8166 useful bytes on a
+        page (8192 - 26). That means that, on average, you can fit
+        371 pairs per page.
+      </p>
       <pre class="programlisting">371 = 8166 / 22</pre>
-      <p>However, let's assume that the application designer knows that although
-most items are 8 bytes, they can sometimes be as large as 10, and it's
-very important to avoid overflowing buckets and splitting.  Then, the
-application might specify a fill factor of 314.</p>
+      <p>
+        However, let's assume that the application designer
+        knows that although most items are 8 bytes, they can
+        sometimes be as large as 10, and it's very important to
+        avoid overflowing buckets and splitting. Then, the
+        application might specify a fill factor of 314.
+      </p>
       <pre class="programlisting">314 = 8166 / 26</pre>
-      <p>With a fill factor of 314, then the formula for computing database size
-is</p>
+      <p>
+        With a fill factor of 314, then the formula for
+        computing database size is
+      </p>
       <pre class="programlisting">n-pages-of-data = npairs / pairs-per-page</pre>
-      <p>or 191082.</p>
+      <p>
+        or 191082.
+      </p>
       <pre class="programlisting">191082 = 60000000 / 314</pre>
-      <p>At 191082 pages, the total database size would be 1565343744, or 1.46GB.</p>
+      <p>
+        At 191082 pages, the total database size would be
+        1565343744, or 1.46GB.
+      </p>
       <pre class="programlisting">1565343744 = 191082 * 8192</pre>
-      <p>There are a few additional caveats with respect to Hash databases.  This
-discussion assumes that the hash function does a good job of evenly
-distributing keys among hash buckets.  If the function does not do this,
-you may find your table growing significantly larger than you expected.
-Secondly, in order to provide support for Hash databases coexisting with
-other databases in a single file, pages within a Hash database are
-allocated in power-of-two chunks.  That means that a Hash database with 65
-buckets will take up as much space as a Hash database with 128 buckets;
-each time the Hash database grows beyond its current power-of-two number
-of buckets, it allocates space for the next power-of-two buckets.  This
-space may be sparsely allocated in the file system, but the files will
-appear to be their full size.  Finally, because of this need for
-contiguous allocation, overflow pages and duplicate pages can be allocated
-only at specific points in the file, and this too can lead to sparse hash
-tables.</p>
+      <p>
+        There are a few additional caveats with respect to Hash
+        databases. This discussion assumes that the hash function
+        does a good job of evenly distributing keys among hash
+        buckets. If the function does not do this, you may find
+        your table growing significantly larger than you expected.
+        Secondly, in order to provide support for Hash databases
+        coexisting with other databases in a single file, pages
+        within a Hash database are allocated in power-of-two
+        chunks. That means that a Hash database with 65 buckets
+        will take up as much space as a Hash database with 128
+        buckets; each time the Hash database grows beyond its
+        current power-of-two number of buckets, it allocates space
+        for the next power-of-two buckets. This space may be
+        sparsely allocated in the file system, but the files will
+        appear to be their full size. Finally, because of this
+        need for contiguous allocation, overflow pages and
+        duplicate pages can be allocated only at specific points
+        in the file, and this too can lead to sparse hash
+        tables.
+      </p>
     </div>
   </div>
   <div class="navfooter">
@@ -204,14 +270,14 @@ tables.</p>
        <td width="20%" align="center">
          <a accesskey="u" href="am_misc.html">Up</a>
        </td>
-        <td width="40%" align="right"> <a accesskey="n" href="am_misc_db_sql.html">Next</a></td>
+        <td width="40%" align="right"> <a accesskey="n" href="blobs.html">Next</a></td>
      </tr>
      <tr>
        <td width="40%" align="left" valign="top">Database limits </td>
        <td width="20%" align="center">
          <a accesskey="h" href="index.html">Home</a>
        </td>
-        <td width="40%" align="right" valign="top"> Specifying a Berkeley DB schema using SQL DDL</td>
+        <td width="40%" align="right" valign="top"> BLOB support</td>
      </tr>
    </table>
  </div>
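The Btree arithmetic in the revised text above is easy to check mechanically. The following standalone C sketch (not part of Berkeley DB, with illustrative variable names of our own) simply redoes the guide's example calculation: 8K pages, an assumed 85% page-fill factor, and 60,000,000 key/data pairs of 8-byte keys and 8-byte data items.

```c
/* btree_estimate.c -- rough Btree disk-space estimate using the example
 * figures from the guide's text.  This is only the guide's arithmetic,
 * not a Berkeley DB API call. */
#include <stdio.h>

int main(void)
{
    const long long page_size     = 8192;
    const long long page_overhead = 26;     /* Btree page overhead */
    const double    fill_factor   = 0.85;   /* assumed page-fill factor */
    const long long item_overhead = 5;      /* per key or data item */
    const long long item_size     = 8;      /* each key and each data item */
    const long long n_pairs       = 60000000;

    /* useful-bytes-per-page = (page-size - page-overhead) * page-fill-factor */
    long long useful = (long long)((page_size - page_overhead) * fill_factor);

    /* bytes-of-data counts both the key and the data item of every pair */
    long long bytes_of_data = n_pairs * ((item_size + item_overhead) * 2);

    long long n_pages = bytes_of_data / useful;   /* truncating division */
    long long total   = n_pages * page_size;

    printf("useful bytes per page: %lld\n", useful);          /* 6941 */
    printf("bytes of data:         %lld\n", bytes_of_data);   /* 1560000000 */
    printf("pages of data:         %lld\n", n_pages);         /* 224751 */
    printf("total bytes on disk:   %lld\n", total);           /* 1841160192 */
    return 0;
}
```

Compiled with any C compiler, it prints the same four figures the text derives: 6941, 1560000000, 224751, and 1841160192.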
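The Hash estimate without an explicit fill factor can be checked the same way; the only differences are the overheads (26 bytes per page, 6 bytes per pair) and the absence of a fill-factor discount. This is again a sketch with made-up variable names. The guide rounds the page count up to a whole page (161646), so the sketch rounds up as well.

```c
/* hash_estimate.c -- rough Hash disk-space estimate with no explicit fill
 * factor, using the guide's example figures (8K pages, 60M key/data pairs,
 * 8-byte keys and data).  Only the guide's arithmetic, not a BDB API. */
#include <stdio.h>

int main(void)
{
    const long long page_size     = 8192;
    const long long page_overhead = 26;    /* Hash page overhead */
    const long long pair_overhead = 6;     /* page-overhead-for-two-entries */
    const long long pair_size     = 16;    /* 8-byte key + 8-byte data item */
    const long long n_pairs       = 60000000;

    long long useful = page_size - page_overhead;                    /* 8166 */
    long long bytes_of_data = n_pairs * (pair_size + pair_overhead); /* 1320000000 */

    /* Round up to whole pages, matching the guide's figure of 161646. */
    long long n_pages = (bytes_of_data + useful - 1) / useful;
    long long total   = n_pages * page_size;                         /* 1324204032 */

    printf("useful bytes per page: %lld\n", useful);
    printf("bytes of data:         %lld\n", bytes_of_data);
    printf("pages of data:         %lld\n", n_pages);
    printf("total bytes on disk:   %lld\n", total);
    return 0;
}
```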
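For the variant where the application presets a Hash fill factor (in the C API this is configured with the DB->set_h_ffactor() method), the same arithmetic reproduces the guide's figures of 371 pairs per full page, a chosen fill factor of 314, 191082 pages, and roughly 1.46GB. Variable names are again ours, chosen only for this sketch.

```c
/* hash_ffactor_estimate.c -- the Hash estimate when an explicit fill factor
 * is chosen, following the guide's example.  Only the guide's arithmetic. */
#include <stdio.h>

int main(void)
{
    const long long page_size      = 8192;
    const long long useful_bytes   = 8192 - 26;   /* 8166 */
    const long long bytes_per_pair = 16 + 6;      /* pair data + per-pair overhead */
    const long long n_pairs        = 60000000;

    long long pairs_per_page = useful_bytes / bytes_per_pair;   /* 371 */

    /* Conservative fill factor: assume items may grow from 8 to 10 bytes,
     * i.e. 10 + 10 + 6 = 26 bytes per pair, so fewer pairs per page. */
    long long fill_factor = useful_bytes / (20 + 6);             /* 314 */

    long long n_pages = n_pairs / fill_factor;                   /* 191082 */
    long long total   = n_pages * page_size;                     /* 1565343744 */

    printf("pairs per page (full): %lld\n", pairs_per_page);
    printf("chosen fill factor:    %lld\n", fill_factor);
    printf("pages of data:         %lld\n", n_pages);
    printf("total bytes on disk:   %lld\n", total);
    return 0;
}
```

The conservative divisor of 26 bytes per pair corresponds to the text's assumption that items may grow from 8 to 10 bytes each.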