1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
|
<!--$Id: diskspace.so,v 10.9 2000/03/22 21:56:11 bostic Exp $-->
<!--Copyright 1997, 1998, 1999, 2000 by Sleepycat Software, Inc.-->
<!--All rights reserved.-->
<html>
<head>
<title>Berkeley DB Reference Guide: Disk space requirements</title>
<meta name="description" content="Berkeley DB: An embedded database programmatic toolkit.">
<meta name="keywords" content="embedded,database,programmatic,toolkit,b+tree,btree,hash,hashing,transaction,transactions,locking,logging,access method,access methods,java,C,C++">
</head>
<body bgcolor=white>
<a name="2"><!--meow--></a>
<table><tr valign=top>
<td><h3><dl><dt>Berkeley DB Reference Guide:<dd>Programmer Notes</dl></h3></td>
<td width="1%"><a href="../../ref/program/byteorder.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../../ref/toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../../ref/program/compatible.html"><img src="../../images/next.gif" alt="Next"></a>
</td></tr></table>
<p>
<h1 align=center>Disk space requirements</h1>
<p>It is possible to estimate the total database size based on the size of
the data. Simply put, the following calculations attempt to figure out
how many bytes you will need to hold a set of data and then how many pages
it will take to actually store it on disk.
<p>Space freed by deleting key/data pairs from a Btree or Hash database is
never returned to the filesystem, although it is reused where possible.
This means that the Btree and Hash databases are grow-only. If enough
keys are deleted from a database that shrinking the underlying file is
desirable, you should create a new database and insert the records from
the old one into it.
<p>These are rough estimates at best. For example, they do not take into
account overflow records, filesystem metadata information, or real-life
situations where the sizes of key and data items are wildly variable, and
the page-fill factor changes over time.
<h3>Btree</h3>
<p>The formulas for the Btree access method are as follows:
<p><blockquote><pre>useful-bytes-per-page = (page-size - page-overhead) * page-fill-factor
<p>
bytes-of-data = n-records *
(bytes-per-entry + page-overhead-for-two-entries)
<p>
n-pages-of-data = bytes-of-data / bytes-per-page
<p>
total-pages-on-disk = n-pages-of-data * page-size
</pre></blockquote>
<p>The <b>useful-bytes-per-page</b> is a measure of the bytes on each page
that will actually hold the application data. It is computed as the total
number of bytes on the page that are available to hold application data,
corrected by the percentage of the page that is likely to contain data.
The reason for this correction is that the percentage of a page that
contains application data can vary from close to 50% after a page split,
to almost 100% if the entries in the database were inserted in sorted
order. Obviously, the <b>page-fill-factor</b> can drastically alter
the amount of disk space required to hold any particular data set. The
page-fill factor of any existing database can be displayed using the
<a href="../../utility/db_stat.html">db_stat</a> utility.
<p>As an example, using an 8K page size, with an 85% page-fill factor, there
are 6941 bytes of useful space on each page:
<p><blockquote><pre>6941 = (8192 - 26) * .85</pre></blockquote>
<p>The total <b>bytes-of-data</b> is an easy calculation: it is the number
of key/data pairs plus the overhead required to store each pair on a page.
The overhead to store a single item on a Btree page is 5 bytes. So,
assuming 60,000,000 key/data pairs, each of which is 8 bytes long, there
are 1440000000 bytes, or roughly 1.34GB, of total data:
<p><blockquote><pre>1560000000 = 60000000 * ((8 * 2) + (5 * 2))</pre></blockquote>
<p>The total pages of data, <b>n-pages-of-data</b>, is the
<b>bytes-of-data</b> divided by the <b>useful-bytes-per-page</b>. In
the example, there are 224751 pages of data.
<p><blockquote><pre>224751 = 1560000000 / 6941</pre></blockquote>
<p>The total bytes of disk space for the database is <b>n-pages-of-data</b>
multiplied by the <b>page-size</b>. In the example, the result is
1841160192 bytes, or roughly 1.71GB.
<p><blockquote><pre>1841160192 = 224751 * 8192</pre></blockquote>
<h3>Hash</h3>
<p>The formulas for the Hash access method are as follows:
<p><blockquote><pre>useful-bytes-per-page = (page-size - page-overhead)
<p>
bytes-of-data = n-records *
(bytes-per-entry + page-overhead-for-two-entries)
<p>
n-pages-of-data = bytes-of-data / bytes-per-page
<p>
total-pages-on-disk = n-pages-of-data * page-size
</pre></blockquote>
<p>The <b>useful-bytes-per-page</b> is a measure of the bytes on each page
that will actually hold the application data. It is computed as the total
number of bytes on the page that are available to hold application data.
If the application has explicitly set a page fill factor, then pages will
not necessarily be kept full. For databases with a preset fill factor,
see the calculation below. The page-overhead for Hash databases is 26
bytes and the page-overhead-for-two-entries is 6 bytes.
<p>As an example, using an 8K page size, there are 8166 bytes of useful space
on each page:
<p><blockquote><pre>8166 = (8192 - 26)</pre></blockquote>
<p>The total <b>bytes-of-data</b> is an easy calculation: it is the number
of key/data pairs plus the overhead required to store each pair on a page.
In this case that's 6 bytes per pair. So, assuming 60,000,000 key/data
pairs, each of which is 8 bytes long, there are 1320000000 bytes, or
roughly 1.23GB, of total data:
<p><blockquote><pre>1320000000 = 60000000 * ((16 + 6))</pre></blockquote>
<p>The total pages of data, <b>n-pages-of-data</b>, is the
<b>bytes-of-data</b> divided by the <b>useful-bytes-per-page</b>. In
this example, there are 161646 pages of data.
<p><blockquote><pre>161646 = 1320000000 / 8166</pre></blockquote>
<p>The total bytes of disk space for the database is <b>n-pages-of-data</b>
multiplied by the <b>page-size</b>. In the example, the result is
1324204032 bytes, or roughly 1.23GB.
<p><blockquote><pre>1324204032 = 161646 * 8192</pre></blockquote>
<p>Now, let's assume that the application specified a fill factor explicitly.
The fill factor indicates the target number of items to place on a single
page (a fill factor might reduce the utilization of each page, but it can
be useful in avoiding splits and preventing buckets from becoming too
large. Using our estimates above, each item is 22 bytes (16 + 6) and
there are 8166 useful bytes on a page (8192 - 26). That means that, on
average, you can fit 371 pairs per page.
<p><blockquote><pre>371 = 8166 / 22</pre></blockquote>
<p>However, let's assume that the application designer knows that while most
items are 8 bytes, they can sometimes be as large as 10 and it's very
important to avoid overflowing buckets and splitting. Then, the
application might specify a fill factor of 314.
<p><blockquote><pre>314 = 8166 / 26</pre></blockquote>
<p>With a fill factor of 314, then the formula for computing database size
is:
<p><blockquote><pre>npages = npairs / pairs-per-page</pre></blockquote>
<p>or 191082.
<p><blockquote><pre>191082 = 60000000 / 314</pre></blockquote>
<p>At 191082 pages, the total database size would be 1565343744 or 1.46GB.
<p><blockquote><pre>1565343744 = 191082 * 8192 </pre></blockquote>
<p>There are a few additional caveats with respect to Hash databases. This
discussion assumes that the hash function does a good job of evenly
distributing keys among hash buckets. If the function does not do this,
you may find your table growing significantly larger than you expected.
Secondly, in order to provide support for Hash databases co-existing with
other databases in a single file, pages within a Hash database are
allocated in power-of-2 chunks. That means that a Hash database with 65
buckets will take up as much space as a Hash database with 128 buckets;
each time the Hash database grows beyond its current power-of-two number
of buckets, it allocates space for the next power-of-two buckets. This
space may be sparsely allocated in the file system, but the files will
appear to be their full size. Finally, because of this need for
contiguous allocation, overflow pages and duplicate pages can be allocated
only at specific points in the file, and this too can lead to sparse hash
tables.
<table><tr><td><br></td><td width="1%"><a href="../../ref/program/byteorder.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../../ref/toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../../ref/program/compatible.html"><img src="../../images/next.gif" alt="Next"></a>
</td></tr></table>
<p><font size=1><a href="http://www.sleepycat.com">Copyright Sleepycat Software</a></font>
</body>
</html>
|