summaryrefslogtreecommitdiff
path: root/docs/programmer_reference/am_partition.html
blob: 9bc8879ddfe0070d3ca842b65002b6869eb46f69 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>Partitioning databases</title>
    <link rel="stylesheet" href="gettingStarted.css" type="text/css" />
    <meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
    <link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" />
    <link rel="up" href="am.html" title="Chapter 3.  Access Method Operations" />
    <link rel="prev" href="am_opensub.html" title="Opening multiple databases in a single file" />
    <link rel="next" href="am_get.html" title="Retrieving records" />
  </head>
  <body>
    <div xmlns="" class="navheader">
      <div class="libver">
        <p>Library Version 12.1.6.1</p>
      </div>
      <table width="100%" summary="Navigation header">
        <tr>
          <th colspan="3" align="center">Partitioning databases</th>
        </tr>
        <tr>
          <td width="20%" align="left"><a accesskey="p" href="am_opensub.html">Prev</a> </td>
          <th width="60%" align="center">Chapter 3.  Access Method Operations </th>
          <td width="20%" align="right"> <a accesskey="n" href="am_get.html">Next</a></td>
        </tr>
      </table>
      <hr />
    </div>
    <div class="sect1" lang="en" xml:lang="en">
      <div class="titlepage">
        <div>
          <div>
            <h2 class="title" style="clear: both"><a id="am_partition"></a>Partitioning databases</h2>
          </div>
        </div>
      </div>
      <div class="toc">
        <dl>
          <dt>
            <span class="sect2">
              <a href="am_partition.html#am_partition_keys">Specifying partition
            keys</a>
            </span>
          </dt>
          <dt>
            <span class="sect2">
              <a href="am_partition.html#am_partition_function">Partitioning
            callback</a>
            </span>
          </dt>
          <dt>
            <span class="sect2">
              <a href="am_partition.html#partition_file_placement">Placing partition
            files</a>
            </span>
          </dt>
        </dl>
      </div>
      <p> 
        You can improve concurrency on your database reads and
        writes by splitting access to a single database into multiple
        databases. This helps to avoid contention for internal
        database pages, as well as allowing you to spread your
        databases across multiple disks, which can help to improve
        disk I/O. 
    </p>
      <p> 
        While you can manually do this by creating and using more
        than one database for your data, DB is capable of
        partitioning your database for you. When you use DB's
        built-in database partitioning feature, your access to your
        data is performed in exactly the same way as if you were only
        using one database; all the work of knowing which database to
        use to access a particular record is handled for you under the
        hood. 
    </p>
      <p> 
        Only the BTree and Hash access methods are supported for
        partitioned databases. 
    </p>
      <p> 
        You indicate that you want your database to be partitioned
        by calling <a href="../api_reference/C/dbset_partition.html" class="olink">DB-&gt;set_partition()</a> before opening your database the
        first time. You can indicate the directory in which each
        partition is contained using the <a href="../api_reference/C/dbset_partition_dirs.html" class="olink">DB-&gt;set_partition_dirs()</a>
        method. 
    </p>
      <p> 
        Once you have partitioned a database, you cannot change
        your partitioning scheme.
    </p>
      <p> 
        There are two ways to indicate what key/data pairs should
        go on which partition. The first is by specifying an array of
        <a href="../api_reference/C/dbt.html" class="olink">DBT</a>s that indicate the minimum key value for a given
        partition. The second is by providing a callback that returns
        the number of the partition on which a specified key is
        placed. 
    </p>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="am_partition_keys"></a>Specifying partition
            keys</h3>
            </div>
          </div>
        </div>
        <p>
            For simple cases, you can partition your database by
            providing an array of <a href="../api_reference/C/dbt.html" class="olink">DBT</a>s, each element of which
            provides the minimum key value to be placed on a
            partition. There must be one fewer elements in this array
            than you have partitions. The first element of the array
            indicates the minimum key value for the second partition
            in your database. Key values that are less than the first
            key value provided in this array are placed on the first
            partition (partition 0).
        </p>
        <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
          <h3 class="title">Note</h3>
          <p> 
                You can use partition keys only if you are using
                the Btree access method. 
            </p>
        </div>
        <p>
            For example, suppose you had a database of fruit, and
            you want three partitions for your database. Then you need
            a <a href="../api_reference/C/dbt.html" class="olink">DBT</a> array of size two. The first element in this array
            indicates the minimum keys that should be placed on
            partition 1. The second element in this array indicates
            the minimum key value placed on partition 2. Keys that
            compare less than the first <a href="../api_reference/C/dbt.html" class="olink">DBT</a> in the array are placed
            on partition 0. 
        </p>
        <p> 
            All comparisons are performed according to the
            lexicographic comparison used by your platform.
        </p>
        <p> 
            For example, suppose you want all fruits whose names
            begin with:
        </p>
        <div class="itemizedlist">
          <ul type="disc">
            <li>
              <p> 'a' - 'f' to go on partition 0 </p>
            </li>
            <li>
              <p> 'g' - 'p' to go on partition 1 </p>
            </li>
            <li>
              <p> 'q' - 'z' to go on partition 2. </p>
            </li>
          </ul>
        </div>
        <p> 
            Then you would accomplish this with the following code
            fragment: 
        </p>
        <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
          <h3 class="title">Note</h3>
          <p> 
                The <a href="../api_reference/C/dbset_partition.html" class="olink">DB-&gt;set_partition()</a> partition callback parameter
                must be <code class="literal">NULL</code> if you are using an
                array of <a href="../api_reference/C/dbt.html" class="olink">DBT</a>s to partition your database.
            </p>
        </div>
        <a id="prog_am10"></a>
        <pre class="programlisting">DB *dbp = NULL;
    DB_ENV *envp = NULL;
    DBT partKeys[2];
    u_int32_t db_flags;
    const char *file_name = "mydb.db";
    int ret;

...
    
    /* Skipping environment open to shorten this example */


    /* Initialize the DB handle */
    ret = db_create(&amp;dbp, envp, 0);
    if (ret != 0) {
        fprintf(stderr, "%s\n", db_strerror(ret));
        return (EXIT_FAILURE);
    }

    /* Setup the partition keys */
     memset(&amp;partKeys[0], 0, sizeof(DBT));
     partKeys[0].data = "g";
     partKeys[0].size = sizeof("g") - 1;

     memset(&amp;partKeys[1], 0, sizeof(DBT));
     partKeys[1].data = "q";
     partKeys[1].size = sizeof("q") - 1;

     dbp-&gt;set_partition(dbp, 3, partKeys, NULL);

    /* Now open the database */
    db_flags = DB_CREATE;       /* Allow database creation */

    ret = dbp-&gt;open(dbp,        /* Pointer to the database */
                    NULL,       /* Txn pointer */
                    file_name,  /* File name */
                    NULL,       /* Logical db name */
                    DB_BTREE,   /* Database type (using btree) */
                    db_flags,   /* Open flags */
                    0);         /* File mode. Using defaults */
    if (ret != 0) {
        dbp-&gt;err(dbp, ret, "Database '%s' open failed",
            file_name);
        return (EXIT_FAILURE);
    } </pre>
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="am_partition_function"></a>Partitioning
            callback</h3>
            </div>
          </div>
        </div>
        <p>
            In some cases, a simple lexicographical comparison of
            key data will not sufficiently support a partitioning
            scheme. For those situations, you should write a
            partitioning function. This function accepts a pointer to
            the <a href="../api_reference/C/db.html" class="olink">DB</a> and the <a href="../api_reference/C/dbt.html" class="olink">DBT</a>, and it returns the number of the
            partition on which the key belongs.
        </p>
        <p>
            Note that <a href="../api_reference/C/db.html" class="olink">DB</a> actually places the key on the partition
            calculated by:
        </p>
        <pre class="programlisting">returned_partition modulo number_of_partitions</pre>
        <p>
            Also, remember that if you use a partitioning function
            when you create your database, then you must use the same
            partitioning function every time you open that database in
            the future. 
        </p>
        <p>
            The following code fragment illustrates a partition
            callback:
        </p>
        <a id="prog_am11"></a>
        <pre class="programlisting">u_int32_t db_partition_fn(DB *db, DBT *key) {
    char *key_data;
    u_int32_t ret_number;
    /* Obtain your key data, unpacking it as necessary
     * Here, we do the very simple thing just for illustrative purposes.
     */

    key_data = (char *)key-&gt;data;

    /* Here you would perform whatever comparison you require to determine
     * what partition the key belongs on. If you return either 0 or the
     * number of partitions in the database, the key is placed in the first
     * database partition. Else, it is placed on:
     *
     *       returned_number mod number_of_partitions
     */

    ret_number = 0;

    return ret_number;
} </pre>
        <p> 
            You then cause your partition callback to be used by
            providing it to the <a href="../api_reference/C/dbset_partition.html" class="olink">DB-&gt;set_partition()</a> method, as
            illustrated by the following code fragment. 
        </p>
        <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
          <h3 class="title">Note</h3>
          <p>
                The <a href="../api_reference/C/dbset_partition.html" class="olink">DB-&gt;set_partition()</a> <a href="../api_reference/C/dbt.html" class="olink">DBT</a> array parameter must be
                    <code class="literal">NULL</code> if you are using a
                partition call back to partition your database.
            </p>
        </div>
        <a id="prog_am12"></a>
        <pre class="programlisting">DB *dbp = NULL;
    DB_ENV *envp = NULL;
    u_int32_t db_flags;
    const char *file_name = "mydb.db";
    int ret;

...
    
    /* Skipping environment open to shorten this example */


    /* Initialize the DB handle */
    ret = db_create(&amp;dbp, envp, 0);
    if (ret != 0) {
        fprintf(stderr, "%s\n", db_strerror(ret));
        return (EXIT_FAILURE);
    }

     dbp-&gt;set_partition(dbp, 3, NULL, db_partition_fn);

    /* Now open the database */
    db_flags = DB_CREATE;       /* Allow database creation */

    ret = dbp-&gt;open(dbp,        /* Pointer to the database */
                    NULL,       /* Txn pointer */
                    file_name,  /* File name */
                    NULL,       /* Logical db name */
                    DB_BTREE,   /* Database type (using btree) */
                    db_flags,   /* Open flags */
                    0);         /* File mode. Using defaults */
    if (ret != 0) {
        dbp-&gt;err(dbp, ret, "Database '%s' open failed",
            file_name);
        return (EXIT_FAILURE);
    } </pre>
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="partition_file_placement"></a>Placing partition
            files</h3>
            </div>
          </div>
        </div>
        <p>
            When you partition a database, a database file is
            created on disk in the same way as if you were not
            partitioning the database. That is, this file uses the
            name you provide to the <a href="../api_reference/C/dbopen.html" class="olink">DB-&gt;open()</a> <code class="literal">file</code>
            parameter. 
        </p>
        <p> 
            However, DB then also creates a series of database
            files on disk, one for each partition that you want to
            use. These partition files share the same name as the
            database file name, but are also number sequentially. So
            if you create a database named
                <code class="filename">mydb.db</code>, and you create 3
            partitions for it, then you will see the following
            database files on disk: 
        </p>
        <pre class="programlisting">            mydb.db
            __dbp.mydb.db.000
            __dbp.mydb.db.001
            __dbp.mydb.db.002 </pre>
        <p> 
            All of the database's contents go into the numbered
            database files. You can cause these files to be placed in
            different directories (and, hence, different disk
            partitions or even disks) by using the
            <a href="../api_reference/C/dbset_partition_dirs.html" class="olink">DB-&gt;set_partition_dirs()</a> method. 
        </p>
        <p> 
            <a href="../api_reference/C/dbset_partition_dirs.html" class="olink">DB-&gt;set_partition_dirs()</a> takes a NULL-terminated array of
            strings, each one of which should represent an existing
            filesystem directory. 
        </p>
        <p> 
            If you are using an environment, the directories
            specified using <a href="../api_reference/C/dbset_partition_dirs.html" class="olink">DB-&gt;set_partition_dirs()</a> must also be
            included in the environment list specified by
            <a href="../api_reference/C/envadd_data_dir.html" class="olink">DB_ENV-&gt;add_data_dir()</a>.
        </p>
        <p>
            If you are not using an environment, then the the
            directories specified to <a href="../api_reference/C/dbset_partition_dirs.html" class="olink">DB-&gt;set_partition_dirs()</a> can be
            either complete paths to currently existing directories,
            or paths relative to the application's current working
            directory.
        </p>
        <p> 
            Ideally, you will provide <a href="../api_reference/C/dbset_partition_dirs.html" class="olink">DB-&gt;set_partition_dirs()</a> with
            an array that is the same size as the number of partitions
            you are creating for your database. Partition files are
            then placed according to the order that directories are
            contained in the array; partition 0 is placed in
            directory_array[0], partition 1 in directory_array[1], and
            so forth. However, if you provide an array of directories
            that is smaller than the number of database partitions,
            then the directories are used on a round-robin fashion.
        </p>
        <p>
            You must call <a href="../api_reference/C/dbset_partition_dirs.html" class="olink">DB-&gt;set_partition_dirs()</a> before you create
            your database, and before you open your database each time
            thereafter. The array provided to <a href="../api_reference/C/dbset_partition_dirs.html" class="olink">DB-&gt;set_partition_dirs()</a>
            must not change after the database has been created.
        </p>
      </div>
    </div>
    <div class="navfooter">
      <hr />
      <table width="100%" summary="Navigation footer">
        <tr>
          <td width="40%" align="left"><a accesskey="p" href="am_opensub.html">Prev</a> </td>
          <td width="20%" align="center">
            <a accesskey="u" href="am.html">Up</a>
          </td>
          <td width="40%" align="right"> <a accesskey="n" href="am_get.html">Next</a></td>
        </tr>
        <tr>
          <td width="40%" align="left" valign="top">Opening multiple databases in a
        single file </td>
          <td width="20%" align="center">
            <a accesskey="h" href="index.html">Home</a>
          </td>
          <td width="40%" align="right" valign="top"> Retrieving records</td>
        </tr>
      </table>
    </div>
  </body>
</html>