/*! @class doc_bulk_durability

Bulk-loads are not commit-level durable; that is, the creation and
bulk-load of an object will not appear in the database log files.  For
this reason, applications doing incremental backups after a full backup
should repeat the full backup step after doing a bulk-load to make the
bulk-load durable.  In addition, incremental backups after a bulk-load
can cause recovery to report errors, because there are log records that
apply to data files that do not appear in the backup.

*/

/*! @m_page{{c,java},backup,Backups}

WiredTiger cursors provide access to data from a variety of sources.
One of these sources is the list of files required to perform a backup
of the database.  The list may be the files required by all of the
objects in the database, or a subset of the objects in the database.

WiredTiger backups are "on-line" or "hot" backups, and applications may
continue to read and write the databases while a snapshot is taken.

@section backup_process Backup from an application

1. Open a cursor on the \c "backup:" data source, which begins the
   process of a backup.

2. Copy each file returned by the WT_CURSOR::next method to the backup
   location, for example, a different directory.  Do not reuse a backup
   location unless all files have first been removed from it; in other
   words, remove any previous backup information before using a backup
   location.

3. Close the cursor; the cursor must not be closed until all of the
   files have been copied.

The directory into which the files are copied may subsequently be
specified as a directory to the ::wiredtiger_open function and
accessed as a WiredTiger database home.

Copying the database files for a backup does not require any special
alignment or block size (in particular, files on Linux or Windows
filesystems that do not support read/write isolation can be safely read
for backups).

A database file may grow in size during the copy, and the file copy
should not treat that as an error.  Blocks appended to the file after the
copy starts can be safely ignored; that is, it is correct for the copy
to determine the initial size of the file and then copy that many bytes,
ignoring any bytes appended after the backup cursor was opened.
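The copy rule above can be sketched with POSIX calls: record the file size once with \c fstat before reading anything, then copy exactly that many bytes. This is a minimal sketch assuming a POSIX system; \c copy_initial_size is a hypothetical helper, not part of the WiredTiger API, and it treats a file that shrinks during the copy as an error.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper, not part of WiredTiger: copy the bytes a file holds
// when the copy starts and ignore anything appended afterward.
// Returns 0 on success and -1 on failure.
int
copy_initial_size(const char *src, const char *dst)
{
    char buf[64 * 1024];
    struct stat st;
    off_t offset, size;
    ssize_t n, w, written;
    int in, out, ret;

    ret = -1;
    out = -1;
    if ((in = open(src, O_RDONLY)) == -1)
        return (-1);
    // Fix the copy length once, before any data is read.
    if (fstat(in, &st) == -1)
        goto done;
    size = st.st_size;
    // O_TRUNC discards anything left over from a previous backup.
    if ((out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644)) == -1)
        goto done;
    for (offset = 0; offset < size; offset += n) {
        size_t want = (size - offset) < (off_t)sizeof(buf) ?
            (size_t)(size - offset) : sizeof(buf);
        // 0 means the file shrank below its initial size; -1 is an I/O error.
        if ((n = pread(in, buf, want, offset)) <= 0)
            goto done;
        for (written = 0; written < n; written += w)
            if ((w = write(out, buf + written, (size_t)(n - written))) == -1)
                goto done;
    }
    // Flush the copy to stable storage before reporting success.
    ret = fsync(out);
done:
    if (out != -1 && close(out) == -1)
        ret = -1;
    (void)close(in);
    return (ret);
}
```

An application would call it once for each name returned by WT_CURSOR::next, joining the name to the database home directory, and close the backup cursor only after every call has succeeded.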

The cursor must not be closed until all of the files have been copied;
however, there is no requirement that the files be copied in any order
or in any relationship to the WT_CURSOR::next calls, only that all files
have been copied before the cursor is closed.  For example, applications
might aggregate the file names from the cursor and then list the file
names as arguments to a file archiver such as the system tar utility.
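As an illustration of the archiver approach, the sketch below builds an argument vector for \c tar from file names already collected from the backup cursor. The helper \c build_tar_argv is hypothetical, the sketch assumes a \c tar that accepts the \c -c, \c -f and \c -C options, and the file names used with it are illustrative only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: build a NULL-terminated argument vector for
//   tar -cf ARCHIVE -C HOME NAME...
// from file names already collected from the backup cursor. The caller
// frees the returned array; the strings themselves are borrowed.
char **
build_tar_argv(const char *archive, const char *home,
    const char *const *names, size_t nnames)
{
    // 5 fixed arguments, one per file name, and the terminating NULL.
    char **av = calloc(5 + nnames + 1, sizeof(char *));
    size_t i;

    if (av == NULL)
        return (NULL);
    av[0] = "tar";
    av[1] = "-cf";
    av[2] = (char *)archive;
    av[3] = "-C"; // Store names relative to the database home.
    av[4] = (char *)home;
    for (i = 0; i < nnames; i++)
        av[5 + i] = (char *)names[i];
    av[5 + nnames] = NULL;
    return (av);
}
```

The result could be passed to \c execvp or \c posix_spawnp; the application should wait for \c tar to exit before closing the backup cursor.  With very many files, the argument list may exceed the system's \c ARG_MAX limit.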

While the backup cursor is open, database checkpoints can be created,
but no checkpoints can be deleted.  This may result in significant file
growth.  Additionally, during that period automatic log file archiving,
even if enabled, will not reclaim any log files.

Additionally, if a crash occurs during the period the backup cursor is
open and logging is disabled (in other words, when depending on
checkpoints for durability), then the system will be restored to the
most recent checkpoint prior to the opening of the backup cursor, even
if later database checkpoints were completed. <b>Note this exception to
WiredTiger's checkpoint durability guarantees.</b>

The following is a programmatic example of creating a backup:

@snippet ex_all.c backup

When logging is enabled, opening the backup cursor forces a log file switch.
This ensures that, when that log file is included in the list of files, the
backup contains only data that was committed and visible at the time of the
backup.  WiredTiger offers a mechanism to gather additional log files that
may be created during the backup.

Since backups can take a long time, it may be desirable to catch up with the
log files at the end of a backup so that operations that occurred during the
backup can be recovered.  WiredTiger provides the ability to open a duplicate
backup cursor with the configuration \c target=log:.  This secondary backup
cursor returns the file names of all log files via \c dup_cursor->get_key().
These names will overlap with the log file names returned by the original
cursor.  The application only needs to copy files whose names are new, but it
is not an error to copy every log file returned.  This secondary cursor must
be closed explicitly before the parent backup cursor is closed.

@snippet ex_all.c backup log duplicate
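Because the duplicate cursor's list overlaps the original one, an application may want to copy only the log files it has not copied yet.  A minimal sketch, assuming the names from both cursors have already been collected into arrays of strings; \c new_log_files and \c not_yet_copied are hypothetical helpers, and the linear search is chosen for simplicity (a hash set would scale better for long lists).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: return 1 if "name" is not in the list of files
// already copied from the original backup cursor, 0 otherwise.
static int
not_yet_copied(const char *name, const char *const *copied, size_t ncopied)
{
    size_t i;

    for (i = 0; i < ncopied; i++)
        if (strcmp(name, copied[i]) == 0)
            return (0);
    return (1);
}

// Store in "out" the names from the duplicate "target=log:" cursor that
// still need copying, preserving their order; returns how many were stored.
// "out" must have room for "ndup" entries.
size_t
new_log_files(const char *const *dup_names, size_t ndup,
    const char *const *copied, size_t ncopied, const char **out)
{
    size_t i, n;

    for (i = n = 0; i < ndup; i++)
        if (not_yet_copied(dup_names[i], copied, ncopied))
            out[n++] = dup_names[i];
    return (n);
}
```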

In cases where the backup is desired for a checkpoint other than the
most recent, applications can discard all checkpoints subsequent to the
checkpoint they want using the WT_SESSION::checkpoint method.  For
example:

@snippet ex_all.c backup of a checkpoint

@section backup_util Backup from the command line

The @ref_single util_backup command may also be used to create backups:

@code
rm -rf /path/database.backup &&
    mkdir /path/database.backup &&
    wt -h /path/database.source backup /path/database.backup
@endcode

@section backup_incremental-block Block-based Incremental backup

Once a full backup has been done, it can be rolled forward incrementally by
copying only modified blocks and new files to the backup copy directory.
The application is responsible for removing files that are no longer part
of the backup, that is, files whose names are no longer returned by later
incremental backups.  This is especially important for WiredTiger log files
that are no longer needed and must be removed before recovery is run.
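Removing files that a later incremental backup no longer returns can be sketched as a directory scan that deletes every entry missing from the list the backup cursor just returned.  This assumes a POSIX system and a flat backup directory with no subdirectories; \c remove_stale_files is a hypothetical helper, not part of WiredTiger.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

// Hypothetical helper, not part of WiredTiger: remove every entry in the
// backup directory "dir" whose name is not in "keep", the names returned by
// the most recent incremental backup cursor. Assumes a flat directory.
// Returns the number of files removed, or -1 on error.
int
remove_stale_files(const char *dir, const char *const *keep, size_t nkeep)
{
    char path[4096];
    struct dirent *de;
    DIR *dp;
    size_t i;
    int n, removed;

    if ((dp = opendir(dir)) == NULL)
        return (-1);
    removed = 0;
    while ((de = readdir(dp)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        // Keep any file the latest incremental backup still returned.
        for (i = 0; i < nkeep; i++)
            if (strcmp(de->d_name, keep[i]) == 0)
                break;
        if (i < nkeep)
            continue;
        n = snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
        if (n < 0 || (size_t)n >= sizeof(path) || unlink(path) == -1) {
            (void)closedir(dp);
            return (-1);
        }
        ++removed;
    }
    (void)closedir(dp);
    return (removed);
}
```

Passing an empty list removes every file, which also covers clearing out a backup location before it is reused for a new full backup.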

@copydoc doc_bulk_durability

The following is the procedure for incrementally backing up a database
using block modifications:

1. Perform a full backup of the database (as described above), with the
additional configuration \c incremental=(enabled=true,this_id="ID1").
The identifier specified in \c this_id starts block tracking, and that
identifier can be used in the future as the source of an incremental
backup.

2. Begin the incremental backup by opening a backup cursor with the
\c backup: URI and the configuration string
\c incremental=(src_id="ID1",this_id="ID2").  Call this \c backup_cursor.
Like a normal full backup cursor, this cursor returns the file name as the
key; there is no associated value.  The information returned is based on
blocks tracked since the time of the previous backup designated "ID1".
New block tracking is also started under "ID2".  WiredTiger maintains
modifications for two IDs: the current one and the most recently completed
one.  Note that all backup identifiers are subject to the same naming
restrictions as other configuration names; see @ref config_intro for details.

3. For each file returned by \c backup_cursor->next(), open a duplicate
backup cursor to do the incremental backup on that file, configured with
\c incremental=(file=name), where \c name is the string returned by
\c backup_cursor->get_key().  Call this \c incr_cursor.  The list of files
returned also includes log files (prefixed by \c WiredTigerLog) that need
to be copied.

4. The key format for the duplicate backup cursor, \c incr_cursor, is
\c qqq, representing a file offset and size pair plus a type indicator
for the range given.  There is no associated value.  The type indicator
will be one of \c WT_BACKUP_FILE or \c WT_BACKUP_RANGE.  For
\c WT_BACKUP_RANGE, read the block from the source database file indicated
by the file offset and size pair and write the block to the same offset in
the backup database file, replacing the portion of the file represented by
the offset/size pair.  It is not an error for an offset/size pair to extend
past the current end of the source file; any missing file data should be
ignored.  For \c WT_BACKUP_FILE, the application can copy the entire file
by any means it chooses, or use the offset/size pair, which indicates the
file size known to WiredTiger at the time of the call.

5. Close the duplicate backup cursor, \c incr_cursor.

6. Repeat steps 3-5 as many times as necessary while \c backup_cursor->next()
returns files to copy.

7. Close the backup cursor, \c backup_cursor.

8. Repeat steps 2-7 as often as desired.

Full and incremental backups may be repeated as long as the backup
database directory has not been opened and recovery run.  Once recovery
has run in a backup directory, you can no longer back up to that
database directory.

An example of opening the backup data source for block-based incremental backup:

@snippet ex_all.c incremental block backup
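Steps 3 and 4 of the block-based procedure can be sketched without the WiredTiger cursor calls.  One helper formats the per-file configuration string.  Another applies a single \c WT_BACKUP_RANGE key by copying bytes to the same offset in the backup file, stopping quietly at the end of the source file.  This is a minimal sketch assuming the offset and size have already been read from \c incr_cursor into 64-bit integers and both files are open; \c incr_file_config and \c apply_backup_range are hypothetical helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: format the configuration for the per-file duplicate
// cursor (step 3). Returns 0 on success, -1 if "buf" is too small.
int
incr_file_config(char *buf, size_t len, const char *name)
{
    int n = snprintf(buf, len, "incremental=(file=%s)", name);

    return (n < 0 || (size_t)n >= len ? -1 : 0);
}

// Hypothetical helper for a WT_BACKUP_RANGE key (step 4): copy "size" bytes
// at "offset" in the source file to the same offset in the backup file.
// Data past the current end of the source file is ignored, as the text
// permits. Returns 0 on success, -1 on an I/O error.
int
apply_backup_range(int src_fd, int dst_fd, uint64_t offset, uint64_t size)
{
    char buf[64 * 1024];
    ssize_t n, w, written;

    while (size > 0) {
        size_t want = size < sizeof(buf) ? (size_t)size : sizeof(buf);

        if ((n = pread(src_fd, buf, want, (off_t)offset)) == -1)
            return (-1);
        if (n == 0)
            break; // End of the source file: nothing more to copy.
        for (written = 0; written < n; written += w)
            if ((w = pwrite(dst_fd, buf + written, (size_t)(n - written),
                   (off_t)offset + written)) == -1)
                return (-1);
        offset += (uint64_t)n;
        size -= (uint64_t)n;
    }
    return (0);
}
```

Keys of type \c WT_BACKUP_FILE are not handled here; for those, the text above allows the application to copy the whole file.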

@section backup_incremental Log-based Incremental backup

Once a backup has been done, it can be rolled forward incrementally by
adding log files to the backup copy. Adding log files to the copy
decreases potential data loss from switching to the copy, but increases
the recovery time necessary to switch to the copy.  To reset the
recovery time necessary to switch to the copy, perform a full backup of
the database.  For example, an application might do a full backup of the
database once a week during a quiet period, and then incrementally copy
new log files into the backup directory for the rest of the week.
Incremental backups may also save time when the tables are very large.

@copydoc doc_bulk_durability

By default, WiredTiger automatically removes log files no longer
required for recovery.  Applications wanting to use log files for
incremental backup must first disable automatic log file removal using
the \c log=(archive=false) configuration to ::wiredtiger_open.

The following is the procedure for incrementally backing up a database
and removing log files from the original database home:

1. Perform a full backup of the database (as described above).

2. Open a cursor on the \c "backup:" data source, configured with the
   \c "target=(\"log:\")" target, which begins the process of an
   incremental backup.

3. Copy each log file returned by the WT_CURSOR::next method to the backup
   directory.  It is not an error to copy a log file that has been copied
   before, but care should be taken to ensure each log file is completely
   copied, because the most recent log file may grow in size while being
   copied.

4. If all log files have been successfully copied, archive the log
   files by calling the WT_SESSION::truncate method with the URI
   <code>log:</code> and specifying the backup cursor as the start
   cursor to that method.  (Note that there is no requirement that backups
   be coordinated with database checkpoints; however, an incremental backup
   will repeatedly copy the same files, and will not make additional log
   files available for archival, unless there was a checkpoint after the
   previous incremental backup.)

5. Close the backup cursor.

Steps 2-5 can be repeated any number of times before step 1 is repeated.
Full and incremental backups may be repeated as long as the backup
database directory has not been opened and recovery run.  Once recovery
has run in a backup directory, you can no longer back up to that
database directory.

An example of opening the backup data source for log-based incremental backup:

@snippet ex_all.c incremental backup
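Step 3 of the log-based procedure asks that each log file be copied completely even though the newest one may still be growing.  One way to approach that is to read until end-of-file instead of stopping at a size recorded up front, and to truncate the destination so that re-copying a previously copied log file simply replaces it.  \c copy_log_file is a hypothetical helper assuming a POSIX system; it returns the number of bytes copied.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: copy a log file by reading until end-of-file, so data
// appended to the newest log file while it is being copied is included.
// O_TRUNC makes re-copying a previously copied log file safe.
// Returns the number of bytes copied, or -1 on error.
long long
copy_log_file(const char *src, const char *dst)
{
    char buf[64 * 1024];
    long long ret = -1, total = 0;
    ssize_t n, w, written;
    int in, out;

    if ((in = open(src, O_RDONLY)) == -1)
        return (-1);
    if ((out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644)) == -1)
        goto done;
    // Keep reading until read() reports end-of-file (0), rather than
    // stopping at a size recorded before the copy started.
    while ((n = read(in, buf, sizeof(buf))) > 0) {
        for (written = 0; written < n; written += w)
            if ((w = write(out, buf + written, (size_t)(n - written))) == -1)
                goto done;
        total += n;
    }
    if (n == 0 && fsync(out) == 0)
        ret = total;
done:
    if (out != -1 && close(out) == -1)
        ret = -1;
    (void)close(in);
    return (ret);
}
```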

@section backup_o_direct Backup and O_DIRECT

Many Linux systems do not support mixing \c O_DIRECT and memory mapping
or normal I/O to the same file.  If \c O_DIRECT is configured for data
or log files on Linux systems (using the ::wiredtiger_open \c direct_io
configuration), any program used to copy files during backup should also
specify \c O_DIRECT when configuring its file access.  Likewise, when
\c O_DIRECT is not configured by the database application, programs
copying files should not configure \c O_DIRECT.
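Here is a minimal sketch of how a Linux copy program might follow this rule.  It chooses the open flags to match the database's \c direct_io setting and, when \c O_DIRECT is used, allocates I/O buffers with \c posix_memalign.  The helper names are hypothetical, and the 4096-byte alignment is an assumption: the required alignment depends on the underlying device and filesystem.

```c
#define _GNU_SOURCE // Exposes O_DIRECT in <fcntl.h> on Linux with glibc.
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical helper: open flags for a backup program reading a database
// file, mirroring whether the database uses direct_io for that file type.
int
backup_read_flags(int db_uses_direct_io)
{
    return (O_RDONLY | (db_uses_direct_io ? O_DIRECT : 0));
}

// O_DIRECT I/O generally requires the buffer address, file offset and
// transfer size to be suitably aligned; 4096 bytes is assumed here.
// Returns NULL on failure; release the buffer with free().
void *
alloc_direct_io_buffer(size_t size)
{
    void *p;

    if (posix_memalign(&p, 4096, size) != 0)
        return (NULL);
    return (p);
}
```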

*/