#include 'template.wml' curpage=faq title="rdiff-backup: FAQ"

<divert body>
<p><h2>FAQ:</h2>

<h3>Table of contents</h3>

<ol><li><a href="#__future__">When I try to run rdiff-backup it says
"ImportError: No module named __future__" or "SyntaxError: invalid
syntax".  What's happening?</a></li>

<li><a href="#verbosity">What do the different verbosity levels mean?</a></li>

<li><a href="#windows">Does rdiff-backup run under Windows?</a></li>

<li><a href="#remove_dir">My backup set contains some files that I just realized I don't want/need backed up.  How do I remove them from the backup volume to save space?</a></li>

<li><a href="#redhat">How do I install the RPMs on a Redhat linux system?</a></li>

<li><a href="#solaris">Does rdiff-backup work under Solaris?</a></li>

<li><a href="#speed">How fast is rdiff-backup?  Can it be run on large
data sets?</a></li>

<li><a href="#statistics">What do the various fields mean in the
session statistics and directory statistics files?</a></li>

<li><a href="#bwlimit">Is there some way to limit rdiff-backup's
bandwidth usage, as in rsync's --bwlimit option?</a></li>

</ol>

<h3>Questions and Answers</h3>

<ol>

<a name="__future__">
<li><strong>When I try to run rdiff-backup it says "ImportError: No
module named __future__" or "SyntaxError: invalid syntax".  What's
happening?</strong>

<P>rdiff-backup versions 0.2.x require Python version 2.1 or later,
and versions 0.3.x and later require Python version 2.2 or later.  If
you don't know what version of python you are running, type in "python
-V" from the shell.  I'm sorry if this is inconvenient, but
rdiff-backup uses generators, iterators, nested scoping, and
static/class methods extensively, and these were only added in version
2.2.

<P>If you have two versions of python installed, and running "python"
defaults to an early version, you'll probably have to change the first
line of the rdiff-backup script.  For instance, you could set it to:

<pre>#!/usr/bin/env python2.2</pre>
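
<P>If you want a script to check the interpreter before launching
rdiff-backup, a minimal sketch along these lines may help.  The names
version_ok and MINIMUM are made up for this example and are not part
of rdiff-backup; the (2, 2) minimum is simply the requirement stated
above for versions 0.3.x and later.

```python
import sys

# Minimum interpreter version, taken from the requirement stated above.
# MINIMUM and version_ok are illustrative names, not rdiff-backup API.
MINIMUM = (2, 2)


def version_ok(version_info, minimum=MINIMUM):
    """Return True if the (major, minor, ...) tuple meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


if version_ok(sys.version_info):
    print("python is new enough for rdiff-backup 0.3.x")
else:
    print("python is too old for rdiff-backup 0.3.x")
```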
</li>

<a name="verbosity">
<li><strong>What do the different verbosity levels mean?</strong>

<P>There is no formal specification, but here is a rough description
(settings are always cumulative, so 5 displays everything 4 does):

<P>
<table cellspacing="10">
<tr><td>0</td><td>No information given</td></tr>
<tr><td>1</td><td>Fatal Errors displayed</td></tr>
<tr><td>2</td><td>Warnings</td></tr>
<tr><td>3</td><td>Important messages, and maybe later some global statistics (default)</td></tr>
<tr><td>4</td><td>Some global settings, miscellaneous messages</td></tr>
<tr><td>5</td><td>Mentions which files were changed</td></tr>
<tr><td>6</td><td>More information on each file processed</td></tr>
<tr><td>7</td><td>More information on various things</td></tr>
<tr><td>8</td><td>All logging is dated</td></tr>
<tr><td>9</td><td>Details on which objects are moving across the connection</td></tr>
</table>
</li>

<a name="windows">
<li><strong>Does rdiff-backup run under Windows?</strong>

<P>Yes, apparently it is possible.  First, follow Jason Piterak's
instructions:

<pre>
Subject: Cygwin rdiff-backup
From: Jason  Piterak &lt;Jason_Piterak@c-i-s.com&gt;
Date: Mon, 4 Feb 2002 16:54:24 -0500 (13:54 PST)
To: rdiff-backup@keywest.Stanford.EDU

Hello all,
  On a lark, I thought I would attempt to get rdiff-backup to work under
Windows98 under Cygwin. We have a number of NT/Win2K servers in the field
that I'd love to be backing up via rdiff-backup, and this was the start of
getting that working. 

SUMMARY: 
  o You can get all the pieces for rdiff-backup working under Cygwin.
  o The backup process works up to the point of writing any files with
timestamps.
      ... This is because the ':' character is reserved for Alternate Data
Stream (ADS) file designations under NTFS.

HOW TO GET IT WORKING (to a point, anyway):
  o Install Cygwin
  o Download the Python 2.2 update through the Cygwin installer and install.
  o Download the librsync libraries from the usual place, but before
compiling...
  o Cygwin does not use/provide glibc. Because of this, you have to repoint
some header files in the Makefile:

   -- Make sure that you have /usr/include/inttypes.h
      redirected to /usr/include/sys/types.h. Do this by:

      create a file /usr/include/inttypes.h with the contents:
      #include &lt;sys/types.h&gt;
  o Put rdiff-backup in your PATH, as you normally would.

</pre>

<P>Then, whenever you use rdiff-backup (or at least if you are backing up
to or restoring from a Windows system), use the
<strong>--windows-time-format</strong> switch, which will tell
rdiff-backup not to put a colon (":") in a filename (this option was
added after Jason posted his message).  Finally, as Michael Muegel
points out, you have to exclude all files from the source directory
which have colons in them, so add something like the --exclude ".*:.*"
option.  In the near future some quoting facility may be added to deal
with these issues.
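
<P>Before settling on an exclude pattern it may be useful to see which
names in the source directory actually contain a colon.  The following
is only a sketch: names_with_colons and find_colon_paths are
hypothetical helpers written for this example, not part of
rdiff-backup, and they only report paths without changing anything.

```python
import os


def names_with_colons(names):
    """Return the names that contain a colon, in their original order."""
    return [name for name in names if ":" in name]


def find_colon_paths(root):
    """Walk root and list every file or directory whose name has a colon."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in names_with_colons(dirnames + filenames):
            found.append(os.path.join(dirpath, name))
    return found


# Report the offending paths under the current directory.
for path in find_colon_paths("."):
    print(path)
```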
</li>

<P>
<a name="remove_dir">
<li><strong>My backup set contains some files that I just realized I
don't want/need backed up.  How do I remove them from the backup
volume to save space?</strong>

<P>Let's take an example.  Suppose you ran
<pre>rdiff-backup /usr /backup</pre>
and now realize that you don't want /usr/local backed up on /backup.
Next time you back up, you run
<pre>rdiff-backup --exclude /usr/local /usr /backup</pre>
so that /usr/local is no longer copied to /backup/usr/local.

<P>However, old information about /usr/local is still present in
/backup/rdiff-backup-data/increments/usr/local.  You could wait for
this information to expire and then run rdiff-backup with the
--remove-older-than option, or you could remove the increments
manually by typing:
<pre>rm -rf /backup/rdiff-backup-data/increments/usr/local
rm /backup/rdiff-backup-data/increments/usr/local.*.dir</pre>

</li>

<P>
<a name="redhat">
<li><strong>How do I install the RPMs on a Redhat linux system?</strong>

<P>The problem is that the default version of python for Redhat 7.x is
1.5.x, and rdiff-backup requires python >= 2.2.  Redhat/rawhide
provides python 2.2 RPMs, but they are packaged under the "python2"
name.

<P>So, if you are running Redhat 7.x:

<ol>
<li>Make sure the python2 >= 2.2 package is installed,
leaving python 1.5 the way it is
<li>Install the rdiff-backup RPM, using --nodeps if it only complains
    about python 2.2 missing.
<li>Edit the first line of /usr/bin/rdiff-backup so it says<pre>
#!/usr/bin/env python2
</pre>
so "python2" gets run instead of "python".
</ol>

<P>You can also upgrade using a non-Redhat python 2.2 RPM and avoid
the above steps (this is what I did).  Because of all the dependencies
it is usually easier to use source RPMs for this.
</li>

<P>
<a name="solaris">
<li><strong>Does rdiff-backup work under Solaris?</strong>

<P>There may be a problem with rdiff-backup and Solaris' libthread.
Adding "ulimit -n unlimited" may fix the problem though.  Here is a
post by Kevin Spicer on the subject:

<pre>
Subject: RE: Crash report....still not^H^H^H working
From: "Spicer, Kevin" &lt;Kevin.Spicer@bmrb.co.uk&gt;
Date: Sat, 11 May 2002 23:36:42 +0100
To: rdiff-backup@keywest.Stanford.EDU

Quick mail to follow up on this.. 
My rdiff backup (on Solaris 2.6 if you remember) has now worked
reliably for nearly two weeks after I added...

    ulimit -n unlimited 

to the start of my cron job and created a wrapper script on the remote
machine which looked like this...

    #!/bin/sh 
    ulimit -n unlimited 
    rdiff-backup --server 
    exit 

And changed the remote schema on the command line of rdiff-backup to
call the wrapper script rather than rdiff-backup itself on the remote
machine.  As for the /dev/zero thing I've done a bit of Googleing and
it seems that /dev/zero is used internally by libthread on Solaris
(which doesn't really explain why its opening more than 64 files - but
at least I think I've now got round it).
</pre>
</li>

<P>
<a name="speed">
<li><strong>How fast is rdiff-backup?  Can it be run on large
data sets?</strong>

<P>rdiff-backup can be limited by the CPU, disk IO, or available
bandwidth, and the length of a session can be affected by the amount
of data, how much the data changed, and how many files are present.
That said, in the typical case the number and size of changed files
is relatively small compared to that of unchanged files.  rdiff-backup
is then often either CPU or bandwidth bound, and takes time
proportional to the total number of files.  Initial mirrorings will
usually be
bandwidth or disk bound, and will take much longer than subsequent
updates.

<P>To give two arbitrary data points, when I back up my personal HD
locally (about 9GB, 600000 files, maybe 50MB turnover, 1.1GHz Athlon)
rdiff-backup takes about 35 minutes and is usually CPU bound.  Another
user reports that an rdiff-backup session takes about 3 hours (80GB,
about 1 million files, 2GB turnover) to back up remotely from Tru64 to
linux.
</li>

<p>
<a name="statistics">
<li><strong>What do the various fields mean in the
session statistics and directory statistics files?</strong>

<P>Let's examine an example session statistics file:

<pre>
StartTime 1028200920.44 (Thu Aug  1 04:22:00 2002)
EndTime 1028203082.77 (Thu Aug  1 04:58:02 2002)
ElapsedTime 2162.33 (36 minutes 2.33 seconds)
SourceFiles 494619
SourceFileSize 8535991560 (7.95 GB)
MirrorFiles 493797
MirrorFileSize 8521756994 (7.94 GB)
NewFiles 1053
NewFileSize 23601632 (22.5 MB)
DeletedFiles 231
DeletedFileSize 10346238 (9.87 MB)
ChangedFiles 572
ChangedSourceSize 86207321 (82.2 MB)
ChangedMirrorSize 85228149 (81.3 MB)
IncrementFiles 1857
IncrementFileSize 13799799 (13.2 MB)
TotalDestinationSizeChange 28034365 (26.7 MB)
Errors 0
</pre>

<P>StartTime and EndTime are measured in seconds since the epoch.
ElapsedTime is just EndTime - StartTime, the length of the
rdiff-backup session.

<P>SourceFiles is the number of files found in the source directory,
and SourceFileSize is the total size of those files.  MirrorFiles is
the number of files found in the mirror directory (not including the
rdiff-backup-data directory) and MirrorFileSize is the total size of
those files.  All sizes are in bytes.  If the source directory hasn't
changed since the last backup, MirrorFiles == SourceFiles and
SourceFileSize == MirrorFileSize.

<P>NewFiles and NewFileSize are the total number and size of the files
found in the source directory but not in the mirror directory.  They
have been added since the last backup.

<P>DeletedFiles and DeletedFileSize are the total number and size of
the files found in the mirror directory but not the source directory.
They have been deleted since the last backup.

<P>ChangedFiles is the number of files that exist in both the mirror
and source directories and have changed since the previous
backup.  ChangedSourceSize is their total size in the source
directory, and ChangedMirrorSize is their total size in the mirror
directory.

<P>IncrementFiles is the number of increment files written to the
rdiff-backup-data directory, and IncrementFileSize is their total
size.  Generally one increment file will be written for every new,
deleted, and changed file.

<P>TotalDestinationSizeChange is the number of bytes the destination
directory as a whole (mirror portion and rdiff-backup-data directory)
has grown during the given rdiff-backup session.  This is usually
close to IncrementFileSize + NewFileSize - DeletedFileSize +
ChangedSourceSize - ChangedMirrorSize, but it also includes the space
taken up by the hardlink_data file to record hard links.
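
<P>The approximation above can be checked against a statistics file
with a few lines of Python.  This is a rough sketch, not code from
rdiff-backup: the function names are invented for the example, and it
assumes each line has the form "FieldName value (optional readable
form)" as in the sample file shown earlier.

```python
def parse_value(token):
    """Convert a statistics value to int when possible, else to float."""
    try:
        return int(token)
    except ValueError:
        return float(token)


def parse_session_statistics(text):
    """Map each field name to its numeric value.

    The human-readable part in parentheses, if any, is ignored.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            stats[fields[0]] = parse_value(fields[1])
    return stats


def estimated_size_change(stats):
    """Apply the approximation for TotalDestinationSizeChange."""
    return (stats["IncrementFileSize"] + stats["NewFileSize"]
            - stats["DeletedFileSize"]
            + stats["ChangedSourceSize"] - stats["ChangedMirrorSize"])


# Fields copied from the example session statistics file.
EXAMPLE = """\
NewFileSize 23601632 (22.5 MB)
DeletedFileSize 10346238 (9.87 MB)
ChangedSourceSize 86207321 (82.2 MB)
ChangedMirrorSize 85228149 (81.3 MB)
IncrementFileSize 13799799 (13.2 MB)
TotalDestinationSizeChange 28034365 (26.7 MB)
"""

stats = parse_session_statistics(EXAMPLE)
print(estimated_size_change(stats))
print(stats["TotalDestinationSizeChange"])
```

<P>For the example file the two printed numbers are the same
(28034365).  In other sessions they may differ slightly, for the
reason given above.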
</li>

<a name="bwlimit">
<li><strong>Is there some way to limit rdiff-backup's
bandwidth usage, as in rsync's --bwlimit option?</strong>

<P>There is no internal rdiff-backup option to do this.  However, the
--sleep-ratio option can limit overall resource usage, including
bandwidth.  Also, external utilities such as <a href="http://www.cons.org/cracauer/cstream.html">cstream</a> can be
used to limit bandwidth explicitly.  trevor@tecnopolis.ca writes:

<pre>
rdiff-backup --remote-schema \
  'cstream -v 1 -t 10000 | ssh %s '\''rdiff-backup --server'\'' | cstream -t 20000' \
  'netbak@foo.bar.com::/mnt/backup' localbakdir

(must run from a bsh-type shell, not a csh type)

That would apply a limit in both directions [10000 bytes/sec outgoing,
20000 bytes/sec incoming].  I don't think you'd ever really want to do
this though as really you just want to limit it in one direction.
Also, note how I only -v 1 in one direction.  You probably don't want
to output stats for both directions as it will confuse whatever script
you have parsing the output.  I guess it wouldn't hurt for manual runs
however.
</pre>

<P>To limit bandwidth in only one direction, simply remove one of the
cstream commands.  Two cstream caveats may be worth mentioning:

<ol> <li>Because cstream is limiting the uncompressed data heading
into or out of ssh, if ssh compression is turned on, cstream may be
overly restrictive.</li>

<li>cstream may be "bursty", limiting average bandwidth but allowing
rdiff-backup to exceed it for significant periods.</li>

</ol>
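
<P>If you build the command from a script, the remote schema string can
be assembled from the limits you want.  The sketch below only produces
the string; build_remote_schema is a hypothetical helper written for
this example, not an rdiff-backup function, and it uses just the
cstream -t and -v options that appear in the quoted command.

```python
def build_remote_schema(outgoing=None, incoming=None, verbose=False):
    """Build a --remote-schema value that wraps ssh in cstream limits.

    outgoing and incoming are limits in bytes per second.  None means
    no cstream stage is added for that direction.  verbose adds
    "-v 1" to the outgoing stage only, as in the quoted command.
    """
    stages = []
    if outgoing is not None:
        flags = "-v 1 " if verbose else ""
        stages.append("cstream %s-t %d" % (flags, outgoing))
    # rdiff-backup itself replaces the %s with the remote host.
    stages.append("ssh %s 'rdiff-backup --server'")
    if incoming is not None:
        stages.append("cstream -t %d" % incoming)
    return " | ".join(stages)


# Limit only the outgoing direction to 10000 bytes/sec.
print(build_remote_schema(outgoing=10000))
```

<P>The result is meant to be passed as the single argument following
--remote-schema.  Handing it over as one element of an argument list
(rather than through a shell command line) avoids the nested quoting
seen in the quoted command.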

Another option is to limit bandwidth at a lower (and perhaps more
appropriate) level.  Adam Lazur suggests <a
href="http://lartc.org/wondershaper/">The Wonder Shaper</a>.

</li>

</ol>

</divert>