summaryrefslogtreecommitdiff
path: root/doc/main.dox
blob: 0711034aa84bc68ee33bb3d82d22bc5e810c0b7e (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
-*- text -*-
/**
\mainpage The librsync delta-encoding library
\author Martin Pool <mbp@samba.org>

$Id$

This document is also available in printable form:

  - http://rproxy.samba.org/doxygen/librsync/refman.ps.gz
  - http://rproxy.samba.org/doxygen/librsync/refman.pdf

More information about librsync can be found on the rproxy web site:

  - http://rproxy.samba.org/

librsync can be downloaded from http://rproxy.samba.org/download.html,
and used, modified and redistributed under the terms of the GNU Lesser
General Public License (version 2.1 or later).

\section intro Introduction

librsync is a library for calculating and applying network deltas,
with an interface designed to ease integration into diverse
network applications.  \em librsync is being developed as part of the \b
rproxy <http://rproxy.samba.org/> and \b rsync
<http://rsync.samba.org/> projects.

librsync encapsulates the core algorithms of the rsync protocol, which
help with efficient calculation of the differences between two files.
The rsync algorithm is different from most differencing algorithms
because it does not require the presence of the two files to calculate
the delta.  Instead, it requires a set of checksums of each block of
one file, which together form a signature for that file.  Blocks at
any in the other file which have the same checksum are likely to be
identical, and whatever remains is the difference.

The library does not deal with file metadata or structure, such as
filenames, permissions, or directories. To this library, a file is
just a stream of bytes.  Higher-level tools, such as rsync
<http://rsync.samba.org> can deal with such issues in a way
appropriate to their users.

The library supports three basic operations:

   -# Generating the signature S of a file A .
   -# Calculating a delta D from S and a new file B.
   -# Applying D to A to reconstruct B.

The library also provides the \ref rdiff command-line tool, which
makes this functionality available to users and scripting languages.

\section api Programming interface

The public interface to librsync (librsync.h) has functions in several
main areas:

   - \ref api_streaming
   - \ref api_delta
   - \ref api_buffers
   - \ref api_whole
   - \ref api_trace
   - \ref api_stats
   - \ref api_utility

All external symbols have the prefix ``<tt>rs_</tt>'', or
``<tt>RS_</tt>'' in the case of preprocessor symbols.

\subsection api_streaming Data streaming

A key design requirement for librsync is that it should handle data as
and when the hosting application requires it.  librsync can be used
inside applications that do non-blocking IO or filtering of network
streams, because it never does IO directly, or needs to block waiting
for data.

The programming interface to librsync is similar to that of zlib and
bzlib.  Arbitrary-length input and output buffers are passed to the
library by the application, through an instance of ::rs_buffers_t.  The
library proceeds as far as it can, and returns an ::rs_result value
indicating whether it needs more data or space.

All the state needed by the library to resume processing when more
data is available is kept in a small opaque ::rs_job_t structure.
After creation of a job, repeated calls to rs_job_iter() in between
filling and emptying the buffers keeps data flowing through the
stream.  The ::rs_result_t values returned may indicate

 - ::RS_DONE:  processing is complete
 - ::RS_BLOCKED: processing has blocked pending more data
 - one of various possible errors in processing

These can be converted to a human-readable string by rs_strerror().

\note Smaller buffers have high relative handling costs.  Application
performance will be improved by using buffers of at least 32kb or so
on each call.

\subsection api_delta Generating and applying deltas

All encoding operations are performed by using a <tt>*_begin</tt>
function to create a ::rs_job_t object, passing in any necessary
initialization parameters.  The various jobs available are:

 - rs_sig_begin(): Calculate the signature of a file.
 - rs_loadsig_begin(): Load a signature into memory.
 - rs_delta_begin(): Calculate the delta between a signature and a new
   file.
 - rs_patch_begin(): Apply a delta to a basis to recreate the new
   file.

\subsection api_buffers Buffers

After creating a job, input and output buffers are passed to
rs_job_iter() in an ::rs_buffers_t structure.

On input, the buffers structure must contain the address and length of
the input and output buffers.  The library updates these values to
indicate the amount of \b remaining buffer.  So, on return, \c
avail_out is not the amount of output data produced, but rather the
amount of output buffer space unfilled.  This means that the values on
return are consistent with the values on entry, but not necessarily
what you would expect.

A similar system is used by \p libz and \p libbz2.

\warning The input may not be completely consumed by the iteration if
there is not enough output space.  The application must retain unused
input data, and pass it in again when it is ready for more output.

\subsection api_whole Processing whole files

  Some applications do not require fine-grained control over IO, but
  rather just want to process a whole file with a single call.
  librsync provides `whole-file' functionality to do exactly that.

  Processing of a whole file begins with creation of a ::rs_job_t
  object for the appropriate operation, just as if the application was
  going to do buffering itself.  After creation, the job may be passed
  to rs_whole_run(), which will feed it to and from two FILEs as
  necessary until end of file is reached or the operation completes.

\subsection api_trace Debugging trace and error logging

librsync can output trace or log messages as it proceeds.  These
follow a fairly standard priority-based filtering system
(rs_trace_set_level()), using the same severity levels as UNIX syslog.
Messages by default are sent to stderr, but may be passed to an
application-provided callback (rs_trace_to(), rs_trace_fn_t()).

\subsection api_stats Encoding statistics

  Encoding and decoding routines accumulate compression performance
  statistics in a ::rs_stats_t structure as they run.  These may be
  converted to human-readable form or written to the log file using
  rs_format_stats() or rs_log_stats() respectively.

\subsection api_utility Utility functions

Some additional functions are used internally and also exposed in the
API:

 - encoding/decoding binary data: rs_base64(), rs_unbase64(),
   rs_hexify().
 - MD4 message digests: rs_mdfour(), rs_mdfour_begin(),
   rs_mdfour_update(), rs_mdfour_result().

*/