author     Martin Pool <mbp@sourcefrog.net>    2015-12-20 17:19:59 -0800
committer  Martin Pool <mbp@sourcefrog.net>    2015-12-20 17:19:59 -0800
commit     1ea10c84fd7551e0095d0b42a9ef0b2fe47d0965 (patch)
tree       b5c0aafa82473d6790e7bae386a8afb25312b75d
parent     3f66c75e274eb6c8c1892dd69639001a90f8b517 (diff)
download   librsync-1ea10c84fd7551e0095d0b42a9ef0b2fe47d0965.tar.gz
Extensive redrafting of documentation
-rw-r--r--  CONTRIBUTING.md            17
-rw-r--r--  README.md                 256
-rw-r--r--  doc/buffer_internals.md    84
-rw-r--r--  doc/callbacks.md           41
-rw-r--r--  doc/downloads.md            9
-rw-r--r--  doc/install.md             60
-rw-r--r--  doc/librsync.md           220
-rw-r--r--  doc/statemachine.md       173
-rw-r--r--  doc/stats.md               24
-rw-r--r--  doc/support.md             12
-rw-r--r--  doc/utilities.md           10
-rw-r--r--  doc/versioning.md          19
-rw-r--r--  doc/whole.md               23
-rw-r--r--  src/job.c                  22
-rw-r--r--  src/job.h                   1
-rw-r--r--  src/librsync.h             78
16 files changed, 394 insertions, 655 deletions
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index df2bfd8..abf0149 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1,4 +1,4 @@
-# Contributing to librsync
+# Contributing to librsync {#page_contributing}
Instructions and conventions for people wanting to work on librsync. Please
consider these guidelines even if you're doing your own fork.
@@ -7,7 +7,9 @@ consider these guidelines even if you're doing your own fork.
[NEWS.md](NEWS.md) contains a list of user-visible changes in the library between
released versions. This includes changes to the way it's packaged,
-bug fixes, portability notes, changes to the API, and so on. Add
+bug fixes, portability notes, changes to the API, and so on.
+
+Add
and update items under a "Changes in X.Y.Z" heading at the top of
the file. Do this as you go along, so that we don't need to work
out what happened when it's time for a release.
@@ -23,12 +25,11 @@ If you are making a new tarball release of librsync, follow this checklist:
* NEWS.md - make sure the top "Changes in X.Y.Z" is correct, and the date is
correct.
-* CMakeLists.txt - version is correct.
-
-* librsync.spec - make sure version and URL are right.
+* `CMakeLists.txt` - version is correct.
+* `librsync.spec` - make sure version and URL are right.
-Do a complete configure and distcheck to ensure everything is properly
-configured, built, and tested.
+* Run `make all doc check` in a clean checkout of the release tag.
-TODO: Instructions on how.
+Test results for builds of public github branches are at
+https://travis-ci.org/librsync/librsync.
diff --git a/README.md b/README.md
index 1aecfb2..8eb7454 100644
--- a/README.md
+++ b/README.md
@@ -14,16 +14,12 @@ must redistribute the librsync source, with any modifications you have made.
[LGPL]: http://www.gnu.org/licenses/old-licenses/lgpl-2.1.en.html
librsync contains the BLAKE2 hash algorithm, written by Samuel Neves and
-released under the
-[CC0 public domain dedication][CC0].
+released under the [CC0 public domain dedication][CC0].
[CC0]: http://creativecommons.org/publicdomain/zero/1.0/
-[TOC]
-
-
-\section intro Introduction
+## Introduction
librsync is a library for calculating and applying network deltas,
with an interface designed to ease integration into diverse
@@ -58,7 +54,7 @@ librsync is used by: [Dropbox](https://dropbox.com/),
[Duplicity](http://www.nongnu.org/duplicity/), and others.
(If you would like to be listed here, let me know.)
-\subsection is_not What librsync is not
+### What librsync is not
1. librsync does not implement the rsync wire protocol. If you want to talk to
an rsync server to transfer files you'll need to shell out to `rsync`.
@@ -74,241 +70,13 @@ or any other server. To access a remote filesystem, you need to provide
your own code or make use of some other virtual filesystem layer.
-\section coordinates Coordinates
-
-librsync's home is http://librsync.sourcefrog.net/ and built documentation
-is available there.
-
-If you are reading the Doxygen version of this file, see
-the @ref rdiff page about the command line tool.
-
-Source and bug tracking is at https://github.com/librsync/librsync/.
-
-There are two mailing lists:
-
-- https://groups.google.com/forum/#!forum/librsync-announce
-- https://groups.google.com/forum/#!forum/librsync
-
-There are some [questions and answers about librsync on stackoverflow.com tagged
-`librsync`][stackoverflow].
-That is a good place to see if your question has already been answered.
-
-[stackoverflow]: http://stackoverflow.com/questions/tagged/librsync
-
-Source tarballs and git tags are at
-https://github.com/librsync/librsync/releases.
-
-Test results for builds of public github branches are at
-https://travis-ci.org/librsync/librsync.
-
-\section requirements Requirements
-
-To build librsync you will need:
-
-* A C compiler and appropriate headers and libraries
-
-* Make
-
-* `popt` command line parsing library (http://rpm5.org/files/popt/)
-
-* CMake (http://cmake.org/)
-
-* Doxygen (optional to build docs) (https://www.stack.nl/~dimitri/doxygen)
-
-
-\section building Building
-
-Generate the Makefile by running
-
- $ cmake .
-
-After building you can install `rdiff` and `librsync` for system-wide use.
-
- $ make
-
-To run the tests:
-
- $ make test
-
-(Note that [CMake will not automatically build before testing](https://github.com/librsync/librsync/issues/49).)
-
-To install:
-
- $ sudo make install
-
-To build the documentation:
-
- $ make doc
-
-librsync should be widely portable. Patches to fix portability bugs are
-welcome.
-
-If you are using GNU libc, you might like to use
-
- MALLOC_CHECK_=2 ./rdiff
-
-to detect some allocation bugs.
-
-librsync has annotations for the SPLINT static checking tool.
-
-\subsection building_cygwin Cygwin
-
-With cygwin you can build using gcc as under a normal unix system. It
-is also possible to compile under cygwin using MSVC++. You must have
-environment variables needed by MSCV set using the Vcvars32.bat
-script.
-
-\section versioning Versioning
-
-librsync uses the [SemVer] approach to versioning: the major version number
-changes when the API changes in an incompatible way, the minor version
-changes when new features are added, and the patchlevel changes when there
-are improvements or fixes that do not change the API.
-
-[SemVer]: http://semver.org/
-
-The solib/dylib version is simply the major number of the library version.
-
-The librsync signature and patch files are separately versioned under
-application control.
-
-See [NEWS.md](NEWS.md) for a list of changes.
-
-
-\section api API Overview
-
-The library supports three basic operations:
-
--# \b sig: Generating the signature S of a file A .
--# \b loadsig: Read a signature from a file into memory.
--# \b delta: Calculating a delta D from S and a new file B.
--# \b path: Applying D to A to reconstruct B.
-
-The librsync tree also provides the \ref rdiff command-line tool, which
-makes this functionality available to users and scripting languages.
-
-The public interface to librsync (\ref librsync.h) has functions in several
-main areas:
-
-- \ref api_whole - for applications that just
- want to make and use signatures and deltas with a single function call.
-- \ref api_streaming - for blocking or non-blocking IO and processing of
- encapsulated, encrypted or compressed streams.
-- \ref api_delta
-- \ref api_buffers
-- \ref api_trace - aid debugging by showing messages about librsync's state.
-- \ref api_stats
-- \ref api_utility
-
-
-\subsection naming Naming conventions
-
-All external symbols have the prefix \c rs_, or
-\c RS_ in the case of preprocessor symbols.
-
-Symbols beginning with \c rs__ (double underscore) are private and should
-not be called from outside the library.
-
-
-\subsection api_streaming Data streaming
-
-A key design requirement for librsync is that it should handle data as
-and when the hosting application requires it. librsync can be used
-inside applications that do non-blocking IO or filtering of network
-streams, because it never does IO directly, or needs to block waiting
-for data.
-
-The programming interface to librsync is similar to that of zlib and
-bzlib. Arbitrary-length input and output buffers are passed to the
-library by the application, through an instance of ::rs_buffers_t. The
-library proceeds as far as it can, and returns an ::rs_result value
-indicating whether it needs more data or space.
-
-All the state needed by the library to resume processing when more
-data is available is kept in a small opaque ::rs_job_t structure.
-After creation of a job, repeated calls to rs_job_iter() in between
-filling and emptying the buffers keeps data flowing through the
-stream. The ::rs_result values returned may indicate
-
-- ::RS_DONE: processing is complete
-- ::RS_BLOCKED: processing has blocked pending more data
-- one of various possible errors in processing
-
-These can be converted to a human-readable string by rs_strerror().
-
-\note Smaller buffers have high relative handling costs. Application
-performance will be improved by using buffers of at least 32kb or so
-on each call.
-
-\subsection api_delta Generating and applying deltas
-
-All encoding operations are performed by using a <tt>_begin</tt>
-function to create a ::rs_job_t object, passing in any necessary
-initialization parameters. The various jobs available are:
-
-- rs_sig_begin(): Calculate the signature of a file.
-- rs_loadsig_begin(): Load a signature into memory.
-- rs_delta_begin(): Calculate the delta between a signature and a new
-file.
-- rs_patch_begin(): Apply a delta to a basis to recreate the new
-file.
-
-\subsection api_buffers Buffers
-
-After creating a job, input and output buffers are passed to
-rs_job_iter() in an ::rs_buffers_s structure.
-
-On input, the buffers structure must contain the address and length of
-the input and output buffers. The library updates these values to
-indicate the amount of \b remaining buffer. So, on return, \c
-avail_out is not the amount of output data produced, but rather the
-amount of output buffer space unfilled. This means that the values on
-return are consistent with the values on entry, but not necessarily
-what you would expect.
-
-A similar system is used by \p libz and \p libbz2.
-
-\warning The input may not be completely consumed by the iteration if
-there is not enough output space. The application must retain unused
-input data, and pass it in again when it is ready for more output.
-
-\subsection api_whole Processing whole files
-
-Some applications do not require fine-grained control over IO, but
-rather just want to process a whole file with a single call.
-librsync provides whole-file APIs to do exactly that.
-
-These functions open files, process the entire contents, and return an
-overall result. The whole-file operations are the core of the
-\ref rdiff program.
-
-Processing of a whole file begins with creation of a ::rs_job_t
-object for the appropriate operation, just as if the application was
-going to do buffering itself. After creation, the job may be passed
-to rs_whole_run(), which will feed it to and from two FILEs as
-necessary until end of file is reached or the operation completes.
-
-\see rs_sig_file()
-\see rs_loadsig_file()
-\see rs_mdfour_file()
-\see rs_delta_file()
-\see rs_patch_file()
-
-\subsection api_stats Encoding statistics
-
-Encoding and decoding routines accumulate compression performance
-statistics in a ::rs_stats_t structure as they run. These may be
-converted to human-readable form or written to the log file using
-rs_format_stats() or rs_log_stats() respectively.
-
-NULL may be passed as the \p stats pointer if you don't want the stats.
-
-\subsection api_utility Utility functions
-
-Some additional functions are used internally and also exposed in the
-API:
+## More information
-- encoding/decoding binary data: rs_base64(), rs_unbase64(),
-rs_hexify().
-- MD4 message digests: rs_mdfour(), rs_mdfour_begin(),
-rs_mdfour_update(), rs_mdfour_result().
+* \ref page_downloads
+* \ref versioning
+* \ref page_install
+* \ref page_api
+* \ref page_support
+* \ref page_contributing
+* \ref rdiff command line interface
+* \ref NEWS.md
diff --git a/doc/buffer_internals.md b/doc/buffer_internals.md
new file mode 100644
index 0000000..6cca05b
--- /dev/null
+++ b/doc/buffer_internals.md
@@ -0,0 +1,84 @@
+# Buffer internals {#buffer_internals}
+
+## Input scoop
+
+A module called the *scoop* is used for buffering data going into
+librsync. It accumulates data when the application does not supply it
+in large enough chunks for librsync to make use of it.
+
+The scoop object is a set of fields in the rs_job_t object:
+
+ char *scoop_buf; /* the allocation pointer */
+ size_t scoop_alloc; /* the allocation size */
+ size_t scoop_avail; /* the data size */
+
+Data from the read callback always goes into the scoop buffer.
+
+The state functions call rs__scoop_read when they need some input
+data. If the read callback blocks, it might take multiple attempts
+before it can be filled. Each time, the state function will also need
+to block, and then be reawakened by the library.
+
+Once the scoop has been sufficiently filled, it must be completely
+consumed by the state function. This is easy if the state function
+always requests one unit of work at a time: a block, a file header
+element, etc.
+
+All this means that the valid data is always located at the start of
+the scoop, continuing for scoop_avail bytes. The library is never
+allowed to consume only part of the data.
+
+Once the state function has consumed the data, it should call
+rs__scoop_reset(), which resets scoop_avail to 0.
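+
+As a rough illustration, a state function that needs a fixed-size unit of
+input might use the scoop as in the sketch below. The
+rs__scoop_read(job, len, &ptr) and rs__scoop_reset(job) signatures are
+assumptions based on the description above; they are private (rs__) calls
+and may differ in the source:
+
+    /* Hypothetical state function consuming one 4-byte unit via the scoop. */
+    static rs_result rs_OPERATION_s_STATE(rs_job_t *job)
+    {
+        void *p;
+        rs_result result;
+
+        /* Ask the scoop for one whole unit of work. */
+        result = rs__scoop_read(job, 4, &p);
+        if (result != RS_DONE)
+            return result;      /* typically RS_BLOCKED: wait for more input */
+
+        /* ... decode the 4 bytes at p ... */
+
+        rs__scoop_reset(job);   /* all scooped data has now been consumed */
+        return RS_RUNNING;      /* run the next state on the following call */
+    }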
+
+
+## Output queue
+
+The library can set up data to be written out by putting a
+pointer/length for it in the output queue:
+
+ char *outq_ptr;
+ size_t outq_bytes;
+
+The job infrastructure will make sure this is written out before the
+next call into the state machine.
+
+There is only one outq_ptr, so any given state function can only
+produce one contiguous block of output.
+
+
+## Buffer sharing
+
+The scoop buffer may be used by the output queue. This means that
+data can traverse the library with no extra copies: one copy into the
+scoop buffer, and one copy out. In this case outq_ptr points into
+scoop_buf, and outq_bytes tells how much data needs to be written.
+
+The state function calls rs__scoop_reset before returning when it is
+finished with the data in the scoop. However, the outq may still
+point into the scoop buffer, if it has not yet been able to be copied
+out. This means that there is data in the scoop beyond scoop_avail
+that must still be retained.
+
+This is safe because neither the scoop nor the state function will
+get to run before the output queue has completely drained.
+
+
+## Readahead
+
+How much readahead is required?
+
+At the moment (??) our rollsum and MD4 routines require a full
+contiguous block to calculate a checksum. This could be relaxed, at a
+possible loss of efficiency.
+
+So calculating block checksums requires one full block to be in
+memory.
+
+When applying a patch, we only need enough readahead to unpack the
+command header.
+
+When calculating a delta, we need a full block to calculate its
+checksum, plus space for the missed data. We can accumulate any
+amount of missed data before emitting it as a literal; the more we can
+accumulate the more compact the encoding will be.
diff --git a/doc/callbacks.md b/doc/callbacks.md
new file mode 100644
index 0000000..f3a74c7
--- /dev/null
+++ b/doc/callbacks.md
@@ -0,0 +1,41 @@
+# IO callbacks {#api_callbacks}
+
+librsync jobs use IO callbacks to read and write files. These callbacks
+might write the data directly to a file or network connection, or they
+might do some additional work such as compression or encryption.
+
+Callbacks are passed a *baton*, which is chosen by the application when
+setting up the job. The baton can hold context or state for the
+callback, such as a file handle or descriptor.
+
+There are three types of callbacks, for input, output, and a special one
+for random-access reads of the basis file when patching. Different types
+of job use different callbacks. The callbacks are assigned when the job
+is created and cannot be changed. (If the behavior of the callback
+needs to change during the job, that can be controlled by variables in
+the baton.)
+
+IO callbacks are passed the address of a buffer allocated by librsync
+which they read data into or write data from, plus the length of the
+buffer.
+
+Callbacks return a ::rs_result value to indicate success, an error, or
+being blocked. Callbacks must set the appropriate `bytes_read` or
+`bytes_written` to indicate how much data was processed. They may
+process only part of the requested data, in which case they still return
+::RS_DONE. In this case librsync will call the callback again later
+until it either completes, fails, or blocks.
+
+When a read callback reaches end-of-file and can return no more data, it
+should return ::RS_INPUT_ENDED. In this case no data should be returned; the
+output value of bytes\_read is ignored. If the callback has just a
+little data left before end of file, then it should return that data
+with ::RS_DONE. On the next call, unless the file has grown, it can
+return ::RS_INPUT_ENDED.
+
+If the callbacks return an error, that error will typically be passed
+back to the application.
+
+IO callbacks are only called from within rs_job_iter(), never
+spontaneously. Different callbacks may be called several times in a
+single invocation of rs_job_iter().
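+
+As an illustration, a read callback wrapping a stdio `FILE` (passed as the
+baton) might look like the sketch below. It follows the `rs_cb_read` shape
+shown in the older documentation; treat the exact typedef, and the use of
+::RS_IO_ERROR for failures, as assumptions to check against librsync.h:
+
+    /* Hypothetical read callback: the baton is the FILE* chosen at job setup. */
+    static rs_result my_read_cb(void *baton, char *buf, size_t buf_len,
+                                size_t *bytes_read)
+    {
+        FILE *f = (FILE *) baton;
+        size_t got = fread(buf, 1, buf_len, f);
+
+        if (got > 0) {
+            *bytes_read = got;      /* a partial read is fine; still RS_DONE */
+            return RS_DONE;
+        }
+        if (feof(f))
+            return RS_INPUT_ENDED;  /* no more data; bytes_read is ignored */
+        return RS_IO_ERROR;         /* report the IO error back to librsync */
+    }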
diff --git a/doc/downloads.md b/doc/downloads.md
new file mode 100644
index 0000000..0669a21
--- /dev/null
+++ b/doc/downloads.md
@@ -0,0 +1,9 @@
+# Downloads {#page_downloads}
+
+librsync's home is http://librsync.sourcefrog.net/ and built documentation
+is available there.
+
+Source and bug tracking is at https://github.com/librsync/librsync/.
+
+Source tarballs and git tags are at
+https://github.com/librsync/librsync/releases.
diff --git a/doc/install.md b/doc/install.md
new file mode 100644
index 0000000..8d1df20
--- /dev/null
+++ b/doc/install.md
@@ -0,0 +1,60 @@
+# Installing librsync {#page_install}
+
+## Requirements
+
+To build librsync you will need:
+
+* A C compiler and appropriate headers and libraries
+
+* Make
+
+* [popt] command line parsing library
+
+* CMake (http://cmake.org/)
+
+* Doxygen (optional to build docs) (https://www.stack.nl/~dimitri/doxygen)
+
+[popt]: http://rpm5.org/files/popt/
+
+
+## Building
+
+Generate the Makefile by running
+
+ $ cmake .
+
+Then build the `librsync` library and the `rdiff` tool:
+
+ $ make
+
+To run the tests:
+
+ $ make test
+
+(Note that [CMake will not automatically build before testing](https://github.com/librsync/librsync/issues/49).)
+
+To install:
+
+ $ sudo make install
+
+To build the documentation:
+
+ $ make doc
+
+librsync should be widely portable. Patches to fix portability bugs are
+welcome.
+
+If you are using GNU libc, you might like to use
+
+ MALLOC_CHECK_=2 ./rdiff
+
+to detect some allocation bugs.
+
+librsync has annotations for the SPLINT static checking tool.
+
+## Cygwin
+
+With Cygwin you can build using gcc as under a normal unix system. It
+is also possible to compile under Cygwin using MSVC++. You must have
+environment variables needed by MSVC set using the Vcvars32.bat
+script.
diff --git a/doc/librsync.md b/doc/librsync.md
index 83e5996..dbb62ae 100644
--- a/doc/librsync.md
+++ b/doc/librsync.md
@@ -1,203 +1,37 @@
+# API Overview {#page_api}
-## API overview
+The library supports four basic operations:
-### Debug messages
+-# \b sig: Generating the signature S of a file A.
+-# \b loadsig: Reading a signature from a file into memory.
+-# \b delta: Calculating a delta D from S and a new file B.
+-# \b patch: Applying D to A to reconstruct B.
-IO callbacks
-============
+These are all available in two different modes:
-librsync jobs use IO callbacks to read and write files. These callbacks
-might write the data directly to a file or network connection, or they
-might do some additional work such as compression or encryption.
+- \ref api_whole - for applications that just
+ want to make and use signatures and deltas with a single function call.
+- \ref api_streaming - for blocking or non-blocking IO and processing of
+ encapsulated, encrypted or compressed streams.
-Callbacks are passed a *baton*, which is chosen by the application when
-setting up the job. The baton can hold context or state for the
-callback, such as a file handle or descriptor.
+The librsync tree also provides the \ref rdiff command-line tool, which
+makes this functionality available to users and scripting languages.
-There are three types of callbacks, for input, output, and a special one
-for random-access reads of the basis file when patching. Different types
-of job use different callbacks. The callbacks are assigned when the job
-is created and cannot be changed. (If the behaviour of the callback
-needs to change during the job, that can be controlled by variables in
-the baton.)
+The public interface to librsync (\ref librsync.h) has functions in several
+main areas:
-There are three function typedefs for these callbacks:
+- \ref api_trace - aid debugging by showing messages about librsync's state.
+- \ref api_callbacks
+- \ref api_stats
+- \ref api_utility
+- \ref versioning
- typedef rs_result rs_cb_read(void *baton,
- char *buf,
- size_t buf_len,
- size_t *bytes_read);
+## Naming conventions
- typedef rs_result rs_cb_basis(void *baton,
- char *buf,
- size_t buf_len,
- off_t offset,
- size_t *bytes_read);
+All external symbols have the prefix \c rs_, or
+\c RS_ in the case of preprocessor symbols.
+(There are some private symbols that currently don't match this, but these
+are considered to be bugs.)
- typedef rs_result rs_cb_write(void *baton,
- const char *buf,
- size_t buf_len,
- size_t *bytes_written);
-
-IO callbacks are passed the address of a buffer allocated by librsync
-which they read data into or write data from, plus the length of the
-buffer.
-
-Callbacks return an `rs_result` value to indicate success, an error, or
-being blocked. Callbacks must set the appropriate `bytes_read` or
-`bytes_written` to indicate how much data was processed. They may
-process only part of the requested data, in which case they still return
-`RS_DONE`. In this case librsync will call the callback again later
-until it either completes, fails, or blocks.
-
-When a read callback reaches end-of-file and can return no more data, it
-should return `RS_EOF`. In this case no data should be returned; the
-output value of bytes\_read is ignored. If the callback has just a
-little data left before end of file, then it should return that data
-with `RS_DONE`. On the next call, unless the file has grown, it can
-return `RS_EOF`.
-
-If the callbacks return an error, that error will typically be passed
-back to the application.
-
-IO callbacks are only called from within `rs_job_run`, never
-spontaneously. Different callbacks may be called several times in a
-single invocation of `rs_job_run`.
-
-stdio callbacks
----------------
-
-librsync provides predefined IO callbacks that wrap the C stdio
-facility. The baton argument for all these functions is a `FILE*`:
-
- rs_result rs_cb_read_stdio(void*,
- char *buf,
- size_t buf_len,
- size_t *bytes_read);
-
- rs_result rs_cb_basis_stdio(void *,
- char *buf,
- size_t buf_len,
- off_t offset,
- size_t *bytes_read);
-
- rs_result rs_cb_write_stdio(void *voidp,
- const char *buf,
- size_t buf_len,
- size_t *bytes_written);
-
-There is also a utility function that wraps `fopen`. It reports any
-errors through the librsync error log, and translates return values. It
-also treats `-` as stdin or stdout as appropriate. :
-
- rs_result rs_stdio_open(const char *file,
- const char *mode,
- FILE **filp_out);
-
-Creating Jobs
-=============
-
-There are functions to create jobs for each operation: gensig, delta,
-loadsig and patch. These functions create a new job object, which can
-then be run using `rs_job_run`. These creation functions are passed the
-IO callbacks and batons to be used for the job.
-
- rs_result rs_gensig_begin(rs_job_t **job_out,
- size_t block_len,
- size_t strong_sum_len,
- rs_cb_read *read_cb, void *read_baton,
- rs_cb_write *write_cb, void *write_baton);
-
-A newly allocated job object is stored in `*job_out`.
-
-The patch job accepts the patch as input, and uses a callback to look up
-blocks within the basis file.
-
-You must configure read, write and basis callbacks after creating the
-job but before it is run.
-
-After creating the job, call `rs_job_run` to feed in patch data and
-retrieve output data. When the job is complete, call `rs_job_finish` to
-dispose of the job object and free memory.
-
-Running Jobs
-============
-
-The work of the operation is done when the application calls
-`rs_job_run`. This includes reading from input files via the callback,
-running the rsync algorithms, and writing output.
-
-The IO callbacks are only called from inside `rs_job_run`. If any of
-them return an error, `rs_job_run` will generally return the same error.
-
-When librsync needs to do input or output, it calls one of the callback
-functions. `rs_job_run` returns when the operation has completed or
-failed, or when one of the IO callbacks has blocked.
-
-`rs_job_run` will usually be called in a loop, perhaps alternating
-librsync processing with other application functions.
-
- rs_result rs_job_run(rs_job_t *job);
-
-Deleting Jobs
-=============
-
-A job is deleted and its memory freed up using `rs_job_free`:
-
- rs_result rs_job_free(rs_job_t *job);
-
-This is typically called when the job has completed or failed. It can be
-called earlier if the application decides it wants to cancell
-processing.
-
-`rs_job_free` does not delete the output of the job, such as the sumset
-loaded into memory. It does delete the job's statistics.
-
-Non-blocking IO
-===============
-
-The librsync interface allows non-blocking streaming processing of data.
-This means that the library will accept input and produce output when it
-suits the application. If nonblocking file IO is used and the IO
-callbacks support it, then librsync will never block waiting for IO.
-
-Normally callbacks will read/write the whole buffer when they're called,
-but in some cases they might not be able to process all of it, or
-perhaps not process any at all. This might happen if the callbacks are
-connected to a nonblocking socket. Either of two things can happen in
-this case. If the callback returns `RS_BLOCKED`, then `rs_job_run` will
-also return `RS_BLOCKED` shortly.
-
-When an IO callback blocks, it is the responsibility of the application
-to work out when it will be able to make progress and therefore when it
-is worth calling `rs_job_run` again. Typically this involves a mechanism
-like `poll` or `select` to wait for the file descriptor to be ready.
-
-Threaded IO
-===========
-
-librsync may be used from threaded programs. librsync does no
-synchronization itself. Each job should be guarded by a monitor or used
-by only a single thread.
-
-Job Statistics
-==============
-
-Jobs accumulate statistics while they run, such as the number of input
-and output bytes. The particular statistics collected depend on the type
-of job. :
-
- const rs_stats_t * rs_job_statistics(rs_job_t *job);
-
-`rs_job_statistics` returns a pointer to statistics for the job. The
-pointer is valid throughout the life of the job, until the job is freed.
-The statistics are updated during processing and can be used to measure
-progress.
-
-Statistics can be written to the trace file in human-readable form:
-
- int rs_log_stats(rs_stats_t const *stats);
-
-Statistics are held in a structure referenced by the job object. The
-statistics are kept up-to-date as the job runs and so can be used for
-progress indicators.
+Symbols beginning with \c rs__ (double underscore) are private and should
+not be called from outside the library.
diff --git a/doc/statemachine.md b/doc/statemachine.md
deleted file mode 100644
index 56ffb63..0000000
--- a/doc/statemachine.md
+++ /dev/null
@@ -1,173 +0,0 @@
-# librsync state machine
-
-## State Machines
-
-
-Internally, the operations are implemented as state machines that move
-through various states as input and output buffers become available.
-
-All computers and programs are state machines. So why is the
-representation as a state machine a little more explicit (and perhaps
-verbose) in librsync than other places? Because we need to be able to
-let the real computer go off and do something else like waiting for
-network traffic, while still remembering where it was in the librsync
-state machine.
-
-librsync will never block waiting for IO, unless the callbacks do
-that.
-
-The current state is represented by the private field
-`job->statefn`, which points to a function with a name like
-`rs_OPERATION_s_STATE`. Every time librsync tries to make progress,
-it will call this function.
-
-The state function returns one of the ::rs_result values. The
-most important values are
-
- * ::RS_DONE: Completed successfully.
-
- * ::RS_BLOCKED: Cannot make further progress at this point.
-
- * ::RS_RUNNING: The state function has neither completed nor blocked but
- wants to be called again. **XXX**: Perhaps this should be removed?
-
-States need to correspond to suspension points. The only place the
-job can resume after blocking is at the entry to a state function.
-
-Therefore states must be "all or nothing" in that they can either
-complete, or restart without losing information.
-
-Basically every state needs to work from one input buffer to one
-output buffer.
-
-States should never generally return RS_DONE directly. Instead, they
-should call rs__job_done, which sets the state function to
-rs__s_done. This makes sure that any pending output is flushed out
-before RS_DONE is returned to the application.
-
-
-## Blocking input and output
-
-The IO callbacks are allowed to block or to process only part of the
-requested data. The library needs to cope with this frustration.
-
-The library might not get as much input as it wanted when it is first
-called. If it gets a partial read, it needs to hold onto that
-valuable and irreplaceable data.
-
-It cannot keep it on the stack, because it will be lost if the read
-blocks. It needs to be kept in the job structure, or in somewhere
-referenced from there.
-
-The state function probably cannot proceed until it has all the needed
-input. So possibly this can be expressed at a high level of the job
-structure. Or perhaps it should just be done by each particular state
-function.
-
-When the library has output to write out, the callback might not be
-able to accept all of it at the time it is called. Deferred outgoing
-data needs to be stored in a buffer referenced from the job structure.
-
-I think it's always OK to try to flush this when entering rs_job_run.
-I think it's OK to not do anything else until all the outgoing data
-has been flushed.
-
-In many cases we would like to pass a pointer into the input (or
-pread) buffer straight to the output callback. In other cases, we
-need a different buffer to build up literal outgoing data.
-
-librsync deals with short, bounded-size headers and checksums, and
-with arbitrarily-large streaming data. Although the commands are of
-bounded size, they are not of fixed size, because there are different
-encodings to suit different situations.
-
-The situation is very similar to fetching variable-length headers from
-a socket. We cannot read the whole command in a single input, because
-we don't know how long it is. As a general principle I think we
-should *not* read in too much data and buffer it, because this
-complicates things. Therefore we need to read the type byte first,
-and then possibly read some parameters.
-
-
-## Input scoop
-
-A module called the *scoop* is used for buffering data going into
-librsync. It accumulates data when the application does not supply it
-in large enough chunks for librsync to make use of it.
-
-The scoop object is a set of fields in the rs_job_t object::
-
- char *scoop_buf; /* the allocation pointer */
- size_t scoop_alloc; /* the allocation size */
- size_t scoop_avail; /* the data size */
-
-Data from the read callback always goes into the scoop buffer.
-
-The state functions call rs__scoop_read when they need some input
-data. If the read callback blocks, it might take multiple attempts
-before it can be filled. Each time, the state function will also need
-to block, and then be reawakened by the library.
-
-Once the scoop has been sufficiently filled, it must be completely
-consumed by the state function. This is easy if the state function
-always requests one unit of work at a time: a block, a file header
-element, etc.
-
-All this means that the valid data is always located at the start of
-the scoop, continuing for scoop_avail bytes. The library is never
-allowed to consume only part of the data.
-
-One the state function has consumed the data, it should call
-rs__scoop_reset, which resets scoop_avail to 0.
-
-
-## Output queue
-
-The library can set up data to be written out by putting a
-pointer/length for it in the output queue::
-
- char *outq_ptr;
- size_t outq_bytes;
-
-The job infrastructure will make sure this is written out before the
-next call into the state machine. This implies it is
-
-There is only one outq_ptr, so any given state function can only
-produce one contiguous block of output.
-
-
-## Buffer sharing
-
-The scoop buffer may be used by the output queue. This means that
-data can traverse the library with no extra copies: one copy into the
-scoop buffer, and one copy out. In this case outq_ptr points into
-scoop_buf, and outq_bytes tells how much data needs to be written.
-
-The state function calls rs__scoop_reset before returning when it is
-finished with the data in the scoop. However, the outq may still
-point into the scoop buffer, if it has not yet been able to be copied
-out. This means that there is data in the scoop beyond scoop_avail
-that must still be retained.
-
-This is safe because neither the scoop nor the state function will
-get to run before the output queue has completely drained.
-
-
-## Readahead
-
-How much readahead is required?
-
-At the moment (??) our rollsum and MD4 routines require a full
-contiguous block to calculate a checksum. This could be relaxed, at a
-possible loss of efficiency.
-
-So calculating block checksums requires one full block to be in
-memory.
-
-When applying a patch, we only need enough readahead to unpack the
-command header.
-
-When calculating a delta, we need a full block to calculate its
-checksum, plus space for the missed data. We can accumulate any
-amount of missed data before emitting it as a literal; the more we can
-accumulate the more compact the encoding will be.
diff --git a/doc/stats.md b/doc/stats.md
new file mode 100644
index 0000000..5974ff8
--- /dev/null
+++ b/doc/stats.md
@@ -0,0 +1,24 @@
+# Stats {#api_stats}
+
+Encoding and decoding routines accumulate compression performance
+statistics, such as the number of bytes read and written, into
+a ::rs_stats_t structure.
+
+The particular statistics collected depend on the type
+of job.
+
+Stats may be
+converted to human-readable form or written to the log file using
+::rs_format_stats() or ::rs_log_stats() respectively.
+
+Statistics are held in a structure referenced by the job object. The
+statistics are kept up-to-date as the job runs and so can be used for
+progress indicators.
+
+::rs_job_statistics returns a pointer to statistics for the job. The
+pointer is valid throughout the life of the job, until the job is freed.
+The statistics are updated during processing and can be used to measure
+progress.
+
+Whole-file functions write statistics into a structure supplied by the caller.
+\c NULL may be passed as the \p stats pointer if you don't want the stats.
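+
+For example, an application driving a job could report progress with a
+helper like this minimal sketch (the three functions are declared in
+librsync.h; only the helper itself is hypothetical):
+
+    #include <stdio.h>
+    #include "librsync.h"
+
+    /* Print a one-line summary for a running job.  The pointer returned by
+     * rs_job_statistics() stays valid until the job is freed, so this can
+     * be called repeatedly while the job runs. */
+    static void report_progress(rs_job_t *job)
+    {
+        char buf[256];
+        const rs_stats_t *stats = rs_job_statistics(job);
+
+        rs_format_stats(stats, buf, sizeof buf);  /* human-readable form */
+        fprintf(stderr, "librsync: %s\n", buf);
+        rs_log_stats(stats);                      /* or write to the librsync log */
+    }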
diff --git a/doc/support.md b/doc/support.md
new file mode 100644
index 0000000..b61868d
--- /dev/null
+++ b/doc/support.md
@@ -0,0 +1,12 @@
+# Support {#page_support}
+
+There are two mailing lists:
+
+- https://groups.google.com/forum/#!forum/librsync-announce
+- https://groups.google.com/forum/#!forum/librsync
+
+There are some [questions and answers about librsync on stackoverflow.com tagged
+`librsync`][stackoverflow].
+That is a good place to see if your question has already been answered.
+
+[stackoverflow]: http://stackoverflow.com/questions/tagged/librsync
diff --git a/doc/utilities.md b/doc/utilities.md
new file mode 100644
index 0000000..abac6ad
--- /dev/null
+++ b/doc/utilities.md
@@ -0,0 +1,10 @@
+# Utility functions {#api_utility}
+
+Some additional functions are used internally and also exposed in the
+API:
+
+- encoding/decoding binary data: rs_base64(), rs_unbase64(),
+ rs_hexify().
+
+- MD4 message digests: rs_mdfour(), rs_mdfour_begin(),
+ rs_mdfour_update(), rs_mdfour_result().
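+
+As a small sketch, the MD4 and hex helpers can be combined to print a
+digest of a buffer. The parameter order of rs_mdfour() and rs_hexify()
+shown here is an assumption; check librsync.h before relying on it:
+
+    #include <stdio.h>
+    #include "librsync.h"
+
+    /* Hash a buffer with the one-shot MD4 helper and print it as hex. */
+    static void print_md4(const void *data, size_t len)
+    {
+        unsigned char digest[16];       /* MD4 digests are 16 bytes */
+        char hex[2 * 16 + 1];
+
+        rs_mdfour(digest, data, len);   /* one-shot digest */
+        rs_hexify(hex, digest, sizeof digest);
+        printf("md4 = %s\n", hex);
+    }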
diff --git a/doc/versioning.md b/doc/versioning.md
new file mode 100644
index 0000000..4c8d354
--- /dev/null
+++ b/doc/versioning.md
@@ -0,0 +1,19 @@
+# Versioning {#versioning}
+
+librsync uses the [SemVer] approach to versioning: the major version number
+changes when the API changes in an incompatible way, the minor version
+changes when new features are added, and the patchlevel changes when there
+are improvements or fixes that do not change the API.
+
+[SemVer]: http://semver.org/
+
+The solib/dylib version is simply the major number of the library version.
+
+The librsync signature and patch files are separately versioned under
+application control, by passing a ::rs_magic_number when creating a job.
+
+The library version is available at runtime in the string ::rs_librsync_version.
+
+A brief summary of librsync's licence is in ::rs_licence_string.
+
+See [NEWS.md](NEWS.md) for a list of changes.
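+
+For example, an application can log which librsync it is linked against;
+this sketch assumes nothing beyond the two extern strings declared in
+librsync.h:
+
+    #include <stdio.h>
+    #include "librsync.h"
+
+    int main(void)
+    {
+        printf("using librsync %s\n", rs_librsync_version);
+        printf("%s\n", rs_licence_string);
+        return 0;
+    }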
diff --git a/doc/whole.md b/doc/whole.md
new file mode 100644
index 0000000..2ccc883
--- /dev/null
+++ b/doc/whole.md
@@ -0,0 +1,23 @@
+# Whole-file API {#api_whole}
+
+Some applications do not require fine-grained control over IO, but
+rather just want to process a whole file with a single call.
+librsync provides whole-file APIs to do exactly that.
+
+These functions open files, process the entire contents, and return an
+overall result. The whole-file operations are the core of the
+\ref rdiff program.
+
+Processing of a whole file begins with creation of a ::rs_job_t
+object for the appropriate operation, just as if the application was
+going to do buffering itself. After creation, the job may be passed
+to rs_whole_run(), which will feed it to and from two FILEs as
+necessary until end of file is reached or the operation completes.
+
+\see rs_sig_file()
+\see rs_loadsig_file()
+\see rs_mdfour_file()
+\see rs_delta_file()
+\see rs_patch_file()
+
+\see api_streaming
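+
+To make this concrete, here is a minimal sketch of applying a delta with
+the whole-file API. It assumes the rs_patch_file(basis, delta, new, stats)
+parameter order and elides error handling; check librsync.h for the exact
+signature in your version:
+
+    #include <stdio.h>
+    #include "librsync.h"
+
+    /* Rebuild new_path from basis_path plus the delta in delta_path.
+     * NB: NULL checks on fopen() are elided for brevity. */
+    int apply_delta(const char *basis_path, const char *delta_path,
+                    const char *new_path)
+    {
+        FILE *basis = fopen(basis_path, "rb");
+        FILE *delta = fopen(delta_path, "rb");
+        FILE *new_file = fopen(new_path, "wb");
+        rs_stats_t stats;
+        rs_result r = rs_patch_file(basis, delta, new_file, &stats);
+
+        if (r == RS_DONE)
+            rs_log_stats(&stats);   /* optional summary in the librsync log */
+        fclose(basis);
+        fclose(delta);
+        fclose(new_file);
+        return r == RS_DONE ? 0 : 1;
+    }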
diff --git a/src/job.c b/src/job.c
index 524678c..768f5a5 100644
--- a/src/job.c
+++ b/src/job.c
@@ -34,8 +34,11 @@
*
* The point of this is
* that we need to be able to suspend and resume processing at any
- * point at which the buffers may block. We could do that using
- * setjmp or similar tricks, but this is probably simpler.
+ * point at which the buffers may block.
+ *
+ * \see \ref api_streaming
+ * \see rs_job_iter()
+ * \see ::rs_job
*/
@@ -124,17 +127,6 @@ static rs_result rs_job_complete(rs_job_t *job, rs_result result)
}
-/**
- * \brief Run a ::rs_job state machine until it blocks
- * (::RS_BLOCKED), returns an error, or completes (::RS_DONE).
- *
- * \return The ::rs_result that caused iteration to stop.
- *
- * \c job->stream->eof_in should be true if there is no more data after what's
- * in the
- * input buffer. The final block checksum will run across whatever's
- * in there, without trying to accumulate anything else.
- */
rs_result rs_job_iter(rs_job_t *job, rs_buffers_t *buffers)
{
rs_result result;
@@ -216,10 +208,6 @@ rs_job_input_is_ending(rs_job_t *job)
-/**
- * Actively process a job, by making callbacks to fill and empty the
- * buffers until the job is done.
- */
rs_result
rs_job_drive(rs_job_t *job, rs_buffers_t *buf,
rs_driven_cb in_cb, void *in_opaque,
diff --git a/src/job.h b/src/job.h
index e242ceb..e452371 100644
--- a/src/job.h
+++ b/src/job.h
@@ -24,6 +24,7 @@
/**
* \struct rs_job
+ * The contents of this structure are private.
*/
struct rs_job {
int dogtag;
diff --git a/src/librsync.h b/src/librsync.h
index 32aa49d..d131604 100644
--- a/src/librsync.h
+++ b/src/librsync.h
@@ -43,7 +43,13 @@
extern "C" {
#endif
+/** Library version string.
+ * \see \ref versioning
+ */
extern char const rs_librsync_version[];
+
+/** Summary of the licence for librsync.
+ */
extern char const rs_licence_string[];
typedef unsigned char rs_byte_t;
@@ -208,6 +214,8 @@ void rs_base64(unsigned char const *buf, int n, char *out);
/**
* \enum rs_result
* \brief Return codes from nonblocking rsync operations.
+ * \see rs_strerror()
+ * \see api_callbacks
*/
typedef enum rs_result {
RS_DONE = 0, /**< Completed successfully. */
@@ -304,6 +312,9 @@ void rs_mdfour_result(rs_mdfour_t * md, unsigned char *out);
char *rs_format_stats(rs_stats_t const *, char *, size_t);
+/**
+ * Write statistics into the current log as text.
+ */
int rs_log_stats(rs_stats_t const *stats);
@@ -317,23 +328,26 @@ void rs_sumset_dump(rs_signature_t const *);
/**
- * Stream through which the calling application feeds data to and from the
- * library.
+ * Description of input and output buffers.
*
* On each call to ::rs_job_iter(), the caller can make available
*
* - #avail_in bytes of input data at #next_in
* - #avail_out bytes of output space at #next_out
- * - some of both
+ * - or some of both
+ *
+ * Buffers must be allocated and passed in by the caller.
*
- * Buffers must be allocated and passed in by the caller. This
- * routine never allocates, reallocates or frees buffers.
+ * On input, the buffers structure must contain the address and length of
+ * the input and output buffers. The library updates these values to
+ * indicate the amount of \b remaining buffer. So, on return,
+ * #avail_out is not the amount of output data produced, but rather the
+ * amount of output buffer space still available.
*
- * Pay attention to the meaning of the returned pointer and length
- * values. They do \b not indicate the location and amount of
- * returned data. Rather, if #next_out was originally set to \c
- * out_buf, then the output data begins at \c out_buf, and has length
- * <code>*next_out - \p out_buf</code>.
+ * This means that the values on
+ * return are consistent with the values on entry, and suitable to be passed
+ * in on a second call, but they don't directly tell you how much output
+ * data was produced.
*
* Note also that if *#avail_in is nonzero on return, then not all of
* the input data has been consumed. The caller should either provide
@@ -342,7 +356,7 @@ void rs_sumset_dump(rs_signature_t const *);
* persistent buffer and call rs_job_iter() with it again when there is
* more output space.
*
- * \sa \ref api_buffers
+ * \sa rs_job_iter()
*/
struct rs_buffers_s {
/** \brief Next input byte.
@@ -384,11 +398,7 @@ struct rs_buffers_s {
};
/**
- * Stream through which the calling application feeds data to and from the
- * library.
- *
- * \sa struct rs_buffers_s
- * \sa \ref api_buffers
+ * \see ::rs_buffers_s
*/
typedef struct rs_buffers_s rs_buffers_t;
@@ -396,12 +406,18 @@ typedef struct rs_buffers_s rs_buffers_t;
#define RS_DEFAULT_BLOCK_LEN 2048
-/** \typedef rs_job_t
- *
+/**
* \brief Job of work to be done.
*
* Created by functions such as rs_sig_begin(), and then iterated
- * over by ::rs_job_iter(). */
+ * over by rs_job_iter().
+ *
+ * The contents are opaque to the application, and instances are always
+ * allocated by the library.
+ *
+ * \see \ref api_streaming
+ * \see rs_job
+ */
typedef struct rs_job rs_job_t;
/**
@@ -413,18 +429,40 @@ typedef enum rs_work_options {
* up. */
} rs_work_options;
-
+/**
+ * \brief Run a ::rs_job state machine until it blocks
+ * (::RS_BLOCKED), returns an error, or completes (::RS_DONE).
+ *
+ * \return The ::rs_result that caused iteration to stop.
+ *
+ * \c job->stream->eof_in should be true if there is no more data after
+ * what's in the input buffer. The final block checksum will run across
+ * whatever's in there, without trying to accumulate anything else.
+ */
rs_result rs_job_iter(rs_job_t *, rs_buffers_t *);
+/**
+ * \todo Document me.
+ */
typedef rs_result rs_driven_cb(rs_job_t *job, rs_buffers_t *buf,
void *opaque);
+/**
+ * Actively process a job, by making callbacks to fill and empty the
+ * buffers until the job is done.
+ */
rs_result rs_job_drive(rs_job_t *job, rs_buffers_t *buf,
rs_driven_cb in_cb, void *in_opaque,
rs_driven_cb out_cb, void *out_opaque);
+/**
+ * Return a pointer to the statistics in a job.
+ */
const rs_stats_t * rs_job_statistics(rs_job_t *job);
+/** Deallocate job state.
+ */
rs_result rs_job_free(rs_job_t *);
int rs_accum_value(rs_job_t *, char *sum, size_t sum_len);