Diffstat (limited to 'doc/buffer_internals.md')
-rw-r--r-- doc/buffer_internals.md 84
1 files changed, 84 insertions, 0 deletions
diff --git a/doc/buffer_internals.md b/doc/buffer_internals.md
new file mode 100644
index 0000000..6cca05b
--- /dev/null
+++ b/doc/buffer_internals.md
@@ -0,0 +1,84 @@
+# Buffer internals {#buffer_internals}
+
+## Input scoop
+
+A module called the *scoop* is used for buffering data going into
+librsync. It accumulates data when the application does not supply it
+in large enough chunks for librsync to make use of it.
+
+The scoop object is a set of fields in the rs_job_t object::
+
+    char *scoop_buf;             /* the allocation pointer */
+    size_t scoop_alloc;          /* the allocation size */
+    size_t scoop_avail;          /* the data size */
+
+Data from the read callback always goes into the scoop buffer.
+
+The state functions call rs__scoop_read() when they need some input
+data. If the read callback blocks, it might take multiple attempts
+before the scoop can be filled. Each time, the state function will
+also need to block, and then be reawakened by the library.
+
+Once the scoop has been sufficiently filled, it must be completely
+consumed by the state function. This is easy if the state function
+always requests one unit of work at a time: a block, a file header
+element, etc.
+
+All this means that the valid data is always located at the start of
+the scoop, continuing for scoop_avail bytes. The library is never
+allowed to consume only part of the data.
+
+Once the state function has consumed the data, it should call
+rs__scoop_reset(), which resets scoop_avail to 0.
+
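The scoop contract described above (accumulate until a whole unit is present, consume all of it, then reset) can be sketched in C as follows. This is an illustration under stated assumptions, not librsync's code: `scoop_t`, `scoop_fill`, `scoop_ready` and `scoop_reset` are hypothetical names, and only the three fields come from the text.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the scoop fields of rs_job_t. */
typedef struct {
    char  *scoop_buf;   /* the allocation pointer */
    size_t scoop_alloc; /* the allocation size */
    size_t scoop_avail; /* the data size */
} scoop_t;

/* Append input until the scoop holds `want` bytes (hypothetical helper).
 * Returns the number of input bytes taken, or (size_t)-1 if the buffer
 * could not be grown. Valid data always starts at scoop_buf[0]. */
size_t scoop_fill(scoop_t *s, const char *in, size_t len, size_t want)
{
    size_t need, take;

    if (s->scoop_alloc < want) {
        char *p = realloc(s->scoop_buf, want);
        if (p == NULL)
            return (size_t)-1;
        s->scoop_buf = p;
        s->scoop_alloc = want;
    }
    need = want > s->scoop_avail ? want - s->scoop_avail : 0;
    take = len < need ? len : need;
    if (take > 0)
        memcpy(s->scoop_buf + s->scoop_avail, in, take);
    s->scoop_avail += take;
    return take;
}

/* Nonzero once a whole unit of `want` bytes is available. */
int scoop_ready(const scoop_t *s, size_t want)
{
    return s->scoop_avail >= want;
}

/* Called after the state function has consumed the whole unit. */
void scoop_reset(scoop_t *s)
{
    s->scoop_avail = 0;
}
```

A caller would keep calling `scoop_fill` with whatever input arrives, return to the application while `scoop_ready` is false, and call `scoop_reset` only after processing the complete unit.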
+
+## Output queue
+
+The library can set up data to be written out by putting a
+pointer/length for it in the output queue::
+
+    char *outq_ptr;
+    size_t outq_bytes;
+
+The job infrastructure will make sure this is written out before the
+next call into the state machine.
+
+There is only one outq_ptr, so any given state function can only
+produce one contiguous block of output.
+
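A minimal sketch of the single pointer/length queue, assuming the job infrastructure drains it into whatever output space the caller provides. `outq_t`, `outq_set`, `outq_drain` and `outq_is_empty` are hypothetical names used only for illustration; only `outq_ptr` and `outq_bytes` come from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the output-queue fields of the job. */
typedef struct {
    char  *outq_ptr;    /* next byte still to be written out */
    size_t outq_bytes;  /* bytes remaining in the queue */
} outq_t;

/* Queue one contiguous block. Fails with -1 if earlier output is still
 * pending, because there is only one pointer/length pair. */
int outq_set(outq_t *q, char *data, size_t len)
{
    if (q->outq_bytes != 0)
        return -1;
    q->outq_ptr = data;
    q->outq_bytes = len;
    return 0;
}

/* Copy as much queued data as fits into `out`; returns bytes written. */
size_t outq_drain(outq_t *q, char *out, size_t out_space)
{
    size_t n = q->outq_bytes < out_space ? q->outq_bytes : out_space;

    if (n > 0) {
        memcpy(out, q->outq_ptr, n);
        q->outq_ptr += n;
        q->outq_bytes -= n;
    }
    return n;
}

/* The state machine may only be re-entered once this returns nonzero. */
int outq_is_empty(const outq_t *q)
{
    return q->outq_bytes == 0;
}
```

The refusal in `outq_set` is one way to express the "one contiguous block per state function" restriction; the real library may enforce it differently.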
+
+## Buffer sharing
+
+The scoop buffer may be used by the output queue. This means that
+data can traverse the library with no extra copies: one copy into the
+scoop buffer, and one copy out. In this case outq_ptr points into
+scoop_buf, and outq_bytes tells how much data needs to be written.
+
+The state function calls rs__scoop_reset() before returning when it is
+finished with the data in the scoop. However, the outq may still
+point into the scoop buffer if its data has not yet been copied out.
+This means that there may be data in the scoop beyond scoop_avail
+that must still be retained.
+
+This is safe because neither the scoop nor the state function will
+get to run before the output queue has completely drained.
+
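The sharing arrangement can be sketched as below: the output queue borrows the scoop's storage, the scoop is reset at once, and the bytes stay valid because nothing refills the scoop until the queue has drained. `job_t`, `job_emit_scoop` and `job_drain` are hypothetical names; only the five fields come from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical job holding just the five fields named in the text. */
typedef struct {
    char  *scoop_buf;
    size_t scoop_alloc;
    size_t scoop_avail;
    char  *outq_ptr;
    size_t outq_bytes;
} job_t;

/* State-function step: pass the scooped unit straight to the output. */
void job_emit_scoop(job_t *job)
{
    job->outq_ptr = job->scoop_buf;     /* share the buffer, no copy */
    job->outq_bytes = job->scoop_avail;
    job->scoop_avail = 0;               /* the reset step; bytes remain in place */
}

/* Infrastructure step: drain into the caller's output space.
 * Returns nonzero once the queue is empty, which is the earliest point
 * at which the scoop may be refilled or the state function run again. */
int job_drain(job_t *job, char *out, size_t out_space, size_t *written)
{
    size_t n = job->outq_bytes < out_space ? job->outq_bytes : out_space;

    if (n > 0) {
        memcpy(out, job->outq_ptr, n);
        job->outq_ptr += n;
        job->outq_bytes -= n;
    }
    *written = n;
    return job->outq_bytes == 0;
}
```

In this sketch the data is copied once into the scoop and once out to the caller, with no intermediate copy between the two.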
+
+## Readahead
+
+How much readahead is required?
+
+At the moment (??) our rollsum and MD4 routines require a full
+contiguous block to calculate a checksum. This could be relaxed, at a
+possible loss of efficiency.
+
+So calculating block checksums requires one full block to be in
+memory.
+
+When applying a patch, we only need enough readahead to unpack the
+command header.
+
+When calculating a delta, we need a full block to calculate its
+checksum, plus space for the missed data. We can accumulate any
+amount of missed data before emitting it as a literal; the more we can
+accumulate the more compact the encoding will be.
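
The three readahead cases above can be summarised as a small sizing function. This is only a sketch: `job_kind_t`, `readahead_needed`, `max_cmd_header` and `miss_space` are hypothetical names, and the amount of space set aside for missed data is a tuning choice that the text leaves open.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the three kinds of job discussed above. */
typedef enum { JOB_SIGNATURE, JOB_PATCH, JOB_DELTA } job_kind_t;

/* Minimum number of input bytes that must be buffered at once. */
size_t readahead_needed(job_kind_t kind, size_t block_len,
                        size_t max_cmd_header, size_t miss_space)
{
    switch (kind) {
    case JOB_SIGNATURE:
        return block_len;               /* one full block per checksum */
    case JOB_PATCH:
        return max_cmd_header;          /* enough to unpack a command header */
    case JOB_DELTA:
        return block_len + miss_space;  /* block plus pending literal data */
    }
    return 0;
}
```

A larger `miss_space` lets more missed data accumulate before it is emitted as a literal, trading memory for a more compact encoding.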