summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorMartin Pool <mbp@sourcefrog.net>2015-11-14 09:47:00 -0800
committerMartin Pool <mbp@sourcefrog.net>2015-11-14 09:47:00 -0800
commitceb03ad90e7fa6b115fc6a72c40941b07d4d3dd0 (patch)
treecd8d19a5cbd0047b6037abe471829ea90a30d41f
parent54e505667257fd1ea786454bea390784d817123c (diff)
downloadlibrsync-doc-formats.tar.gz
Add brief documentation of signature formatdoc-formats
Partly addresses #46. Still needs to be integrated with Doxygen, and to include delta format documentation.
-rw-r--r--doc/format.md64
1 files changed, 64 insertions, 0 deletions
diff --git a/doc/format.md b/doc/format.md
new file mode 100644
index 0000000..118c19c
--- /dev/null
+++ b/doc/format.md
@@ -0,0 +1,64 @@
+# librsync formats
+
+## Generalities
+
+There are two file formats used by `librsync` and `rdiff`: the
+*signature* file, which summarizes a data file, and the *delta* file,
+which describes the edits from one data file to another.
+
+librsync does not know or care about any formats in the data files.
+
+All integers are big-endian.
+
+## Magic numbers
+
+All librsync files start with a `uint32` magic number identifying them. These are declared in `librsync.h`:
+
+```
+/** A delta file. At present, there's only one delta format. **/
+RS_DELTA_MAGIC = 0x72730236, /* r s \2 6 */
+
+/**
+ * A signature file with MD4 signatures. Backward compatible with
+ * librsync < 1.0, but strongly deprecated because it creates a security
+ * vulnerability on files containing partly untrusted data. See
+ * <https://github.com/librsync/librsync/issues/5>.
+ **/
+RS_MD4_SIG_MAGIC = 0x72730136, /* r s \1 6 */
+
+/**
+ * A signature file using the BLAKE2 hash. Supported from librsync 1.0.
+ **/
+RS_BLAKE2_SIG_MAGIC = 0x72730137 /* r s \1 7 */
+```
+
+## Signatures
+
+Signatures consist of a header followed by a number of block
+signatures.
+
+Each block signature gives signature hashes for one block of
+`block_len` bytes from the input data file. The final data block
+may be shorter. The number of blocks in the signature is therefore
+
+ ceil(input_len/block_len)
+
+The signature header is (see `rs_sig_s_header`):
+
+ u32 magic; // either RS_MD4_SIG_MAGIC or RS_BLAKE2_SIG_MAGIC
+ u32 block_len; // bytes per block
+ u32 strong_sum_len; // bytes per strong sum in each block
+
+The block signature contains a rolling or weak checksum used to find
+moved data, and a strong hash used to check the match is correct.
+The weak checksum is computed as in `rollsum.c`. The strong hash is
+either MD4 or BLAKE2 depending on the magic number.
+
+To make the signatures smaller at a cost of a greater chance of collisions,
+the `strong_sum_len` in the header can cause the strong sum to be truncated
+to the left after computation.
+
+Each signature block format is (see `rs_sig_do_block`):
+
+ u32 weak_sum;
+ u8[strong_sum_len] strong_sum;