author Donovan Baarda <abo@minkirri.apana.org.au> 2020-05-14 12:06:06 +1000
committer Donovan Baarda <abo@minkirri.apana.org.au> 2020-05-14 12:06:06 +1000
commit 985b1e3a058be46dea429b42d1ab7e69ada7e1ff
tree 8808ad05b5b9fef02835adf460b5e5a17b2920c6
parent 63956da1029a82adac454c1d5ee4b3a576dfe6e0
Update and tidy format.md for new magic types.
Diffstat (limited to 'doc')
-rw-r--r-- doc/format.md 61
1 files changed, 24 insertions, 37 deletions
diff --git a/doc/format.md b/doc/format.md
index 6180b08..eb505c1 100644
--- a/doc/format.md
+++ b/doc/format.md
@@ -12,52 +12,39 @@ All integers are big-endian.
## Magic numbers
-All librsync files start with a `uint32` magic number identifying them.
-These are declared in `librsync.h`:
-
-```
-/** A delta file. At present, there's only one delta format. **/
-RS_DELTA_MAGIC = 0x72730236, /* r s \2 6 */
-
-/**
- * A signature file with MD4 signatures. Backward compatible with
- * librsync < 1.0, but strongly deprecated because it creates a security
- * vulnerability on files containing partly untrusted data. See
- * <https://github.com/librsync/librsync/issues/5>.
- **/
-RS_MD4_SIG_MAGIC = 0x72730136, /* r s \1 6 */
-
-/**
- * A signature file using the BLAKE2 hash. Supported from librsync 1.0.
- **/
-RS_BLAKE2_SIG_MAGIC = 0x72730137 /* r s \1 7 */
-```
+All librsync files start with a u32 \ref rs_magic_number identifying them.
+These are declared in `librsync.h`, and there are different numbers for every
+different signature and delta file type. Note magic numbers for newer file
+types are not supported by older versions of librsync. Older librsync versions
+will immediately fail with an error when they encounter file types they don't
+support.
## Signatures
-Signatures consist of a header followed by a number of block
-signatures.
+Signatures consist of a header followed by a number of block signatures for
+each block in the data file.
-Each block signature gives signature hashes for one block of
-`block_len` bytes from the input data file. The final data block
-may be shorter. The number of blocks in the signature is therefore
+The signature header is:
- ceil(input_len/block_len)
+ u32 magic; // Some RS_*_SIG_MAGIC value.
+ u32 block_len; // Bytes per block.
+ u32 strong_sum_len; // Bytes per strong sum in each block.
-The signature header is (see `rs_sig_s_header`):
+Each block signature includes a weaksum followed by a truncated strongsum hash
+for one block of `block_len` bytes from the input data file. The strongsum
+signature will be truncated to the `strong_sum_len` in the header. The final
+data block may be shorter. The number of blocks in the signature is therefore
- u32 magic; // either RS_MD4_SIG_MAGIC or RS_BLAKE2_SIG_MAGIC
- u32 block_len; // bytes per block
- u32 strong_sum_len; // bytes per strong sum in each block
+ ceil(input_len/block_len)
-The block signature contains a rolling or weak checksum used to find
-moved data, and a strong hash used to check the match is correct.
-The weak checksum is computed as in `rollsum.c`. The strong hash is
-either MD4 or BLAKE2 depending on the magic number.
+The block signature weak checksum is used as a rolling checksum to find moved
+data, and a strong hash used to check the match is correct. The weak checksum
+is either a rollsum (based on adler32) or (better alternative) rabinkarp, and
+the strong hash is either MD4 or BLAKE2 depending on the magic number.
-To make the signatures smaller at a cost of a greater chance of collisions,
-the `strong_sum_len` in the header can cause the strong sum to be truncated
-to the left after computation.
+Truncating the strongsum makes the signatures smaller at a cost of a greater
+chance of collisions. The strongsums are truncated by keeping the left most
+(first) bytes after computation.
Each signature block format is (see `rs_sig_do_block`):