author | Donovan Baarda <abo@minkirri.apana.org.au> | 2020-05-14 12:06:06 +1000 |
---|---|---|
committer | Donovan Baarda <abo@minkirri.apana.org.au> | 2020-05-14 12:06:06 +1000 |
commit | 985b1e3a058be46dea429b42d1ab7e69ada7e1ff (patch) | |
tree | 8808ad05b5b9fef02835adf460b5e5a17b2920c6 /doc | |
parent | 63956da1029a82adac454c1d5ee4b3a576dfe6e0 (diff) | |
download | librsync-985b1e3a058be46dea429b42d1ab7e69ada7e1ff.tar.gz |
Update and tidy format.md for new magic types.
Diffstat (limited to 'doc')
-rw-r--r-- | doc/format.md | 61 |
1 files changed, 24 insertions, 37 deletions
diff --git a/doc/format.md b/doc/format.md
index 6180b08..eb505c1 100644
--- a/doc/format.md
+++ b/doc/format.md
@@ -12,52 +12,39 @@ All integers are big-endian.

 ## Magic numbers

-All librsync files start with a `uint32` magic number identifying them.
-These are declared in `librsync.h`:
-
-```
-/** A delta file. At present, there's only one delta format. **/
-RS_DELTA_MAGIC = 0x72730236, /* r s \2 6 */
-
-/**
- * A signature file with MD4 signatures. Backward compatible with
- * librsync < 1.0, but strongly deprecated because it creates a security
- * vulnerability on files containing partly untrusted data. See
- * <https://github.com/librsync/librsync/issues/5>.
- **/
-RS_MD4_SIG_MAGIC = 0x72730136, /* r s \1 6 */
-
-/**
- * A signature file using the BLAKE2 hash. Supported from librsync 1.0.
- **/
-RS_BLAKE2_SIG_MAGIC = 0x72730137 /* r s \1 7 */
-```
+All librsync files start with a u32 \ref rs_magic_number identifying them.
+These are declared in `librsync.h`, and there are different numbers for every
+different signature and delta file type. Note magic numbers for newer file
+types are not supported by older versions of librsync. Older librsync versions
+will immediately fail with an error when they encounter file types they don't
+support.

 ## Signatures

-Signatures consist of a header followed by a number of block
-signatures.
+Signatures consist of a header followed by a number of block signatures for
+each block in the data file.

-Each block signature gives signature hashes for one block of
-`block_len` bytes from the input data file. The final data block
-may be shorter. The number of blocks in the signature is therefore
+The signature header is:

-    ceil(input_len/block_len)
+    u32 magic; // Some RS_*_SIG_MAGIC value.
+    u32 block_len; // Bytes per block.
+    u32 strong_sum_len; // Bytes per strong sum in each block.

-The signature header is (see `rs_sig_s_header`):
+Each block signature includes a weaksum followed by a truncated strongsum hash
+for one block of `block_len` bytes from the input data file. The strongsum
+signature will be truncated to the `strong_sum_len` in the header. The final
+data block may be shorter. The number of blocks in the signature is therefore

-    u32 magic; // either RS_MD4_SIG_MAGIC or RS_BLAKE2_SIG_MAGIC
-    u32 block_len; // bytes per block
-    u32 strong_sum_len; // bytes per strong sum in each block
+    ceil(input_len/block_len)

-The block signature contains a rolling or weak checksum used to find
-moved data, and a strong hash used to check the match is correct.
-The weak checksum is computed as in `rollsum.c`. The strong hash is
-either MD4 or BLAKE2 depending on the magic number.
+The block signature weak checksum is used as a rolling checksum to find moved
+data, and a strong hash used to check the match is correct. The weak checksum
+is either a rollsum (based on adler32) or (better alternative) rabinkarp, and
+the strong hash is either MD4 or BLAKE2 depending on the magic number.

-To make the signatures smaller at a cost of a greater chance of collisions,
-the `strong_sum_len` in the header can cause the strong sum to be truncated
-to the left after computation.
+Truncating the strongsum makes the signatures smaller at a cost of a greater
+chance of collisions. The strongsums are truncated by keeping the left most
+(first) bytes after computation.

 Each signature block format is (see `rs_sig_do_block`):

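
For readers of this change, the signature header layout and block count described in the updated text can be sketched in C as follows. This is only an illustration under stated assumptions: `sig_header`, `read_be32`, `parse_sig_header`, `sig_block_count`, and `truncate_strong_sum` are hypothetical names invented here and are not part of the librsync API. The sketch relies only on what the document states: three big-endian `u32` fields, `ceil(input_len/block_len)` blocks, and strong sums truncated by keeping the leftmost bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory view of the 12-byte signature header
 * (not librsync's own type). */
typedef struct {
    uint32_t magic;          /* Some RS_*_SIG_MAGIC value. */
    uint32_t block_len;      /* Bytes per block. */
    uint32_t strong_sum_len; /* Bytes per strong sum in each block. */
} sig_header;

/* Decode one big-endian u32 from four bytes. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Parse the header from buf. Returns 0 on success, or -1 if fewer than
 * 12 bytes are available or block_len is zero. */
static int parse_sig_header(const unsigned char *buf, size_t len,
                            sig_header *out)
{
    if (len < 12)
        return -1;
    out->magic = read_be32(buf);
    out->block_len = read_be32(buf + 4);
    out->strong_sum_len = read_be32(buf + 8);
    return out->block_len == 0 ? -1 : 0;
}

/* Number of block signatures: ceil(input_len / block_len). The final
 * block may be shorter, so any remainder adds one more block. */
static uint64_t sig_block_count(uint64_t input_len, uint32_t block_len)
{
    assert(block_len != 0);
    return input_len / block_len + (input_len % block_len != 0 ? 1 : 0);
}

/* Truncate a strong sum by keeping its leftmost (first) bytes.
 * Returns the number of bytes copied into out. */
static size_t truncate_strong_sum(const unsigned char *full, size_t full_len,
                                  size_t strong_sum_len, unsigned char *out)
{
    size_t n = strong_sum_len < full_len ? strong_sum_len : full_len;
    memcpy(out, full, n);
    return n;
}
```

A caller would typically read the first 12 bytes of a signature file, pass them to `parse_sig_header`, and then expect `sig_block_count(input_len, block_len)` block signatures to follow. Validating `magic` against the values declared in `librsync.h` is deliberately left out of the sketch.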