summaryrefslogtreecommitdiff
path: root/doc/developers/initial-push-pull.txt
blob: 046d80587b4011d13d13195c5670ba2c97c55123 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
Initial push / pull
===================

Optimal case
------------
(a motivating example of ultimate performance)
Assume there is a file with exactly the right data in compressed form.  This
may be a tarred branch, a bundle, or a blob format.  Performance in this case
scales with the size of the file.

Disk case
---------
Assume current repo format.  Attempt to achieve parity with ``cp -r``.  Read
each file only 1 time.

- read knit graph for revisions
- write filtered copy of revision knit O(d+a)
- write filtered copy of knit index O(d)
- Open knit index for inventory
- Write a filtered copy of inventory knit and simultaneously not all referenced
  file-ids O(b+d)
- Write filtered copy of inventory knit index O(d)
- For each referenced file-id:

  - Open knit index for each file knit O(e)
  - If acceptable threshold of irrelevant data hard-link O(f)
  - Otherwise write filtered copy of text knit and simultaneously write
    the fulltext to tree transform O(h)

- Write format markers O(1)

:a: size of aggregate revision metadata
:b: size of inventory changes for all revisions
:c: size of text changes for all files and all revisions (e * g)
:d: number of relevant revisions
:e: number of relevant versioned files
:f: size of the particular versioned file knit index
:g: size of the filtered versioned file knit
:h: size of the versioned file fulltext
:i: size of the largest file fulltext

Smart Network Case
------------------

Phase 1
~~~~~~~
Push: ask if there is a repository, and if not, what formats are okay
Pull: Nothing

Phase 2
~~~~~~~
Push: send initial push command, streaming data in acceptable format, following
disk case strategy
Pull: receive initial pull command, specifying format

Pull client complexity: O(a), memory cost O(1)
Push client complexity: procesing and memory cost same as disk case

Dumb Network Case
-----------------
Pull: same as disk case, but request all file knit indices at once and request
al file knits at once.
Push: same as disk case, but write all files at once.

Wants
-----
- Read partial graph
- Read multiple segments of multiple files on HTTP and SFTP
- Write multiple files over SFTP