author Russell Belfer <rb@github.com> 2013-01-02 17:14:00 -0800
committer Russell Belfer <rb@github.com> 2013-01-04 15:47:43 -0800
commit 77cffa31db07187c2fa65457ace1b6cb2547dc5b
tree 95228829b8f5f4db980e3f37501d9b4eed20addf /docs
parent b3fb9237c215e9a0e2e042afd9252d541ce40541
Simplify checkout documentation
This moves a lot of the detailed checkout documentation into a new file (docs/checkout-internals.md) and simplifies the public docs for the checkout API.
Diffstat (limited to 'docs')
-rw-r--r-- docs/checkout-internals.md 203
1 files changed, 203 insertions, 0 deletions
diff --git a/docs/checkout-internals.md b/docs/checkout-internals.md
new file mode 100644
index 000000000..cb646da5d
--- /dev/null
+++ b/docs/checkout-internals.md
@@ -0,0 +1,203 @@
+Checkout Internals
+==================
+
+Checkout has to handle a lot of different cases. It examines the
+differences between the target tree, the baseline tree and the working
+directory, plus the contents of the index, and groups files into five
+categories:
+
+1. UNMODIFIED - Files that match in all places.
+2. SAFE - Files where the working directory and the baseline content
+ match that can be safely updated to the target.
+3. DIRTY/MISSING - Files where the working directory differs from the
+ baseline but there is no conflicting change with the target. One
+ example is a file that doesn't exist in the working directory - no
+ data would be lost as a result of writing this file. Which action
+ will be taken with these files depends on the options you use.
+4. CONFLICTS - Files where changes in the working directory conflict
+ with changes to be applied by the target. If conflicts are found,
+ they prevent any other modifications from being made (although there
+ are options to override that and force the update, of course).
+5. UNTRACKED/IGNORED - Files in the working directory that are untracked
+ or ignored (i.e. only in the working directory, not the other places).
+
+Right now, this classification is done via 3 iterators (for the three
+trees), with a final lookup in the index. At some point, this may move to
+a 4 iterator version to incorporate the index better.
+
+The actual checkout is done in five phases (at least right now).
+
+1. The diff between the baseline and the target tree is used as a base
+ list of possible updates to be applied.
+2. Iterate through the diff and the working directory, building a list of
+ actions to be taken (and sending notifications about conflicts and
+ dirty files).
+3. Remove any files / directories as needed (because alphabetical
+ iteration means that an untracked directory will end up sorted *after*
+ a blob that should be checked out with the same name).
+4. Update all blobs.
+5. Update all submodules (after phase 4, in case a new .gitmodules blob
+ was checked out).
+
+Checkout could be driven either off a target-to-workdir diff or a
+baseline-to-target diff. There are pros and cons of each.
+
+Target-to-workdir means the diff includes every file that could be
+modified, which simplifies bookkeeping, but the code to constantly refer
+back to the baseline gets complicated.
+
+Baseline-to-target has simpler code because the diff defines the action to
+take, but needs special handling for untracked and ignored files, if they
+need to be removed.
+
+The current checkout implementation is based on a baseline-to-target diff.
+
+
+Picking Actions
+===============
+
+The most interesting aspect of this is phase 2, picking the actions that
+should be taken. There are a lot of corner cases, so it may be easier to
+start by looking at the rules for a simple 2-iterator diff:
+
+Key
+---
+- B1,B2,B3 - blobs with different SHAs,
+- Bi - ignored blob (WD only)
+- T1,T2,T3 - trees with different SHAs,
+- Ti - ignored tree (WD only)
+- x - nothing
+
+Diff with 2 non-workdir iterators
+---------------------------------
+
+        Old  New
+        ---  ---
+     0  x    x   - nothing
+     1  x    B1  - added blob
+     2  x    T1  - added tree
+     3  B1   x   - removed blob
+     4  B1   B1  - unmodified blob
+     5  B1   B2  - modified blob
+     6  B1   T1  - typechange blob -> tree
+     7  T1   x   - removed tree
+     8  T1   B1  - typechange tree -> blob
+     9  T1   T1  - unmodified tree
+    10  T1   T2  - modified tree (implies modified/added/removed blob inside)
+
+
+Now, let's make the "New" iterator into a working directory iterator, so
+we replace "added" items with either untracked or ignored, like this:
+
+Diff with non-work & workdir iterators
+--------------------------------------
+
+        Old  New-WD
+        ---  ------
+     0  x    x      - nothing
+     1  x    B1     - untracked blob
+     2  x    Bi     - ignored file
+     3  x    T1     - untracked tree
+     4  x    Ti     - ignored tree
+     5  B1   x      - removed blob
+     6  B1   B1     - unmodified blob
+     7  B1   B2     - modified blob
+     8  B1   T1     - typechange blob -> tree
+     9  B1   Ti     - removed blob AND ignored tree as separate items
+    10  T1   x      - removed tree
+    11  T1   B1     - typechange tree -> blob
+    12  T1   Bi     - removed tree AND ignored blob as separate items
+    13  T1   T1     - unmodified tree
+    14  T1   T2     - modified tree (implies modified/added/removed blob inside)
+
+Note: if there is a corresponding entry in the old tree, then a working
+directory item won't be ignored (i.e. no Bi or Ti for tracked items).
+
+
+Now, expand this to three iterators: a baseline tree, a target tree, and
+an actual working directory tree:
+
+Checkout From 3 Iterators (2 not workdir, 1 workdir)
+----------------------------------------------------
+
+(base == old HEAD; target == what to checkout; actual == working dir)
+
+         base  target  actual/workdir
+         ----  ------  --------------
+     0   x     x       x              - nothing
+     1   x     x       B1/Bi/T1/Ti    - untracked/ignored blob/tree (SAFE)
+     2+  x     B1      x              - add blob (SAFE)
+     3   x     B1      B1             - independently added blob (FORCEABLE-2)
+     4*  x     B1      B2/Bi/T1/Ti    - add blob with content conflict (FORCEABLE-2)
+     5+  x     T1      x              - add tree (SAFE)
+     6*  x     T1      B1/Bi          - add tree with blob conflict (FORCEABLE-2)
+     7   x     T1      T1/i           - independently added tree (SAFE+MISSING)
+     8   B1    x       x              - independently deleted blob (SAFE+MISSING)
+     9-  B1    x       B1             - delete blob (SAFE)
+    10-  B1    x       B2             - delete of modified blob (FORCEABLE-1)
+    11   B1    x       T1/Ti          - independently deleted blob AND untrack/ign tree (SAFE+MISSING !!!)
+    12   B1    B1      x              - locally deleted blob (DIRTY || SAFE+CREATE)
+    13+  B1    B2      x              - update to deleted blob (SAFE+MISSING)
+    14   B1    B1      B1             - unmodified file (SAFE)
+    15   B1    B1      B2             - locally modified file (DIRTY)
+    16+  B1    B2      B1             - update unmodified blob (SAFE)
+    17   B1    B2      B2             - independently updated blob (FORCEABLE-1)
+    18+  B1    B2      B3             - update to modified blob (FORCEABLE-1)
+    19   B1    B1      T1/Ti          - locally deleted blob AND untrack/ign tree (DIRTY)
+    20*  B1    B2      T1/Ti          - update to deleted blob AND untrack/ign tree (F-1)
+    21+  B1    T1      x              - add tree with locally deleted blob (SAFE+MISSING)
+    22*  B1    T1      B1             - add tree AND deleted blob (SAFE)
+    23*  B1    T1      B2             - add tree with delete of modified blob (F-1)
+    24   B1    T1      T1             - add tree with deleted blob (F-1)
+    25   T1    x       x              - independently deleted tree (SAFE+MISSING)
+    26   T1    x       B1/Bi          - independently deleted tree AND untrack/ign blob (F-1)
+    27-  T1    x       T1             - deleted tree (MAYBE SAFE)
+    28+  T1    B1      x              - deleted tree AND added blob (SAFE+MISSING)
+    29   T1    B1      B1             - independently typechanged tree -> blob (F-1)
+    30+  T1    B1      B2             - typechange tree->blob with conflicting blob (F-1)
+    31*  T1    B1      T1/T2          - typechange tree->blob (MAYBE SAFE)
+    32+  T1    T1      x              - restore locally deleted tree (SAFE+MISSING)
+    33   T1    T1      B1/Bi          - locally typechange tree->untrack/ign blob (DIRTY)
+    34   T1    T1      T1/T2          - unmodified tree (MAYBE SAFE)
+    35+  T1    T2      x              - update locally deleted tree (SAFE+MISSING)
+    36*  T1    T2      B1/Bi          - update to tree with typechanged tree->blob conflict (F-1)
+    37   T1    T2      T1/T2/T3       - update to existing tree (MAYBE SAFE)
+
+The number is followed by ' ' if no change is needed, '+' if the case
+needs to write to disk, '-' if something must be deleted, and '*' if
+there should be a delete followed by a write.
+
+Each case is labeled with one of five tiers:
+
+- SAFE == completely safe to update
+- SAFE+MISSING == safe except the workdir is missing the expected content
+- MAYBE SAFE == safe if workdir tree matches (or is missing) baseline
+ content, which is unknown at this point
+- FORCEABLE == conflict unless FORCE is given
+- DIRTY == no conflict but change is not applied unless FORCE
+
+Some slightly unusual circumstances:
+
+ 8 - parent dir is only deleted when file is, so parent will be left if
+ empty even though it would be deleted if the file were present
+ 11 - core git does not consider this a conflict but attempts to delete T1
+ and gives "unable to unlink file" error yet does not skip the rest
+ of the operation
+ 12 - without FORCE file is left deleted (i.e. not restored) so new wd is
+ dirty (and warning message "D file" is printed), with FORCE, file is
+ restored.
+ 24 - This should be considered MAYBE SAFE since effectively it is 7 and 8
+ combined, but core git considers this a conflict unless forced.
+ 26 - This combines two cases (1 & 25) (and also implied 8 for tree content)
+ which are ok on their own, but core git treats this as a conflict.
+ If not forced, this is a conflict. If forced, this actually doesn't
+ have to write anything and leaves the new blob as an untracked file.
+ 32 - This is the only case where the baseline and target values match
+ and yet we will still write to the working directory. In all other
+ cases, if baseline == target, we don't touch the workdir (it is
+ either already right or is "dirty"). However, since this case also
+ implies that a ?/B1/x case will exist as well, it can be skipped.
+
+Cases 3, 17, 24, 26, and 29 are all considered conflicts even though
+none of them will require making any updates to the working directory.
+
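The blob-only rows of the three-iterator table (cases 0-4, 8-10 and 12-18, with the workdir entry restricted to a plain blob or nothing) can be restated as a small decision function. The following is only an illustrative sketch in Python, not libgit2 code: the function name `classify_blob_case`, the use of `None` for "x", and the "NOTHING" label for case 0 are assumptions made for this example. Tree, ignored and typechange rows are deliberately left out.

```python
from typing import Optional, Tuple


def classify_blob_case(
    base: Optional[str],
    target: Optional[str],
    workdir: Optional[str],
) -> Tuple[int, str]:
    """Return (case number, tier) for the blob-only rows of the table.

    Hypothetical helper: each argument is a blob id (any string) or
    None for "x" (nothing). Only blobs are modeled here.
    """
    if base is None:
        if target is None:
            # Case 0 has no tier in the table; "NOTHING" is our own label.
            return (0, "NOTHING") if workdir is None else (1, "SAFE")
        if workdir is None:
            return (2, "SAFE")  # add blob
        if workdir == target:
            return (3, "FORCEABLE-2")  # independently added blob
        return (4, "FORCEABLE-2")  # add blob with content conflict

    if target is None:
        if workdir is None:
            return (8, "SAFE+MISSING")  # independently deleted blob
        if workdir == base:
            return (9, "SAFE")  # delete blob
        return (10, "FORCEABLE-1")  # delete of modified blob

    if target == base:
        if workdir is None:
            # The table lists "DIRTY || SAFE+CREATE"; DIRTY is the
            # outcome without FORCE, which is what we return here.
            return (12, "DIRTY")
        if workdir == base:
            return (14, "SAFE")  # unmodified file
        return (15, "DIRTY")  # locally modified file

    # From here on, base and target are different blobs.
    if workdir is None:
        return (13, "SAFE+MISSING")  # update to deleted blob
    if workdir == base:
        return (16, "SAFE")  # update unmodified blob
    if workdir == target:
        return (17, "FORCEABLE-1")  # independently updated blob
    return (18, "FORCEABLE-1")  # update to modified blob
```

For instance, `classify_blob_case("a", "b", "a")` yields case 16 (SAFE), while `classify_blob_case("a", "b", "c")` yields case 18 (FORCEABLE-1). The nesting follows the table's own ordering: baseline first, then target, then the working directory.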