summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorAchilleas Pipinellis <axil@gitlab.com>2019-08-05 11:41:32 +0000
committerAchilleas Pipinellis <axil@gitlab.com>2019-08-05 11:41:32 +0000
commitfdb893490db51779c36a329d1a8c278f74d41798 (patch)
treeb150ff02d0bee7d1555b0e7bc4222ee65a03bf39
parent715f5de4aa0f1c8d32eecfd54b6610bf8eb67c76 (diff)
parent9cbd5994c059769ecd5794574b5df84082f06600 (diff)
downloadgitlab-ce-fdb893490db51779c36a329d1a8c278f74d41798.tar.gz
Merge branch 'jramsay/partial-clone-docs' into 'master'
Add minimal partial clone docs Closes gitaly#1581 See merge request gitlab-org/gitlab-ce!30295
-rw-r--r--doc/topics/git/index.md1
-rw-r--r--doc/topics/git/partial_clone.md147
2 files changed, 148 insertions, 0 deletions
diff --git a/doc/topics/git/index.md b/doc/topics/git/index.md
index 5b227ebebe0..6a539b526f3 100644
--- a/doc/topics/git/index.md
+++ b/doc/topics/git/index.md
@@ -72,6 +72,7 @@ The following are advanced topics for those who want to get the most out of Git:
- [Custom Git Hooks](../../administration/custom_hooks.md)
- [Git Attributes](../../user/project/git_attributes.md)
- Git Submodules: [Using Git submodules with GitLab CI](../../ci/git_submodules.md#using-git-submodules-with-gitlab-ci)
+- [Partial Clone](partial_clone.md)
## API
diff --git a/doc/topics/git/partial_clone.md b/doc/topics/git/partial_clone.md
new file mode 100644
index 00000000000..9b8cf269684
--- /dev/null
+++ b/doc/topics/git/partial_clone.md
@@ -0,0 +1,147 @@
+# Partial Clone for Large Repositories
+
+CAUTION: **Alpha:**
+Partial Clone is an experimental feature, and will significantly increase
+Gitaly resource utilization when performing a partial clone, and decrease
+performance of subsequent fetch operations.
+
+As Git repositories become very large, usability decreases as performance
+decreases. One major challenge is cloning the repository, because Git will
+download the entire repository including every commit and every version of
+every object. This can be slow to transfer, and require large amounts of disk
+space.
+
+Historically, performing a **shallow clone**
+([`--depth`](https://www.git-scm.com/docs/git-clone#Documentation/git-clone.txt---depthltdepthgt))
+has been the only way to reduce the amount of data transferred when cloning
+a Git repository. This does not, however, allow filtering by sub-tree which is
+important for monolithic repositories containing many projects, or by object
+size preventing unnecessary large objects being downloaded.
+
+[Partial clone](https://github.com/git/git/blob/master/Documentation/technical/partial-clone.txt)
+is a performance optimization that "allows Git to function without having a
+complete copy of the repository. The goal of this work is to allow Git better
+handle extremely large repositories."
+
+Specifically, using partial clone, it should be possible for Git to natively
+support:
+
+- large objects, instead of using [Git LFS](https://git-lfs.github.com/)
+- enormous repositories
+
+Briefly, partial clone works by:
+
+- excluding objects from being transferred when cloning or fetching a
+repository using a new `--filter` flag
+- downloading missing objects on demand
+
+Follow [Git for enormous repositories](https://gitlab.com/groups/gitlab-org/-/epics/773) for roadmap and updates.
+
+## Enabling partial clone
+
+GitLab 12.1 uses Git 2.21.0 which has an arbitrary file access security
+vulnerability when `uploadpack.allowFilter` is enabled, and should not be
+enabled in production environments.
+
+A feature flag is planned to enable `uploadpack.allowFilter` and
+`uploadpack.allowAnySHA1InWant` once the version of Git used by GitLab has been
+updated to Git 2.22.0.
+
+Follow [this issue](https://gitlab.com/gitlab-org/gitaly/issues/1553) for
+updated.
+
+## Excluding objects by size
+
+Partial Clone allows large objects to be stored directly in the Git repository,
+and be excluded from clones as desired by the user. This eliminates the error
+prone process of deciding which objects should be stored in LFS or not. Using
+partial clone, all files – large or small – may be treated the same.
+
+With the `uploadpack.allowFilter` and `uploadpack.allowAnySHA1InWant` options
+enabled on the Git server:
+
+```bash
+# clone the repo, excluding blobs larger than 1 megabyte
+git clone --filter=blob:limit=1m <url>
+
+# in the checkout step of the clone, and any subsequent operations
+# any blobs that are needed will be downloaded on demand
+git checkout feature-branch
+```
+
+## Excluding objects by path
+
+Partial Clone allows clones to be filtered by path using a format similar to a
+`.gitignore` file stored inside the repository.
+
+With the `uploadpack.allowFilter` and `uploadpack.allowAnySHA1InWant` options
+enabled on the Git server:
+
+1. **Create a filter spec.** For example, consider a monolithic repository with
+many applications, each in a different subdirectory in the root. Create a file
+`shiny-app/.filterspec` using the GitLab web interface:
+
+ ```.gitignore
+ # Only the paths listed in the file will be downloaded when performing a
+ # partial clone using `--filter=sparse:oid=shiny-app/.gitfilterspec`
+
+ # Explicitly include filterspec needed to configure sparse checkout with
+ # git config --local core.sparsecheckout true
+ # git show master:snazzy-app/.gitfilterspec >> .git/info/sparse-checkout
+ shiny-app/.gitfilterspec
+
+ # Shiny App
+ shiny-app/
+
+ # Dependencies
+ shimmery-app/
+ shared-component-a/
+ shared-component-b/
+ ```
+
+2. *Create a new Git repository and fetch.* Support for `--filter=sparse:oid`
+using the clone command is incomplete, so we will emulate the clone command
+by hand, using `git init` and `git fetch`. Follow
+[gitaly#1769](https://gitlab.com/gitlab-org/gitaly/issues/1769) for updates.
+
+ ```bash
+ # Create a new directory for the Git repository
+ mkdir jumbo-repo && cd jumbo-repo
+
+ # Initialize a new Git repository
+ git init
+
+ # Add the remote
+ git remote add origin git@gitlab.com/example/jumbo-repo
+
+ # Enable partial clone support for the remote
+ git config --local extensions.partialClone origin
+
+ # Fetch the filtered set of objects using the filterspec stored on the
+ # server. WARNING: this step is slow!
+ git fetch --filter=sparse:oid=master:shiny-app/.gitfilterspec origin
+
+ # Optional: observe there are missing objects that we have not fetched
+ git rev-list --all --quiet --objects --missing=print | wc -l
+ ```
+
+ CAUTION: **IDE and Shell integrations:**
+ Git integrations with `bash`, `zsh`, etc and editors that automatically
+ show Git status information often run `git fetch` which will fetch the
+ entire repository. You many need to disable or reconfigure these
+ integrations.
+
+3. **Sparse checkout** must be enabled and configured to prevent objects from
+other paths being downloaded automatically when checking out branches. Follow
+[gitaly#1765](https://gitlab.com/gitlab-org/gitaly/issues/1765) for updates.
+
+ ```bash
+ # Enable sparse checkout
+ git config --local core.sparsecheckout true
+
+ # Configure sparse checkout
+ git show master:snazzy-app/.gitfilterspec >> .git/info/sparse-checkout
+
+ # Checkout master
+ git checkout master
+ ```