1 files changed, 147 insertions, 0 deletions
diff --git a/doc/topics/git/partial_clone.md b/doc/topics/git/partial_clone.md
new file mode 100644
index 00000000000..f2951308ba1
--- /dev/null
+++ b/doc/topics/git/partial_clone.md
@@ -0,0 +1,147 @@
+# Partial Clone for Large Repositories
+
+CAUTION: **Alpha:**
+Partial Clone is an experimental feature, and will significantly increase
+Gitaly resource utilization when performing a partial clone, and decrease
+performance of subsequent fetch operations.
+
+As Git repositories become very large, usability decreases as performance
+decreases. One major challenge is cloning the repository, because Git will
+download the entire repository including every commit and every version of
+every object. This can be slow to transfer, and can require large amounts of
+disk space.
+
+Historically, performing a **shallow clone**
+([`--depth`](https://www.git-scm.com/docs/git-clone#Documentation/git-clone.txt---depthltdepthgt))
+has been the only way to reduce the amount of data transferred when cloning
+a Git repository. A shallow clone cannot, however, filter by subtree, which is
+important for monolithic repositories containing many projects, or by object
+size, which prevents unnecessarily large objects from being downloaded.
+
+[Partial clone](https://github.com/git/git/blob/master/Documentation/technical/partial-clone.txt)
+is a performance optimization that "allows Git to function without having a
+complete copy of the repository. The goal of this work is to allow Git better
+handle extremely large repositories."
+
+Specifically, using partial clone, it should be possible for Git to natively
+support:
+
+- large objects, instead of using [Git LFS](https://git-lfs.github.com/)
+- enormous repositories
+
+Briefly, partial clone works by:
+
+- excluding objects from being transferred when cloning or fetching a
+  repository using a new `--filter` flag
+- downloading missing objects on demand
+
+Follow [Git for enormous repositories](https://gitlab.com/groups/gitlab-org/-/epics/773) for roadmap and updates.
+
+## Enabling partial clone
+
+GitLab 12.1 uses Git 2.21.0, which has an arbitrary file access security
+vulnerability when `uploadpack.allowFilter` is enabled. Do not enable this
+option in production environments.
+
+A feature flag is planned to enable `uploadpack.allowFilter` and
+`uploadpack.allowAnySHA1InWant` once the version of Git used by GitLab has been
+updated to Git 2.22.0.
+
+Follow [this issue](https://gitlab.com/gitlab-org/gitaly/issues/1553) for
+updates.
+
+## Excluding objects by size
+
+Partial Clone allows large objects to be stored directly in the Git repository,
+and be excluded from clones as desired by the user. This eliminates the
+error-prone process of deciding which objects should be stored in LFS. Using
+partial clone, all files, large or small, may be treated the same.
+
+With the `uploadpack.allowFilter` and `uploadpack.allowAnySHA1InWant` options
+enabled on the Git server:
+
+```bash
+# clone the repo, excluding blobs larger than 1 megabyte
+git clone --filter=blob:limit=1m <url>
+
+# during the checkout step of the clone, and in any subsequent operations,
+# any blobs that are needed are downloaded on demand
+git checkout feature-branch
+```
+
+## Excluding objects by path
+
+Partial Clone allows clones to be filtered by path using a format similar to a
+`.gitignore` file stored inside the repository.
+
+With the `uploadpack.allowFilter` and `uploadpack.allowAnySHA1InWant` options
+enabled on the Git server:
+
+1. **Create a filter spec.** For example, consider a monolithic repository with
+   many applications, each in a different subdirectory in the root. Create a file
+   `shiny-app/.gitfilterspec` using the GitLab web interface:
+
+   ```.gitignore
+   # Only the paths listed in the file will be downloaded when performing a
+   # partial clone using `--filter=sparse:oid=master:shiny-app/.gitfilterspec`
+
+   # Explicitly include filterspec needed to configure sparse checkout with
+   # git config --local core.sparsecheckout true
+   # git show master:shiny-app/.gitfilterspec >> .git/info/sparse-checkout
+   shiny-app/.gitfilterspec
+
+   # Shiny App
+   shiny-app/
+
+   # Dependencies
+   shimmery-app/
+   shared-component-a/
+   shared-component-b/
+   ```
+
+2. **Create a new Git repository and fetch.** Support for `--filter=sparse:oid`
+   using the clone command is incomplete, so we will emulate the clone command
+   by hand, using `git init` and `git fetch`. Follow
+   [gitaly#1769](https://gitlab.com/gitlab-org/gitaly/issues/1769) for updates.
+
+    ```bash
+    # Create a new directory for the Git repository
+    mkdir jumbo-repo && cd jumbo-repo
+
+    # Initialize a new Git repository
+    git init
+
+    # Add the remote
+    git remote add origin git@gitlab.com:example/jumbo-repo
+
+    # Enable partial clone support for the remote
+    git config --local extensions.partialClone origin
+
+    # Fetch the filtered set of objects using the filterspec stored on the
+    # server. WARNING: this step is slow!
+    git fetch --filter=sparse:oid=master:shiny-app/.gitfilterspec origin
+
+    # Optional: observe there are missing objects that we have not fetched
+    git rev-list --all --quiet --objects --missing=print | wc -l
+    ```
+
+    CAUTION: **IDE and Shell integrations:**
+    Git integrations with `bash`, `zsh`, etc., and editors that automatically
+    show Git status information, often run `git fetch`, which will fetch the
+    entire repository. You may need to disable or reconfigure these
+    integrations.
+
+3. **Sparse checkout** must be enabled and configured to prevent objects from
+   other paths being downloaded automatically when checking out branches. Follow
+   [gitaly#1765](https://gitlab.com/gitlab-org/gitaly/issues/1765) for updates.
+
+    ```bash
+    # Enable sparse checkout
+    git config --local core.sparsecheckout true
+
+    # Configure sparse checkout
+    git show master:shiny-app/.gitfilterspec >> .git/info/sparse-checkout
+
+    # Check out master
+    git checkout master
+    ```