author GitLab Bot <gitlab-bot@gitlab.com> 2020-10-22 18:00:36 +0000
committer GitLab Bot <gitlab-bot@gitlab.com> 2020-10-22 18:00:36 +0000
commit d8a7223a9ae47971ee3ed3070f5d42f7d9aa2690
tree e84e06a07d4888eb4424aebfa802d7c8030bf6b9 /doc/development/lfs.md
parent 9cf4cc503860dd3b836cf423d88d067f4d120591
Add latest changes from gitlab-org/gitlab@13-5-stable-ee
Diffstat (limited to 'doc/development/lfs.md')
-rw-r--r-- doc/development/lfs.md 71
1 files changed, 71 insertions, 0 deletions
diff --git a/doc/development/lfs.md b/doc/development/lfs.md
index 32e2e3d1bde..3ba81e6a140 100644
--- a/doc/development/lfs.md
+++ b/doc/development/lfs.md
@@ -10,3 +10,74 @@ and the slides on [Google Slides](https://docs.google.com/presentation/d/1E-aw6-
and in [PDF](https://gitlab.com/gitlab-org/create-stage/uploads/07a89257a140db067bdfb484aecd35e1/Git_LFS_Deep_Dive__Create_.pdf).
Everything covered in this deep dive was accurate as of GitLab 11.10, and while specific
details may have changed since then, it should still serve as a good introduction.
+
+## Including LFS blobs in project archives
+
+> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/15079) in GitLab 13.5.
+
+The following diagram illustrates how GitLab resolves LFS files for project archives:
+
+```mermaid
+sequenceDiagram
+ autonumber
+ Client->>+Workhorse: GET /group/project/-/archive/master.zip
+ Workhorse->>+Rails: GET /group/project/-/archive/master.zip
+ Rails->>+Workhorse: Gitlab-Workhorse-Send-Data git-archive
+ Workhorse->>Gitaly: SendArchiveRequest
+ Gitaly->>Git: git archive master
+ Git->>Smudge: OID 12345
+ Smudge->>+Workhorse: GET /internal/api/v4/lfs?oid=12345&gl_repository=project-1234
+ Workhorse->>+Rails: GET /internal/api/v4/lfs?oid=12345&gl_repository=project-1234
+ Rails->>+Workhorse: Gitlab-Workhorse-Send-Data send-url
+ Workhorse->>Smudge: <LFS data>
+ Smudge->>Git: <LFS data>
+ Git->>Gitaly: <streamed data>
+ Gitaly->>Workhorse: <streamed data>
+ Workhorse->>Client: master.zip
+```
+
+1. The user requests the project archive from the UI.
+1. Workhorse forwards this request to Rails.
+1. If the user is authorized to download the archive, Rails replies with
+an HTTP header of `Gitlab-Workhorse-Send-Data` with a base64-encoded
+JSON payload prefaced with `git-archive`. This payload includes the
+`SendArchiveRequest` binary message, which is encoded again in base64.
+1. Workhorse decodes the `Gitlab-Workhorse-Send-Data` payload. If the
+archive already exists in the archive cache, Workhorse sends that
+file. Otherwise, Workhorse sends the `SendArchiveRequest` to the
+appropriate Gitaly server.
+1. The Gitaly server calls `git archive <ref>` to begin generating
+the Git archive on the fly. If the `include_lfs_blobs` flag is enabled,
+Gitaly enables a custom LFS smudge filter via the
+`-c filter.lfs.smudge=/path/to/gitaly-lfs-smudge` Git option.
+1. When `git` identifies a possible LFS pointer using the
+`.gitattributes` file, `git` calls `gitaly-lfs-smudge` and provides the
+LFS pointer via the standard input. Gitaly provides `GL_PROJECT_PATH`
+and `GL_INTERNAL_CONFIG` as environment variables to enable lookup of
+the LFS object.
+1. If a valid LFS pointer is decoded, `gitaly-lfs-smudge` makes an
+internal API call to Workhorse to download the LFS object from GitLab.
+1. Workhorse forwards this request to Rails. If the LFS object exists
+and is associated with the project, Rails sends `ArchivePath` either
+with a path where the LFS object resides (for local disk) or a
+pre-signed URL (when object storage is enabled) via the
+`Gitlab-Workhorse-Send-Data` HTTP header with a payload prefaced with
+`send-url`.
+1. Workhorse retrieves the file and sends it to the `gitaly-lfs-smudge`
+process, which writes the contents to the standard output.
+1. `git` reads this output and sends it back to the Gitaly process.
+1. Gitaly streams the data back to Workhorse.
+1. The archive data is sent back to the client.
+
+In step 7, the `gitaly-lfs-smudge` filter must talk to Workhorse, not to
+Rails, or an invalid LFS blob will be saved. To support this, GitLab
+13.5 [changed the default Omnibus configuration to have Gitaly talk to
+Workhorse](https://gitlab.com/gitlab-org/omnibus-gitlab/-/merge_requests/4592)
+instead of Rails.
+
+One side effect of this change: the correlation ID of the original
+request is not preserved for the internal API requests made by Gitaly
+(or `gitaly-lfs-smudge`), such as the one made in step 8. The
+correlation IDs for those API requests will be random values until [this
+Workhorse issue](https://gitlab.com/gitlab-org/gitlab-workhorse/-/issues/309) is
+resolved.