Diffstat (limited to 'doc/development/uploads/implementation.md')
-rw-r--r-- | doc/development/uploads/implementation.md | 193 |
1 files changed, 7 insertions, 186 deletions
diff --git a/doc/development/uploads/implementation.md b/doc/development/uploads/implementation.md
index 13a875cd1af..1ad1aec23f2 100644
--- a/doc/development/uploads/implementation.md
+++ b/doc/development/uploads/implementation.md
@@ -1,190 +1,11 @@
 ---
-stage: none
-group: unassigned
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: 'index.md'
+remove_date: '2022-07-25'
 ---
 
-# Uploads guide: How uploads work technically
+This document was moved to [another location](index.md).
 
-This page is for developers trying to better understand what kinds of uploads exist in GitLab and how they are implemented.
-
-## Kinds of uploads and how to choose between them
-
-We can identify three major use-cases for an upload:
-
-1. **storage:** if we are uploading for storing a file (like artifacts, packages, or discussion attachments). In this case [direct upload](#direct-upload) is the proper level as it's the less resource-intensive operation. Additional information can be found on [File Storage in GitLab](../file_storage.md).
-1. **in-controller/synchronous processing:** if we allow processing **small files** synchronously, using [disk buffered upload](#disk-buffered-upload) may speed up development.
-1. **Sidekiq/asynchronous processing:** Asynchronous processing must implement [direct upload](#direct-upload), the reason being that it's the only way to support Cloud Native deployments without a shared NFS.
-
-Selecting the proper acceleration is a tradeoff between speed of development and operational costs.
-
-For more details about currently broken feature see [epic &1802](https://gitlab.com/groups/gitlab-org/-/epics/1802).
-
-### Handling repository uploads
-
-Some features involves Git repository uploads without using a regular Git client.
-Some examples are uploading a repository file from the web interface and [design management](../../user/project/issues/design_management.md).
-
-Those uploads requires the rails controller to act as a Git client in lieu of the user.
-Those operation falls into _in-controller/synchronous processing_ category, but we have no warranties on the file size.
-
-In case of a LFS upload, the file pointer is committed synchronously, but file upload to object storage is performed asynchronously with Sidekiq.
-
-## Upload encodings
-
-By upload encoding we mean how the file is included within the incoming request.
-
-We have three kinds of file encoding in our uploads:
-
-1. <i class="fa fa-check-circle"></i> **multipart**: `multipart/form-data` is the most common, a file is encoded as a part of a multipart encoded request.
-1. <i class="fa fa-check-circle"></i> **body**: some APIs uploads files as the whole request body.
-1. <i class="fa fa-times-circle"></i> **JSON**: some JSON APIs upload files as base64-encoded strings. This requires a change to GitLab Workhorse,
- which is tracked [in this issue](https://gitlab.com/gitlab-org/gitlab/-/issues/325068).
-
-## Uploading technologies
-
-By uploading technologies we mean how all the involved services interact with each other.
-
-GitLab supports 3 kinds of uploading technologies, here follows a brief description with a sequence diagram for each one. Diagrams are not meant to be exhaustive.
-
-### Rack Multipart upload
-
-This is the default kind of upload, and it's the most expensive in terms of resources.
-
-In this case, Workhorse is unaware of files being uploaded and acts as a regular proxy.
-
-When a multipart request reaches the rails application, `Rack::Multipart` leaves behind temporary files in `/tmp` and uses valuable Ruby process time to copy files around.
-
-```mermaid
-sequenceDiagram
- participant c as Client
- participant w as Workhorse
- participant r as Rails
-
- activate c
- c ->>+w: POST /some/url/upload
- w->>+r: POST /some/url/upload
-
- r->>r: save the incoming file on /tmp
- r->>r: read the file for processing
-
- r-->>-c: request result
- deactivate c
- deactivate w
-```
-
-### Disk buffered upload
-
-This kind of upload avoids wasting resources caused by handling upload writes to `/tmp` in rails.
-
-This optimization is not active by default on REST API requests.
-
-When enabled, Workhorse looks for files in multipart MIME requests, uploading
-any it finds to a temporary file on shared storage. The MIME data in the request
-is replaced with the path to the corresponding file before it is forwarded to
-Rails.
-
-To prevent abuse of this feature, Workhorse signs the modified request with a
-special header, stating which entries it modified. Rails ignores any
-unsigned path entries.
-
-```mermaid
-sequenceDiagram
- participant c as Client
- participant w as Workhorse
- participant r as Rails
- participant s as NFS
-
- activate c
- c ->>+w: POST /some/url/upload
-
- w->>+s: save the incoming file on a temporary location
- s-->>-w: request result
-
- w->>+r: POST /some/url/upload
- Note over w,r: file was replaced with its location<br>and other metadata
-
- opt requires async processing
- r->>+redis: schedule a job
- redis-->>-r: job is scheduled
- end
-
- r-->>-c: request result
- deactivate c
- w->>-w: cleanup
-
- opt requires async processing
- activate sidekiq
- sidekiq->>+redis: fetch a job
- redis-->>-sidekiq: job
-
- sidekiq->>+s: read file
- s-->>-sidekiq: file
-
- sidekiq->>sidekiq: process file
-
- deactivate sidekiq
- end
-```
-
-### Direct upload
-
-This is the more advanced acceleration technique we have in place.
-
-Workhorse asks Rails for temporary pre-signed object storage URLs and directly uploads to object storage.
-
-In this setup, an extra Rails route must be implemented in order to handle authorization. Examples of this can be found in:
-
-- [`Projects::LfsStorageController`](https://gitlab.com/gitlab-org/gitlab/-/blob/cc723071ad337573e0360a879cbf99bc4fb7adb9/app/controllers/projects/lfs_storage_controller.rb)
- and [its routes](https://gitlab.com/gitlab-org/gitlab/-/blob/cc723071ad337573e0360a879cbf99bc4fb7adb9/config/routes/git_http.rb#L31-32).
-- [API endpoints for uploading packages](../packages.md#file-uploads).
-
-Direct upload falls back to _disk buffered upload_ when `direct_upload` is disabled inside the [object storage setting](../../administration/uploads.md#object-storage-settings).
-The answer to the `/authorize` call contains only a file system path.
-
-```mermaid
-sequenceDiagram
- participant c as Client
- participant w as Workhorse
- participant r as Rails
- participant os as Object Storage
-
- activate c
- c ->>+w: POST /some/url/upload
-
- w ->>+r: POST /some/url/upload/authorize
- Note over w,r: this request has an empty body
- r-->>-w: presigned OS URL
-
- w->>+os: PUT file
- Note over w,os: file is stored on a temporary location. Rails select the destination
- os-->>-w: request result
-
- w->>+r: POST /some/url/upload
- Note over w,r: file was replaced with its location<br>and other metadata
-
- r->>+os: move object to final destination
- os-->>-r: request result
-
- opt requires async processing
- r->>+redis: schedule a job
- redis-->>-r: job is scheduled
- end
-
- r-->>-c: request result
- deactivate c
- w->>-w: cleanup
-
- opt requires async processing
- activate sidekiq
- sidekiq->>+redis: fetch a job
- redis-->>-sidekiq: job
-
- sidekiq->>+os: get object
- os-->>-sidekiq: file
-
- sidekiq->>sidekiq: process file
-
- deactivate sidekiq
- end
-```
+<!-- This redirect file can be deleted after <2022-07-25>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->