# File Storage in GitLab

We use the [CarrierWave] gem to handle file upload, storage, and retrieval.

File uploads are used in many places, grouped here by context:

* System
  - Instance Logo (logo visible on the sign in/sign up pages)
  - Header Logo (logo displayed in the navigation bar)
* Group
  - Group avatars
* User
  - User avatars
  - User snippet attachments
* Project
  - Project avatars
  - Issues/MR/Notes Markdown attachments
  - Issues/MR/Notes Legacy Markdown attachments
  - CI Artifacts (archive, metadata, trace)
  - LFS Objects


## Disk storage

GitLab started out saving everything on local disk. Although directory locations have changed since earlier versions,
they are still not fully standardized. The current locations are listed below:

| Description                           | In DB? | Relative path (from CarrierWave.root)                       | Uploader class         | model_type |
| ------------------------------------- | ------ | ----------------------------------------------------------- | ---------------------- | ---------- |
| Instance logo                         | yes    | uploads/-/system/appearance/logo/:id/:filename              | `AttachmentUploader`   | Appearance |
| Header logo                           | yes    | uploads/-/system/appearance/header_logo/:id/:filename       | `AttachmentUploader`   | Appearance |
| Group avatars                         | yes    | uploads/-/system/group/avatar/:id/:filename                 | `AvatarUploader`       | Group      |
| User avatars                          | yes    | uploads/-/system/user/avatar/:id/:filename                  | `AvatarUploader`       | User       |
| User snippet attachments              | yes    | uploads/-/system/personal_snippet/:id/:random_hex/:filename | `PersonalFileUploader` | Snippet    |
| Project avatars                       | yes    | uploads/-/system/project/avatar/:id/:filename               | `AvatarUploader`       | Project    |
| Issues/MR/Notes Markdown attachments        | yes    | uploads/:project_path_with_namespace/:random_hex/:filename  | `FileUploader`         | Project    |
| Issues/MR/Notes Legacy Markdown attachments | no     | uploads/-/system/note/attachment/:id/:filename              | `AttachmentUploader`   | Note       |
| CI Artifacts (CE)                     | yes    | shared/artifacts/:disk_hash[0..1]/:disk_hash[2..3]/:disk_hash/:year_:month_:date/:job_id/:job_artifact_id (:disk_hash is SHA256 digest of project_id) | `JobArtifactUploader`  | Ci::JobArtifact  |
| LFS Objects  (CE)                     | yes    | shared/lfs-objects/:hex/:hex/:object_hash                   | `LfsObjectUploader`    | LfsObject  |
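
To illustrate the CI Artifacts row above, here is a minimal sketch that assembles the relative directory from its segments. The helper name `job_artifact_dir` and its keyword arguments are hypothetical and not part of GitLab. The sketch also assumes that `:disk_hash` is the hex SHA256 digest of the project ID rendered as a string.

```ruby
require "digest"

# Hypothetical helper (not a GitLab API): assembles the relative directory
# described in the CI Artifacts row of the table.
def job_artifact_dir(project_id:, created_at:, job_id:, job_artifact_id:)
  # Assumption: :disk_hash is the hex SHA256 digest of the project ID as a string.
  disk_hash = Digest::SHA256.hexdigest(project_id.to_s)

  File.join(
    "shared/artifacts",
    disk_hash[0..1],                  # first two hex characters
    disk_hash[2..3],                  # next two hex characters
    disk_hash,                        # full digest
    created_at.strftime("%Y_%m_%d"),  # :year_:month_:date
    job_id.to_s,
    job_artifact_id.to_s
  )
end

puts job_artifact_dir(
  project_id: 1,
  created_at: Time.utc(2018, 3, 5),
  job_id: 10,
  job_artifact_id: 20
)
```

Sharding on the first characters of the digest keeps any single directory from accumulating one entry per project.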

CI Artifacts and LFS Objects behave differently in CE and EE. In CE they inherit from `GitlabUploader`,
while in EE they inherit from `ObjectStorage` and store files in an S3 API compatible object store.

Issues/MR/Notes Markdown attachments can take a different approach using the [Hashed Storage] layout:
instead of basing the path on the mutable `:project_path_with_namespace` variable, the path can be based on the
hash of the project ID, provided the project has migrated to the new approach (introduced in 10.2).

> Note: We provide an [all-in-one rake task] to migrate all uploads to object
> storage in one go. If a new Uploader class or model type is introduced, make
> sure you add a rake task invocation corresponding to it to the [category
> list].

### Path segments

Files are stored at multiple locations and use different path schemes.
All `GitlabUploader`-derived classes should comply with this path segment schema:

```
|   GitlabUploader
| ----------------------- + ------------------------- + --------------------------------- + -------------------------------- |
| `<gitlab_root>/public/` | `uploads/-/system/`       | `user/avatar/:id/`                | `:filename`                      |
| ----------------------- + ------------------------- + --------------------------------- + -------------------------------- |
| `CarrierWave.root`      | `GitlabUploader.base_dir` | `GitlabUploader#dynamic_segment`  | `CarrierWave::Uploader#filename` |
|                         | `CarrierWave::Uploader#store_dir`                             |                                  | 

|   FileUploader
| ----------------------- + ------------------------- + --------------------------------- + -------------------------------- |
| `<gitlab_root>/shared/` | `artifacts/`              | `:year_:month/:id`                | `:filename`                      |
| `<gitlab_root>/shared/` | `snippets/`               | `:secret/`                        | `:filename`                      |
| ----------------------- + ------------------------- + --------------------------------- + -------------------------------- |
| `CarrierWave.root`      | `GitlabUploader.base_dir` | `GitlabUploader#dynamic_segment`  | `CarrierWave::Uploader#filename` |
|                         | `CarrierWave::Uploader#store_dir`                             |                                  | 
|                         |                           | `FileUploader#upload_path`                                           |

|   ObjectStorage::Concern (store = remote)
| ----------------------- + ------------------------- + ----------------------------------- + -------------------------------- |
| `<bucket_name>`         | <ignored>                 | `user/avatar/:id/`                  | `:filename`                      |
| ----------------------- + ------------------------- + ----------------------------------- + -------------------------------- |
| `#fog_dir`              | `GitlabUploader.base_dir` | `GitlabUploader#dynamic_segment`    | `CarrierWave::Uploader#filename` |
|                         |                           | `ObjectStorage::Concern#store_dir`  |                                  | 
|                         |                           | `ObjectStorage::Concern#upload_path`                                   |
```
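
The first schema above can be sketched in plain Ruby. This is only an illustration of how the segments compose. `SketchAvatarUploader`, its constructor arguments, and the `ROOT` value are hypothetical stand-ins, and the class does not use CarrierWave or the real `GitlabUploader`.

```ruby
# Hypothetical stand-in for a GitlabUploader subclass; it only mirrors the
# segment names from the schema and does not use CarrierWave.
class SketchAvatarUploader
  # Stands in for CarrierWave.root (an assumed install location).
  ROOT = "/opt/gitlab/public"

  # Mirrors GitlabUploader.base_dir.
  def self.base_dir
    "uploads/-/system"
  end

  attr_reader :filename

  def initialize(model_name:, mounted_as:, id:, filename:)
    @model_name = model_name
    @mounted_as = mounted_as
    @id = id
    @filename = filename
  end

  # Mirrors GitlabUploader#dynamic_segment: the per-record part of the path.
  def dynamic_segment
    File.join(@model_name, @mounted_as, @id.to_s)
  end

  # Mirrors CarrierWave::Uploader#store_dir: base_dir + dynamic_segment.
  def store_dir
    File.join(self.class.base_dir, dynamic_segment)
  end

  # Full on-disk path: root + store_dir + filename.
  def path
    File.join(ROOT, store_dir, filename)
  end
end

uploader = SketchAvatarUploader.new(
  model_name: "user", mounted_as: "avatar", id: 42, filename: "me.png"
)
puts uploader.store_dir # uploads/-/system/user/avatar/42
puts uploader.path      # /opt/gitlab/public/uploads/-/system/user/avatar/42/me.png
```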

The `RecordsUploads::Concern` concern creates an `Upload` entry for every file stored by a `GitlabUploader`, persisting the dynamic parts of the path using
`GitlabUploader#dynamic_path`. You may then use the `Upload#build_uploader` method to manipulate the file.

## Object Storage

Including `ObjectStorage::Concern` in a `GitlabUploader`-derived class lets you enable object storage for that uploader. To enable object storage
in your uploader, you need to either 1) include `RecordsUploads::Concern` and prepend `ObjectStorage::Extension::RecordsUploads`, or 2) mount the uploader and create a new field named `<mount>_store`.

`CarrierWave::Uploader#store_dir` is overridden to return:

 - `GitlabUploader.base_dir` + `GitlabUploader#dynamic_segment` when the store is LOCAL
 - `GitlabUploader#dynamic_segment` when the store is REMOTE (the bucket name is used to namespace)
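
The two cases above can be sketched as a single `store_dir` method that switches on the store. `SketchStore`, `SketchObjectStorageUploader`, and the numeric values of the constants are assumptions made for this example. They only stand in for the LOCAL and REMOTE stores named in this document.

```ruby
# Hypothetical constants; the numeric values are assumptions for this sketch.
module SketchStore
  LOCAL = 1
  REMOTE = 2
end

# Hypothetical uploader showing only the store_dir override.
class SketchObjectStorageUploader
  # Mirrors GitlabUploader.base_dir.
  def self.base_dir
    "uploads/-/system"
  end

  attr_reader :dynamic_segment, :object_store

  def initialize(dynamic_segment:, object_store: SketchStore::LOCAL)
    @dynamic_segment = dynamic_segment
    @object_store = object_store
  end

  def store_dir(store = object_store)
    case store
    when SketchStore::REMOTE
      # The bucket already namespaces the files, so base_dir is dropped.
      dynamic_segment
    when SketchStore::LOCAL
      File.join(self.class.base_dir, dynamic_segment)
    else
      raise ArgumentError, "unknown store: #{store.inspect}"
    end
  end
end

uploader = SketchObjectStorageUploader.new(dynamic_segment: "user/avatar/42")
puts uploader.store_dir                      # uploads/-/system/user/avatar/42
puts uploader.store_dir(SketchStore::REMOTE) # user/avatar/42
```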

### Using `ObjectStorage::Extension::RecordsUploads`

> Note: this concern will automatically include `RecordsUploads::Concern` if not already included.

The `ObjectStorage::Concern` uploader will search for the matching `Upload` to select the correct object store. The `Upload` is mapped using `#store_dirs + identifier` for each store (LOCAL/REMOTE).

```ruby
class SongUploader < GitlabUploader
  include RecordsUploads::Concern
  include ObjectStorage::Concern
  prepend ObjectStorage::Extension::RecordsUploads

  ...
end

class Thing < ActiveRecord::Base
  mount_uploader :theme, SongUploader # we have a great theme song!

  ...
end
```
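
The lookup described above (store directory plus identifier, tried for each store) might look roughly like the following. This is a sketch under stated assumptions, not GitLab's implementation: `SketchUpload`, `find_upload`, the `SKETCH_LOCAL`/`SKETCH_REMOTE` constants, and the in-memory array standing in for the `uploads` table are all hypothetical.

```ruby
# Hypothetical constants standing in for the LOCAL and REMOTE stores.
SKETCH_LOCAL = 1
SKETCH_REMOTE = 2

# Hypothetical in-memory stand-in for an Upload record.
SketchUpload = Struct.new(:path, :store, keyword_init: true)

# Tries each store's directory joined with the identifier and returns the
# first recorded upload that matches, or nil when nothing matches.
def find_upload(uploads, store_dirs, identifier)
  store_dirs.each do |store, dir|
    candidate = File.join(dir, identifier)
    match = uploads.find { |u| u.store == store && u.path == candidate }
    return match if match
  end
  nil
end

uploads = [
  SketchUpload.new(path: "thing/theme/1/song.mp3", store: SKETCH_REMOTE)
]
store_dirs = {
  SKETCH_LOCAL  => "uploads/-/system/thing/theme/1",
  SKETCH_REMOTE => "thing/theme/1"
}

upload = find_upload(uploads, store_dirs, "song.mp3")
puts upload.store == SKETCH_REMOTE # true: the file was recorded as remote
```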

### Using a mounted uploader

The `ObjectStorage::Concern` will query the `model.<mount>_store` attribute to select the correct object store.
This column must be present in the model schema.

```ruby
class SongUploader < GitlabUploader
  include ObjectStorage::Concern

  ...
end

class Thing < ActiveRecord::Base
  # theme_store is an ActiveRecord attribute backed by a column on this model
  mount_uploader :theme, SongUploader # we have a great theme song!

  def theme_store
    super || ObjectStorage::Store::LOCAL
  end

  ...
end
```
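
To make the `<mount>_store` convention concrete, the following sketch reads the store from a model by building the attribute name from the mount name. It uses a plain Ruby object instead of ActiveRecord. `SketchThing`, `object_store_for`, and the `SKETCH_STORE_*` constants are hypothetical names introduced for this example.

```ruby
# Hypothetical constants standing in for the LOCAL and REMOTE stores.
SKETCH_STORE_LOCAL = 1
SKETCH_STORE_REMOTE = 2

# Plain Ruby stand-in for a model with a `theme_store` column.
class SketchThing
  attr_writer :theme_store

  # Falls back to the local store when the column is empty.
  def theme_store
    @theme_store || SKETCH_STORE_LOCAL
  end
end

# Hypothetical helper: reads `model.<mount>_store` for a given mount name.
def object_store_for(model, mounted_as)
  model.public_send("#{mounted_as}_store")
end

thing = SketchThing.new
puts object_store_for(thing, :theme) == SKETCH_STORE_LOCAL  # true

thing.theme_store = SKETCH_STORE_REMOTE
puts object_store_for(thing, :theme) == SKETCH_STORE_REMOTE # true
```

Because the lookup calls the attribute by name, a model without the `<mount>_store` column would raise `NoMethodError`, which is why the column must exist in the schema.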

[CarrierWave]: https://github.com/carrierwaveuploader/carrierwave
[Hashed Storage]: ../administration/repository_storage_types.md
[all-in-one rake task]: ../administration/raketasks/uploads/migrate.md
[category list]: https://gitlab.com/gitlab-org/gitlab-ce/blob/master/lib/tasks/gitlab/uploads/migrate.rake