summaryrefslogtreecommitdiff
path: root/doc/administration/object_storage.md
blob: f13524461368fda9d562f9e30215f6645262c42c (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
---
type: reference
---

# Object Storage

GitLab supports using an object storage service for holding numerous types of data.
It's recommended over NFS and
in general it's better in larger setups as object storage is
typically much more performant, reliable, and scalable.

## Options

GitLab has been tested on a number of object storage providers:

- [Amazon S3](https://aws.amazon.com/s3/)
- [Google Cloud Storage](https://cloud.google.com/storage)
- [Digital Ocean Spaces](https://www.digitalocean.com/products/spaces/)
- [Oracle Cloud Infrastructure](https://docs.cloud.oracle.com/en-us/iaas/Content/Object/Tasks/s3compatibleapi.htm)
- [Openstack Swift](https://docs.openstack.org/swift/latest/s3_compat.html)
- [Azure Blob storage](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction)
- On-premises hardware and appliances from various storage vendors.
- MinIO. We have [a guide to deploying this](https://docs.gitlab.com/charts/advanced/external-object-storage/minio.html) within our Helm Chart documentation.

### Known compatibility issues

- Dell EMC ECS: Prior to GitLab 13.3, there is a [known bug in GitLab Workhorse that prevents
  HTTP Range Requests from working with CI job artifacts](https://gitlab.com/gitlab-org/gitlab/-/issues/223806).
  Be sure to upgrade to GitLab v13.3.0 or above if you use S3 storage with this hardware.

## Configuration guides

There are two ways of specifying object storage configuration in GitLab:

- [Consolidated form](#consolidated-object-storage-configuration): A single credential is
  shared by all supported object types.
- [Storage-specific form](#storage-specific-configuration): Every object defines its
  own object storage [connection and configuration](#connection-settings).

For more information on the differences and to transition from one form to another, see
[Transition to consolidated form](#transition-to-consolidated-form).

### Consolidated object storage configuration

> Introduced in [GitLab 13.2](https://gitlab.com/gitlab-org/omnibus-gitlab/-/merge_requests/4368).

Using the consolidated object storage configuration has a number of advantages:

- It can simplify your GitLab configuration since the connection details are shared
  across object types.
- It enables the use of [encrypted S3 buckets](#encrypted-s3-buckets).
- It [uploads files to S3 with proper `Content-MD5` headers](https://gitlab.com/gitlab-org/gitlab-workhorse/-/issues/222).

Because [direct upload mode](../development/uploads.md#direct-upload)
must be enabled, only the following providers can be used:

- [Amazon S3-compatible providers](#s3-compatible-connection-settings)
- [Google Cloud Storage](#google-cloud-storage-gcs)
- [Azure Blob storage](#azure-blob-storage)

Background upload is not supported with the consolidated object storage
configuration. We recommend enabling direct upload mode because it does
not require a shared folder, and [this setting may become the
default](https://gitlab.com/gitlab-org/gitlab/-/issues/27331).

NOTE: **Note:**
Consolidated object storage configuration cannot be used for
backups or Mattermost. See [the full table for a complete list](#storage-specific-configuration).

NOTE: **Note:**
Enabling consolidated object storage will enable object storage for all object types.
If you wish to use local storage for specific object types, you can [selectively disable object storages](#selectively-disabling-object-storage).

Most types of objects, such as CI artifacts, LFS files, upload
attachments, and so on can be saved in object storage by specifying a single
credential for object storage with multiple buckets. A [different bucket
for each type must be used](#use-separate-buckets).

When the consolidated form is:

- Used with an S3-compatible object storage, Workhorse uses its internal S3 client to
  upload files.
- Not used with an S3-compatible object storage, Workhorse falls back to using
  pre-signed URLs.

See the section on [ETag mismatch errors](#etag-mismatch) for more details.

**In Omnibus installations:**

1. Edit `/etc/gitlab/gitlab.rb` and add the following lines, substituting
   the values you want:

    ```ruby
    # Consolidated object storage configuration
    gitlab_rails['object_store']['enabled'] = true
    gitlab_rails['object_store']['proxy_download'] = true
    gitlab_rails['object_store']['connection'] = {
      'provider' => 'AWS',
      'region' => '<eu-central-1>',
      'aws_access_key_id' => '<AWS_ACCESS_KEY_ID>',
      'aws_secret_access_key' => '<AWS_SECRET_ACCESS_KEY>'
    }
    # OPTIONAL: The following lines are only needed if server side encryption is required
    gitlab_rails['object_store']['storage_options'] = {
      'server_side_encryption' => '<AES256 or aws:kms>',
      'server_side_encryption_kms_key_id' => '<arn:s3:aws:xxx>'
    }
    gitlab_rails['object_store']['objects']['artifacts']['bucket'] = '<artifacts>'
    gitlab_rails['object_store']['objects']['external_diffs']['bucket'] = '<external-diffs>'
    gitlab_rails['object_store']['objects']['lfs']['bucket'] = '<lfs-objects>'
    gitlab_rails['object_store']['objects']['uploads']['bucket'] = '<uploads>'
    gitlab_rails['object_store']['objects']['packages']['bucket'] = '<packages>'
    gitlab_rails['object_store']['objects']['dependency_proxy']['bucket'] = '<dependency-proxy>'
    gitlab_rails['object_store']['objects']['terraform_state']['bucket'] = '<terraform-state>'
    ```

   NOTE: For GitLab 9.4 or later, if you're using AWS IAM profiles, be sure to omit the
   AWS access key and secret access key/value pairs. For example:

   ```ruby
   gitlab_rails['object_store']['connection'] = {
     'provider' => 'AWS',
     'region' => '<eu-central-1>',
     'use_iam_profile' => true
   }
   ```

1. Save the file and [reconfigure GitLab](restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect.

**In installations from source:**

1. Edit `/home/git/gitlab/config/gitlab.yml` and add or amend the following lines:

   ```yaml
   object_store:
     enabled: true
     proxy_download: true
     connection:
       provider: AWS
       aws_access_key_id: <AWS_ACCESS_KEY_ID>
       aws_secret_access_key: <AWS_SECRET_ACCESS_KEY>
       region: <eu-central-1>
     storage_options:
       server_side_encryption: <AES256 or aws:kms>
       server_side_encryption_key_kms_id: <arn:s3:aws:xxx>
     objects:
       artifacts:
         bucket: <artifacts>
       external_diffs:
         bucket: <external-diffs>
       lfs:
         bucket: <lfs-objects>
       uploads:
         bucket: <uploads>
       packages:
         bucket: <packages>
       dependency_proxy:
         bucket: <dependency_proxy>
       terraform_state:
         bucket: <terraform>
   ```

1. Edit `/home/git/gitlab-workhorse/config.toml` and add or amend the following lines:

   ```toml
   [object_storage]
     provider = "AWS"

   [object_storage.s3]
     aws_access_key_id = "<AWS_ACCESS_KEY_ID>"
     aws_secret_access_key = "<AWS_SECRET_ACCESS_KEY>"
   ```

1. Save the file and [restart GitLab](restart_gitlab.md#installations-from-source) for the changes to take effect.

#### Common parameters

In the consolidated configuration, the `object_store` section defines a
common set of parameters. Here we use the YAML from the source
installation because it's easier to see the inheritance:

```yaml
    object_store:
      enabled: true
      proxy_download: true
      connection:
        provider: AWS
        aws_access_key_id: <AWS_ACCESS_KEY_ID>
        aws_secret_access_key: <AWS_SECRET_ACCESS_KEY>
      objects:
        ...
```

The Omnibus configuration maps directly to this:

```ruby
gitlab_rails['object_store']['enabled'] = true
gitlab_rails['object_store']['proxy_download'] = true
gitlab_rails['object_store']['connection'] = {
  'provider' => 'AWS',
  'aws_access_key_id' => '<AWS_ACCESS_KEY_ID',
  'aws_secret_access_key' => '<AWS_SECRET_ACCESS_KEY>'
}
```

| Setting | Description |
|---------|-------------|
| `enabled` | Enable/disable object storage |
| `proxy_download` | Set to `true` to [enable proxying all files served](#proxy-download). Option allows to reduce egress traffic as this allows clients to download directly from remote storage instead of proxying all data |
| `connection` | Various [connection options](#connection-settings) described below |
| `storage_options` | Options to use when saving new objects, such as [server side encryption](#server-side-encryption-headers). Introduced in GitLab 13.3 |
| `objects` | [Object-specific configuration](#object-specific-configuration)

### Connection settings

Both consolidated configuration form and storage-specific configuration form must configure a connection. The following sections describe parameters that can be used
in the `connection` setting.

#### S3-compatible connection settings

The connection settings match those provided by [fog-aws](https://github.com/fog/fog-aws):

| Setting | Description | Default |
|---------|-------------|---------|
| `provider` | Always `AWS` for compatible hosts | `AWS` |
| `aws_access_key_id` | AWS credentials, or compatible | |
| `aws_secret_access_key` | AWS credentials, or compatible | |
| `aws_signature_version` | AWS signature version to use. `2` or `4` are valid options. Digital Ocean Spaces and other providers may need `2`. | `4` |
| `enable_signature_v4_streaming` | Set to `true` to enable HTTP chunked transfers with [AWS v4 signatures](https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-streaming.html). Oracle Cloud S3 needs this to be `false`. | `true` |
| `region` | AWS region | us-east-1 |
| `host` | S3 compatible host for when not using AWS, e.g. `localhost` or `storage.example.com`. HTTPS and port 443 is assumed. | `s3.amazonaws.com` |
| `endpoint` | Can be used when configuring an S3 compatible service such as [MinIO](https://min.io), by entering a URL such as `http://127.0.0.1:9000`. This takes precedence over `host`. | (optional) |
| `path_style` | Set to `true` to use `host/bucket_name/object` style paths instead of `bucket_name.host/object`. Leave as `false` for AWS S3. | `false` |
| `use_iam_profile` | Set to `true` to use IAM profile instead of access keys | `false`

#### Oracle Cloud S3 connection settings

Note that Oracle Cloud S3 must be sure to use the following settings:

| Setting | Value |
|---------|-------|
| `enable_signature_v4_streaming` | `false` |
| `path_style` | `true` |

If `enable_signature_v4_streaming` is set to `true`, you may see the
following error in `production.log`:

```plaintext
STREAMING-AWS4-HMAC-SHA256-PAYLOAD is not supported
```

#### Google Cloud Storage (GCS)

Here are the valid connection parameters for GCS:

| Setting | Description | example |
|---------|-------------|---------|
| `provider` | The provider name | `Google` |
| `google_project` | GCP project name | `gcp-project-12345` |
| `google_client_email` | The email address of the service account | `foo@gcp-project-12345.iam.gserviceaccount.com` |
| `google_json_key_location` | The JSON key path | `/path/to/gcp-project-12345-abcde.json` |

NOTE: **Note:**
The service account must have permission to access the bucket.
[See more](https://cloud.google.com/storage/docs/authentication)

##### Google example (consolidated form)

For Omnibus installations, this is an example of the `connection` setting:

```ruby
gitlab_rails['object_store']['connection'] = {
  'provider' => 'Google',
  'google_project' => '<GOOGLE PROJECT>',
  'google_client_email' => '<GOOGLE CLIENT EMAIL>',
  'google_json_key_location' => '<FILENAME>'
}
```

#### Azure Blob storage

> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/25877) in GitLab 13.4.

Although Azure uses the word `container` to denote a collection of
blobs, GitLab standardizes on the term `bucket`. Be sure to configure
Azure container names in the `bucket` settings.

The following are the valid connection parameters for Azure. Read the
[Azure Blob storage documentation](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction)
to learn more.

| Setting | Description | Example |
|---------|-------------|---------|
| `provider` | Provider name | `AzureRM` |
| `azure_storage_account_name` | Name of the Azure Blob Storage account used to access the storage | `azuretest` |
| `azure_storage_access_key` | Storage account access key used to access the container. This is typically a secret, 512-bit encryption key encoded in base64. | `czV2OHkvQj9FKEgrTWJRZVRoV21ZcTN0Nnc5eiRDJkYpSkBOY1JmVWpYbjJy\nNHU3eCFBJUQqRy1LYVBkU2dWaw==\n` |
| `azure_storage_domain` | Domain name used to contact the Azure Blob Storage API (optional). Defaults to `blob.core.windows.net`. Set this if you are using Azure China, Azure Germany, Azure US Government, or some other custom Azure domain. | `blob.core.windows.net` |

##### Azure example (consolidated form)

For Omnibus installations, this is an example of the `connection` setting:

```ruby
gitlab_rails['object_store']['connection'] = {
  'provider' => 'AzureRM',
  'azure_storage_account_name' => '<AZURE STORAGE ACCOUNT NAME>',
  'azure_storage_access_key' => '<AZURE STORAGE ACCESS KEY>',
  'azure_storage_domain' => '<AZURE STORAGE DOMAIN>',
}
```

###### Azure Workhorse settings (source installs only)

NOTE: **Note:**
For source installations, Workhorse needs to be configured with the
Azure credentials as well. This is not needed in Omnibus installs because
the Workhorse settings are populated from the settings above.

1. Edit `/home/git/gitlab-workhorse/config.toml` and add or amend the following lines:

   ```toml
   [object_storage]
     provider = "AzureRM"

   [object_storage.azurerm]
     azure_storage_account_name = "<AZURE STORAGE ACCOUNT NAME>"
     azure_storage_access_key = "<AZURE STORAGE ACCESS KEY>"
   ```

If you are using a custom Azure storage domain, note that
`azure_storage_domain` does **not** have to be set in the Workhorse
configuration. This information is exchanged in an API call between
GitLab Rails and Workhorse.

#### OpenStack-compatible connection settings

NOTE: **Note:**
This is not compatible with the consolidated object storage form.
OpenStack Swift is only supported with the storage-specific form. See the
[S3 settings](#s3-compatible-connection-settings) if you want to use the consolidated form.

While OpenStack Swift provides S3 compatibility, some users may want to use the
[Swift API](https://docs.openstack.org/swift/latest/api/object_api_v1_overview.html).
Here are the valid connection settings below for the Swift API, provided by
[fog-openstack](https://github.com/fog/fog-openstack):

| Setting | Description | Default |
|---------|-------------|---------|
| `provider` | Always `OpenStack` for compatible hosts | `OpenStack` |
| `openstack_username` | OpenStack username | |
| `openstack_api_key` | OpenStack API key  | |
| `openstack_temp_url_key` | OpenStack key for generating temporary URLs | |
| `openstack_auth_url` | OpenStack authentication endpoint | |
| `openstack_region` | OpenStack region | |
| `openstack_tenant` | OpenStack tenant ID |

#### Rackspace Cloud Files

NOTE: **Note:**
This is not compatible with the consolidated object
storage form. Rackspace Cloud is only supported with the storage-specific form.

Here are the valid connection parameters for Rackspace Cloud, provided by
[fog-rackspace](https://github.com/fog/fog-rackspace/):

| Setting | Description | example |
|---------|-------------|---------|
| `provider` | The provider name | `Rackspace` |
| `rackspace_username` | The username of the Rackspace account with access to the container | `joe.smith` |
| `rackspace_api_key` | The API key of the Rackspace account with access to the container | `ABC123DEF456ABC123DEF456ABC123DE` |
| `rackspace_region` | The Rackspace storage region to use, a three letter code from the [list of service access endpoints](https://developer.rackspace.com/docs/cloud-files/v1/general-api-info/service-access/) | `iad` |
| `rackspace_temp_url_key` | The private key you have set in the Rackspace API for [temporary URLs](https://developer.rackspace.com/docs/cloud-files/v1/use-cases/public-access-to-your-cloud-files-account/#tempurl). | `ABC123DEF456ABC123DEF456ABC123DE` |

NOTE: **Note:**
Regardless of whether the container has public access enabled or disabled, Fog will
use the TempURL method to grant access to LFS objects. If you see errors in logs referencing
instantiating storage with a `temp-url-key`, ensure that you have set the key properly
on the Rackspace API and in `gitlab.rb`. You can verify the value of the key Rackspace
has set by sending a GET request with token header to the service access endpoint URL
and comparing the output of the returned headers.

### Object-specific configuration

The following YAML shows how the `object_store` section defines
object-specific configuration block and how the `enabled` and
`proxy_download` flags can be overriden. The `bucket` is the only
required parameter within each type:

```yaml
  object_store:
      connection:
        ...
      objects:
        artifacts:
          bucket: artifacts
          proxy_download: false
        external_diffs:
          bucket: external-diffs
        lfs:
          bucket: lfs-objects
        uploads:
          bucket: uploads
        packages:
          bucket: packages
        dependency_proxy:
          enabled: false
          bucket: dependency_proxy
        terraform_state:
          bucket: terraform
```

This maps to this Omnibus GitLab configuration:

```ruby
gitlab_rails['object_store']['objects']['artifacts']['bucket'] = 'artifacts'
gitlab_rails['object_store']['objects']['artifacts']['proxy_download'] = false
gitlab_rails['object_store']['objects']['external_diffs']['bucket'] = 'external-diffs'
gitlab_rails['object_store']['objects']['lfs']['bucket'] = 'lfs-objects'
gitlab_rails['object_store']['objects']['uploads']['bucket'] = 'uploads'
gitlab_rails['object_store']['objects']['packages']['bucket'] = 'packages'
gitlab_rails['object_store']['objects']['dependency_proxy']['enabled'] = false
gitlab_rails['object_store']['objects']['dependency_proxy']['bucket'] = 'dependency-proxy'
gitlab_rails['object_store']['objects']['terraform_state']['bucket'] = 'terraform-state'
```

This is the list of valid `objects` that can be used:

|  Type              | Description |
|--------------------|---------------|
| `artifacts`        | [CI artifacts](job_artifacts.md) |
| `external_diffs`   | [Merge request diffs](merge_request_diffs.md) |
| `uploads`          | [User uploads](uploads.md) |
| `lfs`              | [Git Large File Storage objects](lfs/index.md) |
| `packages`         | [Project packages (e.g. PyPI, Maven, NuGet, etc.)](packages/index.md) |
| `dependency_proxy` | [GitLab Dependency Proxy](packages/dependency_proxy.md) |
| `terraform_state`  | [Terraform state files](terraform_state.md) |

Within each object type, three parameters can be defined:

| Setting          | Required? | Description |
|------------------|-----------|-------------|
| `bucket`         | Yes       | The bucket name for the object storage. |
| `enabled`        | No        | Overrides the common parameter |
| `proxy_download` | No        | Overrides the common parameter |

#### Selectively disabling object storage

As seen above, object storage can be disabled for specific types by
setting the `enabled` flag to `false`. For example, to disable object
storage for CI artifacts:

```ruby
gitlab_rails['object_store']['objects']['artifacts']['enabled'] = false
```

A bucket is not needed if the feature is disabled entirely. For example,
no bucket is needed if CI artifacts are disabled with this setting:

```ruby
gitlab_rails['artifacts_enabled'] = false
```

### Transition to consolidated form

Prior to GitLab 13.2:

- Object storage configuration for all types of objects such as CI/CD artifacts, LFS
  files, upload attachments, and so on had to be configured independently.
- Object store connection parameters such as passwords and endpoint URLs had to be
  duplicated for each type.

For example, an Omnibus GitLab install might have the following configuration:

```ruby
# Original object storage configuration
gitlab_rails['artifacts_object_store_enabled'] = true
gitlab_rails['artifacts_object_store_direct_upload'] = true
gitlab_rails['artifacts_object_store_proxy_download'] = true
gitlab_rails['artifacts_object_store_remote_directory'] = 'artifacts'
gitlab_rails['artifacts_object_store_connection'] = { 'provider' => 'AWS', 'aws_access_key_id' => 'access_key', 'aws_secret_access_key' => 'secret' }
gitlab_rails['uploads_object_store_enabled'] = true
gitlab_rails['uploads_object_store_direct_upload'] = true
gitlab_rails['uploads_object_store_proxy_download'] = true
gitlab_rails['uploads_object_store_remote_directory'] = 'uploads'
gitlab_rails['uploads_object_store_connection'] = { 'provider' => 'AWS', 'aws_access_key_id' => 'access_key', 'aws_secret_access_key' => 'secret' }
```

While this provides flexibility in that it makes it possible for GitLab
to store objects across different cloud providers, it also creates
additional complexity and unnecessary redundancy. Since both GitLab
Rails and Workhorse components need access to object storage, the
consolidated form avoids excessive duplication of credentials.

NOTE: **Note:**
The consolidated object storage configuration is **only** used if all
lines from the original form is omitted. To move to the consolidated form, remove the original configuration (for example, `artifacts_object_store_enabled`, `uploads_object_store_connection`, and so on.)

## Storage-specific configuration

For configuring object storage in GitLab 13.1 and earlier, or for storage types not
supported by consolidated configuration form, refer to the following guides:

|Object storage type|Supported by consolidated configuration?|
|-------------------|----------------------------------------|
| [Backups](../raketasks/backup_restore.md#uploading-backups-to-a-remote-cloud-storage)|No|
| [Job artifacts](job_artifacts.md#using-object-storage) including archived job logs | Yes |
| [LFS objects](lfs/index.md#storing-lfs-objects-in-remote-object-storage) | Yes |
| [Uploads](uploads.md#using-object-storage) | Yes |
| [Container Registry](packages/container_registry.md#use-object-storage) (optional feature) | No |
| [Merge request diffs](merge_request_diffs.md#using-object-storage) | Yes |
| [Mattermost](https://docs.mattermost.com/administration/config-settings.html#file-storage)| No |
| [Packages](packages/index.md#using-object-storage) (optional feature) | Yes |
| [Dependency Proxy](packages/dependency_proxy.md#using-object-storage) (optional feature) **(PREMIUM ONLY)** | Yes |
| [Pseudonymizer](pseudonymizer.md#configuration) (optional feature) **(ULTIMATE ONLY)** | No |
| [Autoscale runner caching](https://docs.gitlab.com/runner/configuration/autoscale.html#distributed-runners-caching) (optional for improved performance) | No |
| [Terraform state files](terraform_state.md#using-object-storage) | Yes |

### Other alternatives to filesystem storage

If you're working to [scale out](reference_architectures/index.md) your GitLab implementation,
or add fault tolerance and redundancy, you may be
looking at removing dependencies on block or network filesystems.
See the following additional guides and
[note that Pages requires disk storage](#gitlab-pages-requires-nfs):

1. Make sure the [`git` user home directory](https://docs.gitlab.com/omnibus/settings/configuration.html#moving-the-home-directory-for-a-user) is on local disk.
1. Configure [database lookup of SSH keys](operations/fast_ssh_key_lookup.md)
   to eliminate the need for a shared `authorized_keys` file.
1. [Prevent local disk usage for job logs](job_logs.md#prevent-local-disk-usage).

## Warnings, limitations, and known issues

### Use separate buckets

Using separate buckets for each data type is the recommended approach for GitLab.

A limitation of our configuration is that each use of object storage is separately configured.
[We have an issue for improving this](https://gitlab.com/gitlab-org/gitlab/-/issues/23345)
and easily using one bucket with separate folders is one improvement that this might bring.

There is at least one specific issue with using the same bucket:
when GitLab is deployed with the Helm chart restore from backup
[will not properly function](https://docs.gitlab.com/charts/advanced/external-object-storage/#lfs-artifacts-uploads-packages-external-diffs-pseudonymizer)
unless separate buckets are used.

One risk of using a single bucket would be that if your organisation decided to
migrate GitLab to the Helm deployment in the future. GitLab would run, but the situation with
backups might not be realised until the organisation had a critical requirement for the backups to work.

### S3 API compatibility issues

Not all S3 providers [are fully compatible](../raketasks/backup_restore.md#other-s3-providers)
with the Fog library that GitLab uses. Symptoms include an error in `production.log`:

```plaintext
411 Length Required
```

### GitLab Pages requires NFS

If you're working to add more GitLab servers for [scaling or fault tolerance](reference_architectures/index.md)
and one of your requirements is [GitLab Pages](../user/project/pages/index.md) this currently requires
NFS. There is [work in progress](https://gitlab.com/gitlab-org/gitlab-pages/-/issues/196)
to remove this dependency. In the future, GitLab Pages may use
[object storage](https://gitlab.com/gitlab-org/gitlab/-/issues/208135).

The dependency on disk storage also prevents Pages being deployed using the
[GitLab Helm chart](https://gitlab.com/gitlab-org/charts/gitlab/-/issues/37).

### Incremental logging is required for CI to use object storage

If you configure GitLab to use object storage for CI logs and artifacts,
you can avoid [local disk usage for job logs](job_logs.md#data-flow) by enabling
[beta incremental logging](job_logs.md#new-incremental-logging-architecture).

### Proxy Download

Clients can download files in object storage by receiving a pre-signed, time-limited URL,
or by GitLab proxying the data from object storage to the client.
Downloading files from object storage directly
helps reduce the amount of egress traffic GitLab
needs to process.

When the files are stored on local block storage or NFS, GitLab has to act as a proxy.
This is not the default behavior with object storage.

The `proxy_download` setting controls this behavior: the default is generally `false`.
Verify this in the documentation for each use case. Set it to `true` if you want
GitLab to proxy the files.

When not proxying files, GitLab returns an
[HTTP 302 redirect with a pre-signed, time-limited object storage URL](https://gitlab.com/gitlab-org/gitlab/-/issues/32117#note_218532298).
This can result in some of the following problems:

- If GitLab is using non-secure HTTP to access the object storage, clients may generate
`https->http` downgrade errors and refuse to process the redirect. The solution to this
is for GitLab to use HTTPS. LFS, for example, will generate this error:

   ```plaintext
   LFS: lfsapi/client: refusing insecure redirect, https->http
   ```

- Clients will need to trust the certificate authority that issued the object storage
certificate, or may return common TLS errors such as:

   ```plaintext
   x509: certificate signed by unknown authority
   ```

- Clients will need network access to the object storage.
Network firewalls could block access.
Errors that might result
if this access is not in place include:

   ```plaintext
   Received status code 403 from server: Forbidden
   ```

Getting a `403 Forbidden` response is specifically called out on the
[package repository documentation](packages/index.md#using-object-storage)
as a side effect of how some build tools work.

Additionally for a short time period users could share pre-signed, time-limited object storage URLs
with others without authentication. Also bandwidth charges may be incurred
between the object storage provider and the client.

### ETag mismatch

Using the default GitLab settings, some object storage back-ends such as
[MinIO](https://gitlab.com/gitlab-org/gitlab/-/issues/23188)
and [Alibaba](https://gitlab.com/gitlab-org/charts/gitlab/-/issues/1564)
might generate `ETag mismatch` errors.

If you are seeing this ETag mismatch error with Amazon Web Services S3,
it's likely this is due to [encryption settings on your bucket](https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html).
To fix this issue, you have two options:

- [Use the consolidated object configuration](#consolidated-object-storage-configuration).
- [Use Amazon instance profiles](#using-amazon-instance-profiles).

The first option is recommended for MinIO. Otherwise, the
[workaround for MinIO](https://gitlab.com/gitlab-org/charts/gitlab/-/issues/1564#note_244497658)
is to use the `--compat` parameter on the server.

Without consolidated object store configuration or instance profiles enabled,
GitLab Workhorse will upload files to S3 using pre-signed URLs that do
not have a `Content-MD5` HTTP header computed for them. To ensure data
is not corrupted, Workhorse checks that the MD5 hash of the data sent
equals the ETag header returned from the S3 server. When encryption is
enabled, this is not the case, which causes Workhorse to report an `ETag
mismatch` error during an upload.

With the consolidated object configuration and instance profile, Workhorse has
S3 credentials so that it can compute the `Content-MD5` header. This
eliminates the need to compare ETag headers returned from the S3 server.

### Using Amazon instance profiles

Instead of supplying AWS access and secret keys in object storage
configuration, GitLab can be configured to use IAM roles to set up an
[Amazon instance profile](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html).
When this is used, GitLab will fetch temporary credentials each time an
S3 bucket is accessed, so no hard-coded values are needed in the
configuration.

#### Encrypted S3 buckets

> - Introduced in [GitLab 13.1](https://gitlab.com/gitlab-org/gitlab-workhorse/-/merge_requests/466) for instance profiles only and [S3 default encryption](https://docs.aws.amazon.com/AmazonS3/latest/dev/bucket-encryption.html).
> - Introduced in [GitLab 13.2](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/34460) for static credentials when [consolidated object storage configuration](#consolidated-object-storage-configuration) and [S3 default encryption](https://docs.aws.amazon.com/AmazonS3/latest/dev/bucket-encryption.html) are used.

When configured either with an instance profile or with the consolidated
object configuration, GitLab Workhorse properly uploads files to S3
buckets that have [SSE-S3 or SSE-KMS encryption enabled by
default](https://docs.aws.amazon.com/kms/latest/developerguide/services-s3.html).
Note that customer master keys (CMKs) and SSE-C encryption are [not
supported since this requires sending the encryption keys in every request](https://gitlab.com/gitlab-org/gitlab/-/issues/226006).

##### Server-side encryption headers

> Introduced in [GitLab 13.3](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/38240).

Setting a default encryption on an S3 bucket is the easiest way to
enable encryption, but you may want to [set a bucket policy to ensure
only encrypted objects are uploaded](https://aws.amazon.com/premiumsupport/knowledge-center/s3-bucket-store-kms-encrypted-objects/).
To do this, you must configure GitLab to send the proper encryption headers
in the `storage_options` configuration section:

|            Setting                  | Description |
|-------------------------------------|-------------|
| `server_side_encryption`            | Encryption mode (AES256 or aws:kms) |
| `server_side_encryption_kms_key_id` | Amazon Resource Name. Only needed when `aws:kms` is used in `server_side_encryption`. See the [Amazon documentation on using KMS encryption](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingKMSEncryption.html) |

As with the case for default encryption, these options only work when
the Workhorse S3 client is enabled. One of the following two conditions
must be fulfilled:

- `use_iam_profile` is `true` in the connection settings.
- Consolidated object storage settings are in use.

[ETag mismatch errors](#etag-mismatch) will occur if server side
encryption headers are used without enabling the Workhorse S3 client.

##### Disabling the feature

The Workhorse S3 client is enabled by default when the
[`use_iam_profile` configuration option](#iam-permissions) is set to `true` or consolidated
object storage settings are configured.

The feature can be disabled using the `:use_workhorse_s3_client` feature flag. To disable the
feature, ask a GitLab administrator with
[Rails console access](feature_flags.md#how-to-enable-and-disable-features-behind-flags) to run the
following command:

```ruby
Feature.disable(:use_workhorse_s3_client)
```

#### IAM Permissions

To set up an instance profile:

1. Create an Amazon Identity Access and Management (IAM) role with the necessary permissions. The
   following example is a role for an S3 bucket named `test-bucket`:

   ```json
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "VisualEditor0",
               "Effect": "Allow",
               "Action": [
                   "s3:PutObject",
                   "s3:GetObject",
                   "s3:AbortMultipartUpload",
                   "s3:DeleteObject"
               ],
               "Resource": "arn:aws:s3:::test-bucket/*"
           }
       ]
   }
   ```

1. [Attach this role](https://aws.amazon.com/premiumsupport/knowledge-center/attach-replace-ec2-instance-profile/)
   to the EC2 instance hosting your GitLab instance.
1. Configure GitLab to use it via the `use_iam_profile` configuration option.