---
stage: none
group: unassigned
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
---

# Uploads sanitize Rake tasks **(CORE ONLY)**

In GitLab 11.9 and later, EXIF data is automatically stripped from JPG or TIFF image uploads.

EXIF data may contain sensitive information (for example, GPS location), so you
can remove EXIF data from existing images that were uploaded to an earlier version of GitLab.

## Requirements

To run this Rake task, you need `exiftool` installed on your system. If you installed GitLab:

- Using the Omnibus package, you're all set.
- From source, make sure `exiftool` is installed:

  ```shell
  # Debian/Ubuntu
  sudo apt-get install libimage-exiftool-perl

  # RHEL/CentOS
  sudo yum install perl-Image-ExifTool
  ```
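
As a quick check before running the task, the following sketch reports whether `exiftool` is on the `PATH`. The `check_exiftool` function name is hypothetical; `exiftool -ver` is the standard ExifTool option that prints the version number.

```shell
# Report whether exiftool is available on the PATH (illustrative helper).
check_exiftool() {
  if command -v exiftool >/dev/null 2>&1; then
    # -ver prints only the ExifTool version number
    echo "exiftool found: version $(exiftool -ver)"
  else
    echo "exiftool not found: install it before running the Rake task"
  fi
}

check_exiftool
```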

## Remove EXIF data from existing uploads

To remove EXIF data from existing uploads, run the following command:

```shell
sudo RAILS_ENV=production -u git -H bundle exec rake gitlab:uploads:sanitize:remove_exif
```

By default, this command runs in "dry run" mode and doesn't remove EXIF data. Use it to
check whether (and how many) images need to be sanitized.

The Rake task accepts the following parameters:

| Parameter    | Type    | Description                                                                                                      |
|:-------------|:--------|:-----------------------------------------------------------------------------------------------------------------|
| `start_id`   | integer | Process only uploads with an ID equal to or greater than this value.                                             |
| `stop_id`    | integer | Process only uploads with an ID equal to or less than this value.                                                |
| `dry_run`    | boolean | Don't remove EXIF data; only check whether EXIF data is present. Defaults to `true`.                             |
| `sleep_time` | float   | Number of seconds to pause after processing each image. Defaults to 0.3 seconds.                                 |
| `uploader`   | string  | Sanitize only uploads of the given uploader: `FileUploader`, `PersonalFileUploader`, or `NamespaceFileUploader`. |
| `since`      | date    | Sanitize only uploads newer than the given date. For example, `2019-05-01`.                                      |
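
As a sketch of how the later parameters might be passed, the following builds a dry-run command restricted to one uploader and a start date. It assumes the positional order matches the table (`start_id`, `stop_id`, `dry_run`, `sleep_time`, `uploader`, `since`), which this page doesn't state explicitly, so confirm the order on your GitLab version before relying on it. The command is printed rather than executed so you can review it first.

```shell
# Assumed positional order (same as the table):
#   start_id, stop_id, dry_run, sleep_time, uploader, since
# Empty positions are left unset. dry_run is true, so nothing is modified.
task='gitlab:uploads:sanitize:remove_exif[,,true,,FileUploader,2019-05-01]'

# Print the full command for review; run it manually once it looks right.
echo "sudo RAILS_ENV=production -u git -H bundle exec rake $task"
```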

If you have a large number of uploads, you can speed up sanitization by:

- Setting `sleep_time` to a lower value.
- Running multiple Rake tasks in parallel, each with a separate range of upload IDs (by setting
  `start_id` and `stop_id`).
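
One way to prepare the parallel runs is to split the ID space into non-overlapping ranges and generate one command per range. The following is a minimal sketch: the `print_exif_commands` function, the ID bounds, the chunk size, and the per-range log file names are all hypothetical. It only prints the commands, so you can review them and then run each one in its own terminal.

```shell
# Print one sanitization command per ID range. Nothing is executed here.
print_exif_commands() {
  first_id=$1   # lowest upload ID to process
  last_id=$2    # highest upload ID to process
  chunk=$3      # number of IDs handled by each Rake task
  start=$first_id
  while [ "$start" -le "$last_id" ]; do
    stop=$((start + chunk - 1))
    # Clamp the final range so it doesn't run past last_id
    if [ "$stop" -gt "$last_id" ]; then
      stop=$last_id
    fi
    echo "sudo RAILS_ENV=production -u git -H bundle exec rake gitlab:uploads:sanitize:remove_exif[$start,$stop,false,0.1] 2>&1 | tee exif-$start-$stop.log"
    start=$((stop + 1))
  done
}

# Example: upload IDs 1 to 30000 (hypothetical), split into ranges of 10000.
print_exif_commands 1 30000 10000
```

Because `start_id` and `stop_id` are both inclusive, each range starts one ID after the previous range ends, so no upload is processed twice.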

To remove EXIF data from all uploads, use:

```shell
sudo RAILS_ENV=production -u git -H bundle exec rake gitlab:uploads:sanitize:remove_exif[,,false,] 2>&1 | tee exif.log
```

To remove EXIF data from uploads with an ID between 100 and 5000 and pause for 0.1 seconds after each file, use:

```shell
sudo RAILS_ENV=production -u git -H bundle exec rake gitlab:uploads:sanitize:remove_exif[100,5000,false,0.1] 2>&1 | tee exif.log
```

These commands also write the output to an `exif.log` file, because the output is likely to be long.

If sanitization fails for an upload, the Rake task output should contain an error message.
Typical causes are that the file is missing from storage or isn't a valid image.

[Report](https://gitlab.com/gitlab-org/gitlab/-/issues/new) any issues. Use the prefix 'EXIF' in
the issue title, and include the error output and (if possible) the image.