Diffstat (limited to 'doc/administration/raketasks/uploads/sanitize.md')
-rw-r--r-- doc/administration/raketasks/uploads/sanitize.md 62
1 files changed, 62 insertions, 0 deletions
diff --git a/doc/administration/raketasks/uploads/sanitize.md b/doc/administration/raketasks/uploads/sanitize.md
new file mode 100644
index 00000000000..54a423b9571
--- /dev/null
+++ b/doc/administration/raketasks/uploads/sanitize.md
@@ -0,0 +1,62 @@
+# Uploads Sanitize tasks
+
+## Requirements
+
+You need `exiftool` installed on your system. If you installed GitLab:
+
+- Using the Omnibus package, you're all set.
+- From source, make sure `exiftool` is installed:
+
+  ```sh
+  # Debian/Ubuntu
+  sudo apt-get install libimage-exiftool-perl
+
+  # RHEL/CentOS
+  sudo yum install perl-Image-ExifTool
+  ```
+
+## Remove EXIF data from existing uploads
+
+Since GitLab 11.9, EXIF data is automatically stripped from JPG and TIFF image
+uploads. Because EXIF data may contain sensitive information (for example, GPS
+location), you can also remove it from images that were uploaded earlier by
+running the following command:
+
+```bash
+sudo RAILS_ENV=production -u git -H bundle exec rake gitlab:uploads:sanitize:remove_exif
+```
+
+By default, this command runs in dry-run mode and does not remove any EXIF data. Use it to
+check whether images need to be sanitized, and how many.
+
+The Rake task accepts the following parameters:
+
+Parameter | Type | Description
+--------- | ---- | -----------
+`start_id` | integer | Only uploads with an ID equal to or greater than this value are processed
+`stop_id` | integer | Only uploads with an ID equal to or less than this value are processed
+`dry_run` | boolean | Only check whether EXIF data is present, without removing it. Default: `true`
+`sleep_time` | float | Number of seconds to pause after processing each image. Default: `0.3`
+
+If you have a large number of uploads, you can speed up sanitization by setting
+`sleep_time` to a lower value or by running multiple Rake tasks in parallel,
+each on a separate range of upload IDs (set with `start_id` and `stop_id`).
+
+To disable dry-run mode and remove EXIF data from all uploads, run:
+
+```bash
+sudo RAILS_ENV=production -u git -H bundle exec rake gitlab:uploads:sanitize:remove_exif[,,false,] 2>&1 | tee exif.log
+```
+
+To disable dry-run mode for uploads with IDs from 100 through 5000, pausing for 0.1 seconds after each image, run:
+
+```bash
+sudo RAILS_ENV=production -u git -H bundle exec rake gitlab:uploads:sanitize:remove_exif[100,5000,false,0.1] 2>&1 | tee exif.log
+```
+
+Because the output is likely to be long, both commands also write it to the `exif.log` file.
+
+If sanitization fails for an upload, the Rake task output should contain an error message. Typical
+causes are a file that is missing from storage or a file that is not a valid image. Please
+[report](https://gitlab.com/gitlab-org/gitlab-ce/issues/new) any issues at `gitlab.com`, prefixing
+the issue title with 'EXIF' and including the error output and (if possible) the image.