author GitLab Bot <gitlab-bot@gitlab.com> 2022-05-19 07:33:21 +0000
committer GitLab Bot <gitlab-bot@gitlab.com> 2022-05-19 07:33:21 +0000
commit 36a59d088eca61b834191dacea009677a96c052f
tree e4f33972dab5d8ef79e3944a9f403035fceea43f /doc/administration/pseudonymizer.md
parent a1761f15ec2cae7c7f7bbda39a75494add0dfd6f
Add latest changes from gitlab-org/gitlab@15-0-stable-ee v15.0.0-rc42
Diffstat (limited to 'doc/administration/pseudonymizer.md')
-rw-r--r-- doc/administration/pseudonymizer.md 123
1 files changed, 5 insertions, 118 deletions
diff --git a/doc/administration/pseudonymizer.md b/doc/administration/pseudonymizer.md
index 24d9792dcb0..ad4cfd11474 100644
--- a/doc/administration/pseudonymizer.md
+++ b/doc/administration/pseudonymizer.md
@@ -2,127 +2,14 @@
stage: Enablement
group: Distribution
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+remove_date: '2022-08-22'
+redirect_to: 'index.md'
---
-# Pseudonymizer (DEPRECATED) **(ULTIMATE)**
+# Pseudonymizer (removed) **(ULTIMATE)**
-> [Deprecated](https://gitlab.com/gitlab-org/gitlab/-/issues/219952) in GitLab 14.7.
+> [Deprecated](https://gitlab.com/gitlab-org/gitlab/-/issues/219952) in
+> GitLab 14.7 and removed in 15.0.
WARNING:
This feature was [deprecated](https://gitlab.com/gitlab-org/gitlab/-/issues/219952) in GitLab 14.7.
-
-Your GitLab database contains sensitive information. To protect sensitive information
-when you run analytics on your database, you can use the Pseudonymizer service, which:
-
-1. Uses `HMAC(SHA256)` to mutate fields containing sensitive information.
-1. Preserves references (referential integrity) between fields.
-1. Exports your GitLab data, scrubbed of sensitive material.
-
-WARNING:
-If the source data is available, users can compare and correlate the scrubbed data
-with the original.
-
-To generate a pseudonymized data set:
-
-1. [Configure Pseudonymizer](#configure-pseudonymizer) fields and output location.
-1. [Enable Pseudonymizer data collection](#enable-pseudonymizer-data-collection).
-1. Optional. [Generate a data set manually](#generate-data-set-manually).
-
-## Configure Pseudonymizer
-
-To use the Pseudonymizer, configure both the fields you want to anonymize, and the location to
-store the scrubbed data:
-
-1. **Create a manifest file**: This file describes the fields to include or pseudonymize.
-   - **Default manifest** - GitLab provides a default manifest in your GitLab installation
-     ([example `manifest.yml` file](https://gitlab.com/gitlab-org/gitlab/-/blob/master/config/pseudonymizer.yml)).
-     To use the example manifest file, use the `config/pseudonymizer.yml` relative path
-     when you configure connection parameters.
-   - **Custom manifest** - To use a custom manifest file, use the absolute path to
-     the file when you configure the connection parameters.
-1. **Configure connection parameters**: In the configuration method appropriate for
-   your version of GitLab, specify the [object storage](object_storage.md)
-   connection parameters (`pseudonymizer.upload.connection`).
-
-**For Omnibus installations:**
-
-1. Edit `/etc/gitlab/gitlab.rb` and add the following lines by replacing with
-   the values you want:
-
-   ```ruby
-   gitlab_rails['pseudonymizer_manifest'] = 'config/pseudonymizer.yml'
-   gitlab_rails['pseudonymizer_upload_remote_directory'] = 'gitlab-elt' # bucket name
-   gitlab_rails['pseudonymizer_upload_connection'] = {
-     'provider' => 'AWS',
-     'region' => 'eu-central-1',
-     'aws_access_key_id' => 'AWS_ACCESS_KEY_ID',
-     'aws_secret_access_key' => 'AWS_SECRET_ACCESS_KEY'
-   }
-   ```
-
-   If you are using AWS IAM profiles, omit the AWS access key and secret access key/value pairs.
-
-   ```ruby
-   gitlab_rails['pseudonymizer_upload_connection'] = {
-     'provider' => 'AWS',
-     'region' => 'eu-central-1',
-     'use_iam_profile' => true
-   }
-   ```
-
-1. Save the file and [reconfigure GitLab](restart_gitlab.md#omnibus-gitlab-reconfigure)
-   for the changes to take effect.
-
----
-
-**For installations from source:**
-
-1. Edit `/home/git/gitlab/config/gitlab.yml` and add or amend the following
-   lines:
-
-   ```yaml
-   pseudonymizer:
-     manifest: config/pseudonymizer.yml
-     upload:
-       remote_directory: 'gitlab-elt' # bucket name
-       connection:
-         provider: AWS
-         aws_access_key_id: AWS_ACCESS_KEY_ID
-         aws_secret_access_key: AWS_SECRET_ACCESS_KEY
-         region: eu-central-1
-   ```
-
-1. Save the file and [restart GitLab](restart_gitlab.md#installations-from-source)
-   for the changes to take effect.
-
-## Enable Pseudonymizer data collection
-
-To enable data collection:
-
-1. On the top bar, select **Menu > Admin**.
-1. On the left sidebar, select **Settings > Metrics and Profiling**, then expand
-   **Pseudonymizer data collection**.
-1. Select **Enable Pseudonymizer data collection**.
-1. Select **Save changes**.
-
-## Generate data set manually
-
-You can also run the Pseudonymizer manually:
-
-1. Set these environment variables:
-   - `PSEUDONYMIZER_OUTPUT_DIR` - Where to store the output CSV files. Defaults to `/tmp`.
-     These commands produce CSV files that can be quite large. Make sure the directory
-     can store a file at least 10% of the size of your database.
-   - `PSEUDONYMIZER_BATCH` - The batch size when querying the database. Defaults to `100000`.
-1. Run the command appropriate for your application:
-   - **Omnibus GitLab**:
-     `sudo gitlab-rake gitlab:db:pseudonymizer`
-   - **Installations from source**:
-     `sudo -u git -H bundle exec rake gitlab:db:pseudonymizer RAILS_ENV=production`
-
-After you run the command, upload the output CSV files to your configured object
-storage. After the upload completes, delete the output file from the local disk.
-
-## Related topics
-
-- [Using object storage with GitLab](object_storage.md).
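The removed page said that the Pseudonymizer used `HMAC(SHA256)` to mutate sensitive fields while preserving references between them. The following Ruby sketch shows why a keyed hash has that property. It is not GitLab's implementation. The secret key, the `users` and `projects` rows, and the `pseudonymize` helper are all hypothetical. Because the same key and input always produce the same digest, a primary key and every foreign key that points to it map to the same pseudonym, so joins still work on the scrubbed data.

```ruby
require 'openssl'

# Hypothetical secret; it must stay the same for a whole export so that equal
# inputs map to equal pseudonyms across tables.
SECRET_KEY = 'example-secret-key'

# Replace a sensitive value with its keyed HMAC-SHA256 hex digest (64 chars).
def pseudonymize(value, key: SECRET_KEY)
  return nil if value.nil?

  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), key, value.to_s)
end

# Hypothetical rows: each project references its owner through owner_id.
users = [
  { id: 1, username: 'alice' },
  { id: 2, username: 'bob' }
]
projects = [
  { id: 10, name: 'secret-project', owner_id: 1 },
  { id: 11, name: 'other-project', owner_id: 2 }
]

# Pseudonymize primary and foreign keys with the same key so that
# owner_id still matches the pseudonymized users.id (referential integrity).
scrubbed_users = users.map do |user|
  { id: pseudonymize(user[:id]), username: pseudonymize(user[:username]) }
end
scrubbed_projects = projects.map do |project|
  {
    id: pseudonymize(project[:id]),
    name: pseudonymize(project[:name]),
    owner_id: pseudonymize(project[:owner_id])
  }
end

# The join on the scrubbed data resolves without revealing real names.
scrubbed_projects.each do |project|
  owner = scrubbed_users.find { |user| user[:id] == project[:owner_id] }
  puts "#{project[:name][0, 12]} -> owner #{owner[:username][0, 12]}"
end
```

The determinism that keeps joins intact is also what the removed warning describes. Anyone who holds the key and the source data can recompute the digests and correlate the scrubbed rows with the originals.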