| author    | GitLab Bot <gitlab-bot@gitlab.com> | 2022-04-20 10:00:54 +0000 |
|-----------|------------------------------------|---------------------------|
| committer | GitLab Bot <gitlab-bot@gitlab.com> | 2022-04-20 10:00:54 +0000 |
| commit    | 3cccd102ba543e02725d247893729e5c73b38295 (patch) | |
| tree      | f36a04ec38517f5deaaacb5acc7d949688d1e187 /doc/development | |
| parent    | 205943281328046ef7b4528031b90fbda70c75ac (diff) | |
| download  | gitlab-ce-3cccd102ba543e02725d247893729e5c73b38295.tar.gz | |
Add latest changes from gitlab-org/gitlab@14-10-stable-ee (v14.10.0-rc42)
Diffstat (limited to 'doc/development')
136 files changed, 5802 insertions, 2159 deletions
diff --git a/doc/development/api_graphql_styleguide.md b/doc/development/api_graphql_styleguide.md index 417ccba26a0..4f27e811b11 100644 --- a/doc/development/api_graphql_styleguide.md +++ b/doc/development/api_graphql_styleguide.md @@ -827,35 +827,54 @@ A description of a field or argument is given using the `description:` keyword. For example: ```ruby -field :id, GraphQL::Types::ID, description: 'ID of the resource.' +field :id, GraphQL::Types::ID, description: 'ID of the issue.' +field :confidential, GraphQL::Types::Boolean, description: 'Indicates the issue is confidential.' +field :closed_at, Types::TimeType, description: 'Timestamp of when the issue was closed.' ``` -Descriptions of fields and arguments are viewable to users through: +You can view descriptions of fields and arguments in: - The [GraphiQL explorer](#graphiql). - The [static GraphQL API reference](../api/graphql/reference/index.md). ### Description style guide -To ensure consistency, the following should be followed whenever adding or updating -descriptions: +#### Language and punctuation -- Mention the name of the resource in the description. Example: - `'Labels of the issue'` (issue being the resource). -- Use `"{x} of the {y}"` where possible. Example: `'Title of the issue'`. - Do not start descriptions with `The` or `A`, for consistency and conciseness. -- Descriptions of `GraphQL::Types::Boolean` fields should answer the question: "What does - this field do?". Example: `'Indicates project has a Git repository'`. -- Always include the word `"timestamp"` when describing an argument or - field of type `Types::TimeType`. This lets the reader know that the - format of the property is `Time`, rather than just `Date`. -- Must end with a period (`.`). +Use `{x} of the {y}` where possible, where `{x}` is the item you're describing, +and `{y}` is the resource it applies to. For example: -Example: +```plaintext +ID of the issue. +``` + +Do not start descriptions with `The` or `A`, for consistency and conciseness. + +End all descriptions with a period (`.`). + +#### Booleans + +For a boolean field (`GraphQL::Types::Boolean`), start with a verb that describes +what it does. For example: + +```plaintext +Indicates the issue is confidential. +``` + +If necessary, provide the default. For example: + +```plaintext +Sets the issue to confidential. Default is false. +``` + +#### `Types::TimeType` field description + +For `Types::TimeType` GraphQL fields, include the word `timestamp`. This lets +the reader know that the format of the property is `Time`, rather than just `Date`. + +For example: ```ruby -field :id, GraphQL::Types::ID, description: 'ID of the issue.' -field :confidential, GraphQL::Types::Boolean, description: 'Indicates the issue is confidential.' field :closed_at, Types::TimeType, description: 'Timestamp of when the issue was closed.' ``` @@ -1782,8 +1801,8 @@ def ready?(**args) end ``` -In the future this may be able to be done using `InputUnions` if -[this RFC](https://github.com/graphql/graphql-spec/blob/master/rfcs/InputUnion.md) +In the future this may be able to be done using `OneOf Input Objects` if +[this RFC](https://github.com/graphql/graphql-spec/pull/825) is merged. ## GitLab custom scalars diff --git a/doc/development/application_limits.md b/doc/development/application_limits.md index 15d21883bb8..c4146b5af3e 100644 --- a/doc/development/application_limits.md +++ b/doc/development/application_limits.md @@ -19,7 +19,7 @@ and communicate those limits. 
There is a guide about [introducing application limits](https://about.gitlab.com/handbook/product/product-processes/#introducing-application-limits). -## Development +## Implement plan limits ### Insert database plan limits @@ -161,3 +161,31 @@ GitLab.com: - `opensource`: Namespaces and projects that are member of GitLab Open Source program. The `test` environment doesn't have any plans. + +## Implement rate limits using `Rack::Attack` + +We use the [`Rack::Attack`](https://github.com/rack/rack-attack) middleware to throttle Rack requests. +This applies to Rails controllers, Grape endpoints, and any other Rack requests. + +The process for adding a new throttle is loosely: + +1. Add new columns to the `ApplicationSetting` model (`*_enabled`, `*_requests_per_period`, `*_period_in_seconds`). +1. Extend `Gitlab::RackAttack` and `Gitlab::RackAttack::Request` to configure the new rate limit, + and apply it to the desired requests. +1. Add the new settings to the Admin Area form in `app/views/admin/application_settings/_ip_limits.html.haml`. +1. Document the new settings in [User and IP rate limits](../user/admin_area/settings/user_and_ip_rate_limits.md) and [Application settings API](../api/settings.md). +1. Configure the rate limit for GitLab.com and document it in [GitLab.com-specific rate limits](../user/gitlab_com/index.md#gitlabcom-specific-rate-limits). + +Refer to these past issues for implementation details: + +- [Create a separate rate limit for the Files API](https://gitlab.com/gitlab-org/gitlab/-/issues/335075). +- [Create a separate rate limit for unauthenticated API traffic](https://gitlab.com/gitlab-org/gitlab/-/issues/335300). + +## Implement rate limits using `Gitlab::ApplicationRateLimiter` + +This module implements a custom rate limiter that can be used to throttle +certain actions. Unlike `Rack::Attack` and `Rack::Throttle`, which operate at +the middleware level, this can be used at the controller or API level. + +See the `CheckRateLimit` concern for use in controllers. In other parts of the code +the `Gitlab::ApplicationRateLimiter` module can be called directly. diff --git a/doc/development/application_slis/index.md b/doc/development/application_slis/index.md index adb656761c5..a202bc419e1 100644 --- a/doc/development/application_slis/index.md +++ b/doc/development/application_slis/index.md @@ -111,7 +111,7 @@ After that, add the following information: metrics. For example: `["email_type"]`. If the significant labels for the SLI include `feature_category`, the metrics will also feed into the - [error budgets for stage groups](../stage_group_dashboards.md#error-budget). + [error budgets for stage groups](../stage_group_observability/index.md#error-budget). - `featureCategory`: if the SLI applies to a single feature category, you can specify it statically through this field to feed the SLI into the error budgets for stage groups. diff --git a/doc/development/application_slis/rails_request_apdex.md b/doc/development/application_slis/rails_request_apdex.md index b31c7d8756b..373589aaefc 100644 --- a/doc/development/application_slis/rails_request_apdex.md +++ b/doc/development/application_slis/rails_request_apdex.md @@ -9,7 +9,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w > [Introduced](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/525) in GitLab 14.4 NOTE: -This SLI is used for service monitoring. But not for [error budgets for stage groups](../stage_group_dashboards.md#error-budget) +This SLI is used for service monitoring. 
But not for [error budgets for stage groups](../stage_group_observability/index.md#error-budget)
by default. You can [opt in](#error-budget-attribution-and-ownership).

The request Apdex SLI (Service Level Indicator) is [an SLI defined in the application](index.md).
@@ -221,7 +221,7 @@ end

This SLI is used for service level monitoring. It feeds into the
[error budget for stage
-groups](../stage_group_dashboards.md#error-budget). For this
+groups](../stage_group_observability/index.md#error-budget). For this
particular SLI, we have opted everyone out by default to give time to
set the correct urgencies on endpoints before it affects a group's error
budget.
diff --git a/doc/development/audit_event_guide/index.md b/doc/development/audit_event_guide/index.md
index ae2f9748178..34f78174e5b 100644
--- a/doc/development/audit_event_guide/index.md
+++ b/doc/development/audit_event_guide/index.md
@@ -18,13 +18,14 @@ actions performed across the application.

To instrument an audit event, the following attributes should be provided:

-| Attribute    | Type                 | Required? | Description                                          |
-|:-------------|:---------------------|:----------|:-----------------------------------------------------|
-| `name`       | String               | false     | Action name to be audited. Used for error tracking   |
-| `author`     | User                 | true      | User who authors the change                          |
-| `scope`      | User, Project, Group | true      | Scope which the audit event belongs to               |
-| `target`     | Object               | true      | Target object being audited                          |
-| `message`    | String               | true      | Message describing the action                        |
+| Attribute    | Type                 | Required? | Description                                                        |
+|:-------------|:---------------------|:----------|:-------------------------------------------------------------------|
+| `name`       | String               | false     | Action name to be audited. Used for error tracking                 |
+| `author`     | User                 | true      | User who authors the change                                        |
+| `scope`      | User, Project, Group | true      | Scope which the audit event belongs to                             |
+| `target`     | Object               | true      | Target object being audited                                        |
+| `message`    | String               | true      | Message describing the action                                      |
+| `created_at` | DateTime             | false     | The time when the action occurred. Defaults to `DateTime.current`  |

## How to instrument new Audit Events

@@ -97,13 +98,21 @@ if merge_approval_rule.save
     author: current_user,
     scope: project_alpha,
     target: merge_approval_rule,
-    message: 'Created a new approval rule'
+    message: 'Created a new approval rule',
+    created_at: DateTime.current # Useful for pre-dating an audit event when created asynchronously.
   }

   ::Gitlab::Audit::Auditor.audit(audit_context)
end
```

+### Data volume considerations
+
+Because every audit event is persisted to the database, consider the amount of data we expect
+to generate, and the rate of generation, for new audit events. For new audit events that will
+produce a lot of data in the database, consider adding a
+[streaming-only audit event](#event-streaming) instead. If you have questions about this,
+feel free to ping `@gitlab-org/manage/compliance/backend` in an issue or merge request.
+
## Audit Event instrumentation flows

The two ways we can instrument audit events have different flows.

@@ -185,5 +194,8 @@ All events where the entity is a `Group` or `Project` are recorded in the audit

- `Group`, events are streamed to the group's root ancestor's event streaming destinations.
- `Project`, events are streamed to the project's root ancestor's event streaming destinations.

+You can add streaming-only events that are not stored in the GitLab database.
This is primarily intended to be used for actions that generate +a large amount of data. See [this merge request](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/76719/diffs#d56e47632f0384722d411ed3ab5b15e947bd2265_26_36) +for an example. This feature is under heavy development. Follow the [parent epic](https://gitlab.com/groups/gitlab-org/-/epics/5925) for updates on feature development. diff --git a/doc/development/avoiding_downtime_in_migrations.md b/doc/development/avoiding_downtime_in_migrations.md index 1de96df327c..d4c225b62c5 100644 --- a/doc/development/avoiding_downtime_in_migrations.md +++ b/doc/development/avoiding_downtime_in_migrations.md @@ -1,491 +1,11 @@ --- -stage: Enablement -group: Database -info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments +redirect_to: 'database/avoiding_downtime_in_migrations.md' +remove_date: '2022-07-08' --- -# Avoiding downtime in migrations +This document was moved to [another location](database/avoiding_downtime_in_migrations.md). -When working with a database certain operations may require downtime. Since we -cannot have downtime in migrations we need to use a set of steps to get the -same end result without downtime. This guide describes various operations that -may appear to need downtime, their impact, and how to perform them without -requiring downtime. - -## Dropping Columns - -Removing columns is tricky because running GitLab processes may still be using -the columns. To work around this safely, you will need three steps in three releases: - -1. Ignoring the column (release M) -1. Dropping the column (release M+1) -1. Removing the ignore rule (release M+2) - -The reason we spread this out across three releases is that dropping a column is -a destructive operation that can't be rolled back easily. - -Following this procedure helps us to make sure there are no deployments to GitLab.com -and upgrade processes for self-managed installations that lump together any of these steps. - -### Step 1: Ignoring the column (release M) - -The first step is to ignore the column in the application code. This is -necessary because Rails caches the columns and re-uses this cache in various -places. This can be done by defining the columns to ignore. For example, to ignore -`updated_at` in the User model you'd use the following: - -```ruby -class User < ApplicationRecord - include IgnorableColumns - ignore_column :updated_at, remove_with: '12.7', remove_after: '2020-01-22' -end -``` - -Multiple columns can be ignored, too: - -```ruby -ignore_columns %i[updated_at created_at], remove_with: '12.7', remove_after: '2020-01-22' -``` - -If the model exists in CE and EE, the column has to be ignored in the CE model. If the -model only exists in EE, then it has to be added there. - -We require indication of when it is safe to remove the column ignore with: - -- `remove_with`: set to a GitLab release typically two releases (M+2) after adding the - column ignore. -- `remove_after`: set to a date after which we consider it safe to remove the column - ignore, typically after the M+1 release date, during the M+2 development cycle. - -This information allows us to reason better about column ignores and makes sure we -don't remove column ignores too early for both regular releases and deployments to GitLab.com. 
For -example, this avoids a situation where we deploy a bulk of changes that include both changes -to ignore the column and subsequently remove the column ignore (which would result in a downtime). - -In this example, the change to ignore the column went into release 12.5. - -### Step 2: Dropping the column (release M+1) - -Continuing our example, dropping the column goes into a _post-deployment_ migration in release 12.6: - -```ruby - remove_column :user, :updated_at -``` - -### Step 3: Removing the ignore rule (release M+2) - -With the next release, in this example 12.7, we set up another merge request to remove the ignore rule. -This removes the `ignore_column` line and - if not needed anymore - also the inclusion of `IgnoreableColumns`. - -This should only get merged with the release indicated with `remove_with` and once -the `remove_after` date has passed. - -## Renaming Columns - -Renaming columns the normal way requires downtime as an application may continue -using the old column name during/after a database migration. To rename a column -without requiring downtime we need two migrations: a regular migration, and a -post-deployment migration. Both these migration can go in the same release. - -### Step 1: Add The Regular Migration - -First we need to create the regular migration. This migration should use -`Gitlab::Database::MigrationHelpers#rename_column_concurrently` to perform the -renaming. For example - -```ruby -# A regular migration in db/migrate -class RenameUsersUpdatedAtToUpdatedAtTimestamp < Gitlab::Database::Migration[1.0] - disable_ddl_transaction! - - def up - rename_column_concurrently :users, :updated_at, :updated_at_timestamp - end - - def down - undo_rename_column_concurrently :users, :updated_at, :updated_at_timestamp - end -end -``` - -This will take care of renaming the column, ensuring data stays in sync, and -copying over indexes and foreign keys. - -If a column contains one or more indexes that don't contain the name of the -original column, the previously described procedure will fail. In that case, -you'll first need to rename these indexes. - -### Step 2: Add A Post-Deployment Migration - -The renaming procedure requires some cleaning up in a post-deployment migration. -We can perform this cleanup using -`Gitlab::Database::MigrationHelpers#cleanup_concurrent_column_rename`: - -```ruby -# A post-deployment migration in db/post_migrate -class CleanupUsersUpdatedAtRename < Gitlab::Database::Migration[1.0] - disable_ddl_transaction! - - def up - cleanup_concurrent_column_rename :users, :updated_at, :updated_at_timestamp - end - - def down - undo_cleanup_concurrent_column_rename :users, :updated_at, :updated_at_timestamp - end -end -``` - -If you're renaming a [large table](https://gitlab.com/gitlab-org/gitlab/-/blob/master/rubocop/rubocop-migrations.yml#L3), please carefully consider the state when the first migration has run but the second cleanup migration hasn't been run yet. -With [Canary](https://gitlab.com/gitlab-com/gl-infra/readiness/-/tree/master/library/canary/) it is possible that the system runs in this state for a significant amount of time. - -## Changing Column Constraints - -Adding or removing a `NOT NULL` clause (or another constraint) can typically be -done without requiring downtime. However, this does require that any application -changes are deployed _first_. Thus, changing the constraints of a column should -happen in a post-deployment migration. 
- -Avoid using `change_column` as it produces an inefficient query because it re-defines -the whole column type. - -You can check the following guides for each specific use case: - -- [Adding foreign-key constraints](migration_style_guide.md#adding-foreign-key-constraints) -- [Adding `NOT NULL` constraints](database/not_null_constraints.md) -- [Adding limits to text columns](database/strings_and_the_text_data_type.md) - -## Changing Column Types - -Changing the type of a column can be done using -`Gitlab::Database::MigrationHelpers#change_column_type_concurrently`. This -method works similarly to `rename_column_concurrently`. For example, let's say -we want to change the type of `users.username` from `string` to `text`. - -### Step 1: Create A Regular Migration - -A regular migration is used to create a new column with a temporary name along -with setting up some triggers to keep data in sync. Such a migration would look -as follows: - -```ruby -# A regular migration in db/migrate -class ChangeUsersUsernameStringToText < Gitlab::Database::Migration[1.0] - disable_ddl_transaction! - - def up - change_column_type_concurrently :users, :username, :text - end - - def down - undo_change_column_type_concurrently :users, :username - end -end -``` - -### Step 2: Create A Post Deployment Migration - -Next we need to clean up our changes using a post-deployment migration: - -```ruby -# A post-deployment migration in db/post_migrate -class ChangeUsersUsernameStringToTextCleanup < Gitlab::Database::Migration[1.0] - disable_ddl_transaction! - - def up - cleanup_concurrent_column_type_change :users, :username - end - - def down - undo_cleanup_concurrent_column_type_change :users, :username, :string - end -end -``` - -And that's it, we're done! - -### Casting data to a new type - -Some type changes require casting data to a new type. For example when changing from `text` to `jsonb`. -In this case, use the `type_cast_function` option. -Make sure there is no bad data and the cast will always succeed. You can also provide a custom function that handles -casting errors. - -Example migration: - -```ruby - def up - change_column_type_concurrently :users, :settings, :jsonb, type_cast_function: 'jsonb' - end -``` - -## Changing The Schema For Large Tables - -While `change_column_type_concurrently` and `rename_column_concurrently` can be -used for changing the schema of a table without downtime, it doesn't work very -well for large tables. Because all of the work happens in sequence the migration -can take a very long time to complete, preventing a deployment from proceeding. -They can also produce a lot of pressure on the database due to it rapidly -updating many rows in sequence. - -To reduce database pressure you should instead use a background migration -when migrating a column in a large table (for example, `issues`). This will -spread the work / load over a longer time period, without slowing down deployments. - -For more information, see [the documentation on cleaning up background -migrations](background_migrations.md#cleaning-up). - -## Adding Indexes - -Adding indexes does not require downtime when `add_concurrent_index` -is used. - -See also [Migration Style Guide](migration_style_guide.md#adding-indexes) -for more information. - -## Dropping Indexes - -Dropping an index does not require downtime. - -## Adding Tables - -This operation is safe as there's no code using the table just yet. 
- -## Dropping Tables - -Dropping tables can be done safely using a post-deployment migration, but only -if the application no longer uses the table. - -## Renaming Tables - -Renaming tables requires downtime as an application may continue -using the old table name during/after a database migration. - -If the table and the ActiveRecord model is not in use yet, removing the old -table and creating a new one is the preferred way to "rename" the table. - -Renaming a table is possible without downtime by following our multi-release -[rename table process](database/rename_database_tables.md#rename-table-without-downtime). - -## Adding Foreign Keys - -Adding foreign keys usually works in 3 steps: - -1. Start a transaction -1. Run `ALTER TABLE` to add the constraint(s) -1. Check all existing data - -Because `ALTER TABLE` typically acquires an exclusive lock until the end of a -transaction this means this approach would require downtime. - -GitLab allows you to work around this by using -`Gitlab::Database::MigrationHelpers#add_concurrent_foreign_key`. This method -ensures that no downtime is needed. - -## Removing Foreign Keys - -This operation does not require downtime. - -## Migrating `integer` primary keys to `bigint` - -To [prevent the overflow risk](https://gitlab.com/groups/gitlab-org/-/epics/4785) for some tables -with `integer` primary key (PK), we have to migrate their PK to `bigint`. The process to do this -without downtime and causing too much load on the database is described below. - -### Initialize the conversion and start migrating existing data (release N) - -To start the process, add a regular migration to create the new `bigint` columns. Use the provided -`initialize_conversion_of_integer_to_bigint` helper. The helper also creates a database trigger -to keep in sync both columns for any new records ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/migrate/20210608072312_initialize_conversion_of_ci_stages_to_bigint.rb)): - -```ruby -class InitializeConversionOfCiStagesToBigint < ActiveRecord::Migration[6.1] - include Gitlab::Database::MigrationHelpers - - TABLE = :ci_stages - COLUMNS = %i(id) - - def up - initialize_conversion_of_integer_to_bigint(TABLE, COLUMNS) - end - - def down - revert_initialize_conversion_of_integer_to_bigint(TABLE, COLUMNS) - end -end -``` - -Ignore the new `bigint` columns: - -```ruby -module Ci - class Stage < Ci::ApplicationRecord - include IgnorableColumns - ignore_column :id_convert_to_bigint, remove_with: '14.2', remove_after: '2021-08-22' - end -``` - -To migrate existing data, we introduced new type of _batched background migrations_. -Unlike the classic background migrations, built on top of Sidekiq, batched background migrations -don't have to enqueue and schedule all the background jobs at the beginning. -They also have other advantages, like automatic tuning of the batch size, better progress visibility, -and collecting metrics. 
To start the process, use the provided `backfill_conversion_of_integer_to_bigint` -helper ([example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/migrate/20210608072346_backfill_ci_stages_for_bigint_conversion.rb)): - -```ruby -class BackfillCiStagesForBigintConversion < ActiveRecord::Migration[6.1] - include Gitlab::Database::MigrationHelpers - - TABLE = :ci_stages - COLUMNS = %i(id) - - def up - backfill_conversion_of_integer_to_bigint(TABLE, COLUMNS) - end - - def down - revert_backfill_conversion_of_integer_to_bigint(TABLE, COLUMNS) - end -end -``` - -### Monitor the background migration - -Check how the migration is performing while it's running. Multiple ways to do this are described below. - -#### High-level status of batched background migrations - -See how to [check the status of batched background migrations](../update/index.md#checking-for-background-migrations-before-upgrading). - -#### Query the database - -We can query the related database tables directly. Requires access to read-only replica. -Example queries: - -```sql --- Get details for batched background migration for given table -SELECT * FROM batched_background_migrations WHERE table_name = 'namespaces'\gx - --- Get count of batched background migration jobs by status for given table -SELECT - batched_background_migrations.id, batched_background_migration_jobs.status, COUNT(*) -FROM - batched_background_migrations - JOIN batched_background_migration_jobs ON batched_background_migrations.id = batched_background_migration_jobs.batched_background_migration_id -WHERE - table_name = 'namespaces' -GROUP BY - batched_background_migrations.id, batched_background_migration_jobs.status; - --- Batched background migration progress for given table (based on estimated total number of tuples) -SELECT - m.table_name, - LEAST(100 * sum(j.batch_size) / pg_class.reltuples, 100) AS percentage_complete -FROM - batched_background_migrations m - JOIN batched_background_migration_jobs j ON j.batched_background_migration_id = m.id - JOIN pg_class ON pg_class.relname = m.table_name -WHERE - j.status = 3 AND m.table_name = 'namespaces' -GROUP BY m.id, pg_class.reltuples; -``` - -#### Sidekiq logs - -We can also use the Sidekiq logs to monitor the worker that executes the batched background -migrations: - -1. Sign in to [Kibana](https://log.gprd.gitlab.net) with a `@gitlab.com` email address. -1. Change the index pattern to `pubsub-sidekiq-inf-gprd*`. -1. Add filter for `json.queue: cronjob:database_batched_background_migration`. - -#### PostgreSQL slow queries log - -Slow queries log keeps track of low queries that took above 1 second to execute. To see them -for batched background migration: - -1. Sign in to [Kibana](https://log.gprd.gitlab.net) with a `@gitlab.com` email address. -1. Change the index pattern to `pubsub-postgres-inf-gprd*`. -1. Add filter for `json.endpoint_id.keyword: Database::BatchedBackgroundMigrationWorker`. -1. Optional. To see only updates, add a filter for `json.command_tag.keyword: UPDATE`. -1. Optional. To see only failed statements, add a filter for `json.error_severity.keyword: ERROR`. -1. Optional. Add a filter by table name. 
- -#### Grafana dashboards - -To monitor the health of the database, use these additional metrics: - -- [PostgreSQL Tuple Statistics](https://dashboards.gitlab.net/d/000000167/postgresql-tuple-statistics?orgId=1&refresh=1m): if you see high rate of updates for the tables being actively converted, or increasing percentage of dead tuples for this table, it might mean that autovacuum cannot keep up. -- [PostgreSQL Overview](https://dashboards.gitlab.net/d/000000144/postgresql-overview?orgId=1): if you see high system usage or transactions per second (TPS) on the primary database server, it might mean that the migration is causing problems. - -### Prometheus metrics - -Number of [metrics](https://gitlab.com/gitlab-org/gitlab/-/blob/294a92484ce4611f660439aa48eee4dfec2230b5/lib/gitlab/database/background_migration/batched_migration_wrapper.rb#L90-128) -for each batched background migration are published to Prometheus. These metrics can be searched for and -visualized in Thanos ([see an example](https://thanos-query.ops.gitlab.net/graph?g0.expr=sum%20(rate(batched_migration_job_updated_tuples_total%7Benv%3D%22gprd%22%7D%5B5m%5D))%20by%20(migration_id)%20&g0.tab=0&g0.stacked=0&g0.range_input=3d&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D&g0.end_input=2021-06-13%2012%3A18%3A24&g0.moment_input=2021-06-13%2012%3A18%3A24)). - -### Swap the columns (release N + 1) - -After the background is completed and the new `bigint` columns are populated for all records, we can -swap the columns. Swapping is done with post-deployment migration. The exact process depends on the -table being converted, but in general it's done in the following steps: - -1. Using the provided `ensure_batched_background_migration_is_finished` helper, make sure the batched -migration has finished ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L13-18)). -If the migration has not completed, the subsequent steps fail anyway. By checking in advance we -aim to have more helpful error message. -1. Create indexes using the `bigint` columns that match the existing indexes using the `integer` -column ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L28-34)). -1. Create foreign keys (FK) using the `bigint` columns that match the existing FKs using the -`integer` column. Do this both for FK referencing other tables, and FKs that reference the table -that is being migrated ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L36-43)). -1. Inside a transaction, swap the columns: - 1. Lock the tables involved. To reduce the chance of hitting a deadlock, we recommended to do this in parent to child order ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L47)). - 1. Rename the columns to swap names ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L49-54)) - 1. 
Reset the trigger function ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L56-57)). - 1. Swap the defaults ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L59-62)). - 1. Swap the PK constraint (if any) ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L64-68)). - 1. Remove old indexes and rename new ones ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L70-72)). - 1. Remove old FKs (if still present) and rename new ones ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L74)). - -See example [merge request](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/66088), and [migration](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb). - -### Remove the trigger and old `integer` columns (release N + 2) - -Using post-deployment migration and the provided `cleanup_conversion_of_integer_to_bigint` helper, -drop the database trigger and the old `integer` columns ([see an example](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/69714)). - -### Remove ignore rules (release N + 3) - -In the next release after the columns were dropped, remove the ignore rules as we do not need them -anymore ([see an example](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/71161)). - -## Data migrations - -Data migrations can be tricky. The usual approach to migrate data is to take a 3 -step approach: - -1. Migrate the initial batch of data -1. Deploy the application code -1. Migrate any remaining data - -Usually this works, but not always. For example, if a field's format is to be -changed from JSON to something else we have a bit of a problem. If we were to -change existing data before deploying application code we'll most likely run -into errors. On the other hand, if we were to migrate after deploying the -application code we could run into the same problems. - -If you merely need to correct some invalid data, then a post-deployment -migration is usually enough. If you need to change the format of data (for example, from -JSON to something else) it's typically best to add a new column for the new data -format, and have the application use that. In such a case the procedure would -be: - -1. Add a new column in the new format -1. Copy over existing data to this new column -1. Deploy the application code -1. In a post-deployment migration, copy over any remaining data - -In general there is no one-size-fits-all solution, therefore it's best to -discuss these kind of migrations in a merge request to make sure they are -implemented in the best way possible. +<!-- This redirect file can be deleted after <2022-07-08>. --> +<!-- Redirects that point to other docs in the same project expire in three months. --> +<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. 
--> +<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html --> diff --git a/doc/development/backend/create_source_code_be/index.md b/doc/development/backend/create_source_code_be/index.md index 6421ca3754a..8661d8b4d74 100644 --- a/doc/development/backend/create_source_code_be/index.md +++ b/doc/development/backend/create_source_code_be/index.md @@ -21,14 +21,12 @@ The team works across three codebases: Workhorse, GitLab Shell and GitLab Rails. ## Workhorse -GitLab Workhorse is a smart reverse proxy for GitLab. It handles "large" HTTP +[GitLab Workhorse](../../workhorse/index.md) is a smart reverse proxy for GitLab. It handles "large" HTTP requests such as file downloads, file uploads, `git push`, `git pull` and `git` archive downloads. Workhorse itself is not a feature, but there are several features in GitLab that would not work efficiently without Workhorse. -Workhorse documentation is available in the [Workhorse repository](https://gitlab.com/gitlab-org/gitlab/tree/master/workhorse). - ## GitLab Shell GitLab Shell handles Git SSH sessions for GitLab and modifies the list of authorized keys. diff --git a/doc/development/background_migrations.md b/doc/development/background_migrations.md index 9fffbd25518..3c9c34bccf8 100644 --- a/doc/development/background_migrations.md +++ b/doc/development/background_migrations.md @@ -1,497 +1,11 @@ --- -type: reference, dev -stage: none -group: Development -info: "See the Technical Writers assigned to Development Guidelines: https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments-to-development-guidelines" +redirect_to: 'database/background_migrations.md' +remove_date: '2022-07-08' --- -# Background migrations +This document was moved to [another location](database/background_migrations.md). -Background migrations should be used to perform data migrations whenever a -migration exceeds [the time limits in our guidelines](migration_style_guide.md#how-long-a-migration-should-take). For example, you can use background -migrations to migrate data that's stored in a single JSON column -to a separate table instead. - -If the database cluster is considered to be in an unhealthy state, background -migrations automatically reschedule themselves for a later point in time. - -## When To Use Background Migrations - -You should use a background migration when you migrate _data_ in tables that have -so many rows that the process would exceed [the time limits in our guidelines](migration_style_guide.md#how-long-a-migration-should-take) if performed using a regular Rails migration. - -- Background migrations should be used when migrating data in [high-traffic tables](migration_style_guide.md#high-traffic-tables). -- Background migrations may also be used when executing numerous single-row queries -for every item on a large dataset. Typically, for single-record patterns, runtime is -largely dependent on the size of the dataset, hence it should be split accordingly -and put into background migrations. -- Background migrations should not be used to perform schema migrations. - -Some examples where background migrations can be useful: - -- Migrating events from one table to multiple separate tables. -- Populating one column based on JSON stored in another column. -- Migrating data that depends on the output of external services (for example, an API). - -NOTE: -If the background migration is part of an important upgrade, make sure it's announced -in the release post. 
Discuss with your Project Manager if you're not sure the migration falls -into this category. - -## Isolation - -Background migrations must be isolated and can not use application code (for example, -models defined in `app/models`). Since these migrations can take a long time to -run it's possible for new versions to be deployed while they are still running. - -It's also possible for different migrations to be executed at the same time. -This means that different background migrations should not migrate data in a -way that would cause conflicts. - -## Idempotence - -Background migrations are executed in a context of a Sidekiq process. -Usual Sidekiq rules apply, especially the rule that jobs should be small -and idempotent. - -See [Sidekiq best practices guidelines](https://github.com/mperham/sidekiq/wiki/Best-Practices) -for more details. - -Make sure that in case that your migration job is going to be retried data -integrity is guaranteed. - -## Background migrations for EE-only features - -All the background migration classes for EE-only features should be present in GitLab CE. -For this purpose, an empty class can be created for GitLab CE, and it can be extended for GitLab EE -as explained in the [guidelines for implementing Enterprise Edition features](ee_features.md#code-in-libgitlabbackground_migration). - -## How It Works - -Background migrations are simple classes that define a `perform` method. A -Sidekiq worker will then execute such a class, passing any arguments to it. All -migration classes must be defined in the namespace -`Gitlab::BackgroundMigration`, the files should be placed in the directory -`lib/gitlab/background_migration/`. - -## Scheduling - -Scheduling a background migration should be done in a post-deployment -migration that includes `Gitlab::Database::MigrationHelpers` -To do so, simply use the following code while -replacing the class name and arguments with whatever values are necessary for -your migration: - -```ruby -migrate_in('BackgroundMigrationClassName', [arg1, arg2, ...]) -``` - -You can use the function `queue_background_migration_jobs_by_range_at_intervals` -to automatically split the job into batches: - -```ruby -queue_background_migration_jobs_by_range_at_intervals( - ClassName, - BackgroundMigrationClassName, - 2.minutes, - batch_size: 10_000 - ) -``` - -You'll also need to make sure that newly created data is either migrated, or -saved in both the old and new version upon creation. For complex and time -consuming migrations it's best to schedule a background job using an -`after_create` hook so this doesn't affect response timings. The same applies to -updates. Removals in turn can be handled by simply defining foreign keys with -cascading deletes. - -### Rescheduling background migrations - -If one of the background migrations contains a bug that is fixed in a patch -release, the background migration needs to be rescheduled so the migration would -be repeated on systems that already performed the initial migration. - -When you reschedule the background migration, make sure to turn the original -scheduling into a no-op by clearing up the `#up` and `#down` methods of the -migration performing the scheduling. Otherwise the background migration would be -scheduled multiple times on systems that are upgrading multiple patch releases at -once. 
- -When you start the second post-deployment migration, you should delete any -previously queued jobs from the initial migration with the provided -helper: - -```ruby -delete_queued_jobs('BackgroundMigrationClassName') -``` - -## Cleaning Up - -NOTE: -Cleaning up any remaining background migrations _must_ be done in either a major -or minor release, you _must not_ do this in a patch release. - -Because background migrations can take a long time you can't immediately clean -things up after scheduling them. For example, you can't drop a column that's -used in the migration process as this would cause jobs to fail. This means that -you'll need to add a separate _post deployment_ migration in a future release -that finishes any remaining jobs before cleaning things up (for example, removing a -column). - -As an example, say you want to migrate the data from column `foo` (containing a -big JSON blob) to column `bar` (containing a string). The process for this would -roughly be as follows: - -1. Release A: - 1. Create a migration class that performs the migration for a row with a given ID. - You can use [background jobs tracking](#background-jobs-tracking) to simplify cleaning up. - 1. Deploy the code for this release, this should include some code that will - schedule jobs for newly created data (for example, using an `after_create` hook). - 1. Schedule jobs for all existing rows in a post-deployment migration. It's - possible some newly created rows may be scheduled twice so your migration - should take care of this. -1. Release B: - 1. Deploy code so that the application starts using the new column and stops - scheduling jobs for newly created data. - 1. In a post-deployment migration, finalize all jobs that have not succeeded by now. - If you used [background jobs tracking](#background-jobs-tracking) in release A, - you can use `finalize_background_migration` from `BackgroundMigrationHelpers` to ensure no jobs remain. - This helper will: - 1. Use `Gitlab::BackgroundMigration.steal` to process any remaining - jobs in Sidekiq. - 1. Reschedule the migration to be run directly (that is, not through Sidekiq) - on any rows that weren't migrated by Sidekiq. This can happen if, for - instance, Sidekiq received a SIGKILL, or if a particular batch failed - enough times to be marked as dead. - 1. Remove `Gitlab::Database::BackgroundMigrationJob` rows where - `status = succeeded`. To retain diagnostic information that may - help with future bug tracking you can skip this step by specifying - the `delete_tracking_jobs: false` parameter. - 1. Remove the old column. - -This may also require a bump to the [import/export version](../user/project/settings/import_export.md), if -importing a project from a prior version of GitLab requires the data to be in -the new format. - -## Example - -To explain all this, let's use the following example: the table `integrations` has a -field called `properties` which is stored in JSON. For all rows you want to -extract the `url` key from this JSON object and store it in the `integrations.url` -column. There are millions of integrations and parsing JSON is slow, thus you can't -do this in a regular migration. 
- -To do this using a background migration we'll start with defining our migration -class: - -```ruby -class Gitlab::BackgroundMigration::ExtractIntegrationsUrl - class Integration < ActiveRecord::Base - self.table_name = 'integrations' - end - - def perform(start_id, end_id) - Integration.where(id: start_id..end_id).each do |integration| - json = JSON.load(integration.properties) - - integration.update(url: json['url']) if json['url'] - rescue JSON::ParserError - # If the JSON is invalid we don't want to keep the job around forever, - # instead we'll just leave the "url" field to whatever the default value - # is. - next - end - end -end -``` - -Next we'll need to adjust our code so we schedule the above migration for newly -created and updated integrations. We can do this using something along the lines of -the following: - -```ruby -class Integration < ActiveRecord::Base - after_commit :schedule_integration_migration, on: :update - after_commit :schedule_integration_migration, on: :create - - def schedule_integration_migration - BackgroundMigrationWorker.perform_async('ExtractIntegrationsUrl', [id, id]) - end -end -``` - -We're using `after_commit` here to ensure the Sidekiq job is not scheduled -before the transaction completes as doing so can lead to race conditions where -the changes are not yet visible to the worker. - -Next we'll need a post-deployment migration that schedules the migration for -existing data. - -```ruby -class ScheduleExtractIntegrationsUrl < Gitlab::Database::Migration[1.0] - disable_ddl_transaction! - - MIGRATION = 'ExtractIntegrationsUrl' - DELAY_INTERVAL = 2.minutes - - def up - queue_background_migration_jobs_by_range_at_intervals( - define_batchable_model('integrations'), - MIGRATION, - DELAY_INTERVAL) - end - - def down - end -end -``` - -Once deployed our application will continue using the data as before but at the -same time will ensure that both existing and new data is migrated. - -In the next release we can remove the `after_commit` hooks and related code. We -will also need to add a post-deployment migration that consumes any remaining -jobs and manually run on any un-migrated rows. Such a migration would look like -this: - -```ruby -class ConsumeRemainingExtractIntegrationsUrlJobs < Gitlab::Database::Migration[1.0] - disable_ddl_transaction! - - def up - # This must be included - Gitlab::BackgroundMigration.steal('ExtractIntegrationsUrl') - - # This should be included, but can be skipped - see below - define_batchable_model('integrations').where(url: nil).each_batch(of: 50) do |batch| - range = batch.pluck('MIN(id)', 'MAX(id)').first - - Gitlab::BackgroundMigration::ExtractIntegrationsUrl.new.perform(*range) - end - end - - def down - end -end -``` - -The final step runs for any un-migrated rows after all of the jobs have been -processed. This is in case a Sidekiq process running the background migrations -received SIGKILL, leading to the jobs being lost. (See -[more reliable Sidekiq queue](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/36791) for more information.) - -If the application does not depend on the data being 100% migrated (for -instance, the data is advisory, and not mission-critical), then this final step -can be skipped. - -This migration will then process any jobs for the ExtractIntegrationsUrl migration -and continue once all jobs have been processed. Once done you can safely remove -the `integrations.properties` column. - -## Testing - -It is required to write tests for: - -- The background migrations' scheduling migration. 
-- The background migration itself. -- A cleanup migration. - -The `:migration` and `schema: :latest` RSpec tags are automatically set for -background migration specs. -See the -[Testing Rails migrations](testing_guide/testing_migrations_guide.md#testing-a-non-activerecordmigration-class) -style guide. - -Keep in mind that `before` and `after` RSpec hooks are going -to migrate you database down and up, which can result in other background -migrations being called. That means that using `spy` test doubles with -`have_received` is encouraged, instead of using regular test doubles, because -your expectations defined in a `it` block can conflict with what is being -called in RSpec hooks. See [issue #35351](https://gitlab.com/gitlab-org/gitlab/-/issues/18839) -for more details. - -## Best practices - -1. Make sure to know how much data you're dealing with. -1. Make sure that background migration jobs are idempotent. -1. Make sure that tests you write are not false positives. -1. Make sure that if the data being migrated is critical and cannot be lost, the - clean-up migration also checks the final state of the data before completing. -1. When migrating many columns, make sure it won't generate too many - dead tuples in the process (you may need to directly query the number of dead tuples - and adjust the scheduling according to this piece of data). -1. Make sure to discuss the numbers with a database specialist, the migration may add - more pressure on DB than you expect (measure on staging, - or ask someone to measure on production). -1. Make sure to know how much time it'll take to run all scheduled migrations. -1. Provide an estimation section in the description, estimating both the total migration - run time and the query times for each background migration job. Explain plans for each query - should also be provided. - - For example, assuming a migration that deletes data, include information similar to - the following section: - - ```plaintext - Background Migration Details: - - 47600 items to delete - batch size = 1000 - 47600 / 1000 = 48 batches - - Estimated times per batch: - - 820ms for select statement with 1000 items (see linked explain plan) - - 900ms for delete statement with 1000 items (see linked explain plan) - Total: ~2 sec per batch - - 2 mins delay per batch (safe for the given total time per batch) - - 48 batches * 2 min per batch = 96 mins to run all the scheduled jobs - ``` - - The execution time per batch (2 sec in this example) is not included in the calculation - for total migration time. The jobs are scheduled 2 minutes apart without knowledge of - the execution time. - -## Additional tips and strategies - -### Nested batching - -A strategy to make the migration run faster is to schedule larger batches, and then use `EachBatch` -within the background migration to perform multiple statements. - -The background migration helpers that queue multiple jobs such as -`queue_background_migration_jobs_by_range_at_intervals` use [`EachBatch`](iterating_tables_in_batches.md). -The example above has batches of 1000, where each queued job takes two seconds. If the query has been optimized -to make the time for the delete statement within the [query performance guidelines](query_performance.md), -1000 may be the largest number of records that can be deleted in a reasonable amount of time. - -The minimum and most common interval for delaying jobs is two minutes. This results in two seconds -of work for each two minute job. 
There's nothing that prevents you from executing multiple delete -statements in each background migration job. - -Looking at the example above, you could alternatively do: - -```plaintext -Background Migration Details: - -47600 items to delete -batch size = 10_000 -47600 / 10_000 = 5 batches - -Estimated times per batch: -- Records are updated in sub-batches of 1000 => 10_000 / 1000 = 10 total updates -- 820ms for select statement with 1000 items (see linked explain plan) -- 900ms for delete statement with 1000 items (see linked explain plan) -Sub-batch total: ~2 sec per sub-batch, -Total batch time: 2 * 10 = 20 sec per batch - -2 mins delay per batch - -5 batches * 2 min per batch = 10 mins to run all the scheduled jobs -``` - -The batch time of 20 seconds still fits comfortably within the two minute delay, yet the total run -time is cut by a tenth from around 100 minutes to 10 minutes! When dealing with large background -migrations, this can cut the total migration time by days. - -When batching in this way, it is important to look at query times on the higher end -of the table or relation being updated. `EachBatch` may generate some queries that become much -slower when dealing with higher ID ranges. - -### Delay time - -When looking at the batch execution time versus the delay time, the execution time -should fit comfortably within the delay time for a few reasons: - -- To allow for a variance in query times. -- To allow autovacuum to catch up after periods of high churn. - -Never try to optimize by fully filling the delay window even if you are confident -the queries themselves have no timing variance. - -### Background jobs tracking - -`queue_background_migration_jobs_by_range_at_intervals` can create records for each job that is scheduled to run. -You can enable this behavior by passing `track_jobs: true`. Each record starts with a `pending` status. Make sure that your worker updates the job status to `succeeded` by calling `Gitlab::Database::BackgroundMigrationJob.mark_all_as_succeeded` in the `perform` method of your background migration. - -```ruby -# Background migration code - -def perform(start_id, end_id) - # do work here - - mark_job_as_succeeded(start_id, end_id) -end - -private - -# Make sure that the arguments passed here match those passed to the background -# migration -def mark_job_as_succeeded(*arguments) - Gitlab::Database::BackgroundMigrationJob.mark_all_as_succeeded( - self.class.name.demodulize, - arguments - ) -end -``` - -```ruby -# Post deployment migration -MIGRATION = 'YourBackgroundMigrationName' -DELAY_INTERVAL = 2.minutes.to_i # can be different -BATCH_SIZE = 10_000 # can be different - -disable_ddl_transaction! - -def up - queue_background_migration_jobs_by_range_at_intervals( - define_batchable_model('name_of_the_table_backing_the_model'), - MIGRATION, - DELAY_INTERVAL, - batch_size: BATCH_SIZE, - track_jobs: true - ) -end - -def down - # no-op -end -``` - -See [`lib/gitlab/background_migration/drop_invalid_vulnerabilities.rb`](https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/background_migration/drop_invalid_vulnerabilities.rb) for a full example. - -#### Rescheduling pending jobs - -You can reschedule pending migrations from the `background_migration_jobs` table by creating a post-deployment migration and calling `requeue_background_migration_jobs_by_range_at_intervals` with the migration name and delay interval. 
- -```ruby -# Post deployment migration -MIGRATION = 'YourBackgroundMigrationName' -DELAY_INTERVAL = 2.minutes - -disable_ddl_transaction! - -def up - requeue_background_migration_jobs_by_range_at_intervals(MIGRATION, DELAY_INTERVAL) -end - -def down - # no-op -end -``` - -See [`db/post_migrate/20210604070207_retry_backfill_traversal_ids.rb`](https://gitlab.com/gitlab-org/gitlab/blob/master/db/post_migrate/20210604070207_retry_backfill_traversal_ids.rb) for a full example. - -### Viewing failure error logs - -After running a background migration, if any jobs have failed, you can view the logs in [Kibana](https://log.gprd.gitlab.net/goto/5f06a57f768c6025e1c65aefb4075694). -View the production Sidekiq log and filter for: - -- `json.class: BackgroundMigrationWorker` -- `json.job_status: fail` -- `json.meta.caller_id: <MyBackgroundMigrationSchedulingMigrationClassName>` -- `json.args: <MyBackgroundMigrationClassName>` - -Looking at the `json.error_class`, `json.error_message` and `json.error_backtrace` values may be helpful in understanding why the jobs failed. - -Depending on when and how the failure occurred, you may find other helpful information by filtering with `json.class: <MyBackgroundMigrationClassName>`. +<!-- This redirect file can be deleted after <2022-07-08>. --> +<!-- Redirects that point to other docs in the same project expire in three months. --> +<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. --> +<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html --> diff --git a/doc/development/batched_background_migrations.md b/doc/development/batched_background_migrations.md new file mode 100644 index 00000000000..e7703b5dd2b --- /dev/null +++ b/doc/development/batched_background_migrations.md @@ -0,0 +1,319 @@ +--- +type: reference, dev +stage: Enablement +group: Database +info: "See the Technical Writers assigned to Development Guidelines: https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments-to-development-guidelines" +--- + +# Batched background migrations + +Batched Background Migrations should be used to perform data migrations whenever a +migration exceeds [the time limits](migration_style_guide.md#how-long-a-migration-should-take) +in our guidelines. For example, you can use batched background +migrations to migrate data that's stored in a single JSON column +to a separate table instead. + +## When to use batched background migrations + +Use a batched background migration when you migrate _data_ in tables containing +so many rows that the process would exceed +[the time limits in our guidelines](migration_style_guide.md#how-long-a-migration-should-take) +if performed using a regular Rails migration. + +- Batched background migrations should be used when migrating data in + [high-traffic tables](migration_style_guide.md#high-traffic-tables). +- Batched background migrations may also be used when executing numerous single-row queries + for every item on a large dataset. Typically, for single-record patterns, runtime is + largely dependent on the size of the dataset. Split the dataset accordingly, + and put it into background migrations. +- Don't use batched background migrations to perform schema migrations. + +Background migrations can help when: + +- Migrating events from one table to multiple separate tables. +- Populating one column based on JSON stored in another column. 
+- Migrating data that depends on the output of external services. (For example, an API.)
+
+NOTE:
+If the batched background migration is part of an important upgrade, it must be announced
+in the release post. Discuss with your Project Manager if you're unsure if the migration falls
+into this category.
+
+## Isolation
+
+Batched background migrations must be isolated and cannot use application code (for example,
+models defined in `app/models`). Because these migrations can take a long time to
+run, it's possible for new versions to deploy while the migrations are still running.
+
+## Idempotence
+
+Batched background migrations are executed in the context of a Sidekiq process.
+The usual Sidekiq rules apply, especially the rule that jobs should be small
+and idempotent. Make sure that data integrity is guaranteed if your migration
+job is retried.
+
+See [Sidekiq best practices guidelines](https://github.com/mperham/sidekiq/wiki/Best-Practices)
+for more details.
+
+## Batched background migrations for EE-only features
+
+All the background migration classes for EE-only features should be present in GitLab CE.
+For this purpose, create an empty class for GitLab CE, and extend it for GitLab EE
+as explained in the guidelines for
+[implementing Enterprise Edition features](ee_features.md#code-in-libgitlabbackground_migration).
+
+Batched background migrations are simple classes that define a `perform` method. A
+Sidekiq worker then executes such a class, passing any arguments to it. All
+migration classes must be defined in the namespace
+`Gitlab::BackgroundMigration`. Place the files in the directory
+`lib/gitlab/background_migration/`.
+
+## Queueing
+
+Queueing a batched background migration should be done in a post-deployment
+migration. Use the `queue_batched_background_migration` helper, as in this
+example, to queue the migration to be executed in batches. Replace the class
+name and arguments with the values from your migration:
+
+```ruby
+queue_batched_background_migration(
+  JOB_CLASS_NAME,
+  TABLE_NAME,
+  JOB_ARGUMENTS,
+  JOB_INTERVAL
+)
+```
+
+Make sure the newly-created data is either migrated, or
+saved in both the old and new version upon creation. Removals in
+turn can be handled by defining foreign keys with cascading deletes.
+
+### Requeuing batched background migrations
+
+If one of the batched background migrations contains a bug that is fixed in a patch
+release, you must requeue the batched background migration so the migration
+repeats on systems that already performed the initial migration.
+
+When you requeue the batched background migration, turn the original
+queuing into a no-op by clearing the `#up` and `#down` methods of the
+migration performing the requeuing. Otherwise, the batched background migration is
+queued multiple times on systems that are upgrading multiple patch releases at
+once.
+
+In the second post-deployment migration, delete the
+previously queued batched migration with the provided code:
+
+```ruby
+Gitlab::Database::BackgroundMigration::BatchedMigration
+  .for_configuration(MIGRATION_NAME, TABLE_NAME, COLUMN, JOB_ARGUMENTS)
+  .delete_all
+```
+
+## Cleaning up
+
+NOTE:
+Cleaning up any remaining background migrations must be done in either a major
+or minor release. You must not do this in a patch release.
+
+Because background migrations can take a long time, you can't immediately clean
+things up after queueing them. For example, you can't drop a column used in the
+migration process, as jobs would fail.
+You must add a separate _post-deployment_
+migration in a future release that finishes any remaining
+jobs before cleaning things up. (For example, removing a column.)
+
+To migrate the data from column `foo` (containing a big JSON blob) to column `bar`
+(containing a string), you would:
+
+1. Release A:
+   1. Create a migration class that performs the migration for a row with a given ID.
+   1. Update new rows using one of these techniques:
+      - Create a new trigger for simple copy operations that don't need application logic.
+      - Handle this operation in the model/service as the records are created or updated.
+      - Create a new custom background job that updates the records.
+   1. Queue the batched background migration for all existing rows in a post-deployment migration.
+1. Release B:
+   1. Add a post-deployment migration that checks if the batched background migration is completed.
+   1. Deploy code so that the application starts using the new column and stops updating new records.
+   1. Remove the old column.
+
+A bump to the [import/export version](../user/project/settings/import_export.md) may
+be required, if importing a project from a prior version of GitLab requires the
+data to be in the new format.
+
+## Example
+
+The table `integrations` has a field called `properties`, stored in JSON. For all rows,
+extract the `url` key from this JSON object and store it in the `integrations.url`
+column. Millions of integrations exist, and parsing JSON is slow, so you can't
+do this work in a regular migration.
+
+1. Start by defining our migration class:
+
+   ```ruby
+   class Gitlab::BackgroundMigration::ExtractIntegrationsUrl
+     class Integration < ActiveRecord::Base
+       self.table_name = 'integrations'
+     end
+
+     def perform(start_id, end_id)
+       Integration.where(id: start_id..end_id).each do |integration|
+         json = JSON.load(integration.properties)
+
+         integration.update(url: json['url']) if json['url']
+       rescue JSON::ParserError
+         # If the JSON is invalid we don't want to keep the job around forever,
+         # instead we'll just leave the "url" field to whatever the default value
+         # is.
+         next
+       end
+     end
+   end
+   ```
+
+   NOTE:
+   To get a `connection` in the batched background migration, inherit from the
+   base class `Gitlab::BackgroundMigration::BaseJob`.
+   For example: `class Gitlab::BackgroundMigration::ExtractIntegrationsUrl < Gitlab::BackgroundMigration::BaseJob`
+
+1. Add a new trigger to the database to update newly created and updated integrations,
+   similar to this example:
+
+   ```ruby
+   execute(<<~SQL)
+     CREATE OR REPLACE FUNCTION example() RETURNS trigger
+     LANGUAGE plpgsql
+     AS $$
+     BEGIN
+       NEW."url" := NEW.properties ->> 'url';
+       RETURN NEW;
+     END;
+     $$;
+   SQL
+   ```
+
+1. Create a post-deployment migration that queues the migration for existing data:
+
+   ```ruby
+   class QueueExtractIntegrationsUrl < Gitlab::Database::Migration[1.0]
+     disable_ddl_transaction!
+
+     MIGRATION = 'ExtractIntegrationsUrl'
+     DELAY_INTERVAL = 2.minutes
+
+     def up
+       queue_batched_background_migration(
+         MIGRATION,
+         :integrations,
+         :id,
+         job_interval: DELAY_INTERVAL
+       )
+     end
+
+     def down
+       Gitlab::Database::BackgroundMigration::BatchedMigration
+         .for_configuration(MIGRATION, :integrations, :id, []).delete_all
+     end
+   end
+   ```
+
+   After deployment, our application:
+   - Continues using the data as before.
+   - Ensures that both existing and new data are migrated.
+
+1. In the next release, remove the trigger.
+   We must also add a new post-deployment migration
+   that checks that the batched background migration is completed. For example:
+
+   ```ruby
+   class FinalizeExtractIntegrationsUrlJobs < Gitlab::Database::Migration[1.0]
+     MIGRATION = 'ExtractIntegrationsUrl'
+     disable_ddl_transaction!
+
+     def up
+       ensure_batched_background_migration_is_finished(
+         job_class_name: MIGRATION,
+         table_name: :integrations,
+         column_name: :id,
+         job_arguments: []
+       )
+     end
+
+     def down
+       # no-op
+     end
+   end
+   ```
+
+   If the application does not depend on the data being 100% migrated (for
+   instance, the data is advisory, and not mission-critical), then you can skip this
+   final step. This step confirms that the migration is completed, and all of the rows were migrated.
+
+After the batched migration is completed, you can safely remove the `integrations.properties` column.
+
+## Testing
+
+Writing tests is required for:
+
+- The batched background migrations' queueing migration.
+- The batched background migration itself.
+- A cleanup migration.
+
+The `:migration` and `schema: :latest` RSpec tags are automatically set for
+background migration specs. Refer to the
+[Testing Rails migrations](testing_guide/testing_migrations_guide.md#testing-a-non-activerecordmigration-class)
+style guide.
+
+Remember that `before` and `after` RSpec hooks
+migrate your database down and up. These hooks can result in other batched background
+migrations being called. Using `spy` test doubles with
+`have_received` is encouraged, instead of using regular test doubles, because
+your expectations defined in an `it` block can conflict with what is
+called in RSpec hooks. Refer to [issue #35351](https://gitlab.com/gitlab-org/gitlab/-/issues/18839)
+for more details.
+
+## Best practices
+
+1. Know how much data you're dealing with.
+1. Make sure the batched background migration jobs are idempotent.
+1. Confirm the tests you write are not false positives.
+1. If the data being migrated is critical and cannot be lost, the
+   clean-up migration must also check the final state of the data before completing.
+1. Discuss the numbers with a database specialist. The migration may add
+   more pressure on the database than you expect. Measure on staging,
+   or ask someone to measure on production.
+1. Know how much time is required to run the batched background migration.
+
+## Additional tips and strategies
+
+### Viewing failure error logs
+
+You can view failures in two ways:
+
+- Via GitLab logs:
+  1. After running a batched background migration, if any jobs fail,
+     view the logs in [Kibana](https://log.gprd.gitlab.net/goto/5f06a57f768c6025e1c65aefb4075694).
+     View the production Sidekiq log and filter for:
+
+     - `json.new_state: failed`
+     - `json.job_class_name: <Batched Background Migration job class name>`
+     - `json.job_arguments: <Batched Background Migration job class arguments>`
+
+  1. Review the `json.exception_class` and `json.exception_message` values to help
+     understand why the jobs failed.
+
+  1. Remember the retry mechanism: a failed attempt does not mean the job
+     ultimately failed. Always check the last status of the job.
+
+- Via database:
+
+  1. Get the batched background migration `CLASS_NAME`.
+  1. Execute the following query in the PostgreSQL console:
+
+     ```sql
+     SELECT migration.id, migration.job_class_name, transition_logs.exception_class, transition_logs.exception_message
+     FROM batched_background_migrations as migration
+     INNER JOIN batched_background_migration_jobs as jobs
+     ON jobs.batched_background_migration_id = migration.id
+     INNER JOIN batched_background_migration_job_transition_logs as transition_logs
+     ON transition_logs.batched_background_migration_job_id = jobs.id
+     WHERE transition_logs.next_status = '2' AND migration.job_class_name = 'CLASS_NAME';
+     ```
diff --git a/doc/development/cached_queries.md b/doc/development/cached_queries.md
index 492c8d13600..8c69981b27a 100644
--- a/doc/development/cached_queries.md
+++ b/doc/development/cached_queries.md
@@ -1,6 +1,6 @@
 ---
-stage: none
-group: unassigned
+stage: Enablement
+group: Memory
 info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
 ---
diff --git a/doc/development/chatops_on_gitlabcom.md b/doc/development/chatops_on_gitlabcom.md
index 26fcf520393..e18fcb0061b 100644
--- a/doc/development/chatops_on_gitlabcom.md
+++ b/doc/development/chatops_on_gitlabcom.md
@@ -25,6 +25,7 @@ To request access to ChatOps on GitLab.com:
      - The same username you use on GitLab.com. You may have to choose a different
        username later.
      - Clicking the **Sign in with Google** button to sign in with your GitLab.com email address.
+     - Clicking the **Sign in with Okta** button to sign in with Okta.
 
 1. Confirm that your username in [Internal GitLab for Operations](https://ops.gitlab.net/)
    is the same as your username in [GitLab.com](https://gitlab.com/). If the usernames
diff --git a/doc/development/cicd/schema.md b/doc/development/cicd/schema.md
index b63d951b881..0e456a25a7a 100644
--- a/doc/development/cicd/schema.md
+++ b/doc/development/cicd/schema.md
@@ -5,26 +5,26 @@ info: To determine the technical writer assigned to the Stage/Group associated w
 type: index, howto
 ---
 
-# Contribute to the CI Schema **(FREE)**
+# Contribute to the CI/CD Schema **(FREE)**
 
-The [pipeline editor](../../ci/pipeline_editor/index.md) uses a CI schema to enhance
-the authoring experience of our CI configuration files. With the CI schema, the editor can:
+The [pipeline editor](../../ci/pipeline_editor/index.md) uses a CI/CD schema to enhance
+the authoring experience of our CI/CD configuration files. With the CI/CD schema, the editor can:
 
-- Validate the content of the CI configuration file as it is being written in the editor.
+- Validate the content of the CI/CD configuration file as it is being written in the editor.
 - Provide autocomplete functionality and suggest available keywords.
 - Provide definitions of keywords through annotations.
 
-As the rules and keywords for configuring our CI configuration files change, so too
-should our CI schema.
+As the rules and keywords for configuring our CI/CD configuration files change, so too
+should our CI/CD schema.
 
 This feature is behind the [`schema_linting`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/config/feature_flags/development/schema_linting.yml)
 feature flag for self-managed instances, and is enabled for GitLab.com.
 
 ## JSON Schemas
 
-The CI schema follows the [JSON Schema Draft-07](https://json-schema.org/draft-07/json-schema-release-notes.html)
-specification.
Although the CI configuration file is written in YAML, it is converted -into JSON by using `monaco-yaml` before it is validated by the CI schema. +The CI/CD schema follows the [JSON Schema Draft-07](https://json-schema.org/draft-07/json-schema-release-notes.html) +specification. Although the CI/CD configuration file is written in YAML, it is converted +into JSON by using `monaco-yaml` before it is validated by the CI/CD schema. If you're new to JSON schemas, consider checking out [this guide](https://json-schema.org/learn/getting-started-step-by-step) for @@ -32,8 +32,8 @@ a step-by-step introduction on how to work with JSON schemas. ## Update Keywords -The CI schema is at [`app/assets/javascripts/editor/schema/ci.json`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/assets/javascripts/editor/schema/ci.json). -It contains all the keywords available for authoring CI configuration files. +The CI/CD schema is at [`app/assets/javascripts/editor/schema/ci.json`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/assets/javascripts/editor/schema/ci.json). +It contains all the keywords available for authoring CI/CD configuration files. Check the [keyword reference](../../ci/yaml/index.md) for a comprehensive list of all available keywords. @@ -138,9 +138,72 @@ under the topmost **properties** key. ## Test the schema -For now, the CI schema can only be tested manually. To verify the behavior is correct: +### Verify changes 1. Enable the `schema_linting` feature flag. 1. Go to **CI/CD** > **Editor**. 1. Write your CI/CD configuration in the editor and verify that the schema validates it correctly. + +### Write specs + +All of the CI/CD schema specs are in [`spec/frontend/editor/schema/ci`](https://gitlab.com/gitlab-org/gitlab/-/tree/master/spec/frontend/editor/schema/ci). +Legacy tests are in JSON, but we recommend writing all new tests in YAML. +You can write them as if you're adding a new `.gitlab-ci.yml` configuration file. + +Tests are separated into **positive** tests and **negative** tests. Positive tests +are snippets of CI/CD configuration code that use the schema keywords as intended. +Conversely, negative tests give examples of the schema keywords being used incorrectly. +These tests ensure that the schema validates different examples of input as expected. + +`ci_schema_spec.js` is responsible for running all of the tests against the schema. + +A detailed explanation of how the tests are set up can be found in this +[merge request](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/83047). + +#### Update schema specs + +If a YAML test does not exist for the specified keyword, create new files in +`yaml_tests/positive_tests` and `yaml_tests/negative_tests`. Otherwise, you can update +the existing tests: + +1. Write both positive and negative tests to validate different kinds of input. +1. If you created new files, import them in `ci_schema_spec.js` and add each file to their + corresponding object entries. 
For example:
+
+   ```javascript
+   import CacheYaml from './yaml_tests/positive_tests/cache.yml';
+   import CacheNegativeYaml from './yaml_tests/negative_tests/cache.yml';
+
+   // import your new test files
+   import NewKeywordTestYaml from './yaml_tests/positive_tests/cache.yml';
+   import NewKeywordTestNegativeYaml from './yaml_tests/negative_tests/cache.yml';
+
+   describe('positive tests', () => {
+     it.each(
+       Object.entries({
+         CacheYaml,
+         NewKeywordTestYaml, // add positive test here
+       }),
+     )('schema validates %s', (_, input) => {
+       expect(input).toValidateJsonSchema(schema);
+     });
+   });
+
+   describe('negative tests', () => {
+     it.each(
+       Object.entries({
+         CacheNegativeYaml,
+         NewKeywordTestNegativeYaml, // add negative test here
+       }),
+     )('schema validates %s', (_, input) => {
+       expect(input).not.toValidateJsonSchema(schema);
+     });
+   });
+   ```
+
+1. Run the command `yarn jest spec/frontend/editor/schema/ci/ci_schema_spec.js`
+   and verify that all the tests successfully pass.
+
+If the spec covers a change to an existing keyword and it affects the legacy JSON
+tests, update them as well.
diff --git a/doc/development/code_review.md b/doc/development/code_review.md
index ec913df8e4a..48bbe4c60ba 100644
--- a/doc/development/code_review.md
+++ b/doc/development/code_review.md
@@ -74,17 +74,13 @@ It picks reviewers and maintainers from the list at the
 page, with these behaviors:
 
 1. It doesn't pick people whose Slack or [GitLab status](../user/profile/index.md#set-your-current-status):
-   - Contains the string 'OOO', 'PTO', 'Parental Leave', or 'Friends and Family'.
+   - Contains the string `OOO`, `PTO`, `Parental Leave`, or `Friends and Family`.
    - GitLab user **Busy** indicator is set to `True`.
-   - Emoji is any of:
-     - 🌴 `:palm_tree:`
-     - 🏖️ `:beach:`, `:beach_umbrella:`, or `:beach_with_umbrella:`
-     - 🎡 `:ferris_wheel:`
-     - 🌡️ `:thermometer:`
-     - 🤒 `:face_with_thermometer:`
-     - 🔴 `:red_circle:`
-     - 💡 `:bulb:`
-     - 🌞 `:sun_with_face:`
+   - Emoji is from one of these categories:
+     - **On leave** - 🌴 `:palm_tree:`, 🏖️ `:beach:`, ⛱ `:beach_umbrella:`, 🏖 `:beach_with_umbrella:`, 🌞 `:sun_with_face:`, 🎡 `:ferris_wheel:`
+     - **Out sick** - 🌡️ `:thermometer:`, 🤒 `:face_with_thermometer:`
+     - **At capacity** - 🔴 `:red_circle:`
+     - **Focus mode** - 💡 `:bulb:` (focusing on their team's work)
 1. [Trainee maintainers](https://about.gitlab.com/handbook/engineering/workflow/code-review/#trainee-maintainer)
    are three times as likely to be picked as other reviewers.
 1. Team members whose Slack or [GitLab status](../user/profile/index.md#set-your-current-status) emoji
@@ -92,12 +88,22 @@ page, with these behaviors:
    - Reviewers with 🔵 `:large_blue_circle:` are two times as likely to be picked as other reviewers.
    - Trainee maintainers with 🔵 `:large_blue_circle:` are four times as likely to be picked as other reviewers.
 1. People whose [GitLab status](../user/profile/index.md#set-your-current-status) emoji
-   is 🔶 `:large_orange_diamond:` or 🔸 `:small_orange_diamond:` are half as likely to be picked. This applies to both reviewers and trainee maintainers.
+   is 🔶 `:large_orange_diamond:` or 🔸 `:small_orange_diamond:` are half as likely to be picked.
 1. It always picks the same reviewers and maintainers for the same
    branch name (unless their out-of-office (OOO) status changes, as in point 1). It
    removes leading `ce-` and `ee-`, and trailing `-ce` and `-ee`, so
   that it can be stable for backport branches.
+The [Roulette dashboard](https://gitlab-org.gitlab.io/gitlab-roulette) contains:
+
+- Assignment events in the last 7 and 30 days.
+- Currently assigned merge requests per person.
+- Sorting by different criteria.
+- A manual reviewer roulette.
+- Local time information.
+
+For more information, review [the roulette README](https://gitlab.com/gitlab-org/gitlab-roulette).
+
 ### Approval guidelines
 
 As described in the section on the responsibility of the maintainer below, you
@@ -136,6 +142,7 @@ with [domain expertise](#domain-experts).
 1. If your merge request includes Product Intelligence (telemetry or analytics) changes, it should be reviewed and approved by a [Product Intelligence engineer](https://gitlab.com/gitlab-org/growth/product-intelligence/engineers).
 1. If your merge request includes an addition of, or changes to a [Feature spec](testing_guide/testing_levels.md#frontend-feature-tests), it must be **approved by a [Quality maintainer](https://about.gitlab.com/handbook/engineering/projects/#gitlab_maintainers_qa) or [Quality reviewer](https://about.gitlab.com/handbook/engineering/projects/#gitlab_reviewers_qa)**.
 1. If your merge request introduces a new service to GitLab (Puma, Sidekiq, Gitaly are examples), it must be **approved by a [product manager](https://about.gitlab.com/company/team/)**. See the [process for adding a service component to GitLab](adding_service_component.md) for details.
+1. If your merge request includes changes related to authentication or authorization, it must be **approved by a [Manage:Authentication and Authorization team member](https://about.gitlab.com/company/team/)**. Check the [code review section on the group page](https://about.gitlab.com/handbook/engineering/development/dev/manage/authentication-and-authorization/#additional-considerations) for more details. Patterns for files known to require review from the team are listed in the `Authentication and Authorization` section of the [`CODEOWNERS`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/.gitlab/CODEOWNERS) file, and the team will be listed in the approvers section of all merge requests that modify these files.
 
 - (*1*): Specs other than JavaScript specs are considered backend code.
 - (*2*): We encourage you to seek guidance from a database maintainer if your merge
@@ -154,7 +161,7 @@ Using checklists improves quality in software engineering. This checklist is a s
 
 ##### Quality
 
-See the [test engineering process](https://about.gitlab.com/handbook/engineering/quality/test-engineering/) for further quality guidelines.
+See the [test engineering process](https://about.gitlab.com/handbook/engineering/quality/quality-engineering/test-engineering/) for further quality guidelines.
 
 1. I have self-reviewed this MR per [code review guidelines](code_review.md).
 1. For the code that this change impacts, I believe that the automated tests ([Testing Guide](testing_guide/index.md)) validate functionality that is highly important to users (including consideration of [all test levels](testing_guide/testing_levels.md)).
@@ -240,6 +247,8 @@ warrant a comment could be:
 - Any benchmarking performed to complement the change.
 - Potentially insecure code.
 
+If there are any projects, snippets, or other assets that are required for a reviewer to validate the solution, ensure they have access to those assets before requesting review.
+ Avoid: - Adding TODO comments (referenced above) directly to the source code unless the reviewer requires @@ -249,7 +258,7 @@ Avoid: [_explain why, not what_](https://blog.codinghorror.com/code-tells-you-how-comments-tell-you-why/). - Requesting maintainer reviews of merge requests with failed tests. If the tests are failing and you have to request a review, ensure you leave a comment with an explanation. - Excessively mentioning maintainers through email or Slack (if the maintainer is reachable -through Slack). If you can't add a reviewer for a merge request, `@` mentioning a maintainer in a comment is acceptable and in all other cases adding a reviewer is sufficient. +through Slack). If you can't add a reviewer for a merge request, it's acceptable to `@` mention a maintainer in a comment. In all other cases, it's sufficient to add a reviewer or [request their attention](../user/project/merge_requests/index.md#request-attention-to-a-merge-request) if they're already a reviewer. This saves reviewers time and helps authors catch mistakes earlier. @@ -259,10 +268,8 @@ This saves reviewers time and helps authors catch mistakes earlier. that it meets all requirements, you should: - Click the Approve button. -- `@` mention the author to generate a to-do notification, and advise them that their merge request has been reviewed and approved. -- Request a review from a maintainer. Default to requests for a maintainer with [domain expertise](#domain-experts), +- Request a review from a maintainer or [request their attention](../user/project/merge_requests/index.md#request-attention-to-a-merge-request) if they're already a reviewer. Default to requests for a maintainer with [domain expertise](#domain-experts), however, if one isn't available or you think the merge request doesn't need a review by a [domain expert](#domain-experts), feel free to follow the [Reviewer roulette](#reviewer-roulette) suggestion. -- Remove yourself as a reviewer. ### The responsibility of the maintainer @@ -290,7 +297,7 @@ If a developer who happens to also be a maintainer was involved in a merge reque as a reviewer, it is recommended that they are not also picked as the maintainer to ultimately approve and merge it. Maintainers should check before merging if the merge request is approved by the -required approvers. If still awaiting further approvals from others, remove yourself as a reviewer then `@` mention the author and explain why in a comment. Stay as reviewer if you're merging the code. +required approvers. If still awaiting further approvals from others, explain that in a comment and [request attention](../user/project/merge_requests/index.md#request-attention-to-a-merge-request) from other reviewers as appropriate. Do not remove yourself as a reviewer. Maintainers must check before merging if the merge request is introducing new vulnerabilities, by inspecting the list in the merge request @@ -312,14 +319,20 @@ After merging, a maintainer should stay as the reviewer listed on the merge requ ### Dogfooding the Reviewers feature -On March 18th 2021, an updated process was put in place aimed at efficiently and consistently dogfooding the Reviewers feature. +Replaced with [dogfooding the attention request feature](#dogfooding-the-attention-request-feature). 
+
+### Dogfooding the attention request feature
+
+In March of 2022, an updated process was put in place aimed at efficiently and consistently dogfooding the
+[attention requests feature](../user/project/merge_requests/index.md#request-attention-to-a-merge-request) under `Merge requests` -> `Need your attention`. This replaces previous guidance on [dogfooding the reviewers feature](#dogfooding-the-reviewers-feature).
+
+Here is a summary of the changes, also reflected in this section above.
+
+- Merge request authors and DRIs stay as assignees
+- Assignees request a review from reviewer(s) when they are expected to review
+- Reviewers stay assigned for the entire duration of the merge request
+- Reviewers request attention from the assignee or other reviewer(s) after they're done reviewing, depending on who needs to take action
+- Assignees request attention from the reviewer(s) when changes are made
 
 ## Best practices
 
@@ -392,6 +405,11 @@ When you are ready to have your merge request reviewed,
 you should [request an initial review](../user/project/merge_requests/getting_started.md#reviewer) by selecting a reviewer from your group or team.
 However, you can also assign it to any reviewer. The list of reviewers can be found on [Engineering projects](https://about.gitlab.com/handbook/engineering/projects/) page.
 
+When a merge request has multiple areas for review, it is recommended you specify which area a reviewer should be reviewing, and at which stage (first or second).
+This will help team members who qualify as a reviewer for multiple areas to know which area they're being requested to review.
+For example, when a merge request has both `backend` and `frontend` concerns, you can mention the reviewer in this manner:
+`@john_doe can you please review ~backend?` or `@jane_doe - could you please give this MR a ~frontend maintainer review?`
+
 You can also use the `workflow::ready for review` label. That means that your merge request is ready to be reviewed and any reviewer can pick it. It is recommended to use that label only if there isn't time pressure, and to make sure the merge request is assigned to a reviewer.
 
 When your merge request receives an approval from the first reviewer it can be passed to a maintainer. You should default to choosing a maintainer with [domain expertise](#domain-experts), and otherwise follow the Reviewer Roulette recommendation or use the label `ready for merge`.
 
@@ -605,9 +623,9 @@ Enterprise Edition instance. This has some implications:
    migration on the staging environment if you aren't sure.
 1. Categorized correctly:
    - Regular migrations run before the new code is running on the instance.
-   - [Post-deployment migrations](post_deployment_migrations.md) run _after_
+   - [Post-deployment migrations](database/post_deployment_migrations.md) run _after_
     the new code is deployed, when the instance is configured to do that.
-   - [Background migrations](background_migrations.md) run in Sidekiq, and
+   - [Background migrations](database/background_migrations.md) run in Sidekiq, and
     should only be done for migrations that would take an extreme amount of time
     at GitLab.com scale.
1.
**Sidekiq workers** [cannot change in a backwards-incompatible way](sidekiq/compatibility_across_updates.md): diff --git a/doc/development/contributing/design.md b/doc/development/contributing/design.md index 463a7ee0e0b..def39a960d8 100644 --- a/doc/development/contributing/design.md +++ b/doc/development/contributing/design.md @@ -49,7 +49,7 @@ Check these aspects both when _designing_ and _reviewing_ UI changes. ### Visual design Check visual design properties using your browser's _elements inspector_ ([Chrome](https://developer.chrome.com/docs/devtools/css/), -[Firefox](https://developer.mozilla.org/en-US/docs/Tools/Page_Inspector/How_to/Open_the_Inspector)). +[Firefox](https://firefox-source-docs.mozilla.org/devtools-user/page_inspector/how_to/open_the_inspector/index.html)). - Use recommended [colors](https://design.gitlab.com/product-foundations/colors/) and [typography](https://design.gitlab.com/product-foundations/type-fundamentals/). @@ -66,7 +66,7 @@ Check visual design properties using your browser's _elements inspector_ ([Chrom Check states using your browser's _styles inspector_ to toggle CSS pseudo-classes like `:hover` and others ([Chrome](https://developer.chrome.com/docs/devtools/css/reference/#pseudo-class), -[Firefox](https://developer.mozilla.org/en-US/docs/Tools/Page_Inspector/How_to/Examine_and_edit_CSS#viewing_common_pseudo-classes)). +[Firefox](https://firefox-source-docs.mozilla.org/devtools-user/page_inspector/how_to/examine_and_edit_css/index.html#viewing-common-pseudo-classes)). - Account for all applicable states ([error](https://design.gitlab.com/content/error-messages), rest, loading, focus, hover, selected, disabled). @@ -78,7 +78,7 @@ like `:hover` and others ([Chrome](https://developer.chrome.com/docs/devtools/cs ### Responsive Check responsive behavior using your browser's _responsive mode_ ([Chrome](https://developer.chrome.com/docs/devtools/device-mode/#viewport), -[Firefox](https://developer.mozilla.org/en-US/docs/Tools/Responsive_Design_Mode)). +[Firefox](https://firefox-source-docs.mozilla.org/devtools-user/responsive_design_mode/index.html)). - Account for resizing, collapsing, moving, or wrapping of elements across all breakpoints (even if larger viewports are prioritized). @@ -99,7 +99,7 @@ Check accessibility using your browser's _accessibility inspector_ ([Chrome](htt When the design is ready, _before_ starting its implementation: - Share design specifications in the related issue, preferably through a [Figma link](https://help.figma.com/hc/en-us/articles/360040531773-Share-Files-with-anyone-using-Link-Sharing#Copy_links) - link or [GitLab Designs feature](../../user/project/issues/design_management.md#the-design-management-section). + link or [GitLab Designs feature](../../user/project/issues/design_management.md). See [when you should use each tool](https://about.gitlab.com/handbook/engineering/ux/product-designer/#deliver). - Document user flow and states (for example, using [Mermaid flowcharts in Markdown](../../user/markdown.md#mermaid)). - Document animations and transitions. diff --git a/doc/development/contributing/issue_workflow.md b/doc/development/contributing/issue_workflow.md index 4db686b9b1e..fe1549e7f34 100644 --- a/doc/development/contributing/issue_workflow.md +++ b/doc/development/contributing/issue_workflow.md @@ -31,11 +31,7 @@ on those issues. Please select someone with relevant experience from the If there is nobody mentioned with that expertise, look in the commit history for the affected files to find someone. 
-We also use [GitLab Triage](https://gitlab.com/gitlab-org/gitlab-triage) to automate -some triaging policies. This is currently set up as a scheduled pipeline -(`https://gitlab.com/gitlab-org/quality/triage-ops/-/pipeline_schedules/10512/edit`, -must have at least the Developer role in the project) running on [quality/triage-ops](https://gitlab.com/gitlab-org/quality/triage-ops) -project. +We also have triage automation in place, described [in our handbook](https://about.gitlab.com/handbook/engineering/quality/triage-operations/). ## Labels diff --git a/doc/development/contributing/merge_request_workflow.md b/doc/development/contributing/merge_request_workflow.md index a9b4d13ab06..5ed0885eed9 100644 --- a/doc/development/contributing/merge_request_workflow.md +++ b/doc/development/contributing/merge_request_workflow.md @@ -144,7 +144,7 @@ document from the Kubernetes team also has some great points regarding this. ### Commit messages guidelines -Commit messages should follow the guidelines below, for reasons explained by Chris Beams in [How to Write a Git Commit Message](https://chris.beams.io/posts/git-commit/): +Commit messages should follow the guidelines below, for reasons explained by Chris Beams in [How to Write a Git Commit Message](https://cbea.ms/git-commit/): - The commit subject and body must be separated by a blank line. - The commit subject must start with a capital letter. @@ -203,7 +203,7 @@ Example commit message template that can be used on your machine that embodies t # Do not use Emojis # Use the body to explain what and why vs. how # Can use multiple lines with "-" for bullet points in body -# For more information: https://chris.beams.io/posts/git-commit/ +# For more information: https://cbea.ms/git-commit/ # -------------------- ``` @@ -286,8 +286,8 @@ requirements. ### Production use 1. Confirmed to be working in staging before implementing the change in production, where possible. -1. Confirmed to be working in the production with no new [Sentry](https://about.gitlab.com/handbook/engineering/#sentry) errors after the contribution is deployed. -1. Confirmed that the [rollout plan](https://about.gitlab.com/handbook/engineering/development/processes/rollout-plans) has been completed. +1. Confirmed to be working in the production with no new [Sentry](https://about.gitlab.com/handbook/engineering/monitoring/#sentry) errors after the contribution is deployed. +1. Confirmed that the [rollout plan](https://about.gitlab.com/handbook/engineering/development/processes/rollout-plans/) has been completed. 1. If there is a performance risk in the change, I have analyzed the performance of the system before and after the change. 1. *If the merge request uses feature flags, per-project or per-group enablement, and a staged rollout:* - Confirmed to be working on GitLab projects. diff --git a/doc/development/contributing/verify/index.md b/doc/development/contributing/verify/index.md index a2bb0eca733..828eb0a9598 100644 --- a/doc/development/contributing/verify/index.md +++ b/doc/development/contributing/verify/index.md @@ -55,7 +55,7 @@ and they serve us and our users well. Some examples of these principles are that - Feedback needs to be available when a user needs it and data can not disappear unexpectedly when engineers need it. - It all doesn’t matter if the platform is not secure and we are leaking credentials or secrets. 
-- When a user provides a set of preconditions in a form of CI/CD configuration, the result should be deterministic each time a pipeline runs, because otherwise the platform might not be trustworthy. +- When a user provides a set of preconditions in a form of CI/CD configuration, the result should be deterministic each time a pipeline runs, because otherwise the platform might not be trustworthy. - If it is fast, simple to use and has a great UX it will serve our users well. ## Building things in Verify @@ -189,8 +189,7 @@ Slack channel (GitLab team members only). After your merge request is merged by a maintainer, it is time to release it to users and the wider community. We usually do this with feature flags. While not every merge request needs a feature flag, most merge -requests in Verify should have feature flags. [**TODO** link to docs about what -needs a feature flag and what doesn’t]. +requests in Verify should have [feature flags](https://about.gitlab.com/handbook/product-development-flow/feature-flag-lifecycle/#when-to-use-feature-flags). If you already follow the advice on this page, you probably already have a few metrics and perhaps a few loggers added that make your new code observable diff --git a/doc/development/dangerbot.md b/doc/development/dangerbot.md index 9bf0fbe1d78..f941e0720c6 100644 --- a/doc/development/dangerbot.md +++ b/doc/development/dangerbot.md @@ -142,7 +142,7 @@ To enable the Dangerfile on another existing GitLab project, complete the follow 1. Create a `Dangerfile` with the following content: ```ruby - require_relative "lib/gitlab-dangerfiles" + require "gitlab-dangerfiles" Gitlab::Dangerfiles.for_project(self, &:import_defaults) ``` @@ -154,6 +154,8 @@ To enable the Dangerfile on another existing GitLab project, complete the follow - project: 'gitlab-org/quality/pipeline-common' file: - '/ci/danger-review.yml' + rules: + - if: '$CI_SERVER_HOST == "gitlab.com"' ``` 1. If your project is in the `gitlab-org` group, you don't need to set up any token as the `DANGER_GITLAB_API_TOKEN` diff --git a/doc/development/database/add_foreign_key_to_existing_column.md b/doc/development/database/add_foreign_key_to_existing_column.md index d74f826cc14..bfd455ef9da 100644 --- a/doc/development/database/add_foreign_key_to_existing_column.md +++ b/doc/development/database/add_foreign_key_to_existing_column.md @@ -123,7 +123,7 @@ end Validating the foreign key scans the whole table and makes sure that each relation is correct. NOTE: -When using [background migrations](../background_migrations.md), foreign key validation should happen in the next GitLab release. +When using [background migrations](background_migrations.md), foreign key validation should happen in the next GitLab release. Migration file for validating the foreign key: diff --git a/doc/development/database/avoiding_downtime_in_migrations.md b/doc/development/database/avoiding_downtime_in_migrations.md new file mode 100644 index 00000000000..ad2768397e6 --- /dev/null +++ b/doc/development/database/avoiding_downtime_in_migrations.md @@ -0,0 +1,491 @@ +--- +stage: Enablement +group: Database +info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments +--- + +# Avoiding downtime in migrations + +When working with a database certain operations may require downtime. Since we +cannot have downtime in migrations we need to use a set of steps to get the +same end result without downtime. 
This guide describes various operations that
+may appear to need downtime, their impact, and how to perform them without
+requiring downtime.
+
+## Dropping Columns
+
+Removing columns is tricky because running GitLab processes may still be using
+the columns. To work around this safely, you will need three steps in three releases:
+
+1. Ignoring the column (release M)
+1. Dropping the column (release M+1)
+1. Removing the ignore rule (release M+2)
+
+The reason we spread this out across three releases is that dropping a column is
+a destructive operation that can't be rolled back easily.
+
+Following this procedure helps us to make sure there are no deployments to GitLab.com
+and upgrade processes for self-managed installations that lump together any of these steps.
+
+### Step 1: Ignoring the column (release M)
+
+The first step is to ignore the column in the application code. This is
+necessary because Rails caches the columns and re-uses this cache in various
+places. This can be done by defining the columns to ignore. For example, to ignore
+`updated_at` in the User model you'd use the following:
+
+```ruby
+class User < ApplicationRecord
+  include IgnorableColumns
+  ignore_column :updated_at, remove_with: '12.7', remove_after: '2020-01-22'
+end
+```
+
+Multiple columns can be ignored, too:
+
+```ruby
+ignore_columns %i[updated_at created_at], remove_with: '12.7', remove_after: '2020-01-22'
+```
+
+If the model exists in CE and EE, the column has to be ignored in the CE model. If the
+model only exists in EE, then it has to be added there.
+
+Indicate when it is safe to remove the column ignore with:
+
+- `remove_with`: set to a GitLab release typically two releases (M+2) after adding the
+  column ignore.
+- `remove_after`: set to a date after which we consider it safe to remove the column
+  ignore, typically after the M+1 release date, during the M+2 development cycle.
+
+This information allows us to reason better about column ignores and makes sure we
+don't remove column ignores too early for both regular releases and deployments to GitLab.com. For
+example, this avoids a situation where we deploy a bulk of changes that include both changes
+to ignore the column and subsequently remove the column ignore (which would result in downtime).
+
+In this example, the change to ignore the column went into release 12.5.
+
+### Step 2: Dropping the column (release M+1)
+
+Continuing our example, dropping the column goes into a _post-deployment_ migration in release 12.6:
+
+```ruby
+ remove_column :users, :updated_at
+```
+
+### Step 3: Removing the ignore rule (release M+2)
+
+With the next release, in this example 12.7, we set up another merge request to remove the ignore rule.
+This removes the `ignore_column` line and, if it is no longer needed, the inclusion of `IgnorableColumns`.
+
+This should only get merged with the release indicated with `remove_with` and once
+the `remove_after` date has passed.
+
+## Renaming Columns
+
+Renaming columns the normal way requires downtime as an application may continue
+using the old column name during/after a database migration. To rename a column
+without requiring downtime we need two migrations: a regular migration, and a
+post-deployment migration. Both of these migrations can go in the same release.
+
+### Step 1: Add The Regular Migration
+
+First we need to create the regular migration. This migration should use
+`Gitlab::Database::MigrationHelpers#rename_column_concurrently` to perform the
+renaming.
For example + +```ruby +# A regular migration in db/migrate +class RenameUsersUpdatedAtToUpdatedAtTimestamp < Gitlab::Database::Migration[1.0] + disable_ddl_transaction! + + def up + rename_column_concurrently :users, :updated_at, :updated_at_timestamp + end + + def down + undo_rename_column_concurrently :users, :updated_at, :updated_at_timestamp + end +end +``` + +This will take care of renaming the column, ensuring data stays in sync, and +copying over indexes and foreign keys. + +If a column contains one or more indexes that don't contain the name of the +original column, the previously described procedure will fail. In that case, +you'll first need to rename these indexes. + +### Step 2: Add A Post-Deployment Migration + +The renaming procedure requires some cleaning up in a post-deployment migration. +We can perform this cleanup using +`Gitlab::Database::MigrationHelpers#cleanup_concurrent_column_rename`: + +```ruby +# A post-deployment migration in db/post_migrate +class CleanupUsersUpdatedAtRename < Gitlab::Database::Migration[1.0] + disable_ddl_transaction! + + def up + cleanup_concurrent_column_rename :users, :updated_at, :updated_at_timestamp + end + + def down + undo_cleanup_concurrent_column_rename :users, :updated_at, :updated_at_timestamp + end +end +``` + +If you're renaming a [large table](https://gitlab.com/gitlab-org/gitlab/-/blob/master/rubocop/rubocop-migrations.yml#L3), please carefully consider the state when the first migration has run but the second cleanup migration hasn't been run yet. +With [Canary](https://gitlab.com/gitlab-com/gl-infra/readiness/-/tree/master/library/canary/) it is possible that the system runs in this state for a significant amount of time. + +## Changing Column Constraints + +Adding or removing a `NOT NULL` clause (or another constraint) can typically be +done without requiring downtime. However, this does require that any application +changes are deployed _first_. Thus, changing the constraints of a column should +happen in a post-deployment migration. + +Avoid using `change_column` as it produces an inefficient query because it re-defines +the whole column type. + +You can check the following guides for each specific use case: + +- [Adding foreign-key constraints](../migration_style_guide.md#adding-foreign-key-constraints) +- [Adding `NOT NULL` constraints](not_null_constraints.md) +- [Adding limits to text columns](strings_and_the_text_data_type.md) + +## Changing Column Types + +Changing the type of a column can be done using +`Gitlab::Database::MigrationHelpers#change_column_type_concurrently`. This +method works similarly to `rename_column_concurrently`. For example, let's say +we want to change the type of `users.username` from `string` to `text`. + +### Step 1: Create A Regular Migration + +A regular migration is used to create a new column with a temporary name along +with setting up some triggers to keep data in sync. Such a migration would look +as follows: + +```ruby +# A regular migration in db/migrate +class ChangeUsersUsernameStringToText < Gitlab::Database::Migration[1.0] + disable_ddl_transaction! 
+ + def up + change_column_type_concurrently :users, :username, :text + end + + def down + undo_change_column_type_concurrently :users, :username + end +end +``` + +### Step 2: Create A Post Deployment Migration + +Next we need to clean up our changes using a post-deployment migration: + +```ruby +# A post-deployment migration in db/post_migrate +class ChangeUsersUsernameStringToTextCleanup < Gitlab::Database::Migration[1.0] + disable_ddl_transaction! + + def up + cleanup_concurrent_column_type_change :users, :username + end + + def down + undo_cleanup_concurrent_column_type_change :users, :username, :string + end +end +``` + +And that's it, we're done! + +### Casting data to a new type + +Some type changes require casting data to a new type. For example when changing from `text` to `jsonb`. +In this case, use the `type_cast_function` option. +Make sure there is no bad data and the cast will always succeed. You can also provide a custom function that handles +casting errors. + +Example migration: + +```ruby + def up + change_column_type_concurrently :users, :settings, :jsonb, type_cast_function: 'jsonb' + end +``` + +## Changing The Schema For Large Tables + +While `change_column_type_concurrently` and `rename_column_concurrently` can be +used for changing the schema of a table without downtime, it doesn't work very +well for large tables. Because all of the work happens in sequence the migration +can take a very long time to complete, preventing a deployment from proceeding. +They can also produce a lot of pressure on the database due to it rapidly +updating many rows in sequence. + +To reduce database pressure you should instead use a background migration +when migrating a column in a large table (for example, `issues`). This will +spread the work / load over a longer time period, without slowing down deployments. + +For more information, see [the documentation on cleaning up background +migrations](background_migrations.md#cleaning-up). + +## Adding Indexes + +Adding indexes does not require downtime when `add_concurrent_index` +is used. + +See also [Migration Style Guide](../migration_style_guide.md#adding-indexes) +for more information. + +## Dropping Indexes + +Dropping an index does not require downtime. + +## Adding Tables + +This operation is safe as there's no code using the table just yet. + +## Dropping Tables + +Dropping tables can be done safely using a post-deployment migration, but only +if the application no longer uses the table. + +## Renaming Tables + +Renaming tables requires downtime as an application may continue +using the old table name during/after a database migration. + +If the table and the ActiveRecord model is not in use yet, removing the old +table and creating a new one is the preferred way to "rename" the table. + +Renaming a table is possible without downtime by following our multi-release +[rename table process](rename_database_tables.md#rename-table-without-downtime). + +## Adding Foreign Keys + +Adding foreign keys usually works in 3 steps: + +1. Start a transaction +1. Run `ALTER TABLE` to add the constraint(s) +1. Check all existing data + +Because `ALTER TABLE` typically acquires an exclusive lock until the end of a +transaction this means this approach would require downtime. + +GitLab allows you to work around this by using +`Gitlab::Database::MigrationHelpers#add_concurrent_foreign_key`. This method +ensures that no downtime is needed. + +## Removing Foreign Keys + +This operation does not require downtime. 
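+
+For illustration only, a removal could look like the following sketch. The table,
+column, and target names here are hypothetical, and this is a sketch of the
+pattern rather than a definitive implementation:
+
+```ruby
+# Hypothetical post-deployment migration sketch (names are made up).
+class RemoveExampleForeignKey < Gitlab::Database::Migration[1.0]
+  disable_ddl_transaction!
+
+  def up
+    # Dropping a constraint is fast. with_lock_retries keeps each
+    # exclusive lock attempt short, so other traffic is not blocked.
+    with_lock_retries do
+      remove_foreign_key_if_exists(:issues, column: :example_id)
+    end
+  end
+
+  def down
+    # Re-adding the constraint must validate existing rows, so use
+    # the concurrent helper described above.
+    add_concurrent_foreign_key(:issues, :examples, column: :example_id)
+  end
+end
+```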
+
+## Migrating `integer` primary keys to `bigint`
+
+To [prevent the overflow risk](https://gitlab.com/groups/gitlab-org/-/epics/4785) for some tables
+with an `integer` primary key (PK), we have to migrate their PK to `bigint`. The process for doing this
+without downtime or excessive database load is described below.
+
+### Initialize the conversion and start migrating existing data (release N)
+
+To start the process, add a regular migration to create the new `bigint` columns. Use the provided
+`initialize_conversion_of_integer_to_bigint` helper. The helper also creates a database trigger
+to keep both columns in sync for any new records ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/migrate/20210608072312_initialize_conversion_of_ci_stages_to_bigint.rb)):
+
+```ruby
+class InitializeConversionOfCiStagesToBigint < ActiveRecord::Migration[6.1]
+  include Gitlab::Database::MigrationHelpers
+
+  TABLE = :ci_stages
+  COLUMNS = %i(id)
+
+  def up
+    initialize_conversion_of_integer_to_bigint(TABLE, COLUMNS)
+  end
+
+  def down
+    revert_initialize_conversion_of_integer_to_bigint(TABLE, COLUMNS)
+  end
+end
+```
+
+Ignore the new `bigint` columns:
+
+```ruby
+module Ci
+  class Stage < Ci::ApplicationRecord
+    include IgnorableColumns
+    ignore_column :id_convert_to_bigint, remove_with: '14.2', remove_after: '2021-08-22'
+  end
+end
+```
+
+To migrate existing data, we introduced a new type of migration, _batched background migrations_.
+Unlike the classic background migrations, built on top of Sidekiq, batched background migrations
+don't have to enqueue and schedule all the background jobs at the beginning.
+They also have other advantages, like automatic tuning of the batch size, better progress visibility,
+and collecting metrics. To start the process, use the provided `backfill_conversion_of_integer_to_bigint`
+helper ([example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/migrate/20210608072346_backfill_ci_stages_for_bigint_conversion.rb)):
+
+```ruby
+class BackfillCiStagesForBigintConversion < ActiveRecord::Migration[6.1]
+  include Gitlab::Database::MigrationHelpers
+
+  TABLE = :ci_stages
+  COLUMNS = %i(id)
+
+  def up
+    backfill_conversion_of_integer_to_bigint(TABLE, COLUMNS)
+  end
+
+  def down
+    revert_backfill_conversion_of_integer_to_bigint(TABLE, COLUMNS)
+  end
+end
+```
+
+### Monitor the background migration
+
+Check how the migration is performing while it's running. Multiple ways to do this are described below.
+
+#### High-level status of batched background migrations
+
+See how to [check the status of batched background migrations](../../update/index.md#checking-for-background-migrations-before-upgrading).
+
+#### Query the database
+
+We can query the related database tables directly. This requires access to a read-only replica.
+Example queries:
+
+```sql
+-- Get details for batched background migration for given table
+SELECT * FROM batched_background_migrations WHERE table_name = 'namespaces'\gx
+
+-- Get count of batched background migration jobs by status for given table
+SELECT
+  batched_background_migrations.id, batched_background_migration_jobs.status, COUNT(*)
+FROM
+  batched_background_migrations
+  JOIN batched_background_migration_jobs ON batched_background_migrations.id = batched_background_migration_jobs.batched_background_migration_id
+WHERE
+  table_name = 'namespaces'
+GROUP BY
+  batched_background_migrations.id, batched_background_migration_jobs.status;
+
+-- Batched background migration progress for given table (based on estimated total number of tuples)
+SELECT
+  m.table_name,
+  LEAST(100 * sum(j.batch_size) / pg_class.reltuples, 100) AS percentage_complete
+FROM
+  batched_background_migrations m
+  JOIN batched_background_migration_jobs j ON j.batched_background_migration_id = m.id
+  JOIN pg_class ON pg_class.relname = m.table_name
+WHERE
+  j.status = 3 AND m.table_name = 'namespaces'
+GROUP BY m.id, pg_class.reltuples;
+```
+
+#### Sidekiq logs
+
+We can also use the Sidekiq logs to monitor the worker that executes the batched background
+migrations:
+
+1. Sign in to [Kibana](https://log.gprd.gitlab.net) with a `@gitlab.com` email address.
+1. Change the index pattern to `pubsub-sidekiq-inf-gprd*`.
+1. Add filter for `json.queue: cronjob:database_batched_background_migration`.
+
+#### PostgreSQL slow queries log
+
+The slow queries log keeps track of queries that take longer than 1 second to execute. To see them
+for a batched background migration:
+
+1. Sign in to [Kibana](https://log.gprd.gitlab.net) with a `@gitlab.com` email address.
+1. Change the index pattern to `pubsub-postgres-inf-gprd*`.
+1. Add filter for `json.endpoint_id.keyword: Database::BatchedBackgroundMigrationWorker`.
+1. Optional. To see only updates, add a filter for `json.command_tag.keyword: UPDATE`.
+1. Optional. To see only failed statements, add a filter for `json.error_severity.keyword: ERROR`.
+1. Optional. Add a filter by table name.
+
+#### Grafana dashboards
+
+To monitor the health of the database, use these additional metrics:
+
+- [PostgreSQL Tuple Statistics](https://dashboards.gitlab.net/d/000000167/postgresql-tuple-statistics?orgId=1&refresh=1m): if you see a high rate of updates for the tables being actively converted, or an increasing percentage of dead tuples for this table, it might mean that autovacuum cannot keep up.
+- [PostgreSQL Overview](https://dashboards.gitlab.net/d/000000144/postgresql-overview?orgId=1): if you see high system usage or transactions per second (TPS) on the primary database server, it might mean that the migration is causing problems.
+
+### Prometheus metrics
+
+A number of [metrics](https://gitlab.com/gitlab-org/gitlab/-/blob/294a92484ce4611f660439aa48eee4dfec2230b5/lib/gitlab/database/background_migration/batched_migration_wrapper.rb#L90-128)
+for each batched background migration are published to Prometheus. These metrics can be searched for and
+visualized in Thanos ([see an example](https://thanos-query.ops.gitlab.net/graph?g0.expr=sum%20(rate(batched_migration_job_updated_tuples_total%7Benv%3D%22gprd%22%7D%5B5m%5D))%20by%20(migration_id)%20&g0.tab=0&g0.stacked=0&g0.range_input=3d&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D&g0.end_input=2021-06-13%2012%3A18%3A24&g0.moment_input=2021-06-13%2012%3A18%3A24)).
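+
+For instance, the Thanos example linked above charts the rate of updated tuples
+per migration with the following query (copied from the link):
+
+```plaintext
+sum (rate(batched_migration_job_updated_tuples_total{env="gprd"}[5m])) by (migration_id)
+```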
+
+### Swap the columns (release N + 1)
+
+After the background migration is completed and the new `bigint` columns are populated for all records, we can
+swap the columns. Swapping is done with a post-deployment migration. The exact process depends on the
+table being converted, but in general it's done in the following steps:
+
+1. Using the provided `ensure_batched_background_migration_is_finished` helper, make sure the batched
+migration has finished ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L13-18)).
+If the migration has not completed, the subsequent steps fail anyway. By checking in advance we
+aim to have a more helpful error message.
+1. Create indexes using the `bigint` columns that match the existing indexes using the `integer`
+column ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L28-34)).
+1. Create foreign keys (FK) using the `bigint` columns that match the existing FKs using the
+`integer` column. Do this both for FKs referencing other tables, and FKs that reference the table
+that is being migrated ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L36-43)).
+1. Inside a transaction, swap the columns:
+   1. Lock the tables involved. To reduce the chance of hitting a deadlock, we recommend doing this in parent-to-child order ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L47)).
+   1. Rename the columns to swap names ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L49-54)).
+   1. Reset the trigger function ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L56-57)).
+   1. Swap the defaults ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L59-62)).
+   1. Swap the PK constraint (if any) ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L64-68)).
+   1. Remove old indexes and rename new ones ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L70-72)).
+   1. Remove old FKs (if still present) and rename new ones ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L74)).
+
+See the example [merge request](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/66088) and [migration](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb).
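+
+For illustration, here is a condensed sketch of what such a swap migration can look like,
+modeled on the linked `ci_stages` migration. The job class name, helper arguments, and index
+name below are assumptions for this sketch; refer to the linked migration for the complete,
+authoritative set of swap steps:
+
+```ruby
+# Sketch only: several swap steps are omitted. See the linked
+# finalize_ci_stages_bigint_conversion migration for the full version.
+class FinalizeCiStagesBigintConversion < ActiveRecord::Migration[6.1]
+  include Gitlab::Database::MigrationHelpers
+
+  disable_ddl_transaction!
+
+  TABLE_NAME = 'ci_stages'
+
+  def up
+    # Fail early, with a helpful error, if the batched migration is unfinished
+    ensure_batched_background_migration_is_finished(
+      job_class_name: 'CopyColumnUsingBackgroundMigrationJob', # assumed job name
+      table_name: TABLE_NAME,
+      column_name: 'id',
+      job_arguments: [['id'], ['id_convert_to_bigint']]
+    )
+
+    # Create an index on the bigint column matching the existing PK index
+    add_concurrent_index TABLE_NAME, :id_convert_to_bigint, unique: true,
+      name: 'index_ci_stages_on_id_convert_to_bigint' # hypothetical name
+
+    with_lock_retries(raise_on_exhaustion: true) do
+      # Lock the table, then swap the columns by renaming them
+      execute "LOCK TABLE #{TABLE_NAME} IN ACCESS EXCLUSIVE MODE"
+
+      execute "ALTER TABLE #{TABLE_NAME} RENAME COLUMN id TO id_tmp"
+      execute "ALTER TABLE #{TABLE_NAME} RENAME COLUMN id_convert_to_bigint TO id"
+      execute "ALTER TABLE #{TABLE_NAME} RENAME COLUMN id_tmp TO id_convert_to_bigint"
+
+      # The remaining steps from the list above (reset the trigger function,
+      # swap the defaults and PK constraint, swap indexes and FKs) are omitted.
+    end
+  end
+end
+```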
+
+### Remove the trigger and old `integer` columns (release N + 2)
+
+Using a post-deployment migration and the provided `cleanup_conversion_of_integer_to_bigint` helper,
+drop the database trigger and the old `integer` columns ([see an example](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/69714)).
+
+### Remove ignore rules (release N + 3)
+
+In the next release after the columns were dropped, remove the ignore rules, as we do not need them
+anymore ([see an example](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/71161)).
+
+## Data migrations
+
+Data migrations can be tricky. The usual approach to migrate data is to take a three-step
+approach:
+
+1. Migrate the initial batch of data
+1. Deploy the application code
+1. Migrate any remaining data
+
+Usually this works, but not always. For example, if a field's format is to be
+changed from JSON to something else we have a bit of a problem. If we were to
+change existing data before deploying application code we would most likely run
+into errors. On the other hand, if we were to migrate after deploying the
+application code we could run into the same problems.
+
+If you merely need to correct some invalid data, then a post-deployment
+migration is usually enough. If you need to change the format of data (for example, from
+JSON to something else) it's typically best to add a new column for the new data
+format, and have the application use that. In such a case the procedure would
+be:
+
+1. Add a new column in the new format
+1. Copy over existing data to this new column
+1. Deploy the application code
+1. In a post-deployment migration, copy over any remaining data
+
+In general there is no one-size-fits-all solution, therefore it's best to
+discuss these kinds of migrations in a merge request to make sure they are
+implemented in the best way possible.
diff --git a/doc/development/database/background_migrations.md b/doc/development/database/background_migrations.md
new file mode 100644
index 00000000000..1f7e0d76c89
--- /dev/null
+++ b/doc/development/database/background_migrations.md
@@ -0,0 +1,504 @@
+---
+stage: Enablement
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Background migrations
+
+WARNING:
+Background migrations are strongly discouraged in favor of the new [batched background migrations framework](../batched_background_migrations.md).
+Check that documentation first to determine whether that framework suits your needs, and fall back
+to background migrations only if required.
+
+Background migrations should be used to perform data migrations whenever a
+migration exceeds [the time limits in our guidelines](../migration_style_guide.md#how-long-a-migration-should-take). For example, you can use background
+migrations to migrate data that's stored in a single JSON column
+to a separate table instead.
+
+If the database cluster is considered to be in an unhealthy state, background
+migrations automatically reschedule themselves for a later point in time.
+
+## When To Use Background Migrations
+
+You should use a background migration when you migrate _data_ in tables that have
+so many rows that the process would exceed [the time limits in our guidelines](../migration_style_guide.md#how-long-a-migration-should-take) if performed using a regular Rails migration.
+
+- Background migrations should be used when migrating data in [high-traffic tables](../migration_style_guide.md#high-traffic-tables).
+- Background migrations may also be used when executing numerous single-row queries
+for every item on a large dataset. Typically, for single-record patterns, runtime is
+largely dependent on the size of the dataset, so the work should be split accordingly
+and put into background migrations.
+- Background migrations should not be used to perform schema migrations.
+
+Some examples where background migrations can be useful:
+
+- Migrating events from one table to multiple separate tables.
+- Populating one column based on JSON stored in another column.
+- Migrating data that depends on the output of external services (for example, an API).
+
+NOTE:
+If the background migration is part of an important upgrade, make sure it's announced
+in the release post. Discuss with your Project Manager if you're not sure the migration falls
+into this category.
+
+## Isolation
+
+Background migrations must be isolated and cannot use application code (for example,
+models defined in `app/models`). Because these migrations can take a long time to
+run, it's possible for new versions to be deployed while they are still running.
+
+It's also possible for different migrations to be executed at the same time.
+This means that different background migrations should not migrate data in a
+way that would cause conflicts.
+
+## Idempotence
+
+Background migrations are executed in the context of a Sidekiq process.
+The usual Sidekiq rules apply, especially the rule that jobs should be small
+and idempotent.
+
+See the [Sidekiq best practices guidelines](https://github.com/mperham/sidekiq/wiki/Best-Practices)
+for more details.
+
+Make sure that data integrity is guaranteed even if your migration job is
+retried.
+
+## Background migrations for EE-only features
+
+All the background migration classes for EE-only features should be present in GitLab CE.
+For this purpose, an empty class can be created for GitLab CE, and it can be extended for GitLab EE
+as explained in the [guidelines for implementing Enterprise Edition features](../ee_features.md#code-in-libgitlabbackground_migration).
+
+## How It Works
+
+Background migrations are simple classes that define a `perform` method. A
+Sidekiq worker then executes such a class, passing any arguments to it. All
+migration classes must be defined in the namespace
+`Gitlab::BackgroundMigration`, and the files should be placed in the directory
+`lib/gitlab/background_migration/`.
+
+## Scheduling
+
+Scheduling a background migration should be done in a post-deployment
+migration that includes `Gitlab::Database::MigrationHelpers`.
+To do so, use the following code while
+replacing the class name and arguments with whatever values are necessary for
+your migration:
+
+```ruby
+migrate_in('BackgroundMigrationClassName', [arg1, arg2, ...])
+```
+
+You can use the function `queue_background_migration_jobs_by_range_at_intervals`
+to automatically split the job into batches:
+
+```ruby
+queue_background_migration_jobs_by_range_at_intervals(
+  ClassName,
+  BackgroundMigrationClassName,
+  2.minutes,
+  batch_size: 10_000
+)
+```
+
+You'll also need to make sure that newly created data is either migrated, or
+saved in both the old and new version upon creation. For complex and time-consuming
+migrations it's best to schedule a background job using an
+`after_create` hook so this doesn't affect response timings.
+The same applies to updates. Removals, in turn, can be handled by defining
+foreign keys with cascading deletes.
+
+### Rescheduling background migrations
+
+If one of the background migrations contains a bug that is fixed in a patch
+release, the background migration needs to be rescheduled so that the migration is
+repeated on systems that already performed the initial migration.
+
+When you reschedule the background migration, make sure to turn the original
+scheduling into a no-op by clearing up the `#up` and `#down` methods of the
+migration performing the scheduling. Otherwise the background migration would be
+scheduled multiple times on systems that are upgrading multiple patch releases at
+once.
+
+When you start the second post-deployment migration, you should delete any
+previously queued jobs from the initial migration with the provided
+helper:
+
+```ruby
+delete_queued_jobs('BackgroundMigrationClassName')
+```
+
+## Cleaning Up
+
+NOTE:
+Cleaning up any remaining background migrations _must_ be done in either a major
+or minor release. You _must not_ do this in a patch release.
+
+Because background migrations can take a long time, you can't immediately clean
+things up after scheduling them. For example, you can't drop a column that's
+used in the migration process, as this would cause jobs to fail. This means that
+you'll need to add a separate _post-deployment_ migration in a future release
+that finishes any remaining jobs before cleaning things up (for example, removing a
+column).
+
+As an example, say you want to migrate the data from column `foo` (containing a
+big JSON blob) to column `bar` (containing a string). The process for this would
+roughly be as follows:
+
+1. Release A:
+   1. Create a migration class that performs the migration for a row with a given ID.
+      You can use [background jobs tracking](#background-jobs-tracking) to simplify cleaning up.
+   1. Deploy the code for this release. This should include some code that
+      schedules jobs for newly created data (for example, using an `after_create` hook).
+   1. Schedule jobs for all existing rows in a post-deployment migration. It's
+      possible some newly created rows may be scheduled twice, so your migration
+      should take care of this.
+1. Release B:
+   1. Deploy code so that the application starts using the new column and stops
+      scheduling jobs for newly created data.
+   1. In a post-deployment migration, finalize all jobs that have not succeeded by now.
+      If you used [background jobs tracking](#background-jobs-tracking) in release A,
+      you can use `finalize_background_migration` from `BackgroundMigrationHelpers` to ensure no jobs remain.
+      This helper will:
+      1. Use `Gitlab::BackgroundMigration.steal` to process any remaining
+         jobs in Sidekiq.
+      1. Reschedule the migration to be run directly (that is, not through Sidekiq)
+         on any rows that weren't migrated by Sidekiq. This can happen if, for
+         instance, Sidekiq received a SIGKILL, or if a particular batch failed
+         enough times to be marked as dead.
+      1. Remove `Gitlab::Database::BackgroundMigrationJob` rows where
+         `status = succeeded`. To retain diagnostic information that may
+         help with future bug tracking, you can skip this step by specifying
+         the `delete_tracking_jobs: false` parameter.
+   1. Remove the old column.
+
+This may also require a bump to the [import/export version](../../user/project/settings/import_export.md), if
+importing a project from a prior version of GitLab requires the data to be in
+the new format.
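+
+As a sketch of what the release B finalization step can look like with job tracking, the
+post-deployment migration might be written as follows. `MigrateFooToBar` is a hypothetical
+placeholder name for the background migration scheduled in release A:
+
+```ruby
+# Sketch: finalize a tracked background migration in release B.
+# 'MigrateFooToBar' is a hypothetical migration class name.
+class FinalizeMigrateFooToBar < Gitlab::Database::Migration[1.0]
+  disable_ddl_transaction!
+
+  def up
+    # Steals any remaining Sidekiq jobs, re-runs unmigrated batches inline,
+    # and deletes succeeded rows from the background jobs tracking table.
+    finalize_background_migration('MigrateFooToBar')
+  end
+
+  def down
+    # no-op
+  end
+end
+```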
+
+## Example
+
+To explain all this, let's use the following example: the table `integrations` has a
+field called `properties`, stored in JSON. For all rows you want to
+extract the `url` key from this JSON object and store it in the `integrations.url`
+column. There are millions of integrations and parsing JSON is slow, thus you can't
+do this in a regular migration.
+
+To do this using a background migration we start by defining our migration
+class:
+
+```ruby
+class Gitlab::BackgroundMigration::ExtractIntegrationsUrl
+  class Integration < ActiveRecord::Base
+    self.table_name = 'integrations'
+  end
+
+  def perform(start_id, end_id)
+    Integration.where(id: start_id..end_id).each do |integration|
+      json = JSON.load(integration.properties)
+
+      integration.update(url: json['url']) if json['url']
+    rescue JSON::ParserError
+      # If the JSON is invalid we don't want to keep the job around forever,
+      # instead we'll just leave the "url" field to whatever the default value
+      # is.
+      next
+    end
+  end
+end
+```
+
+Next we need to adjust our code so we schedule the above migration for newly
+created and updated integrations. We can do this using something along the lines of
+the following:
+
+```ruby
+class Integration < ActiveRecord::Base
+  after_commit :schedule_integration_migration, on: :update
+  after_commit :schedule_integration_migration, on: :create
+
+  def schedule_integration_migration
+    BackgroundMigrationWorker.perform_async('ExtractIntegrationsUrl', [id, id])
+  end
+end
+```
+
+We're using `after_commit` here to ensure the Sidekiq job is not scheduled
+before the transaction completes, as doing so can lead to race conditions where
+the changes are not yet visible to the worker.
+
+Next we need a post-deployment migration that schedules the migration for
+existing data.
+
+```ruby
+class ScheduleExtractIntegrationsUrl < Gitlab::Database::Migration[1.0]
+  disable_ddl_transaction!
+
+  MIGRATION = 'ExtractIntegrationsUrl'
+  DELAY_INTERVAL = 2.minutes
+
+  def up
+    queue_background_migration_jobs_by_range_at_intervals(
+      define_batchable_model('integrations'),
+      MIGRATION,
+      DELAY_INTERVAL)
+  end
+
+  def down
+  end
+end
+```
+
+Once deployed, our application continues using the data as before, but at the
+same time ensures that both existing and new data is migrated.
+
+In the next release we can remove the `after_commit` hooks and related code. We
+also need to add a post-deployment migration that consumes any remaining
+jobs and manually runs the migration on any un-migrated rows. Such a migration would look like
+this:
+
+```ruby
+class ConsumeRemainingExtractIntegrationsUrlJobs < Gitlab::Database::Migration[1.0]
+  disable_ddl_transaction!
+
+  def up
+    # This must be included
+    Gitlab::BackgroundMigration.steal('ExtractIntegrationsUrl')
+
+    # This should be included, but can be skipped - see below
+    define_batchable_model('integrations').where(url: nil).each_batch(of: 50) do |batch|
+      range = batch.pluck('MIN(id)', 'MAX(id)').first
+
+      Gitlab::BackgroundMigration::ExtractIntegrationsUrl.new.perform(*range)
+    end
+  end
+
+  def down
+  end
+end
+```
+
+The final step runs for any un-migrated rows after all of the jobs have been
+processed. This is in case a Sidekiq process running the background migrations
+received a SIGKILL, leading to the jobs being lost. (See
+[more reliable Sidekiq queue](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/36791) for more information.)
+
+If the application does not depend on the data being 100% migrated (for
+instance, the data is advisory, and not mission-critical), then this final step
+can be skipped.
+
+This migration then processes any jobs for the `ExtractIntegrationsUrl` migration
+and continues once all jobs have been processed. Once done, you can safely remove
+the `integrations.properties` column.
+
+## Testing
+
+It is required to write tests for:
+
+- The background migrations' scheduling migration.
+- The background migration itself.
+- A cleanup migration.
+
+The `:migration` and `schema: :latest` RSpec tags are automatically set for
+background migration specs.
+See the
+[Testing Rails migrations](../testing_guide/testing_migrations_guide.md#testing-a-non-activerecordmigration-class)
+style guide.
+
+Keep in mind that `before` and `after` RSpec hooks
+migrate your database down and up, which can result in other background
+migrations being called. That means that using `spy` test doubles with
+`have_received` is encouraged, instead of using regular test doubles, because
+your expectations defined in an `it` block can conflict with what is being
+called in RSpec hooks. See [issue #35351](https://gitlab.com/gitlab-org/gitlab/-/issues/18839)
+for more details.
+
+## Best practices
+
+1. Make sure to know how much data you're dealing with.
+1. Make sure that background migration jobs are idempotent.
+1. Make sure that tests you write are not false positives.
+1. Make sure that if the data being migrated is critical and cannot be lost, the
+   clean-up migration also checks the final state of the data before completing.
+1. When migrating many columns, make sure it won't generate too many
+   dead tuples in the process (you may need to directly query the number of dead tuples
+   and adjust the scheduling according to this piece of data).
+1. Make sure to discuss the numbers with a database specialist; the migration may add
+   more pressure on the database than you expect (measure on staging,
+   or ask someone to measure on production).
+1. Make sure to know how much time it takes to run all scheduled migrations.
+1. Provide an estimation section in the description, estimating both the total migration
+   run time and the query times for each background migration job. Explain plans for each query
+   should also be provided.
+
+   For example, assuming a migration that deletes data, include information similar to
+   the following section:
+
+   ```plaintext
+   Background Migration Details:
+
+   47600 items to delete
+   batch size = 1000
+   47600 / 1000 = 48 batches
+
+   Estimated times per batch:
+   - 820ms for select statement with 1000 items (see linked explain plan)
+   - 900ms for delete statement with 1000 items (see linked explain plan)
+   Total: ~2 sec per batch
+
+   2 mins delay per batch (safe for the given total time per batch)
+
+   48 batches * 2 min per batch = 96 mins to run all the scheduled jobs
+   ```
+
+   The execution time per batch (2 sec in this example) is not included in the calculation
+   for total migration time. The jobs are scheduled 2 minutes apart without knowledge of
+   the execution time.
+
+## Additional tips and strategies
+
+### Nested batching
+
+A strategy to make the migration run faster is to schedule larger batches, and then use `EachBatch`
+within the background migration to perform multiple statements.
+
+The background migration helpers that queue multiple jobs, such as
+`queue_background_migration_jobs_by_range_at_intervals`, use [`EachBatch`](../iterating_tables_in_batches.md).
+The example above has batches of 1000, where each queued job takes two seconds. If the query has been
+optimized so that the time for the delete statement falls within the [query performance guidelines](../query_performance.md),
+1000 may be the largest number of records that can be deleted in a reasonable amount of time.
+
+The minimum and most common interval for delaying jobs is two minutes. This results in two seconds
+of work for each two-minute job. There's nothing that prevents you from executing multiple delete
+statements in each background migration job.
+
+Looking at the example above, you could alternatively do:
+
+```plaintext
+Background Migration Details:
+
+47600 items to delete
+batch size = 10_000
+47600 / 10_000 = 5 batches
+
+Estimated times per batch:
+- Records are updated in sub-batches of 1000 => 10_000 / 1000 = 10 total updates
+- 820ms for select statement with 1000 items (see linked explain plan)
+- 900ms for delete statement with 1000 items (see linked explain plan)
+Sub-batch total: ~2 sec per sub-batch,
+Total batch time: 2 * 10 = 20 sec per batch
+
+2 mins delay per batch
+
+5 batches * 2 min per batch = 10 mins to run all the scheduled jobs
+```
+
+The batch time of 20 seconds still fits comfortably within the two-minute delay, yet the total run
+time is cut to a tenth, from around 100 minutes to 10 minutes! When dealing with large background
+migrations, this can cut the total migration time by days.
+
+When batching in this way, it is important to look at query times on the higher end
+of the table or relation being updated. `EachBatch` may generate some queries that become much
+slower when dealing with higher ID ranges.
+
+### Delay time
+
+When looking at the batch execution time versus the delay time, the execution time
+should fit comfortably within the delay time for a few reasons:
+
+- To allow for a variance in query times.
+- To allow autovacuum to catch up after periods of high churn.
+
+Never try to optimize by fully filling the delay window, even if you are confident
+the queries themselves have no timing variance.
+
+### Background jobs tracking
+
+NOTE:
+Background migrations with job tracking enabled must call `mark_all_as_succeeded` for their batch, even if no work needs to be done.
+
+`queue_background_migration_jobs_by_range_at_intervals` can create records for each job that is scheduled to run.
+You can enable this behavior by passing `track_jobs: true`. Each record starts with a `pending` status. Make sure that your worker updates the job status to `succeeded` by calling `Gitlab::Database::BackgroundMigrationJob.mark_all_as_succeeded` in the `perform` method of your background migration.
+
+```ruby
+# Background migration code
+
+def perform(start_id, end_id)
+  # do work here
+
+  mark_job_as_succeeded(start_id, end_id)
+end
+
+private
+
+# Make sure that the arguments passed here match those passed to the background
+# migration
+def mark_job_as_succeeded(*arguments)
+  Gitlab::Database::BackgroundMigrationJob.mark_all_as_succeeded(
+    self.class.name.demodulize,
+    arguments
+  )
+end
+```
+
+```ruby
+# Post deployment migration
+MIGRATION = 'YourBackgroundMigrationName'
+DELAY_INTERVAL = 2.minutes.to_i # can be different
+BATCH_SIZE = 10_000 # can be different
+
+disable_ddl_transaction!
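+
+# Passing track_jobs: true below creates a row in the background_migration_jobs
+# table for each scheduled job, starting in the `pending` status.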
+ +def up + queue_background_migration_jobs_by_range_at_intervals( + define_batchable_model('name_of_the_table_backing_the_model'), + MIGRATION, + DELAY_INTERVAL, + batch_size: BATCH_SIZE, + track_jobs: true + ) +end + +def down + # no-op +end +``` + +See [`lib/gitlab/background_migration/drop_invalid_vulnerabilities.rb`](https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/background_migration/drop_invalid_vulnerabilities.rb) for a full example. + +#### Rescheduling pending jobs + +You can reschedule pending migrations from the `background_migration_jobs` table by creating a post-deployment migration and calling `requeue_background_migration_jobs_by_range_at_intervals` with the migration name and delay interval. + +```ruby +# Post deployment migration +MIGRATION = 'YourBackgroundMigrationName' +DELAY_INTERVAL = 2.minutes + +disable_ddl_transaction! + +def up + requeue_background_migration_jobs_by_range_at_intervals(MIGRATION, DELAY_INTERVAL) +end + +def down + # no-op +end +``` + +See [`db/post_migrate/20210604070207_retry_backfill_traversal_ids.rb`](https://gitlab.com/gitlab-org/gitlab/blob/master/db/post_migrate/20210604070207_retry_backfill_traversal_ids.rb) for a full example. + +### Viewing failure error logs + +After running a background migration, if any jobs have failed, you can view the logs in [Kibana](https://log.gprd.gitlab.net/goto/5f06a57f768c6025e1c65aefb4075694). +View the production Sidekiq log and filter for: + +- `json.class: BackgroundMigrationWorker` +- `json.job_status: fail` +- `json.meta.caller_id: <MyBackgroundMigrationSchedulingMigrationClassName>` +- `json.args: <MyBackgroundMigrationClassName>` + +Looking at the `json.error_class`, `json.error_message` and `json.error_backtrace` values may be helpful in understanding why the jobs failed. + +Depending on when and how the failure occurred, you may find other helpful information by filtering with `json.class: <MyBackgroundMigrationClassName>`. diff --git a/doc/development/database/client_side_connection_pool.md b/doc/development/database/client_side_connection_pool.md index 8316a75ac8d..60c8665df87 100644 --- a/doc/development/database/client_side_connection_pool.md +++ b/doc/development/database/client_side_connection_pool.md @@ -1,8 +1,7 @@ --- -type: dev, reference -stage: none -group: Development -info: "See the Technical Writers assigned to Development Guidelines: https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments-to-development-guidelines" +stage: Enablement +group: Database +info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments --- # Client-side connection-pool diff --git a/doc/development/database/database_lab.md b/doc/development/database/database_lab.md new file mode 100644 index 00000000000..1c8694b113d --- /dev/null +++ b/doc/development/database/database_lab.md @@ -0,0 +1,101 @@ +--- +stage: Enablement +group: Database +info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments +--- + +# Database Lab and Postgres.ai + +Internal users at GitLab have access to the Database Lab Engine (DLE) and +[postgres.ai](https://console.postgres.ai/) for testing performance of database queries +on replicated production data. Unlike a typical read-only production replica, in the DLE you can +also create, update, and delete rows. 
You can also test the performance of schema changes, like additional indexes or columns, in an isolated copy of production data.
+
+## Access Database Lab Engine
+
+Access to the DLE is helpful for:
+
+- Database reviewers and maintainers.
+- Engineers who work on merge requests that have large effects on databases.
+
+To access the DLE's services, you can:
+
+- Perform query testing in the `#database_lab` Slack channel, or in the Postgres.ai web console.
+  Employees access both services with their GitLab Google account. Query testing
+  provides `EXPLAIN` (analyze, buffers) plans for queries executed there.
+- Perform migration testing by triggering a job as a part of a merge request.
+- Get direct `psql` access to the DLE instead of a production replica. This is available to authorized users only.
+  To request `psql` access, file an [access request](https://about.gitlab.com/handbook/business-technology/team-member-enablement/onboarding-access-requests/access-requests/#individual-or-bulk-access-request).
+
+For more assistance, use the `#database` Slack channel.
+
+NOTE:
+If you need only temporary access to a production replica, instead of a Database Lab
+clone, follow the runbook procedure for connecting to the
+[database console with Teleport](https://gitlab.com/gitlab-com/runbooks/-/blob/master/docs/Teleport/Connect_to_Database_Console_via_Teleport.md).
+This procedure is similar to [Rails console access with Teleport](https://gitlab.com/gitlab-com/runbooks/-/blob/master/docs/Teleport/Connect_to_Rails_Console_via_Teleport.md#how-to-use-teleport-to-connect-to-rails-console).
+
+### Query testing
+
+You can access Database Lab's query analysis features either:
+
+- In the `#database_lab` Slack channel. This shows everyone's commands and results, but
+  your own commands are still isolated in their own clone.
+- In [the Postgres.ai web console](https://console.postgres.ai/GitLab/joe-instances).
+  This shows only the commands you run.
+
+#### Generate query plans
+
+Query plans are an essential part of the database review process. These plans
+enable us to decide quickly if a given query can be performant on GitLab.com.
+Running the `explain` command generates an `explain` plan and a link to the Postgres.ai
+console with more query analysis. For example, running `EXPLAIN SELECT * FROM application_settings`
+does the following:
+
+1. Runs `explain (analyze, buffers) select * from application_settings;` against a database clone.
+1. Responds with timing and buffer details from the run.
+1. Provides a [detailed, shareable report on the results](https://console.postgres.ai/shared/24d543c9-893b-4ff6-8deb-a8f902f85a53).
+
+#### Making schema changes
+
+Sometimes when testing queries, a contributor may realize that the query needs an index
+or other schema change to make added queries more performant. To test the query, run the `exec` command.
+For example, running this command:
+
+```sql
+exec CREATE INDEX on application_settings USING btree (instance_administration_project_id)
+```
+
+creates the specified index on the table. You can [test queries](#generate-query-plans) leveraging
+the new index. `exec` does not return any results, only the time required to execute the query.
+
+#### Reset the clone
+
+After many changes, such as after a destructive query or an ineffective index,
+you must start over. To reset your designated clone, run `reset`.
+
+### Migration testing
+
+For information on testing migrations, review our
+[database migration testing documentation](database_migration_pipeline.md).
+
+### Access the console with `psql`
+
+Team members with [`psql` access](#access-database-lab-engine) can gain direct access
+to a clone via `psql`. Access to `psql` enables you to see data, not just metadata.
+
+To connect to a clone using `psql`:
+
+1. Create a clone from the [desired instance](https://console.postgres.ai/gitlab/instances/).
+   1. Provide a **Clone ID**: Something that uniquely identifies your clone, such as `yourname-testing-gitlabissue`.
+   1. Provide a **Database username** and **Database password**: Connects `psql` to your clone.
+   1. Select **Enable deletion protection** if you want to preserve your clone. Avoid selecting this option.
+      Clones are removed after 12 hours.
+1. In the **Clone details** page of the Postgres.ai web interface, copy and run
+   the command to start SSH port forwarding for the clone.
+1. In the **Clone details** page of the Postgres.ai web interface, copy and run the `psql` connection string.
+   Use the password provided at setup.
+
+After you connect, use the clone like you would any `psql` console in production, but with
+the added benefit and safety of an isolated writeable environment.
diff --git a/doc/development/database/database_reviewer_guidelines.md b/doc/development/database/database_reviewer_guidelines.md
index 9d5e4821c9f..ca9ca36b156 100644
--- a/doc/development/database/database_reviewer_guidelines.md
+++ b/doc/development/database/database_reviewer_guidelines.md
@@ -70,7 +70,7 @@ Finally, you can find various guides in the [Database guides](index.md) page tha
 topics and use cases. The most frequently required during database reviewing are the following:
 
 - [Migrations style guide](../migration_style_guide.md) for creating safe SQL migrations.
-- [Avoiding downtime in migrations](../avoiding_downtime_in_migrations.md).
+- [Avoiding downtime in migrations](avoiding_downtime_in_migrations.md).
 - [SQL guidelines](../sql.md) for working with SQL queries.
 - [Guidelines for JiHu contributions with database migrations](https://about.gitlab.com/handbook/ceo/chief-of-staff-team/jihu-support/jihu-database-change-process.html)
 
diff --git a/doc/development/database/deleting_migrations.md b/doc/development/database/deleting_migrations.md
new file mode 100644
index 00000000000..be9009f365d
--- /dev/null
+++ b/doc/development/database/deleting_migrations.md
@@ -0,0 +1,39 @@
+---
+stage: Enablement
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Delete existing migrations
+
+When removing existing migrations from the GitLab project, you have to take into account
+the possibility of the migration already having been included in past releases or in the current release, and thus already executed on GitLab.com and/or in self-managed instances.
+
+Because of this, it's not possible to delete existing migrations, as that could lead to:
+
+- Schema inconsistency, as changes introduced into the database were not rolled back properly.
+- Leaving a record in the `schema_versions` table that points to a migration that no longer exists in the codebase.
+
+Instead of deleting, we can opt to disable the migration.
+
+## Pre-requisites to disable a migration
+
+Migrations can be disabled if:
+
+- They caused a timeout or general issue on GitLab.com.
+- They are obsolete, for example, the changes are no longer necessary due to a feature change.
+- The migration is a data migration only, that is, the migration does not change the database schema.
+
+## How to disable a data migration?
+
+To disable a migration, the following steps apply to all types of migrations:
+
+1. Turn the migration into a no-op by removing the code inside the `#up`, `#down`,
+   or `#perform` methods, and adding a `# no-op` comment instead.
+1. Add a comment explaining why the code is gone.
+
+Disabling migrations requires explicit approval of a Database Maintainer.
+
+## Examples
+
+- [Disable scheduling of productivity analytics](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/17253)
diff --git a/doc/development/database/index.md b/doc/development/database/index.md
index efc48f72d00..0363d13ed4c 100644
--- a/doc/development/database/index.md
+++ b/doc/development/database/index.md
@@ -23,14 +23,15 @@ info: To determine the technical writer assigned to the Stage/Group associated w
 
 ## Migrations
 
-- [Avoiding downtime in migrations](../avoiding_downtime_in_migrations.md)
+- [Migrations for multiple databases](migrations_for_multiple_databases.md)
+- [Avoiding downtime in migrations](avoiding_downtime_in_migrations.md)
 - [SQL guidelines](../sql.md) for working with SQL queries
 - [Migrations style guide](../migration_style_guide.md) for creating safe SQL migrations
 - [Testing Rails migrations](../testing_guide/testing_migrations_guide.md) guide
-- [Post deployment migrations](../post_deployment_migrations.md)
-- [Background migrations](../background_migrations.md)
+- [Post deployment migrations](post_deployment_migrations.md)
+- [Background migrations](background_migrations.md)
 - [Swapping tables](../swapping_tables.md)
-- [Deleting migrations](../deleting_migrations.md)
+- [Deleting migrations](deleting_migrations.md)
 - [Partitioning tables](table_partitioning.md)
 
 ## Debugging
@@ -64,6 +65,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
 - [Pagination guidelines](pagination_guidelines.md)
 - [Pagination performance guidelines](pagination_performance_guidelines.md)
 - [Efficient `IN` operator queries](efficient_in_operator_queries.md)
+- [Data layout and access patterns](layout_and_access_patterns.md)
 
 ## Case studies
diff --git a/doc/development/database/keyset_pagination.md b/doc/development/database/keyset_pagination.md
index 4f0b353a37f..88928feb927 100644
--- a/doc/development/database/keyset_pagination.md
+++ b/doc/development/database/keyset_pagination.md
@@ -166,7 +166,7 @@ These order objects can be defined in the model classes as normal ActiveRecord s
 Consider the following scope:
 
 ```ruby
-scope = Issue.where(project_id: 10).order(Gitlab::Database.nulls_last_order('relative_position', 'DESC'))
+scope = Issue.where(project_id: 10).order(Issue.arel_table[:relative_position].desc.nulls_last)
 # SELECT "issues".* FROM "issues" WHERE "issues"."project_id" = 10 ORDER BY relative_position DESC NULLS LAST
 
 scope.keyset_paginate # raises: Gitlab::Pagination::Keyset::UnsupportedScopeOrder: The order on the scope does not support keyset pagination
@@ -190,8 +190,8 @@ order = Gitlab::Pagination::Keyset::Order.build([
   Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
     attribute_name: 'relative_position',
     column_expression: Issue.arel_table[:relative_position],
-    order_expression: Gitlab::Database.nulls_last_order('relative_position', 'DESC'),
-    reversed_order_expression: Gitlab::Database.nulls_first_order('relative_position', 'ASC'),
+    order_expression: Issue.arel_table[:relative_position].desc.nulls_last,
reversed_order_expression: Issue.arel_table[:relative_position].asc.nulls_first, nullable: :nulls_last, order_direction: :desc, distinct: false diff --git a/doc/development/database/layout_and_access_patterns.md b/doc/development/database/layout_and_access_patterns.md new file mode 100644 index 00000000000..a3e2fefb2a3 --- /dev/null +++ b/doc/development/database/layout_and_access_patterns.md @@ -0,0 +1,61 @@ +--- +stage: Enablement +group: Database +info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments +--- + +# Best practices for data layout and access patterns + +Certain patterns of data access, and especially data updates, can exacerbate strain +on the database. Avoid them if possible. + +This document lists some patterns to avoid, with recommendations for alternatives. + +## High-frequency updates, especially to the same row + +Avoid single database rows that are updated by many transactions at the same time. + +- If many processes attempt to update the same row simultaneously, they queue up + as each transaction locks the row for writing. As this can significantly increase + transaction timings, the Rails connection pools can saturate, leading to + application-wide downtime. +- For each row update, PostgreSQL inserts a new row version and deletes the old one. + In high-traffic scenarios, this approach can cause vacuum and WAL (write-ahead log) + pressure, reducing database performance. + +This pattern often happens when an aggregate is too expensive to compute for each +request, so a running tally is kept in the database. If you need such an aggregate, +consider keeping a running total in a single row, plus a small working set of +recently added data, such as individual increments: + +- When introducing new data, add it to the working set. These inserts do not + cause lock contention. +- When calculating the aggregate, combine the running total with a live aggregate + from the working set, providing an up-to-date result. +- Add a periodic job that incorporates the working set into the running total and + clears it in a transaction, bounding the amount of work needed by a reader. + +## Wide tables + +PostgreSQL organizes rows into 8 KB pages, and operates on one page at a time. +By minimizing the width of rows in a table, we improve the following: + +- Sequential and bitmap index scan performance, because fewer pages must be + scanned if each contains more rows. +- Vacuum performance, because vacuum can process more rows in each page. +- Update performance, because during a (non-HOT) update, each index must be + updated for every row update. + +Mitigating wide tables is one part of the database team's +[100 GB table initiative](../../architecture/blueprints/database_scaling/size-limits.md), +as wider tables can fit fewer rows in 100 GB. + +When adding columns to a table, consider if you intend to access the data in the +new columns by itself, in a one-to-one relationship with the other columns of the +table. If so, the new columns could be a good candidate for splitting to a new table. + +Several tables have already been split in this way. For example: + +- `search_data` is split from `issues`. +- `project_pages_metadata` is split from `projects`. 
+- `merge_request_diff_details` is split from `merge_request_diffs` diff --git a/doc/development/database/loose_foreign_keys.md b/doc/development/database/loose_foreign_keys.md index 17a825b4812..2bcdc91202a 100644 --- a/doc/development/database/loose_foreign_keys.md +++ b/doc/development/database/loose_foreign_keys.md @@ -95,27 +95,27 @@ Created database 'gitlabhq_test_ee' Created database 'gitlabhq_geo_test_ee' Showing cross-schema foreign keys (20): - ID | HAS_LFK | FROM | TO | COLUMN | ON_DELETE - 0 | N | ci_builds | projects | project_id | cascade - 1 | N | ci_job_artifacts | projects | project_id | cascade - 2 | N | ci_pipelines | projects | project_id | cascade - 3 | Y | ci_pipelines | merge_requests | merge_request_id | cascade - 4 | N | external_pull_requests | projects | project_id | cascade - 5 | N | ci_sources_pipelines | projects | project_id | cascade - 6 | N | ci_stages | projects | project_id | cascade - 7 | N | ci_pipeline_schedules | projects | project_id | cascade - 8 | N | ci_runner_projects | projects | project_id | cascade - 9 | Y | dast_site_profiles_pipelines | ci_pipelines | ci_pipeline_id | cascade - 10 | Y | vulnerability_feedback | ci_pipelines | pipeline_id | nullify - 11 | N | ci_variables | projects | project_id | cascade - 12 | N | ci_refs | projects | project_id | cascade - 13 | N | ci_builds_metadata | projects | project_id | cascade - 14 | N | ci_subscriptions_projects | projects | downstream_project_id | cascade - 15 | N | ci_subscriptions_projects | projects | upstream_project_id | cascade - 16 | N | ci_sources_projects | projects | source_project_id | cascade - 17 | N | ci_job_token_project_scope_links | projects | source_project_id | cascade - 18 | N | ci_job_token_project_scope_links | projects | target_project_id | cascade - 19 | N | ci_project_monthly_usages | projects | project_id | cascade + ID | HAS_LFK | FROM | TO | COLUMN | ON_DELETE + 0 | N | ci_builds | projects | project_id | cascade + 1 | N | ci_job_artifacts | projects | project_id | cascade + 2 | N | ci_pipelines | projects | project_id | cascade + 3 | Y | ci_pipelines | merge_requests | merge_request_id | cascade + 4 | N | external_pull_requests | projects | project_id | cascade + 5 | N | ci_sources_pipelines | projects | project_id | cascade + 6 | N | ci_stages | projects | project_id | cascade + 7 | N | ci_pipeline_schedules | projects | project_id | cascade + 8 | N | ci_runner_projects | projects | project_id | cascade + 9 | Y | dast_site_profiles_pipelines | ci_pipelines | ci_pipeline_id | cascade + 10 | Y | vulnerability_feedback | ci_pipelines | pipeline_id | nullify + 11 | N | ci_variables | projects | project_id | cascade + 12 | N | ci_refs | projects | project_id | cascade + 13 | N | ci_builds_metadata | projects | project_id | cascade + 14 | N | ci_subscriptions_projects | projects | downstream_project_id | cascade + 15 | N | ci_subscriptions_projects | projects | upstream_project_id | cascade + 16 | N | ci_sources_projects | projects | source_project_id | cascade + 17 | N | ci_job_token_project_scope_links | projects | source_project_id | cascade + 18 | N | ci_job_token_project_scope_links | projects | target_project_id | cascade + 19 | N | ci_project_monthly_usages | projects | project_id | cascade To match FK write one or many filters to match against FROM/TO/COLUMN: - scripts/decomposition/generate-loose-foreign-key <filter(s)...> @@ -191,7 +191,7 @@ ci_pipelines: ### Track record changes To know about deletions in the `projects` table, configure a `DELETE` trigger -using 
a [post-deployment migration](../post_deployment_migrations.md). The
+using a [post-deployment migration](post_deployment_migrations.md). The
 trigger needs to be configured only once. If the model already has at least
 one `loose_foreign_key` definition, then this step can be skipped:
@@ -226,7 +226,7 @@ ON DELETE CASCADE;
 
 The migration must run after the `DELETE` trigger is installed and the loose
 foreign key definition is deployed. As such, it must be a [post-deployment
-migration](../post_deployment_migrations.md) dated after the migration for the
+migration](post_deployment_migrations.md) dated after the migration for the
 trigger. If the foreign key is deleted earlier, there is a good chance of
 introducing data inconsistency which needs manual cleanup:
@@ -480,3 +480,380 @@ it executes `occurrence.pipeline.created_at`.
 When looping through the vulnerability occurrences in the Sidekiq worker, we could
 try to load the corresponding pipeline and choose to skip processing that occurrence
 if pipeline is not found.
+
+## Architecture
+
+The loose foreign keys feature is implemented within the `LooseForeignKeys` Ruby namespace. The
+code is isolated from the core application code and, theoretically, it could be a standalone library.
+
+The feature is invoked solely in the [`LooseForeignKeys::CleanupWorker`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/workers/loose_foreign_keys/cleanup_worker.rb) worker class. The worker is scheduled via a
+cron job where the schedule depends on the configuration of the GitLab instance.
+
+- Non-decomposed GitLab (1 database): invoked every minute.
+- Decomposed GitLab (2 databases, CI and Main): invoked every minute, cleaning up one database
+at a time. For example, the cleanup worker for the main database runs every two minutes.
+
+To avoid lock contention and the processing of the same database rows, the worker does not run
+in parallel. This behavior is ensured with a Redis lock.
+
+**Record cleanup procedure:**
+
+1. Acquire the Redis lock.
+1. Determine which database to clean up.
+1. Collect all database tables where the deletions are tracked (parent tables).
+   - This is achieved by reading the `config/gitlab_loose_foreign_keys.yml` file.
+   - A table is considered "tracked" when a loose foreign key definition exists for the table and
+     the `DELETE` trigger is installed.
+1. Cycle through the tables with an infinite loop.
+1. For each table, load a batch of deleted parent records to clean up.
+1. Depending on the YAML configuration, build `DELETE` or `UPDATE` (nullify) queries for the
+referenced child tables.
+1. Invoke the queries.
+1. Repeat until all child records are cleaned up or the maximum limit is reached.
+1. Remove the deleted parent records when all child records are cleaned up.
+
+### Database structure
+
+The feature relies on triggers installed on the parent tables. When a parent record is deleted,
+the trigger will automatically insert a new record into the `loose_foreign_keys_deleted_records`
+database table.
+
+The inserted record will store the following information about the deleted record:
+
+- `fully_qualified_table_name`: name of the database table where the record was located.
+- `primary_key_value`: the ID of the record; the value will be present in the child tables as
+the foreign key value. At the moment, composite primary keys are not supported; the parent table
+must have an `id` column.
+- `status`: defaults to pending, represents the status of the cleanup process.
+- `consume_after`: defaults to the current time.
+- `cleanup_attempts`: defaults to 0. The number of times the worker tried to clean up this record.
+A non-zero number would mean that this record has many child records and cleaning it up requires
+several runs.
+
+#### Database decomposition
+
+The `loose_foreign_keys_deleted_records` table will exist on both database servers (Ci and Main)
+after the [database decomposition](https://gitlab.com/groups/gitlab-org/-/epics/6168). The worker
+will determine which parent tables belong to which database by reading the
+`lib/gitlab/database/gitlab_schemas.yml` YAML file.
+
+Example:
+
+- Main database tables
+  - `projects`
+  - `namespaces`
+  - `merge_requests`
+- Ci database tables
+  - `ci_builds`
+  - `ci_pipelines`
+
+When the worker is invoked for the Ci database, the worker will load deleted records only from the
+`ci_builds` and `ci_pipelines` tables. During the cleanup process, `DELETE` and `UPDATE` queries
+will mostly run on tables located in the Main database. In this example, one `UPDATE` query will
+nullify the `merge_requests.head_pipeline_id` column.
+
+#### Database partitioning
+
+Due to the large volume of inserts the database table receives daily, a special partitioning
+strategy was implemented to address data bloat concerns. Originally, the
+[time-decay](https://about.gitlab.com/company/team/structure/working-groups/database-scalability/time-decay.html)
+strategy was considered for the feature, but due to the large data volume we decided to implement a
+new strategy.
+
+A deleted record is considered fully processed when all its direct child records have been
+cleaned up. When this happens, the loose foreign key worker updates the `status` column of
+the deleted record. After this step, the record is no longer needed.
+
+The sliding partitioning strategy provides an efficient way of cleaning up old, unused data by
+adding a new database partition and removing the old one when certain conditions are met.
+The `loose_foreign_keys_deleted_records` database table is list partitioned, and most of the
+time there is only one partition attached to the table.
+
+```sql
+                              Partitioned table "public.loose_foreign_keys_deleted_records"
+           Column           |           Type           | Collation | Nullable |                             Default                             | Storage  | Stats target | Description
+----------------------------+--------------------------+-----------+----------+-----------------------------------------------------------------+----------+--------------+-------------
+ id                         | bigint                   |           | not null | nextval('loose_foreign_keys_deleted_records_id_seq'::regclass) | plain    |              |
+ partition                  | bigint                   |           | not null | 84                                                              | plain    |              |
+ primary_key_value          | bigint                   |           | not null |                                                                 | plain    |              |
+ status                     | smallint                 |           | not null | 1                                                               | plain    |              |
+ created_at                 | timestamp with time zone |           | not null | now()                                                           | plain    |              |
+ fully_qualified_table_name | text                     |           | not null |                                                                 | extended |              |
+ consume_after              | timestamp with time zone |           |          | now()                                                           | plain    |              |
+ cleanup_attempts           | smallint                 |           |          | 0                                                               | plain    |              |
+Partition key: LIST (partition)
+Indexes:
+    "loose_foreign_keys_deleted_records_pkey" PRIMARY KEY, btree (partition, id)
+    "index_loose_foreign_keys_deleted_records_for_partitioned_query" btree (partition, fully_qualified_table_name, consume_after, id) WHERE status = 1
+Check constraints:
+    "check_1a541f3235" CHECK (char_length(fully_qualified_table_name) <= 150)
+Partitions: gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_84 FOR VALUES IN ('84')
+```
+
+The `partition` column controls the insert direction: the `partition` value determines which
+partition gets the deleted rows inserted via the trigger. Notice that the default value of
+the `partition` column matches the value of the list partition (84). In the `INSERT` query
+within the trigger, the value of `partition` is omitted; the trigger always relies on the
+default value of the column.
+
+Example `INSERT` query for the trigger:
+
+```sql
+INSERT INTO loose_foreign_keys_deleted_records
+(fully_qualified_table_name, primary_key_value)
+SELECT TG_TABLE_SCHEMA || '.' || TG_TABLE_NAME, old_table.id FROM old_table;
+```
+
+The partition "sliding" process is controlled by two regularly executed callbacks. These
+callbacks are defined within the `LooseForeignKeys::DeletedRecord` model.
+
+The `next_partition_if` callback controls when to create a new partition. A new partition is
+created when the current partition has at least one record older than 24 hours. A new partition
+is added by the [`PartitionManager`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/partitioning/partition_manager.rb)
+using the following steps:
+
+1. Create a new partition, where the `VALUE` for the partition is `CURRENT_PARTITION + 1`.
+1. Update the default value of the `partition` column to `CURRENT_PARTITION + 1`.
+
+With these steps, new `INSERT`-s via the triggers will end up in the new partition. At this point,
+the database table has two partitions.
+
+The `detach_partition_if` callback determines if the old partitions can be detached from the table.
+A partition is detachable if there are no pending (unprocessed) records in the partition
+(`status = 1`). The detached partitions will be available for some time; you can see the list of
+detached partitions in the `detached_partitions` table:
+
+```sql
+select * from detached_partitions;
+```
+
+#### Cleanup queries
+
+The `LooseForeignKeys::CleanupWorker` has its own database query builder, which depends on `Arel`.
+The feature doesn't reference any application-specific `ActiveRecord` models to avoid unexpected
+side effects.
+The database queries are batched, which means that several parent records are
+cleaned up at the same time.
+
+Example `DELETE` query:
+
+```sql
+DELETE
+FROM "merge_request_metrics"
+WHERE ("merge_request_metrics"."id") IN
+  (SELECT "merge_request_metrics"."id"
+   FROM "merge_request_metrics"
+   WHERE "merge_request_metrics"."pipeline_id" IN (1, 2, 10, 20)
+   LIMIT 1000 FOR UPDATE SKIP LOCKED)
+```
+
+The primary key values of the parent records are 1, 2, 10, and 20.
+
+Example `UPDATE` (nullify) query:
+
+```sql
+UPDATE "merge_requests"
+SET "head_pipeline_id" = NULL
+WHERE ("merge_requests"."id") IN
+  (SELECT "merge_requests"."id"
+   FROM "merge_requests"
+   WHERE "merge_requests"."head_pipeline_id" IN (3, 4, 30, 40)
+   LIMIT 500 FOR UPDATE SKIP LOCKED)
+```
+
+These queries are batched, which means that in many cases, several invocations are needed to clean
+up all associated child records.
+
+The batching is implemented with loops; the processing stops when all associated child records
+are cleaned up or the limit is reached.
+
+```ruby
+loop do
+  modification_count = process_batch_with_skip_locked
+
+  break if modification_count == 0 || over_limit?
+end
+
+loop do
+  modification_count = process_batch
+
+  break if modification_count == 0 || over_limit?
+end
+```
+
+The loop-based batch processing is preferred over `EachBatch` for the following reasons:
+
+- The records in the batch are modified, so the next batch will contain different records.
+- There is always an index on the foreign key column; however, the column is usually not unique.
+  `EachBatch` requires a unique column for the iteration.
+- The record order doesn't matter for the cleanup.
+
+Notice that we have two loops. The initial loop processes records with the `SKIP LOCKED` clause.
+The query skips rows that are locked by other application processes. This ensures that the
+cleanup worker is less likely to become blocked. The second loop executes the database
+queries without `SKIP LOCKED`, to ensure that all records have been processed.
+
+#### Processing limits
+
+A constant, large volume of record updates or deletions can cause incidents and affect the
+availability of GitLab:
+
+- Increased table bloat.
+- Increased number of pending WAL files.
+- Busy tables, difficulty when acquiring locks.
+
+To mitigate these issues, several limits are applied when the worker runs:
+
+- Each query has a `LIMIT`; a query cannot process an unbounded number of rows.
+- The maximum number of record deletions and record updates is limited.
+- The maximum runtime (30 seconds) for the database queries is limited.
+
+The limit rules are implemented in the `LooseForeignKeys::ModificationTracker` class. When one of
+the limits (record modification count, time limit) is reached, the processing is stopped
+immediately. After some time, the next scheduled worker continues the cleanup process.
+
+#### Performance characteristics
+
+The database trigger on the parent tables will **decrease** the record deletion speed. Each
+statement that removes rows from the parent table will invoke the trigger to insert records
+into the `loose_foreign_keys_deleted_records` table.
+
+The queries within the cleanup worker are fairly efficient index scans; with the limits in place,
+they're unlikely to affect other parts of the application.
+
+The database queries do not run in a transaction. When an error happens, for example a statement
+timeout or a worker crash, the next job continues the processing.
+
+## Troubleshooting
+
+### Accumulation of deleted records
+
+There can be cases where the workers need to process an unusually large amount of data. This can
+happen under normal usage, for example when a large project or group is deleted. In this scenario,
+there can be several million rows to be deleted or nullified. Due to the limits enforced by the
+worker, processing this data takes some time.
+
+When cleaning up "heavy-hitters", the feature ensures fair processing by rescheduling larger
+batches for later. This gives time for other deleted records to be processed.
+
+For example, a project with millions of `ci_builds` records is deleted. The `ci_builds` records
+are deleted by the loose foreign keys feature.
+
+1. The cleanup worker is scheduled and picks up a batch of deleted `projects` records. The large
+project is part of the batch.
+1. Deletion of the orphaned `ci_builds` rows has started.
+1. The time limit is reached, but the cleanup is not complete.
+1. The `cleanup_attempts` column is incremented for the deleted records.
+1. Go to step 1. The next cleanup worker continues the cleanup.
+1. When `cleanup_attempts` reaches 3, the batch is re-scheduled 10 minutes later by updating
+the `consume_after` column.
+1. The next cleanup worker processes a different batch.
+
+We have Prometheus metrics in place to monitor the deleted record cleanup:
+
+- `loose_foreign_key_processed_deleted_records`: Number of processed deleted records. When a large
+cleanup happens, this number decreases.
+- `loose_foreign_key_incremented_deleted_records`: Number of deleted records whose processing did
+not finish and whose `cleanup_attempts` column was incremented.
+- `loose_foreign_key_rescheduled_deleted_records`: Number of deleted records that had to be
+rescheduled at a later time after 3 cleanup attempts.
+
+Example Thanos query:
+
+```plaintext
+loose_foreign_key_rescheduled_deleted_records{env="gprd", table="ci_runners"}
+```
+
+Another way to look at the situation is by running a database query. This query gives the exact
+counts of the unprocessed records:
+
+```sql
+SELECT partition, fully_qualified_table_name, count(*)
+FROM loose_foreign_keys_deleted_records
+WHERE
+status = 1
+GROUP BY 1, 2;
+```
+
+Example output:
+
+```sql
+ partition | fully_qualified_table_name | count
+-----------+----------------------------+-------
+        87 | public.ci_builds           |   874
+        87 | public.ci_job_artifacts    |  6658
+        87 | public.ci_pipelines        |   102
+        87 | public.ci_runners          |   111
+        87 | public.merge_requests      |   255
+        87 | public.namespaces          |    25
+        87 | public.projects            |     6
+```
+
+The query includes the partition number, which can be useful to detect if the cleanup process is
+significantly lagging behind. When multiple different partition values are present in the list,
+it means that the cleanup of some deleted records didn't finish within several days (one new
+partition is added every day).
+
+Steps to diagnose the problem:
+
+- Check which records are accumulating.
+- Try to get an estimate of the number of remaining records.
+- Look into the worker performance stats (Kibana or Thanos).
+
+Possible solutions:
+
+- Short-term: increase the batch sizes.
+- Long-term: invoke the worker more frequently. Parallelize the worker.
+
+For a one-time fix, we can run the cleanup worker several times from the Rails console. The worker
+can run in parallel; however, this can introduce lock contention and could increase the worker
+runtime.
+
+```ruby
+LooseForeignKeys::CleanupWorker.new.perform
+```
+
+When the cleanup is done, the older partitions are automatically detached by the
+`PartitionManager`.
+
+### PartitionManager bug
+
+NOTE:
+This issue happened in the past on Staging, and it has been mitigated.
+
+When adding a new partition, the default value of the `partition` column is also updated. This is
+a schema change that is executed in the same transaction as the new partition creation. It's highly
+unlikely that the default value of the `partition` column becomes outdated.
+
+However, if it does happen, it can cause application-wide incidents, because the `partition`
+value points to a partition that doesn't exist. Symptom: deletion of records from tables where the
+`DELETE` trigger is installed fails.
+
+```sql
+\d+ loose_foreign_keys_deleted_records;
+
+           Column            |           Type           | Collation | Nullable |                            Default                             | Storage  | Stats target | Description
+----------------------------+--------------------------+-----------+----------+----------------------------------------------------------------+----------+--------------+-------------
+ id                         | bigint                   |           | not null | nextval('loose_foreign_keys_deleted_records_id_seq'::regclass) | plain    |              |
+ partition                  | bigint                   |           | not null | 4                                                              | plain    |              |
+ primary_key_value          | bigint                   |           | not null |                                                                | plain    |              |
+ status                     | smallint                 |           | not null | 1                                                              | plain    |              |
+ created_at                 | timestamp with time zone |           | not null | now()                                                          | plain    |              |
+ fully_qualified_table_name | text                     |           | not null |                                                                | extended |              |
+ consume_after              | timestamp with time zone |           |          | now()                                                          | plain    |              |
+ cleanup_attempts           | smallint                 |           |          | 0                                                              | plain    |              |
+Partition key: LIST (partition)
+Indexes:
+    "loose_foreign_keys_deleted_records_pkey" PRIMARY KEY, btree (partition, id)
+    "index_loose_foreign_keys_deleted_records_for_partitioned_query" btree (partition, fully_qualified_table_name, consume_after, id) WHERE status = 1
+Check constraints:
+    "check_1a541f3235" CHECK (char_length(fully_qualified_table_name) <= 150)
+Partitions: gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_3 FOR VALUES IN ('3')
+```
+
+Check the default value of the `partition` column and compare it with the available partitions
+(4 vs 3). The partition with the value of 4 does not exist. To mitigate the problem, an emergency
+schema change is required:
+
+```sql
+ALTER TABLE loose_foreign_keys_deleted_records ALTER COLUMN partition SET DEFAULT 3;
+```
diff --git a/doc/development/database/migrations_for_multiple_databases.md b/doc/development/database/migrations_for_multiple_databases.md
new file mode 100644
index 00000000000..0ec4612e985
--- /dev/null
+++ b/doc/development/database/migrations_for_multiple_databases.md
@@ -0,0 +1,390 @@
+---
+stage: Enablement
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Migrations for Multiple Databases
+
+> Support for describing migration purposes was [introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/73756) in GitLab 14.8.
+
+This document describes how to properly write database migrations
+for [the decomposed GitLab application using multiple databases](https://gitlab.com/groups/gitlab-org/-/epics/6168).
+
+Learn more about general multiple databases support in a [separate document](multiple_databases.md).
+
+WARNING:
+If you experience any issues using `Gitlab::Database::Migration[2.0]`,
+you can temporarily revert to the previous behavior by changing the version to `Gitlab::Database::Migration[1.0]`.
+Please report any issues with `Gitlab::Database::Migration[2.0]` in [this issue](https://gitlab.com/gitlab-org/gitlab/-/issues/358430).
+
+The design for multiple databases (except for the Geo database) assumes
+that all decomposed databases have **the same structure** (for example, schema), but **different data** in each database. This means that some tables do not contain data in every database.
+
+## Operations
+
+Depending on the constructs used, we can classify migrations as either:
+
+1. Modifying structure ([DDL - Data Definition Language](https://www.postgresql.org/docs/current/ddl.html)) (for example, `ALTER TABLE`).
+1. Modifying data ([DML - Data Manipulation Language](https://www.postgresql.org/docs/current/dml.html)) (for example, `UPDATE`).
+1. Performing [other queries](https://www.postgresql.org/docs/current/queries.html) (for example, `SELECT`) that are treated as **DML** for the purposes of our migrations.
+
+**The usage of `Gitlab::Database::Migration[2.0]` requires migrations to always be of a single purpose**.
+Migrations cannot mix **DDL** and **DML** changes, as the application requires the structure
+(as described by `db/structure.sql`) to be exactly the same across all decomposed databases.
+
+### Data Definition Language (DDL)
+
+The DDL migrations are all migrations that:
+
+1. Create or drop a table (for example, `create_table`).
+1. Add or remove an index (for example, `add_index`, `add_index_concurrently`).
+1. Add or remove a foreign key (for example, `add_foreign_key`, `add_foreign_key_concurrently`).
+1. Add or remove a column with or without a default value (for example, `add_column`).
+1. Create or drop trigger functions (for example, `create_trigger_function`).
+1. Attach or detach triggers from tables (for example, `track_record_deletions`, `untrack_record_deletions`).
+1. Prepare or unprepare async indexes (for example, `prepare_async_index`, `unprepare_async_index_by_name`).
+
+As such, DDL migrations **CANNOT**:
+
+1. Read or modify data in any form, via SQL statements or ActiveRecord models.
+1. Update column values (for example, `update_column_in_batches`).
+1. Schedule background migrations (for example, `queue_background_migration_jobs_by_range_at_intervals`).
+1. Read the state of feature flags, since they are stored in `main:` (the `features` and `feature_gates` tables).
+1. Read application settings (as settings are stored in `main:`).
+
+As the majority of migrations in the GitLab codebase are of the DDL type,
+this is also the default mode of operation and requires no further changes
+to the migration files.
+
+#### Example: perform DDL on all databases
+
+Example migration adding a concurrent index, which is treated as a change of structure (DDL)
+and is executed on all configured databases.
+
+```ruby
+class AddUserIdAndStateIndexToMergeRequestReviewers < Gitlab::Database::Migration[2.0]
+  disable_ddl_transaction!
+
+  INDEX_NAME = 'index_on_merge_request_reviewers_user_id_and_state'
+
+  def up
+    add_concurrent_index :merge_request_reviewers, [:user_id, :state], where: 'state = 2', name: INDEX_NAME
+  end
+
+  def down
+    remove_concurrent_index_by_name :merge_request_reviewers, INDEX_NAME
+  end
+end
+```
+
+### Data Manipulation Language (DML)
+
+The DML migrations are all migrations that:
+
+1. Read data via SQL statements (for example, `SELECT * FROM projects WHERE id=1`).
+1. Read data via ActiveRecord models (for example, `User < MigrationRecord`).
+1. Create, update, or delete data via ActiveRecord models (for example, `User.create!(...)`).
+1. Create, update, or delete data via SQL statements (for example, `DELETE FROM projects WHERE id=1`).
+1. Update columns in batches (for example, `update_column_in_batches(:projects, :archived, true)`).
+1. Schedule background migrations (for example, `queue_background_migration_jobs_by_range_at_intervals`).
+1. Access application settings (for example, `ApplicationSetting.last` if run for the `main:` database).
+1. Read and modify feature flags if run for the `main:` database.
+
+The DML migrations **CANNOT**:
+
+1. Make any changes to DDL, since this breaks the rule of keeping `structure.sql` coherent across
+   all decomposed databases.
+1. **Read data from another database**.
+
+To indicate the `DML` migration type, a migration must use the `restrict_gitlab_migration gitlab_schema:`
+syntax in a migration class. This marks the given migration as DML and restricts the schemas it is
+allowed to access.
+
+#### Example: perform DML only in context of the database containing the given `gitlab_schema`
+
+Example migration updating the `archived` column of `projects`, executed
+only for the database containing the `gitlab_main` schema.
+
+```ruby
+class UpdateProjectsArchivedState < Gitlab::Database::Migration[2.0]
+  disable_ddl_transaction!
+
+  restrict_gitlab_migration gitlab_schema: :gitlab_main
+
+  def up
+    update_column_in_batches(:projects, :archived, true) do |table, query|
+      query.where(table[:archived].eq(false)) # rubocop:disable CodeReuse/ActiveRecord
+    end
+  end
+
+  def down
+    # no-op
+  end
+end
+```
+
+#### Example: usage of `ActiveRecord` classes
+
+A migration using an `ActiveRecord` class to perform data manipulation
+must use the `MigrationRecord` class. This class is guaranteed to provide
+a correct connection in the context of a given migration.
+
+Under the hood, `MigrationRecord == ActiveRecord::Base`, because once `db:migrate`
+runs, it switches the active connection using `ActiveRecord::Base.establish_connection :ci`.
+To avoid the confusion of using `ActiveRecord::Base` directly, `MigrationRecord` is required.
+
+This implies that DML migrations are forbidden from reading data from other
+databases. For example, a migration running in the context of `ci:` cannot read feature flags
+from `main:`, as no established connection to the other database is present.
+
+```ruby
+class UpdateProjectsArchivedState < Gitlab::Database::Migration[2.0]
+  disable_ddl_transaction!
+
+  restrict_gitlab_migration gitlab_schema: :gitlab_main
+
+  class Project < MigrationRecord
+  end
+
+  def up
+    Project.where(archived: false).each_batch do |batch|
+      batch.update_all(archived: true)
+    end
+  end
+
+  def down
+    # no-op
+  end
+end
+```
+
+### The special purpose of `gitlab_shared`
+
+As described in [gitlab_schema](multiple_databases.md#the-special-purpose-of-gitlab_shared),
+the `gitlab_shared` tables are allowed to contain data across all databases. This implies
+that such migrations should run across all databases to modify structure (DDL) or modify data (DML).
+
+As such, migrations accessing `gitlab_shared` do not need to use `restrict_gitlab_migration gitlab_schema:`;
+migrations without the restriction run across all databases and are allowed to modify data on each of them.
+If the `restrict_gitlab_migration gitlab_schema:` is specified, the `DML` migration
+runs only in the context of the database containing the given `gitlab_schema`.
+
+#### Example: run DML `gitlab_shared` migration on all databases
+
+Example migration updating the `loose_foreign_keys_deleted_records` table,
+which is marked as `gitlab_shared` in `lib/gitlab/database/gitlab_schemas.yml`.
+
+This migration is executed across all configured databases.
+
+```ruby
+class DeleteAllLooseForeignKeyRecords < Gitlab::Database::Migration[2.0]
+  disable_ddl_transaction!
+
+  def up
+    execute("DELETE FROM loose_foreign_keys_deleted_records")
+  end
+
+  def down
+    # no-op
+  end
+end
+```
+
+#### Example: run DML `gitlab_shared` only on the database containing the given `gitlab_schema`
+
+Example migration updating the `loose_foreign_keys_deleted_records` table,
+which is marked as `gitlab_shared` in `lib/gitlab/database/gitlab_schemas.yml`.
+
+Because this migration configures a restriction on `gitlab_ci`, it is executed only
+in the context of the database containing the `gitlab_ci` schema.
+
+```ruby
+class DeleteCiBuildsLooseForeignKeyRecords < Gitlab::Database::Migration[2.0]
+  disable_ddl_transaction!
+
+  restrict_gitlab_migration gitlab_schema: :gitlab_ci
+
+  def up
+    execute("DELETE FROM loose_foreign_keys_deleted_records WHERE fully_qualified_table_name='ci_builds'")
+  end
+
+  def down
+    # no-op
+  end
+end
+```
+
+### The behavior of skipping migrations
+
+The only migrations that are skipped are the ones performing **DML** changes.
+The **DDL** migrations are **always and unconditionally** executed.
+
+The implemented [solution](https://gitlab.com/gitlab-org/gitlab/-/issues/355014#solution-2-use-database_tasks)
+uses `database_tasks:` as a way to indicate which additional database configurations
+(in `config/database.yml`) share the same primary database. The database configurations
+marked with `database_tasks: false` are exempt from executing `db:migrate`.
+
+If database configurations do not share databases (that is, all have `database_tasks: true`),
+each migration runs for every database configuration:
+
+1. The DDL migration applies all structure changes on all databases.
+1. The DML migration runs only in the context of a database containing the given `gitlab_schema:`.
+1. If the DML migration is not eligible to run, it is skipped. It's still
+   marked as executed in `schema_migrations`. While running `db:migrate`, the skipped
+   migration outputs `Current migration is skipped since it modifies 'gitlab_ci' which is outside of 'gitlab_main, gitlab_shared`.
+
+To prevent loss of migrations when `database_tasks: false` is configured, a dedicated
+Rake task, [`gitlab:db:validate_config`](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/83118), is used.
+The `gitlab:db:validate_config` task validates the correctness of `database_tasks:` by checking the
+database identifiers of each underlying database configuration. The configurations that share a
+database are required to have `database_tasks: false` set. `gitlab:db:validate_config` always runs
+before `db:migrate`.
+
+## Validation
+
+In a nutshell, validation uses [pg_query](https://github.com/pganalyze/pg_query) to analyze
+each query and classify tables with information from [`gitlab_schema.yml`](multiple_databases.md#gitlab-schema).
+The migration is skipped if the specified `gitlab_schema` is outside the list of schemas
+managed by a given database connection (`Gitlab::Database::gitlab_schemas_for_connection`).
+
+The `Gitlab::Database::Migration[2.0]` includes `Gitlab::Database::MigrationHelpers::RestrictGitlabSchema`,
+which extends the `#migrate` method. For the duration of a migration, a dedicated query analyzer,
+`Gitlab::Database::QueryAnalyzers::RestrictAllowedSchemas`, is installed. It accepts
+the list of allowed schemas as defined by `restrict_gitlab_migration:`. If an executed query
+is outside of the allowed schemas, it raises an exception.
+
+## Exceptions
+
+Depending on the misuse or lack of `restrict_gitlab_migration`, various exceptions can be raised
+as part of the migration run and prevent the migration from being completed.
+
+### Exception 1: migration running in DDL mode does DML select
+
+```ruby
+class UpdateProjectsArchivedState < Gitlab::Database::Migration[2.0]
+  disable_ddl_transaction!
+
+  # Missing:
+  # restrict_gitlab_migration gitlab_schema: :gitlab_main
+
+  def up
+    update_column_in_batches(:projects, :archived, true) do |table, query|
+      query.where(table[:archived].eq(false)) # rubocop:disable CodeReuse/ActiveRecord
+    end
+  end
+
+  def down
+    # no-op
+  end
+end
+```
+
+```plaintext
+Select/DML queries (SELECT/UPDATE/DELETE) are disallowed in the DDL (structure) mode
+Modifying of 'projects' (gitlab_main) with 'SELECT * FROM projects...
+```
+
+The migration does not use `restrict_gitlab_migration`. Its absence indicates a migration
+running in **DDL** mode, but the executed payload appears to read data from `projects`.
+
+**The solution** is to add `restrict_gitlab_migration gitlab_schema: :gitlab_main`.
+
+### Exception 2: migration running in DML mode changes the structure
+
+```ruby
+class AddUserIdAndStateIndexToMergeRequestReviewers < Gitlab::Database::Migration[2.0]
+  disable_ddl_transaction!
+
+  # restrict_gitlab_migration, if defined, indicates DML; it should be removed
+  restrict_gitlab_migration gitlab_schema: :gitlab_main
+
+  INDEX_NAME = 'index_on_merge_request_reviewers_user_id_and_state'
+
+  def up
+    add_concurrent_index :merge_request_reviewers, [:user_id, :state], where: 'state = 2', name: INDEX_NAME
+  end
+
+  def down
+    remove_concurrent_index_by_name :merge_request_reviewers, INDEX_NAME
+  end
+end
+```
+
+```plaintext
+DDL queries (structure) are disallowed in the Select/DML (SELECT/UPDATE/DELETE) mode.
+Modifying of 'merge_request_reviewers' with 'CREATE INDEX...
+```
+
+The migration does use `restrict_gitlab_migration`. Its presence indicates **DML** mode,
+but the executed payload appears to make structure changes (DDL).
+
+**The solution** is to remove `restrict_gitlab_migration gitlab_schema: :gitlab_main`.
+
+### Exception 3: migration running in DML mode accesses data from a table in another schema
+
+```ruby
+class UpdateProjectsArchivedState < Gitlab::Database::Migration[2.0]
+  disable_ddl_transaction!
+
+  # Since it modifies `projects` it should use `gitlab_main`
+  restrict_gitlab_migration gitlab_schema: :gitlab_ci
+
+  def up
+    update_column_in_batches(:projects, :archived, true) do |table, query|
+      query.where(table[:archived].eq(false)) # rubocop:disable CodeReuse/ActiveRecord
+    end
+  end
+
+  def down
+    # no-op
+  end
+end
```
+
+```plaintext
+Select/DML queries (SELECT/UPDATE/DELETE) do access 'projects' (gitlab_main) " \
+which is outside of list of allowed schemas: 'gitlab_ci'
+```
+
+The migration restricts itself to `gitlab_ci`, but appears to modify
+data in `gitlab_main`.
+
+**The solution** is to change `restrict_gitlab_migration gitlab_schema: :gitlab_ci`
+to `restrict_gitlab_migration gitlab_schema: :gitlab_main`.
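+
+For reference, a corrected version of the previous migration would look like the following sketch.
+It simply mirrors the DML example shown earlier, with the restriction matching the schema that
+`projects` belongs to:
+
+```ruby
+class UpdateProjectsArchivedState < Gitlab::Database::Migration[2.0]
+  disable_ddl_transaction!
+
+  # `projects` is classified as `gitlab_main`, so the restriction matches the modified table
+  restrict_gitlab_migration gitlab_schema: :gitlab_main
+
+  def up
+    update_column_in_batches(:projects, :archived, true) do |table, query|
+      query.where(table[:archived].eq(false)) # rubocop:disable CodeReuse/ActiveRecord
+    end
+  end
+
+  def down
+    # no-op
+  end
+end
+```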
+
+### Exception 4: mixing DDL and DML mode
+
+```ruby
+class UpdateProjectsArchivedState < Gitlab::Database::Migration[2.0]
+  disable_ddl_transaction!
+
+  # This migration is invalid regardless of specification
+  # as it cannot modify structure and data at the same time
+  restrict_gitlab_migration gitlab_schema: :gitlab_ci
+
+  def up
+    add_concurrent_index :merge_request_reviewers, [:user_id, :state], where: 'state = 2', name: 'index_on_merge_request_reviewers'
+    update_column_in_batches(:projects, :archived, true) do |table, query|
+      query.where(table[:archived].eq(false)) # rubocop:disable CodeReuse/ActiveRecord
+    end
+  end
+
+  def down
+    # no-op
+  end
+end
+```
+
+Migrations mixing **DDL** and **DML** raise one of the prior exceptions,
+depending on the ordering of operations.
+
+## Upcoming changes on multiple database migrations
+
+The `restrict_gitlab_migration` using `gitlab_schema:` is considered a first iteration
+of this feature for running migrations selectively depending on context. It is possible
+to add additional restrictions to DML-only migrations (as the structure coherency is likely
+to stay as-is until further notice) to restrict when they run.
+
+A potential extension is to limit running DML migrations to specific environments:
+
+```ruby
+restrict_gitlab_migration gitlab_schema: :gitlab_main, gitlab_env: :gitlab_com
+```
diff --git a/doc/development/database/multiple_databases.md b/doc/development/database/multiple_databases.md
index c9bbf73be55..3b1b06b557c 100644
--- a/doc/development/database/multiple_databases.md
+++ b/doc/development/database/multiple_databases.md
@@ -9,141 +9,86 @@ info: To determine the technical writer assigned to the Stage/Group associated w
 To scale GitLab, we are
 [decomposing the GitLab application database into multiple databases](https://gitlab.com/groups/gitlab-org/-/epics/6168).
 
-## CI/CD Database
+## GitLab Schema
 
-> Support for configuring the GitLab Rails application to use a distinct
-database for CI/CD tables was [introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/64289)
-in GitLab 14.1. This feature is still under development, and is not ready for production use.
+For properly discovering allowed patterns between different databases,
+the GitLab application implements the `lib/gitlab/database/gitlab_schemas.yml` YAML file.
 
-### Development setup
+This file provides a virtual classification of tables into a `gitlab_schema`,
+which conceptually is similar to a [PostgreSQL Schema](https://www.postgresql.org/docs/current/ddl-schemas.html).
+We decided as part of [using database schemas to better isolate CI decomposed features](https://gitlab.com/gitlab-org/gitlab/-/issues/333415)
+that we cannot use PostgreSQL schemas due to complex migration procedures. Instead, we implemented
+the concept of application-level classification.
+Each table of GitLab needs to have a `gitlab_schema` assigned:
 
-By default, GitLab is configured to use only one main database. To
-opt-in to use a main database, and CI database, modify the
-`config/database.yml` file to have a `main` and a `ci` database
-configurations.
+- `gitlab_main`: describes all tables that are being stored in the `main:` database (for example, `projects`, `users`).
+- `gitlab_ci`: describes all CI tables that are being stored in the `ci:` database (for example, `ci_pipelines`, `ci_builds`).
+- `gitlab_shared`: describes all application tables that contain data across all decomposed databases (for example, `loose_foreign_keys_deleted_records`).
+- `...`: more schemas to be introduced with additional decomposed databases
 
-You can set this up using [GDK](#gdk-configuration) or by
-[manually configuring `config/database.yml`](#manually-set-up-the-cicd-database).
+The assigned schema enforces the base class to be used:
 
-#### GDK configuration
+- `ApplicationRecord` for `gitlab_main`
+- `Ci::ApplicationRecord` for `gitlab_ci`
+- `Gitlab::Database::SharedModel` for `gitlab_shared`
 
-If you are using GDK, you can follow the following steps:
+### The impact of `gitlab_schema`
 
-1. On the GDK root directory, run:
+The usage of `gitlab_schema` has a significant impact on the application.
+The primary purpose of `gitlab_schema` is to introduce a barrier between different data access patterns.
 
-   ```shell
-   gdk config set gitlab.rails.databases.ci.enabled true
-   ```
+This is used as a primary source of classification for:
 
-1. Open your `gdk.yml`, and confirm that it has the following lines:
+- [Discovering cross-joins across tables from different schemas](#removing-joins-between-ci_-and-non-ci_-tables)
+- [Discovering cross-database transactions across tables from different schemas](#removing-cross-database-transactions)
 
-   ```yaml
-   gitlab:
-     rails:
-       databases:
-         ci:
-           enabled: true
-   ```
+### The special purpose of `gitlab_shared`
 
-1. Reconfigure GDK:
+`gitlab_shared` is a special case describing tables or views that by design contain data across
+all decomposed databases. This describes application-defined tables (like `loose_foreign_keys_deleted_records`)
+and Rails-defined tables (like `schema_migrations` or `ar_internal_metadata`), as well as internal PostgreSQL tables
+(for example, `pg_attribute`).
 
-   ```shell
-   gdk reconfigure
-   ```
+**Be careful** when using `gitlab_shared`, as it requires special handling while accessing data.
+Since `gitlab_shared` shares not only structure but also data, the application needs to be written in a way
+that traverses all data from all databases in a sequential manner.
 
-1. [Create the new CI/CD database](#create-the-new-database).
+```ruby
+Gitlab::Database::EachDatabase.each_model_connection([MySharedModel]) do |connection, connection_name|
+  MySharedModel.select_all_data...
+end
+```
 
-#### Manually set up the CI/CD database
+As such, migrations modifying data of `gitlab_shared` tables are expected to run across
+all decomposed databases.
 
-You can manually edit `config/database.yml` to split the databases.
-To do so, consider a `config/database.yml` file like the example below:
+## Migrations
 
-```yaml
-development:
-  main:
-    adapter: postgresql
-    encoding: unicode
-    database: gitlabhq_development
-    host: /path/to/gdk/postgresql
-    pool: 10
-    prepared_statements: false
-    variables:
-      statement_timeout: 120s
-
-test: &test
-  main:
-    adapter: postgresql
-    encoding: unicode
-    database: gitlabhq_test
-    host: /path/to/gdk/postgresql
-    pool: 10
-    prepared_statements: false
-    variables:
-      statement_timeout: 120s
-```
+Read [Migrations for Multiple Databases](migrations_for_multiple_databases.md).
-Edit it to split the databases into `main` and `ci`: +## CI/CD Database -```yaml -development: - main: - adapter: postgresql - encoding: unicode - database: gitlabhq_development - host: /path/to/gdk/postgresql - pool: 10 - prepared_statements: false - variables: - statement_timeout: 120s - ci: - adapter: postgresql - encoding: unicode - database: gitlabhq_development_ci - host: /path/to/gdk/postgresql - pool: 10 - prepared_statements: false - variables: - statement_timeout: 120s - -test: &test - main: - adapter: postgresql - encoding: unicode - database: gitlabhq_test - host: /path/to/gdk/postgresql - pool: 10 - prepared_statements: false - variables: - statement_timeout: 120s - ci: - adapter: postgresql - encoding: unicode - database: gitlabhq_test_ci - host: /path/to/gdk/postgresql - pool: 10 - prepared_statements: false - variables: - statement_timeout: 120s -``` +> Support for configuring the GitLab Rails application to use a distinct +database for CI/CD tables was [introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/64289) +in GitLab 14.1. This feature is still under development, and is not ready for production use. -Next, [create the new CI/CD database](#create-the-new-database). +### Configure single database -#### Create the new database +By default, GDK is configured to run with multiple databases. To configure GDK to use a single database: -After configuring GitLab for the two databases, create the new CI/CD database: +1. On the GDK root directory, run: -1. Create the new `ci:` database, load the DB schema into the `ci:` database, - and run any pending migrations: + ```shell + gdk config set gitlab.rails.databases.ci.enabled false + ``` - ```shell - bundle exec rails db:create db:schema:load:ci db:migrate - ``` +1. Reconfigure GDK: -1. Restart GDK: + ```shell + gdk reconfigure + ``` - ```shell - gdk restart - ``` +To switch back to using multiple databases, set `gitlab.rails.databases.ci.enabled` to `true` and run `gdk reconfigure`. <!-- NOTE: The `validate_cross_joins!` method in `spec/support/database/prevent_cross_joins.rb` references @@ -167,9 +112,9 @@ already many such examples that need to be fixed in The following steps are the process to remove cross-database joins between `ci_*` and non `ci_*` tables: -1. **{check-circle}** Add all failing specs to the [`cross-join-allowlist.yml`](https://gitlab.com/gitlab-org/gitlab/-/blob/f5de89daeb468fc45e1e95a76d1b5297aa53da11/spec/support/database/cross-join-allowlist.yml) +1. **{check-circle}** Add all failing specs to the [`cross-join-allowlist.yml`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/spec/support/database/cross-join-allowlist.yml) file. -1. **{dotted-circle}** Find the code that caused the spec failure and wrap the isolated code +1. **{check-circle}** Find the code that caused the spec failure and wrap the isolated code in [`allow_cross_joins_across_databases`](#allowlist-for-existing-cross-joins). Link to a new issue assigned to the correct team to remove the specs from the `cross-join-allowlist.yml` file. 
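+
+For example, such a wrapped call might look like the following sketch. The helper and its `url:`
+keyword are the allowlist mechanism linked in the previous step; the query inside the block and
+the issue URL are placeholders, not a real cross-join from the codebase:
+
+```ruby
+# Sketch only: wraps an isolated cross-database join and links it to a
+# follow-up issue, as described in the steps above.
+::Gitlab::Database.allow_cross_joins_across_databases(url:
+  'https://gitlab.com/gitlab-org/gitlab/-/issues/<issue-id>') do
+  # Example placeholder: a query that still joins a `ci_*` table
+  # to a non-`ci_*` table.
+  Ci::Build.joins(:project).count
+end
+```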
diff --git a/doc/development/database/not_null_constraints.md b/doc/development/database/not_null_constraints.md
index de070f7e434..af7d569e282 100644
--- a/doc/development/database/not_null_constraints.md
+++ b/doc/development/database/not_null_constraints.md
@@ -197,7 +197,7 @@ end
 
 If you have to clean up a nullable column for a [high-traffic table](../migration_style_guide.md#high-traffic-tables)
 (for example, the `artifacts` in `ci_builds`), your background migration will go on for a while and
-it will need an additional [background migration cleaning up](../background_migrations.md#cleaning-up)
+it will need an additional [background migration cleaning up](background_migrations.md#cleaning-up)
 in the release after adding the data migration.
 
 In that rare case you will need 3 releases end-to-end:
diff --git a/doc/development/database/post_deployment_migrations.md b/doc/development/database/post_deployment_migrations.md
new file mode 100644
index 00000000000..799eefdb875
--- /dev/null
+++ b/doc/development/database/post_deployment_migrations.md
@@ -0,0 +1,81 @@
+---
+stage: Enablement
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Post Deployment Migrations
+
+Post deployment migrations are regular Rails migrations that can optionally be
+executed after a deployment. By default, these migrations are executed alongside
+the other migrations. To skip these migrations, you must set the
+environment variable `SKIP_POST_DEPLOYMENT_MIGRATIONS` to a non-empty value
+when running `rake db:migrate`.
+
+For example, this would run all migrations including any post deployment
+migrations:
+
+```shell
+bundle exec rake db:migrate
+```
+
+This, however, skips post deployment migrations:
+
+```shell
+SKIP_POST_DEPLOYMENT_MIGRATIONS=true bundle exec rake db:migrate
+```
+
+## Deployment Integration
+
+Say you're using Chef for deploying new versions of GitLab and you'd like to run
+post deployment migrations after deploying a new version. Let's assume you
+normally use the command `chef-client` to do so. To make use of this feature,
+you'd have to run this command as follows:
+
+```shell
+SKIP_POST_DEPLOYMENT_MIGRATIONS=true sudo chef-client
+```
+
+Once all servers have been updated, you can run `chef-client` again on a single
+server _without_ the environment variable.
+
+The process is similar for other deployment techniques: first you would deploy
+with the environment variable set, then you re-deploy a single
+server but with the variable _unset_.
+
+## Creating Migrations
+
+To create a post deployment migration, you can use the following Rails generator:
+
+```shell
+bundle exec rails g post_deployment_migration migration_name_here
+```
+
+This generates the migration file in `db/post_migrate`. These migrations
+behave exactly like regular Rails migrations.
+
+## Use Cases
+
+Post deployment migrations can be used to perform migrations that mutate state
+that an existing version of GitLab depends on. For example, say you want to
+remove a column from a table. This requires downtime, as a GitLab instance
+depends on this column being present while it's running. Normally you'd follow
+these steps in such a case:
+
+1. Stop the GitLab instance
+1. Run the migration removing the column
+1. Start the GitLab instance again
+
+Using post deployment migrations, we can instead follow these steps:
+
+1. Deploy a new version of GitLab while ignoring post deployment migrations
+1. Re-run `rake db:migrate` but without the environment variable set
+
+Here we don't need any downtime, as the migration takes place _after_ a new
+version (which doesn't depend on the column anymore) has been deployed.
+
+Some other examples where these migrations are useful:
+
+- Cleaning up data generated due to a bug in GitLab
+- Removing tables
+- Migrating jobs from one Sidekiq queue to another
diff --git a/doc/development/database/rename_database_tables.md b/doc/development/database/rename_database_tables.md
index 881adf00ad0..7a76c028042 100644
--- a/doc/development/database/rename_database_tables.md
+++ b/doc/development/database/rename_database_tables.md
@@ -82,7 +82,7 @@ when naming indexes, so there is a possibility that not all indexes are properly
 the migration locally, check if there are inconsistently named indexes (`db/structure.sql`). Those can be
 renamed manually in a separate migration, which can also be part of the release M.N+1.
 - Foreign key columns might still contain the old table name. For smaller tables, follow our [standard column
-rename process](../avoiding_downtime_in_migrations.md#renaming-columns)
+rename process](avoiding_downtime_in_migrations.md#renaming-columns)
 - Avoid renaming database tables which are used with triggers.
 - Table modifications (add or remove columns) are not allowed during the rename process; please make sure that all changes to the table happen before the rename migration is started (or in the next release).
 - As the index names might change, verify that the model does not use bulk insert
diff --git a/doc/development/database/strings_and_the_text_data_type.md b/doc/development/database/strings_and_the_text_data_type.md
index 9674deb4603..4ed7cf1b4de 100644
--- a/doc/development/database/strings_and_the_text_data_type.md
+++ b/doc/development/database/strings_and_the_text_data_type.md
@@ -229,7 +229,7 @@ end
 
 To keep this guide short, we skipped the definition of the background migration and only
 provided a high level example of the post-deployment migration that is used to schedule the batches.
-You can find more information on the guide about [background migrations](../background_migrations.md)
+You can find more information on the guide about [background migrations](background_migrations.md)
 
 #### Validate the text limit (next release)
 
@@ -277,7 +277,7 @@ end
 
 If you have to clean up a text column for a really [large table](https://gitlab.com/gitlab-org/gitlab/-/blob/master/rubocop/rubocop-migrations.yml#L3)
 (for example, the `artifacts` in `ci_builds`), your background migration will go on for a while and
-it will need an additional [background migration cleaning up](../background_migrations.md#cleaning-up)
+it will need an additional [background migration cleaning up](background_migrations.md#cleaning-up)
 in the release after adding the data migration.
 
 In that rare case you will need 3 releases end-to-end:
diff --git a/doc/development/database/table_partitioning.md b/doc/development/database/table_partitioning.md
index 5319c73aad0..ec768136404 100644
--- a/doc/development/database/table_partitioning.md
+++ b/doc/development/database/table_partitioning.md
@@ -214,7 +214,7 @@ end
 ```
 
 This step uses the same mechanism as any background migration, so you
-may want to read the [Background Migration](../background_migrations.md)
+may want to read the [Background Migration](background_migrations.md)
 guide for details on that process.
Background jobs are scheduled every 2 minutes and copy `50_000` records at a time, which can be used to estimate the timing of the background migration portion of the diff --git a/doc/development/database_review.md b/doc/development/database_review.md index 4b5845992b9..fd0e2e17623 100644 --- a/doc/development/database_review.md +++ b/doc/development/database_review.md @@ -125,7 +125,7 @@ the following preparations into account. test its execution using `CREATE INDEX CONCURRENTLY` in the `#database-lab` Slack channel and add the execution time to the MR description: - Execution time largely varies between `#database-lab` and GitLab.com, but an elevated execution time from `#database-lab` can give a hint that the execution on GitLab.com will also be considerably high. - - If the execution from `#database-lab` is longer than `1h`, the index should be moved to a [post-migration](post_deployment_migrations.md). + - If the execution from `#database-lab` is longer than `1h`, the index should be moved to a [post-migration](database/post_deployment_migrations.md). Keep in mind that in this case you may need to split the migration and the application changes in separate releases to ensure the index will be in place when the code that needs it will be deployed. - Manually trigger the [database testing](database/database_migration_pipeline.md) job (`db:gitlabcom-database-testing`) in the `test` stage. @@ -212,7 +212,7 @@ Include in the MR description: #### Preparation when removing columns, tables, indexes, or other structures -- Follow the [guidelines on dropping columns](avoiding_downtime_in_migrations.md#dropping-columns). +- Follow the [guidelines on dropping columns](database/avoiding_downtime_in_migrations.md#dropping-columns). - Generally it's best practice (but not a hard rule) to remove indexes and foreign keys in a post-deployment migration. - Exceptions include removing indexes and foreign keys for small tables. - If you're adding a composite index, another index might become redundant, so remove that in the same migration. @@ -222,6 +222,7 @@ Include in the MR description: - Check migrations - Review relational modeling and design choices + - Consider [access patterns and data layout](database/layout_and_access_patterns.md) if new tables or columns are added. - Review migrations follow [database migration style guide](migration_style_guide.md), for example - [Check ordering of columns](ordering_table_columns.md) @@ -235,8 +236,8 @@ Include in the MR description: - Check that the relevant version files under `db/schema_migrations` were added or removed. - Check queries timing (If any): In a single transaction, cumulative query time executed in a migration needs to fit comfortably within `15s` - preferably much less than that - on GitLab.com. - - For column removals, make sure the column has been [ignored in a previous release](avoiding_downtime_in_migrations.md#dropping-columns) -- Check [background migrations](background_migrations.md): + - For column removals, make sure the column has been [ignored in a previous release](database/avoiding_downtime_in_migrations.md#dropping-columns) +- Check [background migrations](database/background_migrations.md): - Establish a time estimate for execution on GitLab.com. For historical purposes, it's highly recommended to include this estimation on the merge request description. 
   - If a single `update` is below `1s` the query can be placed
@@ -249,6 +250,8 @@ Include in the MR description:
   it's suggested to treat background migrations as post migrations:
   place them in `db/post_migrate` instead of `db/migrate`. Keep in mind
   that post migrations are executed post-deployment in production.
+  - If a migration [has tracking enabled](database/background_migrations.md#background-jobs-tracking),
+    ensure `mark_all_as_succeeded` is called even if no work is done.
 - Check [timing guidelines for migrations](migration_style_guide.md#how-long-a-migration-should-take)
 - Check migrations are reversible and implement a `#down` method
 - Check new table migrations:
diff --git a/doc/development/deleting_migrations.md b/doc/development/deleting_migrations.md
index 25ec1c08335..5d5ca431598 100644
--- a/doc/development/deleting_migrations.md
+++ b/doc/development/deleting_migrations.md
@@ -1,39 +1,11 @@
 ---
-stage: none
-group: unassigned
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: 'database/deleting_migrations.md'
+remove_date: '2022-07-08'
 ---
 
-# Delete existing migrations
+This document was moved to [another location](database/deleting_migrations.md).
 
-When removing existing migrations from the GitLab project, you have to take into account
-the possibility of the migration already been included in past releases or in the current release, and thus already executed on GitLab.com and/or in self-managed instances.
-
-Because of it, it's not possible to delete existing migrations, as that could lead to:
-
-- Schema inconsistency, as changes introduced into the database were not rolled back properly.
-- Leaving a record on the `schema_versions` table, that points out to migration that no longer exists on the codebase.
-
-Instead of deleting we can opt for disabling the migration.
-
-## Pre-requisites to disable a migration
-
-Migrations can be disabled if:
-
-- They caused a timeout or general issue on GitLab.com.
-- They are obsoleted, for example, changes are not necessary due to a feature change.
-- Migration is a data migration only, that is, the migration does not change the database schema.
-
-## How to disable a data migration?
-
-In order to disable a migration, the following steps apply to all types of migrations:
-
-1. Turn the migration into a no-op by removing the code inside `#up`, `#down`
-   or `#perform` methods, and adding `# no-op` comment instead.
-1. Add a comment explaining why the code is gone.
-
-Disabling migrations requires explicit approval of Database Maintainer.
-
-## Examples
-
-- [Disable scheduling of productivity analytics](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/17253)
+<!-- This redirect file can be deleted after <2022-07-08>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year.
--> +<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html --> diff --git a/doc/development/diffs.md b/doc/development/diffs.md index d61de740f15..5f03ba93a4d 100644 --- a/doc/development/diffs.md +++ b/doc/development/diffs.md @@ -1,6 +1,6 @@ --- -stage: none -group: unassigned +stage: Create +group: Code Review info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments --- diff --git a/doc/development/documentation/feature_flags.md b/doc/development/documentation/feature_flags.md index 1e4698ff867..fb58851e93f 100644 --- a/doc/development/documentation/feature_flags.md +++ b/doc/development/documentation/feature_flags.md @@ -19,8 +19,29 @@ must be documented. For context, see the When you document feature flags, you must: -- [Add a note at the start of the topic](#use-a-note-to-describe-the-state-of-the-feature-flag). - [Add version history text](#add-version-history-text). +- [Add a note at the start of the topic](#use-a-note-to-describe-the-state-of-the-feature-flag). + +## Add version history text + +When the state of a flag changes (for example, disabled by default to enabled by default), add the change to the version history. + +Possible version history entries are: + +```markdown +> - [Introduced](issue-link) in GitLab X.X [with a flag](../../administration/feature_flags.md) named <flag name>. Disabled by default. +> - [Enabled on GitLab.com](issue-link) in GitLab X.X. +> - [Enabled on GitLab.com](issue-link) in GitLab X.X. Available to GitLab.com administrators only. +> - [Enabled on self-managed](issue-link) in GitLab X.X. +> - [Generally available](issue-link) in GitLab X.Y. [Feature flag <flag name>](issue-link) removed. +``` + +You can combine entries if they happened in the same release: + +```markdown +> - Introduced in GitLab 14.2 [with a flag](../../administration/feature_flags.md) named `ci_include_rules`. Disabled by default. +> - [Enabled on GitLab.com and self-managed](https://gitlab.com/gitlab-org/gitlab/-/issues/337507) in GitLab 14.3. +``` ## Use a note to describe the state of the feature flag @@ -30,7 +51,8 @@ The note has three parts, and follows this structure: ```markdown FLAG: -<Self-managed GitLab availability information.> <GitLab.com availability information.> +<Self-managed GitLab availability information.> +<GitLab.com availability information.> <This feature is not ready for production use.> ``` @@ -61,27 +83,6 @@ If needed, you can add this sentence: `The feature is not ready for production use.` -## Add version history text - -When the state of a flag changes (for example, disabled by default to enabled by default), add the change to the version history. - -Possible version history entries are: - -```markdown -> - [Introduced](issue-link) in GitLab X.X [with a flag](../../administration/feature_flags.md) named <flag name>. Disabled by default. -> - [Enabled on GitLab.com](issue-link) in GitLab X.X. -> - [Enabled on GitLab.com](issue-link) in GitLab X.X. Available to GitLab.com administrators only. -> - [Enabled on self-managed](issue-link) in GitLab X.X. -> - [Generally available](issue-link) in GitLab X.Y. [Feature flag <flag name>](issue-link) removed. -``` - -You can combine entries if they happened in the same release: - -```markdown -> - Introduced in GitLab 14.2 [with a flag](../../administration/feature_flags.md) named `ci_include_rules`. Disabled by default. 
-> - [Enabled on GitLab.com and self-managed](https://gitlab.com/gitlab-org/gitlab/-/issues/337507) in GitLab 14.3. -``` - ## Feature flag documentation examples The following examples show the progression of a feature flag. diff --git a/doc/development/documentation/index.md b/doc/development/documentation/index.md index 66d6beb821f..c6afcdbddd0 100644 --- a/doc/development/documentation/index.md +++ b/doc/development/documentation/index.md @@ -20,7 +20,7 @@ In addition to this page, the following resources can help you craft and contrib ## Source files and rendered web locations -Documentation for GitLab, GitLab Runner, Omnibus GitLab, and Charts is published to <https://docs.gitlab.com>. Documentation for GitLab is also published within the application at `/help` on the domain of the GitLab instance. +Documentation for GitLab, GitLab Runner, GitLab Operator, Omnibus GitLab, and Charts is published to <https://docs.gitlab.com>. Documentation for GitLab is also published within the application at `/help` on the domain of the GitLab instance. At `/help`, only help for your current edition and version is included. Help for other versions is available at <https://docs.gitlab.com/archives/>. The source of the documentation exists within the codebase of each GitLab application in the following repository locations: @@ -31,6 +31,7 @@ The source of the documentation exists within the codebase of each GitLab applic | [GitLab Runner](https://gitlab.com/gitlab-org/gitlab-runner/) | [`/docs`](https://gitlab.com/gitlab-org/gitlab-runner/-/tree/main/docs) | | [Omnibus GitLab](https://gitlab.com/gitlab-org/omnibus-gitlab/) | [`/doc`](https://gitlab.com/gitlab-org/omnibus-gitlab/tree/master/doc) | | [Charts](https://gitlab.com/gitlab-org/charts/gitlab) | [`/doc`](https://gitlab.com/gitlab-org/charts/gitlab/tree/master/doc) | +| [GitLab Operator](https://gitlab.com/gitlab-org/cloud-native/gitlab-operator) | [`/doc`](https://gitlab.com/gitlab-org/cloud-native/gitlab-operator/-/tree/master/doc) | Documentation issues and merge requests are part of their respective repositories and all have the label `Documentation`. diff --git a/doc/development/documentation/restful_api_styleguide.md b/doc/development/documentation/restful_api_styleguide.md index 4d654b6b901..8a505ed84a8 100644 --- a/doc/development/documentation/restful_api_styleguide.md +++ b/doc/development/documentation/restful_api_styleguide.md @@ -83,23 +83,25 @@ to describe the GitLab release that introduced the API call. Use the following table headers to describe the methods. Attributes should always be in code blocks using backticks (`` ` ``). -Sort the attributes in the table: first, required, then alphabetically. +Sort the table by required attributes first, then alphabetically. ```markdown | Attribute | Type | Required | Description | |:-----------------------------|:--------------|:-----------------------|:-----------------------------------------------------| -| `user` | string | **{check-circle}** Yes | The GitLab username. | -| `assignee_ids` **(PREMIUM)** | integer array | **{dotted-circle}** No | The IDs of the users to assign the issue to. | -| `confidential` | boolean | **{dotted-circle}** No | Set an issue to be confidential. Default is `false`. | +| `title` | string | **{check-circle}** Yes | Title of the issue. | +| `assignee_ids` **(PREMIUM)** | integer array | **{dotted-circle}** No | IDs of the users to assign the issue to. | +| `confidential` | boolean | **{dotted-circle}** No | Sets the issue to confidential. 
Default is `false`. | ``` Rendered example: | Attribute | Type | Required | Description | |:-----------------------------|:--------------|:-----------------------|:-----------------------------------------------------| -| `user` | string | **{check-circle}** Yes | The GitLab username. | -| `assignee_ids` **(PREMIUM)** | integer array | **{dotted-circle}** No | The IDs of the users to assign the issue to. | -| `confidential` | boolean | **{dotted-circle}** No | Set an issue to be confidential. Default is `false`. | +| `title` | string | **{check-circle}** Yes | Title of the issue. | +| `assignee_ids` **(PREMIUM)** | integer array | **{dotted-circle}** No | IDs of the users to assign the issue to. | +| `confidential` | boolean | **{dotted-circle}** No | Sets the issue to confidential. Default is `false`. | + +For information about writing attribute descriptions, see the [GraphQL API description style guide](../api_graphql_styleguide.md#description-style-guide). ## cURL commands diff --git a/doc/development/documentation/site_architecture/index.md b/doc/development/documentation/site_architecture/index.md index e7a915eab09..bdda15e2064 100644 --- a/doc/development/documentation/site_architecture/index.md +++ b/doc/development/documentation/site_architecture/index.md @@ -71,7 +71,7 @@ GitLab Docs is built with a combination of external: - [Schema.org](https://schema.org/) - [Google Analytics](https://marketingplatform.google.com/about/analytics/) -- [Google Tag Manager](https://developers.google.com/tag-manager/) +- [Google Tag Manager](https://developers.google.com/tag-platform/tag-manager) ## Global navigation diff --git a/doc/development/documentation/styleguide/index.md b/doc/development/documentation/styleguide/index.md index 91e9d0c703d..7bfc0320d02 100644 --- a/doc/development/documentation/styleguide/index.md +++ b/doc/development/documentation/styleguide/index.md @@ -349,17 +349,15 @@ Follow these guidelines for punctuation: <!-- vale gitlab.Repetition = NO --> -| Rule | Example | -|------------------------------------------------------------------|--------------------------------------------------------| -| Avoid semicolons. Use two sentences instead. | That's the way that the world goes 'round. You're up one day and the next you're down. -| Always end full sentences with a period. | For a complete overview, read through this document. | -| Always add a space after a period when beginning a new sentence. | For a complete overview, check this doc. For other references, check out this guide. | -| Do not use double spaces. (Tested in [`SentenceSpacing.yml`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/doc/.vale/gitlab/SentenceSpacing.yml).) | --- | -| Do not use tabs for indentation. Use spaces instead. You can configure your code editor to output spaces instead of tabs when pressing the tab key. | --- | -| Use serial commas (Oxford commas) before the final **and** or **or** in a list of three or more items. (Tested in [`OxfordComma.yml`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/doc/.vale/gitlab/OxfordComma.yml).) | You can create new issues, merge requests, and milestones. | -| Always add a space before and after dashes when using it in a sentence (for replacing a comma, for example). | You should try this - or not. | -| When a colon is part of a sentence, always use lowercase after the colon. | Linked issues: a way to create a relationship between issues. | -| Do not use typographer's quotes. Use straight quotes instead. 
(Tested in [`NonStandardQuotes.yml`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/doc/.vale/gitlab/NonStandardQuotes.yml).) | "It's the questions we can't answer that teach us the most"---Patrick Rothfuss |
+- End full sentences with a period.
+- Use one space between sentences.
+- Do not use semicolons. Use two sentences instead.
+- Do not use double spaces. (Tested in [`SentenceSpacing.yml`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/doc/.vale/gitlab/SentenceSpacing.yml).)
+- Do not use non-breaking spaces. Use standard spaces instead. (Tested in [`lint-doc.sh`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/scripts/lint-doc.sh).)
+- Do not use tabs for indentation. Use spaces instead. You can configure your code editor to output spaces instead of tabs when pressing the tab key.
+- Use serial (Oxford) commas before the final **and** or **or** in a list of three or more items. (Tested in [`OxfordComma.yml`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/doc/.vale/gitlab/OxfordComma.yml).)
+- Avoid dashes. Use separate sentences, or commas, instead.
+- Do not use typographer's ("curly") quotes. Use straight quotes instead. (Tested in [`NonStandardQuotes.yml`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/doc/.vale/gitlab/NonStandardQuotes.yml).)
 
 <!-- vale gitlab.Repetition = YES -->
 
@@ -403,17 +401,6 @@ Backticks are more precise than quotes. For example, in this string:
 
 It's not clear whether the user should include the period in the string.
 
-### Spaces between words
-
-Use only standard spaces between words. The search engine for the documentation
-website doesn't split words separated with
-[non-breaking spaces](https://en.wikipedia.org/wiki/Non-breaking_space) when
-indexing, and fails to create expected individual search terms. Tests that search
-for certain words separated by regular spaces can't find words separated by
-non-breaking spaces.
-
-Tested in [`lint-doc.sh`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/scripts/lint-doc.sh).
-
 ## Lists
 
 - Always start list items with a capital letter, unless they're parameters or
@@ -421,30 +408,30 @@ Tested in [`lint-doc.sh`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/scr
 - Always leave a blank line before and after a list.
 - Begin a line with spaces (not tabs) to denote a [nested sub-item](#nesting-inside-a-list-item).
 
-### Ordered vs. unordered lists
+### Choose between an ordered or unordered list
 
-Only use ordered lists when their items describe a sequence of steps to follow.
-
-Do:
+Use ordered lists for a sequence of steps. For example:
 
 ```markdown
-These are the steps to do something:
+Follow these steps to do something.
 
 1. First, do the first step.
 1. Then, do the next step.
 1. Finally, do the last step.
 ```
 
-Don't:
+Use an unordered list when the steps do not need to be completed in order. For example:
 
 ```markdown
-This is a list of available features:
+These things are imported:
 
-1. Feature 1
-1. Feature 2
-1. Feature 3
+- Thing 1
+- Thing 2
+- Thing 3
 ```
 
+You can choose to introduce either list with a colon, but you do not have to.
+
 ### Markup
 
 - Use dashes (`-`) for unordered lists instead of asterisks (`*`).
@@ -454,12 +441,8 @@ This is a list of available features:
 
 ### Punctuation
 
 - Don't add commas (`,`) or semicolons (`;`) to the ends of list items.
-- Only add periods to the end of a list item if the item consists of a complete
-  sentence (with a subject and a verb).
-- Be consistent throughout the list: if the majority of the items do not end in - a period, do not end any of the items in a period, even if they consist of a - complete sentence. The opposite is also valid: if the majority of the items - end with a period, end all with a period. +- If a list item is a complete sentence (with a subject and a verb), add a period at the end. +- Majority rules. If the majority of items do not end in a period, do not end any of the items in a period. - Separate list items from explanatory text with a colon (`:`). For example: ```markdown @@ -469,32 +452,6 @@ This is a list of available features: - Second item: this explains the second item. ``` -**Examples:** - -Do: - -- First list item -- Second list item -- Third list item - -Don't: - -- First list item -- Second list item -- Third list item. - -Do: - -- Let's say this is a complete sentence. -- Let's say this is also a complete sentence. -- Not a complete sentence. - -Don't (vary use of periods; majority rules): - -- Let's say this is a complete sentence. -- Let's say this is also a complete sentence. -- Not a complete sentence - ### Nesting inside a list item It's possible to nest items under a list item, so that they render with the same @@ -686,7 +643,7 @@ For the heading text, **do not**: - Use words that might change in the future. Changing a heading changes its anchor URL, which affects other linked pages. - Repeat text from earlier headings. For example, instead of `Troubleshooting merge requests`, - use `Troubleshooting`. + use `Troubleshooting`. - Use links. ### Anchor links @@ -746,7 +703,7 @@ We include guidance for links in these categories: - Use inline link Markdown markup `[Text](https://example.com)`. It's easier to read, review, and maintain. Do not use `[Text][identifier]` reference-style links. -- Use [meaningful anchor text](https://www.futurehosting.com/blog/links-should-have-meaningful-anchor-text-heres-why/). +- Use meaningful anchor text. For example, instead of writing something like `Read more about merge requests [here](LINK)`, write `Read more about [merge requests](LINK)`. @@ -1561,6 +1518,47 @@ The voting strategy in GitLab 13.4 and later requires the primary and secondary voters to agree. ``` +#### Deprecated features + +When a feature is deprecated, add `(DEPRECATED)` to the page title or to +the heading of the section documenting the feature, immediately before +the tier badge: + +```markdown +<!-- Page title example: --> +# Feature A (DEPRECATED) **(ALL TIERS)** + +<!-- Doc section example: --> +## Feature B (DEPRECATED) **(PREMIUM SELF)** +``` + +Add the deprecation to the version history note (you can include a link +to a replacement when available): + +```markdown +> - [Deprecated](<link-to-issue>) in GitLab 11.3. Replaced by [meaningful text](<link-to-appropriate-documentation>). +``` + +You can also describe the replacement in surrounding text, if available. If the +deprecation isn't obvious in existing text, you may want to include a warning: + +```markdown +WARNING: +This feature was [deprecated](link-to-issue) in GitLab 12.3 and replaced by +[Feature name](link-to-feature-documentation). 
+``` + +If you add `(DEPRECATED)` to the page's title and the document is linked from the docs +navigation, either remove the page from the nav or update the nav item to include the +same text before the feature name: + +```yaml + - doc_title: (DEPRECATED) Feature A +``` + +In the first major GitLab version after the feature was deprecated, be sure to +remove information about that deprecated feature. + #### End-of-life for features or products When a feature or product enters its end-of-life, indicate its status by @@ -1572,7 +1570,7 @@ For example: ```markdown WARNING: This feature is in its end-of-life process. It is [deprecated](link-to-issue) -for use in GitLab X.X, and is planned for [removal](link-to-issue) in GitLab X.X. +in GitLab X.X, and is planned for [removal](link-to-issue) in GitLab X.X. ``` After the feature or product is officially deprecated and removed, remove @@ -1647,47 +1645,6 @@ To view historical information about a feature, review GitLab [release posts](https://about.gitlab.com/releases/), or search for the issue or merge request where the work was done. -### Deprecated features - -When a feature is deprecated, add `(DEPRECATED)` to the page title or to -the heading of the section documenting the feature, immediately before -the tier badge: - -```markdown -<!-- Page title example: --> -# Feature A (DEPRECATED) **(ALL TIERS)** - -<!-- Doc section example: --> -## Feature B (DEPRECATED) **(PREMIUM SELF)** -``` - -Add the deprecation to the version history note (you can include a link -to a replacement when available): - -```markdown -> - [Deprecated](<link-to-issue>) in GitLab 11.3. Replaced by [meaningful text](<link-to-appropriate-documentation>). -``` - -You can also describe the replacement in surrounding text, if available. If the -deprecation isn't obvious in existing text, you may want to include a warning: - -```markdown -WARNING: -This feature was [deprecated](link-to-issue) in GitLab 12.3 and replaced by -[Feature name](link-to-feature-documentation). -``` - -If you add `(DEPRECATED)` to the page's title and the document is linked from the docs -navigation, either remove the page from the nav or update the nav item to include the -same text before the feature name: - -```yaml - - doc_title: (DEPRECATED) Feature A -``` - -In the first major GitLab version after the feature was deprecated, be sure to -remove information about that deprecated feature. - ## Products and features Refer to the information in this section when describing products and features diff --git a/doc/development/documentation/styleguide/word_list.md b/doc/development/documentation/styleguide/word_list.md index c38c6586c3a..65f6a0a328b 100644 --- a/doc/development/documentation/styleguide/word_list.md +++ b/doc/development/documentation/styleguide/word_list.md @@ -97,6 +97,12 @@ The token generated when you create an agent for Kubernetes. Use **agent access - secret token - authentication token +## air gap, air-gapped + +Use **offline environment** to describe installations that have physical barriers or security policies that prevent or limit internet access. Do not use **air gap**, **air gapped**, or **air-gapped**. For example: + +- The firewall policies in an offline environment prevent the computer from accessing the internet. + ## allow, enable Try to avoid **allow** and **enable**, unless you are talking about security-related features. @@ -261,11 +267,17 @@ Do not use **Developer permissions**. 
A user who is assigned the Developer role See [the Microsoft style guide](https://docs.microsoft.com/en-us/style-guide/a-z-word-list-term-collections/d/disable-disabled) for guidance on **disable**. Use **inactive** or **off** instead. ([Vale](../testing.md#vale) rule: [`InclusionAbleism.yml`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/doc/.vale/gitlab/InclusionAbleism.yml)) - ## disallow Use **prevent** instead of **disallow**. ([Vale](../testing.md#vale) rule: [`Substitutions.yml`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/doc/.vale/gitlab/Substitutions.yml)) +## downgrade + +To be more upbeat and precise, do not use **downgrade**. Focus instead on the action the user is taking. + +- For changing to earlier GitLab versions, use [**roll back**](#roll-back). +- For changing to lower GitLab tiers, use **change the subscription tier**. + ## dropdown list Use **dropdown list** to refer to the UI element. Do not use **dropdown** without **list** after it. @@ -729,12 +741,31 @@ Do not use **Reporter permissions**. A user who is assigned the Reporter role ha Use title case for **Repository Mirroring**. +## respectively + +Avoid **respectively** and be more precise instead. + +Use: + +- To create a user, select **Create user**. For an existing user, select **Save changes**. + +Instead of: + +- Select **Create user** or **Save changes** if you created a new user or + edited an existing one respectively. + ## roles Do not use **roles** and [**permissions**](#permissions) interchangeably. Each user is assigned a role. Each role includes a set of permissions. Roles are not the same as [**access levels**](#access-level). +## roll back + +Use **roll back** for changing a GitLab version to an earlier one. + +Do not use **roll back** for licensing or subscriptions. Use **change the subscription tier** instead. + ## runner, runners Use lowercase for **runners**. These are the agents that run CI/CD jobs. See also [GitLab Runner](#gitlab-runner) and [this issue](https://gitlab.com/gitlab-org/gitlab/-/issues/233529). @@ -921,6 +952,33 @@ Use [**2FA** and **two-factor authentication**](#2fa-two-factor-authentication) Do not use **type** if you can avoid it. Use **enter** instead. +## update + +Use **update** for installing a newer **patch** version of the software only. For example: + +- Update GitLab from 14.9 to 14.9.1. + +Do not use **update** for any other case. Instead, use **upgrade**. + +## upgrade + +Use **upgrade** for: + +- Choosing a higher subscription tier (Premium or Ultimate). +- Installing a newer **major** (13.0, 14.0) or **minor** (13.8, 14.5) version of GitLab. + +For example: + +- Upgrade to GitLab Ultimate. +- Upgrade GitLab from 14.0 to 14.1. +- Upgrade GitLab from 14.0 to 15.0. + +Use caution with the phrase **Upgrade GitLab** without any other text. +Ensure the surrounding text clarifies whether +you're talking about the product version or the subscription tier. + +See also [downgrade](#downgrade) and [roll back](#roll-back). + ## useful Do not use **useful**. If the user doesn't find the process to be useful, we lose their trust. ([Vale](../testing.md#vale) rule: [`Simplicity.yml`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/doc/.vale/gitlab/Simplicity.yml)) @@ -971,7 +1029,7 @@ Do not use **yet** when talking about the product or its features. The documenta Sometimes you might need to use **yet** when writing a task. If you use **yet**, ensure the surrounding phrases are written -in present tense, active voice. +in present tense, active voice. 
+[View guidance about how to write about future features](index.md#promising-features-in-future-versions).

([Vale](../testing.md#vale) rule: [`CurrentStatus.yml`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/doc/.vale/gitlab/CurrentStatus.yml))
diff --git a/doc/development/documentation/testing.md b/doc/development/documentation/testing.md
index 49fe0aff3c6..9facb22669b 100644
--- a/doc/development/documentation/testing.md
+++ b/doc/development/documentation/testing.md
@@ -189,7 +189,7 @@ English language.

Vale's configuration is stored in the
[`.vale.ini`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/.vale.ini) file located in the root
directory of projects.

-Vale supports creating [custom tests](https://errata-ai.github.io/vale/styles/) that extend any of
+Vale supports creating [custom tests](https://docs.errata.ai/vale/styles) that extend any of
several types of checks, which we store in the `.linting/vale/styles/gitlab` directory in the
documentation directory of projects.

@@ -365,6 +365,35 @@ file for the [`gitlab`](https://gitlab.com/gitlab-org/gitlab) project.

To set up `lefthook` for documentation linting, see
[Pre-push static analysis](../contributing/style_guides.md#pre-push-static-analysis-with-lefthook).

+#### Show Vale warnings on push
+
+By default, `lefthook` shows only Vale errors when pushing changes to a branch. The default branches
+have no Vale errors, so any errors listed here are introduced by commits to the branch.
+
+To also see the Vale warnings when pushing to a branch, set a local environment variable: `VALE_WARNINGS=true`.
+
+Enable Vale warnings on push to improve the documentation suite by:
+
+- Detecting warnings you might be introducing with your commits.
+- Identifying warnings that already exist in the page, which you can resolve to reduce technical debt.
+
+These warnings:
+
+- Don't stop the push from working.
+- Don't result in a broken pipeline.
+- Include all warnings for a file, not just warnings that are introduced by the commits.
+
+To enable Vale warnings on push:
+
+- Automatically, add `VALE_WARNINGS=true` to your shell configuration.
+- Manually, prepend `VALE_WARNINGS=true` to invocations of `lefthook`. For example:
+
+  ```shell
+  VALE_WARNINGS=true bundle exec lefthook run pre-push
+  ```
+
+You can also [configure your editor](#configure-editors) to show Vale warnings.
+
### Show subset of Vale alerts

You can set Visual Studio Code to display only a subset of Vale alerts when viewing files:
diff --git a/doc/development/ee_features.md b/doc/development/ee_features.md
index 5bd830715f5..019dbb13599 100644
--- a/doc/development/ee_features.md
+++ b/doc/development/ee_features.md
@@ -21,7 +21,12 @@ info: To determine the technical writer assigned to the Stage/Group associated w

When developing locally, there are times when you need your instance to act like
the SaaS version of the product. In those instances, you can simulate SaaS by
exporting an environment variable as seen below:

-`export GITLAB_SIMULATE_SAAS=1`
+```shell
+export GITLAB_SIMULATE_SAAS=1
+```
+
+There are many ways to pass an environment variable to your local GitLab instance.
+For example, you can create an `env.runit` file in the root of your GDK with the above snippet.

## Act as CE when unlicensed

diff --git a/doc/development/event_store.md b/doc/development/event_store.md
index b00a824e2eb..967272dcf2e 100644
--- a/doc/development/event_store.md
+++ b/doc/development/event_store.md
@@ -290,3 +290,42 @@ executed synchronously every time the given event is published.

For complex conditions it's best to subscribe to all the events and then handle the logic
in the `handle_event` method of the subscriber worker, as sketched below.
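For illustration only, here is a minimal sketch of such a subscriber. The `Gitlab::EventStore::Subscriber` module and the `handle_event` hook are the ones described earlier on this page; the worker name, the event data keys, and the condition itself are hypothetical:

```ruby
# Hypothetical subscriber: the class name and data keys are examples only.
module MergeRequests
  class ExampleSubscriberWorker
    include ApplicationWorker
    include Gitlab::EventStore::Subscriber

    def handle_event(event)
      # A condition too complex to express at subscription time is evaluated here.
      return unless event.data[:pipeline_id] && event.data[:merge_request_id]

      # ... handle the event
    end
  end
end
```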
+
+## Testing
+
+The publisher's responsibility is to ensure that the event is published correctly.
+
+To test that an event has been published correctly, we can use the RSpec matcher `:publish_event`:
+
+```ruby
+it 'publishes a ProjectDeleted event with project id and namespace id' do
+  expected_data = { project_id: project.id, namespace_id: project.namespace_id }
+
+  # The matcher verifies that when the block is called, the block publishes the expected event and data.
+  expect { destroy_project(project, user, {}) }
+    .to publish_event(Projects::ProjectDeletedEvent)
+    .with(expected_data)
+end
+```
+
+The subscriber must ensure that a published event can be consumed correctly. For this purpose
+we have added helpers and shared examples to standardize the way we test subscribers:
+
+```ruby
+RSpec.describe MergeRequests::UpdateHeadPipelineWorker do
+  let(:event) { Ci::PipelineCreatedEvent.new(data: { pipeline_id: pipeline.id }) }
+
+  # This shared example ensures that an event is published and correctly processed by
+  # the current subscriber (`described_class`). The `event` from the `let` above is in scope.
+  it_behaves_like 'consumes the published event'
+
+  it 'does something' do
+    # This helper directly executes `perform` ensuring that `handle_event` is called correctly.
+    consume_event(subscriber: described_class, event: event)
+
+    # run expectations
+  end
+end
+```
diff --git a/doc/development/experiment_guide/index.md b/doc/development/experiment_guide/index.md
index c34e5eb36dc..f7af1113b6e 100644
--- a/doc/development/experiment_guide/index.md
+++ b/doc/development/experiment_guide/index.md
@@ -64,8 +64,8 @@ We recommend the following workflow:

1. **If the experiment is a success**, designers add the new icon or illustration to the Pajamas
   UI kit as part of the cleanup process. Engineers can then add it to the
   [SVG library](https://gitlab-org.gitlab.io/gitlab-svgs/) and modify the implementation based on
   the [Frontend Development Guidelines](../fe_guide/icons.md#usage-in-hamlrails-2).
-
-## Turn off all experiments
+
+## Turn off all experiments

When there is a case on GitLab.com (SaaS) that necessitates turning off all experiments, we have this control.

diff --git a/doc/development/fe_guide/accessibility.md b/doc/development/fe_guide/accessibility.md
index e71e414002a..2a1083d031f 100644
--- a/doc/development/fe_guide/accessibility.md
+++ b/doc/development/fe_guide/accessibility.md
@@ -32,7 +32,7 @@ By default, macOS limits the <kbd>tab</kbd> key to **Text boxes and lists only**

1. Open the **Shortcuts** tab.
1. Enable the setting **Use keyboard navigation to move focus between controls**.

-You can read more about enabling browser-specific keyboard navigation on [a11yproject](https://www.a11yproject.com/posts/2017-12-29-macos-browser-keyboard-navigation/).
+You can read more about enabling browser-specific keyboard navigation on [a11yproject](https://www.a11yproject.com/posts/macos-browser-keyboard-navigation/).

## Quick checklist
diff --git a/doc/development/fe_guide/emojis.md b/doc/development/fe_guide/emojis.md
index 2dedbc8f19d..7ef88c5ca19 100644
--- a/doc/development/fe_guide/emojis.md
+++ b/doc/development/fe_guide/emojis.md
@@ -25,7 +25,7 @@ when your platform does not support it.

   - `app/assets/images/emoji.png`
   - `app/assets/images/emoji@2x.png`
   1. Ensure you see new individual images copied into `app/assets/images/emoji/`
-   1. Ensure you can see the new emojis and their aliases in the GitLab Flavored Markdown (GFM) Autocomplete
+   1. Ensure you can see the new emojis and their aliases in the GitLab Flavored Markdown (GLFM) Autocomplete
   1. Ensure you can see the new emojis and their aliases in the award emoji menu
   1. You might need to add new emoji Unicode support checks and rules for platforms
      that do not support a certain emoji and we need to fallback to an image.
diff --git a/doc/development/fe_guide/graphql.md b/doc/development/fe_guide/graphql.md
index e79a473df9e..ddd99f3614d 100644
--- a/doc/development/fe_guide/graphql.md
+++ b/doc/development/fe_guide/graphql.md
@@ -264,9 +264,15 @@ Read more about [Vue Apollo](https://github.com/vuejs/vue-apollo) in the [Vue Ap

### Local state with Apollo

-It is possible to manage an application state with Apollo by passing
-in a resolvers object when creating the default client. The default state can be set by writing
-to the cache after setting up the default client. In the example below, we are using query with `@client` Apollo directive to write the initial data to Apollo cache and then get this state in the Vue component:
+It is possible to manage an application state with Apollo by using [client-side resolvers](#using-client-side-resolvers)
+or [type policies with reactive variables](#using-type-policies-with-reactive-variables) when creating your default
+client.
+
+#### Using client-side resolvers
+
+The default state can be set by writing to the cache after setting up the default client. In the
+example below, we are using a query with the `@client` Apollo directive to write the initial data to
+Apollo cache and then get this state in the Vue component:

```javascript
// user.query.graphql

query User {
  user @client {
    name
    surname
    age
  }
}
```

@@ -322,7 +328,7 @@ export default {

Along with creating local data, we can also extend existing GraphQL types with `@client` fields. This is extremely helpful when we need to mock an API response for fields not yet added to our GraphQL API.

-#### Mocking API response with local Apollo cache
+##### Mocking API response with local Apollo cache

Using local Apollo Cache is helpful when we have a need to mock some GraphQL API responses, queries,
or mutations locally (such as when they're still not added to our actual API).

@@ -384,6 +390,108 @@ For each attempt to fetch a version, our client fetches `id` and `sha` from the

Read more about local state management with Apollo in the [Vue Apollo documentation](https://vue-apollo.netlify.app/guide/local-state.html#local-state).

+#### Using type policies with reactive variables
+
+Apollo Client 3 offers an alternative to [client-side resolvers](#using-client-side-resolvers) by using
+[reactive variables to store client state](https://www.apollographql.com/docs/react/local-state/reactive-variables/).
+
+**NOTE:**
+We are still learning the best practices for both **type policies** and **reactive vars**.
+Take a moment to improve this guide or [leave a comment](https://gitlab.com/gitlab-org/frontend/rfcs/-/issues/100)
+if you use it!
+
+In the example below we define a `@client` query and its `typedefs`:
+
+```javascript
+// ./graphql/typedefs.graphql
+extend type Query {
+  localData: String!
+}
+```
+
+```javascript
+// ./graphql/get_local_data.query.graphql
+query getLocalData {
+  localData @client
+}
+```
+
+Similar to resolvers, your `typePolicies` will execute when the `@client` query is used. However,
+using `makeVar` will trigger every relevant active Apollo query to reactively update when the state
+mutates.
+ +```javascript +// ./graphql/local_state.js + +import { makeVar } from '@apollo/client/core'; +import typeDefs from './typedefs.graphql'; + +export const createLocalState = () => { + // set an initial value + const localDataVar = makeVar(''); + + const cacheConfig = { + typePolicies: { + Query: { + fields: { + localData() { + // obtain current value + // triggers when `localDataVar` is updated + return localDataVar(); + }, + }, + }, + }, + }; + + // methods that update local state + const localMutations = { + setLocalData(newData) { + localDataVar(newData); + }, + clearData() { + localDataVar(''); + }, + }; + + return { + cacheConfig, + typeDefs, + localMutations, + }; +}; +``` + +Pass the cache config to your Apollo Client: + +```javascript +// index.js + +// ... +import createDefaultClient from '~/lib/graphql'; +import { createLocalState } from './graphql/local_state'; + +const { cacheConfig, typeDefs, localMutations } = createLocalState(); + +const apolloProvider = new VueApollo({ + defaultClient: createDefaultClient({}, { cacheConfig, typeDefs }), +}); + +return new Vue({ + el, + apolloProvider, + provide: { + // inject local state mutations to your app + localMutations, + }, + render(h) { + return h(MyApp); + }, +}); +``` + +Wherever used, the local query will update as the state updates thanks to the **reactive variable**. + ### Using with Vuex When the Apollo Client is used in Vuex and fetched data is stored in the Vuex store, the Apollo Client cache does not need to be enabled. Otherwise we would have data from the API stored in two places - Vuex store and Apollo Client cache. With Apollo's default settings, a subsequent fetch from the GraphQL API could result in fetching data from Apollo cache (in the case where we have the same query and variables). To prevent this behavior, we need to disable Apollo Client cache by passing a valid `fetchPolicy` option to its constructor: @@ -583,7 +691,7 @@ we want to fetch after or before a given endpoint. For example, here we're fetching 10 designs after a cursor (let us call this `projectQuery`): ```javascript -#import "~/graphql_shared/fragments/pageInfo.fragment.graphql" +#import "~/graphql_shared/fragments/page_info.fragment.graphql" query { project(fullPath: "root/my-project") { @@ -606,7 +714,7 @@ query { } ``` -Note that we are using the [`pageInfo.fragment.graphql`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/assets/javascripts/graphql_shared/fragments/pageInfo.fragment.graphql) to populate the `pageInfo` information. +Note that we are using the [`page_info.fragment.graphql`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/assets/javascripts/graphql_shared/fragments/page_info.fragment.graphql) to populate the `pageInfo` information. 
#### Using `fetchMore` method in components

@@ -869,7 +977,7 @@ You'd then be able to retrieve the data without providing any pagination-specifi

Here's an example of a query using the `@connection` directive:

```graphql
-#import "~/graphql_shared/fragments/pageInfo.fragment.graphql"
+#import "~/graphql_shared/fragments/page_info.fragment.graphql"

query DastSiteProfiles($fullPath: ID!, $after: String, $before: String, $first: Int, $last: Int) {
  project(fullPath: $fullPath) {
diff --git a/doc/development/fe_guide/performance.md b/doc/development/fe_guide/performance.md
index 94beecf6168..bcdc49a1070 100644
--- a/doc/development/fe_guide/performance.md
+++ b/doc/development/fe_guide/performance.md
@@ -457,6 +457,6 @@ General tips:

## Additional Resources

- [WebPage Test](https://www.webpagetest.org) for testing site loading time and size.
-- [Google PageSpeed Insights](https://developers.google.com/speed/pagespeed/insights/) grades web pages and provides feedback to improve the page.
+- [Google PageSpeed Insights](https://pagespeed.web.dev/) grades web pages and provides feedback to improve the page.
- [Profiling with Chrome DevTools](https://developer.chrome.com/docs/devtools/)
- [Browser Diet](https://browserdiet.com/) is a community-built guide that catalogues practical tips for improving web page performance.
diff --git a/doc/development/fe_guide/registry_architecture.md b/doc/development/fe_guide/registry_architecture.md
new file mode 100644
index 00000000000..47a6dc40e19
--- /dev/null
+++ b/doc/development/fe_guide/registry_architecture.md
@@ -0,0 +1,90 @@
+---
+stage: Package
+group: unassigned
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Registry architecture
+
+GitLab has several registry applications. Given that they all leverage similar UI, UX, and business
+logic, they are all built with the same architecture. In addition, a set of shared components
+already exists to unify the user and developer experiences.
+
+Existing registries:
+
+- Package Registry
+- Container Registry
+- Infrastructure Registry
+- Dependency Proxy
+
+## Frontend architecture
+
+### Component classification
+
+All the registries follow an architecture pattern that includes four component types:
+
+- Pages: represent an entire app, or for the registries using [vue-router](https://v3.router.vuejs.org/) they represent one router
+  route.
+- Containers: represent a single piece of functionality. They contain complex logic and may
+  connect to the API.
+- Presentationals: represent a portion of the UI. They receive all their data with `props` or through
+  `inject`, and do not connect to the API.
+- Shared components: presentational components that accept a wide array of configurations and are
+  shared across all of the registries.
+
+### Communicating with the API
+
+The complexity and communication with the API should be concentrated in the pages components, and
+in the container components when needed. This makes it easier to:
+
+- Handle concurrent requests, loading states, and user messages.
+- Maintain the code, especially to estimate work. If it touches a page or functional component,
+  expect it to be more complex.
+- Write fast and consistent unit tests.
+
+### Best practices
+
+- Use [`provide` or `inject`](https://v2.vuejs.org/v2/api/?redirect=true#provide-inject)
+  to pass static, non-reactive values coming from the app initialization.
+- When passing data, prefer `props` over nested queries or Vuex bindings. Only pages and
+  container components should be aware of the state and API communication.
+- Don't repeat yourself. If one registry receives functionality, the likelihood of the rest needing
+  it in the future is high. If something seems reusable and isn't bound to the state, create a
+  shared component.
+- Try to express functionality and logic with dedicated components. It's much easier to deal with
+  events and properties than callbacks and asynchronous code (see
+  [`delete_package.vue`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/assets/javascripts/packages_and_registries/package_registry/components/functional/delete_package.vue)).
+- Leverage [startup for GraphQL calls](graphql.md#making-initial-queries-early-with-graphql-startup-calls).
+
+## Shared components library
+
+Inside `vue_shared/components/registry` and `packages_and_registries/shared`, there's a set of
+shared components that you can use to implement registry functionality. These components build the
+main pieces of the desired UI and UX of a registry page. The most important components are:
+
+- `code-instruction`: represents a copyable box containing code. Supports multiline and single line
+  code boxes. Snowplow tracks the code copy event.
+- `details-row`: represents a row of details. Used to add additional info in the details area of
+  the `list-item` component.
+- `history-item`: represents a history list item used to build a timeline.
+- `list-item`: represents a list element in the registry. It supports: left action, left primary and
+  secondary content, right primary and secondary content, right action, and details slots.
+- `metadata-item`: represents one piece of metadata, with an icon or a link. Used primarily in the
+  title area.
+- `persisted-dropdown-selection`: represents a dropdown menu that stores the user selection in the
+  `localStorage`.
+- `registry-search`: implements `gl-filtered-search` with a sorting section on the right.
+- `title-area`: implements the top title area of the registry. Includes: a main title, an avatar, a
+  subtitle, a metadata row, and a right actions slot.
+
+## Adding a new registry page
+
+When adding a new registry:
+
+- Leverage the shared components that already exist. It's good to look at how the components are
+  structured and used in the more mature registries (for example, the Package Registry).
+- If it's in line with the backend requirements, we suggest using GraphQL for the API. This helps in
+  dealing with the innate performance issue of registries.
+- If possible, we recommend using [Vue Router](https://v3.router.vuejs.org/)
+  and frontend routing. Coupled with Apollo, the caching layer helps with the perceived page
+  performance.
diff --git a/doc/development/fe_guide/source_editor.md b/doc/development/fe_guide/source_editor.md
index 2ff0bacfc3a..b06e341630f 100644
--- a/doc/development/fe_guide/source_editor.md
+++ b/doc/development/fe_guide/source_editor.md
@@ -35,7 +35,7 @@ Vue component, but the integration of Source Editor is generally straightforward

const editor = new SourceEditor({
  // Editor Options.
  // The list of all accepted options can be found at
-  // https://microsoft.github.io/monaco-editor/api/enums/monaco.editor.editoroption.html
+  // https://microsoft.github.io/monaco-editor/api/enums/monaco.editor.EditorOption.html
});
```

@@ -56,19 +56,19 @@ An instance of Source Editor accepts the following configuration options:

| `blobContent` | `false` | `String`: The initial content to render in the editor. |
| `extensions` | `false` | `Array`: Extensions to use in this instance. |
| `blobGlobalId` | `false` | `String`: An auto-generated property.<br>**Note:** This property may go away in the future. Do not pass `blobGlobalId` unless you know what you're doing.|
-| Editor Options | `false` | `Object(s)`: Any property outside of the list above is treated as an Editor Option for this particular instance. Use this field to override global Editor Options on the instance level. A full [index of Editor Options](https://microsoft.github.io/monaco-editor/api/enums/monaco.editor.editoroption.html) is available. |
+| Editor Options | `false` | `Object(s)`: Any property outside of the list above is treated as an Editor Option for this particular instance. Use this field to override global Editor Options on the instance level. A full [index of Editor Options](https://microsoft.github.io/monaco-editor/api/enums/monaco.editor.EditorOption.html) is available. |

## API

The editor uses the same public API as
-[provided by Monaco editor](https://microsoft.github.io/monaco-editor/api/interfaces/monaco.editor.istandalonecodeeditor.html)
+[provided by Monaco editor](https://microsoft.github.io/monaco-editor/api/interfaces/monaco.editor.IStandaloneCodeEditor.html)
with additional functions on the instance level:

| Function | Arguments | Description |
| --------------------- | ----- | ----- |
| `updateModelLanguage` | `path`: String | Updates the instance's syntax highlighting to follow the extension of the passed `path`. Available only on the instance level.|
| `use` | Array of objects | Array of extensions to apply to the instance. Accepts only the array of _objects_. You must fetch and resolve the extensions' ES6 modules in your views or components before they are passed to `use`. This property is available on _instance_ (applies extension to this particular instance) and _global editor_ (applies the same extension to all instances) levels. |
-| Monaco Editor options | See [documentation](https://microsoft.github.io/monaco-editor/api/interfaces/monaco.editor.istandalonecodeeditor.html) | Default Monaco editor options |
+| Monaco Editor options | See [documentation](https://microsoft.github.io/monaco-editor/api/interfaces/monaco.editor.IStandaloneCodeEditor.html) | Default Monaco editor options |

## Tips

@@ -202,7 +202,7 @@ export default {

In the code example, `this` refers to the instance. By referring to the instance,
we can access the complete underlying
-[Monaco editor API](https://microsoft.github.io/monaco-editor/api/interfaces/monaco.editor.istandalonecodeeditor.html),
+[Monaco editor API](https://microsoft.github.io/monaco-editor/api/interfaces/monaco.editor.IStandaloneCodeEditor.html),
which includes functions like `getValue()`.

Now let's use our extension:
diff --git a/doc/development/fe_guide/vue.md b/doc/development/fe_guide/vue.md
index b947d90cc11..fecb0af936d 100644
--- a/doc/development/fe_guide/vue.md
+++ b/doc/development/fe_guide/vue.md
@@ -123,6 +123,10 @@ Using dependency injection to provide values from HAML is ideal when:

  prop-drilling becomes an inconvenience.
  Prop-drilling is when the same prop is passed through all components in the hierarchy until the
  component that genuinely needs it.

+Dependency injection can potentially break a child component (either an immediate child or multiple levels deep) if the value declared in the `inject` configuration doesn't have defaults defined and the parent component has not provided the value using the `provide` configuration.
+
+- A [default value](https://vuejs.org/guide/components/provide-inject.html#injection-default-values) might be useful in contexts where it makes sense.
+
##### props

If the value from HAML doesn't fit the criteria of dependency injection, use `props`.

@@ -499,7 +503,7 @@ component under test, with the `computed` property, for example). Remember to us

We should test for events emitted in response to an action in our component. This is used to
verify the correct events are being fired with the correct arguments.

-For any DOM events we should use [`trigger`](https://vue-test-utils.vuejs.org/api/wrapper/#trigger)
+For any DOM events we should use [`trigger`](https://v1.test-utils.vuejs.org/api/wrapper/#trigger)
to fire our event.

```javascript
@@ -530,7 +534,7 @@ it('should fire the itemClicked event', () => {
```

We should verify an event has been fired by asserting against the result of the
-[`emitted()`](https://vue-test-utils.vuejs.org/api/wrapper/#emitted) method.
+[`emitted()`](https://v1.test-utils.vuejs.org/api/wrapper/#emitted) method.

## Vue.js Expert Role
diff --git a/doc/development/fe_guide/vue3_migration.md b/doc/development/fe_guide/vue3_migration.md
index f174408c946..8c8bb36d962 100644
--- a/doc/development/fe_guide/vue3_migration.md
+++ b/doc/development/fe_guide/vue3_migration.md
@@ -156,6 +156,6 @@ export default {
</template>
```

-[In Vue 3](https://v3.vuejs.org/guide/migration/props-default-this.html#props-default-function-this-access),
+[In Vue 3](https://v3-migration.vuejs.org/breaking-changes/props-default-this.html),
the props default value factory is passed the raw props as an argument, and can
also access injections.
diff --git a/doc/development/feature_categorization/index.md b/doc/development/feature_categorization/index.md
index d6b64001e13..b2d141798fa 100644
--- a/doc/development/feature_categorization/index.md
+++ b/doc/development/feature_categorization/index.md
@@ -58,7 +58,8 @@ not, the specs will fail.

### Excluding Sidekiq workers from feature categorization

A few Sidekiq workers, that are used across all features, cannot be mapped to a
-single category. These should be declared as such using the `feature_category_not_owned!`
+single category. These should be declared as such using the
+`feature_category :not_owned`
declaration, as shown below:

```ruby
@@ -66,7 +67,7 @@ class SomeCrossCuttingConcernWorker
  include ApplicationWorker

  # Declares that this worker does not map to a feature category
-  feature_category_not_owned!
+  feature_category :not_owned # rubocop:disable Gitlab/AvoidFeatureCategoryNotOwned

  # ...
end
diff --git a/doc/development/feature_flags/process.md b/doc/development/feature_flags/process.md
deleted file mode 100644
index f98366beb6b..00000000000
--- a/doc/development/feature_flags/process.md
+++ /dev/null
@@ -1,11 +0,0 @@
----
-redirect_to: 'https://about.gitlab.com/handbook/product-development-flow/feature-flag-lifecycle/'
-remove_date: '2022-03-01'
----
-
-This document was moved to [another location](https://about.gitlab.com/handbook/product-development-flow/feature-flag-lifecycle/).
- -<!-- This redirect file can be deleted after 2022-03-01. --> -<!-- Redirects that point to other docs in the same project expire in three months. --> -<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. --> -<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html --> diff --git a/doc/development/gemfile.md b/doc/development/gemfile.md index 5ff1bc7b127..e0f5a905831 100644 --- a/doc/development/gemfile.md +++ b/doc/development/gemfile.md @@ -58,9 +58,9 @@ to a gem, go through these steps: the [`gitlab-org/ruby/gems` namespace](https://gitlab.com/gitlab-org/ruby/gems/). - To create this project: - 1. Follow the [instructions for new projects](https://about.gitlab.com/handbook/engineering/#creating-a-new-project). - 1. Follow the instructions for setting up a [CI/CD configuration](https://about.gitlab.com/handbook/engineering/#cicd-configuration). - 1. Follow the instructions for [publishing a project](https://about.gitlab.com/handbook/engineering/#publishing-a-project). + 1. Follow the [instructions for new projects](https://about.gitlab.com/handbook/engineering/gitlab-repositories/#creating-a-new-project). + 1. Follow the instructions for setting up a [CI/CD configuration](https://about.gitlab.com/handbook/engineering/gitlab-repositories/#cicd-configuration). + 1. Follow the instructions for [publishing a project](https://about.gitlab.com/handbook/engineering/gitlab-repositories/#publishing-a-project). - See [issue #325463](https://gitlab.com/gitlab-org/gitlab/-/issues/325463) for an example. diff --git a/doc/development/gitlab_flavored_markdown/index.md b/doc/development/gitlab_flavored_markdown/index.md new file mode 100644 index 00000000000..682d8011cd8 --- /dev/null +++ b/doc/development/gitlab_flavored_markdown/index.md @@ -0,0 +1,20 @@ +--- +stage: Create +group: Editor +info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments +--- + +# Markdown developer documentation **(FREE)** + +This page contains the MVC for the developer documentation for GitLab Flavored Markdown. +For the user documentation about Markdown in GitLab, refer to +[GitLab Flavored Markdown](../../user/markdown.md). + +## GitLab Flavored Markdown specification guide + +The [specification guide](specification_guide/index.md) includes: + +- [Terms and definitions](specification_guide/index.md#terms-and-definitions). +- [Parsing and rendering](specification_guide/index.md#parsing-and-rendering). +- [Goals](specification_guide/index.md#goals). +- [Implementation](specification_guide/index.md#implementation) of the spec. diff --git a/doc/development/gitlab_flavored_markdown/specification_guide/index.md b/doc/development/gitlab_flavored_markdown/specification_guide/index.md new file mode 100644 index 00000000000..021f7bafce9 --- /dev/null +++ b/doc/development/gitlab_flavored_markdown/specification_guide/index.md @@ -0,0 +1,717 @@ +--- +stage: Create +group: Editor +info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments +--- + +# GitLab Flavored Markdown (GLFM) Specification Guide **(FREE)** + +GitLab supports Markdown in various places. The Markdown dialect we use is called +GitLab Flavored Markdown, or GLFM. 
The specification for the GLFM dialect is based on the
+[GitHub Flavored Markdown (GFM) specification](https://github.github.com/gfm/),
+which is in turn based on the [CommonMark specification](https://spec.commonmark.org/current/).
+The GLFM specification includes
+[several extensions](../../../user/markdown.md#differences-between-gitlab-flavored-markdown-and-standard-markdown)
+to the GFM specification.
+
+See the [section on acronyms](#acronyms-glfm-ghfm-gfm-commonmark) for a
+detailed explanation of the various acronyms used in this document.
+This guide is a developer-facing document that describes the various terms and
+definitions, goals, tools, and implementations related to the GLFM specification.
+It is intended to support and augment the [user-facing documentation](../../../user/markdown.md)
+for GitLab Flavored Markdown.
+
+NOTE:
+In this document, _GFM_ refers to _GitHub_ Flavored Markdown, not _GitLab_ Flavored Markdown.
+Refer to the [section on acronyms](#acronyms-glfm-ghfm-gfm-commonmark)
+for a detailed explanation of the various acronyms used in this document.
+
+NOTE:
+This guide and the implementation and files described in it are still a work in
+progress. As the work progresses, rewrites and consolidation
+between this guide and the [user-facing documentation](../../../user/markdown.md)
+for GitLab Flavored Markdown are likely.
+
+## Terms and definitions
+
+### Acronyms: GLFM, GHFM, GFM, CommonMark
+
+[_GitHub_ Flavored Markdown](https://github.github.com/gfm/) is widely referred
+to by the acronym GFM, and this document follows that convention as well.
+_GitLab_ Flavored Markdown is referred to as GLFM in this document,
+to distinguish it from GitHub Flavored Markdown.
+
+Unfortunately, this convention is not followed consistently in the rest
+of the documentation or GitLab codebase. In many places, the GFM
+acronym is used to refer to _GitLab_ Flavored Markdown. An
+[open issue](https://gitlab.com/gitlab-org/gitlab/-/issues/24592) exists to resolve
+this inconsistency.
+
+Some places in the code refer to both the GitLab and GitHub specifications
+simultaneously in the same areas of logic. In these situations,
+_GitHub_ Flavored Markdown may be referred to with variable or constant names like
+`ghfm_` to avoid confusion.
+
+The original CommonMark specification is referred to as _CommonMark_ (no acronym).
+
+### Various Markdown specifications
+
+The specification format we use is based on the approach used in CommonMark, where
+a `spec.txt` file serves as documentation, as well as being in a format that can
+serve as input to automated conformance tests. It is
+[explained in the CommonMark specification](https://spec.commonmark.org/0.30/#about-this-document):
+
+> This document attempts to specify Markdown syntax unambiguously. It contains many
+> examples with side-by-side Markdown and HTML. These are intended to double as conformance tests.
+
+The HTML-rendered versions of the specifications:
+
+- [GitLab Flavored Markdown (GLFM) specification](https://gitlab.com/gitlab-org/gitlab/-/blob/master/glfm_specification/output/spec.html), which extends the:
+- [GitHub Flavored Markdown (GFM) specification](https://github.github.com/gfm/), which extends the:
+- [CommonMark specification](https://spec.commonmark.org/0.30/)
+
+NOTE:
+The creation of the
+[GitLab Flavored Markdown (GLFM) specification](https://gitlab.com/gitlab-org/gitlab/-/blob/master/glfm_specification/output/spec.html)
+file is still pending.
+However, GLFM has more complex parsing, rendering, and testing requirements than
+GFM or CommonMark. Therefore,
+it does not have a static, hardcoded, manually updated `spec.txt`. Instead, the
+GLFM `spec.txt` is automatically generated based on other input files. This process
+is explained in detail in the [Implementation](#implementation) sections below.
+
+### Markdown examples
+
+Everywhere in the context of the specification and this guide, the term
+_examples_ is specifically used to refer to the Markdown + HTML pairs used
+to illustrate the canonical parsing (or rendering) behavior of various Markdown source
+strings in the standard
+[CommonMark specification format](https://spec.commonmark.org/0.30/#example-1).
+
+In this context, it should not be confused with other similar or related meanings of
+_example_, such as
+[RSpec examples](https://relishapp.com/rspec/rspec-core/docs/example-groups/basic-structure-describe-it).
+
+### Parsers and renderers
+
+To understand the various ways in which a specification is used, and how it relates
+to a given Markdown dialect, it's important to understand the distinction between
+a _parser_ and a _renderer_:
+
+- A Markdown _parser_ accepts Markdown as input and produces a Markdown
+  Abstract Syntax Tree (AST) as output.
+- A Markdown _renderer_ accepts the AST produced by a parser, and produces HTML
+  (or a PDF, or any other relevant rendering format) as output.
+
+### Types of Markdown tests driven by the GLFM specification
+
+The two main types of automated testing are driven by the Markdown
+examples and data contained in the GLFM specification. We refer to them as:
+
+- Markdown conformance testing.
+- Markdown snapshot testing.
+
+Many other types of tests also occur in the GitLab
+codebase, and some of these tests are also related to the GLFM Markdown dialect.
+Therefore, to avoid confusion, we use these standard terms for the two types
+of specification-driven testing referred to in this documentation and elsewhere.
+
+#### Markdown conformance testing
+
+_Markdown conformance testing_ refers to the standard testing method used by
+all CommonMark Markdown dialects to verify that a specific implementation conforms
+to the CommonMark Markdown specification. It is enforced by running the standard
+CommonMark tool [`spec_tests.py`](https://github.com/github/cmark-gfm/blob/master/test/spec_tests.py)
+against a given `spec.txt` specification and the implementation.
+
+NOTE:
+`spec_tests.py` may eventually be re-implemented in Ruby, to not have a dependency on Python.
+
+#### Markdown snapshot testing
+
+_Markdown snapshot testing_ refers to the automated testing performed in
+the GitLab codebase, which is driven by snapshot fixture data derived from the
+GLFM specification. It consists of both backend RSpec tests and frontend Jest tests
+which use the fixture data. This fixture data is contained in YAML files. These files
+can be generated and updated based on the Markdown examples in the specification,
+and the existing GLFM parser and renderer implementations. They may also be
+manually updated as necessary to test-drive incomplete implementations.
+Regarding the terminology used here:
+
+1. The Markdown snapshot tests can be considered a form of the
+   [Golden Master Testing approach](https://www.google.com/search?q=golden+master+testing),
+   which is also referred to as Approval Testing or Characterization Testing.
+   1. The term Golden Master originally comes from the recording industry, and
+      refers to the process of mastering, or making a final mix from which all
+      other copies are produced.
+   1. For more information and background, you can read about
+      [Characterization Tests](https://en.wikipedia.org/wiki/Characterization_test) and
+      [Golden Masters](https://en.wikipedia.org/wiki/Gold_master_(disambiguation)).
+1. The usage of the term _snapshot_ does not refer to the approach of
+   [Jest snapshot testing](https://jestjs.io/docs/snapshot-testing), as used elsewhere
+   in the GitLab frontend testing suite. However, the Markdown snapshot testing does
+   follow the same philosophy and patterns as Jest snapshot testing:
+   1. Snapshot fixture data is represented as files which are checked into source control.
+   1. The files can be automatically generated and updated based on the implementation
+      of the code under test.
+   1. The files can also be manually updated when necessary, for example, to test-drive
+      changes to an incomplete or buggy implementation.
+1. The usage of the term _fixture_ does not refer to standard
+   [Rails database fixture files](https://api.rubyonrails.org/classes/ActiveRecord/FixtureSet.html).
+   It instead refers to _test fixtures_ in the
+   [more generic definition](https://en.wikipedia.org/wiki/Test_fixture#Software),
+   as input data to support automated testing. However, fixture files still exist, so
+   they are colocated under the `spec/fixtures` directory with the rest of
+   the fixture data for the GitLab Rails application.
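To make the fixture-driven flow concrete, here is a hypothetical sketch of how a backend snapshot test could consume these YAML files. Only the fixture file paths are taken from this page; the looping style, the YAML structure assumed by `dig`, and the `render_static_html` helper are illustrative assumptions, not the real shared-helper API:

```ruby
# Hypothetical sketch of a snapshot-driven spec; not the actual shared helpers.
require 'yaml'

RSpec.describe 'GLFM static HTML rendering' do
  fixtures_dir = 'spec/fixtures/glfm/example_snapshots'
  examples = YAML.safe_load(File.read(File.join(fixtures_dir, 'examples_index.yml')))
  markdown = YAML.safe_load(File.read(File.join(fixtures_dir, 'markdown.yml')))
  html     = YAML.safe_load(File.read(File.join(fixtures_dir, 'html.yml')))

  examples.each_key do |example_name|
    it "renders the expected static HTML for #{example_name}" do
      # `render_static_html` stands in for the backend (Banzai) rendering pipeline.
      expect(render_static_html(markdown.fetch(example_name)))
        .to eq(html.dig(example_name, 'static'))
    end
  end
end
```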
+
+## Parsing and Rendering
+
+The Markdown dialect used in the GitLab application has a dual requirement for rendering:
+
+1. Rendering to static read-only HTML format, to be displayed in various
+   places throughout the application.
+1. Rendering editable content in the
+   [Content Editor](https://about.gitlab.com/direction/create/editor/content_editor/),
+   a ["What You See Is What You Get" (WYSIWYG)](https://en.wikipedia.org/wiki/WYSIWYG)
+   editor. The Content Editor supports real-time instant switching between an editable
+   Markdown source and an editable WYSIWYG document.
+
+These requirements mean that GitLab has two independent parser and renderer
+implementations:
+
+1. The backend parser / renderer supports parsing and rendering to _static_
+   read-only HTML. It is [implemented in Ruby](https://gitlab.com/gitlab-org/gitlab/-/tree/master/lib/banzai).
+   It leverages the [`commonmarker`](https://github.com/gjtorikian/commonmarker) gem,
+   which is a Ruby wrapper for [`libcmark-gfm`](https://github.com/github/cmark),
+   GitHub's fork of the reference parser for CommonMark. `libcmark-gfm` is an extended
+   version of the C reference implementation of [CommonMark](http://commonmark.org/).
+1. The frontend parser / renderer supports parsing and _WYSIWYG_ rendering for
+   the Content Editor. It is implemented in JavaScript. Parsing is based on the
+   [Remark](https://github.com/remarkjs/remark) Markdown parser, which produces a
+   Markdown Abstract Syntax Tree (MDAST). Rendering is the process of turning
+   an MDAST into a [ProseMirror document](../../fe_guide/content_editor.md). Then,
+   ProseMirror is used to render a ProseMirror document to WYSIWYG HTML. In this
+   document, we refer to the process of turning Markdown into an MDAST as the
+   _frontend / JavaScript parser_, and the entire process of rendering Markdown
+   to WYSIWYG HTML in ProseMirror as the _Content Editor_. Several
+   requirements drive the need for an independent frontend parser / renderer
+   implementation, including:
+   1. Lack of necessary support for accurate source mapping in the HTML renderer
+      implementation used on the backend.
+   1. Latency and bandwidth concerns: eliminating the need for a round-trip to the backend
+      every time the user switches between the Markdown source and the WYSIWYG document.
+   1. Different HTML and browser rendering requirements for WYSIWYG documents. For example,
+      displaying read-only elements such as diagrams and references in an editable form.
+
+### Multiple versions of rendered HTML
+
+Both of these GLFM renderer implementations (static and WYSIWYG) produce
+HTML which differs from the canonical HTML examples from the specification.
+For every Markdown example in the GLFM specification, three
+versions of HTML can potentially be rendered from the example:
+
+1. **Static HTML**: HTML produced by the backend (Ruby) renderer, which
+   contains extra styling and behavioral HTML. For example, **Create task** buttons
+   added for dynamically creating an issue from a task list item.
+   The GitLab [Markdown API](../../../api/markdown.md) generates HTML
+   for a given Markdown string using this method.
+1. **WYSIWYG HTML**: HTML produced by the frontend (JavaScript) Content Editor,
+   which includes parsing and rendering logic. Used to present an editable document
+   in the ProseMirror WYSIWYG editor.
+1. **Canonical HTML**: The clean, basic version of HTML rendered from Markdown.
+   1. For the examples which come from the CommonMark specification and
+      GFM extensions specification,
+      the canonical HTML is the exact identical HTML found in the
+      GFM `spec.txt` example blocks.
+   1. For GLFM extensions to the <abbr title="GitHub Flavored Markdown">GFM</abbr> / CommonMark
+      specification, a `glfm_canonical_examples.txt`
+      [input specification file](#input-specification-files) contains the
+      Markdown examples and corresponding canonical HTML examples.
+
+As the rendered static and WYSIWYG HTML from the backend (Ruby) and frontend (JavaScript)
+renderers contain extra HTML, their rendered HTML can be converted to canonical HTML
+by a [canonicalization](#canonicalization-of-html) process.
+
+#### Canonicalization of HTML
+
+Neither the backend (Ruby) nor the frontend (JavaScript) renderer can directly render canonical HTML.
+Nor should they be able to, because:
+
+- It's not a direct requirement to support any GitLab application feature.
+- Adding this feature adds unnecessary requirements and complexity to the implementations.
+
+Instead, the rendered static or WYSIWYG HTML is converted to canonical HTML by a
+_canonicalization_ process. This process can strip all the extra styling and behavioral
+HTML from the static or WYSIWYG HTML, resulting in canonical HTML which exactly
+matches the Markdown + HTML examples in a standard `spec.txt` specification.
+
+Use the [`canonicalize-html.rb` script](#canonicalize-htmlrb-script) for this process.
+More explanation about this canonicalization process is provided in the sections below.
+
+NOTE:
+Some of the static or WYSIWYG HTML examples may not be representable as canonical
+HTML. (For example, when they are represented as an image.) In these cases, the Markdown
+conformance test for the example can be skipped by setting `skip_update_example_snapshots: true`
+for the example in `glfm_specification/input/gitlab_flavored_markdown/glfm_example_status.yml`.
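As a deliberately simplified illustration of the relationship between these HTML versions, consider one canonical example. The `data-sourcepos` attribute and the regex-based stripping below are assumptions used only to show the idea; the real canonicalization handles far more cases:

```ruby
# Conceptual sketch only: canonicalization as "strip the implementation extras".
def strip_gitlab_extras(html)
  # Assumed rule: drop data-* attributes added by the static (Ruby) renderer.
  html.gsub(/\s+data-[\w-]+="[^"]*"/, '')
end

static_html    = %(<p data-sourcepos="1:1-1:16">ID of the issue.</p>)
canonical_html = %(<p>ID of the issue.</p>)

raise 'not canonical' unless strip_gitlab_extras(static_html) == canonical_html
```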
+
+## Goals
+
+Given the constraints above, we have a few goals related to the GLFM
+specification and testing infrastructure:
+
+1. A canonical `spec.txt` exists, and represents the official specification for
+   GLFM, which meets these requirements:
+   1. The spec is a strict superset of the GitHub Flavored Markdown
+      (GFM) specification, just as
+      <abbr title="GitHub Flavored Markdown">GFM</abbr> is a strict superset
+      [of the CommonMark specification](https://github.github.com/gfm/#what-is-github-flavored-markdown-).
+      Therefore, it contains the superset of all canonical Markdown + HTML examples
+      for CommonMark, GFM, and GLFM.
+   1. It contains a prose introduction section which is specific to GitLab and GLFM.
+   1. It contains all other non-introduction sections verbatim from the
+      GFM `spec.txt`.
+   1. It contains a new extra section for the GLFM GitLab-specific extensions,
+      with both prose and examples describing the extensions.
+   1. It should be in the standard format which can be processed by the standard
+      CommonMark tool [`spec_tests.py`](https://github.com/github/cmark-gfm/blob/master/test/spec_tests.py),
+      which is a [script used to run the Markdown conformance tests](https://github.github.com/gfm/#about-this-document)
+      against all examples contained in a `spec.txt`.
+1. The GLFM parsers and HTML renderers for
+   both the static backend (Ruby) and WYSIWYG frontend (JavaScript) implementations
+   support _consistent_ rendering of all canonical Markdown + HTML examples in the
+   GLFM `spec.txt` specification, as verified by `spec_tests.py`.
+
+   NOTE:
+   Consistent does not mean that both of these implementations render
+   to the identical HTML. They each have different implementation-specific additions
+   to the HTML they render, therefore their rendered HTML is
+   ["canonicalized"](#canonicalization-of-html) to canonical HTML prior to running
+   the Markdown conformance tests.
+1. For _both_ the static backend (Ruby) and WYSIWYG frontend (JavaScript) implementations,
+   a set of example snapshots exists in the form of YAML files, which
+   correspond to every Markdown example in the GLFM `spec.txt`. These example snapshots
+   support the following usages for every GLFM Markdown example:
+   1. The backend (Ruby) parser and renderer can convert Markdown to the
+      expected custom static HTML.
+   1. The frontend (JavaScript) parser and renderer (which includes GitLab custom
+      code and Remark) can convert Markdown to the expected ProseMirror JSON
+      representing a ProseMirror document.
+   1. The **Content Editor** (which includes the frontend (JavaScript) parser and renderer,
+      and ProseMirror) can convert Markdown to the expected custom WYSIWYG HTML as rendered by ProseMirror.
+   1. The **Content Editor** can complete a round-trip test, which involves converting
+      from Markdown, to MDAST, to ProseMirror Document, then back to Markdown. It ensures
+      the resulting Markdown is exactly identical, with no differences.
+
+## Implementation
+
+The following set of scripts and files is complex. However, it allows us to meet
+all of the goals listed above, and is carefully designed to meet the following
+implementation goals:
+
+1. Minimize the amount of manual editing, curation, and maintenance of the GLFM specification
+   and related files.
+1. Automate and simplify the process of updating the GLFM specification and related
+   files when there are changes to the upstream CommonMark spec,
+   GFM extensions, or the GLFM extensions.
Support partial or incomplete implementations of the GLFM specification, whether + due to in-progress work, bugs, or future Markdown support, while still + exercising all functionality for the existing implementations. +1. Automate, simplify, and support running various tests, including the standard + CommonMark conformance tests and GLFM-implementation-specific unit/acceptance + Markdown snapshot tests. +1. Provide a rich set of extensible metadata around all GLFM specification examples + to support current and future requirements, such as automated acceptance + testing and automated documentation updates. + +The documentation on the implementation is split into three sections: + +1. [Scripts](#scripts). +1. [Specification files](#specification-files). +1. Example snapshot files: These YAML files are used as input data + or fixtures to drive the various tests, and are located under + `spec/fixtures/glfm/example_snapshots`. All example snapshot files are automatically + generated based on the specification files and the implementation of the parsers and renderers. + However, they can also be directly edited if necessary, such as to + test-drive an incomplete implementation. + +### Scripts + +These executable scripts perform various tasks related to maintaining +the specification and running tests. Each script has a shell-executable entry point +file located under `scripts/glfm`, but the actual implementation is in unit-tested +classes under `scripts/lib/glfm`. + +NOTE: +Some of these scripts are implemented in Ruby, and others are shell scripts. +Ruby scripts are used for more complex custom scripts, to enable easier unit testing +and debugging. Shell scripts are used for simpler scripts which primarily invoke +other shell commands, to avoid the challenges related to +[running other shell sub-processes](https://github.com/thewoolleyman/process_helper#why-yet-another-ruby-process-wrapper-library) +from Ruby scripts. + +NOTE: +The Ruby executable scripts under `scripts/glfm` have dashes instead of underscores +in the filenames. This naming is non-standard for a Ruby file, but is used to distinguish +them from the corresponding implementation class entry point files under +`scripts/lib/glfm` when searching by filename. + +#### `update-specification.rb` script + +The `scripts/glfm/update-specification.rb` script uses specification input files to +generate and update `spec.txt` (Markdown) and `spec.html` (HTML).
The `spec.html` is +generated by passing the generated (or updated) `spec.txt` Markdown to the backend API +for rendering to static HTML: + +```mermaid +graph LR +subgraph script: + A{update-specification.rb} + A --> B{Backend Markdown API} +end +subgraph input:<br/>input specification files + C[gfm_spec_v_0.29.txt] --> A + D[glfm_intro.txt] --> A + E[glfm_canonical_examples.txt] --> A +end +subgraph output:<br/>GLFM specification files + A --> F[spec.txt] + F --> B + B --> G[spec.html] +end +``` + +#### `update-example-snapshots.rb` script + +The `scripts/glfm/update-example-snapshots.rb` script uses input specification +files to update example snapshots: + +```mermaid +graph LR +subgraph script: + A{update-example-snapshots.rb} +end +subgraph input:<br/>input specification files + B[downloaded gfm_spec_v_0.29.txt] --> A + C[glfm_canonical_examples.txt] --> A + D[glfm_example_status.yml] --> A +end +subgraph output:<br/>example snapshot files + A --> E[examples_index.yml] + A --> F[markdown.yml] + A --> G[html.yml] + A --> H[prosemirror_json.yml] +end +``` + +#### `run-snapshot-tests.sh` script + +The `scripts/glfm/run-snapshot-tests.sh` convenience shell script runs all relevant +Markdown snapshot testing RSpec and Jest `*_spec` files (from the main app `spec` folder) +which are driven by `example_snapshot` YAML files. + +The actual RSpec and Jest test `*_spec` files (frontend and backend) live +under the normal relevant locations under `spec`, matching the location of their +corresponding implementations. They can be run either: + +- As part of the normal pipelines. +- From the command line or an IDE, just like any other file under `spec`. + +However, they are spread across four different locations: + +- Backend tests under `spec/requests`. +- Backend EE tests under `ee/spec/requests`. +- Frontend tests under `spec/frontend`. +- Frontend EE tests under `ee/spec/frontend`. + +Therefore, this convenience script is intended only for use in local +development. It simplifies running all tests at once and returning a single return +code. It contains only shell scripting commands for the relevant +`bundle exec rspec ...` and `yarn jest ...` commands. + +```mermaid +graph LR +subgraph script: + A{run-snapshot-tests.sh} --> B + B[relevant rspec/jest test files] +end +subgraph input:<br/>YAML + C[examples_index.yml] --> B + D[markdown.yml] --> B + E[html.yml] --> B + F[prosemirror_json.yml] --> B +end +subgraph output:<br/>test results/output + B --> G[rspec/jest output] +end +``` + +#### `canonicalize-html.rb` script + +The `scripts/glfm/canonicalize-html.rb` script handles the +["canonicalization" of HTML](#canonicalization-of-html). It is a pipe-through +helper script which takes as input a static or WYSIWYG HTML string containing +extra HTML, and outputs a canonical HTML string. + +It is implemented as a standalone, modular, single-purpose script, based on the +[Unix philosophy](https://en.wikipedia.org/wiki/Unix_philosophy#:~:text=The%20Unix%20philosophy%20emphasizes%20building,developers%20other%20than%20its%20creators.). +It's easy to use when running the standard CommonMark `spec_tests.py` +script, which expects canonical HTML, against the GitLab renderer implementations.
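+
+As a rough sketch of the pipe-through shape (the helper logic here is an
+assumption, not the actual script), such a script reads HTML on standard input
+and writes canonical HTML to standard output:
+
+```ruby
+#!/usr/bin/env ruby
+# Hypothetical sketch of a pipe-through script, not the actual implementation.
+
+# Placeholder for the real canonicalization logic, which strips the extra
+# styling and behavioral HTML.
+def strip_extra_html(html)
+  html
+end
+
+puts strip_extra_html(ARGF.read)
+```
+
+This shape composes easily with other commands, for example:
+`cat static.html | scripts/glfm/canonicalize-html.rb`.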
+ +#### `run-spec-tests.sh` script + +`scripts/glfm/run-spec-tests.sh` is a convenience shell script which runs +conformance specs via the CommonMark standard `spec_tests.py` script, +which uses the `glfm_specification/output/spec.txt` file and `scripts/glfm/canonicalize-html.rb` +helper script to test the GLFM renderer implementations' support for rendering Markdown +specification examples to canonical HTML. + +```mermaid +graph LR +subgraph scripts: + A{run-spec-tests.sh} --> C + subgraph specification testing process + B[canonicalize-html.sh] --> C + C[spec_tests.py] + end +end +subgraph input + D[spec.txt GLFM specification] --> C + E((GLFM static<br/>renderer implementation)) --> B + F((GLFM WYSIWYG<br/>renderer implementation)) --> B +end +subgraph output:<br/>test results/output + C --> G[spec_tests.py output] +end +``` + +### Specification files + +These files represent the GLFM specification itself. They are all +located under the root `glfm_specification`, and are further divided into two +subfolders: + +- `input`: Contains files which are imported or manually edited. +- `output`: Contains files which are automatically generated. + +#### Input specification files + +The `glfm_specification/input` directory contains files which are the original +input to drive all other automated GLFM specification scripts/processes/tests. +They are either downloaded, as in the case of the +GFM `spec.txt` file, or manually +updated, as in the case of all GLFM files. + +- `glfm_specification/input/github_flavored_markdown/gfm_spec_v_0.29.txt` - + The latest official [GFM spec.txt](https://github.com/github/cmark-gfm/blob/master/test/spec.txt), + automatically downloaded and updated by the `update-specification.rb` script. +- `glfm_specification/input/gitlab_flavored_markdown/glfm_intro.txt` - + Manually updated text of the intro section for the generated GLFM `spec.txt`. + - Replaces the GFM version of the introductory + section in `spec.txt`. +- `glfm_specification/input/gitlab_flavored_markdown/glfm_canonical_examples.txt` - + Manually updated canonical Markdown+HTML examples for GLFM extensions. + - Standard backtick-delimited `spec.txt` examples format with Markdown + canonical HTML. + - Inserted as a new section before the appendix of the generated `spec.txt`. +- `glfm_specification/input/gitlab_flavored_markdown/glfm_example_status.yml` - + Manually updated status of automatic generation of files based on Markdown + examples. + - Allows example snapshot generation, Markdown conformance tests, or + Markdown snapshot tests to be skipped for individual examples. For example, if + they are unimplemented, broken, or cannot be tested for some reason. + +`glfm_specification/input/gitlab_flavored_markdown/glfm_example_status.yml` sample entry: + +```yaml +07_99_an_example_with_incomplete_wysiwyg_implementation_1: + skip_update_example_snapshots: true + skip_running_snapshot_static_html_tests: false + skip_running_snapshot_wysiwyg_html_tests: true + skip_running_snapshot_prosemirror_json_tests: true + skip_running_conformance_static_tests: false + skip_running_conformance_wysiwyg_tests: true +``` + +#### Output specification files + +The `glfm_specification/output` directory contains the CommonMark standard format +`spec.txt` file which represents the canonical GLFM specification, generated +by the `update-specification.rb` script. It also contains the rendered `spec.html` +and `spec.pdf`, which are generated with the `spec.txt` as input.
+ +- `glfm_specification/output/spec.txt` - A Markdown file, in the standard format + with prose and Markdown + canonical HTML examples, generated (or updated) by the + `update-specification.rb` script. +- `glfm_specification/output/spec.html` - An HTML file, rendered based on `spec.txt`, + also generated (or updated) by the `update-specification.rb` script at the same time as + `spec.txt`. It corresponds to the HTML-rendered versions of the + "GitHub Flavored Markdown" (<abbr title="GitHub Flavored Markdown">GFM</abbr>) + [specification](https://github.github.com/gfm/) + and the [CommonMark specification](https://spec.commonmark.org/0.30/). + +These output `spec.**` files, which represent the official, canonical GLFM specification, +are colocated under the same parent folder `glfm_specification` with the other +`input` specification files. They're located here both for convenience and because they are all +a mix of manually edited and generated files. In GFM, +`spec.txt` is [located in the test dir](https://github.com/github/cmark-gfm/blob/master/test/spec.txt), +and in CommonMark it's located +[in the project root](https://github.com/commonmark/commonmark-spec/blob/master/spec.txt). +No precedent exists for a standard location. In the future, we may decide to +move or copy a hosted version of the rendered HTML `spec.html` to another location or site. + +### Example snapshot files + +The `example_snapshots` directory contains files which are generated by the +`update-example-snapshots.rb` script based on the files in the +`glfm_specification/input` directory. They are used as fixtures to drive the +various Markdown snapshot tests. + +After the entire GLFM implementation is complete for both backend (Ruby) and +frontend (JavaScript), all of these YAML files can be automatically generated. +However, while the implementations are still in progress, the `skip_update_example_snapshots` +key in `glfm_specification/input/gitlab_flavored_markdown/glfm_example_status.yml` +can be used to disable automatic generation of some examples, and they can instead +be manually edited as necessary to help drive the implementations. + +#### `spec/fixtures/glfm/example_snapshots/examples_index.yml` + +`spec/fixtures/glfm/example_snapshots/examples_index.yml` is the main list of all +CommonMark, GFM, and GLFM example names, each with a unique canonical name. + +- It is generated from the hierarchical sections and examples in the + GFM `spec.txt` specification. +- For CommonMark and GFM examples, + these sections originally came from the GFM `spec.txt`. +- For GLFM examples, it is generated from `glfm_canonical_examples.txt`, which is + the additional Section 7 in the GLFM `spec.txt`. +- It also contains extra metadata about each example, such as: + 1. `spec_txt_example_position` - The position of the example in the generated GLFM `spec.txt` file. + 1. `source_specification` - Which specification the example originally came from: + `commonmark`, `github`, or `gitlab`. +- The naming convention for example entry names is based on nested header section + names and example index within the header. + - This naming convention should result in fairly stable names and example positions. + The CommonMark / GLFM specification rarely changes, and when multiple GLFM + examples exist for the same Section 7 subsection, new examples are + added to the end of the subsection.
+ +`spec/fixtures/glfm/example_snapshots/examples_index.yml` sample entries: + +```yaml +02_01_preliminaries_characters_and_lines_1: + spec_txt_example_position: 1 + source_specification: commonmark +03_01_blocks_and_inlines_precedence_1: + spec_txt_example_position: 12 + source_specification: commonmark +05_03_container_blocks_task_list_items_1: + spec_txt_example_position: 279 + source_specification: github +06_04_inlines_emphasis_and_strong_emphasis_1: + spec_txt_example_position: 360 + source_specification: github +07_01_audio_link_1: + spec_txt_example_position: 301 + source_specification: gitlab +``` + +#### `spec/fixtures/glfm/example_snapshots/markdown.yml` + +`spec/fixtures/glfm/example_snapshots/markdown.yml` contains the original Markdown +for each entry in `spec/fixtures/glfm/example_snapshots/examples_index.yml`. + +- For CommonMark and GFM Markdown, + it is generated (or updated) from the standard GFM + `spec.txt` using the `update-example-snapshots.rb` script. +- For GLFM, it is generated (or updated) from the + `glfm_specification/input/gitlab_flavored_markdown/glfm_canonical_examples.txt` + input specification file. + +`spec/fixtures/glfm/example_snapshots/markdown.yml` sample entry: + +```yaml +06_04_inlines_emphasis_and_strong_emphasis_1: |- + *foo bar* +``` + +#### `spec/fixtures/glfm/example_snapshots/html.yml` + +`spec/fixtures/glfm/example_snapshots/html.yml` contains the HTML for each entry in +`spec/fixtures/glfm/example_snapshots/examples_index.yml`. + +Three types of entries exist, with different HTML for each: + +- **Canonical** + - The ["Canonical"](#canonicalization-of-html) HTML. + - For CommonMark and GFM examples, the HTML comes from the examples in `spec.txt`. + - For GLFM examples, it is generated/updated from + `glfm_specification/input/gitlab_flavored_markdown/glfm_canonical_examples.txt`. +- **Static** + - This is the static (backend (Ruby)-generated) HTML for each entry in + `spec/fixtures/glfm/example_snapshots/examples_index.yml`. + - It is generated/updated from the backend [Markdown API](../../../api/markdown.md) + (or the underlying internal classes) via the `update-example-snapshots.rb` script, + but can be manually updated for static examples with incomplete implementations. +- **WYSIWYG** + - The WYSIWYG (frontend, JavaScript-generated) HTML for each entry in + `spec/fixtures/glfm/example_snapshots/examples_index.yml`. + - It is generated (or updated) from the frontend Content Editor implementation via the + `update-example-snapshots.rb` script. It can be manually updated for WYSIWYG + examples with incomplete implementations. + +Any exceptions or failures which occur when generating HTML are replaced with an +`Error - check implementation` value. + +`spec/fixtures/glfm/example_snapshots/html.yml` sample entry: + +```yaml +06_04_inlines_emphasis_and_strong_emphasis_1: + canonical: |- + <p><em>foo bar</em></p> + static: |- + <p data-sourcepos="1:1-1:9" dir="auto"><strong>foo bar</strong></p> + wysiwyg: |- + <p><strong>foo bar</strong></p> +``` + +NOTE: +The actual `static` or `WYSIWYG` entries may differ from the example `html.yml`, +depending on how the implementations evolve.
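+
+To illustrate how these fixtures might drive a snapshot assertion, here is a
+minimal sketch. The `render_static_html` helper is hypothetical; the real tests
+live under the relevant `spec` and `ee/spec` locations:
+
+```ruby
+# Hypothetical sketch only, not the actual snapshot test implementation.
+require 'yaml'
+
+fixtures_dir = 'spec/fixtures/glfm/example_snapshots'
+examples = YAML.safe_load(File.read("#{fixtures_dir}/examples_index.yml"))
+markdown = YAML.safe_load(File.read("#{fixtures_dir}/markdown.yml"))
+html = YAML.safe_load(File.read("#{fixtures_dir}/html.yml"))
+
+examples.each_key do |name|
+  actual = render_static_html(markdown.fetch(name)) # hypothetical helper
+  expected = html.dig(name, 'static')
+  raise "Static HTML mismatch for #{name}" unless actual == expected
+end
+```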
+ +#### `spec/fixtures/glfm/example_snapshots/prosemirror_json.yml` + +`spec/fixtures/glfm/example_snapshots/prosemirror_json.yml` contains the ProseMirror +JSON for each entry in `spec/fixtures/glfm/example_snapshots/examples_index.yml`. + +- It is generated (or updated) from the frontend code via the `update-example-snapshots.rb` + script, but can be manually updated for examples with incomplete implementations. +- Any exceptions or failures when generating are replaced with an `Error - check implementation` value. + +`spec/fixtures/glfm/example_snapshots/prosemirror_json.yml` sample entry: + +```yaml +06_04_inlines_emphasis_and_strong_emphasis_1: |- + { + "type": "doc", + "content": [ + { + "type": "paragraph", + "content": [ + { + "type": "text", + "marks": [ + { + "type": "bold" + } + ], + "text": "foo bar" + } + ] + } + ] + } +``` diff --git a/doc/development/go_guide/go_upgrade.md b/doc/development/go_guide/go_upgrade.md index a99253b9723..3267d1262f0 100644 --- a/doc/development/go_guide/go_upgrade.md +++ b/doc/development/go_guide/go_upgrade.md @@ -76,9 +76,27 @@ if you need help finding the correct person or labels: 1. Create the epic in `gitlab-org` group: - Title the epic `Update Go version to <VERSION_NUMBER>`. - Ping the engineering managers responsible for [the projects listed below](#known-dependencies-using-go). + - Most engineering managers can be identified on + [the product page](https://about.gitlab.com/handbook/product/categories/) or the + [feature page](https://about.gitlab.com/handbook/product/categories/features/). + - If you still can't find the engineering manager, use + [Git blame](/ee/user/project/repository/git_blame.md) to identify a maintainer + involved in the project. + +1. Create an upgrade issue for each dependency in the + [location indicated below](#known-dependencies-using-go) titled + `Support building with Go <VERSION_NUMBER>`. Add the proper labels to each issue + for easier triage. These should include the stage, group, and section. + - The issue should be assigned by a member of the maintaining group. + - The milestone should be assigned by a member of the maintaining group. -1. Create an upgrade issue for each dependency in the [location indicated below](#known-dependencies-using-go) - titled `Support building with Go <VERSION_NUMBER>`. Add the proper label to each issue for easier triage. + NOTE: + Some overlap exists between project dependencies. When creating an issue for a + dependency that is part of a larger product, note the relationship in the issue + body. For example: Projects built in the context of Omnibus GitLab have their + runtime Go version managed by Omnibus, but "support" and compatibility should + be a concern of the individual project. Issues for the parent project's + dependencies should be about adding support for the updated Go version. NOTE: The upgrade issues must include [upgrade validation items](#upgrade-validation) @@ -94,9 +112,10 @@ if you need help finding the correct person or labels: - [Composition Analysis tracker](https://gitlab.com/gitlab-org/gitlab/-/issues). - [Container Security tracker](https://gitlab.com/gitlab-org/gitlab/-/issues). - NOTE: - Updates to these Security analyzers should not block upgrades to Charts or Omnibus since - the analyzers are built independently as separate container images. + NOTE: + Updates to these Security analyzers should not block upgrades to Charts or Omnibus since + the analyzers are built independently as separate container images. + 1. 
Schedule builder updates with Distribution projects: - Dependency and GitLab Development Kit issues created in previous steps should be set as blockers. - Each issue should have the title `Support building with Go <VERSION_NUMBER>` and description as noted: diff --git a/doc/development/i18n/externalization.md b/doc/development/i18n/externalization.md index 65cf8911e12..2aea15de443 100644 --- a/doc/development/i18n/externalization.md +++ b/doc/development/i18n/externalization.md @@ -786,7 +786,7 @@ The reasoning behind this is that in some languages words change depending on co in Japanese は is added to the subject of a sentence and を to the object. This is impossible to translate correctly if you extract individual words from the sentence. -When in doubt, try to follow the best practices described in this [Mozilla Developer documentation](https://developer.mozilla.org/en-US/docs/Mozilla/Localization/Localization_content_best_practices#Splitting). +When in doubt, try to follow the best practices described in this [Mozilla Developer documentation](https://mozilla-l10n.github.io/documentation/localization/dev_best_practices.html#splitting-and-composing-sentences). ### Always pass string literals to the translation helpers diff --git a/doc/development/i18n/proofreader.md b/doc/development/i18n/proofreader.md index 7c9777527ef..afc04045763 100644 --- a/doc/development/i18n/proofreader.md +++ b/doc/development/i18n/proofreader.md @@ -20,7 +20,7 @@ are very appreciative of the work done by translators and proofreaders! - Arabic - Proofreaders needed. - Bosnian - - Proofreaders needed. + - Haris Delalić - [GitLab](https://gitlab.com/haris.delalic), [Crowdin](https://crowdin.com/profile/haris.delalic) - Bulgarian - Lyubomir Vasilev - [Crowdin](https://crowdin.com/profile/lyubomirv) - Catalan @@ -38,7 +38,7 @@ are very appreciative of the work done by translators and proofreaders! - Victor Wu - [GitLab](https://gitlab.com/_victorwu_), [Crowdin](https://crowdin.com/profile/victorwu) - Ivan Ip - [GitLab](https://gitlab.com/lifehome), [Crowdin](https://crowdin.com/profile/lifehome) - Croatian - - Proofreaders needed. + - Haris Delalić - [GitLab](https://gitlab.com/haris.delalic), [Crowdin](https://crowdin.com/profile/haris.delalic) - Czech - Jan Urbanec - [GitLab](https://gitlab.com/TatranskyMedved), [Crowdin](https://crowdin.com/profile/Tatranskymedved) - Danish @@ -111,7 +111,7 @@ are very appreciative of the work done by translators and proofreaders! - Andrey Komarov - [GitLab](https://gitlab.com/elkamarado), [Crowdin](https://crowdin.com/profile/kamarado) - Iaroslav Postovalov - [GitLab](https://gitlab.com/CMDR_Tvis), [Crowdin](https://crowdin.com/profile/CMDR_Tvis) - Serbian (Latin and Cyrillic) - - Proofreaders needed. 
+ - Haris Delalić - [GitLab](https://gitlab.com/haris.delalic), [Crowdin](https://crowdin.com/profile/haris.delalic) - Sinhalese/Sinhala සිංහල - - හෙළබස (HelaBasa) - [GitLab](https://gitlab.com/helabasa), [Crowdin](https://crowdin.com/profile/helabasa) - Slovak diff --git a/doc/development/img/merge_request_reports_v14_7.png b/doc/development/img/merge_request_reports_v14_7.png Binary files differ index 282d6f96aa6..1c06e7f4fd0 100644 --- a/doc/development/img/merge_request_reports_v14_7.png +++ b/doc/development/img/merge_request_reports_v14_7.png diff --git a/doc/development/img/merge_widget_v14_7.png b/doc/development/img/merge_widget_v14_7.png Binary files differ index d5e8ed8df52..86bc11802d1 100644 --- a/doc/development/img/merge_widget_v14_7.png +++ b/doc/development/img/merge_widget_v14_7.png diff --git a/doc/development/index.md b/doc/development/index.md index 048112215fc..5c0cc7f9718 100644 --- a/doc/development/index.md +++ b/doc/development/index.md @@ -97,9 +97,8 @@ a given group, request an engineering review from one of the group's members. After the engineering review is complete, assign the MR to the [Technical Writer associated with the stage and group](https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments) in the modified documentation page's metadata. - -If you have questions or need further input, request a review from the -Technical Writer assigned to the [Development Guidelines](https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments-to-development-guidelines). +If the page is not assigned to a specific group, follow the +[Technical Writing review process for development guidelines](https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments-to-development-guidelines). #### Broader changes @@ -139,12 +138,13 @@ In these cases, use the following workflow: and approval from the VP of Development, the DRI for Development Guidelines, @clefelhocz1. -1. After all approvals are complete, review the page's metadata to - [find a Technical Writer](https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments) - who can help you merge the changes. - for final content review and merge. The Technical Writer may ask for - additional approvals as previously suggested before merging the MR. - +1. After all approvals are complete, assign the MR to the + [Technical Writer associated with the stage and group](https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments) + in the modified documentation page's metadata. + If the page is not assigned to a specific group, follow the + [Technical Writing review process for development guidelines](https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments-to-development-guidelines). + The Technical Writer may ask for additional approvals as previously suggested before merging the MR. + ### Reviewer values > [Introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/57293) in GitLab 14.1. diff --git a/doc/development/integrations/jira_connect.md b/doc/development/integrations/jira_connect.md index 5391b2c119e..26ef67c937c 100644 --- a/doc/development/integrations/jira_connect.md +++ b/doc/development/integrations/jira_connect.md @@ -79,7 +79,7 @@ If you use Gitpod and you get an error about Jira not being able to access the d > [Introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/81126) in GitLab 14.9 [with a flag](../../administration/feature_flags.md) named `jira_connect_oauth`.
Disabled by default. -GitLab for Jira users can authenticate with GitLab using GitLab OAuth. +GitLab for Jira users can authenticate with GitLab using GitLab OAuth. WARNING: This feature is not ready for production use. The feature flag should only be enabled in development. diff --git a/doc/development/integrations/secure.md b/doc/development/integrations/secure.md index 11fb06bd128..5f7cccdab64 100644 --- a/doc/development/integrations/secure.md +++ b/doc/development/integrations/secure.md @@ -327,6 +327,21 @@ You can find the schemas for these scanners here: - [Coverage Fuzzing](https://gitlab.com/gitlab-org/security-products/security-report-schemas/-/blob/master/dist/coverage-fuzzing-report-format.json) - [Secret Detection](https://gitlab.com/gitlab-org/security-products/security-report-schemas/-/blob/master/dist/secret-detection-report-format.json) +### Retention period for vulnerabilities + +GitLab has the following retention policies for vulnerabilities on non-default branches. Vulnerabilities are no longer available: + +- When the related CI job artifact expires. +- 90 days after the pipeline is created, even if the related CI job artifacts are locked. + +To view vulnerabilities, either: + +- Re-run the pipeline. +- Download the related CI job artifacts if they are available. + +NOTE: +This does not apply to vulnerabilities on the default branch. + ### Enable report validation > [Deprecated](https://gitlab.com/gitlab-org/gitlab/-/issues/354928) in GitLab 14.9, and planned for removal in GitLab 15.0. diff --git a/doc/development/internal_api/index.md b/doc/development/internal_api/index.md index ef58d6c2c44..cdbc674e0a5 100644 --- a/doc/development/internal_api/index.md +++ b/doc/development/internal_api/index.md @@ -621,7 +621,7 @@ Example response: "name":"premium", "trial":false, "auto_renew":null, - "upgradable":false + "upgradable":false, }, "usage": { "seats_in_subscription":10, @@ -672,7 +672,7 @@ Example response: "name":"premium", "trial":false, "auto_renew":null, - "upgradable":false + "upgradable":false, }, "usage": { "seats_in_subscription":80, @@ -711,7 +711,8 @@ Example response: "name":"premium", "trial":false, "auto_renew":null, - "upgradable":false + "upgradable":false, + "exclude_guests":false }, "usage": { "seats_in_subscription":80, diff --git a/doc/development/iterating_tables_in_batches.md b/doc/development/iterating_tables_in_batches.md index 38cdbdf5b79..8813fe560db 100644 --- a/doc/development/iterating_tables_in_batches.md +++ b/doc/development/iterating_tables_in_batches.md @@ -93,7 +93,7 @@ falling into an endless loop as described in following When dealing with data migrations, the preferred way to iterate over a large volume of data is using `EachBatch`. -A special case of data migration is a [background migration](background_migrations.md#scheduling) +A special case of data migration is a [background migration](database/background_migrations.md#scheduling) where the actual data modification is executed in a background job. The migration code that determines the data ranges (slices) and schedules the background jobs uses `each_batch`.
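+
+For instance, a scheduling migration might use `each_batch` along these lines.
+This is a minimal sketch with an illustrative worker name and batch size, not
+real migration code:
+
+```ruby
+# Hypothetical sketch of scheduling background jobs with each_batch.
+User.each_batch(of: 10_000) do |relation|
+  min_id = relation.minimum(:id)
+  max_id = relation.maximum(:id)
+  # ExampleBackfillWorker is an illustrative Sidekiq worker, not a real class.
+  ExampleBackfillWorker.perform_async(min_id, max_id)
+end
+```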
diff --git a/doc/development/licensed_feature_availability.md b/doc/development/licensed_feature_availability.md index 0de3f94cf70..6df5c2164e8 100644 --- a/doc/development/licensed_feature_availability.md +++ b/doc/development/licensed_feature_availability.md @@ -1,6 +1,6 @@ --- stage: Fulfillment -group: License +group: Provision info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments --- diff --git a/doc/development/merge_request_concepts/index.md b/doc/development/merge_request_concepts/index.md index f1dab69543a..90e8ff41368 100644 --- a/doc/development/merge_request_concepts/index.md +++ b/doc/development/merge_request_concepts/index.md @@ -49,7 +49,7 @@ When all of the required merge checks are satisfied a merge request becomes merg ### Approvals -Approval rules specify users that are required to or can optionally approve a merge request based on some kind of organizational policy. When approvals are required, they effectively become a required merge check. The key differentiator between merge checks and approval rules is that users **do** interact with approval rules, by deciding to approve the merge request. +Approval rules specify users that are required to or can optionally approve a merge request based on some kind of organizational policy. When approvals are required, they effectively become a required merge check. The key differentiator between merge checks and approval rules is that users **do** interact with approval rules, by deciding to approve the merge request. Additionally, approval settings provide configuration options to define how those approval rules are applied in a merge request. They can set limitations, add requirements, or modify approvals. @@ -58,5 +58,5 @@ Examples of approval rules and settings include: 1. [merge request approval rules](../../user/project/merge_requests/approvals/rules.md) 1. [code owner approvals](../../user/project/code_owners.md) 1. [security approvals](../../user/application_security/index.md#security-approvals-in-merge-requests) -1. [prevent editing approval rules](../../user/project/merge_requests/approvals/settings.md#prevent-editing-approval-rules-in-merge-requests)] +1. [prevent editing approval rules](../../user/project/merge_requests/approvals/settings.md#prevent-editing-approval-rules-in-merge-requests) 1. [remove all approvals when commits are added](../../user/project/merge_requests/approvals/settings.md#remove-all-approvals-when-commits-are-added-to-the-source-branch) diff --git a/doc/development/merge_request_performance_guidelines.md b/doc/development/merge_request_performance_guidelines.md index 40f02f4fb6f..fe8e730d64e 100644 --- a/doc/development/merge_request_performance_guidelines.md +++ b/doc/development/merge_request_performance_guidelines.md @@ -16,7 +16,7 @@ with and agreed upon by backend maintainers and performance specialists. 
It's also highly recommended that you read the following guides: - [Performance Guidelines](performance.md) -- [Avoiding downtime in migrations](avoiding_downtime_in_migrations.md) +- [Avoiding downtime in migrations](database/avoiding_downtime_in_migrations.md) ## Definition diff --git a/doc/development/migration_style_guide.md b/doc/development/migration_style_guide.md index d85b7372814..086e061452b 100644 --- a/doc/development/migration_style_guide.md +++ b/doc/development/migration_style_guide.md @@ -45,14 +45,14 @@ work it needs to perform and how long it takes to complete: One exception is a migration that takes longer but is absolutely critical for the application to operate correctly. For example, you might have indices that enforce unique tuples, or that are needed for query performance in critical parts of the application. In cases where the migration would be unacceptably slow, however, a better option might be to guard the feature with a [feature flag](feature_flags/index.md) and perform a post-deployment migration instead. The feature can then be turned on after the migration finishes. -1. [**Post-deployment migrations.**](post_deployment_migrations.md) These are Rails migrations in `db/post_migrate` and +1. [**Post-deployment migrations.**](database/post_deployment_migrations.md) These are Rails migrations in `db/post_migrate` and run _after_ new application code has been deployed (for GitLab.com after the production deployment has finished). They can be used for schema changes that aren't critical for the application to operate, or data migrations that take at most a few minutes. Common examples for schema changes that should run post-deploy include: - Clean-ups, like removing unused columns. - Adding non-critical indices on high-traffic tables. - Adding non-critical indices that take a long time to create. -1. [**Background migrations.**](background_migrations.md) These aren't regular Rails migrations, but application code that is +1. [**Background migrations.**](database/background_migrations.md) These aren't regular Rails migrations, but application code that is executed via Sidekiq jobs, although a post-deployment migration is used to schedule them. Use them only for data migrations that exceed the timing guidelines for post-deploy migrations. Background migrations should _not_ change the schema. 
@@ -129,13 +129,13 @@ TARGET=12-9-stable-ee scripts/regenerate-schema ## Avoiding downtime -The document ["Avoiding downtime in migrations"](avoiding_downtime_in_migrations.md) specifies +The document ["Avoiding downtime in migrations"](database/avoiding_downtime_in_migrations.md) specifies various database operations, such as: -- [dropping and renaming columns](avoiding_downtime_in_migrations.md#dropping-columns) -- [changing column constraints and types](avoiding_downtime_in_migrations.md#changing-column-constraints) -- [adding and dropping indexes, tables, and foreign keys](avoiding_downtime_in_migrations.md#adding-indexes) -- [migrating `integer` primary keys to `bigint`](avoiding_downtime_in_migrations.md#migrating-integer-primary-keys-to-bigint) +- [dropping and renaming columns](database/avoiding_downtime_in_migrations.md#dropping-columns) +- [changing column constraints and types](database/avoiding_downtime_in_migrations.md#changing-column-constraints) +- [adding and dropping indexes, tables, and foreign keys](database/avoiding_downtime_in_migrations.md#adding-indexes) +- [migrating `integer` primary keys to `bigint`](database/avoiding_downtime_in_migrations.md#migrating-integer-primary-keys-to-bigint) and explains how to perform them without requiring downtime. @@ -219,7 +219,7 @@ in that limit. Singular query timings should fit within the [standard limit](que In case you need to insert, update, or delete a significant amount of data, you: - Must disable the single transaction with `disable_ddl_transaction!`. -- Should consider doing it in a [Background Migration](background_migrations.md). +- Should consider doing it in a [Background Migration](database/background_migrations.md). ## Migration helpers and versioning @@ -240,7 +240,7 @@ of migration helpers. In this example, we use version 2.0 of the migration class: ```ruby -class TestMigration < Gitlab::Database::Migration[1.0] +class TestMigration < Gitlab::Database::Migration[2.0] def change end end @@ -253,7 +253,7 @@ version of migration helpers automatically. Migration helpers and versioning were [introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/68986) in GitLab 14.3. For merge requests targeting previous stable branches, use the old format and still inherit from -`ActiveRecord::Migration[6.1]` instead of `Gitlab::Database::Migration[1.0]`. +`ActiveRecord::Migration[6.1]` instead of `Gitlab::Database::Migration[2.0]`. ## Retry mechanism when acquiring database locks @@ -535,7 +535,7 @@ by calling the method `disable_ddl_transaction!` in the body of your migration class like so: ```ruby -class MyMigration < Gitlab::Database::Migration[1.0] +class MyMigration < Gitlab::Database::Migration[2.0] disable_ddl_transaction! INDEX_NAME = 'index_name' @@ -586,7 +586,7 @@ by calling the method `disable_ddl_transaction!` in the body of your migration class like so: ```ruby -class MyMigration < Gitlab::Database::Migration[1.0] +class MyMigration < Gitlab::Database::Migration[2.0] disable_ddl_transaction! INDEX_NAME = 'index_name' @@ -629,7 +629,7 @@ The easiest way to test for existence of an index by name is to use the be used with a name option. For example: ```ruby -class MyMigration < Gitlab::Database::Migration[1.0] +class MyMigration < Gitlab::Database::Migration[2.0] INDEX_NAME = 'index_name' def up @@ -664,7 +664,7 @@ Here's an example where we add a new column with a foreign key constraint. Note it includes `index: true` to create an index for it.
```ruby -class Migration < Gitlab::Database::Migration[1.0] +class Migration < Gitlab::Database::Migration[2.0] def change add_reference :model, :other_model, index: true, foreign_key: { on_delete: :cascade } @@ -710,7 +710,7 @@ expensive and disruptive operation for larger tables, but in reality it's not. Take the following migration as an example: ```ruby -class DefaultRequestAccessGroups < Gitlab::Database::Migration[1.0] +class DefaultRequestAccessGroups < Gitlab::Database::Migration[2.0] def change change_column_default(:namespaces, :request_access_enabled, from: false, to: true) end @@ -943,7 +943,7 @@ Rails 5 natively supports the `JSONB` (binary JSON) column type. Example migration adding this column: ```ruby -class AddOptionsToBuildMetadata < Gitlab::Database::Migration[1.0] +class AddOptionsToBuildMetadata < Gitlab::Database::Migration[2.0] def change add_column :ci_builds_metadata, :config_options, :jsonb end @@ -975,7 +975,7 @@ Do not store `attr_encrypted` attributes as `:text` in the database; use efficient: ```ruby -class AddSecretToSomething < Gitlab::Database::Migration[1.0] +class AddSecretToSomething < Gitlab::Database::Migration[2.0] def change add_column :something, :encrypted_secret, :binary add_column :something, :encrypted_secret_iv, :binary @@ -1033,8 +1033,8 @@ If you need more complex logic, you can define and use models local to a migration. For example: ```ruby -class MyMigration < Gitlab::Database::Migration[1.0] - class Project < ActiveRecord::Base +class MyMigration < Gitlab::Database::Migration[2.0] + class Project < MigrationRecord self.table_name = 'projects' end @@ -1114,7 +1114,7 @@ by an integer. For example: `users` would turn into `users0` ## Using models in migrations (discouraged) The use of models in migrations is generally discouraged. Because such models are -[contraindicated for background migrations](background_migrations.md#isolation), +[contraindicated for background migrations](database/background_migrations.md#isolation), the model needs to be declared in the migration. If using a model in the migrations, you should first @@ -1132,8 +1132,8 @@ in a previous migration. It is important not to leave out the `User.reset_column_information` command, in order to ensure that the old schema is dropped from the cache and ActiveRecord loads the updated schema information.
```ruby -class AddAndSeedMyColumn < Gitlab::Database::Migration[1.0] - class User < ActiveRecord::Base +class AddAndSeedMyColumn < Gitlab::Database::Migration[2.0] + class User < MigrationRecord self.table_name = 'users' end diff --git a/doc/development/new_fe_guide/modules/widget_extensions.md b/doc/development/new_fe_guide/modules/widget_extensions.md index d3cd839464d..638a0a2a85b 100644 --- a/doc/development/new_fe_guide/modules/widget_extensions.md +++ b/doc/development/new_fe_guide/modules/widget_extensions.md @@ -40,6 +40,7 @@ export default { summary(data) {}, // Required: Level 1 summary text statusIcon(data) {}, // Required: Level 1 status icon tertiaryButtons() {}, // Optional: Level 1 action buttons + shouldCollapse() {}, // Optional: Add logic to determine if the widget can expand or not }, methods: { fetchCollapsedData(props) {}, // Required: Fetches data required for collapsed state diff --git a/doc/development/pipelines.md b/doc/development/pipelines.md index 2aef0e10314..e0b236bc5fc 100644 --- a/doc/development/pipelines.md +++ b/doc/development/pipelines.md @@ -187,7 +187,7 @@ See the [experiment issue](https://gitlab.com/gitlab-org/quality/team-tasks/-/is #### Automatic retry of failing tests in a separate process -When the `$RETRY_FAILED_TESTS_IN_NEW_PROCESS` variable is set to `true`, RSpec tests that failed are automatically retried once in a separate +Unless the `$RETRY_FAILED_TESTS_IN_NEW_PROCESS` variable is set to `false` (it defaults to `true`), failed RSpec tests are automatically retried once in a separate RSpec process. The goal is to get rid of most side-effects from previous tests that may lead to a subsequent test failure. We keep track of retried tests in the `$RETRIED_TESTS_REPORT_FILE` file saved as an artifact by the `rspec:flaky-tests-report` job. diff --git a/doc/development/post_deployment_migrations.md b/doc/development/post_deployment_migrations.md index 6ab3620c197..c3922718e77 100644 --- a/doc/development/post_deployment_migrations.md +++ b/doc/development/post_deployment_migrations.md @@ -1,81 +1,11 @@ --- -stage: none -group: unassigned -info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments +redirect_to: 'database/post_deployment_migrations.md' +remove_date: '2022-07-08' --- -# Post Deployment Migrations +This document was moved to [another location](database/post_deployment_migrations.md). -Post deployment migrations are regular Rails migrations that can optionally be -executed after a deployment. By default these migrations are executed alongside -the other migrations. To skip these migrations you must set the -environment variable `SKIP_POST_DEPLOYMENT_MIGRATIONS` to a non-empty value -when running `rake db:migrate`. - -For example, this would run all migrations including any post deployment -migrations: - -```shell -bundle exec rake db:migrate -``` - -This however skips post deployment migrations: - -```shell -SKIP_POST_DEPLOYMENT_MIGRATIONS=true bundle exec rake db:migrate -``` - -## Deployment Integration - -Say you're using Chef for deploying new versions of GitLab and you'd like to run -post deployment migrations after deploying a new version. Let's assume you -normally use the command `chef-client` to do so.
To make use of this feature -you'd have to run this command as follows: - -```shell -SKIP_POST_DEPLOYMENT_MIGRATIONS=true sudo chef-client -``` - -Once all servers have been updated you can run `chef-client` again on a single -server _without_ the environment variable. - -The process is similar for other deployment techniques: first you would deploy -with the environment variable set, then you re-deploy a single -server but with the variable _unset_. - -## Creating Migrations - -To create a post deployment migration you can use the following Rails generator: - -```shell -bundle exec rails g post_deployment_migration migration_name_here -``` - -This generates the migration file in `db/post_migrate`. These migrations -behave exactly like regular Rails migrations. - -## Use Cases - -Post deployment migrations can be used to perform migrations that mutate state -that an existing version of GitLab depends on. For example, say you want to -remove a column from a table. This requires downtime as a GitLab instance -depends on this column being present while it's running. Normally you'd follow -these steps in such a case: - -1. Stop the GitLab instance -1. Run the migration removing the column -1. Start the GitLab instance again - -Using post deployment migrations we can instead follow these steps: - -1. Deploy a new version of GitLab while ignoring post deployment migrations -1. Re-run `rake db:migrate` but without the environment variable set - -Here we don't need any downtime as the migration takes place _after_ a new -version (which doesn't depend on the column anymore) has been deployed. - -Some other examples where these migrations are useful: - -- Cleaning up data generated due to a bug in GitLab -- Removing tables -- Migrating jobs from one Sidekiq queue to another +<!-- This redirect file can be deleted after <2022-07-08>. --> +<!-- Redirects that point to other docs in the same project expire in three months. --> +<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. --> +<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html --> diff --git a/doc/development/product_qualified_lead_guide/index.md b/doc/development/product_qualified_lead_guide/index.md index 6943f931d79..2395689ada2 100644 --- a/doc/development/product_qualified_lead_guide/index.md +++ b/doc/development/product_qualified_lead_guide/index.md @@ -10,6 +10,41 @@ The Product Qualified Lead (PQL) funnel connects our users with our team members A hand-raise PQL is a user who requests to speak to sales from within the product. +## Set up your development environment + +1. Set up GDK with a connection to your local CustomersDot instance. +1. Set up CustomersDot to talk to a staging instance of Platypus. + +1. Set up CustomersDot using the [normal install instructions](https://gitlab.com/gitlab-org/customers-gitlab-com/-/blob/staging/doc/setup/installation_steps.md). +1. Set the `CUSTOMER_PORTAL_URL` environment variable to the local (or ngrok) URL of your CustomersDot instance. +1. Place `export CUSTOMER_PORTAL_URL='https://XXX.ngrok.io/'` in your shell rc script (~/.zshrc or ~/.bash_profile or ~/.bashrc) and restart GDK. +1. Enter the Platypus credentials in your CustomersDot development `/config/secrets.yml` and restart. Credentials for Platypus Staging are in the 1Password Growth vault. The URL for staging is `https://staging.ci.nexus.gitlabenvironment.cloud`.
+ +```yaml + platypus_url: "<%= ENV['PLATYPUS_URL'] %>" + platypus_client_id: "<%= ENV['PLATYPUS_CLIENT_ID'] %>" + platypus_client_secret: "<%= ENV['PLATYPUS_CLIENT_SECRET'] %>" +``` + +### Set up lead monitoring + +1. Set up access for Platypus Staging `https://staging.ci.nexus.gitlabenvironment.cloud` using the Platypus Staging credentials in the 1Password Growth vault. +1. Set up access for the Marketo sandbox, similar [to this example request](https://gitlab.com/gitlab-com/team-member-epics/access-requests/-/issues/13162). + +### Manually test leads + +1. Register a new user with a unique email on your local GitLab instance. +1. Send the PQL lead by submitting your new form or creating a new trial or a new hand raise lead. +1. Use easily identifiable values that are easy to spot in Platypus staging. +1. Observe the entry in the staging instance of Platypus, then paste it into the merge request comment with a mention. + +## Troubleshooting + +- Check the application and Sidekiq logs on `gitlab.com` and CustomersDot to monitor leads. +- Check the `leads` table in CustomersDot. +- Set up staging credentials for Platypus, and track the leads on the [Platypus Dashboard](https://staging.ci.nexus.gitlabenvironment.cloud/admin/queues/queue/new-lead-queue). +- Ask for access to the Marketo Sandbox and validate the leads there, similar [to this example request](https://gitlab.com/gitlab-com/team-member-epics/access-requests/-/issues/13162). + ## Embed a hand-raise lead form [HandRaiseLeadButton](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/assets/javascripts/hand_raise_leads/hand_raise_lead/components/hand_raise_lead_button.vue) is a reusable component that adds a button and a hand-raise modal to any screen. @@ -92,7 +127,7 @@ The flow of a PQL lead is as follows: 1. Marketo does scoring and sends the form to Salesforce. 1. Our Sales team uses Salesforce to connect to the leads. -### Trial lead flow +### Trial lead flow #### Trial lead flow on GitLab.com @@ -131,7 +166,7 @@ sequenceDiagram HostedPlans|CreateTrialService->create_trial_history#: Creates a record in trial_histories table ``` -### Hand raise lead flow +### Hand raise lead flow #### Hand raise flow on GitLab.com @@ -161,11 +196,4 @@ sequenceDiagram Platypus->>Workato: [lead] Workato->>Marketo: [lead] Marketo->>Salesforce(SFDC): [lead] -``` - -## Monitor and manually test leads - -- Check the application and Sidekiq logs on `gitlab.com` and CustomersDot to monitor leads. -- Check the `leads` table in CustomersDot. -- Set up staging credentials for Platypus, and track the leads on the [Platypus Dashboard](https://staging.ci.nexus.gitlabenvironment.cloud/admin/queues/queue/new-lead-queue). -- Ask for access to the Marketo Sandbox and validate the leads there. +``` diff --git a/doc/development/project_templates.md b/doc/development/project_templates.md new file mode 100644 index 00000000000..74ded9c93fc --- /dev/null +++ b/doc/development/project_templates.md @@ -0,0 +1,157 @@ +--- +stage: Manage +group: Workspace +info: "To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments" +--- + +# Contribute to GitLab project templates + +Thanks for considering a contribution to the GitLab +[built-in project templates](../user/project/working_with_projects.md#create-a-project-from-a-built-in-template).
+ +## Prerequisites + +To add a new template or update an existing one, you must have the following tools +installed: + +- `wget` +- `tar` +- `jq` + +## Create a new project + +To contribute a new built-in project template to be distributed with GitLab: + +1. Create a new public project with the project content you'd like to contribute + in a namespace of your choosing. You can [view a working example](https://gitlab.com/gitlab-org/project-templates/dotnetcore). + Projects should be as simple as possible and free of any unnecessary assets or dependencies. +1. When the project is ready for review, [create a new issue](https://gitlab.com/gitlab-org/gitlab/issues) with a link to your project. + In your issue, `@` mention the relevant Backend Engineering Manager and Product + Manager for the [Templates feature](https://about.gitlab.com/handbook/product/categories/#source-code-group). + +## Add the SVG icon to GitLab SVGs + +If the template you're adding has an SVG icon, you need to first add it to +<https://gitlab.com/gitlab-org/gitlab-svgs>: + +1. Follow the steps outlined in the + [GitLab SVGs project](https://gitlab.com/gitlab-org/gitlab-svgs/-/blob/main/README.md#adding-icons-or-illustrations) + and submit a merge request. +1. When the merge request is merged, `gitlab-bot` pulls the new changes into + the `gitlab-org/gitlab` project. +1. You can now continue with the vendoring process. + +## Vendoring process + +To make the project template available when creating a new project, the vendoring +process must be completed: + +1. [Export the project](../user/project/settings/import_export.md#export-a-project-and-its-data) + you created in the previous step and save the file as `<name>.tar.gz`, where + `<name>` is the short name of the project. +1. Edit the following files to include the project template. Two types of built-in + templates are available within GitLab: + - **Normal templates**: Available in GitLab Free and above (this is the most common type of built-in template). + See MR [!25318](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/25318) for an example. + + To add a normal template: + + 1. Open `lib/gitlab/project_template.rb` and add details of the template + in the `localized_templates_table` method. In the following example, + the short name of the project is `hugo`: + + ```ruby + ProjectTemplate.new('hugo', 'Pages/Hugo', _('Everything you need to create a GitLab Pages site using Hugo'), 'https://gitlab.com/pages/hugo', 'illustrations/logos/hugo.svg'), + ``` + + If the vendored project doesn't have an SVG icon, omit `, 'illustrations/logos/hugo.svg'`. + + 1. Open `spec/lib/gitlab/project_template_spec.rb` and add the short name + of the template in the `.all` test. + 1. Open `app/assets/javascripts/projects/default_project_templates.js` and + add details of the template. For example: + + ```javascript + hugo: { + text: s__('ProjectTemplates|Pages/Hugo'), + icon: '.template-option .icon-hugo', + }, + ``` + + If the vendored project doesn't have an SVG icon, use `.icon-gitlab_logo` + instead. + + - **Enterprise templates**: Introduced in GitLab 12.10, these are available only in GitLab Premium and above. + See MR [!28187](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/28187) for an example. + + To add an Enterprise template: + + 1. Open `ee/lib/ee/gitlab/project_template.rb` and add details of the template + in the `localized_ee_templates_table` method.
For example: + + ```ruby + ::Gitlab::ProjectTemplate.new('hipaa_audit_protocol', 'HIPAA Audit Protocol', _('A project containing issues for each audit inquiry in the HIPAA Audit Protocol published by the U.S. Department of Health & Human Services'), 'https://gitlab.com/gitlab-org/project-templates/hipaa-audit-protocol', 'illustrations/logos/asklepian.svg') + ``` + + 1. Open `ee/spec/lib/gitlab/project_template_spec.rb` and add the short name + of the template in the `.all` test. + 1. Open `ee/app/assets/javascripts/projects/default_project_templates.js` and + add details of the template. For example: + + ```javascript + hipaa_audit_protocol: { + text: s__('ProjectTemplates|HIPAA Audit Protocol'), + icon: '.template-option .icon-hipaa_audit_protocol', + }, + ``` + +1. Run the `vendor_template` script. Make sure to pass the correct arguments: + + ```shell + scripts/vendor_template <git_repo_url> <name> <comment> + ``` + +1. Regenerate `gitlab.pot`: + + ```shell + bin/rake gettext:regenerate + ``` + +1. By now, there should be one new file under `vendor/project_templates/` and + 4 changed files. Commit all of them in a new branch and create a merge + request. + +## Test with GDK + +If you are using the GitLab Development Kit (GDK), you must disable `praefect` +and regenerate the Procfile, as the Rake task is not currently compatible with it: + +```yaml +# gitlab-development-kit/gdk.yml +praefect: + enabled: false +``` + +1. Follow the steps described in the [vendoring process](#vendoring-process). +1. Run the following Rake task where `<path>/<name>` is the + name you gave the template in `lib/gitlab/project_template.rb`: + + ```shell + bin/rake gitlab:update_project_templates[<path>/<name>] + ``` + +You can now test creating a new project by importing the new template in GDK. + +## Contribute an improvement to an existing template + +Existing templates are imported from the following groups: + +- [`project-templates`](https://gitlab.com/gitlab-org/project-templates) +- [`pages`](https://gitlab.com/pages) + +To contribute a change, open a merge request in the relevant project +and mention `@gitlab-org/manage/import/backend` when you are ready for a review. + +Then, if your merge request is accepted, either [open an issue](https://gitlab.com/gitlab-org/gitlab/-/issues) +to ask for the template to be updated, or open a merge request updating +the [vendored template](#vendoring-process). diff --git a/doc/development/pry_debugging.md b/doc/development/pry_debugging.md index 5481da348e8..6751559b2ef 100644 --- a/doc/development/pry_debugging.md +++ b/doc/development/pry_debugging.md @@ -17,7 +17,11 @@ You can then connect to this session by using the [pry-shell](https://github.com You can watch [this video](https://www.youtube.com/watch?v=Lzs_PL_BySo), for more information about how to use the `pry-shell`. -## `byebug` vs `binding.pry` +WARNING: +`binding.pry` can occasionally experience autoloading issues and fail during name resolution. +If needed, `binding.irb` can be used instead with a more limited feature set. + +## `byebug` vs `binding.pry` vs `binding.irb` `byebug` has a very similar interface to `gdb`, but `byebug` does not use the powerful Pry REPL. @@ -41,6 +45,12 @@ this document, so for the full documentation head over to the [Pry wiki](https:/ Below are a few features definitely worth checking out. Also run `help` in a pry session to see what else you can do. +## `binding.irb` + +As of Ruby 2.7, IRB ships with a simple interactive debugger.
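+
+For example, a minimal sketch with an illustrative method:
+
+```ruby
+def fetch_user(id)
+  user = User.find(id)
+  binding.irb # Execution stops here and opens an IRB session; type `exit` to continue.
+  user
+end
+```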
+
+Check out [the docs](https://ruby-doc.org/stdlib-2.7.0/libdoc/irb/rdoc/Binding.html) for more information.
+
### State navigation
 
With the [state navigation](https://github.com/pry/pry/wiki/State-navigation)
diff --git a/doc/development/query_recorder.md b/doc/development/query_recorder.md
index 424c089f88e..17f2fecc1bc 100644
--- a/doc/development/query_recorder.md
+++ b/doc/development/query_recorder.md
@@ -1,6 +1,6 @@
---
-stage: none
-group: unassigned
+stage: Enablement
+group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
diff --git a/doc/development/scalability.md b/doc/development/scalability.md
index fe7063be0e8..4450df0399d 100644
--- a/doc/development/scalability.md
+++ b/doc/development/scalability.md
@@ -36,7 +36,7 @@ application starts, Rails queries the database schema, caching the tables and
column types for the data requested. Because of this schema cache, dropping a
column or table while the application is running can produce 500 errors to the
user. This is why we have a [process for dropping columns and other
-no-downtime changes](avoiding_downtime_in_migrations.md).
+no-downtime changes](database/avoiding_downtime_in_migrations.md).
 
#### Multi-tenancy
 
diff --git a/doc/development/secure_coding_guidelines.md b/doc/development/secure_coding_guidelines.md
index 10f6c22e54a..8a86a46d1d3 100644
--- a/doc/development/secure_coding_guidelines.md
+++ b/doc/development/secure_coding_guidelines.md
@@ -203,7 +203,7 @@ Go's [`regexp`](https://pkg.go.dev/regexp) package uses `re2` and isn't vulnerab
 
### Description
 
-A [Server-side Request Forgery (SSRF)](https://www.hackerone.com/blog-How-To-Server-Side-Request-Forgery-SSRF) is an attack in which an attacker
+A [Server-side Request Forgery (SSRF)](https://www.hackerone.com/application-security/how-server-side-request-forgery-ssrf) is an attack in which an attacker
is able to coerce an application into making an outbound request to an unintended
resource. This resource is usually internal. In GitLab, the connection most
commonly uses HTTP, but an SSRF can be performed with any protocol, such as
@@ -1165,7 +1165,7 @@ func printZipContents(src string) error {
 
## Time of check to time of use bugs
 
Time of check to time of use, or TOCTOU, is a class of errors that occur when the state of something changes unexpectedly partway during a process.
-More specifically, it's when the property you checked and validated has changed when you finally get around to using that property.
+More specifically, it's when the property you checked and validated has changed when you finally get around to using that property.
 
These types of bugs are often seen in environments which allow multi-threading and concurrency, like filesystems and distributed web applications; these are a type of race condition. TOCTOU also occurs when state is checked and stored, then after a period of time that state is relied on without re-checking its accuracy and/or validity.
 
@@ -1179,7 +1179,7 @@ GitLab-specific example can be found in [this issue](https://gitlab.com/gitlab-o
 
**Example 3:** you need to fetch a remote file, and perform a `HEAD` request to get and validate the content length and content type. When you subsequently make a `GET` request, though, the file delivered is a different size or different file type. (This is stretching the definition of TOCTOU, but things _have_ changed between time of check and time of use).
-**Example 4:** you allow users to upvote a comment if they haven't already. The server is multi-threaded, and you aren't using transactions or an applicable database index. By repeatedly clicking upvote in quick succession a malicious user is able to add multiple upvotes: the requests arrive at the same time, the checks run in parallel and confirm that no upvote exists yet, and so each upvote is written to the database.
+**Example 4:** you allow users to upvote a comment if they haven't already. The server is multi-threaded, and you aren't using transactions or an applicable database index. By repeatedly clicking upvote in quick succession a malicious user is able to add multiple upvotes: the requests arrive at the same time, the checks run in parallel and confirm that no upvote exists yet, and so each upvote is written to the database.
 
Here's some pseudocode showing an example of a potential TOCTOU bug:
diff --git a/doc/development/service_ping/implement.md b/doc/development/service_ping/implement.md
index 25e841e113b..ca4a0158051 100644
--- a/doc/development/service_ping/implement.md
+++ b/doc/development/service_ping/implement.md
@@ -760,7 +760,7 @@ To set up Service Ping locally, you must:
 
1. Clone and start [Versions Application](https://gitlab.com/gitlab-services/version-gitlab-com).
   Make sure you run `docker-compose up` to start a PostgreSQL and Redis instance.
1. Point GitLab to the Versions Application endpoint instead of the default endpoint:
-   1. Open [service_ping/submit_service.rb](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/services/service_ping/submit_service.rb#L5) in your local and modified `PRODUCTION_URL`.
+   1. Open [service_ping/submit_service.rb](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/services/service_ping/submit_service.rb#L5) locally and modify `STAGING_BASE_URL`.
   1. Set it to the local Versions Application URL: `http://localhost:3000/usage_data`.
 
### Test local setup
diff --git a/doc/development/service_ping/index.md b/doc/development/service_ping/index.md
index 6878fd1bf28..14bb90537e7 100644
--- a/doc/development/service_ping/index.md
+++ b/doc/development/service_ping/index.md
@@ -48,7 +48,7 @@ make better product decisions.
 
There are several other benefits to enabling Service Ping:
 
- As a benefit of having Service Ping active, GitLab lets you analyze your users' activities over time in your GitLab installation.
-- As a benefit of having Service Ping active, GitLab provides you with [DevOps Score](../../user/admin_area/analytics/dev_ops_report.md#devops-score), which gives you an overview of your entire instance's adoption of Concurrent DevOps from planning to monitoring.
+- As a benefit of having Service Ping active, GitLab provides you with [DevOps Score](../../user/admin_area/analytics/dev_ops_reports.md#devops-score), which gives you an overview of your entire instance's adoption of Concurrent DevOps from planning to monitoring.
- You get better, more proactive support (assuming that our TAMs and support organization use the data to deliver more value).
- You get insight and advice on how to get the most value out of your investment in GitLab. Wouldn't you want to know that a number of features or values are not being adopted in your organization?
- You get a report that illustrates how you compare against other similar organizations (anonymized), with specific advice and recommendations on how to improve your DevOps processes.
@@ -76,7 +76,7 @@ tier.
Users can continue to access the features in a paid tier without sharing usage data.
 
#### Features available in 14.1 and later
 
-1. [Email from GitLab](../../tools/email.md).
+1. [Email from GitLab](../../user/admin_area/email_from_gitlab.md).
 
#### Features available in 14.4 and later
 
@@ -582,7 +582,8 @@ ServicePing::SubmitService.new(skip_db_write: true).execute
 
## Manually upload Service Ping payload
 
-> [Introduced](https://gitlab.com/groups/gitlab-org/-/epics/7388) in GitLab 14.8 with a flag named `admin_application_settings_service_usage_data_center`. Disabled by default.
+> - [Introduced](https://gitlab.com/groups/gitlab-org/-/epics/7388) in GitLab 14.8 with a flag named `admin_application_settings_service_usage_data_center`. Disabled by default.
+> - [Feature flag removed](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/83265) in GitLab 14.10.
 
The Service Ping payload can be uploaded to GitLab even if your application instance doesn't have access
to the internet, or you don't have the Service Ping [cron job](#how-service-ping-works) enabled.
 
@@ -598,6 +599,9 @@ To upload payload manually:
1. Select **Choose file** and choose the file from step 5.
1. Select **Upload**.
 
+The uploaded file is encrypted and sent using secure [HTTPS protocol](https://en.wikipedia.org/wiki/HTTPS). HTTPS creates a secure
+communication channel between the web browser and the server, and protects transmitted data against man-in-the-middle attacks.
+
## Monitoring
 
Service Ping reporting process state is monitored with [internal SiSense dashboard](https://app.periscopedata.com/app/gitlab/968489/Product-Intelligence---Service-Ping-Health).
diff --git a/doc/development/service_ping/metrics_dictionary.md b/doc/development/service_ping/metrics_dictionary.md
index 6884844da3f..ab3d301908b 100644
--- a/doc/development/service_ping/metrics_dictionary.md
+++ b/doc/development/service_ping/metrics_dictionary.md
@@ -7,7 +7,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
 
# Metrics Dictionary Guide
 
[Service Ping](index.md) metrics are defined in individual YAML file definitions, from which the
-[Metrics Dictionary](https://metrics.gitlab.com/) is built.
+[Metrics Dictionary](https://metrics.gitlab.com/) is built. Currently, the metrics dictionary is built automatically once a day. When a change to a metric is made in a YAML file, you can see the change in the dictionary within 24 hours.
This guide describes the dictionary and how it's implemented.
 
## Metrics Definition and validation
@@ -95,7 +95,7 @@ return to the instrumentation and update it.
 
1. Add the metric instrumentation class to `lib/gitlab/usage/metrics/instrumentations/`.
1. Add the metric logic in the instrumentation class.
-1. Run the [metrics YAML generator](metrics_dictionary.md#metrics-definition-and-validation).
+1. Run the [metrics YAML generator](metrics_dictionary.md#create-a-new-metric-definition).
1. Use the metric name suggestion to select a suitable metric name.
1. Update the metric's YAML definition with the correct `key_path`.
diff --git a/doc/development/service_ping/metrics_instrumentation.md b/doc/development/service_ping/metrics_instrumentation.md
index c684d9d12ef..3d56f3e777f 100644
--- a/doc/development/service_ping/metrics_instrumentation.md
+++ b/doc/development/service_ping/metrics_instrumentation.md
@@ -24,7 +24,9 @@ This guide describes how to develop Service Ping metrics using metrics instrumen
 
A metric definition has the [`instrumentation_class`](metrics_dictionary.md) field, which can be set to a class.
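+
+For example, a metric definition YAML file might contain (a minimal sketch with illustrative values; the `CountIssuesMetric` class is assumed to exist):
+
+```yaml
+key_path: counts.issues
+instrumentation_class: CountIssuesMetric
+```
+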
-The defined instrumentation class should have one of the existing metric classes: `DatabaseMetric`, `RedisMetric`, `RedisHLLMetric`, or `GenericMetric`.
+The defined instrumentation class should inherit one of the existing metric classes: `DatabaseMetric`, `RedisMetric`, `RedisHLLMetric`, or `GenericMetric`.
+
+The current convention is that a single instrumentation class corresponds to a single metric. On rare occasions, there are exceptions to that convention, such as [Redis metrics](#redis-metrics). To use a single instrumentation class for more than one metric, reach out to one of the `@gitlab-org/growth/product-intelligence/engineers` members to discuss your case.
 
Using the instrumentation classes ensures that metrics can fail safely individually, without breaking the entire
process of Service Ping generation.
 
@@ -186,3 +188,30 @@ rails generate gitlab:usage_metric CountIssues --type database
        create lib/gitlab/usage/metrics/instrumentations/count_issues_metric.rb
        create spec/lib/gitlab/usage/metrics/instrumentations/count_issues_metric_spec.rb
```
+
+## Migrate Service Ping metrics to instrumentation classes
+
+This guide describes how to migrate a Service Ping metric from [`lib/gitlab/usage_data.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/usage_data.rb) or [`ee/lib/ee/gitlab/usage_data.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/ee/gitlab/usage_data.rb) to instrumentation classes.
+
+1. Choose the metric type:
+
+   - [Database metric](#database-metrics)
+   - [Redis HyperLogLog metrics](#redis-hyperloglog-metrics)
+   - [Redis metric](#redis-metrics)
+   - [Generic metric](#generic-metrics)
+
+1. Determine the location of the instrumentation class: either under `ee` or outside `ee`.
+
+1. [Generate the instrumentation class file](#create-a-new-metric-instrumentation-class).
+
+1. Fill in the instrumentation class body:
+
+   - Add code logic for the metric. This might be similar to the metric implementation in `usage_data.rb`.
+   - Add tests for the individual metric in [`spec/lib/gitlab/usage/metrics/instrumentations/`](https://gitlab.com/gitlab-org/gitlab/-/tree/master/spec/lib/gitlab/usage/metrics/instrumentations).
+   - Add tests for Service Ping.
+
+1. [Generate the metric definition file](metrics_dictionary.md#create-a-new-metric-definition).
+
+1. Remove the code from [`lib/gitlab/usage_data.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/usage_data.rb) or [`ee/lib/ee/gitlab/usage_data.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/ee/gitlab/usage_data.rb).
+
+1. Remove the tests from [`spec/lib/gitlab/usage_data_spec.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/spec/lib/gitlab/usage_data_spec.rb) or [`ee/spec/lib/ee/gitlab/usage_data_spec.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/spec/lib/ee/gitlab/usage_data_spec.rb).
diff --git a/doc/development/service_ping/metrics_lifecycle.md b/doc/development/service_ping/metrics_lifecycle.md
index a7ecf15a493..844c989c640 100644
--- a/doc/development/service_ping/metrics_lifecycle.md
+++ b/doc/development/service_ping/metrics_lifecycle.md
@@ -60,6 +60,8 @@ The correct approach is to add a new metric for GitLab 12.6 release
with updated name
and update existing business analysis artefacts to use `example_metric_without_archived`
instead of `example_metric`.
 
+Currently, the [Metrics Dictionary](https://metrics.gitlab.com/) is built automatically once a day. When a change to a metric is made in a YAML file, you can see the change in the dictionary within 24 hours.
+
## Remove a metric
 
WARNING:
@@ -95,6 +97,12 @@ To remove a metric:
   used to test the [`UsageDataController#create`](https://gitlab.com/gitlab-services/version-gitlab-com/-/blob/3760ef28/spec/controllers/usage_data_controller_spec.rb#L75)
   endpoint, and ensure that the test suite does not fail when the metric you wish to remove is not included in the test payload.
 
+1. Remove the data from Redis.
+
+   For [Ordinary Redis](implement.md#ordinary-redis-counters) counters, remove the data stored in Redis:
+
+   - Add a migration to remove the data from Redis for the related Redis keys. For more details, see [this MR example](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/82604/diffs).
+
1. Create an issue in the
   [GitLab Data Team project](https://gitlab.com/gitlab-data/analytics/-/issues).
   Ask for confirmation that the metric is not referred to in any SiSense dashboards and
diff --git a/doc/development/service_ping/troubleshooting.md b/doc/development/service_ping/troubleshooting.md
index 770b6650764..15bc01f1270 100644
--- a/doc/development/service_ping/troubleshooting.md
+++ b/doc/development/service_ping/troubleshooting.md
@@ -28,4 +28,4 @@ For results about an investigation conducted into an unexpected drop in Service
 
### Troubleshooting data warehouse layer
 
-Reach out to the [Data team](https://about.gitlab.com/handbook/business-technology/data-team) to ask about current state of data warehouse. On their handbook page there is a [section with contact details](https://about.gitlab.com/handbook/business-technology/data-team/#how-to-connect-with-us).
+Reach out to the [Data team](https://about.gitlab.com/handbook/business-technology/data-team/) to ask about the current state of the data warehouse. On their handbook page there is a [section with contact details](https://about.gitlab.com/handbook/business-technology/data-team/#how-to-connect-with-us).
diff --git a/doc/development/sidekiq/compatibility_across_updates.md b/doc/development/sidekiq/compatibility_across_updates.md
index 919f6935139..35f4b88351e 100644
--- a/doc/development/sidekiq/compatibility_across_updates.md
+++ b/doc/development/sidekiq/compatibility_across_updates.md
@@ -156,4 +156,4 @@ end
 
You must rename the queue in a post-deployment migration, not in a normal
migration. Otherwise, it runs too early, before all the workers that
-schedule these jobs have stopped running. See also [other examples](../post_deployment_migrations.md#use-cases).
+schedule these jobs have stopped running. See also [other examples](../database/post_deployment_migrations.md#use-cases).
diff --git a/doc/development/single_table_inheritance.md b/doc/development/single_table_inheritance.md
index eb406b02a91..0783721e628 100644
--- a/doc/development/single_table_inheritance.md
+++ b/doc/development/single_table_inheritance.md
@@ -1,6 +1,6 @@
---
-stage: none
-group: unassigned
+stage: Enablement
+group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
@@ -31,8 +31,8 @@ could result in loading unexpected code or associations which may cause unintended
side effects or failures during upgrades.
```ruby
-class SomeMigration < Gitlab::Database::Migration[1.0]
-  class Services < ActiveRecord::Base
+class SomeMigration < Gitlab::Database::Migration[2.0]
+  class Services < MigrationRecord
     self.table_name = 'services'
     self.inheritance_column = :_type_disabled
   end
diff --git a/doc/development/snowplow/implementation.md b/doc/development/snowplow/implementation.md
index 6061a1d4cd2..162b77772f9 100644
--- a/doc/development/snowplow/implementation.md
+++ b/doc/development/snowplow/implementation.md
@@ -21,8 +21,25 @@ For the recommended frontend tracking implementation, see [Usage recommendations
 
Structured events and page views include the [`gitlab_standard`](schemas.md#gitlab_standard)
context, using the `window.gl.snowplowStandardContext` object which includes
[default data](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/views/layouts/_snowplow.html.haml)
-as base. This object can be modified for any subsequent structured event fired,
-although it's not recommended.
+as a base:
+
+| Property | Example |
+| -------- | ------- |
+| `context_generated_at` | `"2022-01-01T01:00:00.000Z"` |
+| `environment` | `"production"` |
+| `extra` | `{}` |
+| `namespace_id` | `123` |
+| `plan` | `"gold"` |
+| `project_id` | `456` |
+| `source` | `"gitlab-rails"` |
+| `user_id` | `789`* |
+
+_\* Undergoes a pseudonymization process at the collector level._
+
+These properties [are overridden](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/assets/javascripts/tracking/get_standard_context.js)
+with frontend-specific values, like `source` (`gitlab-javascript`), `google_analytics_id`,
+and the custom `extra` object. You can modify this object for any subsequent
+structured event that fires, although this is not recommended.
 
Tracking implementations must have an `action` and a `category`. You can provide additional
properties from the [structured event taxonomy](index.md#structured-event-taxonomy), in
@@ -396,13 +413,13 @@ Use the following arguments:
 
|------------|---------------------------|---------------|-----------------------------------------------------------------------------------------------------------------------------------|
| `category` | String | | Area or aspect of the application. For example, `HealthCheckController` or `Lfs::FileTransformer`. |
| `action` | String | | The action being taken. For example, a controller action such as `create`, or an Active Record callback. |
-| `label` | String | nil | The specific element or object to act on. This can be one of the following: the label of the element, for example, a tab labeled 'Create from template' for `create_from_template`; a unique identifier if no text is available, for example, `groups_dropdown_close` for closing the Groups dropdown in the top bar; or the name or title attribute of a record being created. |
-| `property` | String | nil | Any additional property of the element, or object being acted on. |
-| `value` | Numeric | nil | Describes a numeric value (decimal) directly related to the event. This could be the value of an input. For example, `10` when clicking `internal` visibility. |
-| `context` | Array\[SelfDescribingJSON\] | nil | An array of custom contexts to send with this event. Most events should not have any custom contexts. |
-| `project` | Project | nil | The project associated with the event. |
-| `user` | User | nil | The user associated with the event. |
-| `namespace` | Namespace | nil | The namespace associated with the event. |
+| `label` | String | `nil` | The specific element or object to act on.
This can be one of the following: the label of the element, for example, a tab labeled 'Create from template' for `create_from_template`; a unique identifier if no text is available, for example, `groups_dropdown_close` for closing the Groups dropdown in the top bar; or the name or title attribute of a record being created. | +| `property` | String | `nil` | Any additional property of the element, or object being acted on. | +| `value` | Numeric | `nil` | Describes a numeric value (decimal) directly related to the event. This could be the value of an input. For example, `10` when clicking `internal` visibility. | +| `context` | Array\[SelfDescribingJSON\] | `nil` | An array of custom contexts to send with this event. Most events should not have any custom contexts. | +| `project` | Project | `nil` | The project associated with the event. | +| `user` | User | `nil` | The user associated with the event. This value undergoes a pseudonymization process at the collector level. | +| `namespace` | Namespace | `nil` | The namespace associated with the event. | | `extra` | Hash | `{}` | Additional keyword arguments are collected into a hash and sent with the event. | ### Unit testing diff --git a/doc/development/snowplow/index.md b/doc/development/snowplow/index.md index 29f4514a21e..9b684757fe1 100644 --- a/doc/development/snowplow/index.md +++ b/doc/development/snowplow/index.md @@ -150,6 +150,23 @@ ORDER BY page_view_start DESC LIMIT 100 ``` +#### Top 20 users who fired `reply_comment_button` in the last 30 days + +```sql +SELECT + count(*) as hits, + se_action, + se_category, + gsc_pseudonymized_user_id +FROM legacy.snowplow_gitlab_events_30 +WHERE + se_label = 'reply_comment_button' + AND gsc_pseudonymized_user_id IS NOT NULL +GROUP BY gsc_pseudonymized_user_id, se_category, se_action +ORDER BY count(*) DESC +LIMIT 20 +``` + #### Query JSON formatted data ```sql diff --git a/doc/development/snowplow/schemas.md b/doc/development/snowplow/schemas.md index 63864c9329b..4066151600d 100644 --- a/doc/development/snowplow/schemas.md +++ b/doc/development/snowplow/schemas.md @@ -10,17 +10,18 @@ This page provides Snowplow schema reference for GitLab events. ## `gitlab_standard` -We are including the [`gitlab_standard` schema](https://gitlab.com/gitlab-org/iglu/-/blob/master/public/schemas/com.gitlab/gitlab_standard/jsonschema/) with every event. See [Standardize Snowplow Schema](https://gitlab.com/groups/gitlab-org/-/epics/5218) for details. +We are including the [`gitlab_standard` schema](https://gitlab.com/gitlab-org/iglu/-/blob/master/public/schemas/com.gitlab/gitlab_standard/jsonschema/) for structured events and page views. The [`StandardContext`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/tracking/standard_context.rb) -class represents this schema in the application. Some properties are automatically populated for [frontend](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/views/layouts/_snowplow.html.haml) -events. +class represents this schema in the application. Some properties are +[automatically populated for frontend events](implementation.md#snowplow-javascript-frontend-tracking), +and can be [provided manually for backend events](implementation.md#implement-ruby-backend-tracking). 
| Field Name | Required | Default value | Type | Description |
|----------------|:-------------------:|-----------------------|--|---------------------------------------------------------------------------------------------|
| `project_id` | **{dotted-circle}** | Current project ID * | integer | |
| `namespace_id` | **{dotted-circle}** | Current group/namespace ID * | integer | |
-| `user_id` | **{dotted-circle}** | Current user ID * | integer | User database record ID attribute. This file undergoes a pseudonymization process at the collector level. |
+| `user_id` | **{dotted-circle}** | Current user ID * | integer | User database record ID attribute. This value undergoes a pseudonymization process at the collector level. |
| `context_generated_at` | **{dotted-circle}** | Current timestamp | string (date time format) | Timestamp indicating when context was generated. |
| `environment` | **{check-circle}** | Current environment | string (max 32 chars) | Name of the source environment, such as `production` or `staging` |
| `source` | **{check-circle}** | Event source | string (max 32 chars) | Name of the source application, such as `gitlab-rails` or `gitlab-javascript` |
diff --git a/doc/development/snowplow/troubleshooting.md b/doc/development/snowplow/troubleshooting.md
index 75c8b306a67..47d775d89aa 100644
--- a/doc/development/snowplow/troubleshooting.md
+++ b/doc/development/snowplow/troubleshooting.md
@@ -28,7 +28,7 @@ While on CloudWatch dashboard set time range to last 4 weeks, to get better pict
 
A drop occurring at the application layer can be a symptom of an issue, but it might also be a result of the normal application lifecycle, intended changes to product intelligence or experiments tracking, or even a result of a public holiday in some regions of the world with a larger user base. To verify if there is an underlying problem to solve, you can check the following things:
 
-1. Check `about.gitlab.com` website traffic on [Google Analytics](https://analytics.google.com/) to verify if some public holiday might impact overall use of GitLab system
+1. Check `about.gitlab.com` website traffic on [Google Analytics](https://analytics.google.com/analytics/web/) to verify if some public holiday might impact the overall use of the GitLab system
1. You may need to open an access request for Google Analytics access first, for example: [access request internal issue](https://gitlab.com/gitlab-com/team-member-epics/access-requests/-/issues/1772)
1. Plot `select date(dvce_created_tstamp) , event , count(*) from legacy.snowplow_unnested_events_90 where dvce_created_tstamp > '2021-06-15' and dvce_created_tstamp < '2021-07-10' group by 1 , 2 order by 1 , 2` in SiSense to see which types of events were responsible for the drop
1. Plot `select date(dvce_created_tstamp) ,se_category , count(*) from legacy.snowplow_unnested_events_90 where dvce_created_tstamp > '2021-06-15' and dvce_created_tstamp < '2021-07-31' and event = 'struct' group by 1 , 2 order by 1, 2` to see which events recorded the biggest drops in the suspected category
@@ -47,4 +47,4 @@ Already conducted investigations:
 
### Troubleshooting data warehouse layer
 
-Reach out to [Data team](https://about.gitlab.com/handbook/business-technology/data-team) to ask about current state of data warehouse. On their handbook page there is a [section with contact details](https://about.gitlab.com/handbook/business-technology/data-team/#how-to-connect-with-us)
+Reach out to the [Data team](https://about.gitlab.com/handbook/business-technology/data-team/) to ask about the current state of the data warehouse.
On their handbook page there is a [section with contact details](https://about.gitlab.com/handbook/business-technology/data-team/#how-to-connect-with-us) diff --git a/doc/development/spam_protection_and_captcha/graphql_api.md b/doc/development/spam_protection_and_captcha/graphql_api.md index b47e3f84320..e3f4e9069e5 100644 --- a/doc/development/spam_protection_and_captcha/graphql_api.md +++ b/doc/development/spam_protection_and_captcha/graphql_api.md @@ -13,28 +13,27 @@ related to changing a model's confidential/public flag. ## Add support to the GraphQL mutations -This implementation is very similar to the controller implementation. You create a `spam_params` -instance based on the request, and pass it to the relevant Service class constructor. +The main steps are: -The three main differences from the controller implementation are: +1. Use `include Mutations::SpamProtection` in your mutation. +1. Create a `spam_params` instance based on the request. Obtain the request from the context + via `context[:request]` when creating the `SpamParams` instance. +1. Pass `spam_params` to the relevant Service class constructor. +1. After you create or update the `Spammable` model instance, call `#check_spam_action_response!` + and pass it the model instance. This call: + 1. Performs the necessary spam checks on the model. + 1. If spam is detected: + - Raises a `GraphQL::ExecutionError` exception. + - Includes the relevant information added as error fields to the response via the `extensions:` parameter. + For more details on these fields, refer to the section in the GraphQL API documentation on + [Resolve mutations detected as spam](../../api/graphql/index.md#resolve-mutations-detected-as-spam). -1. Use `include Mutations::SpamProtection` instead of `...JsonFormatActionsSupport`. -1. Obtain the request from the context via `context[:request]` when creating the `SpamParams` - instance. -1. After you create or updated the `Spammable` model instance, call `#check_spam_action_response!` - and pass it the model instance. This call will: - 1. Perform the necessary spam checks on the model. - 1. If spam is detected: - - Raise a `GraphQL::ExecutionError` exception. - - Include the relevant information added as error fields to the response via the `extensions:` parameter. - For more details on these fields, refer to the section on - [Spam and CAPTCHA support in the GraphQL API](../../api/graphql/index.md#resolve-mutations-detected-as-spam). - - NOTE: - If you use the standard ApolloLink or Axios interceptor CAPTCHA support described - above, the field details are unimportant. They become important if you - attempt to use the GraphQL API directly to process a failed check for potential spam, and - resubmit the request with a solved CAPTCHA response. + NOTE: + If you use the standard ApolloLink or Axios interceptor CAPTCHA support described + above, you can ignore the field details, because they are handled + automatically. They become relevant if you attempt to use the GraphQL API directly to + process a failed check for potential spam, and resubmit the request with a solved + CAPTCHA response. For example: @@ -57,10 +56,13 @@ module Mutations widget = service_response.payload[:widget] check_spam_action_response!(widget) - # If possible spam wasdetected, an exception would have been thrown by + # If possible spam was detected, an exception would have been thrown by # `#check_spam_action_response!`, so the normal resolve return logic can follow below. 
      end
    end
  end
end
```
+
+Refer to the [Exploratory Testing](exploratory_testing.md) section for instructions on how to test
+CAPTCHA behavior in the GraphQL API.
diff --git a/doc/development/spam_protection_and_captcha/index.md b/doc/development/spam_protection_and_captcha/index.md
index 9b195df536d..dbe8c4aa4e9 100644
--- a/doc/development/spam_protection_and_captcha/index.md
+++ b/doc/development/spam_protection_and_captcha/index.md
@@ -16,7 +16,7 @@ To add this support, you must implement the following areas as applicable:
 
1. [Model and Services](model_and_services.md): The basic prerequisite changes to the backend code which
   are required to add spam or CAPTCHA API and UI support for a feature which does not yet have support.
-1. REST API (Supported, documentation coming soon): The changes needed to add
+1. [REST API](rest_api.md): The changes needed to add
   spam or CAPTCHA support to Grape REST API endpoints. Refer to the related
   [REST API documentation](../../api/index.md#resolve-requests-detected-as-spam).
1. [GraphQL API](graphql_api.md): The changes needed to add spam or CAPTCHA support to GraphQL
diff --git a/doc/development/spam_protection_and_captcha/rest_api.md b/doc/development/spam_protection_and_captcha/rest_api.md
new file mode 100644
index 00000000000..ad74977eb67
--- /dev/null
+++ b/doc/development/spam_protection_and_captcha/rest_api.md
@@ -0,0 +1,90 @@
+---
+stage: Manage
+group: Authentication and Authorization
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# REST API spam protection and CAPTCHA support
+
+If the model can be modified via the REST API, you must also add support to all of the
+relevant API endpoints that may modify spammable or spam-related attributes. This
+definitely includes the `POST` and `PUT` endpoints, but may also include others, such as those
+related to changing a model's confidential/public flag.
+
+## Add support to the REST endpoints
+
+The main steps are:
+
+1. Add `helpers SpammableActions::CaptchaCheck::RestApiActionsSupport` in your `resource`.
+1. Create a `spam_params` instance based on the request.
+1. Pass `spam_params` to the relevant Service class constructor.
+1. After you create or update the `Spammable` model instance, call `#check_spam_action_response!`
+   and save the created or updated instance in a variable.
+1. Identify the error handling logic for the `failure` case of the request,
+   when the create or update was not successful. Failures can indicate possible spam detection,
+   which adds an error to the `Spammable` instance.
+   The error handling logic is usually similar to `render_api_error!` or `render_validation_error!`.
+1. Wrap the existing error handling logic in a
+   `with_captcha_check_rest_api(spammable: my_spammable_instance)` call, passing the `Spammable`
+   model instance you saved in a variable as the `spammable:` named argument. This call:
+   1. Performs the necessary spam checks on the model.
+   1. If spam is detected:
+      - Raises a Grape `#error!` exception with a descriptive spam-specific error message.
+      - Includes the relevant information added as error fields to the response.
+        For more details on these fields, refer to the section in the REST API documentation on
+        [Resolve requests detected as spam](../../api/index.md#resolve-requests-detected-as-spam).
+
+   NOTE:
+   If you use the standard ApolloLink or Axios interceptor CAPTCHA support described
+   above, you can ignore the field details, because they are handled
+   automatically. They become relevant if you attempt to use the REST API directly to
+   process a failed check for potential spam, and resubmit the request with a solved
+   CAPTCHA response.
+
+Here is an example for the `post` and `put` actions on the `snippets` resource:
+
+```ruby
+module API
+  class Snippets < ::API::Base
+    #...
+    resource :snippets do
+      # This helper provides `#with_captcha_check_rest_api`
+      helpers SpammableActions::CaptchaCheck::RestApiActionsSupport
+
+      post do
+        #...
+        spam_params = ::Spam::SpamParams.new_from_request(request: request)
+        service_response = ::Snippets::CreateService.new(project: nil, current_user: current_user, params: attrs, spam_params: spam_params).execute
+        snippet = service_response.payload[:snippet]
+
+        if service_response.success?
+          present snippet, with: Entities::PersonalSnippet, current_user: current_user
+        else
+          # Wrap the normal error response in a `with_captcha_check_rest_api(spammable: snippet)` block
+          with_captcha_check_rest_api(spammable: snippet) do
+            # If possible spam was detected, an exception would have been thrown by
+            # `#with_captcha_check_rest_api` for Grape to handle via `error!`
+            render_api_error!({ error: service_response.message }, service_response.http_status)
+          end
+        end
+      end
+
+      put ':id' do
+        #...
+        spam_params = ::Spam::SpamParams.new_from_request(request: request)
+        service_response = ::Snippets::UpdateService.new(project: nil, current_user: current_user, params: attrs, spam_params: spam_params).execute(snippet)
+
+        snippet = service_response.payload[:snippet]
+
+        if service_response.success?
+          present snippet, with: Entities::PersonalSnippet, current_user: current_user
+        else
+          # Wrap the normal error response in a `with_captcha_check_rest_api(spammable: snippet)` block
+          with_captcha_check_rest_api(spammable: snippet) do
+            # If possible spam was detected, an exception would have been thrown by
+            # `#with_captcha_check_rest_api` for Grape to handle via `error!`
+            render_api_error!({ error: service_response.message }, service_response.http_status)
+          end
+        end
+      end
+    end
+  end
+end
+```
diff --git a/doc/development/spam_protection_and_captcha/web_ui.md b/doc/development/spam_protection_and_captcha/web_ui.md
index 6aa01f401bd..9aeb9e96d44 100644
--- a/doc/development/spam_protection_and_captcha/web_ui.md
+++ b/doc/development/spam_protection_and_captcha/web_ui.md
@@ -37,7 +37,7 @@ additional fields being added to the models. Instead, communication is handled:
 
The spam and CAPTCHA-related logic is also cleanly abstracted into reusable modules and
helper methods which can wrap existing logic, and only alter the existing flow if potential
spam is detected or a CAPTCHA display is needed. This approach allows the spam and CAPTCHA
-support to be easily added to new areas of the application with minimal changes to
+support to be added to new areas of the application with minimal changes to
existing logic. In the case of the frontend, potentially **zero** changes are needed!
 
On the frontend, this is handled abstractly and transparently using `ApolloLink` for Apollo, and an
@@ -75,7 +75,7 @@ sequenceDiagram
 
The backend is also cleanly abstracted via mixin modules and helper methods. The three
main changes required to the relevant backend controller actions (normally just `create`/`update`) are:
 
-1. 
Create a `SpamParams` parameter object instance based on the request, using the simple static
+1. Create a `SpamParams` parameter object instance based on the request, using the static
   `#new_from_request` factory method. This method takes a request, and returns a `SpamParams`
   instance.
1. Pass the created `SpamParams` instance as the `spam_params` named argument to the Service
   class constructor, which you should have already added. If the spam check indicates
diff --git a/doc/development/sql.md b/doc/development/sql.md
index e2208caf35a..4b6153b7205 100644
--- a/doc/development/sql.md
+++ b/doc/development/sql.md
@@ -254,13 +254,13 @@ of records plucked. `MAX_PLUCK` defaults to `1_000` in `ApplicationRecord`.
 
## Inherit from ApplicationRecord
 
-Most models in the GitLab codebase should inherit from `ApplicationRecord`,
-rather than from `ActiveRecord::Base`. This allows helper methods to be easily
-added.
+Most models in the GitLab codebase should inherit from `ApplicationRecord`
+or `Ci::ApplicationRecord` rather than from `ActiveRecord::Base`. This allows
+helper methods to be easily added.
 
An exception to this rule exists for models created in database migrations. As
these should be isolated from application code, they should continue to subclass
-from `ActiveRecord::Base`.
+from `MigrationRecord`, which is available only in a migration context.
 
## Use UNIONs
 
@@ -376,7 +376,7 @@ Explicit column list definition:
 
```ruby
# Good, the SELECT columns are consistent
-columns = User.cached_column_names # The helper returns fully qualified (table.column) column names (Arel)
+columns = User.cached_column_list # The helper returns fully qualified (table.column) column names (Arel)
scope1 = User.select(*columns).where(id: [1, 2, 3]) # selects the columns explicitly
scope2 = User.select(*columns).where(id: [10, 11, 12]) # uses SELECT users.*
diff --git a/doc/development/stage_group_dashboards.md b/doc/development/stage_group_dashboards.md
index 744d049f72d..8e3e6982430 100644
--- a/doc/development/stage_group_dashboards.md
+++ b/doc/development/stage_group_dashboards.md
@@ -1,273 +1,11 @@
---
-stage: Platforms
-group: Scalability
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: 'stage_group_observability/index.md'
+remove_date: '2022-06-15'
---
 
-# Dashboards for stage groups
+This document was moved to [another location](stage_group_observability/index.md).
 
-## Introduction
-
-Observability is about bringing visibility into a system to see and understand the state of each component, with context, to support performance tuning and debugging. To run a SaaS platform at scale, a rich and detailed observability platform is a necessity. We have a set of monitoring dashboards designed for [each stage group](https://about.gitlab.com/handbook/product/categories/#devops-stages).
-
-These dashboards are designed to give an insight, to everyone working in a feature category, into how their code operates at GitLab.com scale. They are grouped per stage group to show the impact of feature/code changes, deployments, and feature-flag toggles.
-
-Each stage group has a dashboard consisting of metrics at the application level, such as Rails Web Requests, Rails API Requests, Sidekiq Jobs, and so on.
The metrics in each dashboard are filtered and accumulated based on the [GitLab product categories](https://about.gitlab.com/handbook/product/categories/) and [feature categories](feature_categorization/index.md). - -The list of dashboards for each stage group is accessible at <https://dashboards.gitlab.net/dashboards/f/stage-groups/stage-groups> (GitLab team members only), or at [the public mirror](https://dashboards.gitlab.com/dashboards?tag=feature_category&tag=stage-groups) (accessible to everyone with a GitLab.com account, with some limitations). - -The dashboards for stage groups are at a very early stage. All contributions are welcome. If you have any questions or suggestions, please submit an issue in the [Scalability Team issues tracker](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/new). - -## Dashboard content - -### Error budget - -Read more about how we are using error budgets overall in our -[handbook](https://about.gitlab.com/handbook/engineering/error-budgets/). - -By default, the first row of panels on the dashboard will show the [error -budget for the stage -group](https://about.gitlab.com/handbook/engineering/error-budgets/#budget-spend-by-stage-group). This -row shows how the features owned by -the group are contributing to our [overall -availability](https://about.gitlab.com/handbook/engineering/infrastructure/performance-indicators/#gitlabcom-availability). - -The budget is always aggregated over the 28 days before the [time -selected on the dashboard](#time-range-controls). - -We're currently displaying the information in 2 formats: - -1. Availability: This number can be compared to GitLab.com's overall - availability target of 99.95% uptime. -1. Budget Spent: This shows the time over the past 28 days that - features owned by the group have not been performing adequately. - -The budget is calculated based on indicators per component. Each -component can have 2 indicators: - -1. [Apdex](https://en.wikipedia.org/wiki/Apdex): The rate of - operations that performed adequately. - - The threshold for 'performed adequately' is stored in our [metrics - catalog](https://gitlab.com/gitlab-com/runbooks/-/tree/master/metrics-catalog) - and depends on the service in question. For the Puma (Rails) - component of the - [API](https://gitlab.com/gitlab-com/runbooks/-/blob/f22f40b2c2eab37d85e23ccac45e658b2c914445/metrics-catalog/services/api.jsonnet#L127), - [Git](https://gitlab.com/gitlab-com/runbooks/-/blob/f22f40b2c2eab37d85e23ccac45e658b2c914445/metrics-catalog/services/git.jsonnet#L216), - and - [Web](https://gitlab.com/gitlab-com/runbooks/-/blob/f22f40b2c2eab37d85e23ccac45e658b2c914445/metrics-catalog/services/web.jsonnet#L154) - services, that threshold is **5 seconds**. - - We're working on making this target configurable per endpoint in [this - project](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/525). Learn - how to - [customize the request Apdex](application_slis/rails_request_apdex.md), this new Apdex - measurement is not yet part of the error budget. - - For Sidekiq job execution, the threshold depends on the - [job urgency](sidekiq/worker_attributes.md#job-urgency). It is - [currently](https://gitlab.com/gitlab-com/runbooks/-/blob/f22f40b2c2eab37d85e23ccac45e658b2c914445/metrics-catalog/services/lib/sidekiq-helpers.libsonnet#L25-38) - **10 seconds** for high-urgency jobs and **5 minutes** for other - jobs. - - Some stage groups may have more services than these, and the - thresholds for those will be in the metrics catalog as well. - -1. 
Error rate: The rate of operations that had errors. - -The calculation to a ratio then happens as follows: - -```math -\frac {operations\_meeting\_apdex + (total\_operations - operations\_with\_errors)} {total\_apdex\_measurements + total\_operations} -``` - -### Check where budget is being spent - -The row below the error budget row is collapsed by default. Expanding -it shows which component and violation type had the most offending -operations in the past 28 days. - -![Error attribution](img/stage_group_dashboards_error_attribution.png) - -The first panel on the left shows a table with the number of errors per -component. Digging into the first row in that table is going to have -the biggest impact on the budget spent. - -Commonly, the components spending most of the budget are Sidekiq or Puma. The panel in -the center explains what these violation types mean, and how to dig -deeper in the logs. - -The panel on the right provides links to Kibana that should reveal -which endpoints or Sidekiq jobs are causing the errors. - -To learn how to use these panels and logs for -determining which Rails endpoints are slow, -see the [Error Budget Attribution for Purchase group](https://youtu.be/M9u6unON7bU) video. - -Other components visible in the table come from -[service level indicators](https://sre.google/sre-book/service-level-objectives/) (SLIs) defined -in the [metrics -catalog](https://gitlab.com/gitlab-com/runbooks/-/blob/master/metrics-catalog/README.md). - -For those types of failures, you can follow the link to the service -dashboard linked from the `type` column. The service dashboard -contains a row specifically for the SLI that is causing the budget -spent, with useful links to the logs and a description of what the -component means. For example, see the `server` component of the -`web-pages` service: - -![web-pages-server-component SLI](img/stage_group_dashboards_service_sli_detail.png) - -## Usage of the dashboard - -Inside a stage group dashboard, there are some notable components. Let's take the [Source Code group's dashboard](https://dashboards.gitlab.net/d/stage-groups-source_code/stage-groups-group-dashboard-create-source-code?orgId=1) as an example. - -### Time range controls - -![Default time filter](img/stage_group_dashboards_time_filter.png) - -- By default, all the times are in UTC time zone. [We use UTC when communicating in Engineering](https://about.gitlab.com/handbook/communication/#writing-style-guidelines). -- All metrics recorded in the GitLab production system have [1-year retention](https://gitlab.com/gitlab-cookbooks/gitlab-prometheus/-/blob/31526b03fef823e2f9b3cda7c75dcd28a12418a3/attributes/prometheus.rb#L40). -- Alternatively, you can zoom in or filter the time range directly on a graph. See the [Grafana Time Range Controls](https://grafana.com/docs/grafana/latest/dashboards/time-range-controls/) documentation for more information. - -### Filters and annotations - -In each dashboard, there are two filters and some annotations switches on the top of the page. [Grafana annotations](https://grafana.com/docs/grafana/latest/dashboards/annotations/) mark some special events, which are meaningful to development and operational activities, directly on the graphs. - -![Filters and annotations](img/stage_group_dashboards_filters.png) - -| Name | Type | Description | -| ---- | ---- | ----------- | -| `PROMETHEUS_DS` | filter | Filter the selective [Prometheus data sources](https://about.gitlab.com/handbook/engineering/monitoring/#prometheus). 
The default value is `Global`, which aggregates the data from all available data sources. Most of the time, you don't need to care about this filter. | -| `environment` | filter | Filter the environment the metrics are fetched from. The default setting is production (`gprd`). Check [Production Environment mapping](https://about.gitlab.com/handbook/engineering/infrastructure/production/architecture/#environments) for other possibilities. | -| `deploy` | annotation | Mark a deployment event on the GitLab.com SaaS platform. | -| `canary-deploy` | annotation | Mark a [canary deployment](https://about.gitlab.com/handbook/engineering/#canary-testing) event on the GitLab.com SaaS platform. | -| `feature-flags` | annotation | Mark the time point where a feature flag is updated.| - -This is an example of a feature flag annotation displayed on a dashboard panel. - -![Annotations](img/stage_group_dashboards_annotation.png) - -### Metrics panels - -![Metrics panels](img/stage_group_dashboards_metrics.png) - -Although most of the metrics displayed in the panels are self-explanatory in their title and nearby description, note the following: - -- The events are counted, measured, accumulated, then collected, and stored as [time series](https://prometheus.io/docs/concepts/data_model/). The data are calculated using statistical methods to produce metrics. It means that metrics are approximately correct and meaningful over a time period. They help you have an overview of the stage of a system over time. They are not meant to give you precise numbers of a discrete event. If you need a higher level of accuracy, please look at another monitoring tool like [logs](https://about.gitlab.com/handbook/engineering/monitoring/#logs). Please read the following examples for more explanations. -- All the rate metrics' units are `requests per second`. The default aggregate time frame is 1 minute. For example, a panel shows the requests per second number at `2020-12-25 00:42:00` is `34.13`. It means at the minute 42 (from `2020-12-25 00:42:00` to `2020-12-25 00:42:59` ), there are approximately `34.13 * 60 = ~ 2047` requests processed by the web servers. -- You may encounter some gotchas related to decimal fraction and rounding up frequently, especially in low-traffic cases. For example, the error rate of `RepositoryUpdateMirrorWorker` at `2020-12-25 02:04:00` is `0.07`, equivalent to `4.2` jobs per minute. The raw result is `0.06666666667`, equivalent to 4 jobs per minute. -- All the rate metrics are more accurate when the data is big enough. The default floating-point precision is 2. In some extremely low panels, you would see `0.00` although there is still some real traffic. - -To inspect the raw data of the panel for further calculation, click on the Inspect button from the dropdown menu of a panel. Queries, raw data, and panel JSON structure are available. Read more at [Grafana panel inspection](https://grafana.com/docs/grafana/latest/panels/inspect-panel/). - -All the dashboards are powered by [Grafana](https://grafana.com/), a frontend for displaying metrics. Grafana consumes the data returned from queries to backend Prometheus data source, then presents them under different visualizations. The stage group dashboards are built to serve the most common use cases with a limited set of filters, and pre-built queries. Grafana provides a way to explore and visualize the metrics data with [Grafana Explore](https://grafana.com/docs/grafana/latest/explore/). 
This would require some knowledge about [Prometheus PromQL query language](https://prometheus.io/docs/prometheus/latest/querying/basics/). - -## How to debug with the dashboards - -- A team member in the Code Review group has merged an MR which got deployed to production. -- To verify the deployment, we can check the [Code Review group's dashboard](https://dashboards.gitlab.net/d/stage-groups-code_review/stage-groups-group-dashboard-create-code-review?orgId=1). -- Sidekiq Error Rate panel shows an elevated error rate, specifically `UpdateMergeRequestsWorker`. - - ![Debug 1](img/stage_group_dashboards_debug_1.png) - -- If we click on `Kibana: Kibana Sidekiq failed request logs` link in the Extra links session, we can filter for `UpdateMergeRequestsWorker`, and read through the logs. - - ![Debug 2](img/stage_group_dashboards_debug_2.png) - -- [Sentry](https://sentry.gitlab.net/gitlab/gitlabcom/) gives us a way to find the exception where we can filter by transaction type and correlation_id from a Kibana's result item. - - ![Debug 3](img/stage_group_dashboards_debug_3.png) - -- A precise exception, including a stack trace, job arguments, and other information, should now appear. Happy debugging! - -## How to customize the dashboard - -All Grafana dashboards at GitLab are generated from the [Jsonnet files](https://github.com/grafana/grafonnet-lib) stored in [the runbook project](https://gitlab.com/gitlab-com/runbooks/-/tree/master/dashboards). Particularly, the stage group dashboards definitions are stored in [/dashboards/stage-groups](https://gitlab.com/gitlab-com/runbooks/-/tree/master/dashboards/stage-groups) subfolder in the Runbook. By convention, each group has a corresponding Jsonnet file. The dashboards are synced with GitLab [stage group data](https://gitlab.com/gitlab-com/www-gitlab-com/-/raw/master/data/stages.yml) every month. Expansion and customization are one of the key principles used when we designed this system. To customize your group's dashboard, you need to edit the corresponding file and follow the [Runbook workflow](https://gitlab.com/gitlab-com/runbooks/-/tree/master/dashboards#dashboard-source). The dashboard is updated after the MR is merged. Looking at an autogenerated file, for example, [`product_planning.dashboard.jsonnet`](https://gitlab.com/gitlab-com/runbooks/-/blob/master/dashboards/stage-groups/product_planning.dashboard.jsonnet): - -```jsonnet -// This file is autogenerated using scripts/update_stage_groups_dashboards.rb -// Please feel free to customize this file. -local stageGroupDashboards = import './stage-group-dashboards.libsonnet'; - -stageGroupDashboards.dashboard('product_planning') -.stageGroupDashboardTrailer() -``` - -We provide basic customization to filter out the components essential to your group's activities. By default, only the `web`, `api`, and `sidekiq` components are available in the dashboard, while `git` is hidden. See [how to enable available components and optional graphs](#optional-graphs). - -You can also append further information or custom metrics to a dashboard. 
This is an example that adds some links and a total request rate on the top of the page: - -```jsonnet -local stageGroupDashboards = import './stage-group-dashboards.libsonnet'; -local grafana = import 'github.com/grafana/grafonnet-lib/grafonnet/grafana.libsonnet'; -local basic = import 'grafana/basic.libsonnet'; - -stageGroupDashboards.dashboard('source_code') -.addPanel( - grafana.text.new( - title='Group information', - mode='markdown', - content=||| - Useful link for the Source Code Management group dashboard: - - [Issue list](https://gitlab.com/groups/gitlab-org/-/issues?scope=all&state=opened&label_name%5B%5D=repository) - - [Epic list](https://gitlab.com/groups/gitlab-org/-/epics?label_name[]=repository) - |||, - ), - gridPos={ x: 0, y: 0, w: 24, h: 4 } -) -.addPanel( - basic.timeseries( - title='Total Request Rate', - yAxisLabel='Requests per Second', - decimals=2, - query=||| - sum ( - rate(gitlab_transaction_duration_seconds_count{ - env='$environment', - environment='$environment', - feature_category=~'source_code_management', - }[$__interval]) - ) - ||| - ), - gridPos={ x: 0, y: 0, w: 24, h: 7 } -) -.stageGroupDashboardTrailer() -``` - -![Stage Group Dashboard Customization](img/stage_group_dashboards_time_customization.png) - -<i class="fa fa-youtube-play youtube" aria-hidden="true"></i> -If you want to see the workflow in action, we've recorded a pairing session on customizing a dashboard, -available on [GitLab Unfiltered](https://youtu.be/shEd_eiUjdI). - -For deeper customization and more complicated metrics, visit the [Grafonnet lib](https://github.com/grafana/grafonnet-lib) project and the [GitLab Prometheus Metrics](../administration/monitoring/prometheus/gitlab_metrics.md#gitlab-prometheus-metrics) documentation. - -### Optional Graphs - -Some Graphs aren't relevant for all groups, so they aren't added to -the dashboard by default. They can be added by customizing the -dashboard. - -By default, only the `web`, `api`, and `sidekiq` metrics are -shown. If you wish to see the metrics from the `git` fleet (or any -other component that might be added in the future), this could be -configured as follows: - -```jsonnet -stageGroupDashboards -.dashboard('source_code', components=stageGroupDashboards.supportedComponents) -.stageGroupDashboardTrailer() -``` - -If your group is interested in Sidekiq job durations and their -thresholds, these graphs can be added by calling the -`.addSidekiqJobDurationByUrgency` function: - -```jsonnet -stageGroupDashboards -.dashboard('access') -.addSidekiqJobDurationByUrgency() -.stageGroupDashboardTrailer() -``` +<!-- This redirect file can be deleted after <2022-06-15>. --> +<!-- Redirects that point to other docs in the same project expire in three months. --> +<!-- Redirects that point to docs in a different project or site (link is not relative and starts with `https:`) expire in one year. 
-->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
diff --git a/doc/development/stage_group_observability/dashboards/error_budget_detail.md b/doc/development/stage_group_observability/dashboards/error_budget_detail.md
new file mode 100644
index 00000000000..19f98d404e7
--- /dev/null
+++ b/doc/development/stage_group_observability/dashboards/error_budget_detail.md
@@ -0,0 +1,127 @@
+---
+stage: Platforms
+group: Scalability
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Error budget detail dashboard
+
+With the error budget detail dashboard, you can explore the error budget
+spent at specific moments in time. By default, the dashboard shows
+the past 28 days. You can adjust it with the [time range controls](index.md#time-range-controls)
+or by selecting a range on one of the graphs.
+
+This dashboard is the same kind of dashboard we use for service level
+monitoring. For example, see the
+[overview dashboard for the web service](https://dashboards.gitlab.net/d/web-main) (GitLab internal).
+
+## Error budget panels
+
+At the top of each dashboard, there's the same panel with the [error budget](../index.md#error-budget).
+Here, the time-based targets adjust depending on the range.
+For example, while the budget was 20 minutes per 28 days, it is only 1/4 of that for 7 days:
+
+![5m budget in 7 days](img/error_budget_detail_7d_budget.png)
+
+Also, keep in mind that Grafana rounds the numbers. In this example, the
+total time spent is 5 minutes and 24 seconds: 24 seconds over the
+5-minute budget.
+
+The attribution panels also show only failures that occurred
+within the selected range.
+
+These two panels represent a view of the "official" error budget: they
+take into account whether an SLI was ignored.
+The [attribution panels](../index.md#check-where-budget-is-being-spent) show which components
+contributed the most over the selected period.
+
+The panels below take into account all SLIs that contribute to GitLab.com availability.
+This includes SLIs that are ignored for the official error budget.
+
+## Time series for aggregations
+
+Each aggregation consists of three time series panels:
+
+- Apdex: the [Apdex score](https://en.wikipedia.org/wiki/Apdex) for one or more SLIs. A higher score is better.
+- Error Ratio: the error ratio for one or more SLIs. Lower is better.
+- Requests Per Second: the number of operations per second. Higher means a bigger impact on the error budget.
+
+The Apdex and error-ratio panels also contain two alerting thresholds:
+
+- The one-hour threshold: the fast burn rate.
+
+  When this line is crossed, we've spent 2% of our monthly budget in the last hour.
+
+- The six-hour threshold: the slow burn rate.
+
+  When this line is crossed, we've spent 2% of our budget in the last six hours.
+
+If there is no error-ratio or Apdex for a certain SLI, the panel is hidden.
+
+Read more about these alerting windows in the
+[Google SRE workbook](https://sre.google/workbook/alerting-on-slos/#recommended_time_windows_and_burn_rates_f).
+
+We don't have alerting on these metrics for stage groups.
+This work is being discussed in [epic 615](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/615).
+If this is something you would like for your group, let us know there.
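+
+As a back-of-the-envelope illustration (assuming the 99.95% availability
+target described in the observability overview, and time-based budget
+accounting), the 28-day budget behind these thresholds works out to:
+
+```math
+\text{budget}_{28d} = (1 - 0.9995) \times 28 \times 24 \times 60 \text{ min} \approx 20.16 \text{ min}
+```
+
+Crossing the fast burn threshold means spending 2% of that budget, about
+24 seconds of failures, within a single hour: an error ratio of roughly
+`24 / 3600 ≈ 0.67%`.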
+ +### Stage group aggregation + +![stage group aggregation graphs](img/error_budget_detail_stage_group_aggregation.png) + +The stage group aggregation shows a graph with the Apdex and errors +portion of the error budget over time. The lower a dip in the Apdex +graph or the higher a peak on the error ratio graph, the more budget +was spent at that moment. + +The third graph shows the sum of all the request rates for all +SLIs. Higher means there was more traffic. + +To zoom in on a particular moment where a lot of budget was spent, select the appropriate time in +the graph. + +### Service-level indicators + +![Rails requests service level indicator](img/error_budget_detail_sli.png) + +This time series shows a breakdown of each SLI that could be contributing to the +error budget for a stage group. Similar to the stage group +aggregation, it contains an Apdex score, error ratio, and request +rate. + +Here we also display an explanation panel, describing the SLI and +linking to other monitoring tools. The links to logs (📖) or +visualizations (📈) in Kibana are scoped to the feature categories +for your stage group, and limited to the range selected. Keep in mind +that we only keep logs in Kibana for seven days. + +In the graphs, there is a single line per service. In the previous example image, +`rails_requests` is an SLI for the `web`, `api` and `git` services. + +Sidekiq is not included in this dashboard. We're tracking this in +[epic 700](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/700). + +### SLI detail + +![Rails requests SLI detail](img/error_budget_detail_sli_detail.png) + +The SLI details row shows a breakdown of a specific SLI based on the +labels present on the source metrics. + +For example, in the previous image, the `rails_requests` SLI has an `endpoint_id` label. +We can show how much a certain endpoint was requested (RPS), and how much it contributed to the error +budget spend. + +For Apdex we show the **Apdex Attribution** panel. The more prominent +color is the one that contributed most to the spend. To see the +top spending endpoint over the entire range, sort by the average. + +For error ratio we show an error rate. To see which label contributed most to the spend, sort by the +average. + +We don't have endpoint information available for Rails errors. This work is being planned in +[epic 663](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/663). + +The number of series to be loaded in the SLI details graphs is very +high when compared to the other aggregations. Because of this, it's not possible to +load more than a few days' worth of data. 
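+
+As a simplified mental model (not the exact recording rules), the share of
+budget spend attributed to a single label value, such as one `endpoint_id`,
+can be thought of as:
+
+```math
+\text{share}(\ell) \approx \frac{\text{apdex violations}_\ell + \text{errors}_\ell}{\sum_k \left( \text{apdex violations}_k + \text{errors}_k \right)}
+```
+
+This is why sorting the attribution panels by the average surfaces the
+labels that contributed most over the selected range.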
diff --git a/doc/development/stage_group_observability/dashboards/img/error_budget_detail_7d_budget.png b/doc/development/stage_group_observability/dashboards/img/error_budget_detail_7d_budget.png
Binary files differ
new file mode 100644
index 00000000000..1b2996d7d26
--- /dev/null
+++ b/doc/development/stage_group_observability/dashboards/img/error_budget_detail_7d_budget.png
diff --git a/doc/development/stage_group_observability/dashboards/img/error_budget_detail_sli.png b/doc/development/stage_group_observability/dashboards/img/error_budget_detail_sli.png
Binary files differ
new file mode 100644
index 00000000000..0472e35b0cb
--- /dev/null
+++ b/doc/development/stage_group_observability/dashboards/img/error_budget_detail_sli.png
diff --git a/doc/development/stage_group_observability/dashboards/img/error_budget_detail_sli_detail.png b/doc/development/stage_group_observability/dashboards/img/error_budget_detail_sli_detail.png
Binary files differ
new file mode 100644
index 00000000000..99530886ae9
--- /dev/null
+++ b/doc/development/stage_group_observability/dashboards/img/error_budget_detail_sli_detail.png
diff --git a/doc/development/stage_group_observability/dashboards/img/error_budget_detail_stage_group_aggregation.png b/doc/development/stage_group_observability/dashboards/img/error_budget_detail_stage_group_aggregation.png
Binary files differ
new file mode 100644
index 00000000000..d679637dcc4
--- /dev/null
+++ b/doc/development/stage_group_observability/dashboards/img/error_budget_detail_stage_group_aggregation.png
diff --git a/doc/development/stage_group_observability/dashboards/img/stage_group_dashboards_28d_budget.png b/doc/development/stage_group_observability/dashboards/img/stage_group_dashboards_28d_budget.png
Binary files differ
new file mode 100644
index 00000000000..eb164dd3f68
--- /dev/null
+++ b/doc/development/stage_group_observability/dashboards/img/stage_group_dashboards_28d_budget.png
diff --git a/doc/development/img/stage_group_dashboards_annotation.png b/doc/development/stage_group_observability/dashboards/img/stage_group_dashboards_annotation.png
Binary files differ
index 3776d87e5bb..3776d87e5bb 100644
--- a/doc/development/img/stage_group_dashboards_annotation.png
+++ b/doc/development/stage_group_observability/dashboards/img/stage_group_dashboards_annotation.png
diff --git a/doc/development/img/stage_group_dashboards_debug_1.png b/doc/development/stage_group_observability/dashboards/img/stage_group_dashboards_debug_1.png
Binary files differ
index 309fad89120..309fad89120 100644
--- a/doc/development/img/stage_group_dashboards_debug_1.png
+++ b/doc/development/stage_group_observability/dashboards/img/stage_group_dashboards_debug_1.png
diff --git a/doc/development/img/stage_group_dashboards_debug_2.png b/doc/development/stage_group_observability/dashboards/img/stage_group_dashboards_debug_2.png
Binary files differ
index 2aad9ab5592..2aad9ab5592 100644
--- a/doc/development/img/stage_group_dashboards_debug_2.png
+++ b/doc/development/stage_group_observability/dashboards/img/stage_group_dashboards_debug_2.png
diff --git a/doc/development/img/stage_group_dashboards_debug_3.png b/doc/development/stage_group_observability/dashboards/img/stage_group_dashboards_debug_3.png
Binary files differ
index 38647410ffd..38647410ffd 100644
--- a/doc/development/img/stage_group_dashboards_debug_3.png
+++ b/doc/development/stage_group_observability/dashboards/img/stage_group_dashboards_debug_3.png
diff --git a/doc/development/img/stage_group_dashboards_filters.png b/doc/development/stage_group_observability/dashboards/img/stage_group_dashboards_filters.png
Binary files differ
index 27a836bc36d..27a836bc36d 100644
--- a/doc/development/img/stage_group_dashboards_filters.png
+++ b/doc/development/stage_group_observability/dashboards/img/stage_group_dashboards_filters.png
diff --git a/doc/development/img/stage_group_dashboards_metrics.png b/doc/development/stage_group_observability/dashboards/img/stage_group_dashboards_metrics.png
Binary files differ
index 6b6faff6e3b..6b6faff6e3b 100644
--- a/doc/development/img/stage_group_dashboards_metrics.png
+++ b/doc/development/stage_group_observability/dashboards/img/stage_group_dashboards_metrics.png
diff --git a/doc/development/img/stage_group_dashboards_time_customization.png b/doc/development/stage_group_observability/dashboards/img/stage_group_dashboards_time_customization.png
Binary files differ
index 49e61183b7c..49e61183b7c 100644
--- a/doc/development/img/stage_group_dashboards_time_customization.png
+++ b/doc/development/stage_group_observability/dashboards/img/stage_group_dashboards_time_customization.png
diff --git a/doc/development/img/stage_group_dashboards_time_filter.png b/doc/development/stage_group_observability/dashboards/img/stage_group_dashboards_time_filter.png
Binary files differ
index 81a3dc789f1..81a3dc789f1 100644
--- a/doc/development/img/stage_group_dashboards_time_filter.png
+++ b/doc/development/stage_group_observability/dashboards/img/stage_group_dashboards_time_filter.png
diff --git a/doc/development/stage_group_observability/dashboards/index.md b/doc/development/stage_group_observability/dashboards/index.md
new file mode 100644
index 00000000000..f4e646c8634
--- /dev/null
+++ b/doc/development/stage_group_observability/dashboards/index.md
@@ -0,0 +1,70 @@
+---
+stage: Platforms
+group: Scalability
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Dashboards for stage groups
+
+We generate a lot of dashboards that act as windows into the metrics we
+use to monitor GitLab.com. Most of our dashboards are generated from
+Jsonnet in the
+[runbooks repository](https://gitlab.com/gitlab-com/runbooks/-/tree/master/dashboards#dashboard-source).
+Anyone can contribute to these, adding new dashboards or modifying
+existing ones.
+
+When adding new dashboards for your stage groups, tagging them with
+`stage_group:<group name>` cross-links the dashboard from other
+dashboards with the same tag. You can create dashboards for stage groups
+in the [`dashboards/stage-groups`](https://gitlab.com/gitlab-com/runbooks/-/tree/master/dashboards/stage-groups)
+directory. Directories can't be nested more than one level deep.
+
+To see a list of all the dashboards for your stage group:
+
+1. In Grafana, go to the [Dashboard browser](https://dashboards.gitlab.net/dashboards?tag=stage-groups).
+1. To see all of the dashboards for a specific group, filter for `stage_group:<group name>`.
+
+Some generated dashboards are already available:
+
+1. [Stage group dashboard](stage_group_dashboard.md): a customizable
+   dashboard with tailored metrics per group.
+1. [Error budget detail dashboard](error_budget_detail.md): a
+   dashboard that allows exploring the error budget spend over time and
+   across multiple SLIs.
+
+## Time range controls
+
+![Default time filter](img/stage_group_dashboards_time_filter.png)
+
+By default, all times are in the UTC time zone.
+[We use UTC when communicating in Engineering.](https://about.gitlab.com/handbook/communication/#writing-style-guidelines)
+
+All metrics recorded in the GitLab production system have
+[one-year retention](https://gitlab.com/gitlab-cookbooks/gitlab-prometheus/-/blob/31526b03fef823e2f9b3cda7c75dcd28a12418a3/attributes/prometheus.rb#L40).
+
+You can also zoom in and filter the time range directly on a graph. For more information, see the
+[Grafana Time Range Controls](https://grafana.com/docs/grafana/latest/dashboards/time-range-controls/)
+documentation.
+
+## Filters and annotations
+
+On each dashboard, there are two filters and some annotation switches at the top of the page.
+
+Some special events are meaningful to development and operational activities.
+[Grafana annotations](https://grafana.com/docs/grafana/latest/dashboards/annotations/) mark them
+directly on the graphs.
+
+![Filters and annotations](img/stage_group_dashboards_filters.png)
+
+| Name | Type | Description |
+| --------------- | ---------- | ----------- |
+| `PROMETHEUS_DS` | filter | Select the [Prometheus data sources](https://about.gitlab.com/handbook/engineering/monitoring/#prometheus) to query. The default value is `Global`, which aggregates the data from all available data sources. Most of the time, you don't need to care about this filter. |
+| `environment` | filter | Filter the environment the metrics are fetched from. The default setting is production (`gprd`). For other options, see [Production Environment mapping](https://about.gitlab.com/handbook/engineering/infrastructure/production/architecture/#environments). |
+| `stage` | filter | Filter metrics by stage: `main` or `cny` for canary. Default is `main`. |
+| `deploy` | annotation | Mark a deployment event on the GitLab.com SaaS platform. |
+| `canary-deploy` | annotation | Mark a [canary deployment](https://about.gitlab.com/handbook/engineering/#canary-testing) event on the GitLab.com SaaS platform. |
+| `feature-flags` | annotation | Mark the time point when a feature flag is updated. |
+
+Example of a feature flag annotation displayed on a dashboard panel:
+
+![Annotations](img/stage_group_dashboards_annotation.png)
diff --git a/doc/development/stage_group_observability/dashboards/stage_group_dashboard.md b/doc/development/stage_group_observability/dashboards/stage_group_dashboard.md
new file mode 100644
index 00000000000..c1831cfce69
--- /dev/null
+++ b/doc/development/stage_group_observability/dashboards/stage_group_dashboard.md
@@ -0,0 +1,200 @@
+---
+stage: Platforms
+group: Scalability
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Stage group dashboard
+
+The stage group dashboard is a generated dashboard that contains metrics
+for common components used by most stage groups. The dashboard is
+fully customizable and owned by the stage groups.
+
+This page explains what is on these dashboards, how to use their
+contents, and how they can be customized.
+
+## Dashboard contents
+
+### Error budget panels
+
+![28 day budget](img/stage_group_dashboards_28d_budget.png)
+
+The top panels display the [error budget](../index.md#error-budget).
+These panels always show the 28 days before the end time selected in the
+[time range controls](index.md#time-range-controls). This data doesn't
+follow the selected range. It does respect the filters for environment
+and stage.
+
+### Metrics panels
+
+![Metrics panels](img/stage_group_dashboards_metrics.png)
+
+Although most of the metrics displayed in the panels are self-explanatory in their title and nearby
+description, note the following:
+
+- The events are counted, measured, accumulated, collected, and stored as
+  [time series](https://prometheus.io/docs/concepts/data_model/). The data is calculated using
+  statistical methods to produce metrics. This means that metrics are approximately correct and
+  meaningful over a time period. They help you get an overview of the state of a system over time.
+  They are not meant to give you precise numbers of a discrete event.
+
+  If you need a higher level of accuracy, use another monitoring tool, such as
+  [logs](https://about.gitlab.com/handbook/engineering/monitoring/#logs).
+  Read the following examples for more explanations.
+- All the rate metrics' units are `requests per second`. The default aggregate time frame is 1 minute.
+
+  For example, a panel shows the requests per second number at `2020-12-25 00:42:00` to be `34.13`.
+  This means that during minute 42 (from `2020-12-25 00:42:00` to `2020-12-25 00:42:59`), there were
+  approximately `34.13 * 60 = ~ 2047` requests processed by the web servers.
+- You might frequently encounter gotchas related to decimal fractions and rounding, especially
+  in low-traffic cases. For example, the error rate of `RepositoryUpdateMirrorWorker` at
+  `2020-12-25 02:04:00` is `0.07`, equivalent to `4.2` jobs per minute. The raw result is
+  `0.06666666667`, equivalent to 4 jobs per minute.
+- All the rate metrics are more accurate when the volume of data is large enough. The default
+  floating-point precision is 2. In some extremely low-traffic panels, you can see `0.00`, even
+  though there is still some real traffic.
+
+To inspect the raw data of the panel for further calculation, select **Inspect** from the dropdown
+list of a panel. Queries, raw data, and panel JSON structure are available.
+Read more at [Grafana panel inspection](https://grafana.com/docs/grafana/latest/panels/inspect-panel/).
+
+All the dashboards are powered by [Grafana](https://grafana.com/), a frontend for displaying metrics.
+Grafana consumes the data returned from queries to the backend Prometheus data source, then presents it
+with visualizations. The stage group dashboards are built to serve the most common use cases with a
+limited set of filters and pre-built queries. Grafana provides a way to explore and visualize the
+metrics data with [Grafana Explore](https://grafana.com/docs/grafana/latest/explore/). This requires
+some knowledge of the [Prometheus PromQL query language](https://prometheus.io/docs/prometheus/latest/querying/basics/).
+
+## Example: Debugging with dashboards
+
+Example debugging workflow:
+
+1. A team member in the Code Review group has merged an MR which got deployed to production.
+1. To verify the deployment, you can check the
+   [Code Review group's dashboard](https://dashboards.gitlab.net/d/stage-groups-code_review/stage-groups-group-dashboard-create-code-review?orgId=1).
+1. The Sidekiq Error Rate panel shows an elevated error rate, specifically for `UpdateMergeRequestsWorker`.
+
+   ![Debug 1](img/stage_group_dashboards_debug_1.png)
+
+1. If you select **Kibana: Kibana Sidekiq failed request logs** in the **Extra links** section, you can filter for `UpdateMergeRequestsWorker` and read through the logs.
+
+   ![Debug 2](img/stage_group_dashboards_debug_2.png)
+
+1. With [Sentry](https://sentry.gitlab.net/gitlab/gitlabcom/), you can find the exception, filtering
+   by transaction type and `correlation_id` from Kibana's result item.
+
+   ![Debug 3](img/stage_group_dashboards_debug_3.png)
+
+1. A precise exception, including a stack trace, job arguments, and other information, should now appear.
+
+Happy debugging!
+
+## Customizing the dashboard
+
+All Grafana dashboards at GitLab are generated from the [Jsonnet files](https://github.com/grafana/grafonnet-lib)
+stored in [the runbooks project](https://gitlab.com/gitlab-com/runbooks/-/tree/master/dashboards).
+In particular, the stage group dashboard definitions are stored in
+[`/dashboards/stage-groups`](https://gitlab.com/gitlab-com/runbooks/-/tree/master/dashboards/stage-groups).
+
+By convention, each group has a corresponding Jsonnet file. The dashboards are synced with GitLab
+[stage group data](https://gitlab.com/gitlab-com/www-gitlab-com/-/raw/master/data/stages.yml) every
+month.
+
+Expansion and customization were key principles when we designed this system.
+To customize your group's dashboard, edit the corresponding file and follow the
+[Runbook workflow](https://gitlab.com/gitlab-com/runbooks/-/tree/master/dashboards#dashboard-source).
+The dashboard is updated after the MR is merged.
+
+Looking at an autogenerated file, for example,
+[`product_planning.dashboard.jsonnet`](https://gitlab.com/gitlab-com/runbooks/-/blob/master/dashboards/stage-groups/product_planning.dashboard.jsonnet):
+
+```jsonnet
+// This file is autogenerated using scripts/update_stage_groups_dashboards.rb
+// Please feel free to customize this file.
+local stageGroupDashboards = import './stage-group-dashboards.libsonnet';
+
+stageGroupDashboards.dashboard('product_planning')
+.stageGroupDashboardTrailer()
+```
+
+We provide basic customization to filter out the components essential to your group's activities.
+By default, only the `web`, `api`, and `sidekiq` components are available in the dashboard, while
+`git` is hidden. See [how to enable available components and optional graphs](#optional-graphs).
+
+You can also append further information or custom metrics to a dashboard. The following example
+adds some links and a total request rate to the top of the page:
+
+```jsonnet
+local stageGroupDashboards = import './stage-group-dashboards.libsonnet';
+local grafana = import 'github.com/grafana/grafonnet-lib/grafonnet/grafana.libsonnet';
+local basic = import 'grafana/basic.libsonnet';
+
+stageGroupDashboards.dashboard('source_code')
+.addPanel(
+  grafana.text.new(
+    title='Group information',
+    mode='markdown',
+    content=|||
+      Useful links for the Source Code Management group dashboard:
+      - [Issue list](https://gitlab.com/groups/gitlab-org/-/issues?scope=all&state=opened&label_name%5B%5D=repository)
+      - [Epic list](https://gitlab.com/groups/gitlab-org/-/epics?label_name[]=repository)
+    |||,
+  ),
+  gridPos={ x: 0, y: 0, w: 24, h: 4 }
+)
+.addPanel(
+  basic.timeseries(
+    title='Total Request Rate',
+    yAxisLabel='Requests per Second',
+    decimals=2,
+    query=|||
+      sum (
+        rate(gitlab_transaction_duration_seconds_count{
+          env='$environment',
+          environment='$environment',
+          feature_category=~'source_code_management',
+        }[$__interval])
+      )
+    |||
+  ),
+  gridPos={ x: 0, y: 0, w: 24, h: 7 }
+)
+.stageGroupDashboardTrailer()
+```
+
+![Stage Group Dashboard Customization](img/stage_group_dashboards_time_customization.png)
+
+<i class="fa fa-youtube-play youtube" aria-hidden="true"></i>
+If you want to see the workflow in action, we've recorded a pairing session on customizing a dashboard,
+available on [GitLab Unfiltered](https://youtu.be/shEd_eiUjdI).
+
+For deeper customization and more complicated metrics, visit the
+[Grafonnet lib](https://github.com/grafana/grafonnet-lib) project and the
+[GitLab Prometheus Metrics](../../../administration/monitoring/prometheus/gitlab_metrics.md#gitlab-prometheus-metrics)
+documentation.
+
+### Optional graphs
+
+Some graphs aren't relevant for all groups, so they aren't added to
+the dashboard by default. They can be added by customizing the
+dashboard.
+
+By default, only the `web`, `api`, and `sidekiq` metrics are
+shown. If you wish to see the metrics from the `git` fleet (or any
+other component that might be added in the future), you can configure it as follows:
+
+```jsonnet
+stageGroupDashboards
+.dashboard('source_code', components=stageGroupDashboards.supportedComponents)
+.stageGroupDashboardTrailer()
+```
+
+If your group is interested in Sidekiq job durations and their
+thresholds, you can add these graphs by calling the `.addSidekiqJobDurationByUrgency` function:
+
+```jsonnet
+stageGroupDashboards
+.dashboard('access')
+.addSidekiqJobDurationByUrgency()
+.stageGroupDashboardTrailer()
+```
diff --git a/doc/development/img/stage_group_dashboards_error_attribution.png b/doc/development/stage_group_observability/img/stage_group_dashboards_error_attribution.png
Binary files differ
index f6ea7c004ac..f6ea7c004ac 100644
--- a/doc/development/img/stage_group_dashboards_error_attribution.png
+++ b/doc/development/stage_group_observability/img/stage_group_dashboards_error_attribution.png
diff --git a/doc/development/img/stage_group_dashboards_service_sli_detail.png b/doc/development/stage_group_observability/img/stage_group_dashboards_service_sli_detail.png
Binary files differ
index 5dc32063709..5dc32063709 100644
--- a/doc/development/img/stage_group_dashboards_service_sli_detail.png
+++ b/doc/development/stage_group_observability/img/stage_group_dashboards_service_sli_detail.png
diff --git a/doc/development/stage_group_observability/index.md b/doc/development/stage_group_observability/index.md
new file mode 100644
index 00000000000..868e55735e8
--- /dev/null
+++ b/doc/development/stage_group_observability/index.md
@@ -0,0 +1,138 @@
+---
+stage: Platforms
+group: Scalability
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Observability for stage groups
+
+Observability is about bringing visibility into a system to see and
+understand the state of each component, with context, to support
+performance tuning and debugging. To run a SaaS platform at scale, a
+rich and detailed observability platform is needed.
+
+To make information available to [stage groups](https://about.gitlab.com/handbook/product/categories/#hierarchy),
+we aggregate metrics by feature category and show
+this information on [dashboards](dashboards/index.md) tailored to the groups. Only metrics
+for the features built by the group are visible on their
+dashboards.
+
+With a filtered view, groups can discover bugs and performance regressions that could otherwise
+be missed when viewing aggregated data.
+
+For more specific information on dashboards, see:
+
+- [Dashboards](dashboards/index.md): a general overview of where to find dashboards
+  and how to use them.
+- [Stage group dashboard](dashboards/stage_group_dashboard.md): how to use and customize the stage group dashboard.
+- [Error budget detail](dashboards/error_budget_detail.md): how to explore the error budget over time.
+
+## Error budget
+
+The error budget is calculated from the same [Service Level Indicators](https://en.wikipedia.org/wiki/Service_level_indicator) (SLIs)
+that we use to monitor GitLab.com. The 28-day availability number for a
+stage group is comparable to the
+[monthly availability](https://about.gitlab.com/handbook/engineering/infrastructure/performance-indicators/#gitlabcom-availability)
+we calculate for GitLab.com, except it's scoped to the features of a group.
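+
+As a rough sketch of how the availability number and the budget spend relate
+(assuming time-based accounting over the 28-day window):
+
+```math
+\text{availability} \approx 1 - \frac{\text{budget spent}}{28 \times 24 \times 60 \text{ min}}
+```
+
+For example, about 20 minutes of budget spent over 28 days corresponds to
+roughly 99.95% availability.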
+ +To learn more about how we use error budgets, see the +[Engineering Error Budgets](https://about.gitlab.com/handbook/engineering/error-budgets/) handbook page. + +By default, the first row of panels on both dashboards shows the +[error budget for the stage group](https://about.gitlab.com/handbook/engineering/error-budgets/#budget-spend-by-stage-group). +This row shows how features owned by the group contribute to our +[overall availability](https://about.gitlab.com/handbook/engineering/infrastructure/performance-indicators/#gitlabcom-availability). + +The official budget is aggregated over the 28 days. You can see it on the +[stage group dashboard](dashboards/stage_group_dashboard.md). +The [error budget detail dashboard](dashboards/error_budget_detail.md) +allows customizing the range. + +We show the information in two formats: + +- Availability: this number can be compared to GitLab.com overall + availability target of 99.95% uptime. +- Budget Spent: time over the past 28 days that features owned by the group have not been performing + adequately. + +The budget is calculated based on indicators per component. Each +component can have two indicators: + +- [Apdex](https://en.wikipedia.org/wiki/Apdex): the rate of operations that performed adequately. + + The threshold for "performing adequately" is stored in our + [metrics catalog](https://gitlab.com/gitlab-com/runbooks/-/tree/master/metrics-catalog) + and depends on the service in question. For the Puma (Rails) component of the + [API](https://gitlab.com/gitlab-com/runbooks/-/blob/f22f40b2c2eab37d85e23ccac45e658b2c914445/metrics-catalog/services/api.jsonnet#L127), + [Git](https://gitlab.com/gitlab-com/runbooks/-/blob/f22f40b2c2eab37d85e23ccac45e658b2c914445/metrics-catalog/services/git.jsonnet#L216), + and + [Web](https://gitlab.com/gitlab-com/runbooks/-/blob/f22f40b2c2eab37d85e23ccac45e658b2c914445/metrics-catalog/services/web.jsonnet#L154) + services, that threshold is **5 seconds** when not opted in to the + [`rails_requests` SLI](../application_slis/rails_request_apdex.md). + + We've made this target configurable in [this project](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/525). + To learn how to customize the request Apdex, see + [Rails request Apdex SLI](../application_slis/rails_request_apdex.md). + This new Apdex measurement is not part of the error budget until you + [opt in](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1451). + + For Sidekiq job execution, the threshold depends on the + [job urgency](../sidekiq/worker_attributes.md#job-urgency). It is + [currently](https://gitlab.com/gitlab-com/runbooks/-/blob/f22f40b2c2eab37d85e23ccac45e658b2c914445/metrics-catalog/services/lib/sidekiq-helpers.libsonnet#L25-38) + **10 seconds** for high-urgency jobs and **5 minutes** for other jobs. + + Some stage groups might have more services. The thresholds for them are also in the metrics catalog. + +- Error rate: The rate of operations that had errors. + +The calculation of the ratio happens as follows: + +```math +\frac {operations\_meeting\_apdex + (total\_operations - operations\_with\_errors)} {total\_apdex\_measurements + total\_operations} +``` + +## Check where budget is being spent + +Both the [stage group dashboard](dashboards/stage_group_dashboard.md) +and the [error budget detail dashboard](dashboards/error_budget_detail.md) +show panels to see where the error budget was spent. The stage group +dashboard always shows a fixed 28 days. 
The error budget detail
+dashboard allows drilling down to the SLIs over time.
+
+The row below the error budget row is collapsed by default. Expanding
+it shows which component and violation type had the most offending
+operations in the past 28 days.
+
+![Error attribution](img/stage_group_dashboards_error_attribution.png)
+
+The first panel on the left shows a table with the number of errors per
+component. The first row in that table is the component with the biggest
+impact on the budget spent, so it's usually the best place to start digging.
+
+Commonly, the components that spend most of the budget are Sidekiq or Puma. The panel in
+the center explains what different violation types mean and how to dig
+deeper into the logs.
+
+The panel on the right provides links to Kibana that should reveal
+which endpoints or Sidekiq jobs are causing the errors.
+
+<i class="fa fa-youtube-play youtube" aria-hidden="true"></i>
+To learn how to use these panels and logs for
+determining which Rails endpoints are slow,
+see the [Error Budget Attribution for Purchase group](https://youtu.be/M9u6unON7bU) video.
+
+Other components visible in the table come from
+[service-level indicators](https://sre.google/sre-book/service-level-objectives/) (SLIs) defined
+in the [metrics catalog](https://gitlab.com/gitlab-com/runbooks/-/blob/master/metrics-catalog/README.md).
+
+For those types of failures, you can follow the link to the service
+dashboard linked from the `type` column. The service dashboard
+contains a row specifically for the SLI that is causing the budget
+spend, with links to logs and a description of what the
+component means.
+
+For example, see the `server` component of the `web-pages` service:
+
+![web-pages-server-component SLI](img/stage_group_dashboards_service_sli_detail.png)
+
+To add more SLIs tailored to specific features, you can use an [Application SLI](../application_slis/index.md).
diff --git a/doc/development/testing_guide/best_practices.md b/doc/development/testing_guide/best_practices.md
index fe0c4c13ba2..7ae49d33e91 100644
--- a/doc/development/testing_guide/best_practices.md
+++ b/doc/development/testing_guide/best_practices.md
@@ -21,7 +21,7 @@ a level that is difficult to manage.
 Test heuristics can help solve this problem. They concisely address many of the common ways bugs
 manifest themselves in our code. When designing our tests, take time to review known test heuristics to inform
 our test design. We can find some helpful heuristics documented in the Handbook in the
-[Test Engineering](https://about.gitlab.com/handbook/engineering/quality/test-engineering/#test-heuristics) section.
+[Test Engineering](https://about.gitlab.com/handbook/engineering/quality/quality-engineering/test-engineering/#test-heuristics) section.
## RSpec @@ -404,7 +404,7 @@ click_link _('UI testing docs') fill_in _('Search projects'), with: 'gitlab' # fill in text input with text -select _('Last updated'), from: 'Sort by' # select an option from a select input +select _('Updated date'), from: 'Sort by' # select an option from a select input check _('Checkbox label') uncheck _('Checkbox label') @@ -465,8 +465,8 @@ expect(page).to have_checked_field _('Checkbox label') expect(page).to have_unchecked_field _('Radio input label') expect(page).to have_select _('Sort by') -expect(page).to have_select _('Sort by'), selected: 'Last updated' # assert the option is selected -expect(page).to have_select _('Sort by'), options: ['Last updated', 'Created date', 'Due date'] # assert an exact list of options +expect(page).to have_select _('Sort by'), selected: 'Updated date' # assert the option is selected +expect(page).to have_select _('Sort by'), options: ['Updated date', 'Created date', 'Due date'] # assert an exact list of options expect(page).to have_select _('Sort by'), with_options: ['Created date', 'Due date'] # assert a partial list of options expect(page).to have_text _('Some paragraph text.') diff --git a/doc/development/testing_guide/end_to_end/best_practices.md b/doc/development/testing_guide/end_to_end/best_practices.md index e0f6cbe632d..bd9896934c7 100644 --- a/doc/development/testing_guide/end_to_end/best_practices.md +++ b/doc/development/testing_guide/end_to_end/best_practices.md @@ -279,6 +279,9 @@ When you add a new test that requires administrator access, apply the RSpec meta When running tests locally or configuring a pipeline, the environment variable `QA_CAN_TEST_ADMIN_FEATURES` can be set to `false` to skip tests that have the `:requires_admin` tag. +NOTE: +If the _only_ action in the test that requires administrator access is to toggle a feature flag, please use the `feature_flag` tag instead. More details can be found in [testing with feature flags](feature_flags.md). + ## Prefer `Commit` resource over `ProjectPush` In line with [using the API](#prefer-api-over-ui), use a `Commit` resource whenever possible. diff --git a/doc/development/testing_guide/end_to_end/execution_context_selection.md b/doc/development/testing_guide/end_to_end/execution_context_selection.md index 0fdcf0c8c3b..0a4c5fcf451 100644 --- a/doc/development/testing_guide/end_to_end/execution_context_selection.md +++ b/doc/development/testing_guide/end_to_end/execution_context_selection.md @@ -6,7 +6,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w # Execution context selection -Some tests are designed to be run against specific environments, or in specific [pipelines](https://about.gitlab.com/handbook/engineering/quality/guidelines/debugging-qa-test-failures/#scheduled-qa-test-pipelines) or jobs. We can specify the test execution context using the `only` and `except` metadata. +Some tests are designed to be run against specific environments, or in specific [pipelines](https://about.gitlab.com/handbook/engineering/quality/quality-engineering/debugging-qa-test-failures/#scheduled-qa-test-pipelines) or jobs. We can specify the test execution context using the `only` and `except` metadata. ## Available switches @@ -118,7 +118,7 @@ To run a test tagged with `except` locally, you can either: Similarly to specifying that a test should only run against a specific environment, it's also possible to quarantine a test only when it runs against a specific environment. The syntax is exactly the same, except that the `only: { ... 
}` -hash is nested in the [`quarantine: { ... }`](https://about.gitlab.com/handbook/engineering/quality/guidelines/debugging-qa-test-failures/#quarantining-tests) hash. +hash is nested in the [`quarantine: { ... }`](https://about.gitlab.com/handbook/engineering/quality/quality-engineering/debugging-qa-test-failures/#quarantining-tests) hash. For example, `quarantine: { only: { subdomain: :staging } }` only quarantines the test when run against `staging`. The quarantine feature can be explicitly disabled with the `DISABLE_QUARANTINE` environment variable. This can be useful when running tests locally. diff --git a/doc/development/testing_guide/end_to_end/feature_flags.md b/doc/development/testing_guide/end_to_end/feature_flags.md index c3e3f117c2b..47ebef37a4d 100644 --- a/doc/development/testing_guide/end_to_end/feature_flags.md +++ b/doc/development/testing_guide/end_to_end/feature_flags.md @@ -14,19 +14,45 @@ automatically authenticates as an administrator as long as you provide an approp token via `GITLAB_QA_ADMIN_ACCESS_TOKEN` (recommended), or provide `GITLAB_ADMIN_USERNAME` and `GITLAB_ADMIN_PASSWORD`. -Please be sure to include the tag `:requires_admin` so that the test can be skipped in environments -where administrator access is not available. +## `feature_flag` RSpec tag -WARNING: -You are strongly advised to [enable feature flags only for a group, project, user](../../feature_flags/index.md#feature-actors), -or [feature group](../../feature_flags/index.md#feature-groups). This makes it possible to -test a feature in a shared environment without affecting other users. +Please be sure to include the `feature_flag` tag so that the test can be skipped on the appropriate environments. -For example, the code below would enable a feature flag named `:feature_flag_name` for the project +**Optional metadata:** + +`name` + +- Format: `feature_flag: { name: 'feature_flag_name' }` +- Used only for informational purposes at this time. It should be included to help quickly determine what +feature flag is under test. + +`scope` + +- Format: `feature_flag: { name: 'feature_flag_name', scope: :project }` +- When `scope` is set to `:global`, the test will be **skipped on all live .com environments**. This is to avoid issues with feature flag changes affecting other tests or users on that environment. +- When `scope` is set to any other value (such as `:project`, `:group` or `:user`), or if no `scope` is specified, the test will only be **skipped on canary and production**. +This is due to the fact that admin access is not available there. + +**WARNING:** You are strongly advised to first try and [enable feature flags only for a group, project, user](../../feature_flags/index.md#feature-actors), +or [feature group](../../feature_flags/index.md#feature-groups). + +- If a global feature flag must be used, it is strongly recommended to apply `scope: :global` to the `feature_flag` metadata. This is, however, left up to the SET's discretion to determine the level of risk. + - For example, a test uses a global feature flag that only affects a small area of the application and is also needed to check for critical issues on live environments. + In such a scenario, it would be riskier to skip running the test. For cases like this, `scope` can be left out of the metadata so that it can still run in live environments + with admin access, such as staging. 
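+
+For example, a hypothetical test built around an instance-wide setting could be
+tagged as follows (the flag name here is illustrative, not a real flag):
+
+```ruby
+RSpec.describe 'instance-wide behavior', feature_flag: {
+    name: 'some_instance_wide_flag', # hypothetical flag name
+    scope: :global                   # skipped on all live .com environments
+  } do
+  # ...
+end
+```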
+
+**Note on `requires_admin`:** This tag should still be applied if there are other actions within the test that require administrator access and are unrelated to updating a
+feature flag (for example, creating a user via the API).
+
+The code below would enable a feature flag named `:feature_flag_name` for the project
 created by the test:
 
 ```ruby
-RSpec.describe "with feature flag enabled", :requires_admin do
+RSpec.describe "with feature flag enabled", feature_flag: {
+    name: 'feature_flag_name',
+    scope: :project
+  } do
+
 let(:project) { Resource::Project.fabricate_via_api! }
 
 before do
@@ -162,7 +188,7 @@ for details.
 
 ## Confirming that end-to-end tests pass with a feature flag enabled
 
-End-to-end tests should pass with a feature flag enabled before it is enabled on Staging or on GitLab.com. Tests that need to be updated should be identified as part of [quad-planning](https://about.gitlab.com/handbook/engineering/quality/quad-planning/). The relevant [counterpart Software Engineer in Test](https://about.gitlab.com/handbook/engineering/quality/#individual-contributors) is responsible for updating the tests or assisting another engineer to do so. However, if a change does not go through quad-planning and a required test update is not made, test failures could block deployment.
+End-to-end tests should pass with a feature flag enabled before it is enabled on Staging or on GitLab.com. Tests that need to be updated should be identified as part of [quad-planning](https://about.gitlab.com/handbook/engineering/quality/quality-engineering/quad-planning/). The relevant [counterpart Software Engineer in Test](https://about.gitlab.com/handbook/engineering/quality/#individual-contributors) is responsible for updating the tests or assisting another engineer to do so. However, if a change does not go through quad-planning and a required test update is not made, test failures could block deployment.
 
 ### Automatic test execution when a feature flag definition changes
diff --git a/doc/development/testing_guide/end_to_end/index.md b/doc/development/testing_guide/end_to_end/index.md
index dc989acbdcc..1e7cba9d247 100644
--- a/doc/development/testing_guide/end_to_end/index.md
+++ b/doc/development/testing_guide/end_to_end/index.md
@@ -135,7 +135,7 @@ The [existing scenarios](https://gitlab.com/gitlab-org/gitlab-qa/blob/master/doc
 that run in the downstream `gitlab-qa-mirror` pipeline include many tests, but there are times when you
 might want to run a test or a group of tests that are different than the groups in any of the existing
 scenarios.
 
-For example, when we [dequarantine](https://about.gitlab.com/handbook/engineering/quality/guidelines/debugging-qa-test-failures/#dequarantining-tests)
+For example, when we [dequarantine](https://about.gitlab.com/handbook/engineering/quality/quality-engineering/debugging-qa-test-failures/#dequarantining-tests)
 a flaky test we first want to make sure that it's no longer flaky.
 We can do that using the `ce:custom-parallel` and `ee:custom-parallel` jobs.
 Both are manual jobs that you can configure using custom variables.
@@ -281,6 +281,7 @@ Continued reading:
 - [Flows](flows.md)
 - [RSpec metadata/tags](rspec_metadata_tests.md)
 - [Execution context selection](execution_context_selection.md)
+- [Troubleshooting](troubleshooting.md)
 
 ## Where can I ask for help?
diff --git a/doc/development/testing_guide/end_to_end/rspec_metadata_tests.md b/doc/development/testing_guide/end_to_end/rspec_metadata_tests.md index f9b505a8271..45161404c73 100644 --- a/doc/development/testing_guide/end_to_end/rspec_metadata_tests.md +++ b/doc/development/testing_guide/end_to_end/rspec_metadata_tests.md @@ -11,41 +11,42 @@ This is a partial list of the [RSpec metadata](https://relishapp.com/rspec/rspec <!-- Please keep the tags in alphabetical order --> -| Tag | Description | -|-----|-------------| -| `:elasticsearch` | The test requires an Elasticsearch service. It is used by the [instance-level scenario](https://gitlab.com/gitlab-org/gitlab-qa#definitions) [`Test::Integration::Elasticsearch`](https://gitlab.com/gitlab-org/gitlab/-/blob/72b62b51bdf513e2936301cb6c7c91ec27c35b4d/qa/qa/ee/scenario/test/integration/elasticsearch.rb) to include only tests that require Elasticsearch. | -| `:except` | The test is to be run in their typical execution contexts _except_ as specified. See [test execution context selection](execution_context_selection.md) for more information. | -| `:geo` | The test requires two GitLab Geo instances - a primary and a secondary - to be spun up. | -| `:gitaly_cluster` | The test runs against a GitLab instance where repositories are stored on redundant Gitaly nodes behind a Praefect node. All nodes are [separate containers](../../../administration/gitaly/praefect.md#requirements). Tests that use this tag have a longer setup time since there are three additional containers that need to be started. | -| `:github` | The test requires a GitHub personal access token. | -| `:group_saml` | The test requires a GitLab instance that has SAML SSO enabled at the group level. Interacts with an external SAML identity provider. Paired with the `:orchestrated` tag. | -| `:instance_saml` | The test requires a GitLab instance that has SAML SSO enabled at the instance level. Interacts with an external SAML identity provider. Paired with the `:orchestrated` tag. | -| `:integrations` | This aims to test the available [integrations](../../../user/project/integrations/overview.md#integrations-listing). The test requires Docker to be installed in the run context. It will provision the containers and can be run against a local instance or using the `gitlab-qa` scenario `Test::Integration::Integrations` | -| `:service_ping_disabled` | The test interacts with the GitLab configuration service ping at the instance level to turn admin setting service ping checkbox on or off. This tag will have the test run only in the `service_ping_disabled` job and must be paired with the `:orchestrated` and `:requires_admin` tags. | -| `:jira` | The test requires a Jira Server. [GitLab-QA](https://gitlab.com/gitlab-org/gitlab-qa) provisions the Jira Server in a Docker container when the `Test::Integration::Jira` test scenario is run. -| `:kubernetes` | The test includes a GitLab instance that is configured to be run behind an SSH tunnel, allowing a TLS-accessible GitLab. This test also includes provisioning of at least one Kubernetes cluster to test against. _This tag is often be paired with `:orchestrated`._ | -| `:ldap_no_server` | The test requires a GitLab instance to be configured to use LDAP. To be used with the `:orchestrated` tag. It does not spin up an LDAP server at orchestration time. Instead, it creates the LDAP server at runtime. | -| `:ldap_no_tls` | The test requires a GitLab instance to be configured to use an external LDAP server with TLS not enabled. 
| -| `:ldap_tls` | The test requires a GitLab instance to be configured to use an external LDAP server with TLS enabled. | -| `:mattermost` | The test requires a GitLab Mattermost service on the GitLab instance. | -| `:mixed_env` | The test should only be executed in environments that have a paired canary version available through traffic routing based on the existence of the `gitlab_canary=true` cookie. Tests in this category are switching the cookie mid-test to validate mixed deployment environments. | -| `:object_storage` | The test requires a GitLab instance to be configured to use multiple [object storage types](../../../administration/object_storage.md). Uses MinIO as the object storage server. | -| `:only` | The test is only to be run in specific execution contexts. See [test execution context selection](execution_context_selection.md) for more information. | -| `:orchestrated` | The GitLab instance under test may be [configured by `gitlab-qa`](https://gitlab.com/gitlab-org/gitlab-qa/-/blob/master/docs/what_tests_can_be_run.md#orchestrated-tests) to be different to the default GitLab configuration, or `gitlab-qa` may launch additional services in separate Docker containers, or both. Tests tagged with `:orchestrated` are excluded when testing environments where we can't dynamically modify the GitLab configuration (for example, Staging). | -| `:packages` | The test requires a GitLab instance that has the [Package Registry](../../../administration/packages/#gitlab-package-registry-administration) enabled. | -| `:quarantine` | The test has been [quarantined](https://about.gitlab.com/handbook/engineering/quality/guidelines/debugging-qa-test-failures/#quarantining-tests), runs in a separate job that only includes quarantined tests, and is allowed to fail. The test is skipped in its regular job so that if it fails it doesn't hold up the pipeline. Note that you can also [quarantine a test only when it runs in a specific context](execution_context_selection.md#quarantine-a-test-for-a-specific-environment). | -| `:relative_url` | The test requires a GitLab instance to be installed under a [relative URL](../../../install/relative_url.md). | -| `:reliable` | The test has been [promoted to a reliable test](https://about.gitlab.com/handbook/engineering/quality/quality-engineering/reliable-tests/#promoting-an-existing-test-to-reliable) meaning it passes consistently in all pipelines, including merge requests. | -| `:repository_storage` | The test requires a GitLab instance to be configured to use multiple [repository storage paths](../../../administration/repository_storage_paths.md). Paired with the `:orchestrated` tag. | -| `:requires_admin` | The test requires an administrator account. Tests with the tag are excluded when run against Canary and Production environments. | -| `:requires_git_protocol_v2` | The test requires that Git protocol version 2 is enabled on the server. It's assumed to be enabled by default but if not the test can be skipped by setting `QA_CAN_TEST_GIT_PROTOCOL_V2` to `false`. | -| `:requires_praefect` | The test requires that the GitLab instance uses [Gitaly Cluster](../../../administration/gitaly/praefect.md) (a.k.a. Praefect) as the repository storage . It's assumed to be used by default but if not the test can be skipped by setting `QA_CAN_TEST_PRAEFECT` to `false`. | -| `:runner` | The test depends on and sets up a GitLab Runner instance, typically to run a pipeline. 
| -| `:skip_live_env` | The test is excluded when run against live deployed environments such as Staging, Canary, and Production. | -| `:skip_fips_env` | The test is excluded when run against an environment in FIPS mode. | -| `:skip_signup_disabled` | The test uses UI to sign up a new user and is skipped in any environment that does not allow new user registration via the UI. | -| `:smoke` | The test belongs to the test suite which verifies basic functionality of a GitLab instance.| -| `:smtp` | The test requires a GitLab instance to be configured to use an SMTP server. Tests SMTP notification email delivery from GitLab by using MailHog. | -| `:testcase` | The link to the test case issue in the [GitLab Project Test Cases](https://gitlab.com/gitlab-org/gitlab/-/quality/test_cases). | -| `:transient` | The test tests transient bugs. It is excluded by default. | -| `:issue`, `:issue_${num}` | Optional links to issues which might be related to the spec. Helps keep track of related issues and can also be used by tools that create test reports. Currently added automatically to `Allure` test report. Multiple tags can be used by adding an optional numeric suffix like `issue_1`, `issue_2` etc. | +| Tag | Description | +|-----------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `:elasticsearch` | The test requires an Elasticsearch service. It is used by the [instance-level scenario](https://gitlab.com/gitlab-org/gitlab-qa#definitions) [`Test::Integration::Elasticsearch`](https://gitlab.com/gitlab-org/gitlab/-/blob/72b62b51bdf513e2936301cb6c7c91ec27c35b4d/qa/qa/ee/scenario/test/integration/elasticsearch.rb) to include only tests that require Elasticsearch. | +| `:except` | The test is to be run in their typical execution contexts _except_ as specified. See [test execution context selection](execution_context_selection.md) for more information. | +| `:feature_flag` | The test uses a feature flag and therefore requires an administrator account to run. When `scope` is set to `:global`, the test will be skipped on all live .com environments. Otherwise, it will be skipped only on Canary and Production. See [testing with feature flags](../../../development/testing_guide/end_to_end/feature_flags.md) for more details. | +| `:geo` | The test requires two GitLab Geo instances - a primary and a secondary - to be spun up. | +| `:gitaly_cluster` | The test runs against a GitLab instance where repositories are stored on redundant Gitaly nodes behind a Praefect node. All nodes are [separate containers](../../../administration/gitaly/praefect.md#requirements). Tests that use this tag have a longer setup time since there are three additional containers that need to be started. | +| `:github` | The test requires a GitHub personal access token. | +| `:group_saml` | The test requires a GitLab instance that has SAML SSO enabled at the group level. Interacts with an external SAML identity provider. Paired with the `:orchestrated` tag. | +| `:instance_saml` | The test requires a GitLab instance that has SAML SSO enabled at the instance level. 
Interacts with an external SAML identity provider. Paired with the `:orchestrated` tag. | +| `:integrations` | This aims to test the available [integrations](../../../user/project/integrations/overview.md#integrations-listing). The test requires Docker to be installed in the run context. It will provision the containers and can be run against a local instance or using the `gitlab-qa` scenario `Test::Integration::Integrations` | +| `:service_ping_disabled` | The test interacts with the GitLab configuration service ping at the instance level to turn admin setting service ping checkbox on or off. This tag will have the test run only in the `service_ping_disabled` job and must be paired with the `:orchestrated` and `:requires_admin` tags. | +| `:jira` | The test requires a Jira Server. [GitLab-QA](https://gitlab.com/gitlab-org/gitlab-qa) provisions the Jira Server in a Docker container when the `Test::Integration::Jira` test scenario is run. | +| `:kubernetes` | The test includes a GitLab instance that is configured to be run behind an SSH tunnel, allowing a TLS-accessible GitLab. This test also includes provisioning of at least one Kubernetes cluster to test against. _This tag is often be paired with `:orchestrated`._ | +| `:ldap_no_server` | The test requires a GitLab instance to be configured to use LDAP. To be used with the `:orchestrated` tag. It does not spin up an LDAP server at orchestration time. Instead, it creates the LDAP server at runtime. | +| `:ldap_no_tls` | The test requires a GitLab instance to be configured to use an external LDAP server with TLS not enabled. | +| `:ldap_tls` | The test requires a GitLab instance to be configured to use an external LDAP server with TLS enabled. | +| `:mattermost` | The test requires a GitLab Mattermost service on the GitLab instance. | +| `:mixed_env` | The test should only be executed in environments that have a paired canary version available through traffic routing based on the existence of the `gitlab_canary=true` cookie. Tests in this category are switching the cookie mid-test to validate mixed deployment environments. | +| `:object_storage` | The test requires a GitLab instance to be configured to use multiple [object storage types](../../../administration/object_storage.md). Uses MinIO as the object storage server. | +| `:only` | The test is only to be run in specific execution contexts. See [test execution context selection](execution_context_selection.md) for more information. | +| `:orchestrated` | The GitLab instance under test may be [configured by `gitlab-qa`](https://gitlab.com/gitlab-org/gitlab-qa/-/blob/master/docs/what_tests_can_be_run.md#orchestrated-tests) to be different to the default GitLab configuration, or `gitlab-qa` may launch additional services in separate Docker containers, or both. Tests tagged with `:orchestrated` are excluded when testing environments where we can't dynamically modify the GitLab configuration (for example, Staging). | +| `:packages` | The test requires a GitLab instance that has the [Package Registry](../../../administration/packages/#gitlab-package-registry-administration) enabled. | +| `:quarantine` | The test has been [quarantined](https://about.gitlab.com/handbook/engineering/quality/quality-engineering/debugging-qa-test-failures/#quarantining-tests), runs in a separate job that only includes quarantined tests, and is allowed to fail. The test is skipped in its regular job so that if it fails it doesn't hold up the pipeline. 
+| `:relative_url` | The test requires a GitLab instance to be installed under a [relative URL](../../../install/relative_url.md). |
+| `:reliable` | The test has been [promoted to a reliable test](https://about.gitlab.com/handbook/engineering/quality/quality-engineering/reliable-tests/#promoting-an-existing-test-to-reliable), meaning it passes consistently in all pipelines, including merge requests. |
+| `:repository_storage` | The test requires a GitLab instance to be configured to use multiple [repository storage paths](../../../administration/repository_storage_paths.md). Paired with the `:orchestrated` tag. |
+| `:requires_admin` | The test requires an administrator account. Tests with the tag are excluded when run against Canary and Production environments. |
+| `:requires_git_protocol_v2` | The test requires that Git protocol version 2 is enabled on the server. It's assumed to be enabled by default, but if it is not, the test can be skipped by setting `QA_CAN_TEST_GIT_PROTOCOL_V2` to `false`. |
+| `:requires_praefect` | The test requires that the GitLab instance uses [Gitaly Cluster](../../../administration/gitaly/praefect.md) (also known as Praefect) as the repository storage. It's assumed to be used by default, but if it is not, the test can be skipped by setting `QA_CAN_TEST_PRAEFECT` to `false`. |
+| `:runner` | The test depends on and sets up a GitLab Runner instance, typically to run a pipeline. |
+| `:skip_live_env` | The test is excluded when run against live deployed environments such as Staging, Canary, and Production. |
+| `:skip_fips_env` | The test is excluded when run against an environment in FIPS mode. |
+| `:skip_signup_disabled` | The test uses the UI to sign up a new user and is skipped in any environment that does not allow new user registration via the UI. |
+| `:smoke` | The test belongs to the test suite which verifies basic functionality of a GitLab instance. |
+| `:smtp` | The test requires a GitLab instance to be configured to use an SMTP server. Tests SMTP notification email delivery from GitLab by using MailHog. |
+| `:testcase` | The link to the test case issue in the [GitLab Project Test Cases](https://gitlab.com/gitlab-org/gitlab/-/quality/test_cases). |
+| `:transient` | The test targets transient bugs. It is excluded by default. |
+| `:issue`, `:issue_${num}` | Optional links to issues which might be related to the spec. Helps keep track of related issues and can also be used by tools that create test reports. Currently added automatically to the `Allure` test report. Multiple tags can be used by adding an optional numeric suffix like `issue_1`, `issue_2`, and so on. |
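+
+The following sketch shows how a few of these tags might be attached to a spec as
+RSpec metadata. The file path, example names, and URLs below are illustrative only:
+
+```ruby
+# qa/specs/features/browser_ui/example_spec.rb (hypothetical path)
+module QA
+  RSpec.describe 'Manage', :smoke, :requires_admin do
+    # :testcase links the example to its test case issue.
+    it 'verifies basic instance functionality',
+       testcase: 'https://gitlab.com/gitlab-org/gitlab/-/quality/test_cases/12345' do
+      # ...
+    end
+
+    # Tags can also be applied per example, for instance to quarantine a
+    # single flaky example without affecting the rest of the file.
+    it 'exercises a known-flaky path', :quarantine,
+       issue_1: 'https://gitlab.com/gitlab-org/gitlab/-/issues/12345' do
+      # ...
+    end
+  end
+end
+```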
diff --git a/doc/development/testing_guide/end_to_end/running_tests_that_require_special_setup.md b/doc/development/testing_guide/end_to_end/running_tests_that_require_special_setup.md
index 49a9124253d..599e1104b72 100644
--- a/doc/development/testing_guide/end_to_end/running_tests_that_require_special_setup.md
+++ b/doc/development/testing_guide/end_to_end/running_tests_that_require_special_setup.md
@@ -299,7 +299,7 @@ Geo requires an EE license. To visit the Geo sites in your browser, you need a r
#### Notes
-- You can find the full image address from a pipeline by [following these instructions](https://about.gitlab.com/handbook/engineering/quality/guidelines/tips-and-tricks/#running-gitlab-qa-pipeline-against-a-specific-gitlab-release). You might be prompted to set the `GITLAB_QA_ACCESS_TOKEN` variable if you specify the full image address.
+- You can find the full image address from a pipeline by [following these instructions](https://about.gitlab.com/handbook/engineering/quality/quality-engineering/tips-and-tricks/#running-gitlab-qa-pipeline-against-a-specific-gitlab-release). You might be prompted to set the `GITLAB_QA_ACCESS_TOKEN` variable if you specify the full image address.
- You can increase the wait time for replication by setting `GEO_MAX_FILE_REPLICATION_TIME` and `GEO_MAX_DB_REPLICATION_TIME`. The default is 120 seconds.
- To save time during tests, create a Personal Access Token with API access on the Geo primary node, and pass that value in as `GITLAB_QA_ACCESS_TOKEN` and `GITLAB_QA_ADMIN_ACCESS_TOKEN`.
@@ -395,7 +395,7 @@ Tests that are tagged with `:mobile` can be run against specified mobile devices

Running directly against an environment like staging is not recommended because Sauce Labs test logs expose credentials. Therefore, it is best practice and the default to use a tunnel.

-For tunnel installation instructions, read [Sauce Connect Proxy Installation](https://docs.saucelabs.com/secure-connections/sauce-connect/installation). To start the tunnel, after following the installation above, copy the run command in Sauce Labs > Tunnels (must be logged in to Sauce Labs with the credentials found in 1Password) and run in terminal.
+For tunnel installation instructions, read [Sauce Connect Proxy Installation](https://docs.saucelabs.com/secure-connections/sauce-connect/installation/index.html). To start the tunnel, after following the installation above, copy the run command in Sauce Labs > Tunnels (must be logged in to Sauce Labs with the credentials found in 1Password) and run it in a terminal.

NOTE:
It is highly recommended to use `GITLAB_QA_ACCESS_TOKEN` to speed up tests and reduce flakiness.

diff --git a/doc/development/testing_guide/end_to_end/troubleshooting.md b/doc/development/testing_guide/end_to_end/troubleshooting.md
new file mode 100644
index 00000000000..951fb056a4c
--- /dev/null
+++ b/doc/development/testing_guide/end_to_end/troubleshooting.md
@@ -0,0 +1,69 @@
+---
+stage: none
+group: unassigned
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Troubleshooting end-to-end tests
+
+## See what the browser is doing
+
+If end-to-end tests fail, it can be very helpful to see what is happening in your
+browser when it fails. For example, if tests don't run at all, the test framework
+might be trying to open a URL that isn't valid on your machine. This problem becomes
+clearer if you see the page fail in the browser.
+
+To make the test framework show the browser as it runs the tests,
+set `WEBDRIVER_HEADLESS=false`. For example:
+
+```shell
+cd gitlab/qa
+WEBDRIVER_HEADLESS=false bundle exec bin/qa Test::Instance::All http://localhost:3000
+```
+
+## Enable logging
+
+Sometimes a test might fail and the failure stack trace doesn't provide enough
+information to determine what went wrong. You can get more information by setting
+`QA_DEBUG=true` to enable debug logs, which show what the test framework is attempting.
+For example: + +```shell +cd gitlab/qa +QA_DEBUG=true bundle exec bin/qa Test::Instance::All http://localhost:3000 +``` + +The test framework then outputs many logs showing the actions taken during +the tests: + +```plaintext +[date=2022-03-31 23:19:47 from=QA Tests] INFO -- Starting test: Create Merge request creation from fork can merge feature branch fork to mainline +[date=2022-03-31 23:19:49 from=QA Tests] DEBUG -- has_element? :login_page (wait: 0) returned: true +[date=2022-03-31 23:19:52 from=QA Tests] DEBUG -- filling :login_field with "root" +[date=2022-03-31 23:19:52 from=QA Tests] DEBUG -- filling :password_field with "*****" +[date=2022-03-31 23:19:52 from=QA Tests] DEBUG -- clicking :sign_in_button +``` + +## Tests don't run at all + +This section assumes you're running the tests locally (such as the GDK) and you're doing +so from the `gitlab/qa/` folder, not from `gitlab-qa`. For example, if you receive a +`Net::ReadTimeout` error, the browser might be unable to load the specified URL: + +```shell +cd gitlab/qa +bundle exec bin/qa Test::Instance::All http://localhost:3000 + +bundler: failed to load command: bin/qa (bin/qa) +Net::ReadTimeout: Net::ReadTimeout with #<TCPSocket:(closed)> +``` + +This error can happen if GitLab runs on an address that does not resolve from +`localhost`. For example, if you set GDK's `hostname` +[to a specific local IP address](https://gitlab.com/gitlab-org/gitlab-qa/-/blob/master/docs/run_qa_against_gdk.md#run-qa-tests-against-your-gdk-setup), +you must use that IP address instead of `localhost` in the command. +For example, if your IP is `192.168.0.12`: + +```shell +bundle exec bin/qa Test::Instance::All http://192.168.0.12:3000 +``` diff --git a/doc/development/testing_guide/review_apps.md b/doc/development/testing_guide/review_apps.md index 27d5ae70ed7..f5483a4b79c 100644 --- a/doc/development/testing_guide/review_apps.md +++ b/doc/development/testing_guide/review_apps.md @@ -172,8 +172,6 @@ subgraph "CNG-mirror pipeline" them in its [registry](https://gitlab.com/gitlab-org/build/CNG-mirror/container_registry). - We use the [`CNG-mirror`](https://gitlab.com/gitlab-org/build/CNG-mirror) project so that the `CNG`, (Cloud Native GitLab), project's registry is not overloaded with a lot of transient Docker images. - - Note that the official CNG images are built by the `cloud-native-image` - job, which runs only for tags, and triggers itself a [`CNG`](https://gitlab.com/gitlab-org/build/CNG) pipeline. 1. Once `review-build-cng` is done, the [`review-deploy`](https://gitlab.com/gitlab-org/gitlab/-/jobs/467724810) job deploys the Review App using [the official GitLab Helm chart](https://gitlab.com/gitlab-org/charts/gitlab/) to the [`review-apps`](https://console.cloud.google.com/kubernetes/clusters/details/us-central1-b/review-apps?project=gitlab-review-apps) @@ -224,14 +222,10 @@ If you need your Review App to stay up for a longer time, you can `review-deploy` job to update the "latest deployed at" time. The `review-cleanup` job that automatically runs in scheduled -pipelines (and is manual in merge request) stops stale Review Apps after 5 days, +pipelines stops stale Review Apps after 5 days, deletes their environment after 6 days, and cleans up any dangling Helm releases and Kubernetes resources after 7 days. -The `review-gcp-cleanup` job that automatically runs in scheduled pipelines -(and is manual in merge request) removes any dangling GCP network resources -that were not removed along with the Kubernetes resources. 
- ## Cluster configuration The cluster is configured via Terraform in the [`engineering-productivity-infrastructure`](https://gitlab.com/gitlab-org/quality/engineering-productivity-infrastructure) project. @@ -254,189 +248,7 @@ Leading indicators may be health check failures leading to restarts or majority The [Review Apps Overview dashboard](https://console.cloud.google.com/monitoring/classic/dashboards/6798952013815386466?project=gitlab-review-apps&timeDomain=1d) aids in identifying load spikes on the cluster, and if nodes are problematic or the entire cluster is trending towards unhealthy. -### Database related errors in `review-deploy`, `review-qa-smoke`, or `review-qa-reliable` - -Occasionally the state of a Review App's database could diverge from the database schema. This could be caused by -changes to migration files or schema, such as a migration being renamed or deleted. This typically manifests in migration errors such as: - -- migration job failing with a column that already exists -- migration job failing with a column that does not exist - -To recover from this, please attempt to [redeploy Review App from a clean slate](#redeploy-review-app-from-a-clean-slate) - -### Release failed with `ImagePullBackOff` - -**Potential cause:** - -If you see an `ImagePullBackoff` status, check for a missing Docker image. - -**Where to look for further debugging:** - -To check that the Docker images were created, run the following Docker command: - -```shell -`DOCKER_CLI_EXPERIMENTAL=enabled docker manifest repository:tag` -``` - -The output of this command indicates if the Docker image exists. For example: - -```shell -DOCKER_CLI_EXPERIMENTAL=enabled docker manifest inspect registry.gitlab.com/gitlab-org/build/cng-mirror/gitlab-rails-ee:39467-allow-a-release-s-associated-milestones-to-be-edited-thro -``` - -If the Docker image does not exist: - -- Verify the `image.repository` and `image.tag` options in the `helm upgrade --install` command match the repository names used by CNG-mirror pipeline. -- Look further in the corresponding downstream CNG-mirror pipeline in `review-build-cng` job. - -### Node count is always increasing (never stabilizing or decreasing) - -**Potential cause:** - -That could be a sign that the `review-cleanup` job is -failing to cleanup stale Review Apps and Kubernetes resources. - -**Where to look for further debugging:** - -Look at the latest `review-cleanup` job log, and identify look for any -unexpected failure. - -### p99 CPU utilization is at 100% for most of the nodes and/or many components - -**Potential cause:** - -This could be a sign that Helm is failing to deploy Review Apps. When Helm has a -lot of `FAILED` releases, it seems that the CPU utilization is increasing, probably -due to Helm or Kubernetes trying to recreate the components. - -**Where to look for further debugging:** - -Look at a recent `review-deploy` job log. - -**Useful commands:** - -```shell -# Identify if node spikes are common or load on specific nodes which may get rebalanced by the Kubernetes scheduler -kubectl top nodes | sort --key 3 --numeric - -# Identify pods under heavy CPU load -kubectl top pods | sort --key 2 --numeric -``` - -### The `logging/user/events/FailedMount` chart is going up - -**Potential cause:** - -This could be a sign that there are too many stale secrets and/or configuration maps. 
- -**Where to look for further debugging:** - -Look at [the list of Configurations](https://console.cloud.google.com/kubernetes/config?project=gitlab-review-apps) -or `kubectl get secret,cm --sort-by='{.metadata.creationTimestamp}' | grep 'review-'`. - -Any secrets or configuration maps older than 5 days are suspect and should be deleted. - -**Useful commands:** - -```shell -# List secrets and config maps ordered by created date -kubectl get secret,cm --sort-by='{.metadata.creationTimestamp}' | grep 'review-' - -# Delete all secrets that are 5 to 9 days old -kubectl get secret --sort-by='{.metadata.creationTimestamp}' | grep '^review-' | grep '[5-9]d$' | cut -d' ' -f1 | xargs kubectl delete secret - -# Delete all secrets that are 10 to 99 days old -kubectl get secret --sort-by='{.metadata.creationTimestamp}' | grep '^review-' | grep '[1-9][0-9]d$' | cut -d' ' -f1 | xargs kubectl delete secret - -# Delete all config maps that are 5 to 9 days old -kubectl get cm --sort-by='{.metadata.creationTimestamp}' | grep 'review-' | grep -v 'dns-gitlab-review-app' | grep '[5-9]d$' | cut -d' ' -f1 | xargs kubectl delete cm - -# Delete all config maps that are 10 to 99 days old -kubectl get cm --sort-by='{.metadata.creationTimestamp}' | grep 'review-' | grep -v 'dns-gitlab-review-app' | grep '[1-9][0-9]d$' | cut -d' ' -f1 | xargs kubectl delete cm -``` - -### Using K9s - -[K9s](https://github.com/derailed/k9s) is a powerful command line dashboard which allows you to filter by labels. This can help identify trends with apps exceeding the [review-app resource requests](https://gitlab.com/gitlab-org/gitlab/-/blob/master/scripts/review_apps/base-config.yaml). Kubernetes schedules pods to nodes based on resource requests and allow for CPU usage up to the limits. - -- In K9s you can sort or add filters by typing the `/` character - - `-lrelease=<review-app-slug>` - filters down to all pods for a release. This aids in determining what is having issues in a single deployment - - `-lapp=<app>` - filters down to all pods for a specific app. This aids in determining resource usage by app. -- You can scroll to a Kubernetes resource and hit `d`(describe), `s`(shell), `l`(logs) for a deeper inspection - -![K9s](img/k9s.png) - -### Troubleshoot a pending `dns-gitlab-review-app-external-dns` Deployment - -#### Finding the problem - -[In the past](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/62834), it happened -that the `dns-gitlab-review-app-external-dns` Deployment was in a pending state, -effectively preventing all the Review Apps from getting a DNS record assigned, -making them unreachable via domain name. - -This in turn prevented other components of the Review App to properly start -(for example, `gitlab-runner`). 
- -After some digging, we found that new mounts fail when performed -with transient scopes (for example, pods) of `systemd-mount`: - -```plaintext -MountVolume.SetUp failed for volume "dns-gitlab-review-app-external-dns-token-sj5jm" : mount failed: exit status 1 -Mounting command: systemd-run -Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/06add1c3-87b4-11e9-80a9-42010a800107/volumes/kubernetes.io~secret/dns-gitlab-review-app-external-dns-token-sj5jm --scope -- mount -t tmpfs tmpfs /var/lib/kubelet/pods/06add1c3-87b4-11e9-80a9-42010a800107/volumes/kubernetes.io~secret/dns-gitlab-review-app-external-dns-token-sj5jm -Output: Failed to start transient scope unit: Connection timed out -``` - -This probably happened because the GitLab chart creates 67 resources, leading to -a lot of mount points being created on the underlying GCP node. - -The [underlying issue seems to be a `systemd` bug](https://github.com/kubernetes/kubernetes/issues/57345#issuecomment-359068048) -that was fixed in `systemd` `v237`. Unfortunately, our GCP nodes are currently -using `v232`. - -For the record, the debugging steps to find out this issue were: - -1. Switch kubectl context to `review-apps-ce` (we recommend using [`kubectx`](https://github.com/ahmetb/kubectx/)) -1. `kubectl get pods | grep dns` -1. `kubectl describe pod <pod name>` & confirm exact error message -1. Web search for exact error message, following rabbit hole to [a relevant Kubernetes bug report](https://github.com/kubernetes/kubernetes/issues/57345) -1. Access the node over SSH via the GCP console (**Computer Engine > VM - instances** then click the "SSH" button for the node where the `dns-gitlab-review-app-external-dns` pod runs) -1. In the node: `systemctl --version` => `systemd 232` -1. Gather some more information: - - `mount | grep kube | wc -l` (returns a count, for example, 290) - - `systemctl list-units --all | grep -i var-lib-kube | wc -l` (returns a count, for example, 142) -1. Check how many pods are in a bad state: - - Get all pods running a given node: `kubectl get pods --field-selector=spec.nodeName=NODE_NAME` - - Get all the `Running` pods on a given node: `kubectl get pods --field-selector=spec.nodeName=NODE_NAME | grep Running` - - Get all the pods in a bad state on a given node: `kubectl get pods --field-selector=spec.nodeName=NODE_NAME | grep -v 'Running' | grep -v 'Completed'` - -#### Solving the problem - -To resolve the problem, we needed to (forcibly) drain some nodes: - -1. Try a normal drain on the node where the `dns-gitlab-review-app-external-dns` - pod runs so that Kubernetes automatically move it to another node: `kubectl drain NODE_NAME` -1. If that doesn't work, you can also perform a forcible "drain" the node by removing all pods: `kubectl delete pods --field-selector=spec.nodeName=NODE_NAME` -1. In the node: - - Perform `systemctl daemon-reload` to remove the dead/inactive units - - If that doesn't solve the problem, perform a hard reboot: `sudo systemctl reboot` -1. Uncordon any cordoned nodes: `kubectl uncordon NODE_NAME` - -In parallel, since most Review Apps were in a broken state, we deleted them to -clean up the list of non-`Running` pods. 
-Following is a command to delete Review Apps based on their last deployment date -(current date was June 6th at the time) with - -```shell -helm ls -d | grep "Jun 4" | cut -f1 | xargs helm delete --purge -``` - -#### Mitigation steps taken to avoid this problem in the future - -We've created a new node pool with smaller machines to reduce the risk -that a machine reaches the "too many mount points" problem in the future. +See the [review apps page of the Engineering Productivity Runbook](https://gitlab.com/gitlab-org/quality/engineering-productivity/team/-/blob/main/runbook/review-apps.md) for troubleshooting review app releases. ## Frequently Asked Questions diff --git a/doc/development/workhorse/channel.md b/doc/development/workhorse/channel.md new file mode 100644 index 00000000000..33d7cc63f00 --- /dev/null +++ b/doc/development/workhorse/channel.md @@ -0,0 +1,201 @@ +--- +stage: Create +group: Source Code +info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments +--- + +# Websocket channel support for Workhorse + +In some cases, GitLab can provide the following through a WebSocket: + +- In-browser terminal access to an environment: a running server or container, + onto which a project has been deployed. +- Access to services running in CI. + +Workhorse manages the WebSocket upgrade and long-lived connection to the websocket +connection, which frees up GitLab to process other requests. This document outlines +the architecture of these connections. + +## Introduction to WebSockets + +Websockets are an "upgraded" `HTTP/1.1` request. They permit bidirectional +communication between a client and a server. **Websockets are not HTTP**. +Clients can send messages (known as frames) to the server at any time, and +vice versa. Client messages are not necessarily requests, and server messages are +not necessarily responses. WebSocket URLs have schemes like `ws://` (unencrypted) or +`wss://` (TLS-secured). + +When requesting an upgrade to WebSocket, the browser sends a `HTTP/1.1` +request like this: + +```plaintext +GET /path.ws HTTP/1.1 +Connection: upgrade +Upgrade: websocket +Sec-WebSocket-Protocol: terminal.gitlab.com +# More headers, including security measures +``` + +At this point, the connection is still HTTP, so this is a request. +The server can send a normal HTTP response, such as `404 Not Found` or +`500 Internal Server Error`. + +If the server decides to permit the upgrade, it sends a HTTP +`101 Switching Protocols` response. From this point, the connection is no longer +HTTP. It is now a WebSocket and frames, not HTTP requests, flow over it. The connection +persists until the client or server closes the connection. + +In addition to the sub-protocol, individual websocket frames may +also specify a message type, such as: + +- `BinaryMessage` +- `TextMessage` +- `Ping` +- `Pong` +- `Close` + +Only binary frames can contain arbitrary data. The frames are expected to be valid +UTF-8 strings, in addition to any sub-protocol expectations. + +## Browser to Workhorse + +Using the terminal as an example: + +1. GitLab serves a JavaScript terminal emulator to the browser on a URL like + `https://gitlab.com/group/project/-/environments/1/terminal`. +1. This URL opens a websocket connection to + `wss://gitlab.com/group/project/-/environments/1/terminal.ws`. + This endpoint exists only in Workhorse, and doesn't exist in GitLab. +1. 
When receiving the connection, Workhorse first performs a `preauthentication`
+   request to GitLab to confirm the client is authorized to access the requested terminal:
+   - If the client has the appropriate permissions and the terminal exists, GitLab
+     responds with a successful response that includes details of the terminal
+     the client should be connected to.
+   - Otherwise, Workhorse returns an appropriate HTTP error response.
+1. If GitLab returns valid terminal details to Workhorse, it:
+   1. Connects to the specified terminal.
+   1. Upgrades the browser to a WebSocket.
+   1. Proxies between the two connections for as long as the browser's credentials are valid.
+   1. Sends regular `PingMessage` control frames to the browser, to prevent intervening
+      proxies from terminating the connection while the browser is present.
+
+The browser must request an upgrade with a specific sub-protocol:
+
+- [`terminal.gitlab.com`](#terminalgitlabcom)
+- [`base64.terminal.gitlab.com`](#base64terminalgitlabcom)
+
+### `terminal.gitlab.com`
+
+This sub-protocol considers `TextMessage` frames to be invalid. Control frames,
+such as `PingMessage` or `CloseMessage`, have their usual meanings.
+
+- `BinaryMessage` frames sent from the browser to the server are
+  arbitrary text input.
+- `BinaryMessage` frames sent from the server to the browser are
+  arbitrary text output.
+
+These frames are expected to contain ANSI text control codes
+and may be in any encoding.
+
+### `base64.terminal.gitlab.com`
+
+This sub-protocol considers `BinaryMessage` frames to be invalid.
+Control frames, such as `PingMessage` or `CloseMessage`, have
+their usual meanings.
+
+- `TextMessage` frames sent from the browser to the server are
+  base64-encoded arbitrary text input. The server must
+  base64-decode them before inputting them.
+- `TextMessage` frames sent from the server to the browser are
+  base64-encoded arbitrary text output. The browser must
+  base64-decode them before outputting them.
+
+In their base64-encoded form, these frames are expected to
+contain ANSI terminal control codes, and may be in any encoding.
+
+## Workhorse to GitLab
+
+Using the terminal as an example, before upgrading the browser,
+Workhorse sends a normal HTTP request to GitLab on a URL like
+`https://gitlab.com/group/project/environments/1/terminal.ws/authorize`.
+This returns a JSON response containing details of where the
+terminal can be found, and how to connect to it. In particular,
+the following details are returned in case of success:
+
+- WebSocket URL to connect to, such as `wss://example.com/terminals/1.ws?tty=1`.
+- WebSocket sub-protocols to support, such as `["channel.k8s.io"]`.
+- Headers to send, such as `Authorization: Token xxyyz`.
+- Optional. Certificate authority to verify `wss` connections with.
+
+Workhorse periodically rechecks this endpoint. If it receives an error response,
+or the details of the terminal change, it terminates the websocket session.
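+
+For illustration, a successful authorization response might look roughly like the
+Ruby hash below. The field names are assumptions made for this sketch; the
+authoritative schema lives in the Workhorse and Rails sources:
+
+```ruby
+# Hypothetical shape of the JSON returned by the authorize endpoint.
+terminal_details = {
+  'Terminal' => {
+    # Where Workhorse should open the upstream websocket connection.
+    'Url' => 'wss://example.com/terminals/1.ws?tty=1',
+    # Sub-protocols Workhorse is allowed to speak upstream.
+    'Subprotocols' => ['channel.k8s.io'],
+    # Headers to send when dialing the upstream server.
+    'Header' => { 'Authorization' => ['Token xxyyz'] },
+    # Optional certificate authority for verifying the wss connection.
+    'CAPem' => "-----BEGIN CERTIFICATE-----\n..."
+  }
+}
+```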
+
+## Workhorse to the WebSocket server
+
+In GitLab, environments or CI jobs may have a deployment service (like
+`KubernetesService`) associated with them. This service knows
+where the terminals or the service for an environment may be found, and GitLab
+returns these details to Workhorse.
+
+These URLs are also WebSocket URLs. GitLab tells Workhorse which sub-protocols to
+speak over the connection, along with any authentication details required by the
+remote end.
+
+Before upgrading the browser's connection to a websocket, Workhorse:
+
+1. Opens an HTTP client connection, according to the details given to it by GitLab.
+1. Attempts to upgrade that connection to a websocket.
+   - If it fails, an error response is sent to the browser.
+   - If it succeeds, the browser is also upgraded.
+
+Workhorse now has two websocket connections, albeit with differing sub-protocols,
+and then:
+
+- Decodes incoming frames from the browser, re-encodes them to the channel's
+  sub-protocol, and sends them to the channel.
+- Decodes incoming frames from the channel, re-encodes them to the browser's
+  sub-protocol, and sends them to the browser.
+
+When either connection closes or enters an error state, Workhorse detects the error
+and closes the other connection, terminating the channel session. If the browser
+is the connection that has disconnected, Workhorse sends an ANSI `End of Transmission`
+control code (the `0x04` byte) to the channel, encoded according to the appropriate
+sub-protocol. To avoid being disconnected, Workhorse replies to any websocket ping
+frame sent by the channel.
+
+Workhorse only supports the following sub-protocols:
+
+- [`channel.k8s.io`](#channelk8sio)
+- [`base64.channel.k8s.io`](#base64channelk8sio)
+
+Supporting new deployment services requires adding support for new sub-protocols.
+
+### `channel.k8s.io`
+
+Used by Kubernetes, this sub-protocol defines a simple multiplexed channel.
+
+Control frames have their usual meanings. `TextMessage` frames are
+invalid. `BinaryMessage` frames represent I/O to a specific file
+descriptor.
+
+The first byte of each `BinaryMessage` frame represents the file
+descriptor (`fd`) number, as a `uint8`. For example:
+
+- `0x00` corresponds to `fd 0`, `STDIN`.
+- `0x01` corresponds to `fd 1`, `STDOUT`.
+
+The remaining bytes represent arbitrary data. For frames received
+from the server, they are bytes that have been received from that
+`fd`. For frames sent to the server, they are bytes that should be
+written to that `fd`.
+
+### `base64.channel.k8s.io`
+
+Also used by Kubernetes, this sub-protocol defines a similar multiplexed
+channel to `channel.k8s.io`. The main differences are:
+
+- `TextMessage` frames are valid, rather than `BinaryMessage` frames.
+- The first byte of each `TextMessage` frame represents the file
+  descriptor as a numeric UTF-8 character, so the character `U+0030`,
+  or "0", is `fd 0`, `STDIN`.
+- The remaining bytes represent base64-encoded arbitrary data.
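+
+To make the framing concrete, here is a minimal sketch of how a `channel.k8s.io`
+frame could be assembled and taken apart, with the `base64` variant alongside.
+It is written in Ruby purely for illustration (Workhorse itself is Go), and the
+sample payload is invented:
+
+```ruby
+require 'base64'
+
+# channel.k8s.io: the first byte of a BinaryMessage frame is the fd number.
+def encode_frame(fd, data)
+  fd.chr + data
+end
+
+def decode_frame(frame)
+  [frame.bytes.first, frame[1..]]
+end
+
+frame = encode_frame(1, "hello from the pod\n") # fd 1 = STDOUT
+fd, payload = decode_frame(frame)
+# fd => 1, payload => "hello from the pod\n"
+
+# base64.channel.k8s.io: the fd is a single ASCII digit and the payload is
+# base64-encoded, carried in a TextMessage frame instead of a BinaryMessage.
+b64_frame = '1' + Base64.strict_encode64("hello from the pod\n")
+```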
diff --git a/doc/development/workhorse/configuration.md b/doc/development/workhorse/configuration.md
new file mode 100644
index 00000000000..7f9331e6f1e
--- /dev/null
+++ b/doc/development/workhorse/configuration.md
@@ -0,0 +1,218 @@
+---
+stage: Create
+group: Source Code
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Workhorse configuration
+
+For historical reasons, Workhorse uses:
+
+- Command line flags.
+- A configuration file.
+- Environment variables.
+
+Add any new Workhorse configuration options into the configuration file.
+
+## CLI options
+
+```plaintext
+  gitlab-workhorse [OPTIONS]
+
+Options:
+  -apiCiLongPollingDuration duration
+        Long polling duration for job requesting for runners (default 50ns)
+  -apiLimit uint
+        Number of API requests allowed at single time
+  -apiQueueDuration duration
+        Maximum queueing duration of requests (default 30s)
+  -apiQueueLimit uint
+        Number of API requests allowed to be queued
+  -authBackend string
+        Authentication/authorization backend (default "http://localhost:8080")
+  -authSocket string
+        Optional: Unix domain socket to dial authBackend at
+  -cableBackend string
+        Optional: ActionCable backend (default authBackend)
+  -cableSocket string
+        Optional: Unix domain socket to dial cableBackend at (default authSocket)
+  -config string
+        TOML file to load config from
+  -developmentMode
+        Allow the assets to be served from Rails app
+  -documentRoot string
+        Path to static files content (default "public")
+  -listenAddr string
+        Listen address for HTTP server (default "localhost:8181")
+  -listenNetwork string
+        Listen 'network' (tcp, tcp4, tcp6, unix) (default "tcp")
+  -listenUmask int
+        Umask for Unix socket
+  -logFile string
+        Log file location
+  -logFormat string
+        Log format to use defaults to text (text, json, structured, none) (default "text")
+  -pprofListenAddr string
+        pprof listening address, e.g. 'localhost:6060'
+  -prometheusListenAddr string
+        Prometheus listening address, e.g. 'localhost:9229'
+  -proxyHeadersTimeout duration
+        How long to wait for response headers when proxying the request (default 5m0s)
+  -secretPath string
+        File with secret key to authenticate with authBackend (default "./.gitlab_workhorse_secret")
+  -version
+        Print version and exit
+```
+
+The 'auth backend' refers to the GitLab Rails application. The name is
+a holdover from when GitLab Workhorse only handled `git push` and `git pull` over
+HTTP.
+
+GitLab Workhorse can listen on either a TCP or a Unix domain socket. It
+can also open a second TCP listening socket with the Go
+[`net/http/pprof` profiler server](http://golang.org/pkg/net/http/pprof/).
+
+GitLab Workhorse can listen for Redis build and runner registration events if you
+pass a valid TOML configuration file through the `-config` flag.
+A regular setup only requires the `[redis]` settings described in the next
+section, with the example values replaced by your actual socket and credentials.
+
+## Redis
+
+GitLab Workhorse integrates with Redis to do long polling for CI build
+requests. To configure it:
+
+- Configure Redis settings in the TOML configuration file.
+- Control polling behavior for CI build requests with the `-apiCiLongPollingDuration`
+  command-line flag.
+
+You can enable Redis in the configuration file while leaving CI polling
+disabled. This configuration results in an idle Redis Pub/Sub connection. The
+opposite is not possible: CI long polling requires a correct Redis configuration.
+
+For example, the `[redis]` section in the configuration file could contain:
+
+```plaintext
+[redis]
+URL = "unix:///var/run/gitlab/redis.sock"
+Password = "my_awesome_password"
+Sentinel = [ "tcp://sentinel1:23456", "tcp://sentinel2:23456" ]
+SentinelMaster = "mymaster"
+```
+
+- `URL` - A string in the format `unix://path/to/redis.sock` or `tcp://host:port`.
+- `Password` - Required only if your Redis instance is password-protected.
+- `Sentinel` - Required if you use Sentinel.
+
+If both `Sentinel` and `URL` are given, only `Sentinel` is used.
+
+Optional fields:
+
+```plaintext
+[redis]
+DB = 0
+MaxIdle = 1
+MaxActive = 1
+```
+
+- `DB` - The database to connect to. Defaults to `0`.
+- `MaxIdle` - How many idle connections can be in the Redis pool at once. Defaults to `1`.
+- `MaxActive` - How many connections the pool can keep. Defaults to `1`.
+
+## Relative URL support
+
+If you mount GitLab at a relative URL (like `example.com/gitlab`), use this
+relative URL in the `authBackend` setting:
+
+```plaintext
+gitlab-workhorse -authBackend http://localhost:8080/gitlab
+```
+
+## Interaction of authBackend and authSocket
+
+The interaction between `authBackend` and `authSocket` can be confusing.
+If `authSocket` is set, it overrides the host portion of `authBackend`, but not
+the relative path.
+
+In table form:
+
+| authBackend                    | authSocket        | Workhorse connects to | Rails relative URL |
+|--------------------------------|-------------------|-----------------------|--------------------|
+| unset                          | unset             | `localhost:8080`      | `/`                |
+| `http://localhost:3000`        | unset             | `localhost:3000`      | `/`                |
+| `http://localhost:3000/gitlab` | unset             | `localhost:3000`      | `/gitlab`          |
+| unset                          | `/path/to/socket` | `/path/to/socket`     | `/`                |
+| `http://localhost:3000`        | `/path/to/socket` | `/path/to/socket`     | `/`                |
+| `http://localhost:3000/gitlab` | `/path/to/socket` | `/path/to/socket`     | `/gitlab`          |
+
+The same applies to `cableBackend` and `cableSocket`.
+
+## Error tracking
+
+GitLab-Workhorse supports remote error tracking with [Sentry](https://sentry.io).
+To enable this feature, set the `GITLAB_WORKHORSE_SENTRY_DSN` environment variable.
+You can also set the `GITLAB_WORKHORSE_SENTRY_ENVIRONMENT` environment variable to
+use the Sentry environment feature to separate staging, production, and
+development.
+
+Omnibus GitLab (`/etc/gitlab/gitlab.rb`):
+
+```ruby
+gitlab_workhorse['env'] = {
+  'GITLAB_WORKHORSE_SENTRY_DSN' => 'https://foobar',
+  'GITLAB_WORKHORSE_SENTRY_ENVIRONMENT' => 'production'
+}
+```
+
+Source installations (`/etc/default/gitlab`):
+
+```plaintext
+export GITLAB_WORKHORSE_SENTRY_DSN='https://foobar'
+export GITLAB_WORKHORSE_SENTRY_ENVIRONMENT='production'
+```
+
+## Distributed tracing
+
+Workhorse supports distributed tracing through [LabKit](https://gitlab.com/gitlab-org/labkit/)
+using [OpenTracing APIs](https://opentracing.io).
+
+By default, no tracing implementation is linked into the binary. You can link in
+different OpenTracing providers with [build tags](https://golang.org/pkg/go/build/#hdr-Build_Constraints)
+or build constraints by setting the `BUILD_TAGS` make variable.
+
+For more details of the supported providers, refer to LabKit. For example, to
+include Jaeger tracing support, set `BUILD_TAGS="tracer_static tracer_static_jaeger"`,
+like this:
+
+```shell
+make BUILD_TAGS="tracer_static tracer_static_jaeger"
+```
+
+After you compile Workhorse with an OpenTracing provider, configure the tracing
+configuration with the `GITLAB_TRACING` environment variable, like this:
+
+```shell
+GITLAB_TRACING=opentracing://jaeger ./gitlab-workhorse
+```
+
+## Continuous profiling
+
+Workhorse supports continuous profiling through [LabKit](https://gitlab.com/gitlab-org/labkit/)
+using [Stackdriver Profiler](https://cloud.google.com/profiler). By default, the
+Stackdriver Profiler implementation is linked in the binary using
+[build tags](https://golang.org/pkg/go/build/#hdr-Build_Constraints), though it's not
+required and can be skipped. For example:
+
+```shell
+make BUILD_TAGS=""
+```
+
+After you compile Workhorse with continuous profiling, set the profiler configuration
+with the `GITLAB_CONTINUOUS_PROFILING` environment variable.
For example:
+
+```shell
+GITLAB_CONTINUOUS_PROFILING="stackdriver?service=workhorse&service_version=1.0.1&project_id=test-123" ./gitlab-workhorse
+```
+
+## Related topics
+
+- [LabKit monitoring documentation](https://gitlab.com/gitlab-org/labkit/-/blob/master/monitoring/doc.go).
diff --git a/doc/development/workhorse/gitlab_features.md b/doc/development/workhorse/gitlab_features.md
new file mode 100644
index 00000000000..2aa8d9d2399
--- /dev/null
+++ b/doc/development/workhorse/gitlab_features.md
@@ -0,0 +1,73 @@
+---
+stage: Create
+group: Source Code
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Features that rely on Workhorse
+
+Workhorse itself is not a feature, but there are several features in
+GitLab that would not work efficiently without Workhorse.
+
+To put the efficiency benefit in context, consider that in 2020Q3 on
+GitLab.com [we see](https://thanos-query.ops.gitlab.net/graph?g0.range_input=1h&g0.max_source_resolution=0s&g0.expr=sum(ruby_process_resident_memory_bytes%7Bapp%3D%22webservice%22%2Cenv%3D%22gprd%22%2Crelease%3D%22gitlab%22%7D)%20%2F%20sum(puma_max_threads%7Bapp%3D%22webservice%22%2Cenv%3D%22gprd%22%2Crelease%3D%22gitlab%22%7D)&g0.tab=1&g1.range_input=1h&g1.max_source_resolution=0s&g1.expr=sum(go_memstats_sys_bytes%7Bapp%3D%22webservice%22%2Cenv%3D%22gprd%22%2Crelease%3D%22gitlab%22%7D)%2Fsum(go_goroutines%7Bapp%3D%22webservice%22%2Cenv%3D%22gprd%22%2Crelease%3D%22gitlab%22%7D)&g1.tab=1)
+Rails application threads using on average
+about 200MB of RSS vs about 200KB for Workhorse goroutines.
+
+Examples of features that rely on Workhorse:
+
+## 1. `git clone` and `git push` over HTTP
+
+Git clone, pull, and push are slow because they transfer large amounts
+of data and because each is CPU intensive on the GitLab side. Without
+Workhorse, HTTP access to Git repositories would compete with regular
+web access to the application, requiring us to run way more Rails
+application servers.
+
+## 2. CI runner long polling
+
+GitLab CI runners fetch new CI jobs by polling the GitLab server.
+Workhorse acts as a kind of "waiting room" where CI runners can sit
+and wait for new CI jobs. Because of Go's efficiency we can fit a lot
+of runners in the waiting room at little cost. Without this waiting
+room mechanism we would have to add a lot more Rails server capacity.
+
+## 3. File uploads and downloads
+
+File uploads and downloads may be slow either because the file is
+large or because the user's connection is slow. Workhorse can handle
+the slow part for Rails. This improves the efficiency of features such
+as CI artifacts, package repositories, and LFS objects.
+
+## 4. Websocket proxying
+
+Features such as the web terminal require a long lived connection
+between the user's web browser and a container inside GitLab that is
+not directly accessible from the internet. Dedicating a Rails
+application thread to proxying such a connection would cost much more
+memory than it costs to have Workhorse look after it.
+
+## Quick facts (how does Workhorse work)
+
+- Workhorse can handle some requests without involving Rails at all:
+  for example, JavaScript files and CSS files are served straight
+  from disk.
+- Workhorse can modify responses sent by Rails: for example, if you use
+  `send_file` in Rails, GitLab Workhorse opens the file on
+  disk and sends its contents as the response body to the client
+  (see the sketch after this list).
+- Workhorse can take over requests after asking permission from Rails.
+  Example: handling `git clone`.
+- Workhorse can modify requests before passing them to Rails. Example:
+  when handling a Git LFS upload Workhorse first asks permission from
+  Rails, then it stores the request body in a tempfile, then it sends
+  a modified request containing the tempfile path to Rails.
+- Workhorse can manage long-lived WebSocket connections for Rails.
+  Example: handling the terminal websocket for environments.
+- Workhorse does not connect to PostgreSQL, only to Rails and (optionally) Redis.
+- We assume that all requests that reach Workhorse pass through an
+  upstream proxy such as NGINX or Apache first.
+- Workhorse does not accept HTTPS connections.
+- Workhorse does not clean up idle client connections.
+- We assume that all requests to Rails pass through Workhorse.
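+
+For example, a Rails action like the following sketch could serve a large file.
+With Workhorse and the Rack `Sendfile` middleware in front, the Ruby thread only
+emits headers; Workhorse reads the file from disk and streams it to the client.
+The controller name and path below are hypothetical:
+
+```ruby
+# Illustrative only: not an actual GitLab controller.
+class ArtifactsController < ApplicationController
+  def download
+    # With Workhorse in front, this call does not stream the file through
+    # Ruby; Rails responds with a file path header and Workhorse serves
+    # the bytes itself.
+    send_file '/var/opt/gitlab/artifacts/build.zip',
+              type: 'application/zip',
+              disposition: 'attachment'
+  end
+end
+```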
+
+For more information see ['A brief history of GitLab Workhorse'](https://about.gitlab.com/2016/04/12/a-brief-history-of-gitlab-workhorse/).
diff --git a/doc/development/workhorse/index.md b/doc/development/workhorse/index.md
new file mode 100644
index 00000000000..f7ca16e0f31
--- /dev/null
+++ b/doc/development/workhorse/index.md
@@ -0,0 +1,84 @@
+---
+stage: Create
+group: Source Code
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# GitLab Workhorse
+
+GitLab Workhorse is a smart reverse proxy for GitLab. It handles
+"large" HTTP requests such as file downloads, file uploads, Git
+push/pull and Git archive downloads.
+
+Workhorse itself is not a feature, but there are [several features in
+GitLab](gitlab_features.md) that would not work efficiently without Workhorse.
+
+The canonical source for Workhorse is
+[`gitlab-org/gitlab/workhorse`](https://gitlab.com/gitlab-org/gitlab/tree/master/workhorse).
+Prior to [epic #4826](https://gitlab.com/groups/gitlab-org/-/epics/4826), it was
+[`gitlab-org/gitlab-workhorse`](https://gitlab.com/gitlab-org/gitlab-workhorse/tree/master),
+but that repository is no longer used for development.
+
+## Install Workhorse
+
+To install GitLab Workhorse you need [Go 1.15 or newer](https://golang.org/dl) and
+[GNU Make](https://www.gnu.org/software/make/).
+
+To install into `/usr/local/bin`, run `make install`:
+
+```shell
+make install
+```
+
+To install into `/foo/bin`, set the `PREFIX` variable:
+
+```shell
+make install PREFIX=/foo
+```
+
+On some operating systems, such as FreeBSD, you may have to use
+`gmake` instead of `make`.
+
+*NOTE*: Some features depend on build tags; make sure to check
+[Workhorse configuration](configuration.md) to enable them.
+
+### Run time dependencies
+
+Workhorse uses [Exiftool](https://www.sno.phy.queensu.ca/~phil/exiftool/) for
+removing EXIF data (which may contain sensitive information) from uploaded
+images. If you installed GitLab:
+
+- Using the Omnibus package, you're all set.
+  *NOTE* that if you are using CentOS Minimal, you may need to install the `perl`
+  package: `yum install perl`
+- From source, make sure `exiftool` is installed:
+
+  ```shell
+  # Debian/Ubuntu
+  sudo apt-get install libimage-exiftool-perl
+
+  # RHEL/CentOS
+  sudo yum install perl-Image-ExifTool
+  ```
+
+## Testing your code
+
+Run the tests with:
+
+```shell
+make clean test
+```
+
+Each feature in GitLab Workhorse should have an integration test that
+verifies that the feature 'kicks in' on the right requests and leaves
+other requests unaffected. It is better to also have package-level tests
+for specific behavior, but the high-level integration tests should have
+the first priority during development.
+
+It is OK if a feature is only covered by integration tests.
+
+<!--
+## License
+
+This code is distributed under the MIT license, see the [LICENSE](LICENSE) file.
+-->
diff --git a/doc/development/workhorse/new_features.md b/doc/development/workhorse/new_features.md
new file mode 100644
index 00000000000..3ad15c1de16
--- /dev/null
+++ b/doc/development/workhorse/new_features.md
@@ -0,0 +1,78 @@
+---
+stage: Create
+group: Source Code
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Adding new features to Workhorse
+
+GitLab Workhorse is a smart reverse proxy for GitLab. It handles
+[long HTTP requests](#what-are-long-requests), such as:
+
+- File downloads.
+- File uploads.
+- Git pushes and pulls.
+- Git archive downloads.
+
+Workhorse itself is not a feature, but [several features in GitLab](gitlab_features.md)
+would not work efficiently without Workhorse.
+
+At first glance, Workhorse appears to be just a pipeline for processing HTTP
+streams to reduce the amount of logic in your Ruby on Rails controller. However,
+don't treat it that way. Engineers trying to offload a feature to Workhorse often
+find it takes more work than originally anticipated:
+
+- It's a new programming language, and only a few engineers at GitLab are Go developers.
+- Workhorse has demanding requirements:
+  - It's stateless.
+  - Memory and disk usage must be kept under tight control.
+  - The request should not be slowed down in the process.
+
+## Avoid adding new features
+
+We suggest adding new features only if absolutely necessary and no other options exist.
+Splitting a feature between the Rails codebase and Workhorse is a deliberate choice
+to introduce technical debt. It adds complexity to the system, and coupling between
+the two components:
+
+- Building features using Workhorse has a considerable complexity cost, so you should
+  prefer designs based on Rails requests and Sidekiq jobs.
+- Even when using Rails-and-Sidekiq is more work than using Rails-and-Workhorse,
+  Rails-and-Sidekiq is easier to maintain in the long term. Workhorse is unique
+  to GitLab, while Rails-and-Sidekiq is an industry standard.
+- For global behaviors around web requests, consider using a Rack middleware
+  instead of Workhorse.
+- Generally speaking, use Rails-and-Workhorse only if the HTTP client expects
+  behavior reasonable to implement in Rails, like long requests.
+
+## What are long requests?
+
+One order of magnitude exists between Workhorse and Puma RAM usage. Having a connection
+open for longer than milliseconds is problematic due to the amount of RAM
+it monopolizes after it reaches the Ruby on Rails controller.
We've identified two classes
+of long requests: data transfers and HTTP long polling. Some examples:
+
+- `git push`.
+- `git pull`.
+- Uploading or downloading an artifact.
+- A CI runner waiting for a new job.
+
+With the rise of cloud-native installations, Workhorse's feature set was extended
+to add object storage direct-upload. This change removed the need for shared
+Network File System (NFS) drives.
+
+If you still think we should add a new feature to Workhorse, open an issue for the
+Workhorse maintainers and explain:
+
+1. What you want to implement.
+1. Why it can't be implemented in our Ruby codebase.
+
+The Workhorse maintainers can help you assess the situation.
+
+## Related topics
+
+- In 2020, `@nolith` presented the talk
+  ["Speed up the monolith. Building a smart reverse proxy in Go"](https://archive.fosdem.org/2020/schedule/event/speedupmonolith/)
+  at FOSDEM. The talk includes more details on the history of Workhorse and the NFS removal.
+- The [uploads development documentation](../uploads.md) contains the most common
+  use cases for adding a new type of upload.