diff options
Diffstat (limited to 'doc/development/elasticsearch.md')
-rw-r--r-- | doc/development/elasticsearch.md | 43 |
1 files changed, 33 insertions, 10 deletions
diff --git a/doc/development/elasticsearch.md b/doc/development/elasticsearch.md index cde221aaf16..1c92601dde9 100644 --- a/doc/development/elasticsearch.md +++ b/doc/development/elasticsearch.md @@ -1,7 +1,7 @@ --- stage: Enablement group: Global Search -info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers +info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments --- # Elasticsearch knowledge **(STARTER ONLY)** @@ -13,9 +13,9 @@ the [Elasticsearch integration documentation](../integration/elasticsearch.md#en ## Deep Dive -In June 2019, Mario de la Ossa hosted a Deep Dive (GitLab team members only: `https://gitlab.com/gitlab-org/create-stage/issues/1`) on GitLab's [Elasticsearch integration](../integration/elasticsearch.md) to share his domain specific knowledge with anyone who may work in this part of the code base in the future. You can find the [recording on YouTube](https://www.youtube.com/watch?v=vrvl-tN2EaA), and the slides on [Google Slides](https://docs.google.com/presentation/d/1H-pCzI_LNrgrL5pJAIQgvLX8Ji0-jIKOg1QeJQzChug/edit) and in [PDF](https://gitlab.com/gitlab-org/create-stage/uploads/c5aa32b6b07476fa8b597004899ec538/Elasticsearch_Deep_Dive.pdf). Everything covered in this deep dive was accurate as of GitLab 12.0, and while specific details may have changed since then, it should still serve as a good introduction. +In June 2019, Mario de la Ossa hosted a Deep Dive (GitLab team members only: `https://gitlab.com/gitlab-org/create-stage/issues/1`) on the GitLab [Elasticsearch integration](../integration/elasticsearch.md) to share his domain specific knowledge with anyone who may work in this part of the codebase in the future. You can find the [recording on YouTube](https://www.youtube.com/watch?v=vrvl-tN2EaA), and the slides on [Google Slides](https://docs.google.com/presentation/d/1H-pCzI_LNrgrL5pJAIQgvLX8Ji0-jIKOg1QeJQzChug/edit) and in [PDF](https://gitlab.com/gitlab-org/create-stage/uploads/c5aa32b6b07476fa8b597004899ec538/Elasticsearch_Deep_Dive.pdf). Everything covered in this deep dive was accurate as of GitLab 12.0, and while specific details may have changed since then, it should still serve as a good introduction. -In August 2020, a second Deep Dive was hosted, focusing on [GitLab's specific architecture for multi-indices support](#zero-downtime-reindexing-with-multiple-indices). The [recording on YouTube](https://www.youtube.com/watch?v=0WdPR9oB2fg) and the [slides](https://lulalala.gitlab.io/gitlab-elasticsearch-deepdive) are available. Everything covered in this deep dive was accurate as of GitLab 13.3. +In August 2020, a second Deep Dive was hosted, focusing on [GitLab-specific architecture for multi-indices support](#zero-downtime-reindexing-with-multiple-indices). The [recording on YouTube](https://www.youtube.com/watch?v=0WdPR9oB2fg) and the [slides](https://lulalala.gitlab.io/gitlab-elasticsearch-deepdive/) are available. Everything covered in this deep dive was accurate as of GitLab 13.3. ## Supported Versions @@ -68,7 +68,7 @@ The `whitespace` tokenizer was selected in order to have more control over how t Please see the `code` filter for an explanation on how tokens are split. -NOTE: **Note:** +NOTE: Currently the [Elasticsearch code_analyzer doesn't account for all code cases](../integration/elasticsearch.md#known-issues). #### `code_search_analyzer` @@ -119,7 +119,7 @@ Patterns: - `'"((?:\\"|[^"]|\\")*)"'`: captures terms inside quotes, removing the quotes - `"'((?:\\'|[^']|\\')*)'"`: same as above, for single-quotes - `'\.([^.]+)(?=\.|\s|\Z)'`: separate terms with periods in-between -- `'([\p{L}_.-]+)'`: some common chars in file names to keep the whole filename intact (eg. `my_file-ñame.txt`) +- `'([\p{L}_.-]+)'`: some common chars in file names to keep the whole filename intact (for example `my_file-ñame.txt`) - `'([\p{L}\d_]+)'`: letters, numbers and underscores are the most common tokens in programming. Always capture them greedily regardless of context. ## Gotchas @@ -129,7 +129,7 @@ Patterns: ## Zero downtime reindexing with multiple indices -NOTE: **Note:** +NOTE: This is not applicable yet as multiple indices functionality is not fully implemented. Currently GitLab can only handle a single version of setting. Any setting/schema changes would require reindexing everything from scratch. Since reindexing can take a long time, this can cause search functionality downtime. @@ -168,12 +168,12 @@ The global configurations per version are now in the `Elastic::(Version)::Config ### Creating new version of schema -NOTE: **Note:** +NOTE: This is not applicable yet as multiple indices functionality is not fully implemented. Folders like `ee/lib/elastic/v12p1` contain snapshots of search logic from different versions. To keep a continuous Git history, the latest version lives under `ee/lib/elastic/latest`, but its classes are aliased under an actual version (e.g. `ee/lib/elastic/v12p3`). When referencing these classes, never use the `Latest` namespace directly, but use the actual version (e.g. `V12p3`). -The version name basically follows GitLab's release version. If setting is changed in 12.3, we will create a new namespace called `V12p3` (p stands for "point"). Raise an issue if there is a need to name a version differently. +The version name basically follows the GitLab release version. If setting is changed in 12.3, we will create a new namespace called `V12p3` (p stands for "point"). Raise an issue if there is a need to name a version differently. If the current version is `v12p1`, and we need to create a new version for `v12p3`, the steps are as follows: @@ -188,11 +188,11 @@ If the current version is `v12p1`, and we need to create a new version for `v12p > This functionality was introduced by [#234046](https://gitlab.com/gitlab-org/gitlab/-/issues/234046). -NOTE: **Note:** +NOTE: This only supported for indices created with GitLab 13.0 or greater. Migrations are stored in the [`ee/elastic/migrate/`](https://gitlab.com/gitlab-org/gitlab/-/tree/master/ee/elastic/migrate) folder with `YYYYMMDDHHMMSS_migration_name.rb` -file name format, which is similar to Rails database migrations: +filename format, which is similar to Rails database migrations: ```ruby # frozen_string_literal: true @@ -216,6 +216,29 @@ cron worker sequentially. Any update to the Elastic index mappings should be replicated in [`Elastic::Latest::Config`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/elastic/latest/config.rb). +### Migration options supported by the [`Elastic::MigrationWorker`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/app/workers/elastic/migration_worker.rb) + +- `batched!` - Allow the migration to run in batches. If set, the [`Elastic::MigrationWorker`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/app/workers/elastic/migration_worker.rb) +will re-enqueue itself with a delay which is set using the `throttle_delay` option described below. The batching +must be handled within the `migrate` method, this setting controls the re-enqueuing only. + +- `throttle_delay` - Sets the wait time in between batch runs. This time should be set high enough to allow each migration batch +enough time to finish. Additionally, the time should be less than 30 minutes since that is how often the +[`Elastic::MigrationWorker`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/app/workers/elastic/migration_worker.rb) +cron worker runs. Default value is 5 minutes. + +```ruby +# frozen_string_literal: true + +class BatchedMigrationName < Elastic::Migration + # Declares a migration should be run in batches + batched! + throttle_delay 10.minutes + + # ... +end +``` + ## Performance Monitoring ### Prometheus |