summaryrefslogtreecommitdiff
path: root/doc/development/elasticsearch.md
diff options
context:
space:
mode:
Diffstat (limited to 'doc/development/elasticsearch.md')
-rw-r--r--doc/development/elasticsearch.md63
1 files changed, 55 insertions, 8 deletions
diff --git a/doc/development/elasticsearch.md b/doc/development/elasticsearch.md
index 185f536fc01..9f54386f1af 100644
--- a/doc/development/elasticsearch.md
+++ b/doc/development/elasticsearch.md
@@ -24,19 +24,19 @@ See the [Elasticsearch GDK setup instructions](https://gitlab.com/gitlab-org/git
- `gitlab:elastic:test:index_size`: Tells you how much space the current index is using, as well as how many documents are in the index.
- `gitlab:elastic:test:index_size_change`: Outputs index size, reindexes, and outputs index size again. Useful when testing improvements to indexing size.
-Additionally, if you need large repos or multiple forks for testing, please consider [following these instructions](rake_tasks.md#extra-project-seed-options)
+Additionally, if you need large repositories or multiple forks for testing, please consider [following these instructions](rake_tasks.md#extra-project-seed-options)
## How does it work?
-The Elasticsearch integration depends on an external indexer. We ship an [indexer written in Go](https://gitlab.com/gitlab-org/gitlab-elasticsearch-indexer). The user must trigger the initial indexing via a Rake task but, after this is done, GitLab itself will trigger reindexing when required via `after_` callbacks on create, update, and destroy that are inherited from [/ee/app/models/concerns/elastic/application_versioned_search.rb](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/app/models/concerns/elastic/application_versioned_search.rb).
+The Elasticsearch integration depends on an external indexer. We ship an [indexer written in Go](https://gitlab.com/gitlab-org/gitlab-elasticsearch-indexer). The user must trigger the initial indexing via a Rake task but, after this is done, GitLab itself will trigger reindexing when required via `after_` callbacks on create, update, and destroy that are inherited from [`/ee/app/models/concerns/elastic/application_versioned_search.rb`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/app/models/concerns/elastic/application_versioned_search.rb).
-After initial indexing is complete, create, update, and delete operations for all models except projects (see [#207494](https://gitlab.com/gitlab-org/gitlab/issues/207494)) are tracked in a Redis [`ZSET`](https://redis.io/topics/data-types#sorted-sets). A regular `sidekiq-cron` `ElasticIndexBulkCronWorker` processes this queue, updating many Elasticsearch documents at a time with the [Bulk Request API](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html).
+After initial indexing is complete, create, update, and delete operations for all models except projects (see [#207494](https://gitlab.com/gitlab-org/gitlab/-/issues/207494)) are tracked in a Redis [`ZSET`](https://redis.io/topics/data-types#sorted-sets). A regular `sidekiq-cron` `ElasticIndexBulkCronWorker` processes this queue, updating many Elasticsearch documents at a time with the [Bulk Request API](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html).
-Search queries are generated by the concerns found in [ee/app/models/concerns/elastic](https://gitlab.com/gitlab-org/gitlab/tree/master/ee/app/models/concerns/elastic). These concerns are also in charge of access control, and have been a historic source of security bugs so please pay close attention to them!
+Search queries are generated by the concerns found in [`ee/app/models/concerns/elastic`](https://gitlab.com/gitlab-org/gitlab/tree/master/ee/app/models/concerns/elastic). These concerns are also in charge of access control, and have been a historic source of security bugs so please pay close attention to them!
## Existing Analyzers/Tokenizers/Filters
-These are all defined in [ee/lib/elastic/latest/config.rb](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/elastic/latest/config.rb)
+These are all defined in [`ee/lib/elastic/latest/config.rb`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/elastic/latest/config.rb)
### Analyzers
@@ -54,7 +54,7 @@ Please see the `sha_tokenizer` explanation later below for an example.
#### `code_analyzer`
-Used when indexing a blob's filename and content. Uses the `whitespace` tokenizer and the filters: [`code`](#code), [`edgeNGram_filter`](#edgengram_filter), `lowercase`, and `asciifolding`
+Used when indexing a blob's filename and content. Uses the `whitespace` tokenizer and the filters: [`code`](#code), `lowercase`, and `asciifolding`
The `whitespace` tokenizer was selected in order to have more control over how tokens are split. For example the string `Foo::bar(4)` needs to generate tokens like `Foo` and `bar(4)` in order to be properly searched.
@@ -71,7 +71,7 @@ Not directly used for indexing, but rather used to transform a search input. Use
#### `sha_tokenizer`
-This is a custom tokenizer that uses the [`edgeNGram` tokenizer](https://www.elastic.co/guide/en/elasticsearch/reference/5.5/analysis-edgengram-tokenizer.html) to allow SHAs to be searcheable by any sub-set of it (minimum of 5 chars).
+This is a custom tokenizer that uses the [`edgeNGram` tokenizer](https://www.elastic.co/guide/en/elasticsearch/reference/5.5/analysis-edgengram-tokenizer.html) to allow SHAs to be searchable by any sub-set of it (minimum of 5 chars).
Example:
@@ -149,7 +149,7 @@ These proxy objects would talk to Elasticsearch server directly (see top half of
![Elasticsearch Architecture](img/elasticsearch_architecture.svg)
-In the planned new design, each model would have a pair of corresponding subclassed proxy objects, in which model-specific logic is located. For example, `Snippet` would have `SnippetClassProxy` and `SnippetInstanceProxy` (being subclass of `Elasticsearch::Model::Proxy::ClassMethodsProxy` and `Elasticsearch::Model::Proxy::InstanceMethodsProxy`, respectively).
+In the planned new design, each model would have a pair of corresponding sub-classed proxy objects, in which model-specific logic is located. For example, `Snippet` would have `SnippetClassProxy` and `SnippetInstanceProxy` (being subclass of `Elasticsearch::Model::Proxy::ClassMethodsProxy` and `Elasticsearch::Model::Proxy::InstanceMethodsProxy`, respectively).
`__elasticsearch__` would represent another layer of proxy object, keeping track of multiple actual proxy objects. It would forward method calls to the appropriate index. For example:
@@ -175,6 +175,53 @@ If the current version is `v12p1`, and we need to create a new version for `v12p
1. Change the namespace for files under `v12p1` folder from `Latest` to `V12p1`
1. Make changes to files under the `latest` folder as needed
+## Performance Monitoring
+
+### Prometheus
+
+GitLab exports [Prometheus
+metrics](../administration/monitoring/prometheus/gitlab_metrics.md) relating to
+the number of requests and timing for all web/API requests and Sidekiq jobs,
+which can help diagnose performance trends and compare how Elasticsearch timing
+is impacting overall performance relative to the time spent doing other things.
+
+#### Indexing queues
+
+GitLab also exports [Prometheus
+metrics](../administration/monitoring/prometheus/gitlab_metrics.md) for
+indexing queues, which can help diagnose performance bottlenecks and determine
+whether or not your GitLab instance or Elasticsearch server can keep up with
+the volume of updates.
+
+### Logs
+
+All of the indexing happens in Sidekiq, so much of the relevant logs for the
+Elasticsearch integration can be found in
+[`sidekiq.log`](../administration/logs.md#sidekiqlog). In particular, all
+Sidekiq workers that make requests to Elasticsearch in any way will log the
+number of requests and time taken querying/writing to Elasticsearch. This can
+be useful to understand whether or not your cluster is keeping up with
+indexing.
+
+Searching Elasticsearch is done via ordinary web workers handling requests. Any
+requests to load a page or make an API request, which then make requests to
+Elasticsearch, will log the number of requests and the time taken to
+[`production_json.log`](../administration/logs.md#production_jsonlog). These
+logs will also include the time spent on Database and Gitaly requests, which
+may help to diagnose which part of the search is performing poorly.
+
+There are additional logs specific to Elasticsearch that are sent to
+[`elasticsearch.log`](../administration/logs.md#elasticsearchlog-starter-only)
+that may contain information to help diagnose performance issues.
+
+### Performance Bar
+
+Elasticsearch requests will be displayed in the [`Performance
+Bar`](../administration/monitoring/performance/performance_bar.md), which can
+be used both locally in development and on any deployed GitLab instance to
+diagnose poor search performance. This will show the exact queries being made,
+which is useful to diagnose why a search might be slow.
+
## Troubleshooting
### Getting `flood stage disk watermark [95%] exceeded`