diff --git a/doc/development/elasticsearch.md b/doc/development/elasticsearch.md
index d32ceb43ce9..47942817790 100644
--- a/doc/development/elasticsearch.md
+++ b/doc/development/elasticsearch.md
@@ -38,7 +38,7 @@ Additionally, if you need large repositories or multiple forks for testing, plea
The Elasticsearch integration depends on an external indexer. We ship an [indexer written in Go](https://gitlab.com/gitlab-org/gitlab-elasticsearch-indexer). The user must trigger the initial indexing via a Rake task but, after this is done, GitLab itself will trigger reindexing when required via `after_` callbacks on create, update, and destroy that are inherited from [`/ee/app/models/concerns/elastic/application_versioned_search.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/models/concerns/elastic/application_versioned_search.rb).
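The callback-driven tracking described above can be sketched in plain Ruby. This is a minimal illustration of the pattern only, not the actual `ApplicationVersionedSearch` concern: the real code hooks into ActiveRecord's `after_create`/`after_update`/`after_destroy`, and the method and class names below are hypothetical.

```ruby
# Minimal sketch of callback-based change tracking (hypothetical names;
# the real concern uses ActiveRecord after_* callbacks, not a manual #save).
module VersionedSearch
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def tracked_changes
      @tracked_changes ||= []
    end
  end

  # In the real concern these fire automatically from after_* callbacks.
  def maintain_elasticsearch_create
    self.class.tracked_changes << [:index, self]
  end

  def maintain_elasticsearch_destroy
    self.class.tracked_changes << [:delete, self]
  end
end

class Issue
  include VersionedSearch

  def save
    maintain_elasticsearch_create # simulate the after_create callback firing
  end
end
```

The point of the pattern is that models opt in by including one concern, so every create/update/destroy is recorded for later reindexing without each model re-implementing the bookkeeping.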
-After initial indexing is complete, create, update, and delete operations for all models except projects (see [#207494](https://gitlab.com/gitlab-org/gitlab/-/issues/207494)) are tracked in a Redis [`ZSET`](https://redis.io/topics/data-types#sorted-sets). A regular `sidekiq-cron` `ElasticIndexBulkCronWorker` processes this queue, updating many Elasticsearch documents at a time with the [Bulk Request API](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html).
+After initial indexing is complete, create, update, and delete operations for all models except projects (see [#207494](https://gitlab.com/gitlab-org/gitlab/-/issues/207494)) are tracked in a Redis [`ZSET`](https://redis.io/docs/manual/data-types/#sorted-sets). A regular `sidekiq-cron` `ElasticIndexBulkCronWorker` processes this queue, updating many Elasticsearch documents at a time with the [Bulk Request API](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html).
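The `ZSET`-plus-bulk-worker design can be illustrated with an in-memory sketch. This is purely illustrative (a plain Hash standing in for Redis `ZADD`/range reads; the real queue lives in `Elastic::ProcessBookkeepingService` and uses actual Redis commands), but it shows the two properties the design relies on: duplicate updates to one record collapse into a single queue entry, and the cron worker drains the oldest entries in batches suitable for one Bulk API request.

```ruby
# Illustrative in-memory stand-in for the Redis ZSET bookkeeping queue.
# Not GitLab's implementation; the real code uses Redis sorted sets.
class BulkQueue
  def initialize
    @zset = {}  # member => score, mimicking a Redis sorted set
    @clock = 0
  end

  # Like ZADD: re-tracking an existing member just updates its score,
  # so repeated updates to one record produce a single queue entry.
  def track(ref)
    @clock += 1
    @zset[ref] = @clock
  end

  # Pop up to `limit` lowest-score (oldest) members, as the bulk cron
  # worker would before sending them in one Bulk API request.
  def pop_batch(limit)
    batch = @zset.sort_by { |_, score| score }.first(limit).map(&:first)
    batch.each { |ref| @zset.delete(ref) }
    batch
  end
end
```

Deduplication is why the queue stays small even under heavy update volume: ten edits to the same issue between two cron runs cost one Elasticsearch document write, not ten.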
Search queries are generated by the concerns found in [`ee/app/models/concerns/elastic`](https://gitlab.com/gitlab-org/gitlab/-/tree/master/ee/app/models/concerns/elastic). These concerns are also in charge of access control, and have been a historic source of security bugs so please pay close attention to them!
@@ -277,8 +277,8 @@ These Advanced Search migrations, like any other GitLab changes, need to support
Depending on the order of deployment, it's possible that the migration
has started or finished and there's still a server running the application code from before the
-migration. We need to take this into consideration until we can [ensure all Advanced Search migrations
-start after the deployment has finished](https://gitlab.com/gitlab-org/gitlab/-/issues/321619).
+migration. We need to take this into consideration until we can
+[ensure all Advanced Search migrations start after the deployment has finished](https://gitlab.com/gitlab-org/gitlab/-/issues/321619).
### Reverting a migration
@@ -317,9 +317,8 @@ safely can.
We choose to use GitLab major version upgrades as a safe time to remove
backwards compatibility for indices that have not been fully migrated. We
-[document this in our upgrade
-documentation](../update/index.md#upgrading-to-a-new-major-version). We also
-choose to replace the migration code with the halted migration
+[document this in our upgrade documentation](../update/index.md#upgrading-to-a-new-major-version).
+We also choose to replace the migration code with the halted migration
and remove tests so that:
- We don't need to maintain any code that is called from our Advanced Search
@@ -381,7 +380,7 @@ the volume of updates.
All of the indexing happens in Sidekiq, so much of the relevant logs for the
Elasticsearch integration can be found in
-[`sidekiq.log`](../administration/logs.md#sidekiqlog). In particular, all
+[`sidekiq.log`](../administration/logs/index.md#sidekiqlog). In particular, all
Sidekiq workers that make requests to Elasticsearch in any way will log the
number of requests and time taken querying/writing to Elasticsearch. This can
be useful to understand whether or not your cluster is keeping up with
@@ -390,26 +389,25 @@ indexing.
Searching Elasticsearch is done via ordinary web workers handling requests. Any
requests to load a page or make an API request, which then make requests to
Elasticsearch, will log the number of requests and the time taken to
-[`production_json.log`](../administration/logs.md#production_jsonlog). These
+[`production_json.log`](../administration/logs/index.md#production_jsonlog). These
logs will also include the time spent on Database and Gitaly requests, which
may help to diagnose which part of the search is performing poorly.
There are additional logs specific to Elasticsearch that are sent to
-[`elasticsearch.log`](../administration/logs.md#elasticsearchlog)
+[`elasticsearch.log`](../administration/logs/index.md#elasticsearchlog)
that may contain information to help diagnose performance issues.
### Performance Bar
-Elasticsearch requests will be displayed in the [`Performance
-Bar`](../administration/monitoring/performance/performance_bar.md), which can
+Elasticsearch requests will be displayed in the
+[`Performance Bar`](../administration/monitoring/performance/performance_bar.md), which can
be used both locally in development and on any deployed GitLab instance to
diagnose poor search performance. This will show the exact queries being made,
which is useful to diagnose why a search might be slow.
### Correlation ID and `X-Opaque-Id`
-Our [correlation
-ID](distributed_tracing.md#developer-guidelines-for-working-with-correlation-ids)
+Our [correlation ID](distributed_tracing.md#developer-guidelines-for-working-with-correlation-ids)
is forwarded by all requests from Rails to Elasticsearch as the
[`X-Opaque-Id`](https://www.elastic.co/guide/en/elasticsearch/reference/current/tasks.html#_identifying_running_tasks)
header which allows us to track any
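Setting that header can be sketched with `Net::HTTP`. The `X-Opaque-Id` header name is real Elasticsearch behavior (it is echoed back in task listings and slow logs), but the client wiring below is illustrative, not GitLab's actual Elasticsearch client code.

```ruby
require 'net/http'
require 'uri'

# Hedged sketch: attach a correlation ID to an Elasticsearch request as
# X-Opaque-Id. Elasticsearch echoes this ID in its task and slow-query
# logs, letting you correlate a slow query with the originating request.
def build_search_request(correlation_id)
  uri = URI('http://localhost:9200/_search')
  req = Net::HTTP::Get.new(uri)
  req['X-Opaque-Id'] = correlation_id
  req
end
```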
@@ -477,13 +475,13 @@ documented here in case it is useful for others. The relevant logs that could
theoretically be used to figure out what needs to be replayed are:
1. All non-repository updates that were synced can be found in
- [`elasticsearch.log`](../administration/logs.md#elasticsearchlog) by
+ [`elasticsearch.log`](../administration/logs/index.md#elasticsearchlog) by
searching for
[`track_items`](https://gitlab.com/gitlab-org/gitlab/-/blob/1e60ea99bd8110a97d8fc481e2f41cab14e63d31/ee/app/services/elastic/process_bookkeeping_service.rb#L25)
and these can be replayed by sending these items again through
`::Elastic::ProcessBookkeepingService.track!`
1. All repository updates that occurred can be found in
- [`elasticsearch.log`](../administration/logs.md#elasticsearchlog) by
+ [`elasticsearch.log`](../administration/logs/index.md#elasticsearchlog) by
searching for
[`indexing_commit_range`](https://gitlab.com/gitlab-org/gitlab/-/blob/6f9d75dd3898536b9ec2fb206e0bd677ab59bd6d/ee/lib/gitlab/elastic/indexer.rb#L41).
Replaying these requires resetting the
@@ -492,13 +490,13 @@ theoretically be used to figure out what needs to be replayed are:
the project using
[`ElasticCommitIndexerWorker`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/workers/elastic_commit_indexer_worker.rb)
1. All project deletes that occurred can be found in
- [`sidekiq.log`](../administration/logs.md#sidekiqlog) by searching for
+ [`sidekiq.log`](../administration/logs/index.md#sidekiqlog) by searching for
[`ElasticDeleteProjectWorker`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/workers/elastic_delete_project_worker.rb).
These updates can be replayed by triggering another
`ElasticDeleteProjectWorker`.
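The replay steps above amount to filtering structured log lines by message and feeding the payloads back into the appropriate service. A hypothetical sketch, assuming JSON-lines logs with `message` and `items` fields (the sample entries and field names here are illustrative; check the actual log payload before relying on them):

```ruby
require 'json'

# Hypothetical sample of elasticsearch.log entries (illustrative only).
SAMPLE_LOG = <<~LOG
  {"message":"track_items","items":["Issue 1","MergeRequest 2"]}
  {"message":"indexing_commit_range","project_id":7}
  {"message":"track_items","items":["Note 3"]}
LOG

# Collect every item recorded by track_items entries; on a live instance
# these could be passed back through
# ::Elastic::ProcessBookkeepingService.track! to replay lost updates.
def items_to_replay(log)
  log.each_line.filter_map do |line|
    entry = JSON.parse(line)
    entry['items'] if entry['message'] == 'track_items'
  end.flatten
end
```

Repository updates and project deletes would be handled analogously, filtering for `indexing_commit_range` and `ElasticDeleteProjectWorker` respectively and re-enqueueing the matching workers.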
-With the above methods and taking regular [Elasticsearch
-snapshots](https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshot-restore.html)
+With the above methods and taking regular
+[Elasticsearch snapshots](https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshot-restore.html)
we should be able to recover from different kinds of data loss issues in a
relatively short period of time compared to indexing everything from
scratch.