summaryrefslogtreecommitdiff
path: root/doc/integration/elasticsearch.md
diff options
context:
space:
mode:
authorAlex Groleau <agroleau@gitlab.com>2019-08-27 12:41:39 -0400
committerAlex Groleau <agroleau@gitlab.com>2019-08-27 12:41:39 -0400
commitaa01f092829facd1044ad02f334422b7dbdc8b0e (patch)
treea754bf2497820432df7da0f2108bb7527a8dd7b8 /doc/integration/elasticsearch.md
parenta1d9c9994a9a4d79b824c3fd9322688303ac8b03 (diff)
parent6b10779053ff4233c7a64c5ab57754fce63f6710 (diff)
downloadgitlab-ce-runner-metrics-extractor.tar.gz
Merge branch 'master' of gitlab_gitlab:gitlab-org/gitlab-cerunner-metrics-extractor
Diffstat (limited to 'doc/integration/elasticsearch.md')
-rw-r--r--doc/integration/elasticsearch.md169
1 files changed, 93 insertions, 76 deletions
diff --git a/doc/integration/elasticsearch.md b/doc/integration/elasticsearch.md
index da1df07a75d..de49508b47a 100644
--- a/doc/integration/elasticsearch.md
+++ b/doc/integration/elasticsearch.md
@@ -12,6 +12,7 @@ special searches:
- [Advanced Syntax Search](../user/search/advanced_search_syntax.md)
## Version Requirements
+
<!-- Please remember to update ee/lib/system_check/app/elasticsearch_check.rb if this changes -->
| GitLab version | Elasticsearch version |
@@ -23,25 +24,32 @@ special searches:
## Installing Elasticsearch
Elasticsearch is _not_ included in the Omnibus packages. You will have to
-install it yourself whether you are using the Omnibus package or installed
-GitLab from source. Providing detailed information on installing Elasticsearch
-is out of the scope of this document.
+[install it yourself](https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html "Elasticsearch installation documentation")
+whether you are using the Omnibus package or installed GitLab from source.
+Providing detailed information on installing Elasticsearch is out of the scope
+of this document.
+
+NOTE: **Note:**
+Elasticsearch should be installed on a separate server, whether you install
+it yourself or by using the
+[Amazon Elasticsearch](http://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-gsg.html)
+service. Running Elasticsearch on the same server as GitLab is not recommended
+and it will likely cause performance degradation on the GitLab installation.
Once the data is added to the database or repository and [Elasticsearch is
enabled in the admin area](#enabling-elasticsearch) the search index will be
-updated automatically. Elasticsearch can be installed on the same machine as
-GitLab or on a separate server, or you can use the [Amazon Elasticsearch](http://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-gsg.html)
-service.
-
-You can follow the steps as described in the [official web site](https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html "Elasticsearch installation documentation") or
-use the packages that are available for your OS.
+updated automatically.
## Elasticsearch repository indexer (beta)
In order to improve elasticsearch indexing performance, GitLab has made available a [new indexer written in Go](https://gitlab.com/gitlab-org/gitlab-elasticsearch-indexer).
This will replace the included Ruby indexer in the future but should be considered beta software for now, so there may be some bugs.
-If you would like to use it, please follow the instructions below.
+The Elasticsearch Go indexer is included in Omnibus for GitLab 11.8 and newer.
+
+To use the new Elasticsearch indexer included in Omnibus, check the box "Use the new repository indexer (beta)" when [enabling the Elasticsearch integration](#enabling-elasticsearch).
+
+If you would like to use the Elasticsearch Go indexer with a source installation or an older version of GitLab, please follow the instructions below.
### Installation
@@ -115,8 +123,8 @@ production instances, they recommend considerably more resources.
Storage requirements also vary based on the installation side, but as a rule of
thumb, you should allocate the total size of your production database, **plus**
-two-thirds of the total size of your git repositories. Efforts to reduce this
-total are being tracked in this epic: [gitlab-org&153](https://gitlab.com/groups/gitlab-org/-/epics/153).
+two-thirds of the total size of your Git repositories. Efforts to reduce this
+total are being tracked in [epic &153](https://gitlab.com/groups/gitlab-org/-/epics/153).
## Enabling Elasticsearch
@@ -325,30 +333,36 @@ curl --request PUT localhost:9200/gitlab-production/_settings --data '{
Enable Elasticsearch search in **Admin > Settings > Integrations**. That's it. Enjoy it!
+### Index limit
+
+Currently for repository and snippet files, GitLab would only index up to 1 MB of content, in order to avoid indexing timeout.
+
## GitLab Elasticsearch Rake Tasks
There are several rake tasks available to you via the command line:
-- [sudo gitlab-rake gitlab:elastic:index](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake)
+- [`sudo gitlab-rake gitlab:elastic:index`](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake)
- This is a wrapper task. It does the following:
- `sudo gitlab-rake gitlab:elastic:create_empty_index`
- `sudo gitlab-rake gitlab:elastic:clear_index_status`
- `sudo gitlab-rake gitlab:elastic:index_projects`
- `sudo gitlab-rake gitlab:elastic:index_snippets`
-- [sudo gitlab-rake gitlab:elastic:index_projects](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake)
+- [`sudo gitlab-rake gitlab:elastic:index_projects`](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake)
- This iterates over all projects and queues sidekiq jobs to index them in the background.
-- [sudo gitlab-rake gitlab:elastic:index_projects_status](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake)
+- [`sudo gitlab-rake gitlab:elastic:index_projects_status`](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake)
- This determines the overall status of the indexing. It is done by counting the total number of indexed projects, dividing by a count of the total number of projects, then multiplying by 100.
-- [sudo gitlab-rake gitlab:elastic:create_empty_index](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake)
+- [`sudo gitlab-rake gitlab:elastic:create_empty_index`](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake)
- This generates an empty index on the Elasticsearch side.
-- [sudo gitlab-rake gitlab:elastic:clear_index_status](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake)
+- [`sudo gitlab-rake gitlab:elastic:clear_index_status`](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake)
- This deletes all instances of IndexStatus for all projects.
-- [sudo gitlab-rake gitlab:elastic:delete_index](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake)
+- [`sudo gitlab-rake gitlab:elastic:delete_index`](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake)
- This removes the GitLab index on the Elasticsearch instance.
-- [sudo gitlab-rake gitlab:elastic:recreate_index](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake)
+- [`sudo gitlab-rake gitlab:elastic:recreate_index`](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake)
- Does the same thing as `sudo gitlab-rake gitlab:elastic:create_empty_index`
-- [sudo gitlab-rake gitlab:elastic:index_snippets](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake)
+- [`sudo gitlab-rake gitlab:elastic:index_snippets`](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake)
- Performs an Elasticsearch import that indexes the snippets data.
+- [`sudo gitlab-rake gitlab:elastic:projects_not_indexed`](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake)
+ - Displays which projects are not indexed.
### Environment Variables
@@ -424,91 +438,94 @@ Here are some common pitfalls and how to overcome them:
- **How can I verify my GitLab instance is using Elasticsearch?**
- The easiest method is via the rails console (`sudo gitlab-rails console`) by running the following:
+ The easiest method is via the rails console (`sudo gitlab-rails console`) by running the following:
- ```ruby
- u = User.find_by_username('your-username')
- s = SearchService.new(u, {:search => 'search_term'})
- pp s.search_objects.class.name
- ```
+ ```ruby
+ u = User.find_by_username('your-username')
+ s = SearchService.new(u, {:search => 'search_term'})
+ pp s.search_objects.class.name
+ ```
- If you see `Elasticsearch::Model::Response::Records`, you are using Elasticsearch.
+ If you see `Elasticsearch::Model::Response::Records`, you are using Elasticsearch.
- **I updated GitLab and now I can't find anything**
- We continuously make updates to our indexing strategies and aim to support
- newer versions of Elasticsearch. When indexing changes are made, it may
- be necessary for you to [reindex](#adding-gitlabs-data-to-the-elasticsearch-index) after updating GitLab.
+ We continuously make updates to our indexing strategies and aim to support
+ newer versions of Elasticsearch. When indexing changes are made, it may
+ be necessary for you to [reindex](#adding-gitlabs-data-to-the-elasticsearch-index) after updating GitLab.
- **I indexed all the repositories but I can't find anything**
- Make sure you indexed all the database data [as stated above](#adding-gitlabs-data-to-the-elasticsearch-index).
+ Make sure you indexed all the database data [as stated above](#adding-gitlabs-data-to-the-elasticsearch-index).
- Beyond that, check via the [Elasticsearch Search API](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html) to see if the data shows up on the Elasticsearch side.
+ Beyond that, check via the [Elasticsearch Search API](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html) to see if the data shows up on the Elasticsearch side.
- If it shows up via the [Elasticsearch Search API](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html), check that it shows up via the rails console (`sudo gitlab-rails console`):
+ If it shows up via the [Elasticsearch Search API](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html), check that it shows up via the rails console (`sudo gitlab-rails console`):
- ```ruby
- u = User.find_by_username('your-username')
- s = SearchService.new(u, {:search => 'search_term', :scope => ‘blobs’})
- pp s.search_objects.to_a
- ```
+ ```ruby
+ u = User.find_by_username('your-username')
+ s = SearchService.new(u, {:search => 'search_term', :scope => ‘blobs’})
+ pp s.search_objects.to_a
+ ```
- See [Elasticsearch Index Scopes](elasticsearch.md#elasticsearch-index-scopes) for more information on searching for specific types of data.
+ See [Elasticsearch Index Scopes](elasticsearch.md#elasticsearch-index-scopes) for more information on searching for specific types of data.
- **I indexed all the repositories but then switched Elasticsearch servers and now I can't find anything**
- You will need to re-run all the rake tasks to re-index the database, repositories, and wikis.
+ You will need to re-run all the rake tasks to re-index the database, repositories, and wikis.
- **The indexing process is taking a very long time**
- The more data present in your GitLab instance, the longer the indexing process takes.
+ The more data present in your GitLab instance, the longer the indexing process takes.
+
+- **There are some projects that weren't indexed, but we don't know which ones**
+
+ You can run `sudo gitlab-rake gitlab:elastic:projects_not_indexed` to display projects that aren't indexed.
- **No new data is added to the Elasticsearch index when I push code**
- When performing the initial indexing of blobs, we lock all projects until the project finishes indexing. It could
- happen that an error during the process causes one or multiple projects to remain locked. In order to unlock them,
- run the `gitlab:elastic:clear_locked_projects` rake task.
+ When performing the initial indexing of blobs, we lock all projects until the project finishes indexing. It could
+ happen that an error during the process causes one or multiple projects to remain locked. In order to unlock them,
+ run the `gitlab:elastic:clear_locked_projects` rake task.
- **"Can't specify parent if no parent field has been configured"**
- If you enabled Elasticsearch before GitLab 8.12 and have not rebuilt indexes you will get
- exception in lots of different cases:
-
- ```text
- Elasticsearch::Transport::Transport::Errors::BadRequest([400] {
- "error": {
- "root_cause": [{
- "type": "illegal_argument_exception",
- "reason": "Can't specify parent if no parent field has been configured"
- }],
- "type": "illegal_argument_exception",
- "reason": "Can't specify parent if no parent field has been configured"
- },
- "status": 400
- }):
- ```
-
- This is because we changed the index mapping in GitLab 8.12 and the old indexes should be removed and built from scratch again,
- see details in the [8-11-to-8-12 update guide](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/doc/update/8.11-to-8.12.md#11-elasticsearch-index-update-if-you-currently-use-elasticsearch).
+ If you enabled Elasticsearch before GitLab 8.12 and have not rebuilt indexes you will get
+ exception in lots of different cases:
+
+ ```text
+ Elasticsearch::Transport::Transport::Errors::BadRequest([400] {
+ "error": {
+ "root_cause": [{
+ "type": "illegal_argument_exception",
+ "reason": "Can't specify parent if no parent field has been configured"
+ }],
+ "type": "illegal_argument_exception",
+ "reason": "Can't specify parent if no parent field has been configured"
+ },
+ "status": 400
+ }):
+ ```
+
+ This is because we changed the index mapping in GitLab 8.12 and the old indexes should be removed and built from scratch again,
+ see details in the [8-11-to-8-12 update guide](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/doc/update/8.11-to-8.12.md#11-elasticsearch-index-update-if-you-currently-use-elasticsearch).
- Exception `Elasticsearch::Transport::Transport::Errors::BadRequest`
- If you have this exception (just like in the case above but the actual message is different) please check if you have the correct Elasticsearch version and you met the other [requirements](#system-requirements).
- There is also an easy way to check it automatically with `sudo gitlab-rake gitlab:check` command.
+ If you have this exception (just like in the case above but the actual message is different) please check if you have the correct Elasticsearch version and you met the other [requirements](#system-requirements).
+ There is also an easy way to check it automatically with `sudo gitlab-rake gitlab:check` command.
- Exception `Elasticsearch::Transport::Transport::Errors::RequestEntityTooLarge`
- ```text
- [413] {"Message":"Request size exceeded 10485760 bytes"}
- ```
-
- This exception is seen when your Elasticsearch cluster is configured to reject
- requests above a certain size (10MiB in this case). This corresponds to the
- `http.max_content_length` setting in `elasticsearch.yml`. Increase it to a
- larger size and restart your Elasticsearch cluster.
+ ```text
+ [413] {"Message":"Request size exceeded 10485760 bytes"}
+ ```
- AWS has [fixed limits](http://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/aes-limits.html)
- for this setting ("Maximum Size of HTTP Request Payloads"), based on the size of
- the underlying instance.
+ This exception is seen when your Elasticsearch cluster is configured to reject
+ requests above a certain size (10MiB in this case). This corresponds to the
+ `http.max_content_length` setting in `elasticsearch.yml`. Increase it to a
+ larger size and restart your Elasticsearch cluster.
+ AWS has [fixed limits](http://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/aes-limits.html)
+ for this setting ("Maximum Size of HTTP Request Payloads"), based on the size of
+ the underlying instance.