author | Alex Groleau <agroleau@gitlab.com> | 2019-08-27 12:41:39 -0400 |
---|---|---|
committer | Alex Groleau <agroleau@gitlab.com> | 2019-08-27 12:41:39 -0400 |
commit | aa01f092829facd1044ad02f334422b7dbdc8b0e (patch) | |
tree | a754bf2497820432df7da0f2108bb7527a8dd7b8 /doc/development/elasticsearch.md | |
parent | a1d9c9994a9a4d79b824c3fd9322688303ac8b03 (diff) | |
parent | 6b10779053ff4233c7a64c5ab57754fce63f6710 (diff) | |
Merge branch 'master' of gitlab_gitlab:gitlab-org/gitlab-ce [runner-metrics-extractor]
Diffstat (limited to 'doc/development/elasticsearch.md')
-rw-r--r-- | doc/development/elasticsearch.md | 61 |
1 files changed, 58 insertions, 3 deletions
diff --git a/doc/development/elasticsearch.md b/doc/development/elasticsearch.md
index 0965db29557..f2412c249c1 100644
--- a/doc/development/elasticsearch.md
+++ b/doc/development/elasticsearch.md
@@ -40,9 +40,11 @@ There is no need to install any plugins
 
 If you're interested on working with the new beta repo indexer, all you need to do is:
 
-- git clone git@gitlab.com:gitlab-org/gitlab-elasticsearch-indexer.git
-- make
-- make install
+```sh
+git clone git@gitlab.com:gitlab-org/gitlab-elasticsearch-indexer.git
+make
+make install
+```
 
 this adds `gitlab-elasticsearch-indexer` to `$GOPATH/bin`, please make sure that is in your `$PATH`. After that GitLab will find it and you'll be able to enable it in the admin settings area.
 
@@ -148,6 +150,59 @@ Uses an [Edge NGram token filter](https://www.elastic.co/guide/en/elasticsearch/
 - Searches can have their own analyzers. Remember to check when editing analyzers
 - `Character` filters (as opposed to token filters) always replace the original character, so they're not a good choice as they can hinder exact searches
 
+## Zero downtime reindexing with multiple indices
+
+Currently GitLab can only handle a single version of setting. Any setting/schema changes would require reindexing everything from scratch. Since reindexing can take a long time, this can cause search functionality downtime.
+
+To avoid downtime, GitLab is working to support multiple indices that
+can function at the same time. Whenever the schema changes, the admin
+will be able to create a new index and reindex to it, while searches
+continue to go to the older, stable index. Any data updates will be
+forwarded to both indices. Once the new index is ready, an admin can
+mark it active, which will direct all searches to it, and remove the old
+index.
+
+This is also helpful for migrating to new servers, e.g. moving to/from AWS.
+
+Currently we are on the process of migrating to this new design. Everything is hardwired to work with one single version for now.
+
+### Architecture
+
+The traditional setup, provided by `elasticsearch-rails`, is to communicate through its internal proxy classes. Developers would write model-specific logic in a module for the model to include in (e.g. `SnippetsSearch`). The `__elasticsearch__` methods would return a proxy object, e.g.:
+
+- `Issue.__elasticsearch__` returns an instance of `Elasticsearch::Model::Proxy::ClassMethodsProxy`
+- `Issue.first.__elasticsearch__` returns an instance of `Elasticsearch::Model::Proxy::InstanceMethodsProxy`.
+
+These proxy objects would talk to Elasticsearch server directly (see top half of the diagram).
+
+
+
+In the planned new design, each model would have a pair of corresponding subclassed proxy objects, in which model-specific logic is located. For example, `Snippet` would have `SnippetClassProxy` and `SnippetInstanceProxy` (being subclass of `Elasticsearch::Model::Proxy::ClassMethodsProxy` and `Elasticsearch::Model::Proxy::InstanceMethodsProxy`, respectively).
+
+`__elasticsearch__` would represent another layer of proxy object, keeping track of multiple actual proxy objects. It would forward method calls to the appropriate index. For example:
+
+- `model.__elasticsearch__.search` would be forwarded to the one stable index, since it is a read operation.
+- `model.__elasticsearch__.update_document` would be forwarded to all indices, to keep all indices up-to-date.
+
+The global configurations per version are now in the `Elastic::(Version)::Config` class. You can change mappings there.
+
+### Creating new version of schema
+
+NOTE: **Note:** this is not applicable yet as multiple indices functionality is not fully implemented.
+
+Folders like `ee/lib/elastic/v12p1` contain snapshots of search logic from different versions. To keep a continuous Git history, the latest version lives under `ee/lib/elastic/latest`, but its classes are aliased under an actual version (e.g. `ee/lib/elastic/v12p3`).
+When referencing these classes, never use the `Latest` namespace directly, but use the actual version (e.g. `V12p3`).
+
+The version name basically follows GitLab's release version. If setting is changed in 12.3, we will create a new namespace called `V12p3` (p stands for "point"). Raise an issue if there is a need to name a version differently.
+
+If the current version is `v12p1`, and we need to create a new version for `v12p3`, the steps are as follows:
+
+1. Copy the entire folder of `v12p1` as `v12p3`
+1. Change the namespace for files under `v12p3` folder from `V12p1` to `V12p3` (which are still aliased to `Latest`)
+1. Delete `v12p1` folder
+1. Copy the entire folder of `latest` as `v12p1`
+1. Change the namespace for files under `v12p1` folder from `Latest` to `V12p1`
+1. Make changes to files under the `latest` folder as needed
+
 ## Troubleshooting
 
 ### Getting `flood stage disk watermark [95%] exceeded`
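The forwarding rule described in the added Architecture section (reads go to the one stable index, writes go to every index) can be illustrated with a minimal Ruby sketch. This is not GitLab's implementation: `FakeIndexProxy` and `MultiVersionProxy` are hypothetical names invented for illustration, and an in-memory hash stands in for an Elasticsearch index.

```ruby
# Hypothetical stand-in for a per-version proxy; a Hash plays the role of an index.
class FakeIndexProxy
  attr_reader :name, :documents

  def initialize(name)
    @name = name
    @documents = {}
  end

  # Read operation: ids of documents whose body contains the query string.
  def search(query)
    @documents.select { |_id, body| body.include?(query) }.keys
  end

  # Write operation: store or overwrite one document.
  def update_document(id, body)
    @documents[id] = body
  end
end

# Hypothetical outer proxy that tracks several per-version proxies.
class MultiVersionProxy
  def initialize(stable:, others: [])
    @stable = stable
    @all = [stable, *others]
  end

  # Reads are answered by the stable index only.
  def search(query)
    @stable.search(query)
  end

  # Writes are forwarded to every index so that all of them stay up to date.
  def update_document(id, body)
    @all.each { |index| index.update_document(id, body) }
    nil
  end
end

old_index = FakeIndexProxy.new("v12p1")
new_index = FakeIndexProxy.new("v12p3")
proxy = MultiVersionProxy.new(stable: old_index, others: [new_index])
proxy.update_document(1, "zero downtime reindexing")
```

After the last call, both hypothetical indices hold document `1`, while `proxy.search` consults only `old_index`. Marking the new index active would amount to constructing the outer proxy with `new_index` as `stable`.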