summaryrefslogtreecommitdiff
path: root/doc/development/elasticsearch.md
diff options
context:
space:
mode:
authorAlex Groleau <agroleau@gitlab.com>2019-08-27 12:41:39 -0400
committerAlex Groleau <agroleau@gitlab.com>2019-08-27 12:41:39 -0400
commitaa01f092829facd1044ad02f334422b7dbdc8b0e (patch)
treea754bf2497820432df7da0f2108bb7527a8dd7b8 /doc/development/elasticsearch.md
parenta1d9c9994a9a4d79b824c3fd9322688303ac8b03 (diff)
parent6b10779053ff4233c7a64c5ab57754fce63f6710 (diff)
downloadgitlab-ce-runner-metrics-extractor.tar.gz
Merge branch 'master' of gitlab_gitlab:gitlab-org/gitlab-cerunner-metrics-extractor
Diffstat (limited to 'doc/development/elasticsearch.md')
-rw-r--r--doc/development/elasticsearch.md61
1 files changed, 58 insertions, 3 deletions
diff --git a/doc/development/elasticsearch.md b/doc/development/elasticsearch.md
index 0965db29557..f2412c249c1 100644
--- a/doc/development/elasticsearch.md
+++ b/doc/development/elasticsearch.md
@@ -40,9 +40,11 @@ There is no need to install any plugins
If you're interested on working with the new beta repo indexer, all you need to do is:
-- git clone git@gitlab.com:gitlab-org/gitlab-elasticsearch-indexer.git
-- make
-- make install
+```sh
+git clone git@gitlab.com:gitlab-org/gitlab-elasticsearch-indexer.git
+make
+make install
+```
this adds `gitlab-elasticsearch-indexer` to `$GOPATH/bin`, please make sure that is in your `$PATH`. After that GitLab will find it and you'll be able to enable it in the admin settings area.
@@ -148,6 +150,59 @@ Uses an [Edge NGram token filter](https://www.elastic.co/guide/en/elasticsearch/
- Searches can have their own analyzers. Remember to check when editing analyzers
- `Character` filters (as opposed to token filters) always replace the original character, so they're not a good choice as they can hinder exact searches
+## Zero downtime reindexing with multiple indices
+
+Currently GitLab can only handle a single version of setting. Any setting/schema changes would require reindexing everything from scratch. Since reindexing can take a long time, this can cause search functionality downtime.
+
+To avoid downtime, GitLab is working to support multiple indices that
+can function at the same time. Whenever the schema changes, the admin
+will be able to create a new index and reindex to it, while searches
+continue to go to the older, stable index. Any data updates will be
+forwarded to both indices. Once the new index is ready, an admin can
+mark it active, which will direct all searches to it, and remove the old
+index.
+
+This is also helpful for migrating to new servers, e.g. moving to/from AWS.
+
+Currently we are on the process of migrating to this new design. Everything is hardwired to work with one single version for now.
+
+### Architecture
+
+The traditional setup, provided by `elasticsearch-rails`, is to communicate through its internal proxy classes. Developers would write model-specific logic in a module for the model to include in (e.g. `SnippetsSearch`). The `__elasticsearch__` methods would return a proxy object, e.g.:
+
+- `Issue.__elasticsearch__` returns an instance of `Elasticsearch::Model::Proxy::ClassMethodsProxy`
+- `Issue.first.__elasticsearch__` returns an instance of `Elasticsearch::Model::Proxy::InstanceMethodsProxy`.
+
+These proxy objects would talk to Elasticsearch server directly (see top half of the diagram).
+
+![Elasticsearch Architecture](img/elasticsearch_architecture.svg)
+
+In the planned new design, each model would have a pair of corresponding subclassed proxy objects, in which model-specific logic is located. For example, `Snippet` would have `SnippetClassProxy` and `SnippetInstanceProxy` (being subclass of `Elasticsearch::Model::Proxy::ClassMethodsProxy` and `Elasticsearch::Model::Proxy::InstanceMethodsProxy`, respectively).
+
+`__elasticsearch__` would represent another layer of proxy object, keeping track of multiple actual proxy objects. It would forward method calls to the appropriate index. For example:
+
+- `model.__elasticsearch__.search` would be forwarded to the one stable index, since it is a read operation.
+- `model.__elasticsearch__.update_document` would be forwarded to all indices, to keep all indices up-to-date.
+
+The global configurations per version are now in the `Elastic::(Version)::Config` class. You can change mappings there.
+
+### Creating new version of schema
+
+NOTE: **Note:** this is not applicable yet as multiple indices functionality is not fully implemented.
+
+Folders like `ee/lib/elastic/v12p1` contain snapshots of search logic from different versions. To keep a continuous Git history, the latest version lives under `ee/lib/elastic/latest`, but its classes are aliased under an actual version (e.g. `ee/lib/elastic/v12p3`). When referencing these classes, never use the `Latest` namespace directly, but use the actual version (e.g. `V12p3`).
+
+The version name basically follows GitLab's release version. If setting is changed in 12.3, we will create a new namespace called `V12p3` (p stands for "point"). Raise an issue if there is a need to name a version differently.
+
+If the current version is `v12p1`, and we need to create a new version for `v12p3`, the steps are as follows:
+
+1. Copy the entire folder of `v12p1` as `v12p3`
+1. Change the namespace for files under `v12p3` folder from `V12p1` to `V12p3` (which are still aliased to `Latest`)
+1. Delete `v12p1` folder
+1. Copy the entire folder of `latest` as `v12p1`
+1. Change the namespace for files under `v12p1` folder from `Latest` to `V12p1`
+1. Make changes to files under the `latest` folder as needed
+
## Troubleshooting
### Getting `flood stage disk watermark [95%] exceeded`