diff options
Diffstat (limited to 'doc/development/geo.md')
-rw-r--r-- | doc/development/geo.md | 100 |
1 files changed, 73 insertions, 27 deletions
diff --git a/doc/development/geo.md b/doc/development/geo.md index d8669d377b0..c8e6a86eb52 100644 --- a/doc/development/geo.md +++ b/doc/development/geo.md @@ -2,10 +2,48 @@ Geo connects GitLab instances together. One GitLab instance is designated as a **primary** node and can be run with multiple -**secondary** nodes. Geo orchestrates quite a few components that are -described in more detail below. +**secondary** nodes. Geo orchestrates quite a few components that can be seen on +the diagram below and are described in more detail within this document. -## Database replication +![Geo Architecture Diagram](../administration/geo/replication/img/geo_architecture.png) + +## Replication layer + +Geo handles replication for different components: +- [Database](#database-replication): includes the entire application, except cache and jobs. +- [Git repositories](#repository-replication): includes both projects and wikis. +- [Uploaded blobs](#uploads-replication): includes anything from images attached on issues +to raw logs and assets from CI. + +With the exception of the Database replication, on a *secondary* node, everything is coordinated +by the [Geo Log Cursor](#geo-log-cursor). + +### Geo Log Cursor daemon + +The [Geo Log Cursor daemon](#geo-log-cursor-daemon) is a separate process running on +each **secondary** node. It monitors the [Geo Event Log](#geo-event-log) +for new events and creates background jobs for each specific event type. + +For example when a repository is updated, the Geo **primary** node creates +a Geo event with an associated repository updated event. The Geo Log Cursor daemon +picks the event up and schedules a `Geo::ProjectSyncWorker` job which will +use the `Geo::RepositorySyncService` and `Geo::WikiSyncService` classes +to update the repository and the wiki respectively. + +The Geo Log Cursor daemon can operate in High Availability mode automatically. +The daemon will try to acquire a lock from time to time and once acquired, it +will behave as the *active* daemon. + +Any additional running daemons on the same node, will be in standby +mode, ready to resume work if the *active* daemon releases its lock. + +We use the [`ExclusiveLease`](https://www.rubydoc.info/github/gitlabhq/gitlabhq/Gitlab/ExclusiveLease) lock type with a small TTL, that is renewed at every +pooling cycle. That allows us to implement this global lock with a timeout. + +At the end of the pooling cycle, if the daemon can't renew and/or reacquire +the lock, it switches to standby mode. + +### Database replication Geo uses [streaming replication](#streaming-replication) to replicate the database from the **primary** to the **secondary** nodes. This @@ -13,7 +51,7 @@ replication gives the **secondary** nodes access to all the data saved in the database. So users can log in on the **secondary** and read all the issues, merge requests, etc. on the **secondary** node. -## Repository replication +### Repository replication Geo also replicates repositories. Each **secondary** node keeps track of the state of every repository in the [tracking database](#tracking-database). @@ -23,7 +61,7 @@ There are a few ways a repository gets replicated by the: - [Repository Sync worker](#repository-sync-worker). - [Geo Log Cursor](#geo-log-cursor). -### Project Registry +#### Project Registry The `Geo::ProjectRegistry` class defines the model used to track the state of repository replication. For each project in the main @@ -32,15 +70,15 @@ database, one record in the tracking database is kept. It records the following about repositories: - The last time they were synced. -- The last time they were synced successfully. +- The last time they were successfully synced. - If they need to be resynced. -- When retry should be attempted. +- When a retry should be attempted. - The number of retries. -- If and when the they were verified. +- If and when they were verified. It also stores these attributes for project wikis in dedicated columns. -### Repository Sync worker +#### Repository Sync worker The `Geo::RepositorySyncWorker` class runs periodically in the background and it searches the `Geo::ProjectRegistry` model for @@ -59,26 +97,12 @@ times, Geo does a so-called _redownload_. It will do a clean clone into the `@geo-temporary` directory in the root of the storage. When it's successful, we replace the main repo with the newly cloned one. -### Geo Log Cursor - -The [Geo Log Cursor](#geo-log-cursor) is a separate process running on -each **secondary** node. It monitors the [Geo Event Log](#geo-event-log) -and handles all of the events. When it sees an unhandled event, it -starts a background worker to handle that event, depending on the type -of event. - -When a repository receives an update, the Geo **primary** node creates -a Geo event with an associated repository updated event. The cursor -picks that up, and schedules a `Geo::ProjectSyncWorker` job which will -use the `Geo::RepositorySyncService` class and `Geo::WikiSyncService` -class to update the repository and the wiki. - -## Uploads replication +### Uploads replication File uploads are also being replicated to the **secondary** node. To track the state of syncing, the `Geo::FileRegistry` model is used. -### File Registry +#### File Registry Similar to the [Project Registry](#project-registry), there is a `Geo::FileRegistry` model that tracks the synced uploads. @@ -86,7 +110,7 @@ Similar to the [Project Registry](#project-registry), there is a CI Job Artifacts are synced in a similar way as uploads or LFS objects, but they are tracked by `Geo::JobArtifactRegistry` model. -### File Download Dispatch worker +#### File Download Dispatch worker Also similar to the [Repository Sync worker](#repository-sync-worker), there is a `Geo::FileDownloadDispatchWorker` class that is run @@ -113,7 +137,7 @@ Authorization: GL-Geo <access_key>:<JWT payload> ``` The **primary** node uses the `access_key` field to look up the -corresponding Geo **secondary** node and decrypts the JWT payload, +corresponding **secondary** node and decrypts the JWT payload, which contains additional information to identify the file request. This ensures that the **secondary** node downloads the right file for the right database ID. For example, for an LFS object, the @@ -133,6 +157,28 @@ NOTE: **Note:** JWT requires synchronized clocks between the machines involved, otherwise it may fail with an encryption error. +## Git Push to Geo secondary + +The Git Push Proxy exists as a functionality built inside the `gitlab-shell` component. +It is active on a **secondary** node only. It allows the user that has cloned a repository +from the secondary node to push to the same URL. + +Git `push` requests directed to a **secondary** node will be sent over to the **primary** node, +while `pull` requests will continue to be served by the **secondary** node for maximum efficiency. + +HTTPS and SSH requests are handled differently: + +- With HTTPS, we will give the user a `HTTP 302 Redirect` pointing to the project on the **primary** node. +The git client is wise enough to understand that status code and process the redirection. +- With SSH, because there is no equivalent way to perform a redirect, we have to proxy the request. +This is done inside [`gitlab-shell`](https://gitlab.com/gitlab-org/gitlab-shell), by first translating the request +to the HTTP protocol, and then proxying it to the **primary** node. + +The [`gitlab-shell`](https://gitlab.com/gitlab-org/gitlab-shell) daemon knows when to proxy based on the response +from `/api/v4/allowed`. A special `HTTP 300` status code is returned and we execute a "custom action", +specified in the response body. The response contains additional data that allows the proxied `push` operation +to happen on the **primary** node. + ## Using the Tracking Database Along with the main database that is replicated, a Geo **secondary** |