summaryrefslogtreecommitdiff
path: root/doc/administration/geo/replication/multiple_servers.md
blob: 87b1aa7fc44e8bb8de50b223473c98b7e6fa7ea3 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
---
stage: Enablement
group: Geo
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
type: howto
---

# Geo for multiple nodes **(PREMIUM SELF)**

This document describes a minimal reference architecture for running Geo
in a multi-node configuration. If your multi-node setup differs from the one
described, it is possible to adapt these instructions to your needs.

## Architecture overview

![Geo multi-node diagram](img/geo-ha-diagram.png)

_[diagram source - GitLab employees only](https://docs.google.com/drawings/d/1z0VlizKiLNXVVVaERFwgsIOuEgjcUqDTWPdQYsE7Z4c/edit)_

The topology above assumes the **primary** and **secondary** Geo sites
are located in two separate locations, on their own virtual network
with private IP addresses. The network is configured such that all machines in
one geographic location can communicate with each other using their private IP addresses.
The IP addresses given are examples and may be different depending on the
network topology of your deployment.

The only external way to access the two Geo sites is by HTTPS at
`gitlab.us.example.com` and `gitlab.eu.example.com` in the example above.

NOTE:
The **primary** and **secondary** Geo sites must be able to communicate to each other over HTTPS.

## Redis and PostgreSQL for multiple nodes

Because of the additional complexity involved in setting up this configuration
for PostgreSQL and Redis, it is not covered by this Geo multi-node documentation.

For more information on setting up a multi-node PostgreSQL cluster and Redis cluster using the Omnibus GitLab package, see:

- [Geo multi-node database replication](../setup/database.md#multi-node-database-replication)
- [Redis multi-node documentation](../../redis/replication_and_failover.md)

NOTE:
It is possible to use cloud hosted services for PostgreSQL and Redis, but this is beyond the scope of this document.

## Prerequisites: Two independently working GitLab multi-node sites

One GitLab site serves as the Geo **primary** site. Use the
[GitLab reference architectures documentation](../../reference_architectures/index.md)
to set this up. You can use different reference architecture sizes for each Geo site. If
you already have a working GitLab instance that is in-use, it can be used as a
**primary** site.

The second GitLab site serves as the Geo **secondary** site. Again, use the
[GitLab reference architectures documentation](../../reference_architectures/index.md) to set this up.
It's a good idea to log in and test it. However, be aware that its data is
wiped out as part of the process of replicating from the **primary** site.

## Configure a GitLab site to be the Geo **primary** site

The following steps enable a GitLab site to serve as the Geo **primary** site.

### Step 1: Configure the **primary** frontend nodes

1. Edit `/etc/gitlab/gitlab.rb` and add the following:

   ```ruby
   ##
   ## The unique identifier for the Geo site. See
   ## https://docs.gitlab.com/ee/user/admin_area/geo_nodes.html#common-settings
   ##
   gitlab_rails['geo_node_name'] = '<site_name_here>'

   ##
   ## Disable automatic migrations
   ##
   gitlab_rails['auto_migrate'] = false
   ```

After making these changes, [reconfigure GitLab](../../restart_gitlab.md#omnibus-gitlab-reconfigure) so the changes take effect.

NOTE:
PostgreSQL and Redis should have already been disabled on the
application nodes during normal GitLab multi-node setup. Connections
from the application nodes to services on the backend nodes should
have also been configured. See multi-node configuration documentation for
[PostgreSQL](../../postgresql/replication_and_failover.md#configuring-the-application-nodes)
and [Redis](../../redis/replication_and_failover.md#example-configuration-for-the-gitlab-application).

## Configure the other GitLab site to be a Geo **secondary** site

A **secondary** site is similar to any other GitLab multi-node site, with three
major differences:

- The main PostgreSQL database is a read-only replica of the Geo **primary** site's
  PostgreSQL database.
- There is an additional PostgreSQL database for each Geo **secondary** site,
  called the "Geo tracking database", which tracks the replication and verification
  state of various resources.
- There is an additional GitLab service [`geo-logcursor`](../index.md#geo-log-cursor)

Therefore, we set up the multi-node components one by one and include deviations
from the normal multi-node setup. However, we highly recommend configuring a
brand-new GitLab site first, as if it were not part of a Geo setup. This allows
verifying that it is a working GitLab site. And only then should it be modified
for use as a Geo **secondary** site. This helps to separate Geo setup problems from
unrelated multi-node configuration problems.

### Step 1: Configure the Redis and Gitaly services on the Geo **secondary** site

Configure the following services, again using the non-Geo multi-node
documentation:

- [Configuring Redis for GitLab](../../redis/replication_and_failover.md#example-configuration-for-the-gitlab-application) for multiple nodes.
- [Gitaly](../../gitaly/index.md), which stores data that is
  synchronized from the Geo **primary** site.

NOTE:
[NFS](../../nfs.md) can be used in place of Gitaly but is not
recommended.

### Step 2: Configure Postgres streaming replication

Follow the [Geo database replication instructions](../setup/database.md).

If using an external PostgreSQL instance, refer also to
[Geo with external PostgreSQL instances](../setup/external_database.md).

### Step 3: Configure the Geo tracking database on the Geo **secondary** site

If you want to run the Geo tracking database in a multi-node PostgreSQL cluster,
then follow [Configuring Patroni cluster for the tracking PostgreSQL database](../setup/database.md#configuring-patroni-cluster-for-the-tracking-postgresql-database).

If you want to run the Geo tracking database on a single node, then follow
the instructions below.

1. Generate an MD5 hash of the desired password for the database user that the
   GitLab application uses to access the tracking database:

   The username (`gitlab_geo` by default) is incorporated into the
   hash.

   ```shell
   gitlab-ctl pg-password-md5 gitlab_geo
   # Enter password: <your_password_here>
   # Confirm password: <your_password_here>
   # fca0b89a972d69f00eb3ec98a5838484
   ```

   Use this hash to fill in `<tracking_database_password_md5_hash>` in the next
   step.

1. On the machine where the Geo tracking database is intended to run, add the
   following to `/etc/gitlab/gitlab.rb`:

   ```ruby
   ##
   ## Enable the Geo secondary tracking database
   ##
   geo_postgresql['enable'] = true
   geo_postgresql['listen_address'] = '<ip_address_of_this_host>'
   geo_postgresql['sql_user_password'] = '<tracking_database_password_md5_hash>'

   ##
   ## Configure PostgreSQL connection to the replica database
   ##
   geo_postgresql['md5_auth_cidr_addresses'] = ['<replica_database_ip>/32']
   gitlab_rails['db_host'] = '<replica_database_ip>'

   # Prevent reconfigure from attempting to run migrations on the replica database
   gitlab_rails['auto_migrate'] = false
   ```

After making these changes, [reconfigure GitLab](../../restart_gitlab.md#omnibus-gitlab-reconfigure) so the changes take effect.

If using an external PostgreSQL instance, refer also to
[Geo with external PostgreSQL instances](../setup/external_database.md).

### Step 4: Configure the frontend application nodes on the Geo **secondary** site

In the minimal [architecture diagram](#architecture-overview) above, there are two
machines running the GitLab application services. These services are enabled
selectively in the configuration.

Configure the GitLab Rails application nodes following the relevant steps
outlined in the [reference architectures](../../reference_architectures/index.md),
then make the following modifications:

1. Edit `/etc/gitlab/gitlab.rb` on each application node in the Geo **secondary**
   site, and add the following:

   ```ruby
   ##
   ## Enable GitLab application services. The application_role enables many services.
   ## Alternatively, you can choose to enable or disable specific services on
   ## different nodes to aid in horizontal scaling and separation of concerns.
   ##
   roles ['application_role']

   ## `application_role` already enables this. You only need this line if
   ## you selectively enable individual services that depend on Rails, like
   ## `puma`, `sidekiq`, `geo-logcursor`, and so on.
   gitlab_rails['enable'] = true

   ##
   ## Enable Geo Log Cursor service
   ##
   geo_logcursor['enable'] = true

   ##
   ## The unique identifier for the Geo site. See
   ## https://docs.gitlab.com/ee/user/admin_area/geo_nodes.html#common-settings
   ##
   gitlab_rails['geo_node_name'] = '<site_name_here>'

   ##
   ## Disable automatic migrations
   ##
   gitlab_rails['auto_migrate'] = false

   ##
   ## Configure the connection to the tracking database
   ##
   geo_secondary['enable'] = true
   geo_secondary['db_host'] = '<geo_tracking_db_host>'
   geo_secondary['db_password'] = '<geo_tracking_db_password>'

   ##
   ## Configure connection to the streaming replica database, if you haven't
   ## already
   ##
   gitlab_rails['db_host'] = '<replica_database_host>'
   gitlab_rails['db_password'] = '<replica_database_password>'

   ##
   ## Configure connection to Redis, if you haven't already
   ##
   gitlab_rails['redis_host'] = '<redis_host>'
   gitlab_rails['redis_password'] = '<redis_password>'

   ##
   ## If you are using custom users not managed by Omnibus, you need to specify
   ## UIDs and GIDs like below, and ensure they match between nodes in a
   ## cluster to avoid permissions issues
   ##
   user['uid'] = 9000
   user['gid'] = 9000
   web_server['uid'] = 9001
   web_server['gid'] = 9001
   registry['uid'] = 9002
   registry['gid'] = 9002
   ```

NOTE:
If you had set up PostgreSQL cluster using the omnibus package and had set
`postgresql['sql_user_password'] = 'md5 digest of secret'`, keep in
mind that `gitlab_rails['db_password']` and `geo_secondary['db_password']`
contains the plaintext passwords. This is used to let the Rails
nodes connect to the databases.

NOTE:
Make sure that current node's IP is listed in
`postgresql['md5_auth_cidr_addresses']` setting of the read-replica database to
allow Rails on this node to connect to Postgres.

After making these changes [Reconfigure GitLab](../../restart_gitlab.md#omnibus-gitlab-reconfigure) so the changes take effect.

In the [architecture overview](#architecture-overview) topology, the following GitLab
services are enabled on the "frontend" nodes:

- `geo-logcursor`
- `gitlab-pages`
- `gitlab-workhorse`
- `logrotate`
- `nginx`
- `registry`
- `remote-syslog`
- `sidekiq`
- `puma`

Verify these services exist by running `sudo gitlab-ctl status` on the frontend
application nodes.

### Step 5: Set up the LoadBalancer for the Geo **secondary** site

The minimal [architecture diagram](#architecture-overview) above shows a load
balancer at each geographic location to route traffic to the application nodes.

See [Load Balancer for GitLab with multiple nodes](../../load_balancer.md) for
more information.

### Step 6: Configure the backend application nodes on the Geo **secondary** site

The minimal [architecture diagram](#architecture-overview) above shows all application services
running together on the same machines. However, for multiple nodes we
[strongly recommend running all services separately](../../reference_architectures/index.md).

For example, a Sidekiq node could be configured similarly to the frontend
application nodes above, with some changes to run only the `sidekiq` service:

1. Edit `/etc/gitlab/gitlab.rb` on each Sidekiq node in the Geo **secondary**
   site, and add the following:

   ```ruby
   ##
   ## Enable the Sidekiq service
   ##
   sidekiq['enable'] = true
   gitlab_rails['enable'] = true

   ##
   ## The unique identifier for the Geo site. See
   ## https://docs.gitlab.com/ee/user/admin_area/geo_nodes.html#common-settings
   ##
   gitlab_rails['geo_node_name'] = '<site_name_here>'

   ##
   ## Disable automatic migrations
   ##
   gitlab_rails['auto_migrate'] = false

   ##
   ## Configure the connection to the tracking database
   ##
   geo_secondary['enable'] = true
   geo_secondary['db_host'] = '<geo_tracking_db_host>'
   geo_secondary['db_password'] = '<geo_tracking_db_password>'

   ##
   ## Configure connection to the streaming replica database, if you haven't
   ## already
   ##
   gitlab_rails['db_host'] = '<replica_database_host>'
   gitlab_rails['db_password'] = '<replica_database_password>'

   ##
   ## Configure connection to Redis, if you haven't already
   ##
   gitlab_rails['redis_host'] = '<redis_host>'
   gitlab_rails['redis_password'] = '<redis_password>'

   ##
   ## If you are using custom users not managed by Omnibus, you need to specify
   ## UIDs and GIDs like below, and ensure they match between nodes in a
   ## cluster to avoid permissions issues
   ##
   user['uid'] = 9000
   user['gid'] = 9000
   web_server['uid'] = 9001
   web_server['gid'] = 9001
   registry['uid'] = 9002
   registry['gid'] = 9002
   ```

   You can similarly configure a node to run only the `geo-logcursor` service
   with `geo_logcursor['enable'] = true` and disabling Sidekiq with
   `sidekiq['enable'] = false`.

   These nodes do not need to be attached to the load balancer.