1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
|
# Praefect
NOTE: **Note:** Praefect is an experimental service, and for testing purposes only at
this time.
Praefect is an optional reverse-proxy for [Gitaly](../index.md) to manage a
cluster of Gitaly nodes for high availability through replication.
If a Gitaly node becomes unavailable, it will be possible to fail over to a
warm Gitaly replica.
The first minimal version will support:
- Eventual consistency of the secondary replicas.
- Manual fail over from the primary to the secondary.
Follow the [HA Gitaly epic](https://gitlab.com/groups/gitlab-org/-/epics/1489)
for updates and roadmap.
## Omnibus
### Architecture
The most common architecture for Praefect is simplified in the diagram below:
```mermaid
graph TB
GitLab --> Praefect;
Praefect --- PostgreSQL;
Praefect --> Gitaly1;
Praefect --> Gitaly2;
Praefect --> Gitaly3;
```
Where `GitLab` is the collection of clients that can request Git operations.
The Praefect node has three storage nodes attached. Praefect itself doesn't
store data, but connects to three Gitaly nodes, `Gitaly-1`, `Gitaly-2`, and `Gitaly-3`.
In order to keep track of replication state, Praefect relies on a
PostgreSQL database. This database is a single point of failure so you
should use a highly available PostgreSQL server for this. GitLab
itself needs a HA PostgreSQL server too, so you could optionally co-locate the Praefect
SQL database on the PostgreSQL server you use for the rest of GitLab.
Praefect may be enabled on its own node or can be run on the GitLab server.
In the example below we will use a separate server, but the optimal configuration
for Praefect is still being determined.
Praefect will handle all Gitaly RPC requests to its child nodes. However, the child nodes
will still need to communicate with the GitLab server via its internal API for authentication
purposes.
### Setup
In this setup guide we will start by configuring Praefect, then its child
Gitaly nodes, and lastly the GitLab server configuration.
#### Secrets
We need to manage the following secrets and make them match across hosts:
1. `GITLAB_SHELL_SECRET_TOKEN`: this is used by Git hooks to make
callback HTTP API requests to GitLab when accepting a Git push. This
secret is shared with GitLab Shell for legacy reasons.
1. `PRAEFECT_EXTERNAL_TOKEN`: repositories hosted on your Praefect
cluster can only be accessed by Gitaly clients that carry this
token.
1. `PRAEFECT_INTERNAL_TOKEN`: this token is used for replication
traffic inside your Praefect cluster. This is distinct from
`PRAEFECT_EXTERNAL_TOKEN` because Gitaly clients must not be able to
access internal nodes of the Praefect cluster directly; that could
lead to data loss.
1. `PRAEFECT_SQL_PASSWORD`: this password is used by Praefect to connect to
PostgreSQL.
#### Network addresses
1. `POSTGRESQL_SERVER`: the host name or IP address of your PostgreSQL server
#### PostgreSQL
To set up a Praefect cluster you need a highly available PostgreSQL
server. You need PostgreSQL 9.6 or newer. Praefect needs to have a SQL
user with the right to create databases.
In the instructions below we assume you have administrative access to
your PostgreSQL server via `psql`. Depending on your environment, you
may also be able to do this via the web interface of your cloud
platform, or via your configuration management system, etc.
Below we assume that you have administrative access as the `postgres`
user. First open a `psql` session as the `postgres` user:
```shell
psql -h POSTGRESQL_SERVER -U postgres -d template1
```
Once you are connected, run the following command. Replace
`PRAEFECT_SQL_PASSWORD` with the actual (random) password you
generated for the `praefect` SQL user:
```sql
CREATE ROLE praefect WITH LOGIN CREATEDB PASSWORD 'PRAEFECT_SQL_PASSWORD';
\q # exit psql
```
Now connect as the `praefect` user to create the database. This has
the side effect of verifying that you have access:
```shell
psql -h POSTGRESQL_SERVER -U praefect -d template1
```
Once you have connected as the `praefect` user, run:
```sql
CREATE DATABASE praefect_production WITH ENCODING=UTF8;
\q # quit psql
```
#### Praefect
On the Praefect node we disable all other services, including Gitaly. We list each
Gitaly node that will be connected to Praefect as members of the `praefect` hash in `praefect['virtual_storages']`.
In the example below, the Gitaly nodes are named `gitaly-N`. Note that one
node is designated as primary by setting the primary to `true`.
```ruby
# /etc/gitlab/gitlab.rb on praefect server
# Avoid running unnecessary services on the Gitaly server
postgresql['enable'] = false
redis['enable'] = false
nginx['enable'] = false
prometheus['enable'] = false
grafana['enable'] = false
unicorn['enable'] = false
sidekiq['enable'] = false
gitlab_workhorse['enable'] = false
gitaly['enable'] = false
# Prevent database connections during 'gitlab-ctl reconfigure'
gitlab_rails['rake_cache_clear'] = false
gitlab_rails['auto_migrate'] = false
praefect['enable'] = true
# Make Praefect accept connections on all network interfaces. You must use
# firewalls to restrict access to this address/port.
praefect['listen_addr'] = '0.0.0.0:2305'
# Replace PRAEFECT_EXTERNAL_TOKEN with a real secret
praefect['auth_token'] = 'PRAEFECT_EXTERNAL_TOKEN'
# Replace each instance of PRAEFECT_INTERNAL_TOKEN below with a real
# secret, distinct from PRAEFECT_EXTERNAL_TOKEN.
# Name of storage hash must match storage name in git_data_dirs on GitLab server.
praefect['virtual_storages'] = {
'praefect' => {
'gitaly-1' => {
'address' => 'tcp://gitaly-1.internal:8075',
'token' => 'PRAEFECT_INTERNAL_TOKEN',
'primary' => true
},
'gitaly-2' => {
'address' => 'tcp://gitaly-2.internal:8075',
'token' => 'PRAEFECT_INTERNAL_TOKEN'
},
'gitaly-3' => {
'address' => 'tcp://gitaly-3.internal:8075',
'token' => 'PRAEFECT_INTERNAL_TOKEN'
}
}
}
praefect['database_host'] = 'POSTGRESQL_SERVER'
praefect['database_port'] = 5432
praefect['database_user'] = 'praefect'
praefect['database_password'] = 'PRAEFECT_SQL_PASSWORD'
praefect['database_dbname'] = 'praefect_production'
# Uncomment the line below if you do not want to use an encrypted
# connection to PostgreSQL
# praefect['database_sslmode'] = 'disable'
# Uncomment and modify these lines if you are using a TLS client
# certificate to connect to PostgreSQL
# praefect['database_sslcert'] = '/path/to/client-cert'
# praefect['database_sslkey'] = '/path/to/client-key'
# Uncomment and modify this line if your PostgreSQL server uses a custom
# CA
# praefect['database_sslrootcert'] = '/path/to/rootcert'
```
Save the file and [reconfigure Praefect](../restart_gitlab.md#omnibus-gitlab-reconfigure).
After you reconfigure, verify that Praefect can reach PostgreSQL:
```shell
sudo -u git /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml sql-ping
```
If the check fails, make sure you have followed the steps correctly. If you edit `/etc/gitlab/gitlab.rb`,
remember to run `sudo gitlab-ctl reconfigure` again before trying the
`sql-ping` command.
#### Gitaly
Next we will configure each Gitaly server assigned to Praefect. Configuration for these
is the same as a normal standalone Gitaly server, except that we use storage names and
auth tokens from Praefect instead of GitLab.
Below is an example configuration for `gitaly-1`, the only difference for the
other Gitaly nodes is the storage name under `git_data_dirs`.
Note that `gitaly['auth_token']` matches the `token` value listed under `praefect['virtual_storages']`
on the Praefect node.
```ruby
# /etc/gitlab/gitlab.rb on gitaly node inside praefect cluster
# Avoid running unnecessary services on the Gitaly server
postgresql['enable'] = false
redis['enable'] = false
nginx['enable'] = false
prometheus['enable'] = false
grafana['enable'] = false
unicorn['enable'] = false
sidekiq['enable'] = false
gitlab_workhorse['enable'] = false
prometheus_monitoring['enable'] = false
# Prevent database connections during 'gitlab-ctl reconfigure'
gitlab_rails['rake_cache_clear'] = false
gitlab_rails['auto_migrate'] = false
# Replace GITLAB_SHELL_SECRET_TOKEN below with real secret
gitlab_shell['secret_token'] = 'GITLAB_SHELL_SECRET_TOKEN'
# Configure the gitlab-shell API callback URL. Without this, `git push` will
# fail. This can be your 'front door' GitLab URL or an internal load
# balancer.
gitlab_rails['internal_api_url'] = 'https://gitlab.example.com'
# Replace PRAEFECT_INTERNAL_TOKEN below with a real secret.
gitaly['auth_token'] = 'PRAEFECT_INTERNAL_TOKEN'
# Make Gitaly accept connections on all network interfaces. You must use
# firewalls to restrict access to this address/port.
# Comment out following line if you only want to support TLS connections
gitaly['listen_addr'] = "0.0.0.0:8075"
git_data_dirs({
"gitaly-1" => {
"path" => "/var/opt/gitlab/git-data"
}
})
```
For more information on Gitaly server configuration, see our [Gitaly documentation](index.md#3-gitaly-server-configuration).
#### GitLab
When Praefect is running, it should be exposed as a storage to GitLab. This
is done through setting the `git_data_dirs`. Assuming the default storage
is present, there should be two storages available to GitLab:
```ruby
# /etc/gitlab/gitlab.rb on gitlab server
# Replace PRAEFECT_EXTERNAL_TOKEN below with real secret.
git_data_dirs({
"default" => {
"path" => "/var/opt/gitlab/git-data"
},
"praefect" => {
"gitaly_address" => "tcp://praefect.internal:2305",
"gitaly_token" => 'PRAEFECT_EXTERNAL_TOKEN'
}
})
# Replace GITLAB_SHELL_SECRET_TOKEN below with real secret
gitlab_shell['secret_token'] = 'GITLAB_SHELL_SECRET_TOKEN'
```
Note that the storage name used is the same as the `praefect['virtual_storage_name']` set
on the Praefect node.
Save your changes and [reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure).
Run `gitlab-rake gitlab:gitaly:check` to confirm that GitLab can reach Praefect.
### Testing Praefect
To test Praefect, first set it as the default storage node for new projects
using **Admin Area > Settings > Repository > Repository storage**. Next,
create a new project and check the "Initialize repository with a README" box.
If you receive an error, check `/var/log/gitlab/gitlab-rails/production.log`.
Here are common errors and potential causes:
- 500 response code
- **ActionView::Template::Error (7:permission denied)**
- `praefect['auth_token']` and `gitlab_rails['gitaly_token']` do not match on the GitLab server.
- **Unable to save project. Error: 7:permission denied**
- Secret token in `praefect['storage_nodes']` on GitLab server does not match the
value in `gitaly['auth_token']` on one or more Gitaly servers.
- 503 response code
- **GRPC::Unavailable (14:failed to connect to all addresses)**
- GitLab was unable to reach Praefect.
- **GRPC::Unavailable (14:all SubCons are in TransientFailure...)**
- Praefect cannot reach one or more of its child Gitaly nodes.
|