summaryrefslogtreecommitdiff
path: root/doc/administration/geo/secondary_proxy/index.md
blob: 2786982bb511f412c0775c2679903d7150ae9e52 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
---
stage: Systems
group: Geo
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
type: howto
---

# Geo proxying for secondary sites **(PREMIUM SELF)**

> - [Introduced](https://gitlab.com/groups/gitlab-org/-/epics/5914) in GitLab 14.4 [with a flag](../../feature_flags.md) named `geo_secondary_proxy`. Disabled by default.
> - [Enabled by default for unified URLs](https://gitlab.com/gitlab-org/gitlab/-/issues/325732) in GitLab 14.6.
> - [Disabled by default for different URLs](https://gitlab.com/gitlab-org/gitlab/-/issues/325732) in GitLab 14.6 [with a flag](../../feature_flags.md) named `geo_secondary_proxy_separate_urls`.
> - [Enabled by default for different URLs](https://gitlab.com/gitlab-org/gitlab/-/issues/346112) in GitLab 15.1.

Use Geo proxying to:

- Have secondary sites serve read-write traffic by proxying to the primary site.
- Selectively accelerate replicated data types by directing read-only operations to the local site instead.

When enabled, users of the secondary site can use the WebUI as if they were accessing the
primary site's UI. This significantly improves the overall user experience of secondary sites.

With secondary proxying, web requests to secondary Geo sites are
proxied directly to the primary, and appear to act as a read-write site.

Proxying is done by the [`gitlab-workhorse`](https://gitlab.com/gitlab-org/gitlab-workhorse) component.
Traffic usually sent to the Rails application on the Geo secondary site is proxied
to the [internal URL](../index.md#internal-url) of the primary Geo site instead.

Use secondary proxying for use-cases including:

- Having all Geo sites behind a single URL.
- Geographically load-balancing traffic without worrying about write access.

<i class="fa fa-youtube-play youtube" aria-hidden="true"></i>
For an overview, see: [Secondary proxying using geographic load-balancer and AWS Route53](https://www.youtube.com/watch?v=TALLy7__Na8).

## Set up a unified URL for Geo sites

Secondary sites can transparently serve read-write traffic. You can
use a single external URL so that requests can hit either the primary Geo site
or any secondary Geo sites that use Geo proxying.

### Configure an external URL to send traffic to both sites

Follow the [Location-aware public URL](location_aware_external_url.md) steps to create
a single URL used by all Geo sites, including the primary.

### Update the Geo sites to use the same external URL

1. On your Geo sites, SSH **into each node running Rails (Puma, Sidekiq, Log-Cursor)
   and change the `external_url` to that of the single URL:

   ```shell
   sudo editor /etc/gitlab/gitlab.rb
   ```

1. Reconfigure the updated nodes for the change to take effect if the URL was different than the one already set:

   ```shell
   gitlab-ctl reconfigure
   ```

1. To match the new external URL set on the secondary Geo sites, the primary database
   needs to reflect this change.

   In the Geo administration page of the **primary** site, edit each Geo secondary that
   is using the secondary proxying and set the `URL` field to the single URL.
   Make sure the primary site is also using this URL.

In Kubernetes, you can use the same domain under `global.hosts.domain` as for the primary site.

## Disable Geo proxying

You can disable the secondary proxying on each Geo site, separately, by following these steps with Omnibus-based packages:

1. SSH into each application node (serving user traffic directly) on your secondary Geo site
   and add the following environment variable:

   ```shell
   sudo editor /etc/gitlab/gitlab.rb
   ```

   ```ruby
   gitlab_workhorse['env'] = {
     "GEO_SECONDARY_PROXY" => "0"
   }
   ```

1. Reconfigure the updated nodes for the change to take effect:

   ```shell
   gitlab-ctl reconfigure
   ```

In Kubernetes, you can use `--set gitlab.webservice.extraEnv.GEO_SECONDARY_PROXY="0"`,
or specify the following in your values file:

```yaml
gitlab:
  webservice:
    extraEnv:
      GEO_SECONDARY_PROXY: "0"
```

## Geo proxying with Separate URLs

> Geo secondary proxying for separate URLs is [enabled by default](https://gitlab.com/gitlab-org/gitlab/-/issues/346112) in GitLab 15.1.

NOTE:
The feature flag described in this section is planned to be deprecated and removed in a future release. Support for read-only Geo secondary sites is proposed in [issue 366810](https://gitlab.com/gitlab-org/gitlab/-/issues/366810), you can upvote and share your use cases in that issue.

There are minor known issues linked in the
["Geo secondary proxying with separate URLs" epic](https://gitlab.com/groups/gitlab-org/-/epics/6865).
You can also add feedback in the epic about any use-cases that
are not possible anymore with proxying enabled.

If you run into issues, to disable this feature, disable the `geo_secondary_proxy_separate_urls` feature flag.
SSH into one node running Rails on your primary Geo site and run:

```shell
sudo gitlab-rails runner "Feature.disable(:geo_secondary_proxy_separate_urls)"
```

In Kubernetes, you can run the same command in the toolbox pod. Refer to the
[Kubernetes cheat sheet](https://docs.gitlab.com/charts/troubleshooting/kubernetes_cheat_sheet.html#gitlab-specific-kubernetes-information)
for details.

## Limitations

- When secondary proxying is used, the asynchronous Geo replication can cause unexpected issues for accelerated
  data types that may be replicated to the Geo secondaries with a delay.

  For example, we found a potential issue where
  [replication lag introduces read-after-write inconsistencies](https://gitlab.com/gitlab-org/gitlab/-/issues/345267).
  If the replication lag is high enough, this can result in Git reads receiving stale data when hitting a secondary.

- Non-Rails requests are not proxied, so other services may need to use a separate, non-unified URL to ensure requests
  are always sent to the primary. These services include:

  - GitLab Container Registry - [can be configured to use a separate domain](../../packages/container_registry.md#configure-container-registry-under-its-own-domain).
  - GitLab Pages - should always use a separate domain, as part of [the prerequisites for running GitLab Pages](../../pages/index.md#prerequisites).

- With a unified URL, Let's Encrypt can't generate certificates unless it can reach both IPs through the same domain.
  To use TLS certificates with Let's Encrypt, you can manually point the domain to one of the Geo sites, generate
  the certificate, then copy it to all other sites.

- [Viewing projects and designs data from a primary site is not possible when using a unified URL](../index.md#view-replication-data-on-the-primary-site).

- When secondary proxying is used together with separate URLs, registering [GitLab runners](https://docs.gitlab.com/runner/) to clone from
secondary sites is not supported. The runner registration will succeed, but the clone URL will default to the primary site. The runner
[clone URL](https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-runners-section) is configured per GitLab deployment
and cannot be configured per Geo site. Therefore, all runners will clone from the primary site (or configured clone URL) irrespective of
which Geo site they register on. For information about GitLab CI using a specific Geo secondary to clone from, see issue
[3294](https://gitlab.com/gitlab-org/gitlab/-/issues/3294#note_1009488466).

## Behavior of secondary sites when the primary Geo site is down

Considering that web traffic is proxied to the primary, the behavior of the secondary sites differs when the primary
site is inaccessible:

- UI and API traffic return the same errors as the primary (or fail if the primary is not accessible at all), since they are proxied.
- For repositories that already exist on the specific secondary site being accessed, Git read operations still work as expected,
  including authentication through HTTP(s) or SSH.
- Git operations for repositories that are not replicated to the secondary site return the same errors
  as the primary site, since they are proxied.
- All Git write operations return the same errors as the primary site, since they are proxied.

## Features accelerated by secondary Geo sites

Most HTTP traffic sent to a secondary Geo site can be proxied to the primary Geo site. With this architecture,
secondary Geo sites are able to support write requests. Certain **read** requests are handled locally by secondary
sites for improved latency and bandwidth nearby. All write requests are proxied to the primary site.

The following table details the components currently tested through the Geo secondary site Workhorse proxy.
It does not cover all data types, more will be added in the future as they are tested.

| Feature / component                                 | Accelerated reads?     |
|:----------------------------------------------------|:-----------------------|
| Project, wiki, design repository (using the web UI) | **{dotted-circle}** No |
| Project, wiki repository (using Git)                | **{check-circle}** Yes <sup>1</sup> |
| Project, Personal Snippet (using the web UI)        | **{dotted-circle}** No |
| Project, Personal Snippet (using Git)               | **{check-circle}** Yes <sup>1</sup> |
| Group wiki repository (using the web UI)            | **{dotted-circle}** No |
| Group wiki repository (using Git)                   | **{check-circle}** Yes <sup>1</sup> |
| User uploads                                        | **{dotted-circle}** No |
| LFS objects (using the web UI)                      | **{dotted-circle}** No |
| LFS objects (using Git)                             | **{check-circle}** Yes |
| Pages                                               | **{dotted-circle}** No <sup>2</sup> |
| Advanced search (using the web UI)                  | **{dotted-circle}** No |

1. Git reads are served from the local secondary while pushes get proxied to the primary.
   Selective sync or cases where repositories don't exist locally on the Geo secondary throw a "not found" error.
1. Pages can use the same URL (without access control), but must be configured separately and are not proxied.