diff options
Diffstat (limited to 'doc/development/geo/proxying.md')
-rw-r--r-- | doc/development/geo/proxying.md | 356 |
1 files changed, 356 insertions, 0 deletions
diff --git a/doc/development/geo/proxying.md b/doc/development/geo/proxying.md new file mode 100644 index 00000000000..41c7f426c6f --- /dev/null +++ b/doc/development/geo/proxying.md @@ -0,0 +1,356 @@ +--- +stage: Systems +group: Geo +info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments +--- + +# Geo proxying + +With Geo proxying, secondaries now proxy web requests through Workhorse to the primary, so users navigating to the +secondary see a read-write UI, and are able to do all operations that they can do on the primary. + +## Request life cycle + +### Top-level view + +The proxying interaction can be explained at a high level through the following diagram: + +```mermaid +sequenceDiagram +actor client +participant secondary +participant primary + +client->>secondary: GET /explore +secondary-->>primary: GET /explore (proxied) +primary-->>secondary: HTTP/1.1 200 OK [..] +secondary->>client: HTTP/1.1 200 OK [..] +``` + +### Proxy detection mechanism + +To know whether or not it should proxy requests to the primary, and the URL of the primary (as it is stored in +the database), Workhorse polls the internal API when Geo is enabled. When proxying should be enabled, the internal +API responds with the primary URL and JWT-signed data that is passed on to the primary for every request. + +```mermaid +sequenceDiagram + participant W as Workhorse (secondary) + participant API as Internal Rails API + W->API: GET /api/v4/geo/proxy (internal) + loop Poll every 10 seconds + API-->W: {geo_proxy_primary_url, geo_proxy_extra_data}, update config + end +``` + +### In-depth request flow and local data acceleration compared with proxying + +Detailing implementation, Workhorse on the secondary (requested) site decides whether to proxy the data or not. If it +can "accelerate" the data type (that is, can serve locally to save a roundtrip request), it returns the data +immediately. Otherwise, traffic is sent to the primary's internal URL, served by Workhorse on the primary exactly +as a direct request would. The response is then be proxied back to the user through the secondary Workhorse in the +same connection. + +```mermaid +flowchart LR + A[Client]--->W1["Workhorse (secondary)"] + W1 --> W1C[Serve data locally?] + W1C -- "Yes" ----> W1 + W1C -- "No (proxy)" ----> W2["Workhorse (primary)"] + W2 --> W1 ----> A +``` + +## Sign-in + +### Requests proxied to the primary requiring authorization + +```mermaid +sequenceDiagram +autoNumber +participant Client +participant Secondary +participant Primary + +Client->>Secondary: `/group/project` request +Secondary->>Primary: proxy /group/project +opt primary not signed in +Primary-->>Secondary: 302 redirect +Secondary-->>Client: proxy 302 redirect +Client->>Secondary: /users/sign_in +Secondary->>Primary: proxy /users/sign_in +Note right of Primary: authentication happens, POST to same URL etc +Primary-->>Secondary: 302 redirect +Secondary-->>Client: proxy 302 redirect +Client->>Secondary: /group/project +Secondary->>Primary: proxy /group/project +end +Primary-->>Secondary: /group/project logged in response (session on primary created) +Secondary-->>Client: proxy full response +``` + +### Requests requiring a user session on the secondary + +At the moment, this flow only applies to Project Replication Details and Design Replication Details in the Geo Admin +Area. For more context, see +[View replication data on the primary site](../../administration/geo/index.md#view-replication-data-on-the-primary-site). + +```mermaid +sequenceDiagram +autoNumber +participant Client +participant Secondary +participant Primary + +Client->>Secondary: `admin/geo/replication/projects` request +opt secondary not signed in +Secondary-->>Client: 302 redirect +Client->>Secondary: /users/auth/geo/sign_in +Secondary-->>Client: 302 redirect +Client->>Secondary: /oauth/geo/auth/geo/sign_in +Secondary-->>Client: 302 redirect +Client->>Secondary: /oauth/authorize +Secondary->>Primary: proxy /oauth/authorize +opt primary not signed in +Primary-->>Secondary: 302 redirect +Secondary-->>Client: proxy 302 redirect +Client->>Secondary: /users/sign_in +Secondary->>Primary: proxy /users/sign_in +Note right of Primary: authentication happens, POST to same URL etc +end +Primary-->>Secondary: 302 redirect +Secondary-->>Client: proxy 302 redirect +Client->>Secondary: /oauth/geo/callback +Secondary-->>Client: 302 redirect +Client->>Secondary: admin/geo/replication/projects +end +Secondary-->>Client: admin/geo/replication/projects logged in response (session on both primary and secondary) +``` + +## Git pull + +For historical reasons, the `push_from_secondary` path is used to forward a Git pull. There is [an issue proposing to +rename this route](https://gitlab.com/gitlab-org/gitlab/-/issues/292690) to avoid confusion. + +### Git pull over HTTP(s) + +#### Accelerated repositories + +When a repository exists on the secondary and we detect is up to date with the primary, we serve it directly instead of +proxying. + +```mermaid +sequenceDiagram +participant C as Git client +participant Wsec as "Workhorse (secondary)" +participant Rsec as "Rails (secondary)" +participant Gsec as "Gitaly (secondary)" +C->>Wsec: GET /foo/bar.git/info/refs/?service=git-upload-pack +Wsec->>Rsec: <internal API check> +note over Rsec: decide that the repo is synced and up to date +Rsec-->>Wsec: 401 Unauthorized +Wsec-->>C: <response> +C->>Wsec: GET /foo/bar.git/info/refs/?service=git-upload-pack +Wsec->>Rsec: <internal API check> +Rsec-->>Wsec: Render Workhorse OK +Wsec-->>C: 200 OK +C->>Wsec: POST /foo/bar.git/git-upload-pack +Wsec->>Rsec: GitHttpController#git_receive_pack +Rsec-->>Wsec: Render Workhorse OK +Wsec->>Gsec: Workhorse gets the connection details from Rails, connects to Gitaly: SmartHTTP Service, UploadPack RPC (check the proto for details) +Gsec-->>Wsec: Return a stream of Proto messages +Wsec-->>C: Pipe messages to the Git client +``` + +#### Proxied repositories + +If a requested repository isn't synced, or we detect is not up to date, the request will be proxied to the primary, in +order to get the latest version of the changes. + +```mermaid +sequenceDiagram +participant C as Git client +participant Wsec as "Workhorse (secondary)" +participant Rsec as "Rails (secondary)" +participant W as "Workhorse (primary)" +participant R as "Rails (primary)" +participant G as "Gitaly (primary)" +C->>Wsec: GET /foo/bar.git/info/refs/?service=git-upload-pack +Wsec->>Rsec: <response> +note over Rsec: decide that the repo is out of date +Rsec-->>Wsec: 302 Redirect to /-/push_from_secondary/2/foo/bar.git/info/refs?service=git-upload-pack +Wsec-->>C: <response> +C->>Wsec: GET /-/push_from_secondary/2/foo/bar.git/info/refs/?service=git-upload-pack +Wsec->>W: <proxied request> +W->>R: <data> +R-->>W: 401 Unauthorized +W-->>Wsec: <proxied response> +Wsec-->>C: <response> +C->>Wsec: GET /-/push_from_secondary/2/foo/bar.git/info/refs/?service=git-upload-pack +note over W: proxied +Wsec->>W: <proxied request> +W->>R: <data> +R-->>W: Render Workhorse OK +W-->>Wsec: <proxied response> +Wsec-->>C: <response> +C->>Wsec: POST /-/push_from_secondary/2/foo/bar.git/git-upload-pack +Wsec->>W: <proxied request> +W->>R: GitHttpController#git_receive_pack +R-->>W: Render Workhorse OK +W->>G: Workhorse gets the connection details from Rails, connects to Gitaly: SmartHTTP Service, UploadPack RPC (check the proto for details) +G-->>W: Return a stream of Proto messages +W-->>Wsec: Pipe messages to the Git client +Wsec-->>C: Return piped messages from Git +``` + +### Git pull over SSH + +As SSH operations go through GitLab Shell instead of Workhorse, they are not proxied through the mechanism used for +Workhorse requests. With SSH operations, they are proxied as Git HTTP requests to the primary site by the secondary +Rails internal API. + +#### Accelerated repositories + +When a repository exists on the secondary and we detect is up to date with the primary, we serve it directly instead of +proxying. + +```mermaid +sequenceDiagram +participant C as Git client +participant S as GitLab Shell (secondary) +participant I as Internal API (secondary Rails) +participant G as Gitaly (secondary) +C->>S: git pull +S->>I: SSH key validation (api/v4/internal/authorized_keys?key=..) +I-->>S: HTTP/1.1 200 OK +S->>G: InfoRefs:UploadPack RPC +G-->>S: stream Git response back +S-->>C: stream Git response back +C-->>S: stream Git data to push +S->>G: UploadPack RPC +G-->>S: stream Git response back +S-->>C: stream Git response back +``` + +#### Proxied repositories + +If a requested repository isn't synced, or we detect is not up to date, the request will be proxied to the primary, in +order to get the latest version of the changes. + +```mermaid +sequenceDiagram +participant C as Git client +participant S as GitLab Shell (secondary) +participant I as Internal API (secondary Rails) +participant P as Primary API +C->>S: git pull +S->>I: SSH key validation (api/v4/internal/authorized_keys?key=..) +I-->>S: HTTP/1.1 300 (custom action status) with {endpoint, msg, primary_repo} +S->>I: POST /api/v4/geo/proxy_git_ssh/info_refs_upload_pack +I->>P: POST $PRIMARY/foo/bar.git/info/refs/?service=git-upload-pack +P-->>I: HTTP/1.1 200 OK +I-->>S: <response> +S-->>C: return Git response from primary +C-->>S: stream Git data to push +S->>I: POST /api/v4/geo/proxy_git_ssh/upload_pack +I->>P: POST $PRIMARY/foo/bar.git/git-upload-pack +P-->>I: HTTP/1.1 200 OK +I-->>S: <response> +S-->>C: return Git response from primary +``` + +## Git push + +### Unified URLs + +With unified URLs, a push will redirect to a local path formatted as `/-/push_from_secondary/$SECONDARY_ID/*`. Further +requests through this path will be proxied to the primary, which will handle the push. + +#### Git push over SSH + +As SSH operations go through GitLab Shell instead of Workhorse, they are not proxied through the mechanism used for +Workhorse requests. With SSH operations, they are proxied as Git HTTP requests to the primary site by the secondary +Rails internal API. + +```mermaid +sequenceDiagram +participant C as Git client +participant S as GitLab Shell (secondary) +participant I as Internal API (secondary Rails) +participant P as Primary API +C->>S: git push +S->>I: SSH key validation (api/v4/internal/authorized_keys?key=..) +I-->>S: HTTP/1.1 300 (custom action status) with {endpoint, msg, primary_repo} +S->>I: POST /api/v4/geo/proxy_git_ssh/info_refs_receive_pack +I->>P: POST $PRIMARY/foo/bar.git/info/refs/?service=git-receive-pack +P-->>I: HTTP/1.1 200 OK +I-->>S: <response> +S-->>C: return Git response from primary +C-->>S: stream Git data to push +S->>I: POST /api/v4/geo/proxy_git_ssh/receive_pack +I->>P: POST $PRIMARY/foo/bar.git/git-receive-pack +P-->>I: HTTP/1.1 200 OK +I-->>S: <response> +S-->>C: return Git response from primary +``` + +#### Git push over HTTP(s) + +```mermaid +sequenceDiagram +participant C as Git client +participant Wsec as Workhorse (secondary) +participant W as Workhorse (primary) +participant R as Rails (primary) +participant G as Gitaly (primary) +C->>Wsec: GET /foo/bar.git/info/refs/?service=git-receive-pack +Wsec->>C: 302 Redirect to /-/push_from_secondary/2/foo/bar.git/info/refs?service=git-receive-pack +C->>Wsec: GET /-/push_from_secondary/2/foo/bar.git/info/refs/?service=git-receive-pack +Wsec->>W: <proxied request> +W->>R: <data> +R-->>W: 401 Unauthorized +W-->>Wsec: <proxied response> +Wsec-->>C: <response> +C->>Wsec: GET /-/push_from_secondary/2/foo/bar.git/info/refs/?service=git-receive-pack +Wsec->>W: <proxied request> +W->>R: <data> +R-->>W: Render Workhorse OK +W-->>Wsec: <proxied response> +Wsec-->>C: <response> +C->>Wsec: POST /-/push_from_secondary/2/foo/bar.git/git-receive-pack +Wsec->>W: <proxied request> +W->>R: GitHttpController:git_receive_pack +R-->>W: Render Workhorse OK +W->>G: Get connection details from Rails and connects to SmartHTTP Service, ReceivePack RPC +G-->>W: Return a stream of Proto messages +W-->>Wsec: Pipe messages to the Git client +Wsec-->>C: Return piped messages from Git +``` + +### Git push over HTTP with Separate URLs + +With separate URLs, the secondary will redirect to a URL formatted like `$PRIMARY/-/push_from_secondary/$SECONDARY_ID/*`. + +```mermaid +sequenceDiagram +participant Wsec as Workhorse (secondary) +participant C as Git client +participant W as Workhorse (primary) +participant R as Rails (primary) +participant G as Gitaly (primary) +C->>Wsec: GET $SECONDARY/foo/bar.git/info/refs/?service=git-receive-pack +Wsec->>C: 302 Redirect to $PRIMARY/-/push_from_secondary/2/foo/bar.git/info/refs?service=git-receive-pack +C->>W: GET $PRIMARY/-/push_from_secondary/2/foo/bar.git/info/refs/?service=git-receive-pack +W->>R: <data> +R-->>W: 401 Unauthorized +W-->>C: <response> +C->>W: GET /-/push_from_secondary/2/foo/bar.git/info/refs/?service=git-receive-pack +W->>R: <data> +R-->>W: Render Workhorse OK +W-->>C: <response> +C->>W: POST /-/push_from_secondary/2/foo/bar.git/git-receive-pack +W->>R: GitHttpController:git_receive_pack +R-->>W: Render Workhorse OK +W->>G: Get connection details from Rails and connects to SmartHTTP Service, ReceivePack RPC +G-->>W: Return a stream of Proto messages +W-->>C: Pipe messages to the Git client +``` |