Diffstat (limited to 'doc/architecture/blueprints/pods')
-rw-r--r-- doc/architecture/blueprints/pods/images/iteration0-organizations-introduction.png | bin 0 -> 67160 bytes
-rw-r--r-- doc/architecture/blueprints/pods/images/pods-and-fulfillment.png | bin 0 -> 75803 bytes
-rw-r--r-- doc/architecture/blueprints/pods/images/term-cluster.png | bin 0 -> 63268 bytes
-rw-r--r-- doc/architecture/blueprints/pods/images/term-organization.png | bin 0 -> 7150 bytes
-rw-r--r-- doc/architecture/blueprints/pods/images/term-pod.png (renamed from doc/architecture/blueprints/pods/term-pod.png) | bin 16104 -> 16104 bytes
-rw-r--r-- doc/architecture/blueprints/pods/images/term-top-level-namespace.png (renamed from doc/architecture/blueprints/pods/term-top-level-namespace.png) | bin 11451 -> 11451 bytes
-rw-r--r-- doc/architecture/blueprints/pods/index.md | 122
-rw-r--r-- doc/architecture/blueprints/pods/iteration0-organizations-introduction.png | bin 326285 -> 0 bytes
-rw-r--r-- doc/architecture/blueprints/pods/pods-feature-data-migration.md | 82
-rw-r--r-- doc/architecture/blueprints/pods/pods-feature-database-sequences.md | 94
-rw-r--r-- doc/architecture/blueprints/pods/pods-feature-git-access.md | 163
-rw-r--r-- doc/architecture/blueprints/pods/pods-feature-graphql.md | 94
-rw-r--r-- doc/architecture/blueprints/pods/pods-feature-organizations.md | 58
-rw-r--r-- doc/architecture/blueprints/pods/pods-feature-router-endpoints-classification.md | 46
-rw-r--r-- doc/architecture/blueprints/pods/pods-feature-template.md | 29
-rw-r--r-- doc/architecture/blueprints/pods/proposal-stateless-router-with-buffering-requests.md | 648
-rw-r--r-- doc/architecture/blueprints/pods/proposal-stateless-router-with-routes-learning.md | 672
-rw-r--r-- doc/architecture/blueprints/pods/term-cluster.png | bin 271291 -> 0 bytes
-rw-r--r-- doc/architecture/blueprints/pods/term-organization.png | bin 22575 -> 0 bytes
19 files changed, 1975 insertions, 33 deletions
diff --git a/doc/architecture/blueprints/pods/images/iteration0-organizations-introduction.png b/doc/architecture/blueprints/pods/images/iteration0-organizations-introduction.png
new file mode 100644
index 00000000000..5725b0fa71f
--- /dev/null
+++ b/doc/architecture/blueprints/pods/images/iteration0-organizations-introduction.png
Binary files differ
diff --git a/doc/architecture/blueprints/pods/images/pods-and-fulfillment.png b/doc/architecture/blueprints/pods/images/pods-and-fulfillment.png
new file mode 100644
index 00000000000..aab8556a5d3
--- /dev/null
+++ b/doc/architecture/blueprints/pods/images/pods-and-fulfillment.png
Binary files differ
diff --git a/doc/architecture/blueprints/pods/images/term-cluster.png b/doc/architecture/blueprints/pods/images/term-cluster.png
new file mode 100644
index 00000000000..87e4d631551
--- /dev/null
+++ b/doc/architecture/blueprints/pods/images/term-cluster.png
Binary files differ
diff --git a/doc/architecture/blueprints/pods/images/term-organization.png b/doc/architecture/blueprints/pods/images/term-organization.png
new file mode 100644
index 00000000000..4c82c62b8f4
--- /dev/null
+++ b/doc/architecture/blueprints/pods/images/term-organization.png
Binary files differ
diff --git a/doc/architecture/blueprints/pods/term-pod.png b/doc/architecture/blueprints/pods/images/term-pod.png
index d8f79df2f29..d8f79df2f29 100644
--- a/doc/architecture/blueprints/pods/term-pod.png
+++ b/doc/architecture/blueprints/pods/images/term-pod.png
Binary files differ
diff --git a/doc/architecture/blueprints/pods/term-top-level-namespace.png b/doc/architecture/blueprints/pods/images/term-top-level-namespace.png
index c1cd317d878..c1cd317d878 100644
--- a/doc/architecture/blueprints/pods/term-top-level-namespace.png
+++ b/doc/architecture/blueprints/pods/images/term-top-level-namespace.png
Binary files differ
diff --git a/doc/architecture/blueprints/pods/index.md b/doc/architecture/blueprints/pods/index.md
index 01d56c483ea..3ba319d169b 100644
--- a/doc/architecture/blueprints/pods/index.md
+++ b/doc/architecture/blueprints/pods/index.md
@@ -1,15 +1,15 @@
---
-stage: enablement
-group: pods
-comments: false
-description: 'Pods'
+status: accepted
+creation-date: "2022-09-07"
+authors: [ "@fzimmer", "@DylanGriffith" ]
+coach: "@kamil"
+approvers: [ "@fzimmer" ]
+owning-stage: "~devops::enablement"
+participating-stages: []
---
# Pods
-DISCLAIMER:
-This page may contain information related to upcoming products, features and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes. Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.
-
This document is a work-in-progress and represents a very early state of the Pods design. Significant aspects are not documented, though we expect to add them in the future.
## Summary
@@ -24,7 +24,7 @@ We use the following terms to describe components and properties of the Pods arc
A Pod is a set of infrastructure components that contains multiple top-level namespaces that belong to different organizations. The components include both datastores (PostgreSQL, Redis etc.) and stateless services (web etc.). The infrastructure components provided within a Pod are shared among organizations and their top-level namespaces but not shared with other Pods. This isolation of infrastructure components means that Pods are independent from each other.
-![Term Pod](term-pod.png)
+![Term Pod](images/term-pod.png)
#### Pod properties
@@ -42,7 +42,7 @@ Discouraged synonyms: GitLab instance, cluster, shard
A cluster is a collection of Pods.
-![Term Cluster](term-cluster.png)
+![Term Cluster](images/term-cluster.png)
#### Cluster properties
@@ -66,7 +66,7 @@ Organizations work under the following assumptions:
1. Users understand that the majority of pages they view are only scoped to a single organization at a time.
1. Organizations are located on a single pod.
-![Term Organization](term-organization.png)
+![Term Organization](images/term-organization.png)
#### Organization properties
@@ -94,7 +94,7 @@ Top-level namespaces may [be replaced by workspaces](https://gitlab.com/gitlab-o
Discouraged synonyms: Root-level namespace
-![Term Top-level Namespace](term-top-level-namespace.png)
+![Term Top-level Namespace](images/term-top-level-namespace.png)
#### Top-level namespace properties
@@ -111,8 +111,8 @@ Users are available globally and not restricted to a single Pod. Users can be me
- Users can create multiple top-level namespaces
- Users can be a member of multiple top-level namespaces
- Users can be a member of multiple organizations
-- Users can administrate organizations
-- User activity is aggregated within an organization
+- Users can administer organizations
+- User activity is aggregated in an organization
- Every user has one personal namespace
## Goals
@@ -160,6 +160,59 @@ A number of technical issues need to be resolved to implement Pods (in no partic
1. How are Pods provisioned?
1. How can Pods implement disaster recovery capabilities?
+## Cross-section impact
+
+Pods is a fundamental architecture change that impacts other sections and stages. This section summarizes and links to other groups that may be impacted and highlights potential conflicts that need to be resolved. The Pods group is not responsible for achieving the goals of other groups but we want to ensure that dependencies are resolved.
+
+### Summary
+
+Based on discussions with other groups, the net impact of introducing Pods and a new entity called organizations is mostly neutral. It may slow down development in some areas. We did not discover major blockers for other teams.
+
+1. We need to resolve naming conflicts (proposal is TBD)
+1. Pods requires introducing Organizations. Organizations are a new entity **above** top-level groups. Because this is a new entity, it may impact the ability to consolidate settings for Group Workspace and influence their decision on [how to approach introducing a workspace](https://gitlab.com/gitlab-org/gitlab/-/issues/376285#approach-2-workspace-is-built-on-top-of-top-level-groups)
+1. Organizations may make it slightly easier for Fulfillment to realize their billing plans.
+
+### Impact on Group Manage Workspace
+
+We synced with the Workspace PM and Designer ([recording](https://youtu.be/b5Opn9cFWFk)) and discussed the similarities and differences between the Pods and Workspace proposal ([presentation](https://docs.google.com/presentation/d/1FsUi22Up15b_tu6p2m-yLML3hCZ3rgrZrmzJAxUsNmU/edit?usp=sharing)).
+
+#### Goals of Group Manage Workspace
+
+As defined in the [workspace documentation](../../../user/workspace/index.md):
+
+1. Create an entity to manage everything you do as a GitLab administrator, including:
+ 1. Defining and applying settings to all of your groups, subgroups, and projects.
+ 1. Aggregating data from all your groups, subgroups, and projects.
+1. Reach feature parity between SaaS and self-managed installations, with all Admin Area settings moving to groups (?). Hardware controls remain on the instance level.
+
+The [workspace roadmap outlines](https://gitlab.com/gitlab-org/gitlab/-/issues/368237#high-level-goals) the current goals in detail.
+
+#### Potential conflicts with Pods
+
+- Workspace and Organization are different terms for the same entity. Both define a new entity as the primary organizational object for groups and projects. This is mainly a semantic difference and **we need to decide on a name** following [user research to decide if workspace](https://gitlab.com/gitlab-org/ux-research/-/issues/2147). This is also driven by the fact that the Remote Development team is looking for better names and [is considering the term Workspace as well](https://gitlab.com/gitlab-com/Product/-/issues/4812).
+- We will only introduce one entity
+- Group workspace highlighted the need to further validate the key assumption that users only care about what happens within their organization.
+
+### Impact on Fulfillment
+
+We synced with Fulfillment ([recording](https://youtu.be/FkQF3uF7vTY)) to discuss how Pods would impact them. Fulfillment is supportive of an entity above top-level namespaces. Their perspective is outlined in [!5639](https://gitlab.com/gitlab-org/customers-gitlab-com/-/merge_requests/5639/diffs).
+
+#### Goals of Fulfillment
+
+- Fulfillment has a longstanding plan to move billing from the top-level namespace to a level above. This would mean that a license applies for an organization and all its top-level namespaces.
+- Fulfillment uses Zuora for billing and would like to have a 1-to-1 relationship between an organization and their Zuora entity called BillingAccount. They want to move away from tying a license to a single user.
+- If a customer needs multiple organizations, the corresponding BillingAccounts can be rolled up into a consolidated billing account (similar to [AWS consolidated billing](https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/consolidated-billing.html))
+- Ideally, a self-managed instance has a single Organization by default, which should be enough for most customers.
+- Fulfillment prefers only one additional entity.
+
+A rough representation of this is:
+
+![Pods and Fulfillment](images/pods-and-fulfillment.png)
+
+#### Potential conflicts with Pods
+
+- There are no known conflicts between Fulfillment's plans and Pods
+
## Iteration plan
We can't ship the entire Pods architecture in one go - it is too large. Instead, we are adopting an iteration plan that provides value along the way.
@@ -189,7 +242,7 @@ Organizations solve the following problems:
1. Self-managed instances would set a default organization.
1. Organizations can control user-profiles in a central way. This could be achieved by having an organization specific user-profile. Such a profile makes it possible for the organization administrators to control the user role in a company, enforce user emails, or show a graphical indicator of a user being part of the organization. An example would be a "GitLab Employee stamp" on comments.
-![Move to Organizations](iteration0-organizations-introduction.png)
+![Move to Organizations](images/iteration0-organizations-introduction.png)
#### Why would customers opt-in to Organizations?
@@ -251,28 +304,31 @@ Based on user research, we may want to change certain features to work across or
- Specific features allow for cross-organization interactions, for example forking, search.
-### Links
+## Technical Proposals
+
+The Pods architecture has long-lasting implications for data processing, location, scalability, and the GitLab architecture.
+This section links all the different technical proposals that are being evaluated.
+
+- [Stateless Router That Uses a Cache to Pick Pod and Is Redirected When Wrong Pod Is Reached](proposal-stateless-router-with-buffering-requests.md)
+
+- [Stateless Router That Uses a Cache to Pick Pod and pre-flight `/api/v4/pods/learn`](proposal-stateless-router-with-routes-learning.md)
+
+## Impacted features
+
+The Pods architecture will impact many features, requiring some of them to be rewritten or changed significantly.
+This is the list of known affected features with the proposed solutions.
+
+- [Pods: Git Access](pods-feature-git-access.md)
+- [Pods: Data Migration](pods-feature-data-migration.md)
+- [Pods: Database Sequences](pods-feature-database-sequences.md)
+- [Pods: GraphQL](pods-feature-graphql.md)
+- [Pods: Organizations](pods-feature-organizations.md)
+- [Pods: Router Endpoints Classification](pods-feature-router-endpoints-classification.md)
+
+## Links
- [Internal Pods presentation](https://docs.google.com/presentation/d/1x1uIiN8FR9fhL7pzFh9juHOVcSxEY7d2_q4uiKKGD44/edit#slide=id.ge7acbdc97a_0_155)
- [Pods Epic](https://gitlab.com/groups/gitlab-org/-/epics/7582)
- [Database Group investigation](https://about.gitlab.com/handbook/engineering/development/enablement/data_stores/database/doc/root-namespace-sharding.html)
- [Shopify Pods architecture](https://shopify.engineering/a-pods-architecture-to-allow-shopify-to-scale)
- [Opstrace architecture](https://gitlab.com/gitlab-org/opstrace/opstrace/-/blob/main/docs/architecture/overview.md)
-
-### Who
-
-| Role | Who
-|------------------------------|-------------------------|
-| Author | Fabian Zimmer |
-| Architecture Evolution Coach | Kamil Trzciński |
-| Engineering Leader | TBD |
-| Product Manager | Fabian Zimmer |
-| Domain Expert / Database | TBD |
-
-DRIs:
-
-| Role | Who
-|------------------------------|------------------------|
-| Leadership | TBD |
-| Product | Fabian Zimmer |
-| Engineering | Thong Kuah |
diff --git a/doc/architecture/blueprints/pods/iteration0-organizations-introduction.png b/doc/architecture/blueprints/pods/iteration0-organizations-introduction.png
deleted file mode 100644
index 5f5cad7b169..00000000000
--- a/doc/architecture/blueprints/pods/iteration0-organizations-introduction.png
+++ /dev/null
Binary files differ
diff --git a/doc/architecture/blueprints/pods/pods-feature-data-migration.md b/doc/architecture/blueprints/pods/pods-feature-data-migration.md
new file mode 100644
index 00000000000..fad6bca45fa
--- /dev/null
+++ b/doc/architecture/blueprints/pods/pods-feature-data-migration.md
@@ -0,0 +1,82 @@
+---
+stage: enablement
+group: pods
+comments: false
+description: 'Pods: Data migration'
+---
+
+DISCLAIMER:
+This page may contain information related to upcoming products, features and
+functionality. It is important to note that the information presented is for
+informational purposes only, so please do not rely on the information for
+purchasing or planning purposes. Just like with all projects, the items
+mentioned on the page are subject to change or delay, and the development,
+release, and timing of any products, features, or functionality remain at the
+sole discretion of GitLab Inc.
+
+This document is a work-in-progress and represents a very early state of the
+Pods design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Pods, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Pods: Data migration
+
+It is essential for the Pods architecture to provide a way to migrate data out of big Pods
+into smaller ones. This document describes various approaches to provide this type of split.
+
+## 1. Definition
+
+## 2. Data flow
+
+## 3. Proposal
+
+### 3.1. Split large Pods
+
+A single Pod can only be divided into many Pods. This is based on the principle
+that it is easier to create an exact clone of an existing Pod in many replicas,
+some of which will be made authoritative once migrated. Keeping those
+replicas up to date with Pod 0 is also much easier due to pre-existing
+replication solutions that can replicate whole systems: Geo, PostgreSQL
+physical replication, etc.
+
+1. The data of an organization must not be divided across many Pods.
+1. Split should be doable online.
+1. New Pods cannot contain pre-existing data.
+1. N Pods contain an exact replica of Pod 0.
+1. The data of Pod 0 is live replicated to as many Pods as it needs to be split into.
+1. Once consensus is achieved between Pod 0 and N-Pods, the organizations to be migrated away
+   are marked as read-only cluster-wide.
+1. `routes` is updated for all organizations to be split to indicate the authoritative
+   Pod holding the most recent data, like `gitlab-org` on `pod-100`.
+1. The data for `gitlab-org` on Pod 0 and on other non-authoritative N-Pods is dormant
+   and will be removed in the future.
+1. All accesses to `gitlab-org` on a given Pod are validated against the `pod_id` in `routes`
+   to ensure that the given Pod is authoritative to handle the data.
+
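The last step of the list above, validating each access against the `pod_id` in `routes`, could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the proposed implementation: the `Route` shape, the `read_only` flag, `check_access`, and `NotAuthoritativeError` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Hypothetical `routes` entry: which Pod is authoritative for a path."""
    path: str
    pod_id: int
    read_only: bool = False


class NotAuthoritativeError(Exception):
    """Raised when a Pod is asked to serve data it is not authoritative for."""


def check_access(routes: dict, path: str, local_pod_id: int, write: bool) -> Route:
    """Validate an access on the local Pod against the `pod_id` of the route."""
    route = routes[path]
    if route.pod_id != local_pod_id:
        # The local copy is dormant; the request belongs on another Pod.
        raise NotAuthoritativeError(f"{path} is served by pod-{route.pod_id}")
    if write and route.read_only:
        raise PermissionError(f"{path} is read-only while the split is in progress")
    return route


# During the split the organization is read-only on Pod 0 ...
routes = {"gitlab-org": Route(path="gitlab-org", pod_id=0, read_only=True)}
# ... and once consensus is reached the route points at the new authoritative Pod.
routes["gitlab-org"] = Route(path="gitlab-org", pod_id=100)
```

With this shape, the dormant copy on Pod 0 rejects every access to `gitlab-org`, while `pod-100` accepts both reads and writes.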
+### 3.2. Migrate organization from an existing Pod
+
+This is different from a split, as we intend to perform logical and selective replication
+of data belonging to a single organization.
+
+Today this type of selective replication is only implemented by Gitaly, where we can migrate
+a Git repository from one Gitaly node to another with minimal downtime.
+
+In this model we would need to identify all resources belonging to a given organization
+(database rows, object storage files, Git repositories, etc.) and selectively copy them over
+to another (likely existing) Pod, importing the data into it. Ideally, we would
+perform live logical replication of all changed data and then, similarly to a split,
+change which Pod is authoritative for this organization.
+
+1. It is hard to identify all resources belonging to an organization.
+1. It requires either downtime for the organization or a robust system to identify
+   live changes made.
+1. It will likely require a full database structure analysis (more robust than project import/export)
+   to perform selective PostgreSQL logical replication.
+
+## 4. Evaluation
+
+### 4.1. Pros
+
+### 4.2. Cons
diff --git a/doc/architecture/blueprints/pods/pods-feature-database-sequences.md b/doc/architecture/blueprints/pods/pods-feature-database-sequences.md
new file mode 100644
index 00000000000..0a8bb4d250e
--- /dev/null
+++ b/doc/architecture/blueprints/pods/pods-feature-database-sequences.md
@@ -0,0 +1,94 @@
+---
+stage: enablement
+group: pods
+comments: false
+description: 'Pods: Database Sequences'
+---
+
+DISCLAIMER:
+This page may contain information related to upcoming products, features and
+functionality. It is important to note that the information presented is for
+informational purposes only, so please do not rely on the information for
+purchasing or planning purposes. Just like with all projects, the items
+mentioned on the page are subject to change or delay, and the development,
+release, and timing of any products, features, or functionality remain at the
+sole discretion of GitLab Inc.
+
+This document is a work-in-progress and represents a very early state of the
+Pods design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Pods, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Pods: Database Sequences
+
+GitLab today ensures that every database row created has a unique ID, allowing
+access to a Merge Request, CI Job, or Project by a known global ID.
+
+Pods will use many distinct and unconnected databases, each of them having
+separate IDs for most entities.
+
+It might be desirable to retain globally unique IDs for all database rows
+to allow migrating resources between Pods in the future.
+
+## 1. Definition
+
+## 2. Data flow
+
+## 3. Proposal
+
+These are some preliminary ideas for how we can retain unique IDs across the system.
+
+### 3.1. UUID
+
+Instead of using incremental sequences, use a UUID (128 bit) that is stored in the database.
+
+- This might break existing IDs and requires adding a UUID column to all existing tables.
+- This makes all indexes larger, as it requires storing 128 bits instead of 32/64 bits in the index.
+
+### 3.2. Use Pod index encoded in ID
+
+Because a significant number of tables already use 64-bit ID numbers, we could use the most
+significant bits (MSB) to encode the Pod ID.
+
+- This might limit the number of Pods that can be enabled in the system, as we might decide to only
+  allocate 1024 possible Pod numbers.
+- This might make IDs migratable between Pods, since even if an entity from Pod 1 is migrated to Pod 100
+  this ID would still be unique.
+- If resources are migrated, the ID itself will not be enough to decode the Pod number and we would need
+  a lookup table.
+- This requires updating all IDs to 64 bits.
+
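As a rough illustration of encoding the Pod index in the most significant bits, the sketch below packs a Pod number and a Pod-local ID into one 64-bit value. The bit layout is an assumption made for this example only: 10 bits for the Pod number (1024 Pods, as mentioned above), placed below the sign bit so the result still fits a signed 64-bit integer. `encode_id` and `decode_id` are hypothetical helper names.

```python
POD_BITS = 10                       # assumption: 1024 possible Pod numbers
LOCAL_BITS = 63 - POD_BITS          # assumption: sign bit unused, so values fit a signed 64-bit integer
LOCAL_MASK = (1 << LOCAL_BITS) - 1  # mask selecting the Pod-local part of an ID


def encode_id(pod_id: int, local_id: int) -> int:
    """Pack the Pod number into the high bits and the local sequence value into the low bits."""
    if not 0 <= pod_id < (1 << POD_BITS):
        raise ValueError("pod_id out of range")
    if not 0 <= local_id <= LOCAL_MASK:
        raise ValueError("local_id out of range")
    return (pod_id << LOCAL_BITS) | local_id


def decode_id(global_id: int) -> tuple:
    """Return (pod_id, local_id) of the Pod that originally issued the ID."""
    return global_id >> LOCAL_BITS, global_id & LOCAL_MASK
```

Note that `decode_id` only recovers the Pod that issued the ID. After a migration the record may live elsewhere, which is why a lookup table would still be needed.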
+### 3.3. Allocate sequence ranges from central place
+
+Each Pod might receive its own range of sequence values, consumed from a centrally managed place.
+Once a Pod consumes all IDs assigned for a given table, it would be replenished with the next allocated range.
+Ranges would be tracked to provide a faster lookup table if a random access pattern is required.
+
+- This might make IDs migratable between Pods, since even if an entity from Pod 1 is migrated to Pod 100
+  this ID would still be unique.
+- If resources are migrated, the ID itself will not be enough to decode the Pod number and we would need
+  a much more robust lookup table, as we could be breaking previously assigned sequence ranges.
+- This does not require updating all IDs to 64 bits.
+- This adds some performance penalty to all `INSERT` statements in PostgreSQL, or at least from Rails, as we need to check for the sequence number and potentially wait for our range to be refreshed from the ID server.
+- The available range will need to be stored and incremented in a centralized place so that concurrent transactions cannot possibly get the same value.
+
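The range allocation described above can be sketched in memory as follows. This is only an illustration under assumptions: `RangeAllocator` and `PodSequence` are hypothetical names, the range size is arbitrary, and a lock stands in for whatever mechanism would make allocation atomic in the central place.

```python
import bisect
import threading
from typing import Optional


class RangeAllocator:
    """Hypothetical central service that hands out disjoint ID ranges per table."""

    def __init__(self, range_size: int = 1000) -> None:
        self._range_size = range_size
        self._lock = threading.Lock()
        # table name -> list of (start, end, pod_id), kept sorted by start
        self._ranges = {}

    def allocate(self, table: str, pod_id: int) -> range:
        # The lock ensures concurrent callers can never receive the same range.
        with self._lock:
            ranges = self._ranges.setdefault(table, [])
            start = ranges[-1][1] if ranges else 1
            end = start + self._range_size
            ranges.append((start, end, pod_id))
            return range(start, end)

    def pod_for(self, table: str, record_id: int) -> Optional[int]:
        """Look up which Pod originally received the range containing record_id."""
        with self._lock:
            ranges = self._ranges.get(table, [])
            starts = [start for start, _, _ in ranges]
            index = bisect.bisect_right(starts, record_id) - 1
            if index >= 0 and record_id < ranges[index][1]:
                return ranges[index][2]
            return None


class PodSequence:
    """Pod-local view of one table's sequence, refilled from the allocator."""

    def __init__(self, allocator: RangeAllocator, table: str, pod_id: int) -> None:
        self._allocator = allocator
        self._table = table
        self._pod_id = pod_id
        self._ids = iter(())

    def next_id(self) -> int:
        value = next(self._ids, None)
        if value is None:
            # Local range exhausted: this is the extra round trip paid on INSERT.
            self._ids = iter(self._allocator.allocate(self._table, self._pod_id))
            value = next(self._ids)
        return value
```

A larger range size means fewer round trips to the central place, at the cost of sparser IDs when a Pod does not consume its whole range.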
+### 3.4. Define only some tables to require unique IDs
+
+It might be acceptable for only some tables to have globally unique IDs. These could be projects, groups,
+and other top-level entities. All other tables, like `merge_requests`, would only offer a Pod-local ID,
+but when referenced outside would rather use an IID (an ID that is monotonic in the context of a given resource, like a project).
+
+- This makes the ID 10000 for `merge_requests` present on all Pods, which might sometimes be confusing
+  with regard to the uniqueness of the resource.
+- This might make random access by ID (if ever needed) impossible without using a composite key, like: `project_id+merge_request_id`.
+- This would require us to implement a transformation/generation of a new ID if we need to migrate records to another Pod. This can lead to very difficult migration processes when these IDs are also used as foreign keys for other records being migrated.
+- If IDs need to change when moving between Pods, this means that any links to records by ID would no longer work even if those links included the `project_id`.
+- If we plan to allow these IDs to not be unique and change the unique constraint to be based on a composite key, then we'd need to update all foreign key references to be based on the composite key.
+
+## 4. Evaluation
+
+### 4.1. Pros
+
+### 4.2. Cons
diff --git a/doc/architecture/blueprints/pods/pods-feature-git-access.md b/doc/architecture/blueprints/pods/pods-feature-git-access.md
new file mode 100644
index 00000000000..ae996281d46
--- /dev/null
+++ b/doc/architecture/blueprints/pods/pods-feature-git-access.md
@@ -0,0 +1,163 @@
+---
+stage: enablement
+group: pods
+comments: false
+description: 'Pods: Git Access'
+---
+
+This document is a work-in-progress and represents a very early state of the
+Pods design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Pods, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Pods: Git Access
+
+This document describes the impact of the Pods architecture on all Git access (over HTTPS and SSH)
+patterns, explaining how those features could be changed
+to work well with Pods.
+
+## 1. Definition
+
+Git access is done throughout the application. It can be an operation performed by the system
+(read Git repository) or by a user (create a new file via Web IDE, `git clone` or `git push` via command line).
+
+The Pods architecture defines that all Git repositories will be local to the Pod,
+so no repository could be shared with another Pod.
+
+The Pods architecture will require that any Git operation done can only be handled by a Pod holding
+the data. It means that any operation either via Web interface, API, or GraphQL needs to be routed
+to the correct Pod. It means that any `git clone` or `git push` operation can only be performed
+in a context of a Pod.
+
+## 2. Data flow
+
+There are various operations performed today by GitLab on a Git repository. This section describes
+the data flow of how they behave today to better represent the impact.
+
+It appears that Git access requires changes to only a few endpoints that are scoped to a project.
+There appear to be different types of repositories:
+
+- Project: assigned to Group
+- Wiki: additional repository assigned to Project
+- Design: similar to Wiki, additional repository assigned to Project
+- Snippet: creates a virtual project to hold repository, likely tied to the User
+
+### 2.1. Git clone over HTTPS
+
+Execution of: `git clone` over HTTPS
+
+```mermaid
+sequenceDiagram
+ User ->> Workhorse: GET /gitlab-org/gitlab.git/info/refs?service=git-upload-pack
+ Workhorse ->> Rails: GET /gitlab-org/gitlab.git/info/refs?service=git-upload-pack
+ Rails ->> Workhorse: 200 OK
+ Workhorse ->> Gitaly: RPC InfoRefsUploadPack
+ Gitaly ->> User: Response
+ User ->> Workhorse: POST /gitlab-org/gitlab.git/git-upload-pack
+ Workhorse ->> Gitaly: RPC PostUploadPackWithSidechannel
+ Gitaly ->> User: Response
+```
+
+### 2.2. Git clone over SSH
+
+Execution of: `git clone` over SSH
+
+```mermaid
+sequenceDiagram
+ User ->> Git SSHD: ssh git@gitlab.com
+ Git SSHD ->> Rails: GET /api/v4/internal/authorized_keys
+ Rails ->> Git SSHD: 200 OK (list of accepted SSH keys)
+ Git SSHD ->> User: Accept SSH
+ User ->> Git SSHD: git clone over SSH
+ Git SSHD ->> Rails: POST /api/v4/internal/allowed?project=/gitlab-org/gitlab.git&service=git-upload-pack
+ Rails ->> Git SSHD: 200 OK
+ Git SSHD ->> Gitaly: RPC SSHUploadPackWithSidechannel
+ Gitaly ->> User: Response
+```
+
+### 2.3. Git push over HTTPS
+
+Execution of: `git push` over HTTPS
+
+```mermaid
+sequenceDiagram
+ User ->> Workhorse: GET /gitlab-org/gitlab.git/info/refs?service=git-receive-pack
+ Workhorse ->> Rails: GET /gitlab-org/gitlab.git/info/refs?service=git-receive-pack
+ Rails ->> Workhorse: 200 OK
+ Workhorse ->> Gitaly: RPC PostReceivePack
+ Gitaly ->> Rails: POST /api/v4/internal/allowed?gl_repository=project-111&service=git-receive-pack
+ Gitaly ->> Rails: POST /api/v4/internal/pre_receive?gl_repository=project-111
+ Gitaly ->> Rails: POST /api/v4/internal/post_receive?gl_repository=project-111
+ Gitaly ->> User: Response
+```
+
+### 2.4. Git push over SSH
+
+Execution of: `git push` over SSH
+
+```mermaid
+sequenceDiagram
+ User ->> Git SSHD: ssh git@gitlab.com
+ Git SSHD ->> Rails: GET /api/v4/internal/authorized_keys
+ Rails ->> Git SSHD: 200 OK (list of accepted SSH keys)
+ Git SSHD ->> User: Accept SSH
+ User ->> Git SSHD: git push over SSH
+ Git SSHD ->> Rails: POST /api/v4/internal/allowed?project=/gitlab-org/gitlab.git&service=git-receive-pack
+ Rails ->> Git SSHD: 200 OK
+ Git SSHD ->> Gitaly: RPC ReceivePack
+ Gitaly ->> Rails: POST /api/v4/internal/allowed?gl_repository=project-111
+ Gitaly ->> Rails: POST /api/v4/internal/pre_receive?gl_repository=project-111
+ Gitaly ->> Rails: POST /api/v4/internal/post_receive?gl_repository=project-111
+ Gitaly ->> User: Response
+```
+
+### 2.5. Create commit via Web
+
+Execution of `Add CHANGELOG` to repository:
+
+```mermaid
+sequenceDiagram
+ Web ->> Puma: POST /gitlab-org/gitlab/-/create/main
+ Puma ->> Gitaly: RPC TreeEntry
+ Gitaly ->> Rails: POST /api/v4/internal/allowed?gl_repository=project-111
+ Gitaly ->> Rails: POST /api/v4/internal/pre_receive?gl_repository=project-111
+ Gitaly ->> Rails: POST /api/v4/internal/post_receive?gl_repository=project-111
+ Gitaly ->> Puma: Response
+ Puma ->> Web: See CHANGELOG
+```
+
+## 3. Proposal
+
+The Pods stateless router proposal requires that any ambiguous path (that is not routable)
+will be made routable. It means that at least the following paths will have to be updated
+to introduce a routable entity (project, group, or organization).
+
+Change:
+
+- `/api/v4/internal/allowed` => `/api/v4/internal/projects/<gl_repository>/allowed`
+- `/api/v4/internal/pre_receive` => `/api/v4/internal/projects/<gl_repository>/pre_receive`
+- `/api/v4/internal/post_receive` => `/api/v4/internal/projects/<gl_repository>/post_receive`
+- `/api/v4/internal/lfs_authenticate` => `/api/v4/internal/projects/<gl_repository>/lfs_authenticate`
+
+Where:
+
+- `gl_repository` can be `project-1111` (`Gitlab::GlRepository`)
+- `gl_repository` in some cases might be a full path to repository as executed by GitLab Shell (`/gitlab-org/gitlab.git`)
+
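The path change listed above amounts to moving the routable entity into the URL. A minimal sketch of that mapping follows. The function name `routable_path` is hypothetical, and percent-encoding a full-path `gl_repository` is an assumption of this example, not something the proposal specifies.

```python
from urllib.parse import quote

INTERNAL_PREFIX = "/api/v4/internal/"
# The four endpoints listed in the proposal above.
ROUTABLE_ACTIONS = {"allowed", "pre_receive", "post_receive", "lfs_authenticate"}


def routable_path(path: str, gl_repository: str) -> str:
    """Rewrite an internal API path so that it carries a routable entity."""
    if not path.startswith(INTERNAL_PREFIX):
        return path
    action = path[len(INTERNAL_PREFIX):]
    if action not in ROUTABLE_ACTIONS:
        # Other internal endpoints are left untouched.
        return path
    # Assumption: a full repository path is percent-encoded into one path segment.
    return f"{INTERNAL_PREFIX}projects/{quote(gl_repository, safe='')}/{action}"
```

For example, `routable_path("/api/v4/internal/allowed", "project-1111")` yields `/api/v4/internal/projects/project-1111/allowed`, which a router can classify from the URL alone without reading the request parameters.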
+## 4. Evaluation
+
+Supporting Git repositories if a Pod can access only its own repositories does not appear to be complex.
+
+The one major complication is supporting snippets, but this likely falls into the same category as the approach
+to supporting users' personal namespaces.
+
+### 4.1. Pros
+
+1. The APIs used for supporting HTTPS/SSH and hooks are well defined and can easily be made routable.
+
+### 4.2. Cons
+
+1. The sharing of repository objects is limited to the given Pod and Gitaly node.
+1. Cross-Pod forks are likely impossible to support (to be discovered: how this works today across different Gitaly nodes).
diff --git a/doc/architecture/blueprints/pods/pods-feature-graphql.md b/doc/architecture/blueprints/pods/pods-feature-graphql.md
new file mode 100644
index 00000000000..5f8a39c0b3f
--- /dev/null
+++ b/doc/architecture/blueprints/pods/pods-feature-graphql.md
@@ -0,0 +1,94 @@
+---
+stage: enablement
+group: pods
+comments: false
+description: 'Pods: GraphQL'
+---
+
+DISCLAIMER:
+This page may contain information related to upcoming products, features and
+functionality. It is important to note that the information presented is for
+informational purposes only, so please do not rely on the information for
+purchasing or planning purposes. Just like with all projects, the items
+mentioned on the page are subject to change or delay, and the development,
+release, and timing of any products, features, or functionality remain at the
+sole discretion of GitLab Inc.
+
+This document is a work-in-progress and represents a very early state of the
+Pods design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Pods, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Pods: GraphQL
+
+GitLab extensively uses GraphQL to perform efficient data query operations.
+GraphQL, due to its nature, is not directly routable. GitLab sends all
+GraphQL requests to the `/api/graphql` endpoint, and only the query or mutation
+in the request body can indicate where the data can be accessed.
+
+## 1. Definition
+
+## 2. Data flow
+
+## 3. Proposal
+
+There are at least two main ways to implement GraphQL in Pods architecture.
+
+### 3.1. GraphQL routable by endpoint
+
+Change `/api/graphql` to `/api/organization/<organization>/graphql`.
+
+- This breaks all existing usages of `/api/graphql` endpoint
+ since the API URI is changed.
+
+### 3.2. GraphQL routable by body
+
+As part of the router, parse the GraphQL body to find a routable entity, like `project`.
+
+- The GraphQL query is still executed only in the context of a given Pod,
+ and the data cannot be merged across Pods.
+
+```graphql
+# Good example
+{
+ project(fullPath:"gitlab-org/gitlab") {
+ id
+ description
+ }
+}
+
+# Bad example, since Merge Request is not routable
+{
+ mergeRequest(id: 1111) {
+ iid
+ description
+ }
+}
+```
+
+### 3.3. Merging GraphQL Proxy
+
+Implement, as part of the router, a GraphQL Proxy which can parse the body
+and merge results from many Pods.
+
+- This might make pagination hard to achieve, or we might assume that
+ we execute many queries whose results are merged across all Pods.
+
+```graphql
+{
+ project(fullPath:"gitlab-org/gitlab"){
+ id, description
+ }
+ group(fullPath:"gitlab-com") {
+ id, description
+ }
+}
+```
+
+## 4. Evaluation
+
+## 4.1. Pros
+
+## 4.2. Cons
diff --git a/doc/architecture/blueprints/pods/pods-feature-organizations.md b/doc/architecture/blueprints/pods/pods-feature-organizations.md
new file mode 100644
index 00000000000..a0a87458767
--- /dev/null
+++ b/doc/architecture/blueprints/pods/pods-feature-organizations.md
@@ -0,0 +1,58 @@
+---
+stage: enablement
+group: pods
+comments: false
+description: 'Pods: Organizations'
+---
+
+DISCLAIMER:
+This page may contain information related to upcoming products, features and
+functionality. It is important to note that the information presented is for
+informational purposes only, so please do not rely on the information for
+purchasing or planning purposes. Just like with all projects, the items
+mentioned on the page are subject to change or delay, and the development,
+release, and timing of any products, features, or functionality remain at the
+sole discretion of GitLab Inc.
+
+This document is a work-in-progress and represents a very early state of the
+Pods design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Pods, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Pods: Organizations
+
+One of the major design goals of the Pods architecture is strong isolation between Groups.
+Organizations, as described by this blueprint, provide a way to have a plausible UX
+for joining together many Groups that are isolated from the rest of the system.
+
+## 1. Definition
+
+Pods require that all groups and projects of a single organization are
+stored on a single Pod, since a Pod can only access data that it holds locally
+and has very limited capabilities to read information from other Pods.
+
+Pods with Organizations do require strong isolation between organizations.
+
+This will have significant implications for various user-facing features,
+like Todos, dropdowns for selecting projects, references to other issues
+or projects, or any other social functions present in GitLab. Today those functions
+are able to reference anything in the whole system. With the introduction of
+organizations, such cross-organization references will be forbidden.
+
+This problem definition aims to describe the effort and implications of adding
+strong isolation between organizations to the system, including the features affected
+and their data processing flow. The purpose is to ensure that our solution, when
+implemented, consistently avoids data leakage between organizations residing on
+a single Pod.
+
+## 2. Data flow
+
+## 3. Proposal
+
+## 4. Evaluation
+
+## 4.1. Pros
+
+## 4.2. Cons
diff --git a/doc/architecture/blueprints/pods/pods-feature-router-endpoints-classification.md b/doc/architecture/blueprints/pods/pods-feature-router-endpoints-classification.md
new file mode 100644
index 00000000000..c672342fff9
--- /dev/null
+++ b/doc/architecture/blueprints/pods/pods-feature-router-endpoints-classification.md
@@ -0,0 +1,46 @@
+---
+stage: enablement
+group: pods
+comments: false
+description: 'Pods: Router Endpoints Classification'
+---
+
+DISCLAIMER:
+This page may contain information related to upcoming products, features and
+functionality. It is important to note that the information presented is for
+informational purposes only, so please do not rely on the information for
+purchasing or planning purposes. Just like with all projects, the items
+mentioned on the page are subject to change or delay, and the development,
+release, and timing of any products, features, or functionality remain at the
+sole discretion of GitLab Inc.
+
+This document is a work-in-progress and represents a very early state of the
+Pods design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Pods, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Pods: Router Endpoints Classification
+
+Classification of all endpoints is essential to properly route requests
+hitting the load balancer of a GitLab installation to a Pod that can serve them.
+
+Each Pod should be able to decode each request and classify which Pod
+it belongs to.
+
+GitLab currently implements hundreds of endpoints. This document tries
+to describe various techniques that can be implemented to allow Rails
+to provide this information efficiently.
+
+## 1. Definition
+
+## 2. Data flow
+
+## 3. Proposal
+
+## 4. Evaluation
+
+## 4.1. Pros
+
+## 4.2. Cons
diff --git a/doc/architecture/blueprints/pods/pods-feature-template.md b/doc/architecture/blueprints/pods/pods-feature-template.md
new file mode 100644
index 00000000000..dfae21b5406
--- /dev/null
+++ b/doc/architecture/blueprints/pods/pods-feature-template.md
@@ -0,0 +1,29 @@
+---
+stage: enablement
+group: pods
+comments: false
+description: 'Pods: Problem A'
+---
+
+This document is a work-in-progress and represents a very early state of the
+Pods design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Pods, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Pods: A
+
+> TL;DR
+
+## 1. Definition
+
+## 2. Data flow
+
+## 3. Proposal
+
+## 4. Evaluation
+
+## 4.1. Pros
+
+## 4.2. Cons
diff --git a/doc/architecture/blueprints/pods/proposal-stateless-router-with-buffering-requests.md b/doc/architecture/blueprints/pods/proposal-stateless-router-with-buffering-requests.md
new file mode 100644
index 00000000000..21aa72273fe
--- /dev/null
+++ b/doc/architecture/blueprints/pods/proposal-stateless-router-with-buffering-requests.md
@@ -0,0 +1,648 @@
+---
+stage: enablement
+group: pods
+comments: false
+description: 'Pods Stateless Router Proposal'
+---
+
+This document is a work-in-progress and represents a very early state of the
+Pods design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Pods, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Proposal: Stateless Router
+
+We will decompose `gitlab_users`, `gitlab_routes` and `gitlab_admin` related
+tables so that they can be shared between all pods and allow any pod to
+authenticate a user and route requests to the correct pod. Pods may receive
+requests for the resources they don't own, but they know how to redirect back
+to the correct pod.
+
+The router is stateless and does not read from the `routes` database which
+means that all interactions with the database still happen from the Rails
+monolith. This architecture also supports regions by allowing for low traffic
+databases to be replicated across regions.
+
+Users are not directly exposed to the concept of Pods but instead they see
+different data dependent on their currently chosen "organization".
+[Organizations](index.md#organizations) will be a new model introduced to enforce isolation in the
+application and allow us to decide which request route to which pod, since an
+organization can only be on a single pod.
+
+## Differences
+
+The main difference between this proposal and the one [with learning routes](proposal-stateless-router-with-routes-learning.md)
+is that this proposal always sends requests to any of the Pods. If the requests cannot be processed,
+the requests will be bounced back with relevant headers. This requires each request to be buffered.
+It allows Rails to decode a request from either the URI or the body of the request.
+This means that each request might be sent more than once and be processed more than once as a result.
+
+The [with learning routes proposal](proposal-stateless-router-with-routes-learning.md) requires that
+routable information is always encoded in URI, and the router sends a pre-flight request.
+
+## Summary in diagrams
+
+This shows how a user request routes via DNS to the nearest router and the router chooses a pod to send the request to.
+
+```mermaid
+graph TD;
+ user((User));
+ dns[DNS];
+ router_us(Router);
+ router_eu(Router);
+ pod_us0{Pod US0};
+ pod_us1{Pod US1};
+ pod_eu0{Pod EU0};
+ pod_eu1{Pod EU1};
+ user-->dns;
+ dns-->router_us;
+ dns-->router_eu;
+ subgraph Europe
+ router_eu-->pod_eu0;
+ router_eu-->pod_eu1;
+ end
+ subgraph United States
+ router_us-->pod_us0;
+ router_us-->pod_us1;
+ end
+```
+
+<details><summary>More detail</summary>
+
+This shows that the router can actually send requests to any pod. The user will
+get the closest router to them geographically.
+
+```mermaid
+graph TD;
+ user((User));
+ dns[DNS];
+ router_us(Router);
+ router_eu(Router);
+ pod_us0{Pod US0};
+ pod_us1{Pod US1};
+ pod_eu0{Pod EU0};
+ pod_eu1{Pod EU1};
+ user-->dns;
+ dns-->router_us;
+ dns-->router_eu;
+ subgraph Europe
+ router_eu-->pod_eu0;
+ router_eu-->pod_eu1;
+ end
+ subgraph United States
+ router_us-->pod_us0;
+ router_us-->pod_us1;
+ end
+ router_eu-.->pod_us0;
+ router_eu-.->pod_us1;
+ router_us-.->pod_eu0;
+ router_us-.->pod_eu1;
+```
+
+</details>
+
+<details><summary>Even more detail</summary>
+
+This shows the databases. `gitlab_users` and `gitlab_routes` exist only in the
+US region but are replicated to other regions. Replication is not shown with
+an arrow because it would make the diagram too hard to read.
+
+```mermaid
+graph TD;
+ user((User));
+ dns[DNS];
+ router_us(Router);
+ router_eu(Router);
+ pod_us0{Pod US0};
+ pod_us1{Pod US1};
+ pod_eu0{Pod EU0};
+ pod_eu1{Pod EU1};
+ db_gitlab_users[(gitlab_users Primary)];
+ db_gitlab_routes[(gitlab_routes Primary)];
+ db_gitlab_users_replica[(gitlab_users Replica)];
+ db_gitlab_routes_replica[(gitlab_routes Replica)];
+ db_pod_us0[(gitlab_main/gitlab_ci Pod US0)];
+ db_pod_us1[(gitlab_main/gitlab_ci Pod US1)];
+ db_pod_eu0[(gitlab_main/gitlab_ci Pod EU0)];
+ db_pod_eu1[(gitlab_main/gitlab_ci Pod EU1)];
+ user-->dns;
+ dns-->router_us;
+ dns-->router_eu;
+ subgraph Europe
+ router_eu-->pod_eu0;
+ router_eu-->pod_eu1;
+ pod_eu0-->db_pod_eu0;
+ pod_eu0-->db_gitlab_users_replica;
+ pod_eu0-->db_gitlab_routes_replica;
+ pod_eu1-->db_gitlab_users_replica;
+ pod_eu1-->db_gitlab_routes_replica;
+ pod_eu1-->db_pod_eu1;
+ end
+ subgraph United States
+ router_us-->pod_us0;
+ router_us-->pod_us1;
+ pod_us0-->db_pod_us0;
+ pod_us0-->db_gitlab_users;
+ pod_us0-->db_gitlab_routes;
+ pod_us1-->db_gitlab_users;
+ pod_us1-->db_gitlab_routes;
+ pod_us1-->db_pod_us1;
+ end
+ router_eu-.->pod_us0;
+ router_eu-.->pod_us1;
+ router_us-.->pod_eu0;
+ router_us-.->pod_eu1;
+```
+
+</details>
+
+## Summary of changes
+
+1. Tables related to User data (including profile settings, authentication credentials, personal access tokens) are decomposed into a `gitlab_users` schema
+1. The `routes` table is decomposed into `gitlab_routes` schema
+1. The `application_settings` (and probably a few other instance level tables) are decomposed into `gitlab_admin` schema
+1. A new column `routes.pod_id` is added to `routes` table
+1. A new Router service exists to choose which pod to route a request to.
+1. A new concept will be introduced in GitLab called an organization and a user can select a "default organization" and this will be a user level setting. The default organization is used to redirect users away from ambiguous routes like `/dashboard` to organization scoped routes like `/organizations/my-organization/-/dashboard`. Legacy users will have a special default organization that allows them to keep using global resources on `Pod US0`. All existing namespaces will initially move to this public organization.
+1. If a pod receives a request for a `routes.pod_id` that it does not own it returns a `302` with `X-Gitlab-Pod-Redirect` header so that the router can send the request to the correct pod. The correct pod can also set a header `X-Gitlab-Pod-Cache` which contains information about how this request should be cached to remember the pod. For example if the request was `/gitlab-org/gitlab` then the header would encode `/gitlab-org/* => Pod US0` (ie. any requests starting with `/gitlab-org/` can always be routed to `Pod US0`)
+1. When the router does not know (from the cache) which pod to send a request to, it just picks a random pod within its region
+1. Writes to `gitlab_users` and `gitlab_routes` are sent to a primary PostgreSQL server in our `US` region but reads can come from replicas in the same region. This will add latency for these writes but we expect they are infrequent relative to the rest of GitLab.
+
+## Detailed explanation of default organization in the first iteration
+
+All users will get a new column `users.default_organization` which they can
+control in user settings. We will introduce a concept of the
+`GitLab.com Public` organization. This will be set as the default organization for all existing
+users. This organization will allow the user to see data from all namespaces in
+`Pod US0` (ie. our original GitLab.com instance). This behavior can be invisible to
+existing users such that they don't even get told when they are viewing a
+global page like `/dashboard` that it's even scoped to an organization.
+
+Any new users with a default organization other than `GitLab.com Public` will have
+a distinct user experience and will be fully aware that every page they load is
+only ever scoped to a single organization. These users can never
+load any global pages like `/dashboard` and will end up being redirected to
+`/organizations/<DEFAULT_ORGANIZATION>/-/dashboard`. This may also be the case
+for legacy APIs and such users may only ever be able to use APIs scoped to an
+organization.
+
+## Detailed explanation of Admin Area settings
+
+We believe that maintaining and synchronizing Admin Area settings will be
+frustrating and painful so to avoid this we will decompose and share all Admin Area
+settings in the `gitlab_admin` schema. This should be safe (similar to other
+shared schemas) because these receive very little write traffic.
+
+In cases where different pods need different settings (eg. the
+Elasticsearch URL), we will either use a templated
+format in the relevant `application_settings` row which allows it to be dynamic
+per pod or, if that proves difficult, introduce a new table
+called `per_pod_application_settings` with 1 row per pod to allow
+setting different settings per pod. It will still be part of the `gitlab_admin`
+schema and shared, which will allow us to centrally manage it and simplify
+keeping settings in sync for all pods.
+
+## Pros
+
+1. Router is stateless and can live in many regions. We use Anycast DNS to resolve to nearest region for the user.
+1. Pods can receive requests for namespaces in the wrong pod and the user
+ still gets the right response. The router also caches the result, which
+ ensures that the next request for the same namespace is sent directly
+ to the correct pod.
+1. The majority of the code still lives in `gitlab` rails codebase. The Router doesn't actually need to understand how GitLab URLs are composed.
+1. Since the responsibility to read and write `gitlab_users`,
+ `gitlab_routes` and `gitlab_admin` still lives in Rails it means minimal
+ changes will be needed to the Rails application compared to extracting
+ services that need to isolate the domain models and build new interfaces.
+1. Compared to a separate routing service this allows the Rails application
+ to encode more complex rules around how to map URLs to the correct pod
+ and may work for some existing API endpoints.
+1. All the new infrastructure (just a router) is optional and a single-pod
+ self-managed installation does not even need to run the Router and there are
+ no other new services.
+
+## Cons
+
+1. `gitlab_users`, `gitlab_routes` and `gitlab_admin` databases may need to be
+ replicated across regions and writes need to go across regions. We need to
+ do an analysis on write TPS for the relevant tables to determine if this is
+ feasible.
+1. Sharing access to the database from many different Pods means that they are
+ all coupled at the Postgres schema level and this means changes to the
+ database schema need to be done carefully in sync with the deployment of all
+ Pods. This constrains us to keep all Pods on closely matching
+ versions, unlike an architecture with shared services that have an API
+ we control.
+1. Although most data is stored in the right region there can be requests
+ proxied from another region which may be an issue for certain types
+ of compliance.
+1. Data in `gitlab_users` and `gitlab_routes` databases must be replicated in
+ all regions which may be an issue for certain types of compliance.
+1. The router cache may need to be very large if we get a wide variety of URLs
+ (ie. long tail). In such a case we may need to implement a 2nd level of
+ caching in user cookies so their frequently accessed pages always go to the
+ right pod the first time.
+1. Having shared database access for `gitlab_users` and `gitlab_routes`
+ from multiple pods is an unusual architecture decision compared to
+ extracting services that are called from multiple pods.
+1. It is very likely we won't be able to find cacheable elements of a
+ GraphQL URL and often existing GraphQL endpoints are heavily dependent on
+ ids that won't be in the `routes` table so pods won't necessarily know
+ what pod has the data. As such we'll probably have to update our GraphQL
+ calls to include an organization context in the path like
+ `/api/organizations/<organization>/graphql`.
+1. This architecture implies that implemented endpoints can only access data
+ that are readily accessible on a given Pod, but are unlikely
+ to aggregate information from many Pods.
+1. All unknown routes are sent to the latest deployment, which we assume to be `Pod US0`.
+ This is required as newly added endpoints will only be decodable by the latest pod.
+ This Pod could later redirect to the correct one that can serve the given request.
+ Since request processing might be heavy, some Pods might receive a significant amount
+ of traffic due to that.
+
+## Example database configuration
+
+Handling shared `gitlab_users`, `gitlab_routes` and `gitlab_admin` databases, while having dedicated `gitlab_main` and `gitlab_ci` databases, should already be handled by the way we use `config/database.yml`. We should also already be able to handle the dedicated EU replicas while having a single US primary for `gitlab_users` and `gitlab_routes`. Below is a snippet of part of the database configuration for the Pod architecture described above.
+
+<details><summary>Pod US0</summary>
+
+```yaml
+# config/database.yml
+production:
+ main:
+ host: postgres-main.pod-us0.primary.consul
+ load_balancing:
+ discovery: postgres-main.pod-us0.replicas.consul
+ ci:
+ host: postgres-ci.pod-us0.primary.consul
+ load_balancing:
+ discovery: postgres-ci.pod-us0.replicas.consul
+ users:
+ host: postgres-users-primary.consul
+ load_balancing:
+ discovery: postgres-users-replicas.us.consul
+ routes:
+ host: postgres-routes-primary.consul
+ load_balancing:
+ discovery: postgres-routes-replicas.us.consul
+ admin:
+ host: postgres-admin-primary.consul
+ load_balancing:
+ discovery: postgres-admin-replicas.us.consul
+```
+
+</details>
+
+<details><summary>Pod EU0</summary>
+
+```yaml
+# config/database.yml
+production:
+ main:
+ host: postgres-main.pod-eu0.primary.consul
+ load_balancing:
+ discovery: postgres-main.pod-eu0.replicas.consul
+ ci:
+ host: postgres-ci.pod-eu0.primary.consul
+ load_balancing:
+ discovery: postgres-ci.pod-eu0.replicas.consul
+ users:
+ host: postgres-users-primary.consul
+ load_balancing:
+ discovery: postgres-users-replicas.eu.consul
+ routes:
+ host: postgres-routes-primary.consul
+ load_balancing:
+ discovery: postgres-routes-replicas.eu.consul
+ admin:
+ host: postgres-admin-primary.consul
+ load_balancing:
+ discovery: postgres-admin-replicas.eu.consul
+```
+
+</details>
+
+## Request flows
+
+1. `gitlab-org` is a top level namespace and lives in `Pod US0` in the `GitLab.com Public` organization
+1. `my-company` is a top level namespace and lives in `Pod EU0` in the `my-organization` organization
+
+### Experience for paying user that is part of `my-organization`
+
+Such a user will have a default organization set to `/my-organization` and will be
+unable to load any global routes outside of this organization. They may load other
+projects/namespaces but their MR/Todo/Issue counts at the top of the page will
+not be correctly populated in the first iteration. The user will be aware of
+this limitation.
+
+#### Navigates to `/my-company/my-project` while logged in
+
+1. User is in Europe so DNS resolves to the router in Europe
+1. They request `/my-company/my-project` without the router cache, so the router chooses randomly `Pod EU1`
+1. `Pod EU1` does not have `/my-company`, but it knows that it lives in `Pod EU0` so it redirects the router to `Pod EU0`
+1. `Pod EU0` returns the correct response as well as setting the cache headers for the router `/my-company/* => Pod EU0`
+1. The router now caches and remembers any request paths matching `/my-company/*` should go to `Pod EU0`
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_eu as Router EU
+ participant pod_eu0 as Pod EU0
+ participant pod_eu1 as Pod EU1
+ user->>router_eu: GET /my-company/my-project
+ router_eu->>pod_eu1: GET /my-company/my-project
+ pod_eu1->>router_eu: 302 /my-company/my-project X-Gitlab-Pod-Redirect={pod:Pod EU0}
+ router_eu->>pod_eu0: GET /my-company/my-project
+ pod_eu0->>user: <h1>My Project... X-Gitlab-Pod-Cache={path_prefix:/my-company/}
+```
+
+#### Navigates to `/my-company/my-project` while not logged in
+
+1. User is in Europe so DNS resolves to the router in Europe
+1. The router does not have `/my-company/*` cached yet so it chooses randomly `Pod EU1`
+1. `Pod EU1` redirects them through a login flow
+1. They still request `/my-company/my-project` without the router cache, so the router chooses a random pod, `Pod EU1`
+1. `Pod EU1` does not have `/my-company`, but it knows that it lives in `Pod EU0` so it redirects the router to `Pod EU0`
+1. `Pod EU0` returns the correct response as well as setting the cache headers for the router `/my-company/* => Pod EU0`
+1. The router now caches and remembers any request paths matching `/my-company/*` should go to `Pod EU0`
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_eu as Router EU
+ participant pod_eu0 as Pod EU0
+ participant pod_eu1 as Pod EU1
+ user->>router_eu: GET /my-company/my-project
+ router_eu->>pod_eu1: GET /my-company/my-project
+ pod_eu1->>user: 302 /users/sign_in?redirect=/my-company/my-project
+ user->>router_eu: GET /users/sign_in?redirect=/my-company/my-project
+ router_eu->>pod_eu1: GET /users/sign_in?redirect=/my-company/my-project
+ pod_eu1->>user: <h1>Sign in...
+ user->>router_eu: POST /users/sign_in?redirect=/my-company/my-project
+ router_eu->>pod_eu1: POST /users/sign_in?redirect=/my-company/my-project
+ pod_eu1->>user: 302 /my-company/my-project
+ user->>router_eu: GET /my-company/my-project
+ router_eu->>pod_eu1: GET /my-company/my-project
+ pod_eu1->>router_eu: 302 /my-company/my-project X-Gitlab-Pod-Redirect={pod:Pod EU0}
+ router_eu->>pod_eu0: GET /my-company/my-project
+ pod_eu0->>user: <h1>My Project... X-Gitlab-Pod-Cache={path_prefix:/my-company/}
+```
+
+#### Navigates to `/my-company/my-other-project` after last step
+
+1. User is in Europe so DNS resolves to the router in Europe
+1. The router cache now has `/my-company/* => Pod EU0`, so the router chooses `Pod EU0`
+1. `Pod EU0` returns the correct response as well as the cache header again
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_eu as Router EU
+ participant pod_eu0 as Pod EU0
+ participant pod_eu1 as Pod EU1
+ user->>router_eu: GET /my-company/my-other-project
+ router_eu->>pod_eu0: GET /my-company/my-other-project
+ pod_eu0->>user: <h1>My Other Project... X-Gitlab-Pod-Cache={path_prefix:/my-company/}
+```
+
+#### Navigates to `/gitlab-org/gitlab` after last step
+
+1. User is in Europe so DNS resolves to the router in Europe
+1. The router has no cached value for this URL so randomly chooses `Pod EU0`
+1. `Pod EU0` redirects the router to `Pod US0`
+1. `Pod US0` returns the correct response as well as the cache header again
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_eu as Router EU
+ participant pod_eu0 as Pod EU0
+ participant pod_us0 as Pod US0
+ user->>router_eu: GET /gitlab-org/gitlab
+ router_eu->>pod_eu0: GET /gitlab-org/gitlab
+ pod_eu0->>router_eu: 302 /gitlab-org/gitlab X-Gitlab-Pod-Redirect={pod:Pod US0}
+ router_eu->>pod_us0: GET /gitlab-org/gitlab
+ pod_us0->>user: <h1>GitLab.org... X-Gitlab-Pod-Cache={path_prefix:/gitlab-org/}
+```
+
+In this case the user is not on their "default organization" so their TODO
+counter will not include their normal todos. We may choose to highlight this in
+the UI somewhere. A future iteration may be able to fetch that for them from
+their default organization.
+
+#### Navigates to `/`
+
+1. User is in Europe so DNS resolves to the router in Europe
+1. Router does not have a cache for `/` route (specifically Rails never tells it to cache this route)
+1. The Router chooses `Pod EU0` randomly
+1. The Rails application knows the user's default organization is `/my-organization`, so
+ it redirects the user to `/organizations/my-organization/-/dashboard`
+1. The Router has a cached value for `/organizations/my-organization/*` so it then sends the
+ request to `Pod EU0`
+1. `Pod EU0` serves up a new page `/organizations/my-organization/-/dashboard` which is the same
+ dashboard view we have today but scoped to an organization clearly in the UI
+1. The user is (optionally) presented with a message saying that data on this page is only
+ from their default organization and that they can change their default
+ organization if it's not right.
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_eu as Router EU
+ participant pod_eu0 as Pod EU0
+ user->>router_eu: GET /
+ router_eu->>pod_eu0: GET /
+ pod_eu0->>user: 302 /organizations/my-organization/-/dashboard
+ user->>router_eu: GET /organizations/my-organization/-/dashboard
+ router_eu->>pod_eu0: GET /organizations/my-organization/-/dashboard
+ pod_eu0->>user: <h1>My Company Dashboard... X-Gitlab-Pod-Cache={path_prefix:/organizations/my-organization/}
+```
+
+#### Navigates to `/dashboard`
+
+As above, they will end up on `/organizations/my-organization/-/dashboard` as
+the rails application will already redirect `/` to the dashboard page.
+
+#### Navigates to `/not-my-company/not-my-project` while logged in (but they don't have access since this project/group is private)
+
+1. User is in Europe so DNS resolves to the router in Europe
+1. The router knows that `/not-my-company` lives in `Pod US1` so it sends the request to this pod
+1. The user does not have access so `Pod US1` returns 404
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_eu as Router EU
+ participant pod_us1 as Pod US1
+ user->>router_eu: GET /not-my-company/not-my-project
+ router_eu->>pod_us1: GET /not-my-company/not-my-project
+ pod_us1->>user: 404
+```
+
+#### Creates a new top level namespace
+
+The user will be asked which organization they want the namespace to belong to.
+If they select `my-organization` then it will end up on the same pod as all
+other namespaces in `my-organization`. If they select nothing we default to
+`GitLab.com Public` and it is clear to the user that this is isolated from
+their existing organization such that they won't be able to see data from both
+on a single page.
+
+### Experience for GitLab team member that is part of `/gitlab-org`
+
+Such a user is considered a legacy user and has their default organization set to
+`GitLab.com Public`. This is a "meta" organization that does not really exist but
+the Rails application knows to interpret this organization to mean that they are
+allowed to use legacy global functionality like `/dashboard` to see data across
+namespaces located on `Pod US0`. The rails backend also knows that the default pod to render any ambiguous
+routes like `/dashboard` is `Pod US0`. Lastly the user will be allowed to
+navigate to organizations on another pod like `/my-organization` but when they do the
+user will see a message indicating that some data may be missing (eg. the
+MRs/Issues/Todos counts).
+
+#### Navigates to `/gitlab-org/gitlab` while not logged in
+
+1. User is in the US so DNS resolves to the US router
+1. The router knows that `/gitlab-org` lives in `Pod US0` so sends the request
+ to this pod
+1. `Pod US0` serves up the response
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_us as Router US
+ participant pod_us0 as Pod US0
+ user->>router_us: GET /gitlab-org/gitlab
+ router_us->>pod_us0: GET /gitlab-org/gitlab
+ pod_us0->>user: <h1>GitLab.org... X-Gitlab-Pod-Cache={path_prefix:/gitlab-org/}
+```
+
+#### Navigates to `/`
+
+1. User is in US so DNS resolves to the router in US
+1. Router does not have a cache for `/` route (specifically Rails never tells it to cache this route)
+1. The Router chooses `Pod US1` randomly
+1. The Rails application knows the user's default organization is `GitLab.com Public`, so
+ it redirects the user to `/dashboard` (only legacy users can see
+ `/dashboard` global view)
+1. Router does not have a cache for `/dashboard` route (specifically Rails never tells it to cache this route)
+1. The Router chooses `Pod US1` randomly
+1. The Rails application knows the user's default organization is `GitLab.com Public`, so
+ it allows the user to load `/dashboard` (only legacy users can see
+ `/dashboard` global view) and redirects the router to the legacy pod, which is `Pod US0`
+1. `Pod US0` serves up the global view dashboard page `/dashboard` which is the same
+ dashboard view we have today
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_us as Router US
+ participant pod_us0 as Pod US0
+ participant pod_us1 as Pod US1
+ user->>router_us: GET /
+ router_us->>pod_us1: GET /
+ pod_us1->>user: 302 /dashboard
+ user->>router_us: GET /dashboard
+ router_us->>pod_us1: GET /dashboard
+ pod_us1->>router_us: 302 /dashboard X-Gitlab-Pod-Redirect={pod:Pod US0}
+ router_us->>pod_us0: GET /dashboard
+ pod_us0->>user: <h1>Dashboard...
+```
+
+#### Navigates to `/my-company/my-other-project` while logged in (but they don't have access since this project is private)
+
+They get a 404.
+
+### Experience for non-logged in users
+
+Flow is similar to logged in users except global routes like `/dashboard` will
+redirect to the login page as there is no default organization to choose from.
+
+### A new customer signs up
+
+They will be asked if they are already part of an organization or if they'd
+like to create one. If they choose neither they end up on the default
+`GitLab.com Public` organization.
+
+### An organization is moved from 1 pod to another
+
+TODO
+
+### GraphQL/API requests which don't include the namespace in the URL
+
+TODO
+
+### The autocomplete suggestion functionality in the search bar which remembers recent issues/MRs
+
+TODO
+
+### Global search
+
+TODO
+
+## Administrator
+
+### Loads `/admin` page
+
+1. Router picks a random pod, for example `Pod US0`
+1. `Pod US0` redirects the user to `/admin/pods/podus0`
+1. `Pod US0` renders an Admin Area page and also returns a cache header to cache `/admin/pods/podus0/* => Pod US0`. The Admin Area page contains a dropdown list showing the other pods the user could select, and selecting one changes the query parameter.
+
+Admin Area settings in Postgres are all shared across all pods to avoid
+divergence but we still make it clear in the URL and UI which pod is serving
+the Admin Area page as there is dynamic data being generated from these pages and
+the operator may want to view a specific pod.
+
+## More Technical Problems To Solve
+
+### Replicating User Sessions Between All Pods
+
+Today user sessions live in Redis but each pod will have its own Redis instance. We already use a dedicated Redis instance for sessions so we could consider sharing this with all pods like we do with the `gitlab_users` PostgreSQL database. An important consideration will be latency, as we would still want to mostly fetch sessions from the same region.
+
+An alternative might be that user sessions get moved to a JWT payload that encodes all the session data but this has downsides. For example, it is difficult to expire a user session, when their password changes or for other reasons, if the session lives in a JWT controlled by the user.
+
+### How do we migrate between Pods
+
+Migrating data between pods will need to factor in all data stores:
+
+1. PostgreSQL
+1. Redis Shared State
+1. Gitaly
+1. Elasticsearch
+
+### Is it still possible to leak the existence of private groups via a timing attack?
+
+If you have a router in the EU, and you know that the EU router by default sends requests
+to EU-located Pods, you know their latency (let's assume 10 ms). Now, if your
+request is bounced back and redirected to the US, which has a different latency
+(let's assume the round trip takes around 60 ms), you can deduce that the 404 was
+returned by a US Pod and know that your 404 is in fact a 403.
+
+We may defer this until we actually implement a pod in a different region. Such timing attacks are already theoretically possible with the way we do permission checks today but the timing difference is probably too small to be able to detect.
+
+One technique to mitigate this risk might be to have the router add a random
+delay to any request that returns 404 from a pod.
+
+## Should runners be shared across all pods?
+
+We have 2 options and we should decide which is easier:
+
+1. Decompose runner registration and queuing tables and share them across all
+ pods. This may have implications for scalability, and we'd need to consider
+ if this would include group/project runners as this may have scalability
+ concerns as these are high traffic tables that would need to be shared.
+1. Runners are registered per-pod, and we probably have a separate fleet of
+ runners for every pod or just register the same runners to many pods, which
+ may have implications for queueing
+
+## How do we guarantee unique ids across all pods for things that cannot conflict?
+
+This project assumes at least namespaces and projects have unique IDs across
+all pods as many requests need to be routed based on their ID. Since those
+tables are spread across different databases, guaranteeing a unique ID will
+require a new solution. There are likely other tables where unique IDs are
+necessary and, depending on how we resolve routing for GraphQL and other APIs
+and other design goals, it may be determined that we want the primary key to be
+unique for all tables.
diff --git a/doc/architecture/blueprints/pods/proposal-stateless-router-with-routes-learning.md b/doc/architecture/blueprints/pods/proposal-stateless-router-with-routes-learning.md
new file mode 100644
index 00000000000..e7520f3d6a8
--- /dev/null
+++ b/doc/architecture/blueprints/pods/proposal-stateless-router-with-routes-learning.md
@@ -0,0 +1,672 @@
+---
+stage: enablement
+group: pods
+comments: false
+description: 'Pods Stateless Router Proposal'
+---
+
+This document is a work-in-progress and represents a very early state of the
+Pods design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Pods, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Proposal: Stateless Router
+
+We will decompose `gitlab_users`, `gitlab_routes` and `gitlab_admin` related
+tables so that they can be shared between all pods and allow any pod to
+authenticate a user and route requests to the correct pod. Pods may receive
+requests for the resources they don't own, but they know how to redirect back
+to the correct pod.
+
+The router is stateless and does not read from the `routes` database which
+means that all interactions with the database still happen from the Rails
+monolith. This architecture also supports regions by allowing for low traffic
+databases to be replicated across regions.
+
+Users are not directly exposed to the concept of Pods but instead they see
+different data dependent on their currently chosen "organization".
+[Organizations](index.md#organizations) will be a new model introduced to enforce isolation in the
+application and allow us to decide which requests route to which pod, since an
+organization can only be on a single pod.
+
+## Differences
+
+The main difference between this proposal and the one [with buffering requests](proposal-stateless-router-with-buffering-requests.md)
+is that this proposal uses a pre-flight API request (`/api/v4/pods/learn`) to direct the request body to the correct Pod.
+This means that each request is sent exactly once to be processed, and the URI is used to decide which Pod it should be directed to.
+
+## Summary in diagrams
+
+This shows how a user request routes via DNS to the nearest router and the router chooses a pod to send the request to.
+
+```mermaid
+graph TD;
+ user((User));
+ dns[DNS];
+ router_us(Router);
+ router_eu(Router);
+ pod_us0{Pod US0};
+ pod_us1{Pod US1};
+ pod_eu0{Pod EU0};
+ pod_eu1{Pod EU1};
+ user-->dns;
+ dns-->router_us;
+ dns-->router_eu;
+ subgraph Europe
+ router_eu-->pod_eu0;
+ router_eu-->pod_eu1;
+ end
+ subgraph United States
+ router_us-->pod_us0;
+ router_us-->pod_us1;
+ end
+```
+
+### More detail
+
+This shows that the router can actually send requests to any pod. The user will
+get the closest router to them geographically.
+
+```mermaid
+graph TD;
+ user((User));
+ dns[DNS];
+ router_us(Router);
+ router_eu(Router);
+ pod_us0{Pod US0};
+ pod_us1{Pod US1};
+ pod_eu0{Pod EU0};
+ pod_eu1{Pod EU1};
+ user-->dns;
+ dns-->router_us;
+ dns-->router_eu;
+ subgraph Europe
+ router_eu-->pod_eu0;
+ router_eu-->pod_eu1;
+ end
+ subgraph United States
+ router_us-->pod_us0;
+ router_us-->pod_us1;
+ end
+ router_eu-.->pod_us0;
+ router_eu-.->pod_us1;
+ router_us-.->pod_eu0;
+ router_us-.->pod_eu1;
+```
+
+### Even more detail
+
+This shows the databases. The `gitlab_users` and `gitlab_routes` primaries exist only in the
+US region but are replicated to other regions. Replication is not drawn with an
+arrow because it would make the diagram too hard to read.
+
+```mermaid
+graph TD;
+ user((User));
+ dns[DNS];
+ router_us(Router);
+ router_eu(Router);
+ pod_us0{Pod US0};
+ pod_us1{Pod US1};
+ pod_eu0{Pod EU0};
+ pod_eu1{Pod EU1};
+ db_gitlab_users[(gitlab_users Primary)];
+ db_gitlab_routes[(gitlab_routes Primary)];
+ db_gitlab_users_replica[(gitlab_users Replica)];
+ db_gitlab_routes_replica[(gitlab_routes Replica)];
+ db_pod_us0[(gitlab_main/gitlab_ci Pod US0)];
+ db_pod_us1[(gitlab_main/gitlab_ci Pod US1)];
+ db_pod_eu0[(gitlab_main/gitlab_ci Pod EU0)];
+ db_pod_eu1[(gitlab_main/gitlab_ci Pod EU1)];
+ user-->dns;
+ dns-->router_us;
+ dns-->router_eu;
+ subgraph Europe
+ router_eu-->pod_eu0;
+ router_eu-->pod_eu1;
+ pod_eu0-->db_pod_eu0;
+ pod_eu0-->db_gitlab_users_replica;
+ pod_eu0-->db_gitlab_routes_replica;
+ pod_eu1-->db_gitlab_users_replica;
+ pod_eu1-->db_gitlab_routes_replica;
+ pod_eu1-->db_pod_eu1;
+ end
+ subgraph United States
+ router_us-->pod_us0;
+ router_us-->pod_us1;
+ pod_us0-->db_pod_us0;
+ pod_us0-->db_gitlab_users;
+ pod_us0-->db_gitlab_routes;
+ pod_us1-->db_gitlab_users;
+ pod_us1-->db_gitlab_routes;
+ pod_us1-->db_pod_us1;
+ end
+ router_eu-.->pod_us0;
+ router_eu-.->pod_us1;
+ router_us-.->pod_eu0;
+ router_us-.->pod_eu1;
+```
+
+## Summary of changes
+
+1. Tables related to User data (including profile settings, authentication credentials, personal access tokens) are decomposed into a `gitlab_users` schema
+1. The `routes` table is decomposed into `gitlab_routes` schema
+1. The `application_settings` (and probably a few other instance level tables) are decomposed into `gitlab_admin` schema
+1. A new column `routes.pod_id` is added to `routes` table
+1. A new Router service exists to choose which pod to route a request to.
+1. If a router receives a new request it will send `/api/v4/pods/learn?method=GET&path_info=/group-org/project` to learn which Pod can process it
+1. A new concept will be introduced in GitLab called an organization
+1. We require all existing endpoints to be routable by URI, or be fixed to a specific Pod for processing. This requires changing ambiguous endpoints like `/dashboard` to be scoped like `/organizations/my-organization/-/dashboard`
+1. Endpoints like `/admin` would always be routed to a specific Pod, like `pod_0`
+1. Each Pod can respond to `/api/v4/pods/learn` and classify each endpoint
+1. Writes to `gitlab_users` and `gitlab_routes` are sent to a primary PostgreSQL server in our `US` region but reads can come from replicas in the same region. This will add latency for these writes but we expect they are infrequent relative to the rest of GitLab.
+
+## Pre-flight request learning
+
+While processing a request the URI will be decoded and a pre-flight request
+will be sent for each non-cached endpoint.
+
+When asked about an endpoint, GitLab Rails will return information about
+the routable path. GitLab Rails will decode `path_info`, match it to
+an existing endpoint and find a routable entity (like a project). The router will
+treat this as short-lived cache information.
+
+1. Prefix match: `/api/v4/pods/learn?method=GET&path_info=/gitlab-org/gitlab-test/-/issues`
+
+ ```json
+ {
+ "path": "/gitlab-org/gitlab-test",
+ "pod": "pod_0",
+ "source": "routable"
+ }
+ ```
+
+1. Some endpoints might require an exact match: `/api/v4/pods/learn?method=GET&path_info=/-/profile`
+
+ ```json
+ {
+ "path": "/-/profile",
+ "pod": "pod_0",
+ "source": "fixed",
+ "exact": true
+ }
+ ```
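+
+As a rough illustration only, a router could keep the two responses above as
+short-lived cache entries shaped like the following. The `entries` and
+`ttl_seconds` names and the TTL values are assumptions made for this sketch and
+are not part of the proposed API.
+
+```json
+{
+  "entries": [
+    {
+      "path": "/gitlab-org/gitlab-test",
+      "pod": "pod_0",
+      "source": "routable",
+      "ttl_seconds": 60
+    },
+    {
+      "path": "/-/profile",
+      "pod": "pod_0",
+      "source": "fixed",
+      "exact": true,
+      "ttl_seconds": 60
+    }
+  ]
+}
+```
+
+With entries like these, a later request for `/gitlab-org/gitlab-test/-/merge_requests`
+would match the first entry by prefix, while `/-/profile/keys` would not match the
+second entry because it requires an exact match.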
+
+## Detailed explanation of default organization in the first iteration
+
+All users will get a new column `users.default_organization` which they can
+control in user settings. We will introduce a concept of the
+`GitLab.com Public` organization. This will be set as the default organization for all existing
+users. This organization will allow the user to see data from all namespaces in
+`Pod US0` (ie. our original GitLab.com instance). This behavior can be invisible to
+existing users such that they don't even get told when they are viewing a
+global page like `/dashboard` that it's even scoped to an organization.
+
+Any new users with a default organization other than `GitLab.com Public` will have
+a distinct user experience and will be fully aware that every page they load is
+only ever scoped to a single organization. These users can never
+load any global pages like `/dashboard` and will end up being redirected to
+`/organizations/<DEFAULT_ORGANIZATION>/-/dashboard`. This may also be the case
+for legacy APIs and such users may only ever be able to use APIs scoped to an
+organization.
+
+## Detailed explanation of Admin Area settings
+
+We believe that maintaining and synchronizing Admin Area settings will be
+frustrating and painful so to avoid this we will decompose and share all Admin Area
+settings in the `gitlab_admin` schema. This should be safe (similar to other
+shared schemas) because these receive very little write traffic.
+
+In cases where different pods need different settings (for example, the
+Elasticsearch URL), we will either use a templated
+format in the relevant `application_settings` row which allows it to be dynamic
+per pod or, if that proves difficult, introduce a new table
+called `per_pod_application_settings` with 1 row per pod to allow
+setting different settings per pod. It will still be part of the `gitlab_admin`
+schema and shared, which will allow us to centrally manage it and simplify
+keeping settings in sync for all pods.
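+
+A minimal sketch of what the two options might hold, shown as JSON for
+readability. The `{pod}` placeholder syntax, the `elasticsearch_url` key, and the
+host names are all illustrative assumptions. This proposal does not define any of
+them.
+
+```json
+{
+  "templated_application_settings": {
+    "elasticsearch_url": "https://elasticsearch.{pod}.example.com:9200"
+  },
+  "per_pod_application_settings": [
+    {
+      "pod": "pod_us0",
+      "elasticsearch_url": "https://elasticsearch.pod-us0.example.com:9200"
+    },
+    {
+      "pod": "pod_eu0",
+      "elasticsearch_url": "https://elasticsearch.pod-eu0.example.com:9200"
+    }
+  ]
+}
+```
+
+In the templated form each pod would substitute its own identifier when it reads
+the value. In the per-pod form each pod would read only its own row.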
+
+## Pros
+
+1. Router is stateless and can live in many regions. We use Anycast DNS to resolve to the nearest region for the user.
+1. Pods can receive requests for namespaces that live on another pod and the user
+ still gets the right response. Caching at the router then
+ ensures the next request is sent directly to the correct pod.
+1. The majority of the code still lives in the `gitlab` Rails codebase. The Router doesn't actually need to understand how GitLab URLs are composed.
+1. Since the responsibility to read and write `gitlab_users`,
+ `gitlab_routes` and `gitlab_admin` still lives in Rails it means minimal
+ changes will be needed to the Rails application compared to extracting
+ services that need to isolate the domain models and build new interfaces.
+1. Compared to a separate routing service this allows the Rails application
+ to encode more complex rules around how to map URLs to the correct pod
+ and may work for some existing API endpoints.
+1. All the new infrastructure (just a router) is optional and a single-pod
+ self-managed installation does not even need to run the Router and there are
+ no other new services.
+
+## Cons
+
+1. `gitlab_users`, `gitlab_routes` and `gitlab_admin` databases may need to be
+ replicated across regions and writes need to go across regions. We need to
+ do an analysis on write TPS for the relevant tables to determine if this is
+ feasible.
+1. Sharing access to the database from many different Pods means that they are
+ all coupled at the Postgres schema level and this means changes to the
+ database schema need to be done carefully in sync with the deployment of all
+ Pods. This forces us to keep Pods on closely similar
+ versions, compared to an architecture with shared services that have an API
+ we control.
+1. Although most data is stored in the right region there can be requests
+ proxied from another region which may be an issue for certain types
+ of compliance.
+1. Data in `gitlab_users` and `gitlab_routes` databases must be replicated in
+ all regions which may be an issue for certain types of compliance.
+1. The router cache may need to be very large if we get a wide variety of URLs
+ (ie. long tail). In such a case we may need to implement a 2nd level of
+ caching in user cookies so their frequently accessed pages always go to the
+ right pod the first time.
+1. Having shared database access for `gitlab_users` and `gitlab_routes`
+ from multiple pods is an unusual architecture decision compared to
+ extracting services that are called from multiple pods.
+1. It is very likely we won't be able to find cacheable elements of a
+ GraphQL URL and often existing GraphQL endpoints are heavily dependent on
+ ids that won't be in the `routes` table so pods won't necessarily know
+ what pod has the data. As such we'll probably have to update our GraphQL
+ calls to include an organization context in the path like
+ `/api/organizations/<organization>/graphql`.
+1. This architecture implies that implemented endpoints can only access data
+ that is readily accessible on a given Pod, and are unlikely
+ to be able to aggregate information from many Pods.
+1. All unknown routes are sent to the latest deployment which we assume to be `Pod US0`.
+ This is required as newly added endpoints will only be decodable by the latest pod.
+ This is likely not a problem for `/pods/learn` as it is lightweight
+ to process and should not cause a performance impact.
+
+## Example database configuration
+
+Handling shared `gitlab_users`, `gitlab_routes` and `gitlab_admin` databases, while having dedicated `gitlab_main` and `gitlab_ci` databases, should already be handled by the way we use `config/database.yml`. We should also already be able to handle the dedicated EU replicas while having a single US primary for `gitlab_users` and `gitlab_routes`. Below is a snippet of part of the database configuration for the Pod architecture described above.
+
+**Pod US0**:
+
+```yaml
+# config/database.yml
+production:
+ main:
+ host: postgres-main.pod-us0.primary.consul
+ load_balancing:
+ discovery: postgres-main.pod-us0.replicas.consul
+ ci:
+ host: postgres-ci.pod-us0.primary.consul
+ load_balancing:
+ discovery: postgres-ci.pod-us0.replicas.consul
+ users:
+ host: postgres-users-primary.consul
+ load_balancing:
+ discovery: postgres-users-replicas.us.consul
+ routes:
+ host: postgres-routes-primary.consul
+ load_balancing:
+ discovery: postgres-routes-replicas.us.consul
+ admin:
+ host: postgres-admin-primary.consul
+ load_balancing:
+ discovery: postgres-admin-replicas.us.consul
+```
+
+**Pod EU0**:
+
+```yaml
+# config/database.yml
+production:
+ main:
+ host: postgres-main.pod-eu0.primary.consul
+ load_balancing:
+ discovery: postgres-main.pod-eu0.replicas.consul
+ ci:
+ host: postgres-ci.pod-eu0.primary.consul
+ load_balancing:
+ discovery: postgres-ci.pod-eu0.replicas.consul
+ users:
+ host: postgres-users-primary.consul
+ load_balancing:
+ discovery: postgres-users-replicas.eu.consul
+ routes:
+ host: postgres-routes-primary.consul
+ load_balancing:
+ discovery: postgres-routes-replicas.eu.consul
+ admin:
+ host: postgres-admin-primary.consul
+ load_balancing:
+ discovery: postgres-admin-replicas.eu.consul
+```
+
+## Request flows
+
+1. `gitlab-org` is a top level namespace and lives in `Pod US0` in the `GitLab.com Public` organization
+1. `my-company` is a top level namespace and lives in `Pod EU0` in the `my-organization` organization
+
+### Experience for paying user that is part of `my-organization`
+
+Such a user will have a default organization set to `/my-organization` and will be
+unable to load any global routes outside of this organization. They may load other
+projects/namespaces but their MR/Todo/Issue counts at the top of the page will
+not be correctly populated in the first iteration. The user will be aware of
+this limitation.
+
+#### Navigates to `/my-company/my-project` while logged in
+
+1. User is in Europe so DNS resolves to the router in Europe
+1. They request `/my-company/my-project` and the router has no cached entry for it, so the router randomly chooses `Pod EU1`
+1. The `/pods/learn` request is sent to `Pod EU1`, which responds that the resource lives on `Pod EU0`
+1. The router forwards the request to `Pod EU0`, which returns the correct response
+1. The router now caches and remembers that any request paths matching `/my-company/*` should go to `Pod EU0`
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_eu as Router EU
+ participant pod_eu0 as Pod EU0
+ participant pod_eu1 as Pod EU1
+ user->>router_eu: GET /my-company/my-project
+ router_eu->>pod_eu1: /api/v4/pods/learn?method=GET&path_info=/my-company/my-project
+ pod_eu1->>router_eu: {path: "/my-company", pod: "pod_eu0", source: "routable"}
+ router_eu->>pod_eu0: GET /my-company/my-project
+ pod_eu0->>user: <h1>My Project...
+```
+
+#### Navigates to `/my-company/my-project` while not logged in
+
+1. User is in Europe so DNS resolves to the router in Europe
+1. The router does not have `/my-company/*` cached yet so it randomly chooses `Pod EU1`
+1. The `/pods/learn` request is sent to `Pod EU1`, which responds that the resource lives on `Pod EU0`
+1. `Pod EU0` redirects them through a login flow
+1. User requests `/users/sign_in`, and the router uses a random Pod to run `/pods/learn`
+1. `Pod EU1` responds with `pod_0` as a fixed route
+1. After login, the user requests `/my-company/my-project`, which the router has cached as living on `Pod EU0`
+1. `Pod EU0` returns the correct response
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_eu as Router EU
+ participant pod_eu0 as Pod EU0
+ participant pod_eu1 as Pod EU1
+ user->>router_eu: GET /my-company/my-project
+ router_eu->>pod_eu1: /api/v4/pods/learn?method=GET&path_info=/my-company/my-project
+ pod_eu1->>router_eu: {path: "/my-company", pod: "pod_eu0", source: "routable"}
+ router_eu->>pod_eu0: GET /my-company/my-project
+ pod_eu0->>user: 302 /users/sign_in?redirect=/my-company/my-project
+ user->>router_eu: GET /users/sign_in?redirect=/my-company/my-project
+ router_eu->>pod_eu1: /api/v4/pods/learn?method=GET&path_info=/users/sign_in
+ pod_eu1->>router_eu: {path: "/users", pod: "pod_eu0", source: "fixed"}
+ router_eu->>pod_eu0: GET /users/sign_in?redirect=/my-company/my-project
+ pod_eu0-->>user: <h1>Sign in...
+ user->>router_eu: POST /users/sign_in?redirect=/my-company/my-project
+ router_eu->>pod_eu0: POST /users/sign_in?redirect=/my-company/my-project
+ pod_eu0->>user: 302 /my-company/my-project
+ user->>router_eu: GET /my-company/my-project
+ router_eu->>pod_eu0: GET /my-company/my-project
+ pod_eu0->>user: <h1>My Project...
+```
+
+#### Navigates to `/my-company/my-other-project` after last step
+
+1. User is in Europe so DNS resolves to the router in Europe
+1. The router cache now has `/my-company/* => Pod EU0`, so the router chooses `Pod EU0`
+1. `Pod EU0` returns the correct response as well as the cache header again
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_eu as Router EU
+ participant pod_eu0 as Pod EU0
+ participant pod_eu1 as Pod EU1
+ user->>router_eu: GET /my-company/my-other-project
+ router_eu->>pod_eu0: GET /my-company/my-other-project
+ pod_eu0->>user: <h1>My Project...
+```
+
+#### Navigates to `/gitlab-org/gitlab` after last step
+
+1. User is in Europe so DNS resolves to the router in Europe
+1. The router has no cached value for this URL so randomly chooses `Pod EU0`
+1. `Pod EU0` redirects the router to `Pod US0`
+1. `Pod US0` returns the correct response as well as the cache header again
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_eu as Router EU
+ participant pod_eu0 as Pod EU0
+ participant pod_us0 as Pod US0
+ user->>router_eu: GET /gitlab-org/gitlab
+ router_eu->>pod_eu0: /api/v4/pods/learn?method=GET&path_info=/gitlab-org/gitlab
+ pod_eu0->>router_eu: {path: "/gitlab-org", pod: "pod_us0", source: "routable"}
+ router_eu->>pod_us0: GET /gitlab-org/gitlab
+ pod_us0->>user: <h1>GitLab.org...
+```
+
+In this case the user is not on their "default organization" so their TODO
+counter will not include their normal todos. We may choose to highlight this in
+the UI somewhere. A future iteration may be able to fetch that for them from
+their default organization.
+
+#### Navigates to `/`
+
+1. User is in Europe so DNS resolves to the router in Europe
+1. Router does not have a cache for the `/` route (specifically Rails never tells it to cache this route)
+1. The Router chooses `Pod EU0` randomly
+1. The Rails application knows the user's default organization is `/my-organization`, so
+ it redirects the user to `/organizations/my-organization/-/dashboard`
+1. The Router has a cached value for `/organizations/my-organization/*` so it then sends the
+ request to `Pod EU0`
+1. `Pod EU0` serves up a new page `/organizations/my-organization/-/dashboard` which is the same
+ dashboard view we have today but scoped to an organization clearly in the UI
+1. The user is (optionally) presented with a message saying that data on this page is only
+ from their default organization and that they can change their default
+ organization if it's not right.
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_eu as Router EU
+ participant pod_eu0 as Pod EU0
+ user->>router_eu: GET /
+ router_eu->>pod_eu0: GET /
+ pod_eu0->>user: 302 /organizations/my-organization/-/dashboard
+ user->>router_eu: GET /organizations/my-organization/-/dashboard
+ router_eu->>pod_eu0: GET /organizations/my-organization/-/dashboard
+ pod_eu0->>user: <h1>My Company Dashboard... X-Gitlab-Pod-Cache={path_prefix:/organizations/my-organization/}
+```
+
+#### Navigates to `/dashboard`
+
+As above, they will end up on `/organizations/my-organization/-/dashboard` as
+the Rails application will already redirect `/` to the dashboard page.
+
+#### Navigates to `/not-my-company/not-my-project` while logged in (but they don't have access since this project/group is private)
+
+1. User is in Europe so DNS resolves to the router in Europe
+1. The router knows that `/not-my-company` lives in `Pod US1` so sends the request to this pod
+1. The user does not have access so `Pod US1` returns 404
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_eu as Router EU
+ participant pod_us1 as Pod US1
+ user->>router_eu: GET /not-my-company/not-my-project
+ router_eu->>pod_us1: GET /not-my-company/not-my-project
+ pod_us1->>user: 404
+```
+
+#### Creates a new top level namespace
+
+The user will be asked which organization they want the namespace to belong to.
+If they select `my-organization` then it will end up on the same pod as all
+other namespaces in `my-organization`. If they select nothing we default to
+`GitLab.com Public` and it is clear to the user that this is isolated from
+their existing organization such that they won't be able to see data from both
+on a single page.
+
+### Experience for GitLab team member that is part of `/gitlab-org`
+
+Such a user is considered a legacy user and has their default organization set to
+`GitLab.com Public`. This is a "meta" organization that does not really exist but
+the Rails application knows to interpret this organization to mean that they are
+allowed to use legacy global functionality like `/dashboard` to see data across
+namespaces located on `Pod US0`. The Rails backend also knows that the default pod to render any ambiguous
+routes like `/dashboard` is `Pod US0`. Lastly the user will be allowed to
+navigate to organizations on another pod like `/my-organization` but when they do the
+user will see a message indicating that some data may be missing (for example, the
+MRs/Issues/Todos counts).
+
+#### Navigates to `/gitlab-org/gitlab` while not logged in
+
+1. User is in the US so DNS resolves to the US router
+1. The router knows that `/gitlab-org` lives in `Pod US0` so sends the request
+ to this pod
+1. `Pod US0` serves up the response
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_us as Router US
+ participant pod_us0 as Pod US0
+ user->>router_us: GET /gitlab-org/gitlab
+ router_us->>pod_us0: GET /gitlab-org/gitlab
+ pod_us0->>user: <h1>GitLab.org...
+```
+
+#### Navigates to `/`
+
+1. User is in the US so DNS resolves to the router in the US
+1. Router does not have a cache for the `/` route (specifically Rails never tells it to cache this route)
+1. The Router chooses `Pod US1` randomly
+1. The Rails application knows the user's default organization is `GitLab.com Public`, so
+ it redirects the user to `/dashboard` (only legacy users can see the
+ `/dashboard` global view)
+1. Router does not have a cache for the `/dashboard` route (specifically Rails never tells it to cache this route)
+1. The Router chooses `Pod US1` randomly
+1. The Rails application knows the user's default organization is `GitLab.com Public`, so
+ it allows the user to load `/dashboard` (only legacy users can see the
+ `/dashboard` global view) and redirects the router to the legacy pod, which is `Pod US0`
+1. `Pod US0` serves up the global view dashboard page `/dashboard` which is the same
+ dashboard view we have today
+
+```mermaid
+sequenceDiagram
+ participant user as User
+ participant router_us as Router US
+ participant pod_us0 as Pod US0
+ participant pod_us1 as Pod US1
+ user->>router_us: GET /
+ router_us->>pod_us1: GET /
+ pod_us1->>user: 302 /dashboard
+ user->>router_us: GET /dashboard
+ router_us->>pod_us1: /api/v4/pods/learn?method=GET&path_info=/dashboard
+ pod_us1->>router_us: {path: "/dashboard", pod: "pod_us0", source: "routable"}
+ router_us->>pod_us0: GET /dashboard
+ pod_us0->>user: <h1>Dashboard...
+```
+
+#### Navigates to `/my-company/my-other-project` while logged in (but they don't have access since this project is private)
+
+They get a 404.
+
+### Experience for non-logged in users
+
+Flow is similar to logged in users except global routes like `/dashboard` will
+redirect to the login page as there is no default organization to choose from.
+
+### A new customer signs up
+
+They will be asked if they are already part of an organization or if they'd
+like to create one. If they choose neither they end up on the default
+`GitLab.com Public` organization.
+
+### An organization is moved from 1 pod to another
+
+TODO
+
+### GraphQL/API requests which don't include the namespace in the URL
+
+TODO
+
+### The autocomplete suggestion functionality in the search bar which remembers recent issues/MRs
+
+TODO
+
+### Global search
+
+TODO
+
+## Administrator
+
+### Loads `/admin` page
+
+1. `/admin` is locked to `Pod US0`
+1. Some endpoints of `/admin`, like Projects in Admin, are scoped to a Pod
+ and the user needs to choose the correct one in a dropdown, which results in an endpoint
+ like `/admin/pods/pod_0/projects`.
+
+Admin Area settings in Postgres are all shared across all pods to avoid
+divergence but we still make it clear in the URL and UI which pod is serving
+the Admin Area page as there is dynamic data being generated from these pages and
+the operator may want to view a specific pod.
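+
+Under the pre-flight learning scheme, one way a pod might classify `/admin` is
+as a fixed route, mirroring the response format shown earlier for `/-/profile`.
+This is only a sketch. The `pod_us0` identifier is assumed to correspond to
+`Pod US0`, and the proposal does not say whether such an entry would be a
+prefix or exact match.
+
+```json
+{
+  "path": "/admin",
+  "pod": "pod_us0",
+  "source": "fixed"
+}
+```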
+
+## More Technical Problems To Solve
+
+### Replicating User Sessions Between All Pods
+
+Today, user sessions live in Redis, but each pod will have its own Redis instance. We already use a dedicated Redis instance for sessions, so we could consider sharing it with all pods, as we do with the `gitlab_users` PostgreSQL database. An important consideration will be latency, as we would still want to fetch most sessions from the same region.
+
+An alternative might be that user sessions get moved to a JWT payload that encodes all the session data but this has downsides. For example, it is difficult to expire a user session, when their password changes or for other reasons, if the session lives in a JWT controlled by the user.
+
+### How do we migrate between Pods
+
+Migrating data between pods will need to factor in all data stores:
+
+1. PostgreSQL
+1. Redis Shared State
+1. Gitaly
+1. Elasticsearch
+
+### Is it still possible to leak the existence of private groups via a timing attack?
+
+Suppose a router in the EU routes to EU-located pods by default, and you know
+their typical latency (let's assume 10 ms). If your request is bounced back and
+redirected to a US pod with a different latency (let's assume a round trip of
+around 60 ms), you can deduce that the 404 was returned by the US pod, and
+therefore that the 404 is in fact a 403: the private group exists.
+
+We may defer this until we actually implement a pod in a different region. Such timing attacks are already theoretically possible with the way we do permission checks today, but the timing difference is probably too small to detect.
+
+One technique to mitigate this risk might be to have the router add a random
+delay to any request that returns 404 from a pod.
+
+## Should runners be shared across all pods?
+
+We have 2 options and we should decide which is easier:
+
+1. Decompose runner registration and queuing tables and share them across all
+   pods. This may have implications for scalability, and we'd need to consider
+   whether this would include group/project runners, since these are
+   high-traffic tables that would need to be shared.
+1. Register runners per pod. We would probably have a separate fleet of
+   runners for every pod, or register the same runners to many pods, which
+   may have implications for queuing.
+
+## How do we guarantee unique ids across all pods for things that cannot conflict?
+
+This project assumes that at least namespaces and projects have unique IDs
+across all pods, as many requests need to be routed based on their ID. Because
+those tables are spread across different databases, guaranteeing a unique ID
+will require a new solution. There are likely other tables where unique IDs are
+necessary. Depending on how we resolve routing for GraphQL and other APIs, and
+on other design goals, we may decide that the primary key should be unique
+across pods for all tables.
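The blueprint text above leaves the unique-ID mechanism open. As a rough illustration of what one candidate could look like, the sketch below uses interleaved sequences: each pod starts at a different offset and advances by a fixed step, similar in spirit to a PostgreSQL sequence created with `START WITH` and `INCREMENT BY`. This scheme is an assumption made for illustration, not something the blueprint proposes, and the `PodSequence` class, its fields, and the fixed `pod_count` are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Iterator


@dataclass
class PodSequence:
    """Hypothetical per-pod ID generator using interleaved sequences.

    Assumption: the maximum number of pods (pod_count) is fixed up front
    and every pod is given a distinct pod_index.
    """

    pod_index: int  # 0-based position of this pod
    pod_count: int  # fixed upper bound on the number of pods
    _ids: Iterator[int] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        if not 0 <= self.pod_index < self.pod_count:
            raise ValueError("pod_index must be in [0, pod_count)")
        # IDs start at pod_index + 1 and advance by pod_count, so two pods
        # with different indexes can never produce the same value.
        self._ids = count(self.pod_index + 1, self.pod_count)

    def next_id(self) -> int:
        """Return the next ID owned by this pod."""
        return next(self._ids)


if __name__ == "__main__":
    pod_us0 = PodSequence(pod_index=0, pod_count=4)
    pod_us1 = PodSequence(pod_index=1, pod_count=4)
    print([pod_us0.next_id() for _ in range(3)])
    print([pod_us1.next_id() for _ in range(3)])
```

A side effect of this particular scheme is that the owning pod can be recovered as `(id - 1) % pod_count`, which might help ID-based routing, but it also fixes the number of pods in advance. Other approaches, such as a shared sequence service or pre-allocated ID ranges, make different trade-offs.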
diff --git a/doc/architecture/blueprints/pods/term-cluster.png b/doc/architecture/blueprints/pods/term-cluster.png
deleted file mode 100644
index f52e31b52ad..00000000000
--- a/doc/architecture/blueprints/pods/term-cluster.png
+++ /dev/null
Binary files differ
diff --git a/doc/architecture/blueprints/pods/term-organization.png b/doc/architecture/blueprints/pods/term-organization.png
deleted file mode 100644
index f605adb124d..00000000000
--- a/doc/architecture/blueprints/pods/term-organization.png
+++ /dev/null
Binary files differ