diff options
Diffstat (limited to 'doc/architecture/blueprints/pods/proposal-stateless-router-with-routes-learning.md')
-rw-r--r-- | doc/architecture/blueprints/pods/proposal-stateless-router-with-routes-learning.md | 672 |
1 files changed, 672 insertions, 0 deletions
diff --git a/doc/architecture/blueprints/pods/proposal-stateless-router-with-routes-learning.md b/doc/architecture/blueprints/pods/proposal-stateless-router-with-routes-learning.md new file mode 100644 index 00000000000..e7520f3d6a8 --- /dev/null +++ b/doc/architecture/blueprints/pods/proposal-stateless-router-with-routes-learning.md @@ -0,0 +1,672 @@ +--- +stage: enablement +group: pods +comments: false +description: 'Pods Stateless Router Proposal' +--- + +This document is a work-in-progress and represents a very early state of the +Pods design. Significant aspects are not documented, though we expect to add +them in the future. This is one possible architecture for Pods, and we intend to +contrast this with alternatives before deciding which approach to implement. +This documentation will be kept even if we decide not to implement this so that +we can document the reasons for not choosing this approach. + +# Proposal: Stateless Router + +We will decompose `gitlab_users`, `gitlab_routes` and `gitlab_admin` related +tables so that they can be shared between all pods and allow any pod to +authenticate a user and route requests to the correct pod. Pods may receive +requests for the resources they don't own, but they know how to redirect back +to the correct pod. + +The router is stateless and does not read from the `routes` database which +means that all interactions with the database still happen from the Rails +monolith. This architecture also supports regions by allowing for low traffic +databases to be replicated across regions. + +Users are not directly exposed to the concept of Pods but instead they see +different data dependent on their currently chosen "organization". +[Organizations](index.md#organizations) will be a new model introduced to enforce isolation in the +application and allow us to decide which request route to which pod, since an +organization can only be on a single pod. + +## Differences + +The main difference between this proposal and one [with buffering requests](proposal-stateless-router-with-buffering-requests.md) +is that this proposal uses a pre-flight API request (`/api/v4/pods/learn`) to redirect the request body to the correct Pod. +This means that each request is sent exactly once to be processed, but the URI is used to decode which Pod it should be directed. + +## Summary in diagrams + +This shows how a user request routes via DNS to the nearest router and the router chooses a pod to send the request to. + +```mermaid +graph TD; + user((User)); + dns[DNS]; + router_us(Router); + router_eu(Router); + pod_us0{Pod US0}; + pod_us1{Pod US1}; + pod_eu0{Pod EU0}; + pod_eu1{Pod EU1}; + user-->dns; + dns-->router_us; + dns-->router_eu; + subgraph Europe + router_eu-->pod_eu0; + router_eu-->pod_eu1; + end + subgraph United States + router_us-->pod_us0; + router_us-->pod_us1; + end +``` + +### More detail + +This shows that the router can actually send requests to any pod. The user will +get the closest router to them geographically. + +```mermaid +graph TD; + user((User)); + dns[DNS]; + router_us(Router); + router_eu(Router); + pod_us0{Pod US0}; + pod_us1{Pod US1}; + pod_eu0{Pod EU0}; + pod_eu1{Pod EU1}; + user-->dns; + dns-->router_us; + dns-->router_eu; + subgraph Europe + router_eu-->pod_eu0; + router_eu-->pod_eu1; + end + subgraph United States + router_us-->pod_us0; + router_us-->pod_us1; + end + router_eu-.->pod_us0; + router_eu-.->pod_us1; + router_us-.->pod_eu0; + router_us-.->pod_eu1; +``` + +### Even more detail + +This shows the databases. `gitlab_users` and `gitlab_routes` exist only in the +US region but are replicated to other regions. Replication does not have an +arrow because it's too hard to read the diagram. + +```mermaid +graph TD; + user((User)); + dns[DNS]; + router_us(Router); + router_eu(Router); + pod_us0{Pod US0}; + pod_us1{Pod US1}; + pod_eu0{Pod EU0}; + pod_eu1{Pod EU1}; + db_gitlab_users[(gitlab_users Primary)]; + db_gitlab_routes[(gitlab_routes Primary)]; + db_gitlab_users_replica[(gitlab_users Replica)]; + db_gitlab_routes_replica[(gitlab_routes Replica)]; + db_pod_us0[(gitlab_main/gitlab_ci Pod US0)]; + db_pod_us1[(gitlab_main/gitlab_ci Pod US1)]; + db_pod_eu0[(gitlab_main/gitlab_ci Pod EU0)]; + db_pod_eu1[(gitlab_main/gitlab_ci Pod EU1)]; + user-->dns; + dns-->router_us; + dns-->router_eu; + subgraph Europe + router_eu-->pod_eu0; + router_eu-->pod_eu1; + pod_eu0-->db_pod_eu0; + pod_eu0-->db_gitlab_users_replica; + pod_eu0-->db_gitlab_routes_replica; + pod_eu1-->db_gitlab_users_replica; + pod_eu1-->db_gitlab_routes_replica; + pod_eu1-->db_pod_eu1; + end + subgraph United States + router_us-->pod_us0; + router_us-->pod_us1; + pod_us0-->db_pod_us0; + pod_us0-->db_gitlab_users; + pod_us0-->db_gitlab_routes; + pod_us1-->db_gitlab_users; + pod_us1-->db_gitlab_routes; + pod_us1-->db_pod_us1; + end + router_eu-.->pod_us0; + router_eu-.->pod_us1; + router_us-.->pod_eu0; + router_us-.->pod_eu1; +``` + +## Summary of changes + +1. Tables related to User data (including profile settings, authentication credentials, personal access tokens) are decomposed into a `gitlab_users` schema +1. The `routes` table is decomposed into `gitlab_routes` schema +1. The `application_settings` (and probably a few other instance level tables) are decomposed into `gitlab_admin` schema +1. A new column `routes.pod_id` is added to `routes` table +1. A new Router service exists to choose which pod to route a request to. +1. If a router receives a new request it will send `/api/v4/pods/learn?method=GET&path_info=/group-org/project` to learn which Pod can process it +1. A new concept will be introduced in GitLab called an organization +1. We require all existing endpoints to be routable by URI, or be fixed to a specific Pod for processing. This requires changing ambiguous endpoints like `/dashboard` to be scoped like `/organizations/my-organization/-/dashboard` +1. Endpoints like `/admin` would be routed always to the specific Pod, like `pod_0` +1. Each Pod can respond to `/api/v4/pods/learn` and classify each endpoint +1. Writes to `gitlab_users` and `gitlab_routes` are sent to a primary PostgreSQL server in our `US` region but reads can come from replicas in the same region. This will add latency for these writes but we expect they are infrequent relative to the rest of GitLab. + +## Pre-flight request learning + +While processing a request the URI will be decoded and a pre-flight request +will be sent for each non-cached endpoint. + +When asking for the endpoint GitLab Rails will return information about +the routable path. GitLab Rails will decode `path_info` and match it to +an existing endpoint and find a routable entity (like project). The router will +treat this as short-lived cache information. + +1. Prefix match: `/api/v4/pods/learn?method=GET&path_info=/gitlab-org/gitlab-test/-/issues` + + ```json + { + "path": "/gitlab-org/gitlab-test", + "pod": "pod_0", + "source": "routable" + } + ``` + +1. Some endpoints might require an exact match: `/api/v4/pods/learn?method=GET&path_info=/-/profile` + + ```json + { + "path": "/-/profile", + "pod": "pod_0", + "source": "fixed", + "exact": true + } + ``` + +## Detailed explanation of default organization in the first iteration + +All users will get a new column `users.default_organization` which they can +control in user settings. We will introduce a concept of the +`GitLab.com Public` organization. This will be set as the default organization for all existing +users. This organization will allow the user to see data from all namespaces in +`Pod US0` (ie. our original GitLab.com instance). This behavior can be invisible to +existing users such that they don't even get told when they are viewing a +global page like `/dashboard` that it's even scoped to an organization. + +Any new users with a default organization other than `GitLab.com Public` will have +a distinct user experience and will be fully aware that every page they load is +only ever scoped to a single organization. These users can never +load any global pages like `/dashboard` and will end up being redirected to +`/organizations/<DEFAULT_ORGANIZATION>/-/dashboard`. This may also be the case +for legacy APIs and such users may only ever be able to use APIs scoped to a +organization. + +## Detailed explanation of Admin Area settings + +We believe that maintaining and synchronizing Admin Area settings will be +frustrating and painful so to avoid this we will decompose and share all Admin Area +settings in the `gitlab_admin` schema. This should be safe (similar to other +shared schemas) because these receive very little write traffic. + +In cases where different pods need different settings (eg. the +Elasticsearch URL), we will either decide to use a templated +format in the relevant `application_settings` row which allows it to be dynamic +per pod. Alternatively if that proves difficult we'll introduce a new table +called `per_pod_application_settings` and this will have 1 row per pod to allow +setting different settings per pod. It will still be part of the `gitlab_admin` +schema and shared which will allow us to centrally manage it and simplify +keeping settings in sync for all pods. + +## Pros + +1. Router is stateless and can live in many regions. We use Anycast DNS to resolve to nearest region for the user. +1. Pods can receive requests for namespaces in the wrong pod and the user + still gets the right response as well as caching at the router that + ensures the next request is sent to the correct pod so the next request + will go to the correct pod +1. The majority of the code still lives in `gitlab` rails codebase. The Router doesn't actually need to understand how GitLab URLs are composed. +1. Since the responsibility to read and write `gitlab_users`, + `gitlab_routes` and `gitlab_admin` still lives in Rails it means minimal + changes will be needed to the Rails application compared to extracting + services that need to isolate the domain models and build new interfaces. +1. Compared to a separate routing service this allows the Rails application + to encode more complex rules around how to map URLs to the correct pod + and may work for some existing API endpoints. +1. All the new infrastructure (just a router) is optional and a single-pod + self-managed installation does not even need to run the Router and there are + no other new services. + +## Cons + +1. `gitlab_users`, `gitlab_routes` and `gitlab_admin` databases may need to be + replicated across regions and writes need to go across regions. We need to + do an analysis on write TPS for the relevant tables to determine if this is + feasible. +1. Sharing access to the database from many different Pods means that they are + all coupled at the Postgres schema level and this means changes to the + database schema need to be done carefully in sync with the deployment of all + Pods. This limits us to ensure that Pods are kept in closely similar + versions compared to an architecture with shared services that have an API + we control. +1. Although most data is stored in the right region there can be requests + proxied from another region which may be an issue for certain types + of compliance. +1. Data in `gitlab_users` and `gitlab_routes` databases must be replicated in + all regions which may be an issue for certain types of compliance. +1. The router cache may need to be very large if we get a wide variety of URLs + (ie. long tail). In such a case we may need to implement a 2nd level of + caching in user cookies so their frequently accessed pages always go to the + right pod the first time. +1. Having shared database access for `gitlab_users` and `gitlab_routes` + from multiple pods is an unusual architecture decision compared to + extracting services that are called from multiple pods. +1. It is very likely we won't be able to find cacheable elements of a + GraphQL URL and often existing GraphQL endpoints are heavily dependent on + ids that won't be in the `routes` table so pods won't necessarily know + what pod has the data. As such we'll probably have to update our GraphQL + calls to include an organization context in the path like + `/api/organizations/<organization>/graphql`. +1. This architecture implies that implemented endpoints can only access data + that are readily accessible on a given Pod, but are unlikely + to aggregate information from many Pods. +1. All unknown routes are sent to the latest deployment which we assume to be `Pod US0`. + This is required as newly added endpoints will be only decodable by latest pod. + Likely this is not a problem for the `/pods/learn` is it is lightweight + to process and this should not cause a performance impact. + +## Example database configuration + +Handling shared `gitlab_users`, `gitlab_routes` and `gitlab_admin` databases, while having dedicated `gitlab_main` and `gitlab_ci` databases should already be handled by the way we use `config/database.yml`. We should also, already be able to handle the dedicated EU replicas while having a single US primary for `gitlab_users` and `gitlab_routes`. Below is a snippet of part of the database configuration for the Pod architecture described above. + +**Pod US0**: + +```yaml +# config/database.yml +production: + main: + host: postgres-main.pod-us0.primary.consul + load_balancing: + discovery: postgres-main.pod-us0.replicas.consul + ci: + host: postgres-ci.pod-us0.primary.consul + load_balancing: + discovery: postgres-ci.pod-us0.replicas.consul + users: + host: postgres-users-primary.consul + load_balancing: + discovery: postgres-users-replicas.us.consul + routes: + host: postgres-routes-primary.consul + load_balancing: + discovery: postgres-routes-replicas.us.consul + admin: + host: postgres-admin-primary.consul + load_balancing: + discovery: postgres-admin-replicas.us.consul +``` + +**Pod EU0**: + +```yaml +# config/database.yml +production: + main: + host: postgres-main.pod-eu0.primary.consul + load_balancing: + discovery: postgres-main.pod-eu0.replicas.consul + ci: + host: postgres-ci.pod-eu0.primary.consul + load_balancing: + discovery: postgres-ci.pod-eu0.replicas.consul + users: + host: postgres-users-primary.consul + load_balancing: + discovery: postgres-users-replicas.eu.consul + routes: + host: postgres-routes-primary.consul + load_balancing: + discovery: postgres-routes-replicas.eu.consul + admin: + host: postgres-admin-primary.consul + load_balancing: + discovery: postgres-admin-replicas.eu.consul +``` + +## Request flows + +1. `gitlab-org` is a top level namespace and lives in `Pod US0` in the `GitLab.com Public` organization +1. `my-company` is a top level namespace and lives in `Pod EU0` in the `my-organization` organization + +### Experience for paying user that is part of `my-organization` + +Such a user will have a default organization set to `/my-organization` and will be +unable to load any global routes outside of this organization. They may load other +projects/namespaces but their MR/Todo/Issue counts at the top of the page will +not be correctly populated in the first iteration. The user will be aware of +this limitation. + +#### Navigates to `/my-company/my-project` while logged in + +1. User is in Europe so DNS resolves to the router in Europe +1. They request `/my-company/my-project` without the router cache, so the router chooses randomly `Pod EU1` +1. The `/pods/learn` is sent to `Pod EU1`, which responds that resource lives on `Pod EU0` +1. `Pod EU0` returns the correct response +1. The router now caches and remembers any request paths matching `/my-company/*` should go to `Pod EU0` + +```mermaid +sequenceDiagram + participant user as User + participant router_eu as Router EU + participant pod_eu0 as Pod EU0 + participant pod_eu1 as Pod EU1 + user->>router_eu: GET /my-company/my-project + router_eu->>pod_eu1: /api/v4/pods/learn?method=GET&path_info=/my-company/my-project + pod_eu1->>router_eu: {path: "/my-company", pod: "pod_eu0", source: "routable"} + router_eu->>pod_eu0: GET /my-company/my-project + pod_eu0->>user: <h1>My Project... +``` + +#### Navigates to `/my-company/my-project` while not logged in + +1. User is in Europe so DNS resolves to the router in Europe +1. The router does not have `/my-company/*` cached yet so it chooses randomly `Pod EU1` +1. The `/pods/learn` is sent to `Pod EU1`, which responds that resource lives on `Pod EU0` +1. `Pod EU0` redirects them through a login flow +1. User requests `/users/sign_in`, uses random Pod to run `/pods/learn` +1. The `Pod EU1` responds with `pod_0` as a fixed route +1. User after login requests `/my-company/my-project` which is cached and stored in `Pod EU0` +1. `Pod EU0` returns the correct response + +```mermaid +sequenceDiagram + participant user as User + participant router_eu as Router EU + participant pod_eu0 as Pod EU0 + participant pod_eu1 as Pod EU1 + user->>router_eu: GET /my-company/my-project + router_eu->>pod_eu1: /api/v4/pods/learn?method=GET&path_info=/my-company/my-project + pod_eu1->>router_eu: {path: "/my-company", pod: "pod_eu0", source: "routable"} + router_eu->>pod_eu0: GET /my-company/my-project + pod_eu0->>user: 302 /users/sign_in?redirect=/my-company/my-project + user->>router_eu: GET /users/sign_in?redirect=/my-company/my-project + router_eu->>pod_eu1: /api/v4/pods/learn?method=GET&path_info=/users/sign_in + pod_eu1->>router_eu: {path: "/users", pod: "pod_eu0", source: "fixed"} + router_eu->>pod_eu0: GET /users/sign_in?redirect=/my-company/my-project + pod_eu0-->>user: <h1>Sign in... + user->>router_eu: POST /users/sign_in?redirect=/my-company/my-project + router_eu->>pod_eu0: POST /users/sign_in?redirect=/my-company/my-project + pod_eu0->>user: 302 /my-company/my-project + user->>router_eu: GET /my-company/my-project + router_eu->>pod_eu0: GET /my-company/my-project + router_eu->>pod_eu0: GET /my-company/my-project + pod_eu0->>user: <h1>My Project... +``` + +#### Navigates to `/my-company/my-other-project` after last step + +1. User is in Europe so DNS resolves to the router in Europe +1. The router cache now has `/my-company/* => Pod EU0`, so the router chooses `Pod EU0` +1. `Pod EU0` returns the correct response as well as the cache header again + +```mermaid +sequenceDiagram + participant user as User + participant router_eu as Router EU + participant pod_eu0 as Pod EU0 + participant pod_eu1 as Pod EU1 + user->>router_eu: GET /my-company/my-project + router_eu->>pod_eu0: GET /my-company/my-project + pod_eu0->>user: <h1>My Project... +``` + +#### Navigates to `/gitlab-org/gitlab` after last step + +1. User is in Europe so DNS resolves to the router in Europe +1. The router has no cached value for this URL so randomly chooses `Pod EU0` +1. `Pod EU0` redirects the router to `Pod US0` +1. `Pod US0` returns the correct response as well as the cache header again + +```mermaid +sequenceDiagram + participant user as User + participant router_eu as Router EU + participant pod_eu0 as Pod EU0 + participant pod_us0 as Pod US0 + user->>router_eu: GET /gitlab-org/gitlab + router_eu->>pod_eu0: /api/v4/pods/learn?method=GET&path_info=/gitlab-org/gitlab + pod_eu0->>router_eu: {path: "/gitlab-org", pod: "pod_us0", source: "routable"} + router_eu->>pod_us0: GET /gitlab-org/gitlab + pod_us0->>user: <h1>GitLab.org... +``` + +In this case the user is not on their "default organization" so their TODO +counter will not include their normal todos. We may choose to highlight this in +the UI somewhere. A future iteration may be able to fetch that for them from +their default organization. + +#### Navigates to `/` + +1. User is in Europe so DNS resolves to the router in Europe +1. Router does not have a cache for `/` route (specifically rails never tells it to cache this route) +1. The Router choose `Pod EU0` randomly +1. The Rails application knows the users default organization is `/my-organization`, so + it redirects the user to `/organizations/my-organization/-/dashboard` +1. The Router has a cached value for `/organizations/my-organization/*` so it then sends the + request to `POD EU0` +1. `Pod EU0` serves up a new page `/organizations/my-organization/-/dashboard` which is the same + dashboard view we have today but scoped to an organization clearly in the UI +1. The user is (optionally) presented with a message saying that data on this page is only + from their default organization and that they can change their default + organization if it's not right. + +```mermaid +sequenceDiagram + participant user as User + participant router_eu as Router EU + participant pod_eu0 as Pod EU0 + user->>router_eu: GET / + router_eu->>pod_eu0: GET / + pod_eu0->>user: 302 /organizations/my-organization/-/dashboard + user->>router: GET /organizations/my-organization/-/dashboard + router->>pod_eu0: GET /organizations/my-organization/-/dashboard + pod_eu0->>user: <h1>My Company Dashboard... X-Gitlab-Pod-Cache={path_prefix:/organizations/my-organization/} +``` + +#### Navigates to `/dashboard` + +As above, they will end up on `/organizations/my-organization/-/dashboard` as +the rails application will already redirect `/` to the dashboard page. + +### Navigates to `/not-my-company/not-my-project` while logged in (but they don't have access since this project/group is private) + +1. User is in Europe so DNS resolves to the router in Europe +1. The router knows that `/not-my-company` lives in `Pod US1` so sends the request to this +1. The user does not have access so `Pod US1` returns 404 + +```mermaid +sequenceDiagram + participant user as User + participant router_eu as Router EU + participant pod_us1 as Pod US1 + user->>router_eu: GET /not-my-company/not-my-project + router_eu->>pod_us1: GET /not-my-company/not-my-project + pod_us1->>user: 404 +``` + +#### Creates a new top level namespace + +The user will be asked which organization they want the namespace to belong to. +If they select `my-organization` then it will end up on the same pod as all +other namespaces in `my-organization`. If they select nothing we default to +`GitLab.com Public` and it is clear to the user that this is isolated from +their existing organization such that they won't be able to see data from both +on a single page. + +### Experience for GitLab team member that is part of `/gitlab-org` + +Such a user is considered a legacy user and has their default organization set to +`GitLab.com Public`. This is a "meta" organization that does not really exist but +the Rails application knows to interpret this organization to mean that they are +allowed to use legacy global functionality like `/dashboard` to see data across +namespaces located on `Pod US0`. The rails backend also knows that the default pod to render any ambiguous +routes like `/dashboard` is `Pod US0`. Lastly the user will be allowed to +navigate to organizations on another pod like `/my-organization` but when they do the +user will see a message indicating that some data may be missing (eg. the +MRs/Issues/Todos) counts. + +#### Navigates to `/gitlab-org/gitlab` while not logged in + +1. User is in the US so DNS resolves to the US router +1. The router knows that `/gitlab-org` lives in `Pod US0` so sends the request + to this pod +1. `Pod US0` serves up the response + +```mermaid +sequenceDiagram + participant user as User + participant router_us as Router US + participant pod_us0 as Pod US0 + user->>router_us: GET /gitlab-org/gitlab + router_us->>pod_us0: GET /gitlab-org/gitlab + pod_us0->>user: <h1>GitLab.org... +``` + +#### Navigates to `/` + +1. User is in US so DNS resolves to the router in US +1. Router does not have a cache for `/` route (specifically rails never tells it to cache this route) +1. The Router chooses `Pod US1` randomly +1. The Rails application knows the users default organization is `GitLab.com Public`, so + it redirects the user to `/dashboards` (only legacy users can see + `/dashboard` global view) +1. Router does not have a cache for `/dashboard` route (specifically rails never tells it to cache this route) +1. The Router chooses `Pod US1` randomly +1. The Rails application knows the users default organization is `GitLab.com Public`, so + it allows the user to load `/dashboards` (only legacy users can see + `/dashboard` global view) and redirects to router the legacy pod which is `Pod US0` +1. `Pod US0` serves up the global view dashboard page `/dashboard` which is the same + dashboard view we have today + +```mermaid +sequenceDiagram + participant user as User + participant router_us as Router US + participant pod_us0 as Pod US0 + participant pod_us1 as Pod US1 + user->>router_us: GET / + router_us->>pod_us1: GET / + pod_us1->>user: 302 /dashboard + user->>router_us: GET /dashboard + router_us->>pod_us1: /api/v4/pods/learn?method=GET&path_info=/dashboard + pod_us1->>router_us: {path: "/dashboard", pod: "pod_us0", source: "routable"} + router_us->>pod_us0: GET /dashboard + pod_us0->>user: <h1>Dashboard... +``` + +#### Navigates to `/my-company/my-other-project` while logged in (but they don't have access since this project is private) + +They get a 404. + +### Experience for non-logged in users + +Flow is similar to logged in users except global routes like `/dashboard` will +redirect to the login page as there is no default organization to choose from. + +### A new customers signs up + +They will be asked if they are already part of an organization or if they'd +like to create one. If they choose neither they end up no the default +`GitLab.com Public` organization. + +### An organization is moved from 1 pod to another + +TODO + +### GraphQL/API requests which don't include the namespace in the URL + +TODO + +### The autocomplete suggestion functionality in the search bar which remembers recent issues/MRs + +TODO + +### Global search + +TODO + +## Administrator + +### Loads `/admin` page + +1. The `/admin` is locked to `Pod US0` +1. Some endpoints of `/admin`, like Projects in Admin are scoped to a Pod + and users needs to choose the correct one in a dropdown, which results in endpoint + like `/admin/pods/pod_0/projects`. + +Admin Area settings in Postgres are all shared across all pods to avoid +divergence but we still make it clear in the URL and UI which pod is serving +the Admin Area page as there is dynamic data being generated from these pages and +the operator may want to view a specific pod. + +## More Technical Problems To Solve + +### Replicating User Sessions Between All Pods + +Today user sessions live in Redis but each pod will have their own Redis instance. We already use a dedicated Redis instance for sessions so we could consider sharing this with all pods like we do with `gitlab_users` PostgreSQL database. But an important consideration will be latency as we would still want to mostly fetch sessions from the same region. + +An alternative might be that user sessions get moved to a JWT payload that encodes all the session data but this has downsides. For example, it is difficult to expire a user session, when their password changes or for other reasons, if the session lives in a JWT controlled by the user. + +### How do we migrate between Pods + +Migrating data between pods will need to factor all data stores: + +1. PostgreSQL +1. Redis Shared State +1. Gitaly +1. Elasticsearch + +### Is it still possible to leak the existence of private groups via a timing attack? + +If you have router in EU, and you know that EU router by default redirects +to EU located Pods, you know their latency (lets assume 10ms). Now, if your +request is bounced back and redirected to US which has different latency +(lets assume that roundtrip will be around 60ms) you can deduce that 404 was +returned by US Pod and know that your 404 is in fact 403. + +We may defer this until we actually implement a pod in a different region. Such timing attacks are already theoretically possible with the way we do permission checks today but the timing difference is probably too small to be able to detect. + +One technique to mitigate this risk might be to have the router add a random +delay to any request that returns 404 from a pod. + +## Should runners be shared across all pods? + +We have 2 options and we should decide which is easier: + +1. Decompose runner registration and queuing tables and share them across all + pods. This may have implications for scalability, and we'd need to consider + if this would include group/project runners as this may have scalability + concerns as these are high traffic tables that would need to be shared. +1. Runners are registered per-pod and, we probably have a separate fleet of + runners for every pod or just register the same runners to many pods which + may have implications for queueing + +## How do we guarantee unique ids across all pods for things that cannot conflict? + +This project assumes at least namespaces and projects have unique ids across +all pods as many requests need to be routed based on their ID. Since those +tables are across different databases then guaranteeing a unique ID will +require a new solution. There are likely other tables where unique IDs are +necessary and depending on how we resolve routing for GraphQL and other APIs +and other design goals it may be determined that we want the primary key to be +unique for all tables. |