summaryrefslogtreecommitdiff
path: root/doc/architecture/blueprints/graphql_api/index.md
blob: 95ff834cd271eb04ae0f0b83652e2fc0b87fc262 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
---
status: accepted
creation-date: "2021-01-07"
authors: [ "@grzesiek" ]
coach: [ "@ayufan", "@grzesiek" ]
approvers: [ "@dsatcher", "@deuley" ]
owning-stage: "~devops::manage"
participating-stages: []
---

# GraphQL API

[GraphQL](https://graphql.org/) is a data query and manipulation language for
APIs, and a runtime for fulfilling queries with existing data.

At GitLab we want to adopt GraphQL to make it easier for the wider community to
interact with GitLab in a reliable way, but also to advance our own product by
modeling communication between backend and frontend components using GraphQL.

We've recently increased the pace of the adoption by defining quarterly OKRs
related to GraphQL migration. This resulted in us spending more time on the
GraphQL development and helped to surface the need of improving tooling we use
to extend the new API.

This document describes the work that is needed to build a stable foundation that
will support our development efforts and a large-scale usage of the [GraphQL API](../../../api/graphql/index.md).

## Summary

The GraphQL initiative at GitLab [started around three years ago](https://gitlab.com/gitlab-org/gitlab/-/commit/9c6c17cbcdb8bf8185fc1b873dcfd08f723e4df5).
Most of the work around the GraphQL ecosystem has been done by volunteers that are
[GraphQL experts](https://gitlab.com/groups/gitlab-org/graphql-experts/-/group_members?with_inherited_permissions=exclude).

The [retrospective on our progress](https://gitlab.com/gitlab-org/gitlab/-/issues/235659)
surfaced a few opportunities to streamline our GraphQL development efforts and
to reduce the risk of performance degradations and possible outages that may
be related to the gaps in the essential mechanisms needed to make the GraphQL
API observable and operable at scale.

Amongst small improvements to the GraphQL engine itself we want to build a
comprehensive monitoring dashboard, that will enable team members to make sense
of what is happening inside our GraphQL API. We want to make it possible to define
SLOs, triage breached SLIs and to be able to zoom into relevant details using
Grafana and Elastic. We want to see historical data and predict future usage.

It is an opportunity to learn from our experience in evolving the REST API, for
the scale, and to apply this knowledge onto the GraphQL development efforts. We
can do that by building query-to-feature correlation mechanisms, adding
scalable state synchronization support and aligning GraphQL with other
architectural initiatives being executed in parallel, like
[the support for direct uploads](https://gitlab.com/gitlab-org/gitlab/-/issues/280819).

GraphQL should be secure by default. We can avoid common security mistakes by
building mechanisms that will help us to enforce
[OWASP GraphQL recommendations](https://cheatsheetseries.owasp.org/cheatsheets/GraphQL_Cheat_Sheet.html)
that are relevant to us.

Understanding what are the needs of the wider community will also allow us to
plan deprecation policies better and to design parity between GraphQL and REST
API that suits their needs.

## Challenges

### Make sense of what is happening in GraphQL

Being able to see how GraphQL performs in a production environment is a
prerequisite for improving performance and reliability of that service.

We do not yet have tools that would make it possible for us to answer a
question of how GraphQL performs and what the bottlenecks we should optimize
are. This, combined with a pace of GraphQL adoption and the scale in which we
expect it operate, imposes a risk of an increased rate of production incidents
what will be difficult to resolve.

We want to build a comprehensive Grafana dashboard that will focus on
delivering insights of how GraphQL endpoint performs, while still empowering
team members with capability of zooming in into details. We want to improve
logging to make it possible to better correlate GraphQL queries with feature
using Elastic and to index them in a way that performance problems can be
detected early.

- Build a comprehensive Grafana dashboard for GraphQL
- Build a GraphQL query-to-feature correlation mechanisms
- Improve logging GraphQL queries in Elastic
- Redesign error handling on frontend to surface warnings

### Manage volatile GraphQL data structures

Our GraphQL API will evolve with time. GraphQL has been designed to make such
evolution easier. GraphQL APIs are easier to extend because of how composable
GraphQL is. On the other hand this is also a reason why versioning of GraphQL
APIs is considered unnecessary. Instead of versioning the API we want to mark
some fields as deprecated, but we need to have a way to understand what is the
usage of deprecated fields, types and a way to visualize it in a way that is
easy to understand. We might want to detect usage of deprecated fields and
notify users that we plan to remove them.

- Define a data-informed deprecation policy that will serve our users better
- Build a dashboard showing usage frequency of deprecated GraphQL fields
- Build mechanisms required to send deprecated fields usage in Service Ping

### Ensure consistency with the rest of the codebase

GraphQL is not the only thing we work on, but it cuts across the entire
application. It is being used to expose data collected and processed in almost
every part of our product. It makes it tightly coupled with our monolithic
codebase.

We need to ensure that how we use GraphQL is consistent with other mechanisms
we've designed to improve performance and reliability of GitLab.

We have extensive experience with evolving our REST API. We want to apply
this knowledge onto GraphQL and make it performant and secure by default.

- Design direct uploads for GraphQL
- Build GraphQL query depth and complexity histograms
- Visualize the amount of GraphQL queries reaching limits
- Add support for GraphQL ETags for existing features

### Design GraphQL interoperability with REST API

We do not plan to deprecate our REST API. It is a simple way to interact with
GitLab, and GraphQL might never become a full replacement of a traditional REST
API. The two APIs will need to coexist together. We will need to remove
duplication between them to make their codebases maintainable. This symbiosis,
however, is not only a technical challenge we need to resolve on the backend.
Users might want to use the two APIs interchangeably or even at the same time.
Making it interoperable by exposing a common scheme for resource identifiers is
a prerequisite for interoperability.

- Make GraphQL and REST API interoperable
- Design common resource identifiers for both APIs

### Design scalable state synchronization mechanisms

One of the most important goals related to GraphQL adoption at GitLab is using
it to model interactions between GitLab backend and frontend components. This
is an ongoing process that has already surfaced the need of building better
state synchronization mechanisms and hooking into existing ones.

- Design a scalable state synchronization mechanism
- Evaluate state synchronization through pub/sub and websockets
- Build a generic support for GraphQL feature correlation and feature ETags
- Redesign frontend code responsible for managing shared global state

## Iterations

### In the scope of the blueprint

1. [GraphQL API architecture](https://gitlab.com/groups/gitlab-org/-/epics/5842)
    1. [Build comprehensive Grafana dashboard for GraphQL](https://gitlab.com/groups/gitlab-org/-/epics/5841)
    1. [Improve logging of GraphQL requests in Elastic](https://gitlab.com/groups/gitlab-org/-/epics/4646)
    1. [Build GraphQL query correlation mechanisms](https://gitlab.com/groups/gitlab-org/-/epics/5320)
    1. [Design a better data-informed deprecation policy](https://gitlab.com/groups/gitlab-org/-/epics/5321)

### Future iterations

1. [Build a scalable state synchronization for GraphQL](https://gitlab.com/groups/gitlab-org/-/epics/5319)
1. [Add support for direct uploads for GraphQL](https://gitlab.com/gitlab-org/gitlab/-/issues/280819)
1. [Review GraphQL design choices related to security](https://gitlab.com/gitlab-org/security/gitlab/-/issues/339)