summaryrefslogtreecommitdiff
path: root/doc/development/graphql_guide/pagination.md
blob: 5fd2179ea9baee6092b71fc8e08b11e65487b9ce (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
---
stage: Plan
group: Project Management
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---

# GraphQL pagination

## Types of pagination

GitLab uses two primary types of pagination: **offset** and **keyset**
(sometimes called cursor-based) pagination.
The GraphQL API mainly uses keyset pagination, falling back to offset pagination when needed.

### Performance considerations

See the [general pagination guidelines section](../database/pagination_guidelines.md) for more information.

### Offset pagination

This is the traditional, page-by-page pagination, that is most common,
and used across much of GitLab. You can recognize it by
a list of page numbers near the bottom of a page, which, when clicked,
take you to that page of results.

For example, when you click **Page 100**, we send `100` to the
backend. For example, if each page has say 20 items, the
backend calculates `20 * 100 = 2000`,
and it queries the database by offsetting (skipping) the first 2000
records and pulls the next 20.

```plaintext
page number * page size = where to find my records
```

There are a couple of problems with this:

- Performance. When we query for page 100 (which gives an offset of
  2000), then the database has to scan through the table to that
  specific offset, and then pick up the next 20 records. As the offset
  increases, the performance degrades quickly.
  Read more in
  [The SQL I Love <3. Efficient pagination of a table with 100M records](http://allyouneedisbackend.com/blog/2017/09/24/the-sql-i-love-part-1-scanning-large-table/).

- Data stability. When you get the 20 items for page 100 (at
  offset 2000), GitLab shows those 20 items. If someone then
  deletes or adds records in page 99 or before, the items at
  offset 2000 become a different set of items. You can even get into a
  situation where, when paginating, you could skip over items,
  because the list keeps changing.
  Read more in
  [Pagination: You're (Probably) Doing It Wrong](https://coderwall.com/p/lkcaag/pagination-you-re-probably-doing-it-wrong).

### Keyset pagination

Given any specific record, if you know how to calculate what comes
after it, you can query the database for those specific records.

For example, suppose you have a list of issues sorted by creation date.
If you know the first item on a page has a specific date (say Jan 1), you can ask
for all records that were created after that date and take the first 20.
It no longer matters if many are deleted or added, as you always ask for
the ones after that date, and so get the correct items.

Unfortunately, there is no easy way to know if the issue created
on Jan 1 is on page 20 or page 100.

Some of the benefits and tradeoffs of keyset pagination are

- Performance is much better.

- More data stability for end-users since records are not missing from lists due to
  deletions or insertions.

- It's the best way to do infinite scrolling.

- It's more difficult to program and maintain. Easy for `updated_at` and
  `sort_order`, complicated (or impossible) for [complex sorting scenarios](#limitations-of-query-complexity).

## Implementation

When pagination is supported for a query, GitLab defaults to using
keyset pagination. You can see where this is configured in
[`pagination/connections.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/graphql/pagination/connections.rb).
If a query returns `ActiveRecord::Relation`, keyset pagination is automatically used.

This was a conscious decision to support performance and data stability.

However, there are some cases where we have to use the offset
pagination connection, `OffsetActiveRecordRelationConnection`, such as when
sorting by label priority in issues, due to the complexity of the sort.

If you return a relation from a resolver that is not suitable for keyset
pagination (due to the sort order for example), then you can use the
`BaseResolver#offset_pagination` method to wrap the relation in the correct
connection type. For example:

```ruby
def resolve(**args)
  result = Finder.new(object, current_user, args).execute
  result = offset_pagination(result) if needs_offset?(args[:sort])

  result
end
```

### Keyset pagination

The keyset pagination implementation is a subclass of `GraphQL::Pagination::ActiveRecordRelationConnection`,
which is a part of the `graphql` gem. This is installed as the default for all `ActiveRecord::Relation`.
However, instead of using a cursor based on an offset (which is the default), GitLab uses a more specialized cursor.

The cursor is created by encoding a JSON object which contains the relevant ordering fields. For example:

```ruby
ordering = {"id"=>"72410125", "created_at"=>"2020-10-08 18:05:21.953398000 UTC"}
json = ordering.to_json
cursor = Base64Bp.urlsafe_encode64(json, padding: false)

"eyJpZCI6IjcyNDEwMTI1IiwiY3JlYXRlZF9hdCI6IjIwMjAtMTAtMDggMTg6MDU6MjEuOTUzMzk4MDAwIFVUQyJ9"

json = Base64Bp.urlsafe_decode64(cursor)
Gitlab::Json.parse(json)

{"id"=>"72410125", "created_at"=>"2020-10-08 18:05:21.953398000 UTC"}
```

The benefits of storing the order attribute values in the cursor:

- If only the ID of the object were stored, the object and its attributes could be queried.
  That would require an additional query, and if the object is no longer there, then the needed
  attributes are not available.
- If an attribute is `NULL`, then one SQL query can be used. If it's not `NULL`, then a
  different SQL query can be used.

Based on whether the main attribute field being sorted on is `NULL` in the cursor, the proper query
condition is built. The last ordering field is considered to be unique (a primary key), meaning the
column never contains `NULL` values.

#### Limitations of query complexity

We only support two ordering fields, and one of those fields needs to be the primary key.

Here are two examples of pseudocode for the query:

- **Two-condition query.** `X` represents the values from the cursor. `C` represents
  the columns in the database, sorted in ascending order, using an `:after` cursor, and with `NULL`
  values sorted last.

  ```plaintext
  X1 IS NOT NULL
    AND
      (C1 > X1)
        OR
      (C1 IS NULL)
        OR
      (C1 = X1
        AND
       C2 > X2)

  X1 IS NULL
    AND
      (C1 IS NULL
        AND
       C2 > X2)
  ```

  Below is an example based on the relation `Issue.order(relative_position: :asc).order(id: :asc)`
  with an after cursor of `relative_position: 1500, id: 500`:

  ```plaintext
  when cursor[relative_position] is not NULL

      ("issues"."relative_position" > 1500)
      OR (
        "issues"."relative_position" = 1500
        AND
        "issues"."id" > 500
      )
      OR ("issues"."relative_position" IS NULL)

  when cursor[relative_position] is NULL

      "issues"."relative_position" IS NULL
      AND
      "issues"."id" > 500
  ```

- **Three-condition query.** The example below is not complete, but shows the
  complexity of adding one more condition. `X` represents the values from the cursor. `C` represents
  the columns in the database, sorted in ascending order, using an `:after` cursor, and with `NULL`
  values sorted last.

  ```plaintext
  X1 IS NOT NULL
    AND
      (C1 > X1)
        OR
      (C1 IS NULL)
        OR
      (C1 = X1 AND C2 > X2)
        OR
      (C1 = X1
        AND
          X2 IS NOT NULL
            AND
              ((C2 > X2)
                 OR
               (C2 IS NULL)
                 OR
               (C2 = X2 AND C3 > X3)
        OR
          X2 IS NULL.....
  ```

By using
[`Gitlab::Graphql::Pagination::Keyset::QueryBuilder`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/graphql/pagination/keyset/query_builder.rb),
we're able to build the necessary SQL conditions and apply them to the Active Record relation.

Complex queries can be difficult or impossible to use. For example,
in [`issuable.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/models/concerns/issuable.rb),
the `order_due_date_and_labels_priority` method creates a very complex query.

These types of queries are not supported. In these instances, you can use offset pagination.

#### Gotchas

Do not define a collection's order using the string syntax:

```ruby
# Bad
items.order('created_at DESC')
```

Instead, use the hash syntax:

```ruby
# Good
items.order(created_at: :desc)
```

The first example won't correctly embed the sort information (`created_at`, in
the example above) into the pagination cursors, which will result in an
incorrect sort order.

### Offset pagination

There are times when the [complexity of sorting](#limitations-of-query-complexity)
is more than our keyset pagination can handle.

For example, in [`IssuesResolver`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/graphql/resolvers/issues_resolver.rb),
when sorting by `priority_asc`, we can't use keyset pagination as the ordering is much
too complex. For more information, read [`issuable.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/models/concerns/issuable.rb).

In cases like this, we can fall back to regular offset pagination by returning a
[`Gitlab::Graphql::Pagination::OffsetActiveRecordRelationConnection`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/graphql/pagination/offset_active_record_relation_connection.rb)
instead of an `ActiveRecord::Relation`:

```ruby
    def resolve(parent, finder, **args)
      issues = apply_lookahead(Gitlab::Graphql::Loaders::IssuableLoader.new(parent, finder).batching_find_all)

      if non_stable_cursor_sort?(args[:sort])
        # Certain complex sorts are not supported by the stable cursor pagination yet.
        # In these cases, we use offset pagination, so we return the correct connection.
        offset_pagination(issues)
      else
        issues
      end
    end
```

<!-- ### External pagination -->

### External pagination

There may be times where you need to return data through the GitLab API that is stored in
another system. In these cases you may have to paginate a third-party's API.

An example of this is with our [Error Tracking](../../operations/error_tracking.md) implementation,
where we proxy [Sentry errors](../../operations/error_tracking.md#sentry-error-tracking) through
the GitLab API. We do this by calling the Sentry API which enforces its own pagination rules.
This means we cannot access the collection within GitLab to perform our own custom pagination.

For consistency, we manually set the pagination cursors based on values returned by the external API, using `Gitlab::Graphql::ExternallyPaginatedArray.new(previous_cursor, next_cursor, *items)`.

You can see an example implementation in the following files:

- [`types/error__tracking/sentry_error_collection_type.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/graphql/types/error_tracking/sentry_error_collection_type.rb) which adds an extension to  `field :errors`.
- [`resolvers/error_tracking/sentry_errors_resolver.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/graphql/resolvers/error_tracking/sentry_errors_resolver.rb) which returns the data from the resolver.

## Testing

Any GraphQL field that supports pagination and sorting should be tested
using the sorted paginated query shared example found in
[`graphql/sorted_paginated_query_shared_examples.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/spec/support/shared_examples/graphql/sorted_paginated_query_shared_examples.rb).
It helps verify that your sort keys are compatible and that cursors
work properly.

This is particularly important when using keyset pagination, as some sort keys might not be supported.

Add a section to your request specs like this:

```ruby
describe 'sorting and pagination' do
  ...
end
```

You can then use
[`issues_spec.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/spec/requests/api/graphql/project/issues_spec.rb)
as an example to construct your tests.

[`graphql/sorted_paginated_query_shared_examples.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/spec/support/shared_examples/graphql/sorted_paginated_query_shared_examples.rb)
also contains some documentation on how to use the shared examples.

The shared example requires certain `let` variables and methods to be set up:

```ruby
describe 'sorting and pagination' do
  let_it_be(:sort_project) { create(:project, :public) }
  let(:data_path)    { [:project, :issues] }

  def pagination_query(params)
    graphql_query_for( :project, { full_path: sort_project.full_path },
      query_nodes(:issues, :id, include_pagination_info: true, args: params))
    )
  end

  def pagination_results_data(nodes)
    nodes.map { |issue| issue['iid'].to_i }
  end

  context 'when sorting by weight' do
    let_it_be(:issues) { make_some_issues_with_weights }

    context 'when ascending' do
      let(:ordered_issues) { issues.sort_by(&:weight) }

      it_behaves_like 'sorted paginated query' do
        let(:sort_param)       { :WEIGHT_ASC }
        let(:first_param)      { 2 }
        let(:expected_results) { ordered_issues.map(&:iid) }
      end
    end
  end
```