author    GitLab Bot <gitlab-bot@gitlab.com>  2020-11-19 08:27:35 +0000
committer GitLab Bot <gitlab-bot@gitlab.com>  2020-11-19 08:27:35 +0000
commit    7e9c479f7de77702622631cff2628a9c8dcbc627 (patch)
tree      c8f718a08e110ad7e1894510980d2155a6549197 /doc/development/graphql_guide
parent    e852b0ae16db4052c1c567d9efa4facc81146e88 (diff)
Add latest changes from gitlab-org/gitlab@13-6-stable-ee (tag: v13.6.0-rc42)
Diffstat (limited to 'doc/development/graphql_guide')
-rw-r--r--  doc/development/graphql_guide/batchloader.md  121
-rw-r--r--  doc/development/graphql_guide/index.md          6
-rw-r--r--  doc/development/graphql_guide/pagination.md   173
3 files changed, 296 insertions, 4 deletions
diff --git a/doc/development/graphql_guide/batchloader.md b/doc/development/graphql_guide/batchloader.md
new file mode 100644
index 00000000000..6d529358499
--- /dev/null
+++ b/doc/development/graphql_guide/batchloader.md
@@ -0,0 +1,121 @@
+---
+stage: Enablement
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
+---
+
+# GraphQL BatchLoader
+
+GitLab uses the [batch-loader](https://github.com/exAspArk/batch-loader) Ruby gem to optimize and avoid N+1 SQL queries.
+
+It is the properties of the GraphQL query tree that create opportunities for batching like this: disconnected nodes might need the same data, but cannot know about each other.
+
+## When should you use it?
+
+We should try to batch DB requests as much as possible during GraphQL **query** execution. There is no need to batch loading during **mutations** because they are executed serially. If you need to make a database query, and it is possible to combine two similar (but not identical) queries, then consider using the batch-loader.
+
+When implementing a new endpoint we should aim to minimize the number of SQL queries. For stability and scalability we must also ensure that our queries do not suffer from N+1 performance issues.
+
+## Implementation
+
+Batch loading is useful when a series of queries for inputs `Qα, Qβ, ... Qω` can be combined into a single query for `Q[α, β, ... ω]`. An example of this is lookups by ID, where we can find two users by username as cheaply as one, but real-world examples can be more complex.
+
+Batch loading is not suitable when the result sets have different sort orders, grouping, aggregation, or other non-composable features.
+
+There are two ways to use the batch-loader in your code. For simple ID lookups, use `::Gitlab::Graphql::Loaders::BatchModelLoader.new(model, id).find`.
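As a plain-Ruby illustration of what such a loader does under the hood, here is a minimal sketch of the batching idea: collect all requested keys first, then resolve the whole batch with one combined lookup. Everything here (`TinyBatchLoader`, the in-memory `USERS` hash) is hypothetical and is not the gem's actual API.

```ruby
# Hypothetical, minimal sketch of batch loading: record every requested key,
# then resolve them all with a single combined lookup on first sync.
class TinyBatchLoader
  def initialize(&fetch)
    @fetch = fetch   # receives all collected keys at once
    @keys = []
    @results = nil
  end

  # Returns a lazy handle; nothing is fetched until it is called (synced).
  def load(key)
    @keys << key
    -> { sync[key] }
  end

  private

  def sync
    @results ||= @fetch.call(@keys.uniq)
  end
end

USERS = { 'alice' => 'Alice Doe', 'bob' => 'Bob Roe' }
queries = 0

loader = TinyBatchLoader.new do |usernames|
  queries += 1                 # stands in for one combined SQL query
  USERS.slice(*usernames)
end

lazy_a = loader.load('alice')
lazy_b = loader.load('bob')

puts lazy_a.call   # => Alice Doe
puts lazy_b.call   # => Bob Roe
puts queries       # => 1
```

Both lazy handles resolve from the same single fetch, which is the point: two lookups, one query.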
+For more complex cases, you can use the batch API directly.
+
+For example, to load a `User` by `username`, we can add batching as follows:
+
+```ruby
+class UserResolver < BaseResolver
+  type UserType, null: true
+  argument :username, ::GraphQL::STRING_TYPE, required: true
+
+  def resolve(username:)
+    BatchLoader::GraphQL.for(username).batch do |usernames, loader|
+      User.by_username(usernames).each do |user|
+        loader.call(user.username, user)
+      end
+    end
+  end
+end
+```
+
+- `username` is the username of the `User` being queried
+- `loader.call` is used to map the result back to the input key (here a username)
+- `BatchLoader::GraphQL` returns a lazy object (a suspended promise to fetch the data)
+
+Here is an [example MR](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/46549) illustrating how to use our `BatchLoading` mechanism.
+
+## How does it work exactly?
+
+Each lazy object knows which data it needs to load and how to batch the query. When we need to use the lazy objects (which we announce by calling `#sync`), they are loaded along with all other similar objects in the current batch.
+
+Inside the block we execute a batch query for our items (`User`). After that, all we have to do is call the loader, passing the key that was used in the `BatchLoader::GraphQL.for` method (`username`) and the loaded object itself (`user`):
+
+```ruby
+BatchLoader::GraphQL.for(username).batch do |usernames, loader|
+  User.by_username(usernames).each do |user|
+    loader.call(user.username, user)
+  end
+end
+```
+
+### What does lazy mean?
+
+It is important to avoid syncing batches too early.
+In the example below we can see how calling sync too early can eliminate opportunities for batching:
+
+```ruby
+x = find_lazy(1)
+y = find_lazy(2)
+
+# calling .sync will flush the current batch and will inhibit maximum laziness
+x.sync
+
+z = find_lazy(3)
+
+y.sync
+z.sync
+
+# => will run 2 queries
+```
+
+```ruby
+x = find_lazy(1)
+y = find_lazy(2)
+z = find_lazy(3)
+
+x.sync
+y.sync
+z.sync
+
+# => will run 1 query
+```
+
+## Testing
+
+Any GraphQL field that supports `BatchLoading` should be tested using the `batch_sync` method available in [GraphQLHelpers](https://gitlab.com/gitlab-org/gitlab/-/blob/master/spec/support/helpers/graphql_helpers.rb).
+
+```ruby
+it 'returns data as a batch' do
+  results = batch_sync(max_queries: 1) do
+    [{ id: 1 }, { id: 2 }].map { |args| resolve_user(args) }
+  end
+
+  expect(results).to eq(expected_results)
+end
+
+# Named `resolve_user` rather than `resolve` so it does not shadow (and
+# recursively call) the `resolve` helper from `GraphqlHelpers`.
+def resolve_user(args = {}, context = { current_user: current_user })
+  resolve(described_class, obj: obj, args: args, ctx: context)
+end
+```
+
+We can also use [QueryRecorder](../query_recorder.md) to make sure we are performing only **one SQL query** per call.
+
+```ruby
+it 'executes only 1 SQL query' do
+  query_count = ActiveRecord::QueryRecorder.new { subject }.count
+
+  expect(query_count).to eq(1)
+end
+```
diff --git a/doc/development/graphql_guide/index.md b/doc/development/graphql_guide/index.md
index 9d7fb5ba0a8..12b4f9796c7 100644
--- a/doc/development/graphql_guide/index.md
+++ b/doc/development/graphql_guide/index.md
@@ -1,3 +1,9 @@
+---
+stage: none
+group: unassigned
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
+---
+
 # GraphQL development guidelines
 
 This guide contains all the information to successfully contribute to GitLab's
diff --git a/doc/development/graphql_guide/pagination.md b/doc/development/graphql_guide/pagination.md
index bf9eaa99158..d5140363396 100644
--- a/doc/development/graphql_guide/pagination.md
+++ b/doc/development/graphql_guide/pagination.md
@@ -1,3 +1,9 @@
+---
+stage: none
+group: unassigned
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
+---
+
 # GraphQL pagination
 
 ## Types of pagination
@@ -59,13 +65,13 @@ Some of the benefits and tradeoffs of keyset pagination are
 
 - Performance is much better.
 
-- Data stability is greater since you're not going to miss records due to
+- More data stability for end-users since records are not missing from lists due to
   deletions or insertions.
 
 - It's the best way to do infinite scrolling.
 
 - It's more difficult to program and maintain. Easy for `updated_at` and
-  `sort_order`, complicated (or impossible) for complex sorting scenarios.
+  `sort_order`, complicated (or impossible) for [complex sorting scenarios](#limitations-of-query-complexity).
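The offset/keyset tradeoff listed above can be sketched in plain Ruby over in-memory rows. All names and data here are hypothetical illustrations; a real implementation issues equivalent SQL (`OFFSET n` versus a tuple-comparison `WHERE` clause on the ordering columns).

```ruby
# Hypothetical in-memory sketch of offset vs. keyset pagination.
Record = Struct.new(:id, :created_at)

RECORDS = [
  Record.new(1, 100), Record.new(2, 100),
  Record.new(3, 200), Record.new(4, 300)
].sort_by { |r| [r.created_at, r.id] }

# Offset pagination: skip N rows. Inserts or deletes before the offset shift
# the window, so rows can be skipped or repeated between pages.
def offset_page(records, offset:, limit:)
  records.drop(offset).take(limit)
end

# Keyset pagination: resume strictly after the cursor row, comparing the same
# composite ordering key (created_at, id) that sorted the collection.
def keyset_page(records, after:, limit:)
  records
    .select { |r| ([r.created_at, r.id] <=> [after.created_at, after.id]).positive? }
    .take(limit)
end

page1  = offset_page(RECORDS, offset: 0, limit: 2)
cursor = page1.last   # the cursor encodes the ordering fields of the last row
page2  = keyset_page(RECORDS, after: cursor, limit: 2)

puts page1.map(&:id).inspect  # => [1, 2]
puts page2.map(&:id).inspect  # => [3, 4]
```

Because `keyset_page` filters on the row's own ordering values rather than a position, deleting a row from page 1 after the cursor is taken does not shift page 2.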
 
 ## Implementation
 
@@ -80,12 +86,171 @@ However, there are some cases where we have to use the offset pagination
 connection, `OffsetActiveRecordRelationConnection`, such as when sorting by label
 priority in issues, due to the complexity of the sort.
 
-<!-- ### Keyset pagination -->
+### Keyset pagination
+
+The keyset pagination implementation is a subclass of `GraphQL::Pagination::ActiveRecordRelationConnection`,
+which is a part of the `graphql` gem. This is installed as the default for all `ActiveRecord::Relation` objects.
+However, instead of using a cursor based on an offset (which is the default), GitLab uses a more specialized cursor.
+
+The cursor is created by encoding a JSON object which contains the relevant ordering fields. For example:
+
+```ruby
+ordering = {"id"=>"72410125", "created_at"=>"2020-10-08 18:05:21.953398000 UTC"}
+json = ordering.to_json
+cursor = Base64Bp.urlsafe_encode64(json, padding: false)
+
+"eyJpZCI6IjcyNDEwMTI1IiwiY3JlYXRlZF9hdCI6IjIwMjAtMTAtMDggMTg6MDU6MjEuOTUzMzk4MDAwIFVUQyJ9"
+
+json = Base64Bp.urlsafe_decode64(cursor)
+Gitlab::Json.parse(json)
+
+{"id"=>"72410125", "created_at"=>"2020-10-08 18:05:21.953398000 UTC"}
+```
+
+The benefits of storing the order attribute values in the cursor:
+
+- If only the ID of the object were stored, the object and its attributes would have to be
+  queried again. That would require an additional query, and if the object is no longer
+  there, the needed attributes are not available.
+- If an attribute is `NULL`, then one SQL query can be used. If it's not `NULL`, then a
+  different SQL query can be used.
+
+Based on whether the main attribute field being sorted on is `NULL` in the cursor, the proper query
+condition is built. The last ordering field is considered to be unique (a primary key), meaning the
+column never contains `NULL` values.
+
+#### Limitations of query complexity
+
+We only support two ordering fields, and one of those fields needs to be the primary key.
+
+Here are two examples of pseudocode for the query:
+
+- **Two-condition query.** `X` represents the values from the cursor. `C` represents
+  the columns in the database, sorted in ascending order, using an `:after` cursor, and with `NULL`
+  values sorted last.
+
+  ```plaintext
+  X1 IS NOT NULL
+    AND
+      (C1 > X1)
+      OR
+      (C1 IS NULL)
+      OR
+      (C1 = X1
+        AND
+        C2 > X2)
+
+  X1 IS NULL
+    AND
+      (C1 IS NULL
+        AND
+        C2 > X2)
+  ```
+
+  Below is an example based on the relation `Issue.order(relative_position: :asc).order(id: :asc)`
+  with an after cursor of `relative_position: 1500, id: 500`:
+
+  ```plaintext
+  when cursor[relative_position] is not NULL
+
+  ("issues"."relative_position" > 1500)
+  OR (
+    "issues"."relative_position" = 1500
+    AND
+    "issues"."id" > 500
+  )
+  OR ("issues"."relative_position" IS NULL)
+
+  when cursor[relative_position] is NULL
+
+  "issues"."relative_position" IS NULL
+  AND
+  "issues"."id" > 500
+  ```
+
+- **Three-condition query.** The example below is not complete, but shows the
+  complexity of adding one more condition. `X` represents the values from the cursor. `C` represents
+  the columns in the database, sorted in ascending order, using an `:after` cursor, and with `NULL`
+  values sorted last.
+
+  ```plaintext
+  X1 IS NOT NULL
+    AND
+      (C1 > X1)
+      OR
+      (C1 IS NULL)
+      OR
+      (C1 = X1 AND C2 > X2)
+      OR
+      (C1 = X1
+        AND
+        X2 IS NOT NULL
+        AND
+          ((C2 > X2)
+          OR
+          (C2 IS NULL)
+          OR
+          (C2 = X2 AND C3 > X3)
+      OR
+        X2 IS NULL.....
+  ```
+
+By using
+[`Gitlab::Graphql::Pagination::Keyset::QueryBuilder`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/graphql/pagination/keyset/query_builder.rb),
+we're able to build the necessary SQL conditions and apply them to the Active Record relation.
+
+Complex queries can be difficult or impossible to support with keyset pagination. For example,
+in [`issuable.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/models/concerns/issuable.rb),
+the `order_due_date_and_labels_priority` method creates a very complex query.
+
+These types of queries are not supported. In these instances, you can use offset pagination.
+
+### Offset pagination
+
+There are times when the [complexity of sorting](#limitations-of-query-complexity)
+is more than our keyset pagination can handle.
+
+For example, in [`IssuesResolver`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/graphql/resolvers/issues_resolver.rb),
+when sorting by `priority_asc`, we can't use keyset pagination as the ordering is much
+too complex. For more information, read [`issuable.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/models/concerns/issuable.rb).
 
-<!-- ### Offset pagination -->
+In cases like this, we can fall back to regular offset pagination by returning a
+[`Gitlab::Graphql::Pagination::OffsetActiveRecordRelationConnection`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/graphql/pagination/offset_active_record_relation_connection.rb)
+instead of an `ActiveRecord::Relation`:
+
+```ruby
+  def resolve(parent, finder, **args)
+    issues = apply_lookahead(Gitlab::Graphql::Loaders::IssuableLoader.new(parent, finder).batching_find_all)
+
+    if non_stable_cursor_sort?(args[:sort])
+      # Certain complex sorts are not supported by the stable cursor pagination yet.
+      # In these cases, we use offset pagination, so we return the correct connection.
+      Gitlab::Graphql::Pagination::OffsetActiveRecordRelationConnection.new(issues)
+    else
+      issues
+    end
+  end
+```
 
 <!-- ### External pagination -->
+
+### External pagination
+
+There may be times when you need to return data through the GitLab API that is stored in
+another system. In these cases you may have to paginate a third party's API.
+
+An example of this is our [Error Tracking](../../operations/error_tracking.md) implementation,
+where we proxy [Sentry errors](../../operations/error_tracking.md#sentry-error-tracking) through
+the GitLab API. We do this by calling the Sentry API, which enforces its own pagination rules.
+This means we cannot access the collection within GitLab to perform our own custom pagination.
+
+For consistency, we manually set the pagination cursors based on values returned by the external API, using `Gitlab::Graphql::ExternallyPaginatedArray.new(previous_cursor, next_cursor, *items)`.
+
+You can see an example implementation in the following files:
+
+- [`types/error_tracking/sentry_error_collection_type.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/graphql/types/error_tracking/sentry_error_collection_type.rb), which adds an extension to `field :errors`.
+- [`resolvers/error_tracking/sentry_errors_resolver.rb`](https://gitlab.com/gitlab-org/gitlab/blob/master/app/graphql/resolvers/error_tracking/sentry_errors_resolver.rb), which returns the data from the resolver.
+
 ## Testing
 
 Any GraphQL field that supports pagination and sorting should be tested