From 5e0a2db7068296d2b376a014d76dc5a54bdb8243 Mon Sep 17 00:00:00 2001 From: Oswaldo Ferreira Date: Mon, 9 Apr 2018 18:24:26 -0300 Subject: Add developer documentation on diff handling and limits --- doc/development/README.md | 1 + doc/development/diffs.md | 115 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 116 insertions(+) create mode 100644 doc/development/diffs.md (limited to 'doc') diff --git a/doc/development/README.md b/doc/development/README.md index 45e9565f9a7..c3ebc6e23bf 100644 --- a/doc/development/README.md +++ b/doc/development/README.md @@ -42,6 +42,7 @@ comments: false - [Issue and merge requests state models](object_state_models.md) - [How to dump production data to staging](db_dump.md) - [Working with the GitHub importer](github_importer.md) +- [Working with Merge Request diffs](diffs.md) ## Performance guides diff --git a/doc/development/diffs.md b/doc/development/diffs.md new file mode 100644 index 00000000000..55fc16e0b33 --- /dev/null +++ b/doc/development/diffs.md @@ -0,0 +1,115 @@ +# Working with Merge Request diffs + +Currently we rely on different sources to present merge request diffs, these include: + +- Rugged gem +- Gitaly service +- Database (through `merge_request_diff_files`) +- Redis (cached highlighted diffs) + +We're constantly moving Rugged calls to Gitaly and the progress can be followed through [Gitaly repo](https://gitlab.com/gitlab-org/gitaly). + +## Architecture overview + +When refreshing a Merge Request (pushing to a source branch, force-pushing to target branch, or if the target branch now contains any commits from the MR) +we fetch the comparison information using `Gitlab::Git::Compare`, which fetches `base` and `head` data using Gitaly and diff between them through +`Gitlab::Git::Diff.between` (which uses _Gitaly_ if it's enabled, otherwise _Rugged_). +The diffs fetching process _limits_ single file diff sizes and the overall size of the whole diff through a series of constant values. Raw diff files are +then persisted on `merge_request_diff_files` table. + +Even though diffs higher than 10kb are collapsed (`Gitlab::Git::Diff::COLLAPSE_LIMIT`), we still keep them on Postgres. However, diff files over _safety limits_ +(see the [Diff limits section](#diff-limits)) are _not_ persisted. + +In order to present diffs information on the Merge Request diffs page, we: + +1. Fetch all diff files from database `merge_request_diff_files` +2. Fetch the _old_ and _new_ file blobs in batch to: + 1. Highlight old and new file content + 2. Know which viewer it should use for each file (text, image, deleted, etc) + 3. Know if the file content changed + 4. Know if it was stored externally + 5. Know if it had storage errors +3. If the diff file is cacheable (text-based), it's cached on Redis +using `Gitlab::Diff::FileCollection::MergeRequestDiff` + +## Diff limits + +As explained above, we limit single diff files and the size of the whole diff. There are scenarios where we collapse the diff file, +and cases where the diff file is not presented at all, and the user is guided to the Blob view. Here we'll go into details about +these limits. + +### Diff collection limits + +Limits that act onto all diff files collection. Files number, lines number and files size are considered. + +```ruby +Gitlab::Git::DiffCollection.collection_limits[:safe_max_files] = Gitlab::Git::DiffCollection::DEFAULT_LIMITS[:max_files] = 100 +``` + +File diffs will be collapsed (but be expandable) if 100 files have already been rendered. + + +```ruby +Gitlab::Git::DiffCollection.collection_limits[:safe_max_lines] = Gitlab::Git::DiffCollection::DEFAULT_LIMITS[:max_lines] = 5000 +``` + +File diffs will be collapsed (but be expandable) if 5000 lines have already been rendered. + + +```ruby +Gitlab::Git::DiffCollection.collection_limits[:safe_max_bytes] = Gitlab::Git::DiffCollection.collection_limits[:safe_max_files] * 5.kilobytes = 500.kilobytes +``` + +File diffs will be collapsed (but be expandable) if 500 kilobytes have already been rendered. + + +```ruby +Gitlab::Git::DiffCollection.collection_limits[:max_files] = Commit::DIFF_HARD_LIMIT_FILES = 1000 +``` + +No more files will be rendered at all if 1000 files have already been rendered. + + +```ruby +Gitlab::Git::DiffCollection.collection_limits[:max_lines] = Commit::DIFF_HARD_LIMIT_LINES = 50000 +``` + +No more files will be rendered at all if 50,000 lines have already been rendered. + +```ruby +Gitlab::Git::DiffCollection.collection_limits[:max_bytes] = Gitlab::Git::DiffCollection.collection_limits[:max_files] * 5.kilobytes = 5000.kilobytes +``` + +No more files will be rendered at all if 5 megabytes have already been rendered. + + +### Individual diff file limits + +Limits that act onto each diff file of a collection. Files number, lines number and files size are considered. + +```ruby +Gitlab::Git::Diff::COLLAPSE_LIMIT = 10.kilobytes +``` + +File diff will be collapsed (but be expandable) if it is larger than 10 kilobytes. + +```ruby +Gitlab::Git::Diff::SIZE_LIMIT = 100.kilobytes +``` + +File diff will not be rendered if it's larger than 100 kilobytes. + + +```ruby +Commit::DIFF_SAFE_LINES = Gitlab::Git::DiffCollection::DEFAULT_LIMITS[:max_lines] = 5000 +``` + +File diff will be suppressed (technically different from collapsed, but behaves the same, and is expandable) if it has more than 5000 lines. + +## Viewers + +Diff Viewers, which can be found on `models/diff_viewer/*` are classes used to map metadata about each type of Diff File. It has information +whether it's a binary, which partial should be used to render it or which File extensions this class accounts for. + +`DiffViewer::Base` validates _blobs_ (old and new versions) content, extension and file type in order to check if it can be rendered. + -- cgit v1.2.1