summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
-rw-r--r--doc/development/README.md1
-rw-r--r--doc/development/diffs.md115
2 files changed, 116 insertions, 0 deletions
diff --git a/doc/development/README.md b/doc/development/README.md
index 45e9565f9a7..c3ebc6e23bf 100644
--- a/doc/development/README.md
+++ b/doc/development/README.md
@@ -42,6 +42,7 @@ comments: false
- [Issue and merge requests state models](object_state_models.md)
- [How to dump production data to staging](db_dump.md)
- [Working with the GitHub importer](github_importer.md)
+- [Working with Merge Request diffs](diffs.md)
## Performance guides
diff --git a/doc/development/diffs.md b/doc/development/diffs.md
new file mode 100644
index 00000000000..55fc16e0b33
--- /dev/null
+++ b/doc/development/diffs.md
@@ -0,0 +1,115 @@
+# Working with Merge Request diffs
+
+Currently we rely on different sources to present merge request diffs, these include:
+
+- Rugged gem
+- Gitaly service
+- Database (through `merge_request_diff_files`)
+- Redis (cached highlighted diffs)
+
+We're constantly moving Rugged calls to Gitaly and the progress can be followed through [Gitaly repo](https://gitlab.com/gitlab-org/gitaly).
+
+## Architecture overview
+
+When refreshing a Merge Request (pushing to a source branch, force-pushing to target branch, or if the target branch now contains any commits from the MR)
+we fetch the comparison information using `Gitlab::Git::Compare`, which fetches `base` and `head` data using Gitaly and diff between them through
+`Gitlab::Git::Diff.between` (which uses _Gitaly_ if it's enabled, otherwise _Rugged_).
+The diffs fetching process _limits_ single file diff sizes and the overall size of the whole diff through a series of constant values. Raw diff files are
+then persisted on `merge_request_diff_files` table.
+
+Even though diffs higher than 10kb are collapsed (`Gitlab::Git::Diff::COLLAPSE_LIMIT`), we still keep them on Postgres. However, diff files over _safety limits_
+(see the [Diff limits section](#diff-limits)) are _not_ persisted.
+
+In order to present diffs information on the Merge Request diffs page, we:
+
+1. Fetch all diff files from database `merge_request_diff_files`
+2. Fetch the _old_ and _new_ file blobs in batch to:
+ 1. Highlight old and new file content
+ 2. Know which viewer it should use for each file (text, image, deleted, etc)
+ 3. Know if the file content changed
+ 4. Know if it was stored externally
+ 5. Know if it had storage errors
+3. If the diff file is cacheable (text-based), it's cached on Redis
+using `Gitlab::Diff::FileCollection::MergeRequestDiff`
+
+## Diff limits
+
+As explained above, we limit single diff files and the size of the whole diff. There are scenarios where we collapse the diff file,
+and cases where the diff file is not presented at all, and the user is guided to the Blob view. Here we'll go into details about
+these limits.
+
+### Diff collection limits
+
+Limits that act onto all diff files collection. Files number, lines number and files size are considered.
+
+```ruby
+Gitlab::Git::DiffCollection.collection_limits[:safe_max_files] = Gitlab::Git::DiffCollection::DEFAULT_LIMITS[:max_files] = 100
+```
+
+File diffs will be collapsed (but be expandable) if 100 files have already been rendered.
+
+
+```ruby
+Gitlab::Git::DiffCollection.collection_limits[:safe_max_lines] = Gitlab::Git::DiffCollection::DEFAULT_LIMITS[:max_lines] = 5000
+```
+
+File diffs will be collapsed (but be expandable) if 5000 lines have already been rendered.
+
+
+```ruby
+Gitlab::Git::DiffCollection.collection_limits[:safe_max_bytes] = Gitlab::Git::DiffCollection.collection_limits[:safe_max_files] * 5.kilobytes = 500.kilobytes
+```
+
+File diffs will be collapsed (but be expandable) if 500 kilobytes have already been rendered.
+
+
+```ruby
+Gitlab::Git::DiffCollection.collection_limits[:max_files] = Commit::DIFF_HARD_LIMIT_FILES = 1000
+```
+
+No more files will be rendered at all if 1000 files have already been rendered.
+
+
+```ruby
+Gitlab::Git::DiffCollection.collection_limits[:max_lines] = Commit::DIFF_HARD_LIMIT_LINES = 50000
+```
+
+No more files will be rendered at all if 50,000 lines have already been rendered.
+
+```ruby
+Gitlab::Git::DiffCollection.collection_limits[:max_bytes] = Gitlab::Git::DiffCollection.collection_limits[:max_files] * 5.kilobytes = 5000.kilobytes
+```
+
+No more files will be rendered at all if 5 megabytes have already been rendered.
+
+
+### Individual diff file limits
+
+Limits that act onto each diff file of a collection. Files number, lines number and files size are considered.
+
+```ruby
+Gitlab::Git::Diff::COLLAPSE_LIMIT = 10.kilobytes
+```
+
+File diff will be collapsed (but be expandable) if it is larger than 10 kilobytes.
+
+```ruby
+Gitlab::Git::Diff::SIZE_LIMIT = 100.kilobytes
+```
+
+File diff will not be rendered if it's larger than 100 kilobytes.
+
+
+```ruby
+Commit::DIFF_SAFE_LINES = Gitlab::Git::DiffCollection::DEFAULT_LIMITS[:max_lines] = 5000
+```
+
+File diff will be suppressed (technically different from collapsed, but behaves the same, and is expandable) if it has more than 5000 lines.
+
+## Viewers
+
+Diff Viewers, which can be found on `models/diff_viewer/*` are classes used to map metadata about each type of Diff File. It has information
+whether it's a binary, which partial should be used to render it or which File extensions this class accounts for.
+
+`DiffViewer::Base` validates _blobs_ (old and new versions) content, extension and file type in order to check if it can be rendered.
+