Preload pipeline data for project pipelines

When displaying the pipelines of a project we now preload the following
data:

1. Authors of the commits that belong to these pipelines
2. The number of warnings per pipeline, which is used by
   Ci::Pipeline#has_warnings?

== Commit Authors

Previously this data was queried for every Commit separately, leading to
20 SQL queries being executed in the worst case. With an average of 3 to
5 milliseconds per SQL query this could result in 100 milliseconds being
spent in _just_ getting Commit authors.

To preload this data Commit#author now uses BatchLoader (through
Commit#lazy_author), and a separate module
Gitlab::Ci::Pipeline::Preloader is used to ensure all authors are loaded
before they are used.

== Number of warnings

This changes Ci::Pipeline#has_warnings? so it supports preloading of the
number of warnings per pipeline. This removes the need for executing a
COUNT(*) query for every pipeline just to see if it has any warnings or
not.
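
The batching idea behind the warning counts can be sketched in plain Ruby. This is only an illustration under stated assumptions: `WarningCounts`, `request` and `queries` are hypothetical names, the "query" is an in-memory array scan, and none of this is the BatchLoader gem's API. It shows how registering every pipeline ID first lets a single grouped count serve all of them, with 0 as the default for pipelines that have no warnings.

```ruby
# Hypothetical stand-in for a batch loader; NOT the BatchLoader gem's API.
class WarningCounts
  attr_reader :queries

  # failed_builds: one pipeline ID per failed-but-allowed build (assumed data).
  def initialize(failed_builds)
    @failed_builds = failed_builds
    @pending = []
    @loaded = {}
    @queries = 0
  end

  # Register interest in a pipeline ID without loading anything yet.
  # Returns a lambda that yields the count when it is finally called.
  def request(pipeline_id)
    @pending << pipeline_id unless @loaded.key?(pipeline_id)
    -> { fetch(pipeline_id) }
  end

  private

  def fetch(pipeline_id)
    load_pending unless @loaded.key?(pipeline_id)
    @loaded.fetch(pipeline_id, 0)
  end

  # One "grouped count" for every pending ID, instead of one count per ID.
  def load_pending
    @queries += 1
    counts = @failed_builds
      .select { |id| @pending.include?(id) }
      .each_with_object(Hash.new(0)) { |id, hash| hash[id] += 1 }

    @pending.each { |id| @loaded[id] = counts.fetch(id, 0) }
    @pending.clear
  end
end

counts = WarningCounts.new([1, 1, 3])
lazy = [1, 2, 3].map { |id| counts.request(id) }
results = lazy.map(&:call)
```

In this sketch the three lookups trigger a single load, which is the effect the grouped `count` in `number_of_warnings` aims for.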
author: Yorick Peterse <yorickpeterse@gmail.com> 2018-05-07 18:22:07 +0200
committer: Yorick Peterse <yorickpeterse@gmail.com> 2018-05-17 13:53:00 +0200
commit: 19428e800895ba20eacb3357285acef8d69f6d8c (patch)
tree: 0f16e630b6a808b6013d463146c32a134d6ee9c0 /app/models
parent: 70985aa19b389c2ee8234edfbb516b5403a7bfcf (diff)
download: gitlab-ce-19428e800895ba20eacb3357285acef8d69f6d8c.tar.gz
2 files changed, 39 insertions, 2 deletions
diff --git a/app/models/ci/pipeline.rb b/app/models/ci/pipeline.rb
index 1f49764e7cc..c26f0b6dcdc 100644
--- a/app/models/ci/pipeline.rb
+++ b/app/models/ci/pipeline.rb
@@ -406,7 +406,18 @@ module Ci
     end
 
     def has_warnings?
-      builds.latest.failed_but_allowed.any?
+      number_of_warnings.positive?
+    end
+
+    def number_of_warnings
+      BatchLoader.for(id).batch(default_value: 0) do |pipeline_ids, loader|
+        Build.where(commit_id: pipeline_ids)
+          .latest
+          .failed_but_allowed
+          .group(:commit_id)
+          .count
+          .each { |id, amount| loader.call(id, amount) }
+      end
     end
 
     def set_config_source
diff --git a/app/models/commit.rb b/app/models/commit.rb
index b46f9f34689..56d4c86774e 100644
--- a/app/models/commit.rb
+++ b/app/models/commit.rb
@@ -224,8 +224,34 @@ class Commit
     Gitlab::ClosingIssueExtractor.new(project, current_user).closed_by_message(safe_message)
   end
 
+  def lazy_author
+    BatchLoader.for(author_email.downcase).batch do |emails, loader|
+      # A Hash that maps user Emails to the corresponding User objects. The
+      # Emails at this point are the _primary_ Emails of the Users.
+      users_for_emails = User
+        .by_any_email(emails)
+        .each_with_object({}) { |user, hash| hash[user.email] = user }
+
+      users_for_ids = users_for_emails
+        .values
+        .each_with_object({}) { |user, hash| hash[user.id] = user }
+
+      # Some commits may have used an alternative Email address. In this case we
+      # need to query the "emails" table to map those addresses to User objects.
+      Email
+        .where(email: emails - users_for_emails.keys)
+        .pluck(:email, :user_id)
+        .each { |(email, id)| users_for_emails[email] = users_for_ids[id] }
+
+      users_for_emails.each { |email, user| loader.call(email, user) }
+    end
+  end
+
   def author
-    User.find_by_any_email(author_email.downcase)
+    # We use __sync so that we get the actual objects back (including an actual
+    # nil), instead of a wrapper, as returning a wrapped nil breaks a lot of
+    # code.
+    lazy_author.__sync
   end
   request_cache(:author) { author_email.downcase }
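
The email-to-user mapping in `lazy_author` can be illustrated without a database. The sketch below is a simplified assumption, not GitLab code: `DemoUser`, `demo_users`, `secondary_emails` and `users_for` are hypothetical, and array scans stand in for `User.by_any_email` and the `emails` table lookup. It shows the two steps: index the matched users by primary email and by ID, then resolve the remaining addresses through their user IDs.

```ruby
# Hypothetical in-memory stand-ins for the users and emails tables.
DemoUser = Struct.new(:id, :email)

def users_for(emails, demo_users, secondary_emails)
  # Stand-in for User.by_any_email: match on primary or secondary address.
  matched = demo_users.select do |user|
    emails.include?(user.email) ||
      secondary_emails.any? { |email, id| id == user.id && emails.include?(email) }
  end

  # Index the matched users by primary email and by ID.
  by_email = matched.each_with_object({}) { |user, hash| hash[user.email] = user }
  by_id = matched.each_with_object({}) { |user, hash| hash[user.id] = user }

  # Addresses not yet mapped may be secondary ones; resolve them via user ID.
  (emails - by_email.keys).each do |email|
    id = secondary_emails[email]
    by_email[email] = by_id[id] if id
  end

  by_email
end

demo_users = [DemoUser.new(1, "alice@example.com"), DemoUser.new(2, "bob@example.com")]
secondary_emails = { "alice@work.example" => 1 } # email => user ID

mapping = users_for(
  ["alice@work.example", "bob@example.com", "nobody@example.com"],
  demo_users,
  secondary_emails
)
```

An address with no matching user is simply absent from the result, which mirrors how `author` ends up returning an actual `nil` for unknown emails.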