author Lin Jen-Shin <godfat@godfat.org> 2016-08-31 03:09:16 +0800
committer Lin Jen-Shin <godfat@godfat.org> 2016-08-31 03:09:16 +0800
commit 1e49a8bc6cd593910577a0f09ea13f0c933de1e9
tree 537e76f3a08f9f4016c546e828b0a9473882c472
parent 09a53359e4847dfd146141cc3672d48f6b1df233
Introduction to PipelineDuration
-rw-r--r-- lib/gitlab/ci/pipeline_duration.rb 98
1 file changed, 98 insertions, 0 deletions
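The comment added by this patch describes the union-of-periods calculation in prose only. A minimal Ruby sketch of that idea follows; it is an illustration under stated assumptions, not the code from this commit. `Period` is a plain `Struct` with `first`/`last` members as in the diff, while `union_periods`, `running_time`, and `pending_time` are hypothetical names chosen for this example.

```ruby
# Sketch of the union-of-periods idea from the PipelineDuration comment.
# union_periods, running_time and pending_time are hypothetical names,
# not GitLab's actual API.
Period = Struct.new(:first, :last) do
  # Length of a single period.
  def duration
    last - first
  end
end

# Sort by start, then merge overlapping periods so only disjoint ones remain.
def union_periods(periods)
  sorted = periods.sort_by(&:first)
  return [] if sorted.empty?

  # dup so the caller's Period objects are never mutated.
  sorted.drop(1).each_with_object([sorted.first.dup]) do |current, merged|
    previous = merged.last
    if current.first <= previous.last
      # Overlap: previous.first is already the smaller start (input is
      # sorted), so only extend the end to whichever period ends later.
      previous.last = [previous.last, current.last].max
    else
      # No overlap: current begins a new disjoint period.
      merged << current.dup
    end
  end
end

# Running time is the sum of the disjoint periods.
def running_time(periods)
  union_periods(periods).sum(&:duration)
end

# Pending time is the span from earliest start to latest finish,
# minus the running time.
def pending_time(periods)
  return 0 if periods.empty?

  total = periods.map(&:last).max - periods.map(&:first).min
  total - running_time(periods)
end

jobs = [Period.new(1, 3), Period.new(2, 4), Period.new(6, 7)]
p union_periods(jobs).map(&:to_a) # => [[1, 4], [6, 7]]
puts running_time(jobs)           # => 4
puts pending_time(jobs)           # => 2
```

Because the overlap test uses `<=`, touching periods such as (1, 3) and (3, 5) are merged, matching the `B.first <= A.last` check in the comment. The sketch leaves out the pipeline-level term `pipeline.started_at - pipeline.created_at`, which the comment says is added in the pipeline itself.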
diff --git a/lib/gitlab/ci/pipeline_duration.rb b/lib/gitlab/ci/pipeline_duration.rb
index 55d870e2e9d..f97727e8548 100644
--- a/lib/gitlab/ci/pipeline_duration.rb
+++ b/lib/gitlab/ci/pipeline_duration.rb
@@ -1,5 +1,103 @@
module Gitlab
module Ci
+ # The problem this class solves is finding the total running time
+ # across all the jobs, excluding retries and pending (queue) time.
+ # This problem reduces to finding the union of periods.
+ #
+ # Each job is represented as a `Period` with `Period#first` (start time)
+ # and `Period#last` (finish time). A simple example:
+ #
+ # * A (1, 3)
+ # * B (2, 4)
+ # * C (6, 7)
+ #
+ # Here A starts at 1 and ends at 3, B starts at 2 and ends at 4, and
+ # C starts at 6 and ends at 7. Visually:
+ #
+ #     0  1  2  3  4  5  6  7
+ #        AAAAAAA
+ #           BBBBBBB
+ #                       CCCC
+ #
+ # The union of A, B, and C is (1, 4) and (6, 7), so the total
+ # running time is:
+ #
+ # (4 - 1) + (7 - 6) => 4
+ #
+ # The pending (queue) time is the gap (4, 6), marked as X:
+ #
+ #     0  1  2  3  4  5  6  7
+ #        AAAAAAA
+ #           BBBBBBB
+ #                       CCCC
+ #                  XXXXX
+ #
+ # This can be calculated by taking (1, 7) as the total time and
+ # subtracting the running time computed above, 4. In full:
+ #
+ # total = (7 - 1)
+ # duration = (4 - 1) + (7 - 6)
+ # pending = total - duration # 6 - 4 => 2
+ #
+ # So the pending time in this example is 2.
+ #
+ # The union algorithm works as follows.
+ # First, we make sure that all periods are sorted by `Period#first`.
+ # Then we merge periods by iterating from the first period to the
+ # last. The goal is to merge all overlapping periods so that in the
+ # end all the periods are disjoint. Once all periods are disjoint,
+ # we can simply sum their lengths to get the real
+ # running time.
+ #
+ # Here we begin with A and compare it to B. We find that B already
+ # started before A ended, that is, `B.first <= A.last` (`2 <= 3`),
+ # which means A and B overlap.
+ #
+ # When two periods overlap, we merge them into a new period and
+ # discard the old ones. For the new period, we take `A.first` as the
+ # new first, because the periods are sorted, so `A.first` must be
+ # less than or equal to `B.first`. We take `[A.last, B.last].max` as
+ # the new last, because we want whichever period ended later. There
+ # are two cases:
+ #
+ #     0  1  2  3  4
+ #        AAAAAAA
+ #           BBBBBBB
+ #
+ # Or:
+ #
+ #     0  1  2  3  4
+ #        AAAAAAAAAA
+ #           BBBB
+ #
+ # That is why we take whichever ends later. Back to our example, after
+ # merging A and B into a new period D and discarding them, we have:
+ #
+ #     0  1  2  3  4  5  6  7
+ #        DDDDDDDDDD
+ #                       CCCC
+ #
+ # Now we go on and compare the newly created D with the old C.
+ # D and C do not overlap, because `C.first <= D.last` (`6 <= 4`)
+ # is `false`. Therefore we keep both C and D. The example ends
+ # here because there are no more jobs.
+ #
+ # Once we have the union of all periods, the rest is simple and was
+ # described at the beginning. To summarise:
+ #
+ # duration = (4 - 1) + (7 - 6)
+ # total = (7 - 1)
+ # pending = total - duration # 6 - 4 => 2
+ #
+ # Note that this is not yet the final pending time for the pipeline,
+ # because we still need to add the time the pipeline spent pending
+ # before the first job (A in this example) even started. That is:
+ #
+ # total_pending = pipeline.started_at - pipeline.created_at + pending
+ #
+ # is the final answer. We handle that in the pipeline itself rather
+ # than here, because this class tries not to depend on the pipeline,
+ # and that information is trivial to get.
class PipelineDuration
PeriodStruct = Struct.new(:first, :last)
class Period < PeriodStruct