doc/development/value_stream_analytics.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351

---
stage: Plan
group: Optimize
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
---

# Value stream analytics development guide

For information on how to configure value stream analytics (VSA) in GitLab, see our [analytics documentation](../user/analytics/value_stream_analytics.md).

## How does Value Stream Analytics work?

Value Stream Analytics calculates the duration between two timestamp columns or timestamp
expressions and runs various aggregations on the data.

For example:

- Duration between the Merge Request creation time and Merge Request merge time.
- Duration between the Issue creation time and Issue close time.

This duration is exposed in various ways:

- Aggregation: median, average
- Listing: list the duration for individual Merge Request and Issue records

Apart from the durations, we expose the record count within a stage.

## Feature availability

- Group level (licensed): Requires Ultimate or Premium subscription. This version is the most
feature-full.
- Project level (licensed): We are continually adding features to project level VSA to bring it in line with group level VSA.
- Project level (FOSS): Keep it as is.

|Feature|Group level (licensed)|Project level (licensed)|Project level (FOSS)|
|-|-|-|-|
|Create custom value streams|Yes|No, only one value stream (default) is present with the default stages|no, only one value stream (default) is present with the default stages|
|Create custom stages|Yes|No|No|
|Filtering (author, label, milestone, etc.)|Yes|Yes|Yes|
|Stage time chart|Yes|No|No|
|Total time chart|Yes|No|No|
|Task by type chart|Yes|No|No|
|DORA Metrics|Yes|Yes|No|
|Cycle time and lead time summary (Key metrics)|Yes|Yes|No|
|New issues, commits and deploys (Key metrics)|Yes, excluding commits|Yes|Yes|
|Uses aggregated backend|Yes|No|No|
|Date filter behavior|Filters items [finished within the date range](https://gitlab.com/groups/gitlab-org/-/epics/6046)|Filters items by creation date.|Filters items by creation date.|
|Authorization|At least reporter|At least reporter|Can be public.|

## VSA core domain objects

### Stages

A stage represents an event pair (start and end events) with additional metadata, such as the name
of the stage. Stages are configurable by the user within the pairing rules defined in the backend.

**Example stage: Code Review**

- Start event identifier: Merge request creation time.
- Start event column: uses the `merge_requests.created_at` timestamp column.
- End event identifier: Merge request merge time.
- End event column: uses the `merge_request_metrics.merged_at` timestamp column.
- Stage event hash ID: a calculated hash for the pair of start and end event identifiers.
  - If two stages have the same configuration of start and end events, then their stage event hash.
  IDs are identical.
  - The stage event hash ID is later used to store the aggregated data in partitioned database tables.

Historically, value stream analytics defined [7 stages](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/analytics/cycle_analytics/default_stages.rb)
which are always available to the end-users regardless of the subscription.

### Value streams

Value streams are container objects for the stages. There can be multiple value streams per group
focusing on different aspects of the DevOps lifecycle.

### Events

Events are the smallest building blocks of the value stream analytics feature. A stage consists of two events:

- Start event
- End event

These events play a key role in the duration calculation.

Formula: `duration = end_event_time - start_event_time`

To make the duration calculation flexible, each `Event` is implemented as a separate class.
They're responsible for defining a timestamp expression that is used in the calculation query.

#### Implementing an `Event` class

You must implement a few methods, as described in the `StageEvent` base class.
The most important methods are:

- `object_type`
- `timestamp_projection`

The `object_type` method defines which domain object is queried for the calculation. Currently two models are allowed:

- `Issue`
- `MergeRequest`

For the duration calculation the `timestamp_projection` method is used.

```ruby
def timestamp_projection
  # your timestamp expression comes here
end

# event will use the issue creation time in the duration calculation
def timestamp_projection
  Issue.arel_table[:created_at]
end
```

More complex expressions are also possible (for example, using `COALESCE`).
Review the existing event classes for examples.

In some cases, defining the `timestamp_projection` method is not enough. The calculation query should know which table contains the timestamp expression. Each `Event` class is responsible for making modifications to the calculation query to make the `timestamp_projection` work. This usually means joining an additional table.

Example for joining the `issue_metrics` table and using the `first_mentioned_in_commit_at` column as the timestamp expression:

```ruby
def object_type
  Issue
end

def timestamp_projection
  IssueMetrics.arel_table[:first_mentioned_in_commit_at]
end

def apply_query_customization(query)
  # in this case the query attribute will be based on the Issue model: `Issue.where(...)`
  query.joins(:metrics)
end
```

#### Validating start and end events

Some start/end event pairs are not "compatible" with each other. For example:

- "Issue created" to "Merge Request created": The event classes are defined on different domain models, the `object_type` method is different.
- "Issue closed" to "Issue created": Issue must be created first before it can be closed.
- "Issue closed" to "Issue closed": Duration is always 0.

The `StageEvents` module describes the allowed `start_event` and `end_event` pairings (`PAIRING_RULES` constant). If a new event is added, it needs to be registered in this module.
To add a new event:

1. Add an entry in `ENUM_MAPPING` with a unique number, which is used in the `Stage` model as `enum`.
1. Define which events are compatible with the event in the `PAIRING_RULES` hash.

Supported start/end event pairings:

```mermaid
graph LR;
  IssueCreated --> IssueClosed;
  IssueCreated --> IssueFirstAddedToBoard;
  IssueCreated --> IssueFirstAssociatedWithMilestone;
  IssueCreated --> IssueFirstMentionedInCommit;
  IssueCreated --> IssueLastEdited;
  IssueCreated --> IssueLabelAdded;
  IssueCreated --> IssueLabelRemoved;
  MergeRequestCreated --> MergeRequestMerged;
  MergeRequestCreated --> MergeRequestClosed;
  MergeRequestCreated --> MergeRequestFirstDeployedToProduction;
  MergeRequestCreated --> MergeRequestLastBuildStarted;
  MergeRequestCreated --> MergeRequestLastBuildFinished;
  MergeRequestCreated --> MergeRequestLastEdited;
  MergeRequestCreated --> MergeRequestLabelAdded;
  MergeRequestCreated --> MergeRequestLabelRemoved;
  MergeRequestLastBuildStarted --> MergeRequestLastBuildFinished;
  MergeRequestLastBuildStarted --> MergeRequestClosed;
  MergeRequestLastBuildStarted --> MergeRequestFirstDeployedToProduction;
  MergeRequestLastBuildStarted --> MergeRequestLastEdited;
  MergeRequestLastBuildStarted --> MergeRequestMerged;
  MergeRequestLastBuildStarted --> MergeRequestLabelAdded;
  MergeRequestLastBuildStarted --> MergeRequestLabelRemoved;
  MergeRequestMerged --> MergeRequestFirstDeployedToProduction;
  MergeRequestMerged --> MergeRequestClosed;
  MergeRequestMerged --> MergeRequestFirstDeployedToProduction;
  MergeRequestMerged --> MergeRequestLastEdited;
  MergeRequestMerged --> MergeRequestLabelAdded;
  MergeRequestMerged --> MergeRequestLabelRemoved;
  IssueLabelAdded --> IssueLabelAdded;
  IssueLabelAdded --> IssueLabelRemoved;
  IssueLabelAdded --> IssueClosed;
  IssueLabelRemoved --> IssueClosed;
  IssueFirstAddedToBoard --> IssueClosed;
  IssueFirstAddedToBoard --> IssueFirstAssociatedWithMilestone;
  IssueFirstAddedToBoard --> IssueFirstMentionedInCommit;
  IssueFirstAddedToBoard --> IssueLastEdited;
  IssueFirstAddedToBoard --> IssueLabelAdded;
  IssueFirstAddedToBoard --> IssueLabelRemoved;
  IssueFirstAssociatedWithMilestone --> IssueClosed;
  IssueFirstAssociatedWithMilestone --> IssueFirstAddedToBoard;
  IssueFirstAssociatedWithMilestone --> IssueFirstMentionedInCommit;
  IssueFirstAssociatedWithMilestone --> IssueLastEdited;
  IssueFirstAssociatedWithMilestone --> IssueLabelAdded;
  IssueFirstAssociatedWithMilestone --> IssueLabelRemoved;
  IssueFirstMentionedInCommit --> IssueClosed;
  IssueFirstMentionedInCommit --> IssueFirstAssociatedWithMilestone;
  IssueFirstMentionedInCommit --> IssueFirstAddedToBoard;
  IssueFirstMentionedInCommit --> IssueLastEdited;
  IssueFirstMentionedInCommit --> IssueLabelAdded;
  IssueFirstMentionedInCommit --> IssueLabelRemoved;
  IssueClosed --> IssueLastEdited;
  IssueClosed --> IssueLabelAdded;
  IssueClosed --> IssueLabelRemoved;
  MergeRequestClosed --> MergeRequestFirstDeployedToProduction;
  MergeRequestClosed --> MergeRequestLastEdited;
  MergeRequestClosed --> MergeRequestLabelAdded;
  MergeRequestClosed --> MergeRequestLabelRemoved;
  MergeRequestFirstDeployedToProduction --> MergeRequestLastEdited;
  MergeRequestFirstDeployedToProduction --> MergeRequestLabelAdded;
  MergeRequestFirstDeployedToProduction --> MergeRequestLabelRemoved;
  MergeRequestLastBuildFinished --> MergeRequestClosed;
  MergeRequestLastBuildFinished --> MergeRequestFirstDeployedToProduction;
  MergeRequestLastBuildFinished --> MergeRequestLastEdited;
  MergeRequestLastBuildFinished --> MergeRequestMerged;
  MergeRequestLastBuildFinished --> MergeRequestLabelAdded;
  MergeRequestLastBuildFinished --> MergeRequestLabelRemoved;
  MergeRequestLabelAdded --> MergeRequestLabelAdded;
  MergeRequestLabelAdded --> MergeRequestLabelRemoved;
  MergeRequestLabelRemoved --> MergeRequestLabelAdded;
  MergeRequestLabelRemoved --> MergeRequestLabelRemoved;
```

## Default stages

The [original implementation](https://gitlab.com/gitlab-org/gitlab/-/issues/847) of value stream analytics defined 7 stages. These stages are always available for each parent, however altering these stages is not possible.

To make things efficient and reduce the number of records created, the default stages are expressed as in-memory objects (not persisted). When the user creates a custom stage for the first time, all the stages are persisted. This behavior is implemented in the value stream analytics service objects.

The reason for this was that we'd like to add the abilities to hide and order stages later on.

## Data Collector

`DataCollector` is the central point where the data is queried from the database. The class always operates on a single stage and consists of the following components:

- `BaseQueryBuilder`:
  - Responsible for composing the initial query.
  - Deals with `Stage` specific configuration: events and their query customizations.
  - Parameters coming from the UI: date ranges.
- `Median`: Calculates the median duration for a stage using the query from  `BaseQueryBuilder`.
- `RecordsFetcher`: Loads relevant records for a stage using the query from  `BaseQueryBuilder` and specific `Finder` classes to apply visibility rules.
- `DataForDurationChart`: Loads calculated durations with the finish time (end event timestamp) for the scatterplot chart.

For a new calculation or a query, implement it as a new method call in the `DataCollector` class.

To support the aggregated value stream analytics backend, these classes were reimplemented within [`Aggregated`](https://gitlab.com/gitlab-org/gitlab/-/tree/master/lib/gitlab/analytics/cycle_analytics/aggregated) namespace.

### Database query backend

VSA supports two backends: [aggregated](value_stream_analytics/value_stream_analytics_aggregated_backend.md) and "live". The live query backend can be
considered legacy, which will be phased out at some point.

- "live": uses the standard `IssuableFinders`.
- aggregated: queries data from pre-aggregated database tables.

## High-level overview

- Rails Controller (`Analytics::CycleAnalytics` module): Value stream analytics exposes its data via JSON endpoints, implemented within the `analytics` workspace. Configuring the stages are also implements JSON endpoints (CRUD).
- Services (`Analytics::CycleAnalytics` module): All `Stage` related actions are delegated to respective service objects.
- Models (`Analytics::CycleAnalytics` module): Models are used to persist the `Stage` objects `ProjectStage` and `GroupStage`.
- Feature classes (`Gitlab::Analytics::CycleAnalytics` module):
  - Responsible for composing queries and define feature specific business logic.
  - `DataCollector`, `Event`, `StageEvents`, etc.

## Frontend

[Project VSA](../user/analytics/value_stream_analytics.md) is available for all users and:

- Includes a mixture of key and DORA metrics based on the tier.
- Uses the set of [default stages](#default-stages).

[Group VSA](../user/group/value_stream_analytics/index.md) is only available for licensed users and extends project VSA to include:

- An [overview stage](https://gitlab.com/gitlab-org/gitlab/-/issues/321438).
- The ability to create custom value streams.

The group and project level VSA frontends are both built with Vue and Vuex and follow a similar pattern:

- The `index.js` file extracts any URL query parameters, creates the Vue app and Vuex store, and dispatches an `initialize` Vuex action.
- The `base.vue` file is used to render the main components for each page, metrics, filters, charts, and the stage table.

The group VSA Vuex store makes use of [Vuex modules](https://vuex.vuejs.org/guide/modules.html) to separate some of the state and logic used for rendering the charts.

### Shared components

Parts of the UI are shared between project VSA and group VSA such as the stage table and path. These shared components live in the project VSA directory `app/assets/javascripts/cycle_analytics/components` and are included at the group level VSA where needed.

All the frontend code for group-level features are located in `ee/app/assets/javascripts/analytics/cycle_analytics/components`.

## Testing

Since we have a lots of events and possible pairings, testing each pairing is not possible. The rule is to have at least one test case using an `Event` class.

Writing a test case for a stage using a new `Event` can be challenging since data must be created for both events. To make this a bit simpler, each test case must be implemented in the `data_collector_spec.rb` where the stage is tested through the `DataCollector`. Each test case is turned into multiple tests, covering the following cases:

- Different parents: `Group` or `Project`
- Different calculations: `Median`, `RecordsFetcher` or `DataForDurationChart`

The VSA frontend is tested extensively on two different levels (integration, unit):

- End-to-end integration tests using a real backend via Capybara and RSpec.
- Jest frontend tests with pre-generated data fixtures.

## Development setup and testing

Running Value Stream Analytics can be done via the [GDK](https://gitlab.com/gitlab-org/gitlab-development-kit). By default, you'll be able to view the project-level (FOSS) version of the feature.

If your GDK is up and running, you can run the seed script to generate some data:

```shell
SEED_CYCLE_ANALYTICS=true SEED_VSA=true FILTER=cycle_analytics rake db:seed_fu
```

The data generator script creates a new group and a new project with issue and merge request
data (see the output of the script). To view the group-level version of the feature, you
need to request a license for your GDK instance.

After this step, you can access the group level value stream analytics page where you can create
value streams and stages. The data aggregation might be delayed so you might not see the
data right after the stage creation. To speed up this process, you can run the following command
in your rails console (`rails c`):

```ruby
Analytics::CycleAnalytics::ReaggregationWorker.new.perform
```

### Seed data

#### Value stream analytics

Seed issues and merge requests for value stream analytics:

  ```shell
  // Seed 10 issues for the project specified by <project-id>
  $ VSA_SEED_PROJECT_ID=<project-id> VSA_ISSUE_COUNT=10 SEED_VSA=true FILTER=cycle_analytics rake db:seed_fu
  ```

#### DORA metrics

Seed DORA daily metrics for value stream, insights and CI/CD analytics:

1. [Create an environment from the UI](../ci/environments/index.md#create-a-static-environment) named `production`.
1. Open the rails console:

   ```shell
   rails c
   ```