diff options
author | GitLab Bot <gitlab-bot@gitlab.com> | 2020-02-07 21:08:39 +0000 |
---|---|---|
committer | GitLab Bot <gitlab-bot@gitlab.com> | 2020-02-07 21:08:39 +0000 |
commit | 0c6bc5443aa6c8f3e4becccb89fc0f135b4c64c8 (patch) | |
tree | 55f13e752e9061c1800cce510a52fc78b13282ca /doc/development/value_stream_analytics.md | |
parent | d7ce7307dca551759ffa972015875f8ebe476927 (diff) | |
download | gitlab-ce-0c6bc5443aa6c8f3e4becccb89fc0f135b4c64c8.tar.gz |
Add latest changes from gitlab-org/gitlab@master
Diffstat (limited to 'doc/development/value_stream_analytics.md')
-rw-r--r-- | doc/development/value_stream_analytics.md | 248 |
1 files changed, 248 insertions, 0 deletions
diff --git a/doc/development/value_stream_analytics.md b/doc/development/value_stream_analytics.md new file mode 100644 index 00000000000..5c6d3f18d9a --- /dev/null +++ b/doc/development/value_stream_analytics.md @@ -0,0 +1,248 @@ +# Value Stream Analytics development guide + +Value stream analytics calculates the time between two arbitrary events recorded on domain objects and provides aggregated statistics about the duration. + +For information on how to configure Value Stream Analytics in GitLab, see our [analytics documentation](../user/analytics/value_stream_analytics.md). + +## Stage + +During development, events occur that move issues and merge requests through different stages of progress until they are considered finished. These stages can be expressed with the `Stage` model. + +Example stage: + +- Name: Development +- Start event: Issue created +- End event: Issue first mentioned in commit +- Parent: `Group: gitlab-org` + +### Events + +Events are the smallest building blocks of the value stream analytics feature. A stage consists of two events: + +- Start +- End + +These events play a key role in the duration calculation. + +Formula: `duration = end_event_time - start_event_time` + +To make the duration calculation flexible, each `Event` is implemented as a separate class. They're responsible for defining a timestamp expression that will be used in the calculation query. + +#### Implementing an `Event` class + +There are a few methods that are required to be implemented, the `StageEvent` base class describes them in great detail. The most important ones are: + +- `object_type` +- `timestamp_projection` + +The `object_type` method defines which domain object will be queried for the calculation. Currently two models are allowed: + +- `Issue` +- `MergeRequest` + +For the duration calculation the `timestamp_projection` method will be used. + +```ruby +def timestamp_projection + # your timestamp expression comes here +end + +# event will use the issue creation time in the duration calculation +def timestamp_projection + Issue.arel_table[:created_at] +end +``` + +NOTE: **Note:** +More complex expressions are also possible (e.g. using `COALESCE`). Look at the existing event classes for examples. + +In some cases, defining the `timestamp_projection` method is not enough. The calculation query should know which table contains the timestamp expression. Each `Event` class is responsible for making modifications to the calculation query to make the `timestamp_projection` work. This usually means joining an additional table. + +Example for joining the `issue_metrics` table and using the `first_mentioned_in_commit_at` column as the timestamp expression: + +```ruby +def object_type + Issue +end + +def timestamp_projection + IssueMetrics.arel_table[:first_mentioned_in_commit_at] +end + +def apply_query_customization(query) + # in this case the query attribute will be based on the Issue model: `Issue.where(...)` + query.joins(:metrics) +end +``` + +### Validating start and end events + +Some start/end event pairs are not "compatible" with each other. For example: + +- "Issue created" to "Merge Request created": The event classes are defined on different domain models, the `object_type` method is different. +- "Issue closed" to "Issue created": Issue must be created first before it can be closed. +- "Issue closed" to "Issue closed": Duration is always 0. + +The `StageEvents` module describes the allowed `start_event` and `end_event` pairings (`PAIRING_RULES` constant). If a new event is added, it needs to be registered in this module. +βTo add a new event:β + +1. Add an entry in `ENUM_MAPPING` with a unique number, it'll be used in the `Stage` model as `enum`. +1. Define which events are compatible with the event in the `PAIRING_RULES` hash. + +Supported start/end event pairings: + +```mermaid +graph LR; + IssueCreated --> IssueClosed; + IssueCreated --> IssueFirstAddedToBoard; + IssueCreated --> IssueFirstAssociatedWithMilestone; + IssueCreated --> IssueFirstMentionedInCommit; + IssueCreated --> IssueLastEdited; + IssueCreated --> IssueLabelAdded; + IssueCreated --> IssueLabelRemoved; + MergeRequestCreated --> MergeRequestMerged; + MergeRequestCreated --> MergeRequestClosed; + MergeRequestCreated --> MergeRequestFirstDeployedToProduction; + MergeRequestCreated --> MergeRequestLastBuildStarted; + MergeRequestCreated --> MergeRequestLastBuildFinished; + MergeRequestCreated --> MergeRequestLastEdited; + MergeRequestCreated --> MergeRequestLabelAdded; + MergeRequestCreated --> MergeRequestLabelRemoved; + MergeRequestLastBuildStarted --> MergeRequestLastBuildFinished; + MergeRequestLastBuildStarted --> MergeRequestClosed; + MergeRequestLastBuildStarted --> MergeRequestFirstDeployedToProduction; + MergeRequestLastBuildStarted --> MergeRequestLastEdited; + MergeRequestLastBuildStarted --> MergeRequestMerged; + MergeRequestLastBuildStarted --> MergeRequestLabelAdded; + MergeRequestLastBuildStarted --> MergeRequestLabelRemoved; + MergeRequestMerged --> MergeRequestFirstDeployedToProduction; + MergeRequestMerged --> MergeRequestClosed; + MergeRequestMerged --> MergeRequestFirstDeployedToProduction; + MergeRequestMerged --> MergeRequestLastEdited; + MergeRequestMerged --> MergeRequestLabelAdded; + MergeRequestMerged --> MergeRequestLabelRemoved; + IssueLabelAdded --> IssueLabelAdded; + IssueLabelAdded --> IssueLabelRemoved; + IssueLabelAdded --> IssueClosed; + IssueLabelRemoved --> IssueClosed; + IssueFirstAddedToBoard --> IssueClosed; + IssueFirstAddedToBoard --> IssueFirstAssociatedWithMilestone; + IssueFirstAddedToBoard --> IssueFirstMentionedInCommit; + IssueFirstAddedToBoard --> IssueLastEdited; + IssueFirstAddedToBoard --> IssueLabelAdded; + IssueFirstAddedToBoard --> IssueLabelRemoved; + IssueFirstAssociatedWithMilestone --> IssueClosed; + IssueFirstAssociatedWithMilestone --> IssueFirstAddedToBoard; + IssueFirstAssociatedWithMilestone --> IssueFirstMentionedInCommit; + IssueFirstAssociatedWithMilestone --> IssueLastEdited; + IssueFirstAssociatedWithMilestone --> IssueLabelAdded; + IssueFirstAssociatedWithMilestone --> IssueLabelRemoved; + IssueFirstMentionedInCommit --> IssueClosed; + IssueFirstMentionedInCommit --> IssueFirstAssociatedWithMilestone; + IssueFirstMentionedInCommit --> IssueFirstAddedToBoard; + IssueFirstMentionedInCommit --> IssueLastEdited; + IssueFirstMentionedInCommit --> IssueLabelAdded; + IssueFirstMentionedInCommit --> IssueLabelRemoved; + IssueClosed --> IssueLastEdited; + IssueClosed --> IssueLabelAdded; + IssueClosed --> IssueLabelRemoved; + MergeRequestClosed --> MergeRequestFirstDeployedToProduction; + MergeRequestClosed --> MergeRequestLastEdited; + MergeRequestClosed --> MergeRequestLabelAdded; + MergeRequestClosed --> MergeRequestLabelRemoved; + MergeRequestFirstDeployedToProduction --> MergeRequestLastEdited; + MergeRequestFirstDeployedToProduction --> MergeRequestLabelAdded; + MergeRequestFirstDeployedToProduction --> MergeRequestLabelRemoved; + MergeRequestLastBuildFinished --> MergeRequestClosed; + MergeRequestLastBuildFinished --> MergeRequestFirstDeployedToProduction; + MergeRequestLastBuildFinished --> MergeRequestLastEdited; + MergeRequestLastBuildFinished --> MergeRequestMerged; + MergeRequestLastBuildFinished --> MergeRequestLabelAdded; + MergeRequestLastBuildFinished --> MergeRequestLabelRemoved; + MergeRequestLabelAdded --> MergeRequestLabelAdded; + MergeRequestLabelAdded --> MergeRequestLabelRemoved; + MergeRequestLabelRemoved --> MergeRequestLabelAdded; + MergeRequestLabelRemoved --> MergeRequestLabelRemoved; +``` + +### Parent + +Teams and organizations might define their own way of building software, thus stages can be completely different. For each stage, a parent object needs to be defined. + +Currently supported parents: + +- `Project` +- `Group` + +#### How parent relationship it work + +1. User navigates to the value stream analytics page. +1. User selects a group. +1. Backend loads the defined stages for the selected group. +1. Additions and modifications to the stages will be persisted within the selected group only. + +### Default stages + +The [original implementation](https://gitlab.com/gitlab-org/gitlab/issues/847) of value stream analytics defined 7 stages. These stages are always available for each parent, however altering these stages is not possible. +β +To make things efficient and reduce the number of records created, the default stages are expressed as in-memory objects (not persisted). When the user creates a custom stage for the first time, all the stages will be persisted. This behaviour is implemented in the value stream analytics service objects. +β +The reason for this was that we'd like to add the abilities to hide and order stages later on. + +## Data Collector + +`DataCollector` is the central point where the data will be queried from the database. The class always operates on a single stage and consists of the following components: + +- `BaseQueryBuilder`: + - Responsible for composing the initial query. + - Deals with `Stage` specific configuration: events and their query customizations. + - Parameters coming from the UI: date ranges. +- `Median`: Calculates the median duration for a stage using the query from `BaseQueryBuilder`. +- `RecordsFetcher`: Loads relevant records for a stage using the query from `BaseQueryBuilder` and specific `Finder` classes to apply visibility rules. +- `DataForDurationChart`: Loads calculated durations with the finish time (end event timestamp) for the scatterplot chart. + +For a new calculation or a query, implement it as a new method call in the `DataCollector` class. + +## Database query + +Structure of the database query: + +```sql +SELECT (customized by: Median or RecordsFetcher or DataForDurationChart) +FROM OBJECT_TYPE (Issue or MergeRequest) +INNER JOIN (several JOIN statements, depending on the events) +WHERE + (Filter by the PARENT model, example: filter Issues from Project A) + (Date range filter based on the OBJECT_TYPE.created_at) + (Check if the START_EVENT is earlier than END_EVENT, preventing negative duration) +``` + +Structure of the `SELECT` statement for `Median`: + +```sql +SELECT (calculate median from START_EVENT_TIME-END_EVENT_TIME) +``` + +Structure of the `SELECT` statement for `DataForDurationChart`: + +```sql +SELECT (START_EVENT_TIME-END_EVENT_TIME) as duration, END_EVENT.timestamp +``` + +## High-level overview + +- Rails Controller (`Analytics::CycleAnalytics` module): Value stream analytics exposes its data via JSON endpoints, implemented within the `analytics` workspace. Configuring the stages are also implements JSON endpoints (CRUD). +- Services (`Analytics::CycleAnalytics` module): All `Stage` related actions will be delegated to respective service objects. +- Models (`Analytics::CycleAnalytics` module): Models are used to persist the `Stage` objects `ProjectStage` and `GroupStage`. +- Feature classes (`Gitlab::Analytics::CycleAnalytics` module): + - Responsible for composing queries and define feature specific busines logic. + - `DataCollector`, `Event`, `StageEvents`, etc. + +## Testing + +Since we have a lots of events and possible pairings, testing each pairing is not possible. The rule is to have at least one test case using an `Event` class. + +Writing a test case for a stage using a new `Event` can be challenging since data must be created for both events. To make this a bit simpler, each test case must be implemented in the `data_collector_spec.rb` where the stage is tested through the `DataCollector`. Each test case will be turned into multiple tests, covering the following cases: + +- Different parents: `Group` or `Project` +- Different calculations: `Median`, `RecordsFetcher` or `DataForDurationChart` |