summaryrefslogtreecommitdiff
path: root/doc/development/experiment_guide/gitlab_experiment.md
diff options
context:
space:
mode:
Diffstat (limited to 'doc/development/experiment_guide/gitlab_experiment.md')
-rw-r--r--doc/development/experiment_guide/gitlab_experiment.md547
1 files changed, 547 insertions, 0 deletions
diff --git a/doc/development/experiment_guide/gitlab_experiment.md b/doc/development/experiment_guide/gitlab_experiment.md
new file mode 100644
index 00000000000..6b15449b812
--- /dev/null
+++ b/doc/development/experiment_guide/gitlab_experiment.md
@@ -0,0 +1,547 @@
+---
+stage: Growth
+group: Adoption
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Implementing an A/B/n experiment using GLEX
+
+## Introduction
+
+`Gitlab::Experiment` (GLEX) is tightly coupled with the concepts provided by
+[Feature flags in development of GitLab](../feature_flags/index.md). Here, we refer
+to this layer as feature flags, and may also use the term Flipper, because we
+built our development and experiment feature flags atop it.
+
+You're strongly encouraged to read and understand the
+[Feature flags in development of GitLab](../feature_flags/index.md) portion of the
+documentation before considering running experiments. Experiments add additional
+concepts which may seem confusing or advanced without understanding the underpinnings
+of how GitLab uses feature flags in development. One concept: GLEX supports multivariate
+experiments, which are sometimes referred to as A/B/n tests.
+
+The [`gitlab-experiment` project](https://gitlab.com/gitlab-org/gitlab-experiment)
+exists in a separate repository, so it can be shared across any GitLab property that uses
+Ruby. You should feel comfortable reading the documentation on that project as well
+if you want to dig into more advanced topics.
+
+## Glossary of terms
+
+To ensure a shared language, you should understand these fundamental terms we use
+when communicating about experiments:
+
+- `experiment`: Any deviation of code paths we want to run at some times, but not others.
+- `context`: A consistent experience we provide in an experiment.
+- `control`: The default, or "original" code path.
+- `candidate`: Defines an experiment with only one code path.
+- `variant(s)`: Defines an experiment with multiple code paths.
+
+### How it works
+
+Use this decision tree diagram to understand how GLEX works. When an experiment runs,
+the following logic is executed to determine what variant should be provided,
+given how the experiment has been defined and using the provided context:
+
+```mermaid
+graph TD
+ GP[General Pool/Population] --> Running?
+ Running? -->|Yes| Cached?[Cached? / Pre-segmented?]
+ Running? -->|No| Excluded[Control / No Tracking]
+ Cached? -->|No| Excluded?
+ Cached? -->|Yes| Cached[Cached Value]
+ Excluded? -->|Yes / Cached| Excluded
+ Excluded? -->|No| Segmented?
+ Segmented? -->|Yes / Cached| VariantA
+ Segmented? -->|No| Included?[Experiment Group?]
+ Included? -->|Yes| Rollout
+ Included? -->|No| Control
+ Rollout -->|Cached| VariantA
+ Rollout -->|Cached| VariantB
+ Rollout -->|Cached| VariantC
+
+classDef included fill:#380d75,color:#ffffff,stroke:none
+classDef excluded fill:#fca121,stroke:none
+classDef cached fill:#2e2e2e,color:#ffffff,stroke:none
+classDef default fill:#fff,stroke:#6e49cb
+
+class VariantA,VariantB,VariantC included
+class Control,Excluded excluded
+class Cached cached
+```
+
+## Implement an experiment
+
+Start by generating a feature flag using the `bin/feature-flag` command as you
+normally would for a development feature flag, making sure to use `experiment` for
+the type. For the sake of documentation let's name our feature flag (and experiment)
+"pill_color".
+
+```shell
+bin/feature-flag pill_color -t experiment
+```
+
+After you generate the desired feature flag, you can immediately implement an
+experiment in code. An experiment implementation can be as simple as:
+
+```ruby
+experiment(:pill_color, actor: current_user) do |e|
+ e.use { 'control' }
+ e.try(:red) { 'red' }
+ e.try(:blue) { 'blue' }
+end
+```
+
+When this code executes, the experiment is run, a variant is assigned, and (if within a
+controller or view) a `window.gon.experiment.pillColor` object will be available in the
+client layer, with details like:
+
+- The assigned variant.
+- The context key for client tracking events.
+
+In addition, when an experiment runs, an event is tracked for
+the experiment `:assignment`. We cover more about events, tracking, and
+the client layer later.
+
+In local development, you can make the experiment active by using the feature flag
+interface. You can also target specific cases by providing the relevant experiment
+to the call to enable the feature flag:
+
+```ruby
+# Enable for everyone
+Feature.enable(:pill_color)
+
+# Get the `experiment` method -- already available in controllers, views, and mailers.
+include Gitlab::Experiment::Dsl
+# Enable for only the first user
+Feature.enable(:pill_color, experiment(:pill_color, actor: User.first))
+```
+
+To roll out your experiment feature flag on an environment, run
+the following command using ChatOps (which is covered in more depth in the
+[Feature flags in development of GitLab](../feature_flags/index.md) documentation).
+This command creates a scenario where half of everyone who encounters
+the experiment would be assigned the _control_, 25% would be assigned the _red_
+variant, and 25% would be assigned the _blue_ variant:
+
+```slack
+/chatops run feature set pill_color 50 --actors
+```
+
+For an even distribution in this example, change the command to set it to 66% instead
+of 50.
+
+NOTE:
+To immediately stop running an experiment, use the
+`/chatops run feature set pill_color false` command.
+
+WARNING:
+We strongly recommend using the `--actors` flag when using the ChatOps commands,
+as anything else may give odd behaviors due to how the caching of variant assignment is
+handled.
+
+We can also implement this experiment in a HAML file with HTML wrappings:
+
+```haml
+#cta-interface
+ - experiment(:pill_color, actor: current_user) do |e|
+ - e.use do
+ .pill-button control
+ - e.try(:red) do
+ .pill-button.red red
+ - e.try(:blue) do
+ .pill-button.blue blue
+```
+
+### The importance of context
+
+In our previous example experiment, our context (this is an important term) is a hash
+that's set to `{ actor: current_user }`. Context must be unique based on how you
+want to run your experiment, and should be understood at a lower level.
+
+It's expected, and recommended, that you use some of these
+contexts to simplify reporting:
+
+- `{ actor: current_user }`: Assigns a variant and is "sticky" to each user
+ (or "client" if `current_user` is nil) who enters the experiment.
+- `{ project: project }`: Assigns a variant and is "sticky" to the project currently
+ being viewed. If running your experiment is more useful when viewing a project,
+ rather than when a specific user is viewing any project, consider this approach.
+- `{ group: group }`: Similar to the project example, but applies to a wider
+ scope of projects and users.
+- `{ actor: current_user, project: project }`: Assigns a variant and is "sticky"
+ to the user who is viewing the given project. This creates a different variant
+ assignment possibility for every project that `current_user` views. Understand this
+ can create a large cache size if an experiment like this in a highly trafficked part
+ of the application.
+- `{ wday: Time.current.wday }`: Assigns a variant based on the current day of the
+ week. In this example, it would consistently assign one variant on Friday, and a
+ potentially different variant on Saturday.
+
+Context is critical to how you define and report on your experiment. It's usually
+the most important aspect of how you choose to implement your experiment, so consider
+it carefully, and discuss it with the wider team if needed. Also, take into account
+that the context you choose affects our cache size.
+
+After the above examples, we can state the general case: *given a specific
+and consistent context, we can provide a consistent experience and track events for
+that experience.* To dive a bit deeper into the implementation details: a context key
+is generated from the context that's provided. Use this context key to:
+
+- Determine the assigned variant.
+- Identify events tracked against that context key.
+
+We can think about this as the experience that we've rendered, which is both dictated
+and tracked by the context key. The context key is used to track the interaction and
+results of the experience we've rendered to that context key. These concepts are
+somewhat abstract and hard to understand initially, but this approach enables us to
+communicate about experiments as something that's wider than just user behavior.
+
+NOTE:
+Using `actor:` utilizes cookies if the `current_user` is nil. If you don't need
+cookies though - meaning that the exposed functionality would only be visible to
+signed in users - `{ user: current_user }` would be just as effective.
+
+WARNING:
+The caching of variant assignment is done by using this context, and so consider
+your impact on the cache size when defining your experiment. If you use
+`{ time: Time.current }` you would be inflating the cache size every time the
+experiment is run. Not only that, your experiment would not be "sticky" and events
+wouldn't be resolvable.
+
+### Advanced experimentation
+
+GLEX allows for two general implementation styles:
+
+1. The simple experiment style described previously.
+1. A more advanced style where an experiment class can be provided.
+
+The advanced style is handled by naming convention, and works similar to what you
+would expect in Rails.
+
+To generate a custom experiment class that can override the defaults in
+`ApplicationExperiment` (our base GLEX implementation), use the rails generator:
+
+```shell
+rails generate gitlab:experiment pill_color control red blue
+```
+
+This generates an experiment class in `app/experiments/pill_color_experiment.rb`
+with the variants (or _behaviors_) we've provided to the generator. Here's an example
+of how that class would look after migrating the previous example into it:
+
+```ruby
+class PillColorExperiment < ApplicationExperiment
+ def control_behavior
+ 'control'
+ end
+
+ def red_behavior
+ 'red'
+ end
+
+ def blue_behavior
+ 'blue'
+ end
+end
+```
+
+We can now simplify where we run our experiment to the following call, instead of
+providing the block we were initially providing, by explicitly calling `run`:
+
+```ruby
+experiment(:pill_color, actor: current_user).run
+```
+
+The _behavior_ methods we defined in our experiment class represent the default
+implementation. You can still use the block syntax to override these _behavior_
+methods however, so the following would also be valid:
+
+```ruby
+experiment(:pill_color, actor: current_user) do |e|
+ e.use { '<strong>control</strong>' }
+end
+```
+
+NOTE:
+When passing a block to the `experiment` method, it is implicitly invoked as
+if `run` has been called.
+
+#### Segmentation rules
+
+You can use runtime segmentation rules to, for instance, segment contexts into a specific
+variant. The `segment` method is a callback (like `before_action`) and so allows providing
+a block or method name.
+
+In this example, any user named `'Richard'` would always be assigned the _red_
+variant, and any account older than 2 weeks old would be assigned the _blue_ variant:
+
+```ruby
+class PillColorExperiment < ApplicationExperiment
+ segment(variant: :red) { context.actor.first_name == 'Richard' }
+ segment :old_account?, variant: :blue
+
+ # ...behaviors
+
+ private
+
+ def old_account?
+ context.actor.created_at < 2.weeks.ago
+ end
+end
+```
+
+When an experiment runs, the segmentation rules are executed in the order they're
+defined. The first segmentation rule to produce a truthy result assigns the variant.
+
+In our example, any user named `'Richard'`, regardless of account age, will always
+be assigned the _red_ variant. If you want the opposite logic, flip the order.
+
+NOTE:
+Keep in mind when defining segmentation rules: after a truthy result, the remaining
+segmentation rules are skipped to achieve optimal performance.
+
+#### Exclusion rules
+
+Exclusion rules are similar to segmentation rules, but are intended to determine
+if a context should even be considered as something we should include in the experiment
+and track events toward. Exclusion means we don't care about the events in relation
+to the given context.
+
+These examples exclude all users named `'Richard'`, *and* any account
+older than 2 weeks old. Not only are they given the control behavior - which could
+be nothing - but no events are tracked in these cases as well.
+
+```ruby
+class PillColorExperiment < ApplicationExperiment
+ exclude :old_account?, ->{ context.actor.first_name == 'Richard' }
+
+ # ...behaviors
+
+ private
+
+ def old_account?
+ context.actor.created_at < 2.weeks.ago
+ end
+end
+```
+
+We can also do exclusion when we run the experiment. For instance,
+if we wanted to prevent the inclusion of non-administrators in an experiment, consider
+the following experiment. This type of logic enables us to do complex experiments
+while preventing us from passing things into our experiments, because
+we want to minimize passing things into our experiments:
+
+```ruby
+experiment(:pill_color, actor: current_user) do |e|
+ e.exclude! unless can?(current_user, :admin_project, project)
+end
+```
+
+You may also need to check exclusion in custom tracking logic by calling `should_track?`:
+
+```ruby
+class PillColorExperiment < ApplicationExperiment
+ # ...behaviors
+
+ def expensive_tracking_logic
+ return unless should_track?
+
+ track(:my_event, value: expensive_method_call)
+ end
+end
+```
+
+Exclusion rules aren't the best way to determine if an experiment is active. Override
+the `enabled?` method for a high-level way of determining if an experiment should
+run and track. Make the `enabled?` check as efficient as possible because it's the
+first early opt-out path an experiment can implement.
+
+### Tracking events
+
+One of the most important aspects of experiments is gathering data and reporting on
+it. GLEX provides an interface that allows tracking events across an experiment.
+You can implement it consistently if you provide the same context between
+calls to your experiment. If you do not yet understand context, you should read
+about contexts now.
+
+We can assume we run the experiment in one or a few places, but
+track events potentially in many places. The tracking call remains the same, with
+the arguments you would normally use when
+[tracking events using snowplow](../snowplow.md). The easiest example
+of tracking an event in Ruby would be:
+
+```ruby
+experiment(:pill_color, actor: current_user).track(:created)
+```
+
+When you run an experiment with any of these examples, an `:assigned` event
+is tracked automatically by default. All events that are tracked from an
+experiment have a special
+[experiment context](https://gitlab.com/gitlab-org/iglu/-/blob/master/public/schemas/com.gitlab/gitlab_experiment/jsonschema/1-0-0)
+added to the event. This can be used - typically by the data team - to create a connection
+between the events on a given experiment.
+
+If our current user hasn't encountered the experiment yet (meaning where the experiment
+is run), and we track an event for them, they are assigned a variant and see
+that variant if they ever encountered the experiment later, when an `:assignment`
+event would be tracked at that time for them.
+
+NOTE:
+GitLab tries to be sensitive and respectful of our customers regarding tracking,
+so GLEX allows us to implement an experiment without ever tracking identifying
+IDs. It's not always possible, though, based on experiment reporting requirements.
+You may be asked from time to time to track a specific record ID in experiments.
+The approach is largely up to the PM and engineer creating the implementation.
+No recommendations are provided here at this time.
+
+## Test with RSpec
+
+This gem provides some RSpec helpers and custom matchers. These are in flux as of GitLab 13.10.
+
+First, require the RSpec support file to mix in some of the basics:
+
+```ruby
+require 'gitlab/experiment/rspec'
+```
+
+You still need to include matchers and other aspects, which happens
+automatically for files in `spec/experiments`, but for other files and specs
+you want to include it in, you can specify the `:experiment` type:
+
+```ruby
+it "tests", :experiment do
+end
+```
+
+### Stub helpers
+
+You can stub experiments using `stub_experiments`. Pass it a hash using experiment
+names as the keys, and the variants you want each to resolve to, as the values:
+
+```ruby
+# Ensures the experiments named `:example` & `:example2` are both
+# "enabled" and that each will resolve to the given variant
+# (`:my_variant` & `:control` respectively).
+stub_experiments(example: :my_variant, example2: :control)
+
+experiment(:example) do |e|
+ e.enabled? # => true
+ e.variant.name # => 'my_variant'
+end
+
+experiment(:example2) do |e|
+ e.enabled? # => true
+ e.variant.name # => 'control'
+end
+```
+
+### Exclusion and segmentation matchers
+
+You can also test the exclusion and segmentation matchers.
+
+```ruby
+class ExampleExperiment < ApplicationExperiment
+ exclude { context.actor.first_name == 'Richard' }
+ segment(variant: :candidate) { context.actor.username == 'jejacks0n' }
+end
+
+excluded = double(username: 'rdiggitty', first_name: 'Richard')
+segmented = double(username: 'jejacks0n', first_name: 'Jeremy')
+
+# exclude matcher
+expect(experiment(:example)).to exclude(actor: excluded)
+expect(experiment(:example)).not_to exclude(actor: segmented)
+
+# segment matcher
+expect(experiment(:example)).to segment(actor: segmented).into(:candidate)
+expect(experiment(:example)).not_to segment(actor: excluded)
+```
+
+### Tracking matcher
+
+Tracking events is a major aspect of experimentation. We try
+to provide a flexible way to ensure your tracking calls are covered.
+
+You can do this on the instance level or at an "any instance" level:
+
+```ruby
+subject = experiment(:example)
+
+expect(subject).to track(:my_event)
+
+subject.track(:my_event)
+```
+
+You can use the `on_any_instance` chain method to specify that it could happen on
+any instance of the experiment. This helps you if you're calling
+`experiment(:example).track` downstream:
+
+```ruby
+expect(experiment(:example)).to track(:my_event).on_any_instance
+
+experiment(:example).track(:my_event)
+```
+
+A full example of the methods you can chain onto the `track` matcher:
+
+```ruby
+expect(experiment(:example)).to track(:my_event, value: 1, property: '_property_')
+ .on_any_instance
+ .with_context(foo: :bar)
+ .for(:variant_name)
+
+experiment(:example, :variant_name, foo: :bar).track(:my_event, value: 1, property: '_property_')
+```
+
+## Experiments in the client layer
+
+This is in flux as of GitLab 13.10, and can't be documented just yet.
+
+Any experiment that's been run in the request lifecycle surfaces in `window.gon.experiment`,
+and matches [this schema](https://gitlab.com/gitlab-org/iglu/-/blob/master/public/schemas/com.gitlab/gitlab_experiment/jsonschema/1-0-0)
+so you can use it when resolving some concepts around experimentation in the client layer.
+
+## Notes on feature flags
+
+NOTE:
+We use the terms "enabled" and "disabled" here, even though it's against our
+[documentation style guide recommendations](../documentation/styleguide/index.md#avoid-ableist-language)
+because these are the terms that the feature flag documentation uses.
+
+You may already be familiar with the concept of feature flags in GitLab, but using
+feature flags in experiments is a bit different. While in general terms, a feature flag
+is viewed as being either `on` or `off`, this isn't accurate for experiments.
+
+Generally, `off` means that when we ask if a feature flag is enabled, it will always
+return `false`, and `on` means that it will always return `true`. An interim state,
+considered `conditional`, also exists. GLEX takes advantage of this trinary state of
+feature flags. To understand this `conditional` aspect: consider that either of these
+settings puts a feature flag into this state:
+
+- Setting a `percentage_of_actors` of any percent greater than 0%.
+- Enabling it for a single user or group.
+
+Conditional means that it returns `true` in some situations, but not all situations.
+
+When a feature flag is disabled (meaning the state is `off`), the experiment is
+considered _inactive_. You can visualize this in the [decision tree diagram](#how-it-works)
+as reaching the first [Running?] node, and traversing the negative path.
+
+When a feature flag is rolled out to a `percentage_of_actors` or similar (meaning the
+state is `conditional`) the experiment is considered to be _running_
+where sometimes the control is assigned, and sometimes the candidate is assigned.
+We don't refer to this as being enabled, because that's a confusing and overloaded
+term here. In the experiment terms, our experiment is _running_, and the feature flag is
+`conditional`.
+
+When a feature flag is enabled (meaning the state is `on`), the candidate will always be
+assigned.
+
+We should try to be consistent with our terms, and so for experiments, we have an
+_inactive_ experiment until we set the feature flag to `conditional`. After which,
+our experiment is then considered _running_. If you choose to "enable" your feature flag,
+you should consider the experiment to be _resolved_, because everyone is assigned
+the candidate unless they've opted out of experimentation.
+
+As of GitLab 13.10, work is being done to improve this process and how we communicate
+about it.