summaryrefslogtreecommitdiff
path: root/doc/development/distributed_tracing.md
blob: e5293c0804c74047a7c2b1766d7efe07775b796f (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
---
stage: Monitor
group: Monitor
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---

# Distributed Tracing - development guidelines **(FREE)**

GitLab is instrumented for distributed tracing. Distributed Tracing in GitLab is currently considered **experimental**, as it has not yet been tested at scale on GitLab.com.

According to [Open Tracing](https://opentracing.io/docs/overview/what-is-tracing/):

> Distributed tracing, also called distributed request tracing, is a method used to profile and
> monitor applications, especially those built using a microservices architecture. Distributed
> tracing helps to pinpoint where failures occur and what causes poor performance.

Distributed tracing is especially helpful in understanding the life cycle of a request as it passes
through the different components of the GitLab application. At present, Workhorse, Rails, Sidekiq,
and Gitaly support tracing instrumentation.

Distributed tracing adds minimal overhead when disabled, but imposes only small overhead when
enabled and is therefore capable in any environment, including production. For this reason, it can
be useful in diagnosing production issues, particularly performance problems.

## Using Correlation IDs to investigate distributed requests

The GitLab application passes correlation IDs between the various components in a request. A
correlation ID is a token, unique to a single request, used to correlate a single request between
different GitLab subsystems (for example, Rails, Workhorse). Since correlation IDs are included in
log output, Engineers can use the correlation ID to correlate logs from different subsystems and
better understand the end-to-end path of a request through the system. When a request traverses
process boundaries, the correlation ID is injected into the outgoing request. This enables
the propagation of the correlation ID to each downstream subsystem.

Correlation IDs are normally generated in the Rails application in response to
certain web requests. Some user facing systems don't generate correlation IDs in
response to user requests (for example, Git pushes over SSH).

### Developer guidelines for working with correlation IDs

When integrating tracing into a new system, developers should avoid making
certain assumptions about correlation IDs. The following guidelines apply to
all subsystems at GitLab:

- Correlation IDs are always optional.
  - Never have non-tracing features depend on the existence of a correlation ID
    from an upstream system.
- Correlation IDs are always free text.
  - Correlation IDs should never be used to pass context (for example, a username or an IP address).
  - Correlation IDs should never be _parsed_, or manipulated in other ways (for example, split).

The [LabKit library](https://gitlab.com/gitlab-org/labkit) provides a standardized interface for working with GitLab
correlation IDs in the Go programming language. LabKit can be used as a
reference implementation for developers working with tracing and correlation IDs
on non-Go GitLab subsystems.

## Enabling distributed tracing

GitLab uses the `GITLAB_TRACING` environment variable to configure distributed tracing. The same
configuration is used for all components (e.g., Workhorse, Rails, etc).

When `GITLAB_TRACING` is not set, the application isn't instrumented, meaning that there is
no overhead at all.

To enable `GITLAB_TRACING`, a valid _"configuration-string"_ value should be set, with a URL-like
form:

```shell
GITLAB_TRACING=opentracing://<driver>?<param_name>=<param_value>&<param_name_2>=<param_value_2>
```

In this example, we have the following hypothetical values:

- `driver`: the driver. [GitLab supports
  `jaeger`](../operations/tracing.md). In future, other
  tracing implementations may also be supported.
- `param_name`, `param_value`: these are driver specific configuration values. Configuration
  parameters for Jaeger are documented [further on in this
  document](#2-configure-the-gitlab_tracing-environment-variable) they should be URL encoded.
  Multiple values should be separated by `&` characters like a URL.

## Using Jaeger in the GitLab Development Kit

The first tracing implementation that GitLab supports is Jaeger, and the [GitLab Development
Kit](https://gitlab.com/gitlab-org/gitlab-development-kit/) supports distributed tracing with
Jaeger out-of-the-box.

The easiest way to access tracing from a GDK environment is through the
[performance-bar](../administration/monitoring/performance/performance_bar.md). This can be shown
by typing `p` `b` in the browser window.

![Jaeger Search UI](img/distributed_tracing_performance_bar.png)

Once the performance bar is enabled, click on the **Trace** link in the performance bar to go to
the Jaeger UI.

The Jaeger search UI returns a query for the `Correlation-ID` of the current request. Normally,
this search should return a single trace result. Clicking this result shows the detail of the
trace in a hierarchical time-line.

![Jaeger Search UI](img/distributed_tracing_jaeger_ui.png)

## Using Jaeger without the GitLab Developer Kit

Distributed Tracing can be enabled in non-GDK development environments as well as production or
staging environments, for troubleshooting. Please note that at this time, this functionality is
experimental, and not supported in production environments at present. In this first release, it is intended to be
used for debugging in development environments only.

Jaeger tracing can be enabled through a three-step process:

1. [Start Jaeger](#1-start-jaeger).
1. [Configure the `GITLAB_TRACING` environment variable](#2-configure-the-gitlab_tracing-environment-variable).
1. [Start the GitLab application](#3-start-the-gitlab-application).
1. [Go to the Jaeger Search UI in your browser](#4-open-the-jaeger-search-ui).

### 1. Start Jaeger

Jaeger has many configuration options, but is very easy to start in an "all-in-one" mode which uses
memory for trace storage (and is therefore non-persistent). The main advantage of "all-in-one" mode
being ease of use.

For more detailed configuration options, refer to the [Jaeger
documentation](https://www.jaegertracing.io/docs/1.9/getting-started/).

#### Using Docker

If you have Docker available, the easier approach to running the Jaeger all-in-one is through
Docker, using the following command:

```shell
$ docker run \
  --rm \
  -e COLLECTOR_ZIPKIN_HTTP_PORT=9411  \
  -p 5775:5775/udp \
  -p 6831:6831/udp \
  -p 6832:6832/udp \
  -p 5778:5778 \
  -p 16686:16686 \
  -p 14268:14268 \
  -p 9411:9411 \
  jaegertracing/all-in-one:latest
```

#### Using the Jaeger process

Without Docker, the all-in-one process is still easy to setup.

1. Download the [latest Jaeger release](https://github.com/jaegertracing/jaeger/releases) for your
   platform.
1. Extract the archive and run the `bin/all-in-one` process.

This should start the process with the default listening ports.

### 2. Configure the `GITLAB_TRACING` environment variable

Once you have Jaeger running, configure the `GITLAB_TRACING` variable with the
appropriate configuration string.

If you're running everything on the same host, use the following value:

```shell
export GITLAB_TRACING="opentracing://jaeger?http_endpoint=http%3A%2F%2Flocalhost%3A14268%2Fapi%2Ftraces&sampler=const&sampler_param=1"
```

This configuration string uses the Jaeger driver `opentracing://jaeger` with the following options:

| Name | Value | Description |
|------|-------|-------------|
| `http_endpoint` | `http://localhost:14268/api/traces` | Configures Jaeger to send trace information to the HTTP endpoint running on `http://localhost:14268/`. Alternatively, the `upd_endpoint` can be used. |
| `sampler` | `const` | Configures Jaeger to use the constant sampler (either on or off). |
| `sampler_param` | `1` | Configures the `const` sampler to sample _all_ traces. Using `0` would sample _no_ traces. |

**Other parameter values are also possible:**

| Name | Example | Description |
|------|-------|-------------|
| `udp_endpoint` | `localhost:6831` | This is the default. Configures Jaeger to send trace information to the UDP listener on port `6831` using compact thrift protocol. Note that we've experienced some issues with the [Jaeger Client for Ruby](https://github.com/salemove/jaeger-client-ruby) when using this protocol. |
| `sampler` | `probabalistic` | Configures Jaeger to use a probabilistic random sampler. The rate of samples is configured by the `sampler_param` value. |
| `sampler_param` | `0.01` | Use a ratio of `0.01` to configure the `probabalistic` sampler to randomly sample _1%_ of traces. |
| `service_name` | `api` | Override the service name used by the Jaeger backend. This parameter takes precedence over the application-supplied value. |

NOTE:
The same `GITLAB_TRACING` value should to be configured in the environment
variables for all GitLab processes, including Workhorse, Gitaly, Rails, and Sidekiq.

### 3. Start the GitLab application

After the `GITLAB_TRACING` environment variable is exported to all GitLab services, start the
application.

When `GITLAB_TRACING` is configured properly, the application logs this on startup:

```shell
13:41:53 gitlab-workhorse.1      | 2019/02/12 13:41:53 Tracing enabled
...
13:41:54 gitaly.1                | 2019/02/12 13:41:54 Tracing enabled
...
```

If `GITLAB_TRACING` is not configured correctly, this issue is logged:

```shell
13:43:45 gitaly.1                | 2019/02/12 13:43:45 skipping tracing configuration step: tracer: unable to load driver mytracer
```

By default, GitLab ships with the Jaeger tracer, but other tracers can be included at compile time.
Details of how this can be done are included in the [LabKit tracing
documentation](https://pkg.go.dev/gitlab.com/gitlab-org/labkit/tracing).

If no log messages about tracing are emitted, the `GITLAB_TRACING` environment variable is likely
not set.

### 4. Open the Jaeger Search UI

By default, the Jaeger search UI is available at <http://localhost:16686/search>.

NOTE:
Don't forget that you must generate traces by using the application before
they appear in the Jaeger UI.