..
      Licensed under the Apache License, Version 2.0 (the "License"); you may
      not use this file except in compliance with the License. You may obtain
      a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
      License for the specific language governing permissions and limitations
      under the License.

      Convention for heading levels in Open vSwitch documentation:

      =======  Heading 0 (reserved for the title in a document)
      -------  Heading 1
      ~~~~~~~  Heading 2
      +++++++  Heading 3
      '''''''  Heading 4

      Avoid deeper levels because they do not render well.

=============================================
User Statically-Defined Tracing (USDT) probes
=============================================

Sometimes you need to troubleshoot one of the OVS components in the field.
One technique is to add dynamic tracepoints, for example using perf_. However,
the desired dynamic tracepoint, or the variable you want to inspect, might not
be available due to compiler optimizations.

In this case, a well-thought-out static tracepoint can be added permanently,
so it's always available. For OVS we use the DTrace probe macros, which have
little to no overhead when disabled. Various tools exist to enable them; some
examples are shown below.


Compiling with USDT probes enabled
----------------------------------

Since USDT probes are compiled out by default, a compile-time option is
available to include them. To add the probes to the generated code, use the
following configure option ::

    $ ./configure --enable-usdt-probes

The following line should be seen in the configure output when the above option
is used ::

    checking whether USDT probes are enabled... yes

As USDT probes internally use the ``DTRACE_PROBExx`` macros, which are part of
the SystemTap framework, you need to install the appropriate package for your
Linux distribution. For example, on Fedora, you need to install the
``systemtap-sdt-devel`` package.


Listing available probes
------------------------

There are various ways to display the USDT probes available in a userspace
application. Here we show three examples, all assuming that ovs-vswitchd is in
the search path and was built with USDT probes enabled.

You can use the **perf** tool as follows ::

    $ perf buildid-cache --add $(which ovs-vswitchd)
    $ perf list | grep sdt_
      sdt_main:poll_block                                [SDT event]
      sdt_main:run_start                                 [SDT event]

You can use the bpftrace_ tool ::

    # bpftrace -l "usdt:$(which ovs-vswitchd):*"
    usdt:/usr/sbin/ovs-vswitchd:main:poll_block
    usdt:/usr/sbin/ovs-vswitchd:main:run_start

.. note::

   If you run this against a running process, for example
   ``bpftrace -lp $(pidof ovs-vswitchd) "usdt:*"``, it lists all USDT
   events, including the ones provided by the shared libraries in use.

Finally, you can use the **tplist** tool which is part of the bcc_ framework ::

    $ /usr/share/bcc/tools/tplist -vv -l $(which ovs-vswitchd)
    b'main':b'poll_block' [sema 0x0]
      location #1 b'/usr/sbin/ovs-vswitchd' 0x407fdc
    b'main':b'run_start' [sema 0x0]
      location #1 b'/usr/sbin/ovs-vswitchd' 0x407ff6


Using probes
------------

We will use the OVS sandbox environment in combination with the probes shown
above to try out some of the available trace tools. To start up the virtual
environment, use the ``make sandbox`` command. In addition, we have to create
a bridge to kick off the main bridge loop ::

    $ ovs-vsctl add-br test_bridge
    $ ovs-vsctl show
    055acdca-2f0c-4f6e-b542-f4b6d2c44e08
        Bridge test_bridge
            Port test_bridge
                Interface test_bridge
                    type: internal

perf
~~~~

perf uses Linux uprobe-based event tracing to capture the events.
To enable the main:\* probes as displayed above and take an actual trace, you
need to execute the following sequence of perf commands ::

    # perf buildid-cache --add $(which ovs-vswitchd)

    # perf list | grep sdt_
      sdt_main:poll_block                                [SDT event]
      sdt_main:run_start                                 [SDT event]

    # perf probe --add=sdt_main:poll_block --add=sdt_main:run_start
    Added new events:
      sdt_main:poll_block  (on %poll_block in /usr/sbin/ovs-vswitchd)
      sdt_main:run_start   (on %run_start in /usr/sbin/ovs-vswitchd)

    You can now use it in all perf tools, such as:

      perf record -e sdt_main:run_start -aR sleep 1

    # perf record -e sdt_main:run_start -e sdt_main:poll_block \
        -p $(pidof ovs-vswitchd) sleep 30
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.039 MB perf.data (132 samples) ]

    # perf script
        ovs-vswitchd  8576 [011] 21031.340433:  sdt_main:run_start: (407ff6)
        ovs-vswitchd  8576 [011] 21031.340516: sdt_main:poll_block: (407fdc)
        ovs-vswitchd  8576 [011] 21031.841726:  sdt_main:run_start: (407ff6)
        ovs-vswitchd  8576 [011] 21031.842088: sdt_main:poll_block: (407fdc)
    ...

Note that the above example works with the sandbox environment, so make sure
you execute the commands from within the sandbox shell.

There are a lot more options available with perf, for example, the
``--call-graph dwarf`` option, which would give you a call graph in the
``perf script`` output. See the perf documentation for more information.

One other interesting feature is that the perf data can be converted for use
by the trace visualizer `Trace Compass`_. This can be done using the
``--all --to-ctf`` option to the ``perf data convert`` tool.


bpftrace
~~~~~~~~

bpftrace is a high-level tracing language based on eBPF, which can be used to
script USDT probes. Here we will show a simple one-liner to display the
USDT probes being hit. The Scripts section below references some more
advanced bpftrace scripts.

This is a simple bpftrace one-liner to show all ``main:*`` USDT probes ::

    # bpftrace -p $(pidof ovs-vswitchd) -e \
        'usdt::main:* { printf("%s %u [%u] %u %s\n",
          comm, pid, cpu, elapsed, probe); }'
    Attaching 2 probes...
    ovs-vswitchd 8576 [11] 203833199 usdt:main:run_start
    ovs-vswitchd 8576 [11] 204086854 usdt:main:poll_block
    ovs-vswitchd 8576 [11] 221611985 usdt:main:run_start
    ovs-vswitchd 8576 [11] 221892019 usdt:main:poll_block


bcc
~~~

The BPF Compiler Collection (BCC) is a set of tools and scripts that also use
eBPF for tracing. The example below uses the ``trace`` tool to show the events
while they are being generated ::

    # /usr/share/bcc/tools/trace -T -p $(pidof ovs-vswitchd) \
        'u::main:run_start' 'u::main:poll_block'
    TIME     PID     TID     COMM            FUNC
    15:49:06 8576    8576    ovs-vswitchd    main:run_start
    15:49:06 8576    8576    ovs-vswitchd    main:poll_block
    15:49:06 8576    8576    ovs-vswitchd    main:run_start
    15:49:06 8576    8576    ovs-vswitchd    main:poll_block
    ^C


Scripts
-------

To avoid reinventing the wheel when debugging complex OVS issues, a set of
scripts is provided in the source repository. They are located in the
``utilities/usdt-scripts/`` directory, and each script contains detailed
information on how it should be used and what information it provides.


Available probes
----------------

The next sections describe all the available probes, their use cases, and the
scripts, if any, that use them. Any new probe added to OVS should get its own
section. See the "Adding your own probes" section below for the naming
convention used.

Available probes in ``ovs-vswitchd``:

- dpif_netlink_operate\_\_:op_flow_del
- dpif_netlink_operate\_\_:op_flow_execute
- dpif_netlink_operate\_\_:op_flow_get
- dpif_netlink_operate\_\_:op_flow_put
- dpif_recv:recv_upcall
- main:poll_block
- main:run_start


dpif_netlink_operate\_\_:op_flow_del
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Description**:

This probe gets triggered when the Netlink datapath is about to execute the
DPIF_OP_FLOW_DEL operation as part of the dpif ``operate()`` callback.

**Arguments**:

- *arg0*: ``(struct dpif_netlink *) dpif``
- *arg1*: ``(struct dpif_flow_del *) del``
- *arg2*: ``(struct dpif_netlink_flow *) flow``
- *arg3*: ``(struct ofpbuf *) aux->request``

**Script references**:

- *None*


dpif_netlink_operate\_\_:op_flow_execute
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Description**:

This probe gets triggered when the Netlink datapath is about to execute the
DPIF_OP_EXECUTE operation as part of the dpif ``operate()`` callback.

**Arguments**:

- *arg0*: ``(struct dpif_netlink *) dpif``
- *arg1*: ``(struct dpif_execute *) op->execute``
- *arg2*: ``dp_packet_data(op->execute.packet)``
- *arg3*: ``dp_packet_size(op->execute.packet)``
- *arg4*: ``(struct ofpbuf *) aux->request``

**Script references**:

- ``utilities/usdt-scripts/dpif_nl_exec_monitor.py``
- ``utilities/usdt-scripts/upcall_cost.py``


dpif_netlink_operate\_\_:op_flow_get
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Description**:

This probe gets triggered when the Netlink datapath is about to execute the
DPIF_OP_FLOW_GET operation as part of the dpif ``operate()`` callback.

**Arguments**:

- *arg0*: ``(struct dpif_netlink *) dpif``
- *arg1*: ``(struct dpif_flow_get *) get``
- *arg2*: ``(struct dpif_netlink_flow *) flow``
- *arg3*: ``(struct ofpbuf *) aux->request``

**Script references**:

- *None*


dpif_netlink_operate\_\_:op_flow_put
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Description**:

This probe gets triggered when the Netlink datapath is about to execute the
DPIF_OP_FLOW_PUT operation as part of the dpif ``operate()`` callback.

**Arguments**:

- *arg0*: ``(struct dpif_netlink *) dpif``
- *arg1*: ``(struct dpif_flow_put *) put``
- *arg2*: ``(struct dpif_netlink_flow *) flow``
- *arg3*: ``(struct ofpbuf *) aux->request``

**Script references**:

- ``utilities/usdt-scripts/upcall_cost.py``


probe dpif_recv:recv_upcall
~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Description**:

This probe gets triggered when the datapath-independent layer gets notified
that a packet needs to be processed by userspace. This allows the probe to
intercept all packets sent by the kernel to ``ovs-vswitchd``. The
``upcall_monitor.py`` script uses this probe to display and capture all packets
sent to ``ovs-vswitchd``.

**Arguments**:

- *arg0*: ``(struct dpif *)->full_name``
- *arg1*: ``(struct dpif_upcall *)->type``
- *arg2*: ``dp_packet_data((struct dpif_upcall *)->packet)``
- *arg3*: ``dp_packet_size((struct dpif_upcall *)->packet)``
- *arg4*: ``(struct dpif_upcall *)->key``
- *arg5*: ``(struct dpif_upcall *)->key_len``

**Script references**:

- ``utilities/usdt-scripts/upcall_cost.py``
- ``utilities/usdt-scripts/upcall_monitor.py``


main:run_start
~~~~~~~~~~~~~~

**Description**:

The ovs-vswitchd main process contains a loop that runs every time some work
needs to be done. This probe gets triggered every time the loop starts from the
beginning. See also the ``main:poll_block`` probe below.

**Arguments**:

*None*

**Script references**:

- ``utilities/usdt-scripts/bridge_loop.bt``


main:poll_block
~~~~~~~~~~~~~~~

**Description**:

The ovs-vswitchd main process contains a loop that runs every time some work
needs to be done. This probe gets triggered every time the loop is done and is
about to wait in ``poll_block()``; the loop restarts when that call returns.
See also the ``main:run_start`` probe above.

**Arguments**:

*None*

**Script references**:

- ``utilities/usdt-scripts/bridge_loop.bt``

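The relation between ``main:run_start`` and ``main:poll_block`` can be pictured
with a small C sketch of such a loop. It is not the ovs-vswitchd source: the
``OVS_USDT_PROBE`` definition below is a stand-in that only counts hits, and
``sketch_main_loop()``, ``sketch_do_work()`` and ``sketch_wait_for_work()`` are
hypothetical names used for illustration ::

    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    static int run_start_hits;
    static int poll_block_hits;

    /* Hypothetical helper: counts and prints every probe hit. */
    static void
    sketch_hit(const char *probe)
    {
        if (!strcmp(probe, "main:run_start")) {
            run_start_hits++;
        } else if (!strcmp(probe, "main:poll_block")) {
            poll_block_hits++;
        }
        printf("%s\n", probe);
    }

    /* Stand-in for the real macro, so the sketch builds on its own. */
    #define OVS_USDT_PROBE(provider, name) sketch_hit(#provider ":" #name)

    /* Placeholders for the actual work and for the blocking wait. */
    static void sketch_do_work(void) { }
    static void sketch_wait_for_work(void) { }

    /* Runs the loop a fixed number of times instead of until exit. */
    static void
    sketch_main_loop(int iterations)
    {
        assert(iterations >= 0);
        for (int i = 0; i < iterations; i++) {
            OVS_USDT_PROBE(main, run_start);   /* Loop starts over. */
            sketch_do_work();
            OVS_USDT_PROBE(main, poll_block);  /* Loop done, about to wait. */
            sketch_wait_for_work();
        }
    }

Under this reading, the time between a ``main:run_start`` hit and the next
``main:poll_block`` hit is time spent doing work, and the time between a
``main:poll_block`` hit and the next ``main:run_start`` hit is time spent
waiting.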

Adding your own probes
----------------------

Adding your own probes is as simple as adding the ``OVS_USDT_PROBE()`` macro
to the code. It's similar to the ``DTRACE_PROBExx`` macros, with the difference
that it automatically determines the number of optional arguments.
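
How a macro can count its optional arguments can be pictured with a small,
self-contained C sketch. It is not the OVS implementation:
``SKETCH_USDT_PROBE``, ``SKETCH_PICK`` and the ``SKETCH_PROBEn`` stand-ins
(which take the place of the ``DTRACE_PROBExx`` macros and only print what they
receive) are hypothetical names, and only up to two optional arguments are
handled to keep the sketch short ::

    #include <assert.h>
    #include <stdio.h>

    /* Optional argument count seen by the most recent stand-in probe. */
    static int sketch_last_argc = -1;

    /* Hypothetical helper: prints the probe and records its count. */
    static void
    sketch_record(const char *probe, int argc)
    {
        sketch_last_argc = argc;
        printf("%s fired with %d optional argument(s)\n", probe, argc);
    }

    /* Hypothetical stand-ins for the fixed-arity DTRACE_PROBExx macros. */
    #define SKETCH_PROBE0(provider, name) \
        sketch_record(#provider ":" #name, 0)
    #define SKETCH_PROBE1(provider, name, a1) \
        ((void) (a1), sketch_record(#provider ":" #name, 1))
    #define SKETCH_PROBE2(provider, name, a1, a2) \
        ((void) (a1), (void) (a2), sketch_record(#provider ":" #name, 2))

    /* Every extra argument shifts the candidate list one slot to the
     * right, so the macro matching the count lands in the NAME slot. */
    #define SKETCH_PICK(provider, name, a1, a2, NAME, ...) NAME

    #define SKETCH_USDT_PROBE(...)                                    \
        SKETCH_PICK(__VA_ARGS__, SKETCH_PROBE2, SKETCH_PROBE1,        \
                    SKETCH_PROBE0, unused)(__VA_ARGS__)

    /* Fires a probe with 0, 1 or 2 optional arguments and returns the
     * count that the dispatch macro actually selected. */
    static int
    sketch_fire(int n_optional)
    {
        int value = 42;
        const char *text = "example";

        assert(n_optional >= 0 && n_optional <= 2);
        if (n_optional == 0) {
            SKETCH_USDT_PROBE(main, run_start);
        } else if (n_optional == 1) {
            SKETCH_USDT_PROBE(sketch_fire, one_arg, value);
        } else {
            SKETCH_USDT_PROBE(sketch_fire, two_args, value, text);
        }
        return sketch_last_argc;
    }

In this sketch, ``sketch_fire(2)`` goes through the ``SKETCH_PROBE2`` stand-in
and returns 2, without the caller having to spell out the argument count.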

The macro requires at least two arguments: the *provider* and the *name*. To
keep the probe naming consistent, please use the following convention. The
*provider* should be the function name, and the *name* should be the name of
the tracepoint. For probes that mark function entry and exit, please use
``entry`` and ``exit`` as the *name*.

If, for some reason, you do not want to use the function name as the
*provider*, please prefix it with ``__``, so we know it's not a function name.

The remaining parameters, up to 10, can be variables, pointers, etc., that
might be of interest to capture at this point in the code. Note that the
provided variables can cause the compiler to be less effective in optimizing.
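
As an illustration of the naming convention, the C sketch below instruments a
made-up function. Everything in it is hypothetical: ``flow_cache_lookup()`` and
the ``__flow_cache`` provider do not exist in OVS, and the ``OVS_USDT_PROBE``
definition is a stand-in that only prints its arguments, so that the example
builds without the OVS headers ::

    #include <assert.h>
    #include <stdio.h>

    static int sketch_probe_hits;

    /* Hypothetical helper: prints the probe text and counts the hit. */
    static void
    sketch_hit(const char *probe_text)
    {
        sketch_probe_hits++;
        printf("probe(%s)\n", probe_text);
    }

    /* Stand-in for the real macro; it does not create a USDT probe. */
    #define OVS_USDT_PROBE(...) sketch_hit(#__VA_ARGS__)

    /* Made-up function: the provider is the function name, and the
     * entry and exit probes use the names "entry" and "exit". */
    static int
    flow_cache_lookup(int key)
    {
        int hit;

        assert(key >= 0);
        OVS_USDT_PROBE(flow_cache_lookup, entry, key);

        hit = key % 2 == 0;     /* Placeholder for a real lookup. */
        if (!hit) {
            /* Provider that is not a function name, so it gets "__". */
            OVS_USDT_PROBE(__flow_cache, miss, key);
        }

        OVS_USDT_PROBE(flow_cache_lookup, exit, key, hit);
        return hit;
    }

With the real macro, and assuming the function ended up in ``ovs-vswitchd``,
the listing tools shown earlier would be expected to report these probes as
``flow_cache_lookup:entry``, ``flow_cache_lookup:exit`` and
``__flow_cache:miss``.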



.. _perf : https://developers.redhat.com/blog/2020/05/29/debugging-vhost-user-tx-contention-in-open-vswitch#
.. _bpftrace : https://github.com/iovisor/bpftrace
.. _bcc : https://github.com/iovisor/bcc
.. _Trace Compass : https://www.eclipse.org/tracecompass/