From ff4c712d45d7b696039061b1821a39dcd72d88fb Mon Sep 17 00:00:00 2001
From: Eelco Chaudron
Date: Wed, 22 Dec 2021 10:17:44 +0100
Subject: Documentation: Add USDT documentation and bpftrace example.

Add the USDT documentation and a bpftrace example using the bridge
run USDT probes.

Signed-off-by: Eelco Chaudron
Acked-by: Paolo Valerio
Signed-off-by: Ilya Maximets
---
 Documentation/automake.mk            |   1 +
 Documentation/topics/index.rst       |   1 +
 Documentation/topics/usdt-probes.rst | 269 +++++++++++++++++++++++++++++++++++
 3 files changed, 271 insertions(+)
 create mode 100644 Documentation/topics/usdt-probes.rst

diff --git a/Documentation/automake.mk b/Documentation/automake.mk
index 01e3c4f9e..6c2c57739 100644
--- a/Documentation/automake.mk
+++ b/Documentation/automake.mk
@@ -57,6 +57,7 @@ DOC_SOURCE = \
 	Documentation/topics/porting.rst \
 	Documentation/topics/record-replay.rst \
 	Documentation/topics/tracing.rst \
+	Documentation/topics/usdt-probes.rst \
 	Documentation/topics/userspace-tso.rst \
 	Documentation/topics/userspace-tx-steering.rst \
 	Documentation/topics/windows.rst \
diff --git a/Documentation/topics/index.rst b/Documentation/topics/index.rst
index 3699fd5c4..90d4c66e6 100644
--- a/Documentation/topics/index.rst
+++ b/Documentation/topics/index.rst
@@ -56,3 +56,4 @@ OVS
    idl-compound-indexes
    ovs-extensions
    userspace-tx-steering
+   usdt-probes
diff --git a/Documentation/topics/usdt-probes.rst b/Documentation/topics/usdt-probes.rst
new file mode 100644
index 000000000..8aa596b80
--- /dev/null
+++ b/Documentation/topics/usdt-probes.rst
@@ -0,0 +1,269 @@
+..
+      Licensed under the Apache License, Version 2.0 (the "License"); you may
+      not use this file except in compliance with the License. You may obtain
+      a copy of the License at
+
+          http://www.apache.org/licenses/LICENSE-2.0
+
+      Unless required by applicable law or agreed to in writing, software
+      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+      License for the specific language governing permissions and limitations
+      under the License.
+
+      Convention for heading levels in Open vSwitch documentation:
+
+      =======  Heading 0 (reserved for the title in a document)
+      -------  Heading 1
+      ~~~~~~~  Heading 2
+      +++++++  Heading 3
+      '''''''  Heading 4
+
+      Avoid deeper levels because they do not render well.
+
+=============================================
+User Statically-Defined Tracing (USDT) probes
+=============================================
+
+Sometimes it is necessary to troubleshoot one of OVS's components in the field.
+One technique is to add dynamic tracepoints, for example, using perf_.
+However, the desired dynamic tracepoint and/or the desired variable might not
+be available due to compiler optimizations.
+
+In this case, a well-thought-out static tracepoint can be added permanently,
+so it is always available. For OVS, we use the DTrace probe macros, which have
+little to no overhead when disabled. Various tools exist to enable them; some
+examples are shown below.
+
+
+Compiling with USDT probes enabled
+----------------------------------
+
+Since USDT probes are compiled out by default, a compile-time option is
+available to include them. To add the probes to the generated code, use the
+following configure option ::
+
+    $ ./configure --enable-usdt-probes
+
+When the above option is used, the following line should appear in the
+configure output ::
+
+    checking whether USDT probes are enabled... yes
+
+
+Listing available probes
+------------------------
+
+There are various ways to display the USDT probes available in a userspace
+application. Three examples are shown below; all of them assume that an
+ovs-vswitchd binary built with USDT probes enabled is in the search path.
+
+You can use the **perf** tool as follows ::
+
+    $ perf buildid-cache --add $(which ovs-vswitchd)
+    $ perf list | grep sdt_
+      sdt_main:poll_block [SDT event]
+      sdt_main:run_start [SDT event]
+
+You can use the bpftrace_ tool ::
+
+    # bpftrace -l "usdt:$(which ovs-vswitchd):*"
+    usdt:/usr/sbin/ovs-vswitchd:main:poll_block
+    usdt:/usr/sbin/ovs-vswitchd:main:run_start
+
+.. note::
+
+   If you run this against a running process, for example,
+   ``bpftrace -lp $(pidof ovs-vswitchd) "usdt:*"``, it lists all USDT
+   probes, including those in the shared libraries used by the process.
+
+Finally, you can use the **tplist** tool, which is part of the bcc_ framework ::
+
+    $ /usr/share/bcc/tools/tplist -vv -l $(which ovs-vswitchd)
+    b'main':b'poll_block' [sema 0x0]
+      location #1 b'/usr/sbin/ovs-vswitchd' 0x407fdc
+    b'main':b'run_start' [sema 0x0]
+      location #1 b'/usr/sbin/ovs-vswitchd' 0x407ff6
+
+
+Using probes
+------------
+
+We will use the OVS sandbox environment in combination with the probes shown
+above to try out some of the available trace tools. To start up the virtual
+environment, use the ``make sandbox`` command. In addition, we have to create
+a bridge to kick off the main bridge loop ::
+
+    $ ovs-vsctl add-br test_bridge
+    $ ovs-vsctl show
+    055acdca-2f0c-4f6e-b542-f4b6d2c44e08
+        Bridge test_bridge
+            Port test_bridge
+                Interface test_bridge
+                    type: internal
+
+perf
+~~~~
+
+Perf uses Linux uprobe-based event tracing to capture the events.
+To enable the ``main:*`` probes displayed above and take an actual trace, you
+need to execute the following sequence of perf commands ::
+
+    # perf buildid-cache --add $(which ovs-vswitchd)
+
+    # perf list | grep sdt_
+      sdt_main:poll_block [SDT event]
+      sdt_main:run_start [SDT event]
+
+    # perf probe --add=sdt_main:poll_block --add=sdt_main:run_start
+    Added new events:
+      sdt_main:poll_block (on %poll_block in /usr/sbin/ovs-vswitchd)
+      sdt_main:run_start (on %run_start in /usr/sbin/ovs-vswitchd)
+
+    You can now use it in all perf tools, such as:
+
+      perf record -e sdt_main:run_start -aR sleep 1
+
+    # perf record -e sdt_main:run_start -e sdt_main:poll_block \
+        -p $(pidof ovs-vswitchd) sleep 30
+    [ perf record: Woken up 1 times to write data ]
+    [ perf record: Captured and wrote 0.039 MB perf.data (132 samples) ]
+
+    # perf script
+      ovs-vswitchd 8576 [011] 21031.340433: sdt_main:run_start: (407ff6)
+      ovs-vswitchd 8576 [011] 21031.340516: sdt_main:poll_block: (407fdc)
+      ovs-vswitchd 8576 [011] 21031.841726: sdt_main:run_start: (407ff6)
+      ovs-vswitchd 8576 [011] 21031.842088: sdt_main:poll_block: (407fdc)
+    ...
+
+Note that the above examples work with the sandbox environment, so make sure
+you execute the above commands while in the sandbox shell!
+
+There are many more options available with perf, for example, the
+``--call-graph dwarf`` option, which adds a call graph to the
+``perf script`` output. See the perf documentation for more information.
+
+Another interesting feature is that the perf data can be converted for use
+by the trace visualizer `Trace Compass`_. This can be done using the
+``--all --to-ctf`` options of the ``perf data convert`` command.
+
+
+bpftrace
+~~~~~~~~
+
+bpftrace is a high-level tracing language based on eBPF, which can be used to
+script USDT probes. Here we will show a simple one-liner that displays the
+USDT probes being hit. The Scripts section below references some more
+advanced bpftrace scripts.
+
+This is a simple bpftrace one-liner to show all ``main:*`` USDT probes ::
+
+    # bpftrace -p $(pidof ovs-vswitchd) -e \
+        'usdt::main:* { printf("%s %u [%u] %u %s\n",
+                               comm, pid, cpu, elapsed, probe); }'
+    Attaching 2 probes...
+    ovs-vswitchd 8576 [11] 203833199 usdt:main:run_start
+    ovs-vswitchd 8576 [11] 204086854 usdt:main:poll_block
+    ovs-vswitchd 8576 [11] 221611985 usdt:main:run_start
+    ovs-vswitchd 8576 [11] 221892019 usdt:main:poll_block
+
+
+bcc
+~~~
+
+The BPF Compiler Collection (BCC) is a set of tools and scripts that also use
+eBPF for tracing. The example below uses the ``trace`` tool to show the events
+while they are being generated ::
+
+    # /usr/share/bcc/tools/trace -T -p $(pidof ovs-vswitchd) \
+        'u::main:run_start' 'u::main:poll_block'
+    TIME     PID     TID     COMM            FUNC
+    15:49:06 8576    8576    ovs-vswitchd    main:run_start
+    15:49:06 8576    8576    ovs-vswitchd    main:poll_block
+    15:49:06 8576    8576    ovs-vswitchd    main:run_start
+    15:49:06 8576    8576    ovs-vswitchd    main:poll_block
+    ^C
+
+
+Scripts
+-------
+To avoid having to reinvent the wheel when debugging complex OVS issues, a
+set of scripts is provided in the source repository. They are located in the
+``utilities/usdt-scripts/`` directory, and each script contains detailed
+information on how it should be used and what information it provides.
+
+
+Available probes
+----------------
+The next sections describe all the available probes, their use cases, and,
+if they are used in any script, which one. Any new probe added to OVS should
+get its own section. See the "Adding your own probes" section below for the
+naming convention to use.
+
+Available probes in ``ovs-vswitchd``:
+
+- main:poll_block
+- main:run_start
+
+
+probe main:run_start
+~~~~~~~~~~~~~~~~~~~~
+
+**Description**:
+The ovs-vswitchd main process contains a loop that runs every time some work
+needs to be done. This probe is triggered every time a new iteration of the
+loop starts. See also the ``main:poll_block`` probe below.
+
+**Arguments**:
+
+*None*
+
+**Script references**:
+
+- ``utilities/usdt-scripts/bridge_loop.bt``
+
+
+probe main:poll_block
+~~~~~~~~~~~~~~~~~~~~~
+
+**Description**:
+The ovs-vswitchd main process contains a loop that runs every time some work
+needs to be done. This probe is triggered every time an iteration of the loop
+is finished, right before the process blocks in a poll_block() call waiting
+for the loop to be restarted. See also the ``main:run_start`` probe above.
+
+**Arguments**:
+
+*None*
+
+**Script references**:
+
+- ``utilities/usdt-scripts/bridge_loop.bt``
+
+
+Adding your own probes
+----------------------
+
+Adding your own probes is as simple as adding the ``OVS_USDT_PROBE()`` macro
+to the code. It is similar to the ``DTRACE_PROBExx`` macros, with the
+difference that it automatically determines the number of optional arguments.
+
+The macro requires at least two arguments: the first one is the *provider*,
+and the second one is the *name*. To keep the probe naming consistent, please
+use the following convention: the *provider* should be the function name, and
+the *name* should be the name of the tracepoint. If you add function entry and
+exit probes, please use ``entry`` and ``exit`` as their names.
+
+If, for some reason, you do not want to use the function name as a *provider*,
+please prefix it with ``__``, so it is clear that it is not a function name.
+
+The remaining parameters, up to 10, can be variables, pointers, etc., that
+might be of interest to capture at this point in the code. Note that passing
+variables to a probe can make the compiler less effective at optimizing.
+
+
+
+.. _perf : https://developers.redhat.com/blog/2020/05/29/debugging-vhost-user-tx-contention-in-open-vswitch
+.. _bpftrace : https://github.com/iovisor/bpftrace
+.. _bcc : https://github.com/iovisor/bcc
+.. _Trace Compass : https://www.eclipse.org/tracecompass/