..
      Licensed under the Apache License, Version 2.0 (the "License"); you may
      not use this file except in compliance with the License. You may obtain
      a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
      License for the specific language governing permissions and limitations
      under the License.

      Convention for heading levels in Open vSwitch documentation:

      =======  Heading 0 (reserved for the title in a document)
      -------  Heading 1
      ~~~~~~~  Heading 2
      +++++++  Heading 3
      '''''''  Heading 4

      Avoid deeper levels because they do not render well.

============
DPDK Bridges
============

A bridge must be specially configured to utilize DPDK-backed
:doc:`physical <phy>` and :doc:`virtual <vhost-user>` ports.

Quick Example
-------------

This example demonstrates how to add a bridge that will take advantage
of DPDK::

    $ ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev

This assumes Open vSwitch has been built with DPDK support. Refer to
:doc:`/intro/install/dpdk` for more information.
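
Once the bridge is created, DPDK ports can be added to it. As an
illustration, the following adds a physical DPDK port, assuming a
hypothetical PCI address of ``0000:01:00.0``; refer to :doc:`phy` for the
details of physical port configuration::

    $ ovs-vsctl add-port br0 dpdk-p0 -- \
        set Interface dpdk-p0 type=dpdk options:dpdk-devargs=0000:01:00.0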

.. _extended-statistics:

Extended & Custom Statistics
----------------------------

The DPDK Extended Statistics API allows PMDs to expose a unique set of
statistics.  The Extended Statistics are implemented and supported only for
DPDK physical and vHost ports. Custom statistics are a dynamic set of counters
which can vary depending on the driver. Those statistics are implemented for
DPDK physical ports and contain all "dropped", "error" and "management"
counters from ``XSTATS``.  A list of all ``XSTATS`` counters can be found
`here`__.

__ https://wiki.opnfv.org/display/fastpath/Collectd+Metrics+and+Events

.. note::

    vHost ports only support RX packet size-based counters. TX packet size
    counters are not available.

To enable statistics, you have to enable OpenFlow 1.4 support for OVS. To
configure a bridge, ``br0``, to support OpenFlow version 1.4, run::

    $ ovs-vsctl set bridge br0 datapath_type=netdev \
      protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14

Once configured, check the OVSDB protocols column in the bridge table to ensure
OpenFlow 1.4 support is enabled::

    $ ovsdb-client dump Bridge protocols

You can also query the port statistics by explicitly specifying the ``-O
OpenFlow14`` option::

    $ ovs-ofctl -O OpenFlow14 dump-ports br0

There are also custom statistics that OVS accumulates itself; these stats have
an ``ovs_`` prefix. These custom stats are shown along with the other stats
using the following command::

    $ ovs-vsctl get Interface <iface> statistics
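
The output is a map of counter names to values. As an illustrative sketch
(the exact set of counters depends on the port type and driver, and the
``dpdk-p0`` name is a hypothetical example), the ``ovs_``-prefixed entries
appear alongside the standard ones::

    $ ovs-vsctl get Interface dpdk-p0 statistics
    {..., ovs_rx_qos_drops=0, ovs_tx_failure_drops=0, ...}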

EMC Insertion Probability
-------------------------

By default, 1 in every 100 flows is inserted into the Exact Match Cache (EMC).
It is possible to change this insertion probability by setting the
``emc-insert-inv-prob`` option::

    $ ovs-vsctl --no-wait set Open_vSwitch . other_config:emc-insert-inv-prob=N

where:

``N``
  A positive integer representing the inverse probability of insertion, i.e. on
  average 1 in every ``N`` packets with a unique flow will generate an EMC
  insertion.

If ``N`` is set to 1, an insertion will be performed for every flow. If set to
0, no insertions will be performed and the EMC will effectively be disabled.

With the default ``N`` of 100, a higher number of megaflow hits will initially
be observed, as can be seen in the PMD stats::

    $ ovs-appctl dpif-netdev/pmd-stats-show

For certain traffic profiles with many parallel flows, it is recommended to
set ``N`` to 0 to achieve higher forwarding performance.
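
For example, to disable EMC insertion entirely::

    $ ovs-vsctl --no-wait set Open_vSwitch . other_config:emc-insert-inv-prob=0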

It is also possible to enable/disable the EMC on a per-port basis using::

    $ ovs-vsctl set interface <iface> other_config:emc-enable={true,false}

.. note::

   This can be useful in cases where a different number of flows is expected
   on different ports. For example, if one of the VMs encapsulates traffic
   using additional headers, it will receive a large number of flows, but only
   a few flows will come out of this VM. In this scenario, it is much faster
   to use the EMC instead of the classifier for traffic from the VM, but it is
   better to disable the EMC for the traffic which flows to the VM.
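
For instance, assuming the traffic destined to such a VM ingresses on a
hypothetical physical port ``dpdk-p0``, the EMC can be disabled for that
traffic as follows::

    $ ovs-vsctl set interface dpdk-p0 other_config:emc-enable=false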

For more information on the EMC, refer to :doc:`/intro/install/dpdk`.


SMC cache
---------

The signature match cache (SMC) is a cache level after the EMC. The difference
between the SMC and the EMC is that the SMC only stores a signature of a flow,
and is thus much more memory efficient. With the same memory space, the EMC
can store 8k flows while the SMC can store 1M flows. When the traffic flow
count is much larger than the EMC size, it is generally beneficial to turn off
the EMC and turn on the SMC. The SMC is currently turned off by default.

To turn on SMC::

    $ ovs-vsctl --no-wait set Open_vSwitch . other_config:smc-enable=true
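
As a sketch of the setup suggested above for very large flow counts, the SMC
can be enabled while EMC insertion is disabled::

    $ ovs-vsctl --no-wait set Open_vSwitch . other_config:smc-enable=true
    $ ovs-vsctl --no-wait set Open_vSwitch . other_config:emc-insert-inv-prob=0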

Datapath Classifier Performance
-------------------------------

The datapath classifier (dpcls) performs wildcard rule matching, a compute
intensive process of matching a packet ``miniflow`` to a rule ``miniflow``. The
code that does this compute work impacts datapath performance, and optimizing
it can provide higher switching performance.

Modern CPUs provide extensive SIMD instructions which can be used to get higher
performance. The CPU OVS is being deployed on must be capable of running these
SIMD instructions in order to take advantage of the performance benefits.
In OVS v2.14, runtime CPU detection was introduced to identify whether these
CPU ISA additions are available, and to allow the user to enable them.

OVS provides multiple implementations of dpcls. The following command enables
the user to check what implementations are available in a running instance ::

    $ ovs-appctl dpif-netdev/subtable-lookup-prio-get
    Available lookup functions (priority : name)
            0 : autovalidator
            1 : generic
            0 : avx512_gather

To set the priority of a lookup function, run the ``prio-set`` command ::

    $ ovs-appctl dpif-netdev/subtable-lookup-prio-set avx512_gather 5
    Lookup priority change affected 1 dpcls ports and 1 subtables.

The highest priority lookup function is used for classification, and the
output above indicates that one subtable of one DPCLS port has changed its
lookup function due to the command being run. To verify the prioritization,
re-run the get command and note the updated priority of the ``avx512_gather``
function ::

    $ ovs-appctl dpif-netdev/subtable-lookup-prio-get
    Available lookup functions (priority : name)
            0 : autovalidator
            1 : generic
            5 : avx512_gather

If two lookup functions have the same priority, the first one in the list is
chosen and the second occurrence of that priority is not used. Put in logical
terms, a subtable is chosen only if its priority is strictly greater than that
of the previous best candidate.

CPU ISA Testing and Validation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As multiple versions of DPCLS can co-exist, each with different CPU ISA
optimizations, it is important to validate that they all give the exact same
results. To easily test all DPCLS implementations, an ``autovalidator``
implementation of the DPCLS exists. This implementation runs all other
available DPCLS implementations, and verifies that the results are identical.

Running the OVS unit tests with the autovalidator enabled ensures all
implementations provide the same results. Note that the performance of the
autovalidator is lower than all other implementations, as it tests the scalar
implementation against itself, and against all other enabled DPCLS
implementations.

To adjust the DPCLS autovalidator priority, use this command ::

    $ ovs-appctl dpif-netdev/subtable-lookup-prio-set autovalidator 7

Running Unit Tests with Autovalidator
+++++++++++++++++++++++++++++++++++++

To run the OVS unit test suite with the DPCLS autovalidator as the default
implementation, OVS must be recompiled. During the recompilation, the default
priority of the ``autovalidator`` implementation is set to the maximum
priority, ensuring that every test is run with every lookup
implementation ::

    $ ./configure --enable-autovalidator

Compile OVS in debug mode to have ``ovs_assert`` statements error out if there
is a mismatch in the DPCLS lookup implementation.
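
A minimal sketch of the full workflow, assuming a standard autotools build,
might look like::

    $ ./configure --enable-autovalidator
    $ make -j4
    $ make check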

Datapath Interface Performance
------------------------------

The datapath interface (DPIF), implemented by dp_netdev_input(), is
responsible for taking packets through the major components of the userspace
datapath, such as the miniflow_extract, EMC, SMC and DPCLS lookups, as well as
updating many of the performance stats associated with the datapath.

Just like with the SIMD DPCLS feature above, SIMD can be applied to the DPIF to
improve performance.

OVS provides multiple implementations of the DPIF. The available
implementations can be listed with the following command ::

    $ ovs-appctl dpif-netdev/dpif-impl-get
    Available DPIF implementations:
      dpif_scalar (pmds: none)
      dpif_avx512 (pmds: 1,2,6,7)

By default, dpif_scalar is used. The DPIF implementation can be selected by
name ::

    $ ovs-appctl dpif-netdev/dpif-impl-set dpif_avx512
    DPIF implementation set to dpif_avx512.

    $ ovs-appctl dpif-netdev/dpif-impl-set dpif_scalar
    DPIF implementation set to dpif_scalar.
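
To verify the change, re-run the get command; as an illustrative sketch, the
PMD cores should now be listed against the selected implementation ::

    $ ovs-appctl dpif-netdev/dpif-impl-get
    Available DPIF implementations:
      dpif_scalar (pmds: 1,2,6,7)
      dpif_avx512 (pmds: none)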

Running Unit Tests with AVX512 DPIF
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since the AVX512 DPIF is disabled by default, a compile-time option is
available in order to test it with the OVS unit test suite. When building with
a CPU that supports AVX512, use the following configure option ::

    $ ./configure --enable-dpif-default-avx512

The following line should be seen in the configure output when the above option
is used ::

    checking whether DPIF AVX512 is default implementation... yes
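
The unit test suite can then be run as usual; with the above option, tests
that exercise the userspace datapath will use the AVX512 DPIF ::

    $ make check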

Miniflow Extract
----------------

Miniflow extract (MFEX) parses the raw packet and extracts the important
header information into a compressed miniflow. This miniflow is composed of
bits and blocks: the bits signify which blocks are set or have values, while
the blocks hold the metadata, IP, UDP, VLAN, etc. These values are used by the
datapath for switching decisions later. The optimized miniflow extract
implementations are traffic-specific to speed up the lookup, whereas the
scalar implementation works for all traffic patterns.

Most modern CPUs have SIMD capabilities. These SIMD instructions are able
to process a vector rather than act on one variable. OVS provides multiple
implementations of miniflow extract. This allows the user to take advantage
of SIMD instructions like AVX512 to gain additional performance.

A list of implementations can be obtained by the following command. The
command also shows whether the CPU supports each implementation ::

    $ ovs-appctl dpif-netdev/miniflow-parser-get
        Available Optimized Miniflow Extracts:
            autovalidator (available: True, pmds: none)
            scalar (available: True, pmds: 1,15)
            study (available: True, pmds: none)

An implementation can be selected manually by the following command ::

    $ ovs-appctl dpif-netdev/miniflow-parser-set [-pmd core_id] [name]
                                                 [study_cnt]

The above command has two optional parameters: ``core_id`` and ``study_cnt``.
The ``core_id`` parameter sets the miniflow extract function on the PMD thread
pinned to that core. The ``study_cnt`` parameter, which is specific to
``study`` and ignored by other implementations, sets how many packets are
needed to choose the best implementation.

The user can also select the ``study`` implementation, which studies the
traffic for a specific number of packets by applying all available
implementations of miniflow extract, and then chooses the one with the most
optimal result for that traffic pattern. The user can optionally provide a
packet count [study_cnt] parameter, which is the minimum number of packets
that OVS must study before choosing an optimal implementation. If no packet
count is provided, the default value of 128 is used. Also, as there is no
synchronization point between threads, one PMD thread might still be running
a previous round and can then decide based on earlier data.

The packet count is a global value, and parallel study executions with
differing packet counts will use the most recent count value provided by the
user.

Study can be selected with a packet count by the following command ::

    $ ovs-appctl dpif-netdev/miniflow-parser-set study 1024

Study can be selected with a packet count and explicit PMD selection
by the following command ::

    $ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 study 1024

In the above command, the first parameter is the core ID of the PMD thread;
this can also be used to explicitly set the miniflow extraction function on
different PMD threads.

Scalar can be selected on core 3 by the following command; note that a study
count must not be provided for any implementation other than ``study`` ::

    $ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 scalar

Miniflow Extract Validation
~~~~~~~~~~~~~~~~~~~~~~~~~~~

As multiple versions of miniflow extract can co-exist, each with different
CPU ISA optimizations, it is important to validate that they all give the
exact same results. To easily test all miniflow implementations, an
``autovalidator`` implementation of the miniflow exists. This implementation
runs all other available miniflow extract implementations, and verifies that
the results are identical.

Running the OVS unit tests with the autovalidator enabled ensures all
implementations provide the same results.

To set the Miniflow autovalidator, use this command ::

    $ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator

A compile-time option is available in order to test it with the OVS unit test
suite. Use the following configure option ::

    $ ./configure --enable-mfex-default-autovalidator

Unit Test Miniflow Extract
++++++++++++++++++++++++++

The unit tests can also be used to exercise the workflow mentioned above by
running the following test case in tests/system-dpdk.at ::

    make check-dpdk TESTSUITEFLAGS='-k MFEX'
    OVS-DPDK - MFEX Autovalidator

The unit test uses multiple traffic types to test the correctness of the
implementations.

The MFEX commands can also be tested for negative and positive cases to
verify that the MFEX set command does not allow for incorrect parameters.
A user can directly run the following configuration test case in
tests/system-dpdk.at ::

    make check-dpdk TESTSUITEFLAGS='-k MFEX'
    OVS-DPDK - MFEX Configuration

Running Fuzzy test with Autovalidator
+++++++++++++++++++++++++++++++++++++

Fuzzy tests can also be run on miniflow extract with the help of the
autovalidator and Scapy. The steps below describe how to reproduce the setup,
with the IP header being fuzzed to generate packets.

Scapy is used to create fuzzed IP packets and save them into a PCAP file; a
minimal sketch, assuming Scapy is installed ::

    from scapy.all import Ether, IP, TCP, fuzz, wrpcap

    pkt = fuzz(Ether()/IP()/TCP())   # randomize header field values
    wrpcap("fuzzy.pcap", [pkt])      # save for OVS to replay below

Set the miniflow extract to autovalidator using ::

    $ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator

OVS is then configured to receive the generated packets by replaying the PCAP
file through the DPDK pcap PMD ::

    $ ovs-vsctl add-port br0 pcap0 -- \
        set Interface pcap0 type=dpdk \
        options:dpdk-devargs=net_pcap0,rx_pcap=fuzzy.pcap

With this workflow, the autovalidator will ensure that all MFEX
implementations classify each packet in exactly the same way. If an optimized
MFEX implementation causes a different miniflow to be generated, the
autovalidator has ``ovs_assert`` and logging statements that will report the
issue.

Unit Fuzzy test with Autovalidator
+++++++++++++++++++++++++++++++++++++

The unit tests can also be used to exercise the workflow mentioned above by
running the following test case in tests/system-dpdk.at ::

    make check-dpdk TESTSUITEFLAGS='-k MFEX'
    OVS-DPDK - MFEX Autovalidator Fuzzy