author     Kevin Traynor <kevin.traynor@intel.com>       2015-09-09 18:00:38 +0100
committer  Daniele Di Proietto <diproiettod@vmware.com>  2015-09-16 17:02:01 +0100
commit     188d29d7f89a976684cabc19d0fbeb9ef9560c20 (patch)
tree       f398f89af440df02e150b490f9fdbe9b42d9c20e /INSTALL.DPDK.md
parent     d2873d530517ed99737960c158739d63c56e89ee (diff)
docs: Expand performance tuning section in INSTALL.DPDK.md.
Split performance tuning into dedicated section and add more detail.

Signed-off-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Diffstat (limited to 'INSTALL.DPDK.md')
-rw-r--r--  INSTALL.DPDK.md | 241
1 file changed, 202 insertions(+), 39 deletions(-)
diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
index 6026217c2..7bf110ce7 100644
--- a/INSTALL.DPDK.md
+++ b/INSTALL.DPDK.md
@@ -207,60 +207,223 @@ Using the DPDK with ovs-vswitchd:
./ovs-ofctl add-flow br0 in_port=2,action=output:1
```
-8. Performance tuning
+Performance Tuning:
+-------------------
- With pmd multi-threading support, OVS creates one pmd thread for each
- numa node as default. The pmd thread handles the I/O of all DPDK
- interfaces on the same numa node. The following two commands can be used
- to configure the multi-threading behavior.
+ 1. PMD affinitization
- `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=<hex string>`
+ A poll mode driver (pmd) thread handles the I/O of all DPDK
+ interfaces assigned to it. A pmd thread busy loops through the
+ assigned port/rxq's, polling for packets, switching them and
+ sending them to a tx port if required. A pmd thread is typically
+ CPU bound, meaning that the more CPU cycles it can get, the
+ better the performance. To that end, it is good practice to
+ ensure that a pmd thread has as many cycles on a core available
+ to it as possible. This can be achieved by affinitizing the pmd
+ thread to a core that has no other workload. See section 7 below
+ for a description of how to isolate cores for this purpose.
- The command above asks for a CPU mask for setting the affinity of pmd
- threads. A set bit in the mask means a pmd thread is created and pinned
- to the corresponding CPU core. For more information, please refer to
- `man ovs-vswitchd.conf.db`
+ The following command can be used to specify the affinity of the
+ pmd thread(s).
- `ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=<integer>`
+ `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=<hex string>`
- The command above sets the number of rx queues of each DPDK interface. The
- rx queues are assigned to pmd threads on the same numa node in round-robin
- fashion. For more information, please refer to `man ovs-vswitchd.conf.db`
+ By setting a bit in the mask, a pmd thread is created and pinned
+ to the corresponding CPU core, e.g. to run a pmd thread on core 1:
- Ideally for maximum throughput, the pmd thread should not be scheduled out
- which temporarily halts its execution. The following affinitization methods
- can help.
+ `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=2`
- Lets pick core 4,6,8,10 for pmd threads to run on. Also assume a dual 8 core
- sandy bridge system with hyperthreading enabled where CPU1 has cores 0,...,7
- and 16,...,23 & CPU2 cores 8,...,15 & 24,...,31. (A different cpu
- configuration could have different core mask requirements).
+ For more information, please refer to the Open_vSwitch TABLE section in
- To kernel bootline add core isolation list for cores and associated hype cores
- (e.g. isolcpus=4,20,6,22,8,24,10,26,). Reboot system for isolation to take
- effect, restart everything.
+ `man ovs-vswitchd.conf.db`
- Configure pmd threads on core 4,6,8,10 using 'pmd-cpu-mask':
+ Note that a pmd thread on a NUMA node is only created if there is
+ at least one DPDK interface from that NUMA node added to OVS.
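+
+ For example, the NUMA node that a given NIC belongs to can be checked
+ via sysfs before deciding on the pmd-cpu-mask (the PCI address below is
+ only a placeholder):
+
+ ```
+ cat /sys/bus/pci/devices/0000:04:00.0/numa_node
+ ```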
- `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=00000550`
+ 2. Multiple poll mode driver threads
- You should be able to check that pmd threads are pinned to the correct cores
- via:
+ With pmd multi-threading support, OVS creates one pmd thread
+ for each NUMA node by default. However, when there are multiple
+ ports/rxq's producing traffic, performance can be improved by
+ creating multiple pmd threads running on separate cores. These
+ pmd threads can then share the workload, with each being
+ responsible for different ports/rxq's. Assignment of ports/rxq's
+ to pmd threads is done automatically.
- ```
- top -p `pidof ovs-vswitchd` -H -d1
- ```
+ The following command can be used to specify the affinity of the
+ pmd threads.
- Note, the pmd threads on a numa node are only created if there is at least
- one DPDK interface from the numa node that has been added to OVS.
+ `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=<hex string>`
+
+ A set bit in the mask means a pmd thread is created and pinned
+ to the corresponding CPU core, e.g. to run pmd threads on cores 1 and 2:
+
+ `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6`
+
+ For more information, please refer to the Open_vSwitch TABLE section in
+
+ `man ovs-vswitchd.conf.db`
+
+ For example, when using dpdk and dpdkvhostuser ports in a bi-directional
+ VM loopback as shown below, spreading the workload over 2 or 4 pmd
+ threads shows significant improvements as there will be more total CPU
+ occupancy available.
+
+ NIC port0 <-> OVS <-> VM <-> OVS <-> NIC port 1
+
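+ As an illustration, a mask that selects four cores for pmd threads
+ (cores 1-4 here, chosen arbitrarily) would be:
+
+ ```
+ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=1E
+ ```
+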
+ The OVS log can be checked to confirm that the port/rxq assignment to
+ pmd threads is as required. This can also be checked with the following
+ commands:
+
+ ```
+ top -H
+ taskset -p <pid_of_pmd>
+ ```
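+
+ For example, the `-c` option of taskset shows the affinity as a core
+ list rather than a mask, which can be easier to read (the PID
+ placeholder matches the one above):
+
+ ```
+ taskset -pc <pid_of_pmd>
+ ```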
+
+ To understand where most of the pmd thread time is spent and whether the
+ caches are being utilized, these commands can be used:
+
+ ```
+ # Clear previous stats
+ ovs-appctl dpif-netdev/pmd-stats-clear
+
+ # Check current stats
+ ovs-appctl dpif-netdev/pmd-stats-show
+ ```
+
+ 3. DPDK port Rx Queues
+
+ `ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=<integer>`
+
+ The command above sets the number of rx queues for each DPDK interface.
+ The rx queues are assigned to pmd threads on the same NUMA node in a
+ round-robin fashion. For more information, please refer to the
+ Open_vSwitch TABLE section in
+
+ `man ovs-vswitchd.conf.db`
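+
+ For example (values chosen purely for illustration), four rx queues per
+ DPDK interface serviced by four pmd threads could be configured with:
+
+ ```
+ ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=4
+ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=1E
+ ```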
+
+ 4. Exact Match Cache
+
+ Each pmd thread contains one EMC. After initial flow setup in the
+ datapath, the EMC contains a single table and provides the lowest level
+ (fastest) switching for DPDK ports. If there is a miss in the EMC then
+ the next level where switching will occur is the datapath classifier.
+ Missing in the EMC and looking up in the datapath classifier incurs a
+ significant performance penalty. If lookup misses occur in the EMC
+ because it is too small to handle the number of flows, its size can
+ be increased. The EMC size can be modified by editing the define
+ EM_FLOW_HASH_SHIFT in lib/dpif-netdev.c.
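+
+ A minimal way to locate the define and check its current value (the
+ default differs between OVS versions) is:
+
+ ```
+ grep -n "EM_FLOW_HASH_SHIFT" lib/dpif-netdev.c
+ ```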
+
+ As mentioned above, there is one EMC per pmd thread. An alternative
+ way of increasing the aggregate number of possible flow entries in
+ the EMC, and so avoiding datapath classifier lookups, is therefore
+ to run multiple pmd threads, as described in section 2.
+
+ 5. Compiler options
+
+ The default compiler optimization level is '-O2'. Changing this to
+ more aggressive compiler optimizations such as '-O3' or
+ '-Ofast -march=native' with gcc can produce performance gains.
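+
+ As a sketch only, these flags can be passed to the OVS build via CFLAGS
+ at configure time, alongside whatever configure options are already in
+ use:
+
+ ```
+ ./configure CFLAGS="-Ofast -march=native"
+ ```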
+
+ 6. Simultaneous Multithreading (SMT)
+
+ With SMT enabled, one physical core appears as two logical cores
+ which can improve performance.
+
+ SMT can be utilized to add additional pmd threads without consuming
+ additional physical cores. Additional pmd threads may be added in the
+ same manner as described in section 2. If trying to minimize the use
+ of physical cores for pmd threads, care must be taken to set the
+ correct bits in the pmd-cpu-mask to ensure that the pmd threads are
+ pinned to SMT siblings.
+
+ For example, when using 2x 10 core processors in a dual socket system
+ with HT enabled, /proc/cpuinfo will report 40 logical cores. To use
+ two logical cores which share the same physical core for pmd threads,
+ the following command can be used to identify a pair of logical cores.
+
+ `cat /sys/devices/system/cpu/cpuN/topology/thread_siblings_list`
+
+ where N is the logical core number. In this example, it would show that
+ cores 1 and 21 share the same physical core. The pmd-cpu-mask to enable
+ two pmd threads running on these two logical cores (one physical core)
+ is:
+
+ `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=200002`
+
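+ The sibling pairs for all cores can also be listed in one go (a sketch;
+ this relies only on the standard Linux sysfs layout):
+
+ ```
+ grep . /sys/devices/system/cpu/cpu*/topology/thread_siblings_list
+ ```
+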
+ Note that SMT is enabled by the Hyper-Threading section in the
+ BIOS, and as such will apply to the whole system. The impact of
+ enabling/disabling it for the whole system should therefore be
+ considered, e.g. if workloads on the system can scale across
+ multiple cores, SMT may be very beneficial. However, if they do
+ not and perform best on a single physical core, SMT may not be
+ beneficial.
+
+ 7. The isolcpus kernel boot parameter
+
+ isolcpus can be used on the kernel bootline to isolate cores from the
+ kernel scheduler and hence dedicate them to OVS or other packet
+ forwarding related workloads. For example, a Linux kernel boot line
+ could be:
+
+ `GRUB_CMDLINE_LINUX_DEFAULT="quiet hugepagesz=1G hugepages=4 default_hugepagesz=1G intel_iommu=off isolcpus=1-19"`
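+
+ After editing the boot line, the bootloader configuration must be
+ regenerated and the system rebooted for the isolation to take effect,
+ e.g. on a grub2 based distribution (a sketch; paths vary by
+ distribution):
+
+ ```
+ grub2-mkconfig -o /boot/grub2/grub.cfg
+ reboot
+ ```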
+
+ 8. NUMA/Cluster On Die
+
+ Ideally inter NUMA datapaths should be avoided where possible as packets
+ will go across QPI and there may be a slight performance penalty when
+ compared with intra NUMA datapaths. On Intel Xeon Processor E5 v3,
+ Cluster On Die is introduced on models that have 10 cores or more.
+ This makes it possible to logically split a socket into two NUMA regions,
+ and again it is preferred, where possible, to keep critical datapaths
+ within a single cluster.
+
+ It is good practice to ensure that threads that are in the datapath are
+ pinned to cores in the same NUMA area, e.g. pmd threads and QEMU vCPUs
+ responsible for forwarding.
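+
+ The NUMA topology of the system, i.e. which cores and how much memory
+ belong to each NUMA node, can be inspected for example with:
+
+ ```
+ numactl --hardware
+ ```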
+
+ 9. Rx Mergeable buffers
+
+ Rx Mergeable buffers is a virtio feature that allows chaining of multiple
+ virtio descriptors to handle large packet sizes. Large packets are
+ handled by reserving and chaining multiple free descriptors together.
+ Mergeable buffer support is negotiated between the virtio driver and
+ virtio device and is supported by the DPDK vhost library. This behavior
+ is typically supported and enabled by default. However, if the user knows
+ that rx mergeable buffers are not needed, i.e. jumbo frames are not used,
+ it can be forced off by adding mrg_rxbuf=off to the QEMU command line
+ options. By not reserving multiple chains of descriptors, more individual
+ virtio descriptors are available for rx to the guest using dpdkvhost
+ ports, and this can improve performance.
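+
+ A sketch of how this might look on the QEMU command line (the netdev
+ name is a placeholder for an existing -netdev definition):
+
+ ```
+ -device virtio-net-pci,netdev=net1,mrg_rxbuf=off
+ ```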
+
+ 10. Packet processing in the guest
+
+ Whether simply forwarding packets from one interface to another or
+ performing more complex packet processing in the guest, it is good
+ practice to ensure that the thread performing this work has as much
+ CPU occupancy as possible. For example, when the DPDK sample
+ application `testpmd` is used to forward packets in the guest,
+ multiple QEMU vCPU threads can be created. Taskset can then be used
+ to affinitize the vCPU thread responsible for forwarding to a
+ dedicated core not used for other general processing on the host
+ system.
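+
+ For example (the core number and PID placeholder are purely
+ illustrative), a QEMU vCPU thread can be pinned to a dedicated core
+ with:
+
+ ```
+ taskset -pc 4 <pid_of_qemu_vcpu_thread>
+ ```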
+
+ 11. DPDK virtio pmd in the guest
+
+ dpdkvhostcuse or dpdkvhostuser ports can be used to accelerate the path
+ to the guest using the DPDK vhost library. This library is compatible with
+ virtio-net drivers in the guest but significantly better performance can
+ be observed when using the DPDK virtio pmd driver in the guest. The DPDK
+ `testpmd` application can be used in the guest as an example application
+ that forwards packets from one DPDK vhost port to another. An example of
+ running `testpmd` in the guest can be seen below.
+
+ `./testpmd -c 0x3 -n 4 --socket-mem 512 -- --burst=64 -i --txqflags=0xf00 --disable-hw-vlan --forward-mode=io --auto-start`
+
+ See below for more information on dpdkvhostcuse and dpdkvhostuser ports.
+ See [DPDK Docs] for more information on `testpmd`.
- To understand where most of the time is spent and whether the caches are
- effective, these commands can be used:
- ```
- ovs-appctl dpif-netdev/pmd-stats-clear #To reset statistics
- ovs-appctl dpif-netdev/pmd-stats-show
- ```
DPDK Rings :
------------