path: root/Documentation/topics
Commit message | Author | Age | Files | Lines
* utilities: Add revalidator measurement script and needed USDT probes. | Eelco Chaudron | 2023-01-27 | 1 | -0/+84

    This patch adds a Python script that can be used to analyze the
    revalidator runs by providing statistics (including some real time
    graphs). The USDT events can also be captured to a file and used for
    later offline analysis.

    The following blog explains the Open vSwitch revalidator
    implementation and how this tool can help you understand what is
    happening in your system.

    https://developers.redhat.com/articles/2022/10/19/open-vswitch-revalidator-process-explained

    Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
    Acked-by: Adrian Moreno <amorenoz@redhat.com>
    Acked-by: Simon Horman <simon.horman@corigine.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpif-netdev: Set PMD load based sleep start/inc to 1 us. | Kevin Traynor | 2023-01-23 | 1 | -9/+6

    Now that the timer slack for the PMD threads is reduced we can also
    reduce the start/increment for PMD load based sleeping to match it.

    This will further reduce initial sleep times making it more resilient
    to interfaces that might be sensitive to large sleep times.

    Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
    Reviewed-by: David Marchand <david.marchand@redhat.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpif-netdev: Set timer slack for PMD threads. | David Marchand | 2023-01-23 | 1 | -5/+0

    The default Linux timer slack groups timer expirations into 50 us
    intervals.

    With some traffic patterns this can mean that returning to process
    packets after a sleep takes too long and packets are dropped.

    Add a helper to util.c and use it to reduce the timer slack for PMD
    threads, so that sleeps with smaller resolutions can be done to
    prevent sleeping for too long.

    Fixes: de3bbdc479a9 ("dpif-netdev: Add PMD load based sleeping.")
    Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2023-January/401121.html
    Reported-by: Ilya Maximets <i.maximets@ovn.org>
    Signed-off-by: David Marchand <david.marchand@redhat.com>
    Co-authored-by: Kevin Traynor <ktraynor@redhat.com>
    Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpif-netdev: Add PMD load based sleeping. | Kevin Traynor | 2023-01-12 | 1 | -0/+54

    Sleep for an incremental amount of time if none of the Rx queues
    assigned to a PMD have at least half a batch of packets (i.e. 16 pkts)
    on a polling iteration of the PMD.

    Upon detecting the threshold of >= 16 pkts on an Rxq, reset the sleep
    time to zero (i.e. no sleep).

    Sleep time will be increased on each iteration where the low load
    conditions remain, up to a total of the max sleep time, which is set
    by the user, e.g.:

        ovs-vsctl set Open_vSwitch . other_config:pmd-maxsleep=500

    The default pmd-maxsleep value is 0, which means that no sleeps will
    occur and the default behaviour is unchanged from previously.

    Also add new stats to pmd-perf-show to get visibility of operation,
    e.g.:

        ...
         - sleep iterations:   153994  ( 76.8 % of iterations)
           Sleep time (us):   9159399  ( 59 us/iteration avg.)
        ...

    Reviewed-by: Robin Jarry <rjarry@redhat.com>
    Reviewed-by: David Marchand <david.marchand@redhat.com>
    Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
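The sleep policy described in this commit message can be sketched as a small state update. This is only an illustration under stated assumptions, not the dpif-netdev code: the helper name `next_sleep_us` is hypothetical, and the 1 us default increment is taken from the later "Set PMD load based sleep start/inc to 1 us" commit in this log.

```python
HALF_BATCH = 16  # packets; the threshold named in the commit message


def next_sleep_us(current_us, max_rxq_pkts, max_sleep_us, increment_us=1):
    """Return the sleep time to use after one PMD polling iteration.

    Hypothetical helper. `max_rxq_pkts` is the largest number of packets
    received from any single Rx queue during the iteration.
    """
    # pmd-maxsleep=0 (the default) disables sleeping entirely.
    if max_sleep_us == 0:
        return 0
    # Any Rx queue with at least half a batch resets the sleep to zero.
    if max_rxq_pkts >= HALF_BATCH:
        return 0
    # Low load persists: grow the sleep, capped at the user-set maximum.
    return min(current_us + increment_us, max_sleep_us)


if __name__ == "__main__":
    sleep_us = 0
    for pkts in [0, 3, 0, 20, 0]:
        sleep_us = next_sleep_us(sleep_us, pkts, max_sleep_us=500)
        print(pkts, sleep_us)
```

With `pmd-maxsleep=500`, the sleep grows by one increment per low-load iteration and drops back to zero as soon as one queue delivers 16 or more packets.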
* Documentation: Remove link to obsolete sources. | David Marchand | 2023-01-12 | 1 | -15/+14

    This archive website disappeared.
    On the other hand, the link to an obsolete dpif-provider man page
    probably did not provide much info and we can simply mention the
    current file.

    Signed-off-by: David Marchand <david.marchand@redhat.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* Documentation: Fix link to iproute2 git repository. | David Marchand | 2023-01-11 | 1 | -1/+1

    iproute2 git repositories were split and moved around v4.15 [1].
    It is time to fix the link in OVS documentation.

    1: https://lore.kernel.org/netdev/20180129082052.0eb85e9b@xeon-e3/

    Signed-off-by: David Marchand <david.marchand@redhat.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* Documentation: Fix links in the DPDK guide on physical ports. | Ilya Maximets | 2023-01-06 | 1 | -7/+7

    The text enclosed in '<...>' is supposed to be an actual link and not
    the name of the link. This generates incorrect links that lead
    nowhere.

    Also, a single underscore is supposed to be used for external links.

    Reviewed-by: David Marchand <david.marchand@redhat.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* utilities: Add USDT script to monitor dpif netlink execute message queuing. | Eelco Chaudron | 2023-01-06 | 1 | -0/+1

    This patch adds the dpif_nl_exec_monitor.py script that will use the
    existing dpif_netlink_operate__:op_flow_execute USDT probe to show all
    DPIF_OP_EXECUTE operations being queued for transmission over the
    netlink interface.

    Here is an example, truncated output:

    Display DPIF_OP_EXECUTE operations being queued for transmission...
    TIME               CPU  COMM          PID   NL_SIZE
    3124.516679897     1    ovs-vswitchd  8219  180

    nlmsghdr  : len = 0, type = 36, flags = 1, seq = 0, pid = 0
    genlmsghdr: cmd = 3, version = 1, reserver = 0
    ovs_header: dp_ifindex = 21
    > Decode OVS_PACKET_ATTR_* TLVs:
      nla_len 46, nla_type OVS_PACKET_ATTR_PACKET[1], data: 00 00 00...
      nla_len 20, nla_type OVS_PACKET_ATTR_KEY[2], data: 08 00 02 00...
        > Decode OVS_KEY_ATTR_* TLVs:
          nla_len 8, nla_type OVS_KEY_ATTR_PRIORITY[2], data: 00 00...
          nla_len 8, nla_type OVS_KEY_ATTR_SKB_MARK[15], data: 00 00...
      nla_len 88, nla_type OVS_PACKET_ATTR_ACTIONS[3], data: 4c 00 03...
        > Decode OVS_ACTION_ATTR_* TLVs:
          nla_len 76, nla_type OVS_ACTION_ATTR_SET[3], data: 48 00...
            > Decode OVS_TUNNEL_KEY_ATTR_* TLVs:
              nla_len 12, nla_type OVS_TUNNEL_KEY_ATTR_ID[0], data:...
              nla_len 20, nla_type OVS_TUNNEL_KEY_ATTR_IPV6_DST[13], ...
              nla_len 5, nla_type OVS_TUNNEL_KEY_ATTR_TTL[4], data: 40
              nla_len 4, nla_type OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT[5]...
              nla_len 4, nla_type OVS_TUNNEL_KEY_ATTR_CSUM[6], data:
              nla_len 6, nla_type OVS_TUNNEL_KEY_ATTR_TP_DST[10],...
              nla_len 12, nla_type OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS[8],...
          nla_len 8, nla_type OVS_ACTION_ATTR_OUTPUT[1], data: 02 00 00 00
    - Dumping OVS_PACKET_ATR_PACKET data:
      ###[ Ethernet ]###
        dst       = 00:00:00:00:ec:01
        src       = 04:f4:bc:28:57:00
        type      = IPv4
      ###[ IP ]###
        version   = 4
        ihl       = 5
        tos       = 0x0
        len       = 50
        id        = 0
        flags     =
        frag      = 0
        ttl       = 127
        proto     = icmp
        chksum    = 0x2767
        src       = 10.0.0.1
        dst       = 10.0.0.100
        \options   \
      ###[ ICMP ]###
        type      = echo-request
        code      = 0
        chksum    = 0xf7f3
        id        = 0x0
        seq       = 0xc

    Acked-by: Adrian Moreno <amorenoz@redhat.com>
    Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpif-netdev: Calculate per numa variance. | Cheng Li | 2022-12-21 | 1 | -4/+4

    Currently, pmd_rebalance_dry_run() calculates the overall variance of
    all pmds regardless of their numa location. The overall result may
    hide an imbalance within an individual numa.

    Consider the following case. Numa0 is free because VMs on numa0 are
    not sending pkts, while numa1 is busy. Within numa1, pmd workloads are
    not balanced. Obviously, moving 500 kpps of workload from pmd 126 to
    pmd 62 will make numa1 much more balanced. For numa1 the variance
    improvement will be almost 100%, because after rebalance each pmd in
    numa1 holds the same workload (variance ~= 0). But the overall
    variance improvement is only about 20%, which may not trigger auto_lb.

    ```
    numa_id   core_id   kpps
    0         30        0
    0         31        0
    0         94        0
    0         95        0
    1         126       1500
    1         127       1000
    1         63        1000
    1         62        500
    ```

    Because auto_lb doesn't balance workload across numa nodes, it makes
    more sense to calculate the variance improvement per numa node.

    Signed-off-by: Cheng Li <lic121@chinatelecom.cn>
    Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
    Co-authored-by: Kevin Traynor <ktraynor@redhat.com>
    Acked-by: Kevin Traynor <ktraynor@redhat.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
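The figures quoted in this commit message can be checked with a few lines of arithmetic. The sketch below assumes population variance and defines improvement as the relative drop in variance. Both are assumptions made for illustration, and `improvement_pct` is a hypothetical helper, not the function used by pmd_rebalance_dry_run().

```python
from statistics import pvariance


def improvement_pct(before, after):
    """Relative variance reduction in percent (hypothetical helper)."""
    var_before = pvariance(before)
    var_after = pvariance(after)
    if var_before == 0:
        return 0.0
    return (var_before - var_after) / var_before * 100


# kpps per pmd, taken from the table in the commit message.
numa0 = [0, 0, 0, 0]
numa1_before = [1500, 1000, 1000, 500]   # cores 126, 127, 63, 62
numa1_after = [1000, 1000, 1000, 1000]   # 500 kpps moved from 126 to 62

overall = improvement_pct(numa0 + numa1_before, numa0 + numa1_after)
per_numa1 = improvement_pct(numa1_before, numa1_after)

if __name__ == "__main__":
    print(f"overall: {overall:.1f} %, numa1 only: {per_numa1:.1f} %")
```

Under these assumptions the overall improvement comes out at roughly 20 % while the numa1-only improvement is 100 %, which matches the numbers given in the message.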
* docs: Add documentation for pmd-rxq-show secs parameter. | Kevin Traynor | 2022-12-21 | 1 | -5/+18

    Add description of new '-secs' parameter in docs. Also, add to NEWS as
    it is a user facing change.

    Reviewed-by: David Marchand <david.marchand@redhat.com>
    Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* travis: Drop support. | David Marchand | 2022-12-21 | 1 | -40/+0

    Following a change in the terms of use, free Travis credits are really
    too low for a realistic usage by OVS contributors. As a consequence,
    testing OVS with Travis has been abandoned by most (if not all)
    contributors to the project.

    Drop the Travis configuration from our repository, clean references in
    the documentation and move GHA specifics to the associated yml.

    Acked-by: Aaron Conole <aconole@redhat.com>
    Signed-off-by: David Marchand <david.marchand@redhat.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpdk: Update to use v22.11.1. | Ian Stokes | 2022-12-06 | 5 | -8/+8

    This commit adds support for DPDK v22.11.1. It includes the following
    changes.

    1. ci: Reduce DPDK compilation time.
    2. system-dpdk: Update vhost tests to be compatible with DPDK 22.07.

       http://patchwork.ozlabs.org/project/openvswitch/list/?series=316528

    3. system-dpdk: Update vhost tests to be compatible with DPDK 22.07.

       http://patchwork.ozlabs.org/project/openvswitch/list/?series=311332

    4. netdev-dpdk: Report device bus specific information.
    5. netdev-dpdk: Drop reference to Rx header split.

       http://patchwork.ozlabs.org/project/openvswitch/list/?series=321808

    In addition, documentation was also updated in this commit for use
    with DPDK v22.11.1.

    The Debian shared DPDK compilation test is removed as part of this
    patch due to a packaging requirement. Once DPDK v22.11.1 is available
    in Debian repositories it should be re-enabled in OVS.

    For credit, all authors of the original commits to 'dpdk-latest' with
    the above changes have been added as co-authors for this commit.

    Signed-off-by: David Marchand <david.marchand@redhat.com>
    Co-authored-by: David Marchand <david.marchand@redhat.com>
    Signed-off-by: Sunil Pai G <sunil.pai.g@intel.com>
    Co-authored-by: Sunil Pai G <sunil.pai.g@intel.com>
    Tested-by: Michael Phelan <michael.phelan@intel.com>
    Tested-by: Emma Finn <emma.finn@intel.com>
    Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* Documentation: Use new syntax for dpdk port representors. | Robin Jarry | 2022-11-02 | 1 | -6/+6

    Since DPDK 21.05, the representor identifier now handles a relative VF
    offset. The legacy representor ID seems only valid in certain cases
    (first dpdk port).

    Link: https://github.com/DPDK/dpdk/commit/cebf7f17159a8
    Signed-off-by: Robin Jarry <rjarry@redhat.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* xenserver: Remove xenserver. | Greg Rose | 2022-08-15 | 2 | -29/+16

    Remove the current xenserver implementation - it is obsolete and since
    3.0 we do not support kernel module builds [1].

    1. https://mail.openvswitch.org/pipermail/ovs-dev/2022-July/395789.html

    [i.maximets] Can be added back if people willing to maintain it will
    be found.

    Signed-off-by: Greg Rose <gvrose8192@gmail.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* python: Add flow filtering syntax. | Adrian Moreno | 2022-07-15 | 1 | -1/+1

    Based on pyparsing, create a very simple filtering syntax.

    It supports basic logic statements (and, &, or, ||, not, !), numerical
    operations (<, >), equality (=, !=), and masking (~=). The latter is
    only supported in certain fields (IntMask, EthMask, IPMask).

    Masking operation is semantically equivalent to "includes", therefore:

        ip_src ~= 192.168.1.1

    means that ip_src field is either a host IP address equal to
    192.168.1.1 or an IPMask that includes it (e.g: 192.168.1.1/24).

    Acked-by: Eelco Chaudron <echaudro@redhat.com>
    Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
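The "includes" semantics of `~=` described above can be illustrated with the standard library. This is a sketch only: it uses Python's `ipaddress` module as a stand-in for the netaddr-based classes of the actual patch, and `mask_includes` is a hypothetical name, not part of the OVS Python library.

```python
import ipaddress


def mask_includes(field_value, wanted):
    """Return True if `field_value` includes the address `wanted`.

    `field_value` is either a host address ("192.168.1.1") or an
    address with a prefix length ("192.168.1.1/24"). A bare host
    address is treated as a /32 (or /128) network.
    """
    network = ipaddress.ip_interface(field_value).network
    return ipaddress.ip_address(wanted) in network


if __name__ == "__main__":
    # Both forms mentioned in the commit message match 192.168.1.1.
    print(mask_includes("192.168.1.1", "192.168.1.1"))
    print(mask_includes("192.168.1.1/24", "192.168.1.1"))
    print(mask_includes("10.0.0.0/8", "192.168.1.1"))
```

The first two calls correspond to the two cases named in the message (an equal host address, and a mask that includes it); the third is a non-matching mask.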
* python: Add mask, ip and eth decoders. | Adrian Moreno | 2022-07-15 | 1 | -0/+9

    Add more decoders that can be used by KVParser.

    For IPv4 and IPv6 addresses, create a new class that wraps
    netaddr.IPAddress.
    For Ethernet addresses, create a new class that wraps netaddr.EUI.
    For Integers, create a new class that performs basic bitwise mask
    comparisons.

    netaddr is added as a new soft dependency:
    - extras_require in setup.py
    - Suggests in deb and rpm packages

    Acked-by: Eelco Chaudron <echaudro@redhat.com>
    Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* tests: Remove support for check-kmod test. | Greg Rose | 2022-07-15 | 1 | -7/+0

    The OVS kernel module is no longer supported as of OVS 2.18

    Reviewed-by: David Marchand <david.marchand@redhat.com>
    Signed-off-by: Greg Rose <gvrose8192@gmail.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* odp-execute: Add ISA implementation of actions. | Emma Finn | 2022-07-15 | 2 | -8/+46

    This commit adds the AVX512 implementation of the action
    functionality.

    Usage:
        $ ovs-appctl odp-execute/action-impl-set avx512

    Signed-off-by: Emma Finn <emma.finn@intel.com>
    Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
    Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com>
    Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
    Acked-by: Eelco Chaudron <echaudro@redhat.com>
    Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* netdev-dpdk: Add shared mempool config. | Kevin Traynor | 2022-07-14 | 1 | -0/+44

    Mempools may currently be shared between DPDK ports based on port MTU
    and NUMA. With some hint from the user we can increase the sharing on
    MTU and hence reduce memory consumption in many cases.

    For example, a port with MTU 9000 uses a mempool with an mbuf size
    based on 9000 MTU. A port with MTU 1500 uses a different mempool with
    an mbuf size based on 1500 MTU.

    In this case, assuming same NUMA, both these ports could share the
    9000 MTU mempool.

    The user must give a hint as order of creation of ports and setting of
    MTUs may vary and we need to ensure that upgrades from older OVS
    versions do not require more memory.

    This scheme can also prevent multiple mempools being created for cases
    where a port is added picking up a default MTU and an appropriate
    mempool, but later has its MTU changed to a different value requiring
    a different mempool.

    Example usage:

        $ ovs-vsctl --no-wait set Open_vSwitch . \
          other_config:shared-mempool-config=9000,1500:1,6000:1

    Port added on NUMA 0:
    * MTU 1500, use mempool based on 9000 MTU
    * MTU 5000, use mempool based on 9000 MTU
    * MTU 9000, use mempool based on 9000 MTU
    * MTU 9300, use mempool based on 9300 MTU (existing behaviour)

    Port added on NUMA 1:
    * MTU 1500, use mempool based on 1500 MTU
    * MTU 5000, use mempool based on 6000 MTU
    * MTU 9000, use mempool based on 9000 MTU
    * MTU 9300, use mempool based on 9300 MTU (existing behaviour)

    Default behaviour is unchanged and mempools are still only created
    when needed.

    Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
    Reviewed-by: David Marchand <david.marchand@redhat.com>
    Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
    Signed-off-by: Ian Stokes <ian.stokes@intel.com>
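The selection rule implied by the example above can be sketched as follows. The rule is inferred from the listed outcomes only (an entry without a NUMA suffix applies to every NUMA node, and a port picks the smallest applicable configured MTU that is at least its own), so treat it as an assumption; both function names are hypothetical and this is not the netdev-dpdk implementation.

```python
def parse_shared_mempool_config(value):
    """Parse "MTU[:NUMA],..." into a list of (mtu, numa_or_None) pairs."""
    entries = []
    for item in value.split(","):
        mtu, _, numa = item.partition(":")
        entries.append((int(mtu), int(numa) if numa else None))
    return entries


def mempool_mtu(port_mtu, port_numa, entries):
    """Return the MTU that the port's mempool would be based on."""
    candidates = [
        mtu for mtu, numa in entries
        # An entry applies if it has no NUMA suffix or names this NUMA.
        if (numa is None or numa == port_numa) and mtu >= port_mtu
    ]
    # No configured MTU is large enough: fall back to the port's own MTU
    # (the "existing behaviour" rows in the commit message).
    return min(candidates, default=port_mtu)


if __name__ == "__main__":
    config = parse_shared_mempool_config("9000,1500:1,6000:1")
    for numa in (0, 1):
        for mtu in (1500, 5000, 9000, 9300):
            print(numa, mtu, mempool_mtu(mtu, numa, config))
```

Running it over the four MTUs on each NUMA node reproduces the two lists in the commit message.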
* dpcls: Change info-get function to fetch dpcls usage stats. | Kumar Amber | 2022-05-24 | 1 | -10/+10

    Modified the dpcls info-get command output to include the count for
    different dpcls implementations.

    $ ovs-appctl dpif-netdev/subtable-lookup-info-get

    Available dpcls implementations:
      autovalidator (Use count: 1, Priority: 5)
      generic (Use count: 0, Priority: 1)
      avx512_gather (Use count: 0, Priority: 3)

    Test case to verify changes:
        1061: PMD - dpcls configuration     ok

    Signed-off-by: Kumar Amber <kumar.amber@intel.com>
    Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
    Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
    Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com>
    Co-authored-by: Eelco Chaudron <echaudro@redhat.com>
    Acked-by: Eelco Chaudron <echaudro@redhat.com>
    Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* Documentation: Fix use of rst verbatim code chunk syntax. | Kevin Traynor | 2022-05-04 | 2 | -2/+2

    In some places it is using Markdown syntax and in others it is not
    needed as there is already a code block.

    Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
    Reviewed-by: David Marchand <david.marchand@redhat.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* Documentation: Clarify QEMU version requirement. | Cian Ferriter | 2022-05-04 | 1 | -1/+1

    The QEMU version requirement of >= 2.7 is for vhost-user-client ports
    specifically.

    Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* Documentation: Update USDT documentation to include systemtap dependency. | Eelco Chaudron | 2022-02-16 | 1 | -0/+5

    Update the documentation to include details on SystemTap dependency
    when enabling USDT probes.

    Suggested-by: Adrian Moreno <amorenoz@redhat.com>
    Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
    Acked-by: Paolo Valerio <pvalerio@redhat.com>
    Acked-by: Adrián Moreno <amorenoz@redhat.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* Documentation: Fix userspace Tx steering section. | Maxime Coquelin | 2022-01-31 | 1 | -6/+7

    This patch fixes the thread mode part, as the static thread-to-txq
    mapping selection depends on whether the number of queues is strictly
    greater than the number of PMD threads, and not greater or equal.

    The section is also reworded as per Ilya's suggestion.

    Fixes: c18e707b2f25 ("dpif-netdev: Introduce hash-based Tx packet steering mode.")
    Reported-by: Kevin Traynor <ktraynor@redhat.com>
    Reported-by: Ilya Maximets <i.maximets@ovn.org>
    Acked-by: Kevin Traynor <ktraynor@redhat.com>
    Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* Documentation: Remove experimental tag for PMD ALB. | Kevin Traynor | 2022-01-18 | 1 | -2/+2

    PMD Auto Load Balance was introduced as an experimental feature in
    OVS 2.11. It is used to detect that the Rx queue to PMD assignments
    are no longer balanced and it would be better to reassign.

    It is disabled by default, and can be enabled with:

        $ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"

    Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
    Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
    Acked-by: David Marchand <david.marchand@redhat.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* Documentation: Update PMD Auto Load Balance section. | Kevin Traynor | 2022-01-18 | 1 | -41/+33

    Updates to the PMD Auto Load Balance section to make it more readable.

    No change to the core content.

    Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
    Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
    Reviewed-by: David Marchand <david.marchand@redhat.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* Documentation: Update PMD thread statistics. | Kevin Traynor | 2022-01-18 | 1 | -0/+18

    'pmd-perf-show' gives some extra information and has nicer formatting
    than 'pmd-stats-show'.

    Let the user know they can use that as well to get PMD stats.

    Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
    Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
    Reviewed-by: David Marchand <david.marchand@redhat.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* Documentation: Minor spelling and grammar fixes. | Kevin Traynor | 2022-01-18 | 1 | -8/+7

    Some minor spelling and grammar fixes in pmd.rst.

    Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
    Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
    Reviewed-by: David Marchand <david.marchand@redhat.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* Documentation: Fix Rx/Tx queue configuration section. | Kevin Traynor | 2022-01-18 | 1 | -8/+10

    ovs-vsctl is used to configure physical Rx queues, not ovs-appctl.

    The number of Tx queues is configured differently depending on whether
    the port is physical or virtual. The present documentation does not
    distinguish.

    Fixes: 31d0dae22a0e ("doc: Add "PMD" topic document")
    Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
    Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
    Reviewed-by: David Marchand <david.marchand@redhat.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* utilities: Add netlink flow operation USDT probes and upcall_cost script. | Eelco Chaudron | 2022-01-18 | 1 | -0/+86

    This patch adds a series of NetLink flow operation USDT probes. These
    probes are in turn used in the upcall_cost Python script, which, in
    addition to some kernel tracepoints, gives an insight into the time
    spent on processing upcalls.

    Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
    Acked-by: Paolo Valerio <pvalerio@redhat.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* utilities: Add upcall USDT probe and associated script. | Eelco Chaudron | 2022-01-18 | 1 | -0/+26

    Added the dpif_recv:recv_upcall USDT probe, which is used by the
    included upcall_monitor.py script. This script receives all upcall
    packets sent by the kernel to ovs-vswitchd. By default, it will show
    all upcall events, which looks something like this:

    TIME               CPU  COMM      PID      DPIF_NAME          TYPE PKT_LEN FLOW_KEY_LEN
    5952147.003848809  2    handler4  1381158  system@ovs-system  0    98      132
    5952147.003879643  2    handler4  1381158  system@ovs-system  0    70      160
    5952147.003914924  2    handler4  1381158  system@ovs-system  0    98      152

    It can also dump the packet and NetLink content, and if required, the
    packets can also be written to a pcap file.

    Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
    Acked-by: Paolo Valerio <pvalerio@redhat.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* Documentation: Add USDT documentation and bpftrace example. | Eelco Chaudron | 2022-01-18 | 2 | -0/+270

    Add the USDT documentation and a bpftrace example using the bridge run
    USDT probes.

    Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
    Acked-by: Paolo Valerio <pvalerio@redhat.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpif-netdev: Introduce hash-based Tx packet steering mode. | Maxime Coquelin | 2022-01-17 | 2 | -0/+75

    This patch adds a new hash Tx steering mode that distributes the
    traffic on all the Tx queues, whatever the number of PMD threads. It
    would be useful for guests expecting traffic to be distributed on all
    the vCPUs.

    The idea here is to re-use the 5-tuple hash of the packets, already
    computed to build the flow batches (and so it does not provide
    flexibility on which fields are part of the hash).

    There is also no user-configurable indirection table, given the
    feature is transparent to the guest. The queue selection is just a
    modulo operation between the packet hash and the number of Tx queues.

    There are no (at least intentionally) functional changes for the
    existing XPS and static modes. There should not be noticeable
    performance changes for these modes (only one more branch in the hot
    path).

    For the hash mode, performance could be impacted due to locking when
    multiple PMD threads are in use (same as XPS mode) and also because of
    the second level of batching.

    Regarding the batching, the existing Tx port output_pkts is not
    modified. It means that at maximum, NETDEV_MAX_BURST can be batched
    for all the Tx queues. A second level of batching is done in
    dp_netdev_pmd_flush_output_on_port(), only for this hash mode.

    Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
    Reviewed-by: David Marchand <david.marchand@redhat.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
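The queue selection described above (packet hash modulo the number of Tx queues) can be sketched in a few lines. Everything named here is an assumption made for illustration: `flow_hash` uses `zlib.crc32` over a formatted 5-tuple as a stand-in for the hash OVS has already computed, and `select_txq` is a hypothetical helper.

```python
import zlib


def flow_hash(src_ip, dst_ip, proto, src_port, dst_port):
    """Stand-in 5-tuple hash; OVS reuses its own precomputed packet hash."""
    key = f"{src_ip},{dst_ip},{proto},{src_port},{dst_port}".encode()
    return zlib.crc32(key)


def select_txq(packet_hash, n_txq):
    """Pick a Tx queue: a plain modulo, with no indirection table."""
    return packet_hash % n_txq


if __name__ == "__main__":
    n_txq = 4
    for sport in range(1000, 1008):
        h = flow_hash("10.0.0.1", "10.0.0.100", 17, sport, 4789)
        print(sport, select_txq(h, n_txq))
```

Because the queue depends only on the hash, all packets of one flow land on the same Tx queue regardless of which PMD thread transmits them, which is why the mode needs locking when several PMD threads share a queue.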
* dpif-netdev: Forwarding optimization for flows with a simple match. | Ilya Maximets | 2022-01-07 | 1 | -0/+24

    There are cases where users might want simple forwarding or drop rules
    for all packets received from a specific port, e.g ::

      "in_port=1,actions=2"
      "in_port=2,actions=IN_PORT"
      "in_port=3,vlan_tci=0x1234/0x1fff,actions=drop"
      "in_port=4,actions=push_vlan:0x8100,set_field:4196->vlan_vid,output:3"

    There are also cases where complex OpenFlow rules can be simplified
    down to datapath flows with very simple match criteria.

    In theory, for very simple forwarding, OVS doesn't need to parse
    packets at all in order to follow these rules. "Simple match" lookup
    optimization is intended to speed up packet forwarding in these cases.

    Design:

    Due to various implementation constraints, the userspace datapath
    always has the following flow fields in exact match (i.e. it's
    required to match at least these fields of a packet even if the OF
    rule doesn't need that):

      - recirc_id
      - in_port
      - packet_type
      - dl_type
      - vlan_tci (CFI + VID) - in most cases
      - nw_frag - for ip packets

    Not all of these fields are related to the packet itself. We already
    know the current 'recirc_id' and the 'in_port' before starting the
    packet processing. It also seems safe to assume that we're working
    with Ethernet packets. So, for the simple OF rule we need to match
    only on 'dl_type', 'vlan_tci' and 'nw_frag'.

    'in_port', 'dl_type', 'nw_frag' and 13 bits of 'vlan_tci' can be
    combined in a single 64bit integer (mark) that can be used as a hash
    in a hash map. We are using only VID and CFI from the 'vlan_tci';
    flows that need to match on PCP will not qualify for the optimization.
    The workaround for matching on non-existence of vlan is updated to
    match on CFI and VID only in order to qualify for the optimization.
    CFI is always set by OVS if vlan is present in a packet, so there is
    no need to match on PCP in this case. 'nw_frag' takes 2 bits of PCP
    inside the simple match mark.

    A new per-PMD flow table 'simple_match_table' is introduced to store
    simple match flows only. 'dp_netdev_flow_add' adds a flow to the usual
    'flow_table' and to the 'simple_match_table' if the flow meets the
    following constraints:

      - 'recirc_id' in flow match is 0.
      - 'packet_type' in flow match is Ethernet.
      - Flow wildcards contain only the minimal set of non-wildcarded
        fields (listed above).

    If the number of flows for the current 'in_port' in the regular
    'flow_table' equals the number of flows for the current 'in_port' in
    the 'simple_match_table', we may use the simple match optimization,
    because all the flows we have are simple match flows. This means that
    we only need to parse 'dl_type', 'vlan_tci' and 'nw_frag' to perform
    packet matching. Now we make the unique flow mark from the 'in_port',
    'dl_type', 'nw_frag' and 'vlan_tci' and look for it in the
    'simple_match_table'. On successful lookup we don't need to run full
    'miniflow_extract()'.

    Unsuccessful lookup technically means that we have no suitable flow in
    the datapath and upcall will be required. So, in this case EMC and SMC
    lookups are disabled. We may optimize this path in the future by
    bypassing the dpcls lookup too.

    Performance improvement of this solution on 'simple match' flows
    should be comparable with partial HW offloading, because it parses the
    same packet fields and uses a similar flow lookup scheme. However,
    unlike partial HW offloading, it works for all port types including
    virtual ones.

    Performance results when compared to EMC:

    Test setup:

                 virtio-user   OVS    virtio-user
      Testpmd1  ------------>  pmd1  ------------>  Testpmd2
      (txonly)       x<------  pmd2  <------------ (mac swap)

    Single stream of 64byte packets. Actions:
      in_port=vhost0,actions=vhost1
      in_port=vhost1,actions=vhost0

    Stats collected from pmd1 and pmd2, so there are 2 scenarios:
      Virt-to-Virt   :     Testpmd1 ------> pmd1 ------> Testpmd2.
      Virt-to-NoCopy :     Testpmd2 ------> pmd2 --->x   Testpmd1.
    Here the packet sent from pmd2 to Testpmd1 is always dropped, because
    the virtqueue is full since Testpmd1 is in txonly mode and doesn't
    receive any packets. This should be closer to the performance of a
    VM-to-Phy scenario.

    Test performed on machine with Intel Xeon CPU E5-2690 v4 @ 2.60GHz.
    Table below represents improvement in throughput when compared to EMC.

    +----------------+------------------------+------------------------+
    |                |    Default (-g -O2)    | "-Ofast -march=native" |
    |   Scenario     +------------+-----------+------------+-----------+
    |                |     GCC    |   Clang   |     GCC    |   Clang   |
    +----------------+------------+-----------+------------+-----------+
    | Virt-to-Virt   |    +18.9%  |   +25.5%  |    +10.8%  |   +16.7%  |
    | Virt-to-NoCopy |    +24.3%  |   +33.7%  |    +14.9%  |   +22.0%  |
    +----------------+------------+-----------+------------+-----------+

    For Phy-to-Phy case performance improvement should be even higher, but
    it's not the main use-case for this functionality. Performance
    difference for the non-simple flows is within a margin of error.

    Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
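The 64-bit mark described in the design section can be sketched as a bit-packing function. The exact bit layout below is an assumption chosen to satisfy the constraints stated in the message (13 bits of 'vlan_tci', with 'nw_frag' occupying 2 of the PCP bit positions); the names are hypothetical and the real dpif-netdev layout and byte order may differ.

```python
VLAN_VID_MASK = 0x0FFF  # 12-bit VLAN ID
VLAN_CFI = 0x1000       # CFI bit, set by OVS when a VLAN is present


def simple_match_mark(in_port, dl_type, nw_frag, vlan_tci):
    """Pack the simple-match fields into one 64-bit integer.

    Assumed layout, from most to least significant bits:
      in_port (32) | dl_type (16) | unused (1) | nw_frag (2) | CFI + VID (13)
    """
    # Keep only CFI and VID; PCP bits are dropped, so flows that need to
    # match on PCP cannot be represented (and do not qualify).
    low16 = ((nw_frag & 0x3) << 13) | (vlan_tci & (VLAN_CFI | VLAN_VID_MASK))
    return ((in_port & 0xFFFFFFFF) << 32) | ((dl_type & 0xFFFF) << 16) | low16


def can_use_simple_match(n_flows_for_port, n_simple_flows_for_port):
    """The optimization applies only if every flow on the port is simple."""
    return n_flows_for_port == n_simple_flows_for_port


if __name__ == "__main__":
    table = {simple_match_mark(3, 0x0800, 0, 0x1234): "drop"}
    # A packet with a different PCP value still produces the same mark.
    print(table.get(simple_match_mark(3, 0x0800, 0, 0xF234)))
```

Using the mark as a dictionary key mirrors the idea of a dedicated per-PMD table: one integer built from a handful of header fields replaces a full 'miniflow_extract()' for the lookup.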
* docs: Re-work the documentation around CPU ISA optimizations. | Ilya Maximets | 2021-12-15 | 2 | -195/+113

    A few problems with the current documentation:

    1. bridge.rst is the high-level documentation for the end user.
       Unit testing and complex implementation details are for developers,
       hence should not be there. Testing instructions for developers
       should be in testing.rst. Words in the doc should be understandable
       for the user who doesn't know OVS internals.

    2. Some paragraphs in the current documentation are repeating each
       other almost to the word.

    3. Some paragraphs are incorrectly formatted. That affects the
       rendering.

    4. There is no point describing every separate test of a system-dpdk
       testsuite.

    What is done:

    1. All the testing related paragraphs are consolidated and moved to
       the testing.rst.

    2. Most of abbreviations replaced with more readable and
       understandable for the end user words.

    3. Meaning or the purpose of several sentences I failed to understand,
       therefore just deleted.

    4. Fixed formatting and a few typos along the way.

    IMO, some parts of the doc still need some re-wording, but this change
    provides at least a starting point for improvement, setting a better
    structure for the document.

    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
    Reviewed-by: David Marchand <david.marchand@redhat.com>
* dpdk: Update to use DPDK v21.11. | Ian Stokes | 2021-12-09 | 5 | -8/+8

    This commit adds support for DPDK v21.11. It includes the following
    changes.

    1. ci: Install python elftools for DPDK 21.02.
    2. ci: Update meson requirement for DPDK 21.05.
    3. netdev-dpdk: Fix build with 21.05.
    4. ci: Compile DPDK in non developer mode.

       http://patchwork.ozlabs.org/project/openvswitch/list/?series=242480&state=*

    5. netdev-dpdk: Remove access to DPDK internals.
    6. netdev-dpdk: Remove unused attribute from rte_flow rule.
    7. netdev-dpdk: Fix mbuf macros namespace with 21.11-rc1.
    8. netdev-dpdk: Fix vhost namespace with 21.11-rc2.

       http://patchwork.ozlabs.org/project/openvswitch/list/?series=271159&state=*

    In addition, documentation and DPDK unit tests were also updated in
    this commit for use with DPDK v21.11.

    For credit, all authors of the original commits to 'dpdk-latest' with
    the above changes have been added as co-authors for this commit.

    Signed-off-by: David Marchand <david.marchand@redhat.com>
    Co-authored-by: David Marchand <david.marchand@redhat.com>
    Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
    Tested-by: Emma Finn <emma.finn@intel.com>
    Tested-by: Seamus Ryan <seamus.ryan@intel.com>
    Acked-by: Kevin Traynor <ktraynor@redhat.com>
    Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* Documentation: Cleanup PMD information. | Kevin Traynor | 2021-09-16 | 1 | -50/+53

    The 'Port/Rx Queue Assigment to PMD Threads' section has expanded over
    time and now includes info about stats/commands, manual pinning and
    different options for OVS assigning Rxqs to PMDs.

    Split them into different sections with sub-headings and move the two
    similar paragraphs about stats together.

    Rename 'Automatic assignment of Port/Rx Queue to PMD Threads' section
    to 'PMD Automatic Load Balance'.

    A few other minor cleanups as I was reading.

    Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
    Acked-by: Adrian Moreno <amorenoz@redhat.com>
    Acked-by: David Marchand <david.marchand@redhat.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* docs/dpdk/bridge: Fix dpif-netdev/miniflow-parser-set formattingCian Ferriter2021-08-161-2/+2
The "name" parameter isn't optional so don't use brackets around it.

Fixes: 5c5c98cec21b ("docs/dpdk/bridge: Add miniflow extract section.")
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* Documentation: Remove duplicate words.David Marchand2021-07-194-6/+5
This is a simple cleanup with a script of mine.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
* dpif-netdev: Report overhead busy cycles per pmd.David Marchand2021-07-161-0/+5
Users complained that per rxq pmd usage was confusing: summing those values per pmd would never reach 100% even if increasing traffic load beyond pmd capacity.

This is because the dpif-netdev/pmd-rxq-show command only reports "pure" rxq cycles, while some cycles are used in the pmd mainloop and add up to the total pmd load.

dpif-netdev/pmd-stats-show does report per pmd load usage. This load is measured since the last dpif-netdev/pmd-stats-clear call. On the other hand, the per rxq pmd usage reflects the pmd load on a 10s sliding window, which makes it non trivial to correlate.

Gather per pmd busy cycles with the same periodicity and report the difference as overhead in dpif-netdev/pmd-rxq-show so that we have all info in a single command.

Example:

    $ ovs-appctl dpif-netdev/pmd-rxq-show
    pmd thread numa_id 1 core_id 3:
      isolated : true
      port: dpdk0 queue-id: 0 (enabled) pmd usage: 90 %
      overhead: 4 %
    pmd thread numa_id 1 core_id 5:
      isolated : false
      port: vhost0 queue-id: 0 (enabled) pmd usage: 0 %
      port: vhost1 queue-id: 0 (enabled) pmd usage: 93 %
      port: vhost2 queue-id: 0 (enabled) pmd usage: 0 %
      port: vhost6 queue-id: 0 (enabled) pmd usage: 0 %
      overhead: 6 %
    pmd thread numa_id 1 core_id 31:
      isolated : true
      port: dpdk1 queue-id: 0 (enabled) pmd usage: 86 %
      overhead: 4 %
    pmd thread numa_id 1 core_id 33:
      isolated : false
      port: vhost3 queue-id: 0 (enabled) pmd usage: 0 %
      port: vhost4 queue-id: 0 (enabled) pmd usage: 0 %
      port: vhost5 queue-id: 0 (enabled) pmd usage: 92 %
      port: vhost7 queue-id: 0 (enabled) pmd usage: 0 %
      overhead: 7 %

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
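To illustrate how the new overhead figure completes the picture, here is a minimal Python sketch that sums the per rxq usage and the overhead for each PMD from pmd-rxq-show style text. The function name `pmd_totals` and the shortened `SAMPLE` text are hypothetical and exist only for this illustration; the real command output may differ in spacing or contain non-numeric usage values, which this sketch simply skips.

```python
import re

# Shortened sample in the style of the pmd-rxq-show example (illustrative only).
SAMPLE = """\
pmd thread numa_id 1 core_id 3:
  isolated : true
  port: dpdk0 queue-id: 0 (enabled) pmd usage: 90 %
  overhead: 4 %
pmd thread numa_id 1 core_id 5:
  isolated : false
  port: vhost0 queue-id: 0 (enabled) pmd usage: 0 %
  port: vhost1 queue-id: 0 (enabled) pmd usage: 93 %
  overhead: 6 %
"""


def pmd_totals(text):
    """Return {core_id: rxq usage % + overhead %} for each PMD in the text."""
    totals = {}
    core = None
    for line in text.splitlines():
        header = re.search(r"core_id\s+(\d+):", line)
        if header:
            # A new "pmd thread" block starts; begin a fresh running total.
            core = int(header.group(1))
            totals[core] = 0
            continue
        value = re.search(r"(?:pmd usage|overhead):\s+(\d+)\s*%", line)
        if value and core is not None:
            totals[core] += int(value.group(1))
    return totals


if __name__ == "__main__":
    for core, total in pmd_totals(SAMPLE).items():
        print(f"core {core}: {total} % busy (rxq usage + overhead)")
```

With the overhead included, the per PMD sum approaches the total load of a saturated PMD, which the rxq usage values alone never did.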
* dpif-netdev: Allow pin rxq and non-isolate PMD.Kevin Traynor2021-07-161-2/+7
Pinning an rxq to a PMD with pmd-rxq-affinity may be done for various reasons, such as reserving a full PMD for an rxq, or to ensure that multiple rxqs from a port are handled on different PMDs.

Previously pmd-rxq-affinity always isolated the PMD so no other rxqs could be assigned to it by OVS. There may be cases where there are unused cycles on those pmds and the user would like other rxqs to also be able to be assigned to them by OVS.

Add an option to pin the rxq and non-isolate the PMD. The default behaviour is unchanged, which is pin and isolate the PMD.

In order to pin and non-isolate:

    ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-isolate=false

Note this is available only with group assignment type, as pinning conflicts with the operation of the other rxq assignment algorithms.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* dpif-netdev: Add group rxq scheduling assignment type.Kevin Traynor2021-07-161-0/+26
Add an rxq scheduling option that allows rxqs to be grouped on a pmd based purely on their load.

The current default 'cycles' assignment sorts rxqs by measured processing load and then assigns them to a list of round robin PMDs. This helps to keep the rxqs that require most processing on different cores but, as it selects the PMDs in round robin order, it equally distributes rxqs to PMDs.

'cycles' assignment has the advantage that it keeps the most loaded rxqs off the same core while still spreading the rxqs across a broad range of PMDs to mitigate against changes to traffic pattern.

'cycles' assignment has the disadvantage that, in order to make the trade off between optimising for current traffic load and mitigating against future changes, it tries to assign an equal number of rxqs per PMD in a round robin manner, and this can lead to a less than optimal balance of the processing load.

Now that PMD auto load balance can help mitigate future changes in traffic patterns, a 'group' assignment can be used to assign rxqs based on their measured cycles and the estimated running total of the PMDs.

In this case, there is no restriction about keeping an equal number of rxqs per PMD as it is purely load based.

This means that one PMD may have a group of low load rxqs assigned to it while another PMD has one high load rxq assigned to it, as that is the best balance of their measured loads across the PMDs.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
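The difference between the two strategies described above can be sketched in a few lines of Python. This is a simplified model and not the OVS implementation: the function names, the rxq names and the load numbers are all hypothetical, the round robin here is a plain forward walk over the PMDs, and NUMA placement and pinning are ignored.

```python
def _by_load_desc(rxq_loads):
    """Return (rxq, load) pairs sorted from highest to lowest measured load."""
    return sorted(rxq_loads.items(), key=lambda kv: kv[1], reverse=True)


def assign_round_robin(rxq_loads, n_pmds):
    """'cycles'-style sketch: sort by load, then hand rxqs to PMDs in turn."""
    pmds = [[] for _ in range(n_pmds)]
    for i, (rxq, _load) in enumerate(_by_load_desc(rxq_loads)):
        pmds[i % n_pmds].append(rxq)
    return pmds


def assign_group(rxq_loads, n_pmds):
    """'group'-style sketch: give each rxq to the PMD with the lowest
    estimated running total, regardless of how many rxqs it already has."""
    pmds = [[] for _ in range(n_pmds)]
    totals = [0] * n_pmds
    for rxq, load in _by_load_desc(rxq_loads):
        target = totals.index(min(totals))  # first PMD with the lowest total
        pmds[target].append(rxq)
        totals[target] += load
    return pmds


def pmd_loads(pmds, rxq_loads):
    """Sum the measured load of the rxqs assigned to each PMD."""
    return [sum(rxq_loads[rxq] for rxq in pmd) for pmd in pmds]


if __name__ == "__main__":
    # Hypothetical measured loads (percent of one PMD) for five rxqs.
    loads = {"a": 70, "b": 20, "c": 20, "d": 15, "e": 10}
    for name, fn in (("round robin", assign_round_robin), ("group", assign_group)):
        result = fn(loads, 2)
        print(name, result, pmd_loads(result, loads))
```

In this made-up example the round robin walk keeps the rxq counts nearly equal (three and two) but leaves the PMD loads at 100 and 35, while the load based walk puts one busy rxq alone on a PMD and groups the four lighter ones on the other, for loads of 70 and 65.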
* dpif-netdev: Assign PMD for failed pinned rxqs.Kevin Traynor2021-07-161-3/+3
Previously, if pmd-rxq-affinity was used to pin an rxq to a core that was not in pmd-cpu-mask, the rxq was not polled for and the user received a warning. This meant that no traffic would be received from that rxq.

Now that pinned and non-pinned rxqs are assigned to PMDs in a common call to rxq scheduling, if an invalid core is selected in pmd-rxq-affinity the rxq can be assigned an available PMD (if any).

A warning will still be logged as the requested core could not be used.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
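As a rough illustration of the check behind this change, the Python sketch below tests whether a requested core is set in a hexadecimal pmd-cpu-mask and marks the rxq as either pinned or left for normal scheduling. The helper names, the return format and the example values are hypothetical; how OVS then picks the replacement PMD is not modelled here.

```python
import logging

log = logging.getLogger("rxq_pinning_sketch")


def core_in_mask(pmd_cpu_mask, core_id):
    """Return True if bit `core_id` is set in the hex string `pmd_cpu_mask`."""
    return bool((int(pmd_cpu_mask, 16) >> core_id) & 1)


def resolve_pinning(pmd_cpu_mask, affinity):
    """Map each queue id to ("pinned", core) or ("unpinned", None).

    `affinity` is a dict of queue id -> requested core id. Queues whose
    requested core is not in the mask are logged and left unpinned so that
    a scheduler could still place them on an available PMD.
    """
    result = {}
    for queue, core in affinity.items():
        if core_in_mask(pmd_cpu_mask, core):
            result[queue] = ("pinned", core)
        else:
            log.warning("core %d for queue %d is not in pmd-cpu-mask %s",
                        core, queue, pmd_cpu_mask)
            result[queue] = ("unpinned", None)
    return result


if __name__ == "__main__":
    # Mask 0x28 has bits 3 and 5 set, so core 7 is not a PMD core.
    print(resolve_pinning("0x28", {0: 3, 1: 7}))
```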
* dpif/dpcls: limit count subtable search info logsHarry van Haaren2021-07-161-0/+34
This commit avoids many instances of "using subtable X for miniflow (x,y)" in the ovs-vswitchd log when using the DPCLS Autovalidator. This occurs when no specialized subtable is found, and the generic "_any" version of the avx512 subtable search implementation was used. This change logs the subtable usage once, avoiding duplicates.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: kumar Amber <kumar.amber@intel.com>
Co-authored-by: kumar Amber <kumar.amber@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* test/sytem-dpdk: Add unit test for mfex autovalidatorKumar Amber2021-07-161-0/+56
Tests:

    6: OVS-DPDK - MFEX Autovalidator
    7: OVS-DPDK - MFEX Autovalidator Fuzzy
    8: OVS-DPDK - MFEX Configuration

Added a new directory to store the PCAP file used in the tests and a script to generate the fuzzy traffic type pcap to be used in fuzzy unit test.

Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* dpif-netdev: Add packet count and core id paramters for studyKumar Amber2021-07-161-2/+36
This commit introduces an additional command line parameter for the mfex study function. If the user provides a packet count, study uses it as the minimum number of packets that must be processed; otherwise a default value is chosen. It also introduces a third parameter for choosing a particular pmd core.

    $ ovs-appctl dpif-netdev/miniflow-parser-set study 500 3

Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* dpif-netdev: Add configure to enable autovalidator at build time.Kumar Amber2021-07-161-0/+5
This commit adds a new command to allow the user to enable the autovalidator by default at build time, thus allowing the unit tests to run with it by default.

    $ ./configure --enable-mfex-default-autovalidator

Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* docs/dpdk/bridge: Add miniflow extract section.Kumar Amber2021-07-161-0/+51
This commit adds a section to the dpdk/bridge.rst netdev documentation, detailing the added miniflow functionality. The newly added commands are documented, and sample output is provided.

The use of auto-validator and special study function is also described in detail as well as running fuzzy tests.

Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Co-authored-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* docs: Add documentation for ovsdb relay mode.Ilya Maximets2021-07-152-0/+125
Main documentation for the service model and tutorial with the use case and configuration examples.

Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpif-netdev: Add command to get dpif implementations.Harry van Haaren2021-07-091-0/+8
This commit adds a new command to retrieve the list of available DPIF implementations. This can be used to check what implementations of the DPIF are available in any given OVS binary. It also returns which implementations are in use by the OVS PMD threads.

Usage:

    $ ovs-appctl dpif-netdev/dpif-impl-get

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Co-authored-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>