path: root/lib/netdev-linux-private.h
* add port-based ingress policing based packet-per-second rate-limiting (Yong Xu, 2021-07-01; 1 file, -1/+3)
OVS has support for using policing to enforce a rate limit in kilobits per second. This is configured using OVSDB, e.g.:

    $ ovs-vsctl set interface tap0 ingress_policing_rate=1000
    $ ovs-vsctl set interface tap0 ingress_policing_burst=100

This patch adds a related feature, allowing policing to enforce a rate limit in kilo-packets per second. This is also configured using OVSDB:

    $ ovs-vsctl set interface tap0 ingress_policing_kpkts_rate=1000
    $ ovs-vsctl set interface tap0 ingress_policing_kpkts_burst=100

The kilo-bit and kilo-packet rate limits may be used separately or in combination.

Separate actions are added for the BPS and PPS limits in the netlink message, and the action result is changed to 'pipe' so that traffic can flow into the second action.

This patch implements the feature for:
* OVSDB (northbound API)
* TC policer, when used both with and without TC offload (kernel API)

Signed-off-by: Yong Xu <yong.xu@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
* netdev-linux: Prepend the std packet in the TSO packet (Flavio Leitner, 2020-02-06; 1 file, -1/+2)
TSO packets are usually close to 50k or 60k bytes long, so, to copy fewer bytes when receiving a packet from the kernel, the approach is changed. Instead of extending the MTU-sized packet received and appending the remaining TSO data from the TSO buffer, allocate a TSO packet with enough headroom to prepend the std packet data.

Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Suggested-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
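As an illustration of the approach described above, here is a minimal sketch assuming OVS's dp_packet helpers dp_packet_new_with_headroom(), dp_packet_put() and dp_packet_push(); the function name is hypothetical and this is not the actual patch:

    /* Sketch only: allocate the TSO packet with enough headroom so the
     * MTU-sized packet from the kernel can be prepended, instead of
     * appending the TSO remainder to the smaller buffer. */
    #include "dp-packet.h"

    static struct dp_packet *
    tso_prepend_std_packet(const void *std_data, size_t std_len,
                           const void *tso_data, size_t tso_len)
    {
        /* Reserve 'std_len' bytes of headroom in front of the TSO data. */
        struct dp_packet *pkt = dp_packet_new_with_headroom(tso_len, std_len);

        dp_packet_put(pkt, tso_data, tso_len);    /* Remaining TSO data. */
        dp_packet_push(pkt, std_data, std_len);   /* Prepend the std packet. */
        return pkt;
    }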
* netdev-linux-private: fix max length to be 16 bits (Flavio Leitner, 2020-02-06; 1 file, -1/+2)
The dp_packet length is limited to 16 bits, so document that and fix the length value accordingly.

Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
* netdev-linux: Detect numa node id. (William Tu, 2020-01-18; 1 file, -0/+2)
The patch detects the numa node id from the name of the netdev by reading '/sys/class/net/<devname>/device/numa_node'. If that is not available (e.g. for a virtual device), or if any error happens, numa id 0 is returned. Currently only the afxdp netdev type uses it; it is disabled for other Linux netdev types because there is no use case.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
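A self-contained sketch of the lookup described above; the function name is illustrative, not the actual OVS code:

    #include <stdio.h>

    /* Returns the NUMA node id of 'devname', or 0 if the sysfs attribute
     * is missing (e.g. virtual devices) or cannot be parsed. */
    static int
    netdev_guess_numa_id(const char *devname)
    {
        char path[256];
        int numa_id = 0;
        FILE *f;

        snprintf(path, sizeof path,
                 "/sys/class/net/%s/device/numa_node", devname);
        f = fopen(path, "r");
        if (f) {
            if (fscanf(f, "%d", &numa_id) != 1 || numa_id < 0) {
                numa_id = 0;   /* -1 means no reported NUMA affinity. */
            }
            fclose(f);
        }
        return numa_id;
    }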
* userspace: Add TCP Segmentation Offload support (Flavio Leitner, 2020-01-17; 1 file, -0/+5)
Abbreviated as TSO, TCP Segmentation Offload is a feature which enables the network stack to delegate the TCP segmentation to the NIC, reducing the per-packet CPU overhead.

A guest using a vhostuser interface with TSO enabled can send TCP packets much bigger than the MTU, which saves CPU cycles normally used to break the packets down to MTU size and to calculate checksums. It also saves CPU cycles used to parse multiple packets/headers during the packet processing inside the virtual switch.

If the destination of the packet is another guest on the same host, then the same big packet can be sent through a vhostuser interface, skipping the segmentation completely. However, if the destination is not local, the NIC hardware is instructed to do the TCP segmentation and checksum calculation.

It is recommended to check whether the NIC hardware supports TSO before enabling the feature, which is off by default. For additional information please check the tso.rst document.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Tested-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* ovs-thread: Avoid huge alignment on a base spinlock structure. (Ilya Maximets, 2019-12-19; 1 file, -1/+1)
Marking the structure as 64 bytes aligned forces the compiler to produce big holes in the containing structures in order to fulfill this requirement. Also, any structure that contains this one as a member automatically inherits this huge alignment, making the resulting memory layout inefficient. For example, 'struct umem_pool' currently uses 3 full cache lines (192 bytes) with only 32 bytes of actual data:

    struct umem_pool {
        int                index;                  /*   0    4 */
        unsigned int       size;                   /*   4    4 */

        /* XXX 56 bytes hole, try to pack */

        /* --- cacheline 1 boundary (64 bytes) --- */
        struct ovs_spin    lock __attribute__((__aligned__(64))); /*  64   64 */

        /* XXX last struct has 48 bytes of padding */

        /* --- cacheline 2 boundary (128 bytes) --- */
        void * *           array;                  /* 128    8 */

        /* size: 192, cachelines: 3, members: 4 */
        /* sum members: 80, holes: 1, sum holes: 56 */
        /* padding: 56 */
        /* paddings: 1, sum paddings: 48 */
        /* forced alignments: 1, forced holes: 1, sum forced holes: 56 */
    } __attribute__((__aligned__(64)));

Actual alignment of a spin lock is required only for the Tx queue locks inside netdev-afxdp to avoid false sharing; in all other cases the alignment only produces inefficient memory usage. Also, the CACHE_LINE_SIZE macro should be used instead of 64, as different platforms may have different cache line sizes. PADDED_MEMBERS is used to avoid alignment inheritance.

Fixes: ae36d63d7e3c ("ovs-thread: Make struct spin lock cache aligned.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
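For reference, a minimal sketch (not the actual patch) of how PADDED_MEMBERS from lib/util.h confines the cache-line padding to the one place that needs it, instead of aligning 'struct ovs_spin' itself; the containing structure name is hypothetical:

    /* Only this container pads the lock out to a full cache line, so other
     * users of 'struct ovs_spin' no longer inherit 64-byte alignment. */
    struct afxdp_tx_lock_example {
        PADDED_MEMBERS(CACHE_LINE_SIZE,
            struct ovs_spin lock;
        );
    };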
* netdev-afxdp: Best-effort configuration of XDP mode. (Ilya Maximets, 2019-11-20; 1 file, -2/+6)
Until now there were only two options for the XDP mode in OVS: SKB or DRV, i.e. 'generic XDP' or 'native XDP with zero-copy enabled'. Devices like 'veth' interfaces in Linux support native XDP but do not support zero-copy mode. This case cannot be covered by the existing API, and we have to use the slower generic XDP for such devices. There are a few more issues, e.g. TCP is not supported in generic XDP mode for veth interfaces due to kernel limitations, but it is supported in native mode.

This change introduces the ability to use native XDP without zero-copy, along with a best-effort configuration option that is enabled by default. In the best-effort case OVS sequentially tries different modes starting from the fastest one and chooses the first acceptable for the current interface. This guarantees the best possible performance. If a user wants to choose a specific mode, that is still possible by setting 'options:xdp-mode'.

This change additionally changes the API by renaming the configuration knob from 'xdpmode' to 'xdp-mode' and also renaming the modes themselves to be more user-friendly. The full list of currently supported modes:
* native-with-zerocopy - former DRV
* native                - new one, DRV without zero-copy
* generic               - former SKB
* best-effort           - new one, chooses the best available from the 3 above modes

Since 'best-effort' is the default mode, users will not need to explicitly set 'xdp-mode' in most cases.

TCP related tests are enabled back in the system afxdp testsuite, because 'best-effort' will choose 'native' mode for veth interfaces and this mode has no issues with TCP.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
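A hedged sketch of the best-effort selection logic described above; the helper, stub, and type names are hypothetical, not the real netdev-afxdp internals:

    #include <errno.h>

    enum xdp_mode {
        XDP_MODE_NATIVE_WITH_ZEROCOPY,   /* Fastest: former DRV. */
        XDP_MODE_NATIVE,                 /* DRV without zero-copy. */
        XDP_MODE_GENERIC,                /* Slowest: former SKB. */
        XDP_N_MODES,
    };

    struct netdev;   /* Opaque here. */

    /* Hypothetical stub standing in for the real per-mode setup. */
    static int
    try_xdp_mode(struct netdev *netdev, enum xdp_mode mode)
    {
        (void) netdev; (void) mode;
        return EOPNOTSUPP;               /* Pretend the mode failed. */
    }

    /* Try modes from fastest to slowest and keep the first one that the
     * current interface accepts. */
    static int
    xdp_configure_best_effort(struct netdev *netdev)
    {
        for (int m = 0; m < XDP_N_MODES; m++) {
            if (!try_xdp_mode(netdev, m)) {
                return 0;
            }
        }
        return EOPNOTSUPP;
    }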
* netdev-afxdp: Add need_wakeup support. (William Tu, 2019-10-29; 1 file, -0/+2)
The patch adds support for using the need_wakeup flag in AF_XDP rings. A new option, use-need-wakeup, is added. When this option is used, OVS has to explicitly wake up the kernel RX, using the poll() syscall, and wake up TX, using the sendto() syscall. This feature improves performance by avoiding unnecessary sendto syscalls for TX. For RX, instead of the kernel always busy-spinning on the fill queue, OVS wakes up the kernel RX processing when the fill queue is replenished.

The need_wakeup feature was merged into the Linux kernel bpf-next tree with commit 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings"), and OVS enables it by default if libbpf supports it. If users enable it but run an older version of libbpf, then the need_wakeup feature has no effect and a warning message is logged. For virtual interfaces it is better to set use-need-wakeup=false, since the virtual device's AF_XDP xmit is synchronous: the sendto syscall enters the kernel and processes the TX packet on the tx queue directly.

On an Intel Xeon E5-2620 v3 2.4GHz system, performance of physical port to physical port improves from 6.1Mpps to 7.3Mpps.

Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
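A sketch of the wakeup handling described above, assuming libbpf's xsk helpers xsk_ring_prod__needs_wakeup() and xsk_socket__fd(); the wrapper function names are hypothetical and this is not the exact OVS code:

    #include <poll.h>
    #include <sys/socket.h>
    #include <bpf/xsk.h>

    /* Flush TX with a sendto() only when the kernel asked to be woken up. */
    static void
    kick_tx_if_needed(struct xsk_socket *xsk, struct xsk_ring_prod *tx)
    {
        if (xsk_ring_prod__needs_wakeup(tx)) {
            sendto(xsk_socket__fd(xsk), NULL, 0, MSG_DONTWAIT, NULL, 0);
        }
    }

    /* Wake up kernel RX processing after the fill queue was replenished. */
    static void
    kick_rx_if_needed(struct xsk_socket *xsk, struct xsk_ring_prod *fill)
    {
        if (xsk_ring_prod__needs_wakeup(fill)) {
            struct pollfd pfd = { .fd = xsk_socket__fd(xsk), .events = POLLIN };
            poll(&pfd, 1, 0);
        }
    }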
* netdev-afxdp: Fix use of unconfigured device. (Ilya Maximets, 2019-07-23; 1 file, -0/+1)
In case of failure of 'xsk_configure_all()', 'n_rxq' and 'xdpmode' will remain in a new state. This will result in a successful reconfiguration (immediate return, because the configuration is already applied) if 'netdev_reconfigure()' is called again.

The same issue was previously fixed for netdev-dpdk using the 'dev->started' flag in commit 606f66507250 ("netdev-dpdk: Don't use PMD driver if not configured successfully").

Let's use a similar approach by checking 'dev->xsks', which only exists if the configuration was successful. Additionally, the 'netdev_afxdp_construct()' function is implemented to explicitly initialize all the specific fields and request the reconfiguration.

CC: William Tu <u9012063@gmail.com>
Fixes: 0de1b425962d ("netdev-afxdp: add new netdev type for AF_XDP.")
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
* netdev-afxdp: add new netdev type for AF_XDP. (William Tu, 2019-07-19; 1 file, -0/+130)
The patch introduces experimental AF_XDP support for OVS netdev. AF_XDP, the Address Family of the eXpress Data Path, is a new Linux socket type built upon the eBPF and XDP technology. It aims to have comparable performance to DPDK while cooperating better with the existing kernel networking stack. An AF_XDP socket receives and sends packets from an eBPF/XDP program attached to the netdev, bypassing a couple of the Linux kernel's subsystems. As a result, the AF_XDP socket shows much better performance than AF_PACKET. For more details about AF_XDP, please see the Linux kernel's Documentation/networking/af_xdp.rst. Note that by default, this feature is not compiled in.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>