path: root/lib
Commit message | Author | Age | Files | Lines
* vlog: Fix OVS_REQUIRES macro.
  Author: William Tu | Age: 2020-03-24 | Files: 1 | Lines: -1/+1

  Pass lock objects, not their addresses, to the annotation macros.

  Signed-off-by: William Tu <u9012063@gmail.com>
  Acked-by: Ben Pfaff <blp@ovn.org>
* conntrack: Reset ct_state when entering a new zone.
  Author: Dumitru Ceara | Age: 2020-03-24 | Files: 1 | Lines: -1/+7

  When a new conntrack zone is entered, the ct_state field is zeroed in order to avoid using state information from different zones.

  One such scenario is when a packet is double NATed. Assuming two zones and 3 flows performing the following actions in order on the packet:
  1. ct(zone=5,nat), recirc
  2. ct(zone=1), recirc
  3. ct(zone=1,nat)

  If at step #1 the packet matches an existing NAT entry, it will get translated and pkt->md.ct_state is set to CS_DST_NAT or CS_SRC_NAT. At step #2 the new tuple might match an existing connection and pkt->md.ct_zone is set to 1. If at step #3 the packet matches an existing NAT entry in zone 1, handle_nat() will be called to perform the translation but it will return early because the packet's zone matches the conntrack zone and the ct_state field still contains CS_DST_NAT or CS_SRC_NAT from the translations in zone 5.

  In order to reliably detect when a packet enters a new conntrack zone we also need to make sure that pkt->md.ct_zone is properly initialized if pkt->md.ct_state is non-zero. This already happens for most cases. The only exception is when the matched conntrack connection is of type CT_CONN_TYPE_UN_NAT and the master connection is missing. To cover this path we now call write_ct_md() in that case too. Remove setting the CS_TRACKED flag in this case, as it will be done by the new call to write_ct_md().

  CC: Darrell Ball <dlu998@gmail.com>
  Fixes: 286de2729955 ("dpdk: Userspace Datapath: Introduce NAT Support.")
  Acked-by: Ilya Maximets <i.maximets@ovn.org>
  Acked-by: Aaron Conole <aconole@redhat.com>
  Signed-off-by: Dumitru Ceara <dceara@redhat.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
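The zone-entry reset described in this commit message can be sketched roughly as follows. This is only an illustration under simplifying assumptions: `struct pkt_md_sketch`, `enter_ct_zone()` and the `SK_CS_*` flag values are hypothetical names and placeholders, not the OVS definitions.

```c
#include <stdint.h>

/* Placeholder flag values, for illustration only (not the OVS values). */
#define SK_CS_TRACKED 0x20
#define SK_CS_SRC_NAT 0x40
#define SK_CS_DST_NAT 0x80

/* Hypothetical, simplified stand-in for the packet's conntrack metadata. */
struct pkt_md_sketch {
    uint16_t ct_zone;
    uint8_t ct_state;
};

/* When the packet carries state from a different zone, drop that state
 * before processing it in the new zone; then record the new zone. */
static void
enter_ct_zone(struct pkt_md_sketch *md, uint16_t zone)
{
    if (md->ct_state && md->ct_zone != zone) {
        md->ct_state = 0;
    }
    md->ct_zone = zone;
}
```

With this shape, NAT flags set in zone 5 no longer survive into zone 1, while re-entering the same zone leaves the state untouched.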
* fatal-signal: Fix clang error due to lock.
  Author: William Tu | Age: 2020-03-24 | Files: 2 | Lines: -9/+14

  Due to not acquiring lock, clang reports:
    lib/vlog.c:618:12: error: reading variable 'log_fd' requires holding mutex 'log_file_mutex' [-Werror,-Wthread-safety-analysis]
        return log_fd;

  The patch fixes it by creating a function in vlog.c to write directly to log file unsafely.

  Tested-at: https://travis-ci.org/github/williamtu/ovs-travis/builds/666165883
  Fixes: ecd4a8fcdff2 ("fatal-signal: Log backtrace when no monitor daemon.")
  Suggested-by: Ilya Maximets <i.maximets@ovn.org>
  Acked-by: Ilya Maximets <i.maximets@ovn.org>
  Signed-off-by: William Tu <u9012063@gmail.com>
* lockfile: Fix OVS_REQUIRES macro.
  Author: William Tu | Age: 2020-03-24 | Files: 1 | Lines: -8/+8

  Pass lock objects, not their addresses, to the annotation macros.

  Fixes: f21fa45f3085 ("lockfile: Minor code cleanup.")
  Tested-at: https://travis-ci.org/github/williamtu/ovs-travis/builds/666098338
  Signed-off-by: William Tu <u9012063@gmail.com>
  Acked-by: Ben Pfaff <blp@ovn.org>
* fatal-signal: Log backtrace when no monitor daemon.
  Author: William Tu | Age: 2020-03-23 | Files: 4 | Lines: -2/+34

  Currently, backtrace logging is only available when the monitor daemon is running. This patch enables backtrace logging when no monitor daemon exists. In signal-handling context, it detects whether the monitor daemon exists. If not, it writes the backtrace directly to the vlog fd. Note that using the VLOG_* macros doesn't work because of their buffered I/O, so this patch issues the write() syscall directly on the file descriptor.

  Some systems no longer use the monitor daemon and instead use systemd to monitor ovs-vswitchd, hence the need for this patch.

  Example of ovs-vswitchd.log (note that there is no timestamp printed):
    2020-03-23T14:42:12.949Z|00049|memory|INFO|175332 kB peak resident
    2020-03-23T14:42:12.949Z|00050|memory|INFO|handlers:2 ports:3 reva
    SIGSEGV detected, backtrace:
    0x0000000000486969 <fatal_signal_handler+0x49>
    0x00007f7f5e57f4b0 <killpg+0x40>
    0x000000000047daa8 <pmd_thread_main+0x238>
    0x0000000000504edd <ovsthread_wrapper+0x7d>
    0x00007f7f5f0476ba <start_thread+0xca>
    0x00007f7f5e65141d <clone+0x6d>
    0x0000000000000000 <+0x0>

  Acked-by: Ben Pfaff <blp@ovn.org>
  Signed-off-by: William Tu <u9012063@gmail.com>
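Writing to a descriptor with write() instead of buffered I/O, as this commit message describes, might look like the sketch below. The helper name `write_all_unbuffered()` is hypothetical and is not the function added to vlog.c; handling of EINTR is left out for brevity.

```c
#include <stddef.h>
#include <unistd.h>

/* Write 'len' bytes from 'buf' to 'fd' using only write(2), which is
 * async-signal-safe.  Loops on short writes.  Returns 0 on success and
 * -1 on the first error (EINTR is not retried in this sketch). */
static int
write_all_unbuffered(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);

        if (n < 0) {
            return -1;
        }
        buf += n;
        len -= (size_t) n;
    }
    return 0;
}
```

Because nothing here allocates memory or takes a lock, a helper of this shape can be called from a signal handler, which is why no timestamp or log prefix appears in the example output.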
* trivial: Fix typo in comments.
  Author: William Tu | Age: 2020-03-23 | Files: 1 | Lines: -2/+2

  s/daemon_complete/daemonize_complete/

  Signed-off-by: William Tu <u9012063@gmail.com>
* trivial: Fix indentation.
  Author: William Tu | Age: 2020-03-20 | Files: 1 | Lines: -1/+1

  Add extra space to fix indentation.

  Signed-off-by: William Tu <u9012063@gmail.com>
  Acked-by: Ben Pfaff <blp@ovn.org>
* ofp-actions: Fix memory leak.
  Author: William Tu | Age: 2020-03-20 | Files: 1 | Lines: -0/+1

  Coverity CID 279274 reports leaking previously allocated 'error' buffer when 'return xasprintf("input too big");'.

  Cc: Usman Ansari <uansari@vmware.com>
  Signed-off-by: William Tu <u9012063@gmail.com>
  Reviewed-by: Greg Rose <gvrose8192@gmail.com>
* conntrack: Fix NULL pointer dereference.
  Author: William Tu | Age: 2020-03-19 | Files: 1 | Lines: -1/+1

  Coverity CID 279957 reports a NULL pointer dereference when 'conn' is NULL and ct_print_conn_info is called.

  Cc: Usman Ansari <uansari@vmware.com>
  Signed-off-by: William Tu <u9012063@gmail.com>
  Acked-by: Dumitru Ceara <dceara@redhat.com>
* dpif-netlink: avoid netlink modify flow put op failed after tc modify flow put op failed.
  Author: wenxu | Age: 2020-03-19 | Files: 3 | Lines: -2/+10

  The tc modify flow put always deletes the original flow first and then adds the new flow. If the modify flow put operation fails, the flow put operation changes from modify to create only if deleting the original flow in tc succeeds (which always fails with ENOENT, because the flow was already deleted before adding the new flow in tc). As a result, the modify flow put fails to add the flow in the kernel datapath.

  Signed-off-by: wenxu <wenxu@ucloud.cn>
  Acked-by: Roi Dayan <roid@mellanox.com>
  Signed-off-by: Simon Horman <simon.horman@netronome.com>
* dpif-netdev: Enter quiescent state after each offloading operation.
  Author: Ilya Maximets | Age: 2020-03-15 | Files: 1 | Lines: -0/+1

  If the offloading queue is big and filled continuously, offloading thread may have no chance to quiesce blocking rcu callbacks and other threads waiting for synchronization.

  Fix that by entering momentary quiescent state after each operation since we're not holding any rcu-protected memory here.

  Fixes: 02bb2824e51d ("dpif-netdev: do hw flow offload in a thread")
  Reported-by: Eli Britstein <elibr@mellanox.com>
  Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-February/049768.html
  Acked-by: Eli Britstein <elibr@mellanox.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* pvector: Use acquire-release semantics for size.
  Author: Yanqin Wei | Age: 2020-03-15 | Files: 2 | Lines: -11/+20

  Read/write concurrency of the pvector library is implemented by a temp vector and RCU protection. For performance reasons, insertion does not follow this scheme. In the insertion function, a thread fence ensures the size increment is done after the new entry is stored. But there is no barrier in the iteration function (pvector_cursor_init). Entry point access may be reordered before loading the vector size, so an invalid entry point may be loaded during vector iteration.

  This patch fixes it with an acquire-release pair. It guarantees that the new size is observed by the reader only after the new entry is stored by the writer. This is implemented by a one-way barrier instead of a two-way memory fence.

  Fixes: fe7cfa5c3f19 ("lib/pvector: Non-intrusive RCU priority vector.")
  Reviewed-by: Gavin Hu <Gavin.Hu@arm.com>
  Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com>
  Signed-off-by: Yanqin Wei <Yanqin.Wei@arm.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
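The acquire-release pairing on the size field can be illustrated with C11 atomics. This is a sketch under stated assumptions, not the pvector code: `struct sk_vector`, `sk_push()` and `sk_sum()` are hypothetical, the vector has a fixed capacity, and a single writer is assumed.

```c
#include <stdatomic.h>
#include <stddef.h>

#define SK_CAP 8

/* Hypothetical fixed-capacity vector with an atomically published size. */
struct sk_vector {
    atomic_size_t size;
    int entries[SK_CAP];
};

/* Writer (single writer assumed): store the entry first, then publish the
 * new size with release semantics so the entry store cannot be reordered
 * after the size store. */
static int
sk_push(struct sk_vector *v, int value)
{
    size_t n = atomic_load_explicit(&v->size, memory_order_relaxed);

    if (n >= SK_CAP) {
        return -1;
    }
    v->entries[n] = value;
    atomic_store_explicit(&v->size, n + 1, memory_order_release);
    return 0;
}

/* Reader: load the size with acquire semantics; every entry below the
 * observed size is then guaranteed to be visible. */
static long
sk_sum(struct sk_vector *v)
{
    size_t n = atomic_load_explicit(&v->size, memory_order_acquire);
    long sum = 0;

    for (size_t i = 0; i < n; i++) {
        sum += v->entries[i];
    }
    return sum;
}
```

The release store and acquire load each constrain reordering in one direction only, which is the "one-way barrier" the message contrasts with a full two-way fence.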
* tc: Fix nat port range when offloading ct action
  Author: Paul Blakey | Age: 2020-03-13 | Files: 1 | Lines: -1/+1

  The port range struct is currently a union, so the last min/max port assignment wins, and the kernel doesn't receive the range. Change it to struct type.

  Fixes: 2bf6ffb76ac6 ("netdev-offload-tc: Add conntrack nat support")
  Signed-off-by: Paul Blakey <paulb@mellanox.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
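The effect described here, where the last assignment wins because both members share storage, can be reproduced with a small sketch. The type and function names below are hypothetical and only mirror the min/max idea from the message.

```c
#include <stdint.h>

/* In a union, 'min' and 'max' occupy the same bytes. */
union sk_range_union {
    uint16_t min;
    uint16_t max;
};

/* In a struct, each member has its own storage. */
struct sk_range_struct {
    uint16_t min;
    uint16_t max;
};

/* Returns the value of 'min' after assigning min and then max. */
static uint16_t
sk_union_min_after(uint16_t min, uint16_t max)
{
    union sk_range_union r;

    r.min = min;
    r.max = max;        /* Overwrites 'min' as well. */
    return r.min;
}

static uint16_t
sk_struct_min_after(uint16_t min, uint16_t max)
{
    struct sk_range_struct r;

    r.min = min;
    r.max = max;        /* 'min' is preserved. */
    return r.min;
}
```

With the union, assigning 1000 and then 2000 leaves both members reading 2000, so no range survives; the struct keeps both ends.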
* ofproto: Add support to watch controller port liveness in fast-failover group
  Author: Vishal Deep Ajmera | Age: 2020-03-06 | Files: 1 | Lines: -1/+2

  Currently fast-failover group does not support checking liveness of controller port (OFPP_CONTROLLER). However this feature can be useful for selecting an alternate pipeline when the controller connection itself is down, e.g. by using a local DHCP server to reply to any DHCP request originating from VMs.

  This patch adds support for watching controller port liveness in fast-failover group. The controller port is considered live when at least one of-connection is alive.

  Example usage:
    ovs-ofctl add-group br-int 'group_id=1234,type=ff,
        bucket=watch_port:CONTROLLER,actions:<A>,
        bucket=watch_port:1,actions:<B>'

  Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* netdev-dpdk: Remove deprecated ring port type.
  Author: Ilya Maximets | Age: 2020-03-06 | Files: 1 | Lines: -189/+0

  'dpdkr' ring ports were deprecated in the 2.13 release and were not actually used for a long time. Remove support now.

  More details in commit b4c5f00c339b ("netdev-dpdk: Deprecate ring ports.")

  Acked-by: Aaron Conole <aconole@redhat.com>
  Acked-by: David Marchand <david.marchand@redhat.com>
  Acked-by: Ian Stokes <ian.stokes@intel.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpdk: Remove deprecated pdump support.
  Author: Ilya Maximets | Age: 2020-03-06 | Files: 1 | Lines: -12/+0

  DPDK pdump was deprecated in 2.13 release and didn't actually work since 2.11. Removing it.

  More details in commit 4ae8c4617fd3 ("dpdk: Deprecate pdump support.")

  Acked-by: Aaron Conole <aconole@redhat.com>
  Acked-by: David Marchand <david.marchand@redhat.com>
  Acked-by: Ian Stokes <ian.stokes@intel.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* packets: Fix typo in comment.
  Author: Ben Pfaff | Age: 2020-03-05 | Files: 1 | Lines: -1/+1

  Acked-by: Han Zhou <hzhou@ovn.org>
  Reported-by: Toms Atteka <tatteka@vmware.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* netdev-linux: Enable TSO in the TAP device.
  Author: Flavio Leitner | Age: 2020-03-02 | Files: 1 | Lines: -0/+17

  Use ioctl TUNSETOFFLOAD, if the kernel supports it, to enable TSO offloading in the tap device.

  Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
  Reported-by: "Yi Yang (杨燚)-云服务集团" <yangyi01@inspur.com>
  Tested-by: William Tu <u9012063@gmail.com>
  Signed-off-by: Flavio Leitner <fbl@sysclose.org>
  Signed-off-by: William Tu <u9012063@gmail.com>
* userspace TSO: SCTP checksum offload optional.
  Author: Flavio Leitner | Age: 2020-02-26 | Files: 4 | Lines: -3/+28

  Ideally SCTP checksum offload needs to be advertised by the NIC when userspace TSO is enabled. However, very few drivers do that and it's not a widely used protocol. So, this patch enables SCTP checksum offload if available, otherwise userspace TSO can still be enabled but SCTP packets will be dropped on NICs without support.

  Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
  Signed-off-by: Flavio Leitner <fbl@sysclose.org>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* userspace TSO: Include UDP checksum offload.
  Author: Flavio Leitner | Age: 2020-02-26 | Files: 4 | Lines: -11/+35

  Virtio doesn't expose flags to control for which protocols checksum offload needs to be enabled or disabled. This patch checks if the NIC supports UDP checksum offload and activates it when TSO is enabled.

  Reported-by: Ilya Maximets <i.maximets@ovn.org>
  Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
  Signed-off-by: Flavio Leitner <fbl@sysclose.org>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev-dpdk: vhost: disable unsupported offload features.
  Author: Flavio Leitner | Age: 2020-02-26 | Files: 1 | Lines: -9/+15

  Disable ECN and UFO since these are not supported yet. Also, disable all other features when userspace_tso is not enabled.

  Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
  Signed-off-by: Flavio Leitner <fbl@sysclose.org>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* docs: Update conntrack established state description
  Author: Yi-Hung Wei | Age: 2020-02-25 | Files: 1 | Lines: -2/+2

  Patch a867c010ee91 ("conntrack: Fix conntrack new state") fixes the userspace conntrack behavior. This patch updates the corresponding conntrack state description.

  Fixes: a867c010ee91 ("conntrack: Fix conntrack new state")
  Reported-by: Roni Bar Yanai <roniba@mellanox.com>
  Acked-by: Roni Bar Yanai <roniba@mellanox.com>
  Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
  Signed-off-by: William Tu <u9012063@gmail.com>
* conntrack: Fix TCP conntrack state
  Author: Yi-Hung Wei | Age: 2020-02-18 | Files: 1 | Lines: -1/+1

  If a TCP connection is in SYN_SENT state, receiving another SYN packet would just renew the timeout of that conntrack entry rather than create a new one. Thus, tcp_conn_update() should return CT_UPDATE_VALID_NEW.

  This also fixes regressions of a couple of OVN system tests.

  Fixes: a867c010ee91 ("conntrack: Fix conntrack new state")
  Reported-by: Dumitru Ceara <dceara@redhat.com>
  Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
  Tested-by: Dumitru Ceara <dceara@redhat.com>
  Signed-off-by: William Tu <u9012063@gmail.com>
* dp-packet: prefetch the next packet when cloning a batch.
  Author: Flavio Leitner | Age: 2020-02-10 | Files: 1 | Lines: -0/+4

  There is a cache miss when accessing mbuf->data_off while cloning a batch and using prefetch improved the throughput by ~2.3%.

  Before: 13709416.30 pps
  After: 14031475.80 pps

  Fixes: d48771848560 ("dp-packet: preserve headroom when cloning a pkt batch")
  Signed-off-by: Flavio Leitner <fbl@sysclose.org>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
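Prefetching the next element while working on the current one, as in this commit, follows a simple loop pattern. The sketch below assumes a GCC/Clang compiler (for `__builtin_prefetch`) and uses a hypothetical `struct sk_pkt` rather than the real dp_packet or mbuf layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified packet descriptor. */
struct sk_pkt {
    uint16_t data_off;
    uint32_t len;
};

/* Walk a batch and read 'data_off' from each packet.  Before touching
 * packet i, hint the CPU to start loading packet i + 1 into cache. */
static uint32_t
sk_total_data_off(struct sk_pkt *const *pkts, size_t n)
{
    uint32_t total = 0;

    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n) {
            __builtin_prefetch(pkts[i + 1]);   /* GCC/Clang builtin. */
        }
        total += pkts[i]->data_off;
    }
    return total;
}
```

The prefetch is only a hint and does not change the result; whether it helps depends on the workload, so a measurement like the before/after figures in the message is the way to judge it.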
* netdev-dpdk: Don't enable offloading on HW device if not requested.
  Author: Ilya Maximets | Age: 2020-02-07 | Files: 1 | Lines: -6/+9

  DPDK drivers have different implementations of transmit functions. Enabled offloading may cause the driver to choose a slower variant, significantly affecting performance if userspace TSO wasn't requested.

  Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
  Reported-by: David Marchand <david.marchand@redhat.com>
  Acked-by: David Marchand <david.marchand@redhat.com>
  Acked-by: Flavio Leitner <fbl@sysclose.org>
  Acked-by: Kevin Traynor <ktraynor@redhat.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev-linux: Prepend the std packet in the TSO packet
  Author: Flavio Leitner | Age: 2020-02-06 | Files: 4 | Lines: -52/+78

  Usually TSO packets are close to 50k or 60k bytes long, so to copy fewer bytes when receiving a packet from the kernel, change the approach. Instead of extending the received MTU-sized packet and appending the remaining TSO data from the TSO buffer, allocate a TSO packet with enough headroom to prepend the std packet data.

  Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
  Suggested-by: Ben Pfaff <blp@ovn.org>
  Signed-off-by: Flavio Leitner <fbl@sysclose.org>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* netdev-linux-private: fix max length to be 16 bits
  Author: Flavio Leitner | Age: 2020-02-06 | Files: 1 | Lines: -1/+2

  The dp_packet length is limited to 16 bits, so document that and fix the length value accordingly.

  Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
  Signed-off-by: Flavio Leitner <fbl@sysclose.org>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* netdev-dpdk: Fix port init when lacking Tx offloads for TSO.
  Author: David Marchand | Age: 2020-02-05 | Files: 1 | Lines: -1/+1

  The check on TSO capability did not ensure ip checksum, tcp checksum and TSO tx offloads were available which resulted in a port init failure (example below with a ena device):

  *2020-02-04T17:42:52.976Z|00084|dpdk|ERR|Ethdev port_id=0 requested Tx offloads 0x2a doesn't match Tx offloads capabilities 0xe in rte_eth_dev_configure()*

  Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
  Reported-by: Ravi Kerur <rkerur@gmail.com>
  Signed-off-by: David Marchand <david.marchand@redhat.com>
  Acked-by: Kevin Traynor <ktraynor@redhat.com>
  Acked-by: Flavio Leitner <fbl@sysclose.org>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
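One way to require that every needed offload bit is present in a capability mask is an "all bits set" comparison, sketched below. The function name is hypothetical; the values 0x2a and 0xe in the usage note are the ones from the log line above.

```c
#include <stdbool.h>
#include <stdint.h>

/* True only when every bit in 'required' is also set in 'capabilities'.
 * A plain non-zero test on (capabilities & required) would accept a
 * device that supports only some of the required offloads. */
static bool
sk_offloads_supported(uint64_t capabilities, uint64_t required)
{
    return (capabilities & required) == required;
}
```

For the device in the log, `sk_offloads_supported(0xe, 0x2a)` is false because bit 0x20 is missing from the capabilities, so the offloads would not be requested at configure time.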
* flow: Fix parsing l3_ofs with partial offloading
  Author: Eli Britstein | Age: 2020-01-31 | Files: 1 | Lines: -2/+1

  l3_ofs should be set for all Ethernet packets, not just IPv4/IPv6 ones. For example, for ARP over VLAN tagged packets, it may cause wrong processing, as in the action that changes the VLAN ID. Fix it.

  Fixes: aab96ec4d81e ("dpif-netdev: retrieve flow directly from the flow mark")
  Signed-off-by: Eli Britstein <elibr@mellanox.com>
  Reviewed-by: Roi Dayan <roid@mellanox.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* conntrack: Fix conntrack new state
  Author: Yi-Hung Wei | Age: 2020-01-29 | Files: 4 | Lines: -6/+17

  In a connection tracking system, a connection is established if we see packets from both directions. However, in the userspace datapath's conntrack, sending a connection setup packet in one direction twice puts the connection in established state.

  This patch fixes the aforementioned issue, and adds a system traffic test for UDP and TCP traffic to avoid regression.

  Fixes: a489b16854b59 ("conntrack: New userspace connection tracker.")
  Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
  Signed-off-by: William Tu <u9012063@gmail.com>
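The rule stated in this message, that a connection counts as established only after packets are seen in both directions, can be sketched with two flags. `struct sk_conn` and `sk_conn_update()` are hypothetical and much simpler than the real conntrack state machines.

```c
#include <stdbool.h>

/* Hypothetical per-connection record of which directions were seen. */
struct sk_conn {
    bool seen_orig;
    bool seen_reply;
};

/* Record one packet and report whether the connection is established.
 * Repeating a packet in the same direction never establishes it. */
static bool
sk_conn_update(struct sk_conn *conn, bool is_reply)
{
    if (is_reply) {
        conn->seen_reply = true;
    } else {
        conn->seen_orig = true;
    }
    return conn->seen_orig && conn->seen_reply;
}
```

Sending the setup packet twice in the original direction keeps the result false; only the first reply flips it to true.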
* dpif: Fix dp_extra_info leak by reworking the allocation scheme.
  Author: Ilya Maximets | Age: 2020-01-27 | Files: 4 | Lines: -22/+21

  dpctl module leaks the 'dp_extra_info' in case the dumped flow doesn't fit the dump filter while executing dpctl/dump-flows and also while executing dpctl/get-flow.

  This is already a 3rd attempt to fix all the leaks and incorrect usage of this string, which definitely indicates poor initial design of the feature.

  Flow dump/get documentation clearly states that the caller does not own the data provided in dpif_flow. Datapath still owns all the data and promises to not free/modify it until the next quiescent period, however we're requesting the caller to free 'dp_extra_info' and this obviously breaks the rules.

  This patch fixes the issue by storing 'dp_extra_info' within 'struct dp_netdev_flow', making the datapath own it. 'dp_netdev_flow' is RCU-protected, so it will be valid until the next quiescent period.

  Fixes: 0e8f5c6a38d0 ("dpif-netdev: Modified ovs-appctl dpctl/dump-flows command")
  Tested-by: Emma Finn <emma.finn@intel.com>
  Acked-by: Emma Finn <emma.finn@intel.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* lib/stream-windows.c: Grant Access Privilege of Named Pipe to Creator
  Author: Ning Wu | Age: 2020-01-24 | Files: 1 | Lines: -1/+32

  The current implementation of ovs on windows only allows LocalSystem and Administrators to access the named pipe created with the ovs API. Thus any service that needs to invoke the API to create a named pipe has to run as the System account to interact with ovs. This makes the system more vulnerable if one of those services is broken into.

  The patch adds the creator owner account to the allowed ACLs.

  Signed-off-by: Ning Wu <nwu@vmware.com>
  Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
  Acked-by: Anand Kumar <kumaranand@vmware.com>
  Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
* tc: handle packet mark of zero
  Author: John Hurley | Age: 2020-01-22 | Files: 1 | Lines: -0/+5

  Openstack may set an skb mark of 0 in tunnel rules. This is considered to be an unused/unset value. However, it prevents the rule from being offloaded.

  Check if the key value of the skb mark is 0 when it is in use (mask is set to all ones). If it is then ignore the field and continue with TC offload.

  Only the exact-match case is covered by this patch as it addresses the Openstack use-case and seems most robust against feature evolution: for example, in future there may exist hardware offload scenarios where an operation, such as a BPF offload, sets the SKB mark before proceeding to the in-HW OVS datapath.

  Signed-off-by: John Hurley <john.hurley@netronome.com>
  Co-Authored-by: Simon Horman <simon.horman@netronome.com>
  Signed-off-by: Simon Horman <simon.horman@netronome.com>
  Acked-by: Aaron Conole <aconole@redhat.com>
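The exact-match test described above (key is 0 while the mask is all ones) reduces to a small predicate. The function name is hypothetical; only the condition itself comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when the mark is matched exactly (mask all ones) and its key is 0,
 * i.e. the case the message treats as "unset" and safe to ignore for
 * offload.  Partial masks are deliberately not covered. */
static bool
sk_mark_is_unset(uint32_t key, uint32_t mask)
{
    return mask == UINT32_MAX && key == 0;
}
```

A caller would skip the mark field when this returns true and otherwise keep whatever handling it already had for the field.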
* dpif: Fix leak and usage of uninitialized dp_extra_info.
  Author: Ilya Maximets | Age: 2020-01-20 | Files: 5 | Lines: -3/+8

  'dpif_probe_feature'/'revalidate' doesn't free the 'dp_extra_info' string. Also, all the implementations of dpif_flow_get() should initialize the value to avoid printing/freeing of random memory.

    30 bytes in 1 blocks are definitely lost in loss record 323 of 889
      at 0x483AD19: realloc (vg_replace_malloc.c:836)
      by 0xDDAD89: xrealloc (util.c:149)
      by 0xCE1609: ds_reserve (dynamic-string.c:63)
      by 0xCE1A90: ds_put_format_valist (dynamic-string.c:161)
      by 0xCE19B9: ds_put_format (dynamic-string.c:142)
      by 0xCCCEA9: dp_netdev_flow_to_dpif_flow (dpif-netdev.c:3170)
      by 0xCCD2DD: dpif_netdev_flow_get (dpif-netdev.c:3278)
      by 0xCCEA0A: dpif_netdev_operate (dpif-netdev.c:3868)
      by 0xCDF81B: dpif_operate (dpif.c:1361)
      by 0xCDEE93: dpif_flow_get (dpif.c:1002)
      by 0xCDECF9: dpif_probe_feature (dpif.c:962)
      by 0xC635D2: check_recirc (ofproto-dpif.c:896)
      by 0xC65C02: check_support (ofproto-dpif.c:1567)
      by 0xC63274: open_dpif_backer (ofproto-dpif.c:818)
      by 0xC65E3E: construct (ofproto-dpif.c:1605)
      by 0xC4D436: ofproto_create (ofproto.c:549)
      by 0xC3931A: bridge_reconfigure (bridge.c:877)
      by 0xC3FEAC: bridge_run (bridge.c:3324)
      by 0xC4551D: main (ovs-vswitchd.c:127)

  CC: Emma Finn <emma.finn@intel.com>
  Fixes: 0e8f5c6a38d0 ("dpif-netdev: Modified ovs-appctl dpctl/dump-flows command")
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
  Acked-by: Roi Dayan <roid@mellanox.com>
* netdev-afxdp: NUMA-aware memory allocation for XSK related memory.
  Author: Yi-Hung Wei | Age: 2020-01-18 | Files: 1 | Lines: -0/+32

  Currently, the AF_XDP socket (XSK) related memory is allocated by the main thread in the main thread's NUMA domain. With the patch that detects netdev-linux's NUMA node id, the PMD thread of an AF_XDP port will run on the AF_XDP netdev's NUMA domain. If the net device's NUMA domain is different from the main thread's NUMA domain, we will have two cross-NUMA memory accesses (netdev <-> memory, memory <-> CPU).

  This patch addresses the aforementioned issue by allocating the memory in the net device's NUMA domain.

  Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
  Acked-by: William Tu <u9012063@gmail.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev-linux: Detect numa node id.
  Author: William Tu | Age: 2020-01-18 | Files: 4 | Lines: -15/+72

  The patch detects the numa node id from the name of the netdev, by reading '/sys/class/net/<devname>/device/numa_node'. If it is not available (e.g., for a virtual device) or any error happens, return numa id 0. Currently only the afxdp netdev type uses it; other linux netdev types are disabled due to no use case.

  Signed-off-by: William Tu <u9012063@gmail.com>
  Acked-by: Eelco Chaudron <echaudro@redhat.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
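Reading the NUMA node from sysfs with a fallback to 0 could look like the sketch below. The function name `sk_netdev_numa_id()` is hypothetical, and treating a negative value in the file as "unknown" (returning 0) is an assumption of this sketch rather than something the message states.

```c
#include <stdio.h>

/* Return the NUMA node of network device 'devname', read from
 * /sys/class/net/<devname>/device/numa_node.  Returns 0 when the file is
 * missing (e.g. a virtual device), unreadable, or holds a negative value
 * (the last case is an assumption of this sketch). */
static int
sk_netdev_numa_id(const char *devname)
{
    char path[256];
    int node = 0;
    FILE *f;
    int len;

    len = snprintf(path, sizeof path,
                   "/sys/class/net/%s/device/numa_node", devname);
    if (len < 0 || (size_t) len >= sizeof path) {
        return 0;               /* Name too long for the buffer. */
    }

    f = fopen(path, "r");
    if (!f) {
        return 0;
    }
    if (fscanf(f, "%d", &node) != 1 || node < 0) {
        node = 0;
    }
    fclose(f);
    return node;
}
```

Callers never see an error this way: any failure simply maps to node 0, matching the fallback the message describes.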
* userspace: Add TCP Segmentation Offload support
  Author: Flavio Leitner | Age: 2020-01-17 | Files: 11 | Lines: -124/+1017

  Abbreviated as TSO, TCP Segmentation Offload is a feature which enables the network stack to delegate the TCP segmentation to the NIC reducing the per packet CPU overhead.

  A guest using vhostuser interface with TSO enabled can send TCP packets much bigger than the MTU, which saves CPU cycles normally used to break the packets down to MTU size and to calculate checksums.

  It also saves CPU cycles used to parse multiple packets/headers during the packet processing inside virtual switch.

  If the destination of the packet is another guest in the same host, then the same big packet can be sent through a vhostuser interface skipping the segmentation completely. However, if the destination is not local, the NIC hardware is instructed to do the TCP segmentation and checksum calculation.

  It is recommended to check if NIC hardware supports TSO before enabling the feature, which is off by default. For additional information please check the tso.rst document.

  Signed-off-by: Flavio Leitner <fbl@sysclose.org>
  Tested-by: Ciara Loftus <ciara.loftus.intel.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* vhost: Disable multi-segmented buffers
  Author: Flavio Leitner | Age: 2020-01-17 | Files: 1 | Lines: -0/+6

  There is no support for multi-segmented buffers, so flag that to vhost library.

  Signed-off-by: Flavio Leitner <fbl@sysclose.org>
  Tested-by: Ciara Loftus <ciara.loftus.intel.com>
  Acked-by: Ilya Maximets <i.maximets@ovn.org>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* dp-packet: preserve headroom when cloning a pkt batch
  Author: Flavio Leitner | Age: 2020-01-17 | Files: 1 | Lines: -1/+5

  The headroom is useful if an additional header needs to be inserted into the packet, so preserve the original headroom when cloning the batch.

  Signed-off-by: Flavio Leitner <fbl@sysclose.org>
  Tested-by: Ciara Loftus <ciara.loftus.intel.com>
  Acked-by: Ilya Maximets <i.maximets@ovn.org>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* dpif-netdev: Modified ovs-appctl dpctl/dump-flows command
  Author: Emma Finn | Age: 2020-01-17 | Files: 3 | Lines: -0/+20

  Modified ovs-appctl dpctl/dump-flows command to output the miniflow bits for a given flow when -m option is passed.

    $ ovs-appctl dpctl/dump-flows -m

  Signed-off-by: Emma Finn <emma.finn@intel.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* netdev-offload-dpdk: Support offload of set TCP/UDP ports actions.
  Author: Eli Britstein | Age: 2020-01-16 | Files: 1 | Lines: -0/+43

  Signed-off-by: Eli Britstein <elibr@mellanox.com>
  Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev-offload-dpdk: Support offload of set IPv4 actions.
  Author: Eli Britstein | Age: 2020-01-16 | Files: 1 | Lines: -0/+41

  Signed-off-by: Eli Britstein <elibr@mellanox.com>
  Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev-offload-dpdk: Support offload of set MAC actions.
  Author: Eli Britstein | Age: 2020-01-16 | Files: 1 | Lines: -0/+99

  Signed-off-by: Eli Britstein <elibr@mellanox.com>
  Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev-offload-dpdk: Support offload of drop action.
  Author: Eli Britstein | Age: 2020-01-16 | Files: 1 | Lines: -0/+4

  Signed-off-by: Eli Britstein <elibr@mellanox.com>
  Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev-offload-dpdk: Support offload of output action.
  Author: Eli Britstein | Age: 2020-01-16 | Files: 1 | Lines: -4/+82

  Support offload of output action, also configuring a count action to allow querying statistics of HW offloaded flows.

  Signed-off-by: Eli Britstein <elibr@mellanox.com>
  Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev-offload: Introduce a function to validate same flow api handle.
  Author: Eli Britstein | Age: 2020-01-16 | Files: 2 | Lines: -0/+13

  Signed-off-by: Eli Britstein <elibr@mellanox.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev-dpdk: Getter function for dpdk port id API.
  Author: Eli Britstein | Age: 2020-01-16 | Files: 2 | Lines: -0/+20

  Add a getter function for using the dpdk port id outside the scope of netdev-dpdk.c to be used for HW offload.

  Signed-off-by: Eli Britstein <elibr@mellanox.com>
  Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpif-netdev: Populate dpif class field in offload struct.
  Author: Eli Britstein | Age: 2020-01-16 | Files: 1 | Lines: -0/+2

  Populate dpif class field in offload struct to be used in offloading flow put.

  Signed-off-by: Eli Britstein <elibr@mellanox.com>
  Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpif-netdev: Update offloaded flows statistics.
  Author: Ophir Munk | Age: 2020-01-16 | Files: 1 | Lines: -13/+68

  In case a flow is HW offloaded, packets do not reach the SW and thus are not counted for statistics. Use the netdev flow get API in order to update the statistics of flows with the HW statistics.

  Co-authored-by: Eli Britstein <elibr@mellanox.com>
  Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
  Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
  Signed-off-by: Eli Britstein <elibr@mellanox.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpctl: Support dump-flows filters "dpdk" and "partially-offloaded".
  Author: Eli Britstein | Age: 2020-01-16 | Files: 2 | Lines: -3/+27

  Flows that are offloaded via DPDK can be partially offloaded (matches only) or fully offloaded (matches and actions). Set partially offloaded display to (offloaded=partial, dp:ovs), and fully offloaded to (offloaded=yes, dp:dpdk). Also support filter types "dpdk" and "partially-offloaded".

  Signed-off-by: Eli Britstein <elibr@mellanox.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>