summaryrefslogtreecommitdiff
path: root/lib/dpif.h
Commit message (Collapse)AuthorAgeFilesLines
* ofproto-dpif-upcall: Reset ukey's last stats value if the datapath changed.Eelco Chaudron2023-03-031-0/+1
| | | | | | | | | | | | | When the ukey's action set changes, it could cause the flow to use a different datapath, for example, when it moves from tc to kernel. This will cause the the cached previous datapath statistics to be used. This change will reset the cached statistics when a change in datapath is discovered. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpctl: Add function to read hardware offload statistics.Gaetan Rivet2022-01-181-0/+9
| | | | | | | | | | | | | | | | | Expose a function to query datapath offload statistics. This function is separate from the current one in netdev-offload as it exposes more detailed statistics from the datapath, instead of only from the netdev-offload provider. Each datapath is meant to use the custom counters as it sees fit for its handling of hardware offloads. Call the new API from dpctl. Signed-off-by: Gaetan Rivet <grive@u256.net> Reviewed-by: Eli Britstein <elibr@nvidia.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpctl: dpif: Allow viewing and configuring dp cache sizes.Eelco Chaudron2021-11-081-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a general way of viewing/configuring datapath cache sizes. With an implementation for the netlink interface. The ovs-dpctl/ovs-appctl show commands will display the current cache sizes configured: $ ovs-dpctl show system@ovs-system: lookups: hit:25 missed:63 lost:0 flows: 0 masks: hit:282 total:0 hit/pkt:3.20 cache: hit:4 hit-rate:4.54% caches: masks-cache: size:256 port 0: ovs-system (internal) port 1: br-int (internal) port 2: genev_sys_6081 (geneve: packet_type=ptap) port 3: br-ex (internal) port 4: eth2 port 5: sw0p1 (internal) port 6: sw0p3 (internal) A specific cache can be configured as follows: $ ovs-appctl dpctl/cache-set-size DP CACHE SIZE $ ovs-dpctl cache-set-size DP CACHE SIZE For example to disable the cache do: $ ovs-dpctl cache-set-size system@ovs-system masks-cache 0 Setting cache size successful, new size 0. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Paolo Valerio <pvalerio@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpctl: dpif: Add kernel datapath cache hit output.Eelco Chaudron2021-11-081-0/+2
| | | | | | | | | | | | | | | | | | | | | | | This patch adds cache usage statistics to the output: $ ovs-dpctl show system@ovs-system: lookups: hit:24 missed:71 lost:0 flows: 0 masks: hit:334 total:0 hit/pkt:3.52 cache: hit:4 hit-rate:4.21% port 0: ovs-system (internal) port 1: genev_sys_6081 (geneve: packet_type=ptap) port 2: br-int (internal) port 3: br-ex (internal) port 4: eth2 port 5: sw1p1 (internal) port 6: sw0p4 (internal) Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Paolo Valerio <pvalerio@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpif-netlink: Introduce per-cpu upcall dispatch.Mark Gray2021-07-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Open vSwitch kernel module uses the upcall mechanism to send packets from kernel space to user space when it misses in the kernel space flow table. The upcall sends packets via a Netlink socket. Currently, a Netlink socket is created for every vport. In this way, there is a 1:1 mapping between a vport and a Netlink socket. When a packet is received by a vport, if it needs to be sent to user space, it is sent via the corresponding Netlink socket. This mechanism, with various iterations of the corresponding user space code, has seen some limitations and issues: * On systems with a large number of vports, there is correspondingly a large number of Netlink sockets which can limit scaling. (https://bugzilla.redhat.com/show_bug.cgi?id=1526306) * Packet reordering on upcalls. (https://bugzilla.redhat.com/show_bug.cgi?id=1844576) * A thundering herd issue. (https://bugzilla.redhat.com/show_bug.cgi?id=1834444) This patch introduces an alternative, feature-negotiated, upcall mode using a per-cpu dispatch rather than a per-vport dispatch. In this mode, the Netlink socket to be used for the upcall is selected based on the CPU of the thread that is executing the upcall. In this way, it resolves the issues above as: a) The number of Netlink sockets scales with the number of CPUs rather than the number of vports. b) Ordering per-flow is maintained as packets are distributed to CPUs based on mechanisms such as RSS and flows are distributed to a single user space thread. c) Packets from a flow can only wake up one user space thread. Reported-at: https://bugzilla.redhat.com/1844576 Signed-off-by: Mark Gray <mark.d.gray@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpif: Fix use of uninitialized execute hash.Ilya Maximets2021-04-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'dpif_execute_helper_cb' doesn't initilalize the 'hash' field that may be passed down to datapath and might cause execution of a different set of actions, e.g. on recirculation. Thread 6 handler27: Conditional jump or move depends on uninitialised value(s) at 0x53A2C2: dpif_netlink_encode_execute (dpif-netlink.c:1841) by 0x53A2C2: dpif_netlink_operate__ (dpif-netlink.c:1919) by 0x53A82D: dpif_netlink_operate_chunks (dpif-netlink.c:2238) by 0x53A82D: dpif_netlink_operate (dpif-netlink.c:2297) by 0x48135F: dpif_operate (dpif.c:1366) by 0x481923: dpif_execute.part.24 (dpif.c:1320) by 0x481C46: dpif_execute (dpif.c:1312) by 0x481C46: dpif_execute_helper_cb (dpif.c:1243) by 0x4AE943: odp_execute_actions (odp-execute.c:865) by 0x47F272: dpif_execute_with_help (dpif.c:1296) by 0x4812FF: dpif_operate (dpif.c:1422) by 0x442226: handle_upcalls (ofproto-dpif-upcall.c:1617) by 0x442226: recv_upcalls.isra.36 (ofproto-dpif-upcall.c:855) by 0x442351: udpif_upcall_handler (ofproto-dpif-upcall.c:755) by 0x4FDE2C: ovsthread_wrapper (ovs-thread.c:383) by 0x5E19159: start_thread (in /usr/lib64/libpthread-2.28.so) by 0x69ECF72: clone (in /usr/lib64/libc-2.28.so) Uninitialised value was created by a stack allocation at 0x481966: dpif_execute_helper_cb (dpif.c:1159) Additionally added a missing comment to the 'struct dpif_execute'. Fixes: 0442bfb11d6c ("ofproto-dpif-upcall: Echo HASH attribute back to datapath.") Acked-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpif-netlink: Fix issues of the offloaded flows counter.Jianbo Liu2020-12-211-1/+2
| | | | | | | | | | | | | | | | | | The n_offloaded_flows counter is saved in dpif, and this is the first one when ofproto is created. When flow operation is done by ovs-appctl commands, such as, dpctl/add-flow, a new dpif is opened, and the n_offloaded_flows in it can't be used. So, instead of using counter, the number of offloaded flows is queried from each netdev, then sum them up. To achieve this, a new API is added in netdev_flow_api to get how many flows assigned to a netdev. In order to get better performance, this number is calculated directly from tc_to_ufid hmap for netdev-offload-tc, because flow dumping by tc takes much time if there are many flows offloaded. Fixes: af0618470507 ("dpif-netlink: Count the number of offloaded rules") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpif-netlink: Count the number of offloaded rulesJianbo Liu2020-12-071-0/+1
| | | | | | | | | Add a counter for the offloaded rules, and display it in the command of "ovs-appctl upcall/show". Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
* Eliminate use of term "slave" in bond, LACP, and bundle contexts.Ben Pfaff2020-10-211-1/+1
| | | | | | | | | | | | | The new term is "member". Most of these changes should not change user-visible behavior. One place where they do is in "ovs-ofctl dump-flows", which will now output "members:..." inside "bundle" actions instead of "slaves:...". I don't expect this to cause real problems in most systems. The old syntax is still supported on input for backward compatibility. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
* Eliminate "whitelist" and "blacklist" terms.Ben Pfaff2020-10-161-1/+1
| | | | | | | | There is one remaining use under datapath. That change should happen upstream in Linux first according to our usual policy. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
* userspace: Avoid dp_hash recirculation for balance-tcp bond mode.Vishal Deep Ajmera2020-06-221-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: In OVS, flows with output over a bond interface of type “balance-tcp” gets translated by the ofproto layer into "HASH" and "RECIRC" datapath actions. After recirculation, the packet is forwarded to the bond member port based on 8-bits of the datapath hash value computed through dp_hash. This causes performance degradation in the following ways: 1. The recirculation of the packet implies another lookup of the packet’s flow key in the exact match cache (EMC) and potentially Megaflow classifier (DPCLS). This is the biggest cost factor. 2. The recirculated packets have a new “RSS” hash and compete with the original packets for the scarce number of EMC slots. This implies more EMC misses and potentially EMC thrashing causing costly DPCLS lookups. 3. The 256 extra megaflow entries per bond for dp_hash bond selection put additional load on the revalidation threads. Owing to this performance degradation, deployments stick to “balance-slb” bond mode even though it does not do active-active load balancing for VXLAN- and GRE-tunnelled traffic because all tunnel packet have the same source MAC address. Proposed optimization: This proposal introduces a new load-balancing output action instead of recirculation. Maintain one table per-bond (could just be an array of uint16's) and program it the same way internal flows are created today for each possible hash value (256 entries) from ofproto layer. Use this table to load-balance flows as part of output action processing. Currently xlate_normal() -> output_normal() -> bond_update_post_recirc_rules() -> bond_may_recirc() and compose_output_action__() generate 'dp_hash(hash_l4(0))' and 'recirc(<RecircID>)' actions. In this case the RecircID identifies the bond. For the recirculated packets the ofproto layer installs megaflow entries that match on RecircID and masked dp_hash and send them to the corresponding output port. Instead, we will now generate action as 'lb_output(<bond id>)' This combines hash computation (only if needed, else re-use RSS hash) and inline load-balancing over the bond. This action is used *only* for balance-tcp bonds in userspace datapath (the OVS kernel datapath remains unchanged). Example: Current scheme: With 8 UDP flows (with random UDP src port): flow-dump from pmd on cpu core: 2 recirc_id(0),in_port(7),<...> actions:hash(hash_l4(0)),recirc(0x1) recirc_id(0x1),dp_hash(0xf8e02b7e/0xff),<...> actions:2 recirc_id(0x1),dp_hash(0xb236c260/0xff),<...> actions:1 recirc_id(0x1),dp_hash(0x7d89eb18/0xff),<...> actions:1 recirc_id(0x1),dp_hash(0xa78d75df/0xff),<...> actions:2 recirc_id(0x1),dp_hash(0xb58d846f/0xff),<...> actions:2 recirc_id(0x1),dp_hash(0x24534406/0xff),<...> actions:1 recirc_id(0x1),dp_hash(0x3cf32550/0xff),<...> actions:1 New scheme: We can do with a single flow entry (for any number of new flows): in_port(7),<...> actions:lb_output(1) A new CLI has been added to dump datapath bond cache as given below. # ovs-appctl dpif-netdev/bond-show [dp] Bond cache: bond-id 1 : bucket 0 - slave 2 bucket 1 - slave 1 bucket 2 - slave 2 bucket 3 - slave 1 Co-authored-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com> Signed-off-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com> Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com> Tested-by: Matteo Croce <mcroce@redhat.com> Tested-by: Adrian Moreno <amorenoz@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpif: Fix dp_extra_info leak by reworking the allocation scheme.Ilya Maximets2020-01-271-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | dpctl module leaks the 'dp_extra_info' in case the dumped flow doesn't fit the dump filter while executing dpctl/dump-flows and also while executing dpctl/get-flow. This is already a 3rd attempt to fix all the leaks and incorrect usage of this string that definitely indicates poor initial design of the feature. Flow dump/get documentation clearly states that the caller does not own the data provided in dpif_flow. Datapath still owns all the data and promises to not free/modify it until the next quiescent period, however we're requesting the caller to free 'dp_extra_info' and this obviously breaks the rules. This patch fixes the issue by by storing 'dp_extra_info' within 'struct dp_netdev_flow' making datapath to own it. 'dp_netdev_flow' is RCU-protected, so it will be valid until the next quiescent period. Fixes: 0e8f5c6a38d0 ("dpif-netdev: Modified ovs-appctl dpctl/dump-flows command") Tested-by: Emma Finn <emma.finn@intel.com> Acked-by: Emma Finn <emma.finn@intel.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpif: Fix leak and usage of uninitialized dp_extra_info.Ilya Maximets2020-01-201-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'dpif_probe_feature'/'revalidate' doesn't free the 'dp_extra_info' string. Also, all the implementations of dpif_flow_get() should initialize the value to avoid printing/freeing of random memory. 30 bytes in 1 blocks are definitely lost in loss record 323 of 889 at 0x483AD19: realloc (vg_replace_malloc.c:836) by 0xDDAD89: xrealloc (util.c:149) by 0xCE1609: ds_reserve (dynamic-string.c:63) by 0xCE1A90: ds_put_format_valist (dynamic-string.c:161) by 0xCE19B9: ds_put_format (dynamic-string.c:142) by 0xCCCEA9: dp_netdev_flow_to_dpif_flow (dpif-netdev.c:3170) by 0xCCD2DD: dpif_netdev_flow_get (dpif-netdev.c:3278) by 0xCCEA0A: dpif_netdev_operate (dpif-netdev.c:3868) by 0xCDF81B: dpif_operate (dpif.c:1361) by 0xCDEE93: dpif_flow_get (dpif.c:1002) by 0xCDECF9: dpif_probe_feature (dpif.c:962) by 0xC635D2: check_recirc (ofproto-dpif.c:896) by 0xC65C02: check_support (ofproto-dpif.c:1567) by 0xC63274: open_dpif_backer (ofproto-dpif.c:818) by 0xC65E3E: construct (ofproto-dpif.c:1605) by 0xC4D436: ofproto_create (ofproto.c:549) by 0xC3931A: bridge_reconfigure (bridge.c:877) by 0xC3FEAC: bridge_run (bridge.c:3324) by 0xC4551D: main (ovs-vswitchd.c:127) CC: Emma Finn <emma.finn@intel.com> Fixes: 0e8f5c6a38d0 ("dpif-netdev: Modified ovs-appctl dpctl/dump-flows command") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Roi Dayan <roid@mellanox.com>
* dpif-netdev: Modified ovs-appctl dpctl/dump-flows commandEmma Finn2020-01-171-0/+1
| | | | | | | | | | Modified ovs-appctl dpctl/dump-flows command to output the miniflow bits for a given flow when -m option is passed. $ ovs-appctl dpctl/dump-flows -m Signed-off-by: Emma Finn <emma.finn@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* dpif: Turn dpif_flow_hash function into generic odp_flow_key_hash.Ilya Maximets2020-01-081-3/+1
| | | | | | | | | | | | | | | | | | | | Current implementation of dpif_flow_hash() doesn't depend on datapath interface and only complicates the callers by forcing them to figure out what is their current 'dpif'. If we'll need different hashing for different 'dpif's we'll implement an API for dpif-providers and each dpif implementation will be able to use their local function directly without calling it via dpif API. This change will allow us to not store 'dpif' pointer in the userspace datapath implementation which is broken and will be removed in next commits. This patch moves dpif_flow_hash() to odp-util module and replaces unused odp_flow_key_hash() by it, along with removing of unused 'dpif' argument. Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
* userspace: Improved packet drop statistics.Anju Thomas2020-01-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently OVS maintains explicit packet drop/error counters only on port level. Packets that are dropped as part of normal OpenFlow processing are counted in flow stats of “drop” flows or as table misses in table stats. These can only be interpreted by controllers that know the semantics of the configured OpenFlow pipeline. Without that knowledge, it is impossible for an OVS user to obtain e.g. the total number of packets dropped due to OpenFlow rules. Furthermore, there are numerous other reasons for which packets can be dropped by OVS slow path that are not related to the OpenFlow pipeline. The generated datapath flow entries include a drop action to avoid further expensive upcalls to the slow path, but subsequent packets dropped by the datapath are not accounted anywhere. Finally, the datapath itself drops packets in certain error situations. Also, these drops are today not accounted for.This makes it difficult for OVS users to monitor packet drop in an OVS instance and to alert a management system in case of a unexpected increase of such drops. Also OVS trouble-shooters face difficulties in analysing packet drops. With this patch we implement following changes to address the issues mentioned above. 1. Identify and account all the silent packet drop scenarios 2. Display these drops in ovs-appctl coverage/show Co-authored-by: Rohith Basavaraja <rohith.basavaraja@gmail.com> Co-authored-by: Keshav Gupta <keshugupta1@gmail.com> Signed-off-by: Anju Thomas <anju.thomas@ericsson.com> Signed-off-by: Rohith Basavaraja <rohith.basavaraja@gmail.com> Signed-off-by: Keshav Gupta <keshugupta1@gmail.com> Acked-by: Eelco Chaudron <echaudro@redhat.com Acked-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpif: Add support to set user featuresPaul Blakey2019-12-221-0/+2
| | | | | | | | | | | | This enables user features on the kernel datapath via the DP_CMD_SET command, and also retrieves them to check for actual support and not just an older kernel ignoring the requested features. This will be used in next patch to enable recirc_id sharing with tc. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
* Add offload packets statisticszhaozhanxu2019-12-061-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add argument '--offload-stats' for command ovs-appctl bridge/dump-flows to display the offloaded packets statistics. The commands display as below: orignal command: ovs-appctl bridge/dump-flows br0 duration=574s, n_packets=1152, n_bytes=110768, priority=0,actions=NORMAL table_id=254, duration=574s, n_packets=0, n_bytes=0, priority=2,recirc_id=0,actions=drop table_id=254, duration=574s, n_packets=0, n_bytes=0, priority=0,reg0=0x1,actions=controller(reason=) table_id=254, duration=574s, n_packets=0, n_bytes=0, priority=0,reg0=0x2,actions=drop table_id=254, duration=574s, n_packets=0, n_bytes=0, priority=0,reg0=0x3,actions=drop new command with argument '--offload-stats' Notice: 'n_offload_packets' are a subset of n_packets and 'n_offload_bytes' are a subset of n_bytes. ovs-appctl bridge/dump-flows --offload-stats br0 duration=582s, n_packets=1152, n_bytes=110768, n_offload_packets=1107, n_offload_bytes=107992, priority=0,actions=NORMAL table_id=254, duration=582s, n_packets=0, n_bytes=0, n_offload_packets=0, n_offload_bytes=0, priority=2,recirc_id=0,actions=drop table_id=254, duration=582s, n_packets=0, n_bytes=0, n_offload_packets=0, n_offload_bytes=0, priority=0,reg0=0x1,actions=controller(reason=) table_id=254, duration=582s, n_packets=0, n_bytes=0, n_offload_packets=0, n_offload_bytes=0, priority=0,reg0=0x2,actions=drop table_id=254, duration=582s, n_packets=0, n_bytes=0, n_offload_packets=0, n_offload_bytes=0, priority=0,reg0=0x3,actions=drop Signed-off-by: zhaozhanxu <zhaozhanxu@163.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
* ofproto-dpif-upcall: Echo HASH attribute back to datapath.Tonghao Zhang2019-11-221-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kernel datapath may sent upcall with hash info, ovs-vswitchd should get it from upcall and then send it back. The reason is that: | When using the kernel datapath, the upcall don't | include skb hash info relatived. That will introduce | some problem, because the hash of skb is important | in kernel stack. For example, VXLAN module uses | it to select UDP src port. The tx queue selection | may also use the hash in stack. | | Hash is computed in different ways. Hash is random | for a TCP socket, and hash may be computed in hardware, | or software stack. Recalculation hash is not easy. | | There will be one upcall, without information of skb | hash, to ovs-vswitchd, for the first packet of a TCP | session. The rest packets will be processed in Open vSwitch | modules, hash kept. If this tcp session is forward to | VXLAN module, then the UDP src port of first tcp packet | is different from rest packets. | | TCP packets may come from the host or dockers, to Open vSwitch. | To fix it, we store the hash info to upcall, and restore hash | when packets sent back. Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.html Link: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=bd1903b7c4596ba6f7677d0dfefd05ba5876707d Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* vswitchd: Always cleanup userspace datapath.Ilya Maximets2019-07-021-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'netdev' datapath is implemented within ovs-vswitchd process and can not exist without it, so it should be gracefully terminated with a full cleanup of resources upon ovs-vswitchd exit. This change forces dpif cleanup for 'netdev' datapath regardless of passing '--cleanup' to 'ovs-appctl exit'. Such solution allowes to not pass this additional option everytime for userspace datapath installations and also allowes to not terminate system datapath in setups where both datapaths runs at the same time. The main part is that dpif_port_del() will lead to netdev_close() and subsequent netdev_class->destroy(dev) which will stop HW NICs and free their resources. For vhost-user interfaces it will invoke vhost driver unregistering with a properly closed vhost-user connection. For upcoming AF_XDP netdev this will allow to gracefully destroy xdp sockets and unload xdp programs from linux interfaces. Another important thing is that port deletion will also trigger flushing of flows offloaded to HW NICs. Exception made for 'internal' ports that could have user ip/route configuration. These ports will not be removed without '--cleanup'. This change fixes OVS disappearing from the DPDK point of view (keeping HW NICs improperly configured, sudden closing of vhost-user connections) and will help with linux devices clearing with upcoming AF_XDP netdev support. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Tested-by: William Tu <u9012063@gmail.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Ben Pfaff <blp@ovn.org>
* Remove ESX references.Justin Pettit2019-06-011-3/+3
| | | | | Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
* dpif: Restore a few lines with form feed charactersSriharsha Basavapatna2018-10-311-1/+1
| | | | | | | | | | A few lines with form feed characters (ASCII: ^L) were accidentally deleted by a recent commit to support rebalancing of offloaded flows. This patch reverts those lines. Fixes: 57924fc91c ("revalidator: Rebalance offloaded flows") Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* revalidator: Rebalance offloaded flows based on the pps rateSriharsha Basavapatna via dev2018-10-191-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the third patch in the patch-set to support dynamic rebalancing of offloaded flows. The dynamic rebalancing functionality is implemented in this patch. The ukeys that are not scheduled for deletion are obtained and passed as input to the rebalancing routine. The rebalancing is done in the context of revalidation leader thread, after all other revalidator threads are done with gathering rebalancing data for flows. For each netdev that is in OOR state, a list of flows - both offloaded and non-offloaded (pending) - is obtained using the ukeys. For each netdev that is in OOR state, the flows are grouped and sorted into offloaded and pending flows. The offloaded flows are sorted in descending order of pps-rate, while pending flows are sorted in ascending order of pps-rate. The rebalancing is done in two phases. In the first phase, we try to offload all pending flows and if that succeeds, the OOR state on the device is cleared. If some (or none) of the pending flows could not be offloaded, then we start replacing an offloaded flow that has a lower pps-rate than a pending flow, until there are no more pending flows with a higher rate than an offloaded flow. The flows that are replaced from the device are added into kernel datapath. A new OVS configuration parameter "offload-rebalance", is added to ovsdb. The default value of this is "false". To enable this feature, set the value of this parameter to "true", which provides packets-per-second rate based policy to dynamically offload and un-offload flows. Note: This option can be enabled only when 'hw-offload' policy is enabled. It also requires 'tc-policy' to be set to 'skip_sw'; otherwise, flow offload errors (specifically ENOSPC error this feature depends on) reported by an offloaded device are supressed by TC-Flower kernel module. Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Co-authored-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Reviewed-by: Sathya Perla <sathya.perla@broadcom.com> Reviewed-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>
* dpif: Remove support for multiple queues per port.Ben Pfaff2018-09-261-14/+1
| | | | | | | | | | | | Commit 69c51582ff78 ("dpif-netlink: don't allocate per thread netlink sockets") removed dpif-netlink support for multiple queues per port. No remaining dpif provider supports multiple queues per port, so remove infrastructure for the feature. CC: Matteo Croce <mcroce@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Tested-by: Yifeng Sun <pkusunyifeng@gmail.com> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
* dpctl: Expand the flow dump type filterGavi Teitz2018-09-131-1/+6
| | | | | | | | | | | | | | | | | Added new types to the flow dump filter, and allowed multiple filter types to be passed at once, as a comma separated list. The new types added are: * tc - specifies flows handled by the tc dp * non-offloaded - specifies flows not offloaded to the HW * all - specifies flows of all types The type list is now fully parsed by the dpctl, and a new struct was added to dpif which enables dpctl to define which types of dumps to provide, rather than passing the type string and having dpif parse it. Signed-off-by: Gavi Teitz <gavi@mellanox.com> Acked-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
* dpif: Don't pass in '*meter_id' to meter_set commands.Justin Pettit2018-08-161-1/+1
| | | | | | | | | | | | The original intent of the API appears to be that the underlying DPIF implementaion would choose a local meter id. However, neither of the existing datapath meter implementations (userspace or Linux) implemented that; they expected a valid meter id to be passed in, otherwise they returned an error. This commit follows the existing implementations and makes the API somewhat cleaner. Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
* Revert "dpctl: Expand the flow dump type filter"Justin Pettit2018-07-251-6/+1
| | | | | | | | | | | | | | | Commit ab15e70eb587 ("dpctl: Expand the flow dump type filter") had a number of issues with style, build breakage, and failing unit tests. The patch is being reverted so that they can addressed. This reverts commit ab15e70eb5878b46f8f84da940ffc915b6d74cad. CC: Gavi Teitz <gavi@mellanox.com> CC: Simon Horman <simon.horman@netronome.com> CC: Roi Dayan <roid@mellanox.com> CC: Aaron Conole <aconole@redhat.com> Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
* dpctl: Expand the flow dump type filterGavi Teitz2018-07-251-1/+6
| | | | | | | | | | | | | | | | | Added new types to the flow dump filter, and allowed multiple filter types to be passed at once, as a comma separated list. The new types added are: * tc - specifies flows handled by the tc dp * non-offloaded - specifies flows not offloaded to the HW * all - specifies flows of all types The type list is now fully parsed by the dpctl, and a new struct was added to dpif which enables dpctl to define which types of dumps to provide, rather than passing the type string and having dpif parse it. Signed-off-by: Gavi Teitz <gavi@mellanox.com> Acked-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
* dpctl: Properly reflect a rule's offloaded to HW stateGavi Teitz2018-06-181-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, any rule that is offloaded via a netdev, not necessarily to the HW, would be reported as "offloaded". This patch fixes this misalignment, and introduces the 'dp' state, as follows: rule is in HW via TC offload -> offloaded=yes dp:tc rule is in not HW over TC DP -> offloaded=no dp:tc rule is in not HW over OVS DP -> offloaded=no dp:ovs To achieve this, the flows's 'offloaded' flag was encapsulated in a new attrs struct, which contains the offloaded state of the flow and the DP layer the flow is handled in, and instead of setting the flow's 'offloaded' state based solely on the type of dump it was acquired via, for netdev flows it now sends the new attrs struct to be collected along with the rest of the flow via the netdev, allowing it to be set per flow. For TC offloads, the offloaded state is set based on the 'in_hw' and 'not_in_hw' flags received from the TC as part of the flower. If no such flag was received, due to lack of kernel support, it defaults to true. Signed-off-by: Gavi Teitz <gavi@mellanox.com> Acked-by: Roi Dayan <roid@mellanox.com> [simon: resolved conflict in lib/dpctl.man] Signed-off-by: Simon Horman <simon.horman@netronome.com>
* Embrace anonymous unions.Ben Pfaff2018-05-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Several OVS structs contain embedded named unions, like this: struct { ... union { ... } u; }; C11 standardized a feature that many compilers already implemented anyway, where an embedded union may be unnamed, like this: struct { ... union { ... }; }; This is more convenient because it allows the programmer to omit "u." in many places. OVS already used this feature in several places. This commit embraces it in several others. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org> Tested-by: Alin Gabriel Serdean <aserdean@ovn.org> Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
* ofp-util, ofp-parse: Break up into many separate modules.Ben Pfaff2018-02-131-1/+1
| | | | | | | | | | | | ofp-util had been far too large and monolithic for a long time. This commit breaks it up into units that make some logical sense. It also moves the pieces of ofp-parse that were specific to each unit into the relevant unit. Most of this commit is just moving code around. Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
* dpif: Use spaces instead of tabs.Justin Pettit2017-12-211-3/+3
| | | | | Signed-off-by: Justin Pettit <jpettit@ovn.org> Reviewed-by: Greg Rose <gvrose8192@gmail.com>
* netdev, dpif: fix the crash/assert on port deleteAshish Varma2017-11-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | a crash is seen in "netdev_ports_remove" when an interface is deleted and added back in the system and when the interface is part of a bridge configuration. e.g. steps: create a tap0 interface using "ip tuntap add.." add the tap0 interface to br0 using "ovs-vsctl add-port.." delete the tap0 interface from system using "ip tuntap del.." add the tap0 interface back in system using "ip tuntap add.." (this changes the ifindex of the interface) delete tap0 from br0 using "ovs-vsctl del-port.." In the function "netdev_ports_insert", two hmap entries were created for mapping "portnum -> netdev" and "ifindex -> portnum". When the interface is deleted from the system, the "netdev_ports_remove" function is not getting called and the old ifindex entry is not getting cleaned up from the "ifindex_to_port" hmap. As part of the fix, added function "dpif_port_remove" which will call "netdev_ports_remove" in the path where the interface deletion from the system is detected. Also, in "netdev_ports_remove", added the code where the "ifindex_to_port_data" (ifindex -> portnum map node) is getting freed when the ifindex is not available any more. (as the interface is already deleted.) VMware-BZ: #1975788 Signed-off-by: Ashish Varma <ashishvarma.ovs@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* dpif: Refactor obj type from void pointer to dpif_classRoi Dayan2017-07-271-2/+0
| | | | | | | | | It's basically what is being passed today and passing a specific type adds a compiler type check. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
* dpif: Refactor flow logging functions to be used by other modulesRoi Dayan2017-06-151-0/+28
| | | | | | | | | To be reused by other modules. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>
* dpctl: Indicate if flow is offloaded when dumping flows of all typesPaul Blakey2017-06-151-0/+1
| | | | | | | | | | When verbosity is requested on dump-flows (-m) indicate which flows are offloaded. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>
* dpctl: Add an option to dump only certain kinds of flowsPaul Blakey2017-06-151-1/+2
| | | | | | | | | | | | | | | | | Usage: # to dump all datapath flows (default): ovs-dpctl dump-flows # to dump only flows that in kernel datapath: ovs-dpctl dump-flows type=ovs # to dump only flows that are offloaded: ovs-dpctl dump-flows type=offloaded Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>
* dpif: Save added ports in a port map for netdev flow api usePaul Blakey2017-06-151-0/+2
| | | | | | | | | | | To use netdev flow offloading api, dpifs needs to iterate over added ports. This addition inserts the added dpif ports in a hash map, The map will also be used to translate dpif ports to netdevs. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>
* dpif: Refactor dpif_probe_feature()Andy Zhou2017-03-101-1/+2
| | | | | | | | Allow actions to be part of the probe. No functional changes. Future patch will make use this new API. Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
* dpif: Meter framework.Jarno Rajahalme2017-03-081-1/+12
| | | | | | | | Add DPIF-level infrastructure for meters. Allow meter_set to modify the meter configuration (e.g. set the burst size if unspecified). Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: Andy Zhou <azhou@ovn.org>
* dpif-netdev: Pass Openvswitch other_config smap to dpif.Daniele Di Proietto2017-02-031-1/+1
| | | | | | | | | | | | | | | | | Currently we parse the 'other_config' column in Openvswitch table in bridge.c. We extract the values (just 'pmd-cpu-mask' for now) and we pass them down to the datapath, via different layers. If we want to pass other values to dpif-netdev.c (like we recently discussed) we would have to touch ofproto.c, ofproto-dpif.c and dpif.c. This patch sends the entire other_config column to dpif-netdev, so that dpif-netdev can extract the values it's interested in. No functional change. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ben Pfaff <blp@ovn.org>
* dpctl: Avoid making assumptions on pmd threads.Daniele Di Proietto2017-01-151-3/+9
| | | | | | | | | | | | | | | | | | | Currently dpctl depends on ovs-numa module to delete and create flows on different pmd threads for pmd devices. The next commits will move away the pmd threads state from ovs-numa to dpif-netdev, so the ovs-numa interface will not be supported. Also, the assignment between ports and thread is an implementation detail of dpif-netdev, dpctl shouldn't know anything about it. This commit changes the dpif_flow_put() and dpif_flow_del() calls to iterate over all the pmd threads, if pmd_id is PMD_ID_NULL. A simple test is added. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>
* doc: Populate 'topics' sectionStephen Finucane2016-12-121-3/+2
| | | | | | | | | | | There are many docs that don't need to kept at the top level, along with many more hidden in random folders. Move them all. This also allows us to add the '-W' flag to Sphinx, ensuring unindexed docs result in build failures. Signed-off-by: Stephen Finucane <stephen@that.guru> Signed-off-by: Ben Pfaff <blp@ovn.org>
* doc: Convert datapath/README to rSTStephen Finucane2016-11-031-1/+1
| | | | | Signed-off-by: Stephen Finucane <stephen@that.guru> Signed-off-by: Russell Bryant <russell@ovn.org>
* dpif: Reorder elements in dpif_upcall structure.Bhanuprakash Bodireddy2016-10-171-1/+4
| | | | | | | | | | | | | | | | By reordering the data elements in dpif_upcall structure, pad bytes can be reduced and also a cache line. Also dp_packet should be the first member of the structure because rte_mbuf, the first member of dp_packet should be aligned atleast on a 64-byte boundary. Before: structure size:768, holes:1, sum padbytes:60, cachelines:12 After: structure size:704, holes:1, sum padbytes:4, cachelines:11 Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> Co-authored-by: Antonio Fischetti <antonio.fischetti@intel.com> Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com> Acked-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
* dpdk: New module with some code from netdev-dpdk.Daniele Di Proietto2016-10-121-0/+2
| | | | | | | | | | | | | | | | | | | | There's a lot of code in netdev-dpdk which is not at all related to the netdev interface, mostly the library initialization code. This commit moves it to a new 'dpdk' module, to simplify 'netdev-dpdk'. Also a new module 'dpdk-stub' is introduced to implement some functions when DPDK is not available. This replaces the old 'netdev-nodpdk' module. Some redundant includes are removed or reorganized as a consequence. No functional change. CC: Aaron Conole <aconole@redhat.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Aaron Conole <aconole@redhat.com> Tested-by: Aaron Conole <aconole@redhat.com>
* bridge: Pass interface's configuration to datapath.Ilya Maximets2016-07-271-0/+1
| | | | | | | | | | | | This commit adds functionality to pass value of 'other_config' column of 'Interface' table to datapath. This may be used to pass not directly connected with netdev options and configure behaviour of the datapath for different ports. For example: pinning of rx queues to polling threads in dpif-netdev. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
* ofp-actions: Add truncate action.William Tu2016-06-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | The patch adds a new action to support packet truncation. The new action is formatted as 'output(port=n,max_len=m)', as output to port n, with packet size being MIN(original_size, m). One use case is to enable port mirroring to send smaller packets to the destination port so that only useful packet information is mirrored/copied, saving some performance overhead of copying entire packet payload. Example use case is below as well as shown in the testcases: - Output to port 1 with max_len 100 bytes. - The output packet size on port 1 will be MIN(original_packet_size, 100). # ovs-ofctl add-flow br0 'actions=output(port=1,max_len=100)' - The scope of max_len is limited to output action itself. The following packet size of output:1 and output:2 will be intact. # ovs-ofctl add-flow br0 \ 'actions=output(port=1,max_len=100),output:1,output:2' - The Datapath actions shows: # Datapath actions: trunc(100),1,1,2 Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/140037134 Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org>
* dpif: Pass flow parameter to dpif_execute().Daniele Di Proietto2016-05-201-0/+1
| | | | | | | | | | | | | | | | All the callers of the function already have a copy of the extracted flow in their stack (or a few frames before). This is useful for different resons: * It forces the callers to also call flow_extract() on the packet, which is necessary to initialize the l2,l3,l4 pointers. * It will be used in the userspace datapath to generate the RSS hash by a following commit * It can be used by the userspace connection tracker to avoid extracting the l3 type again. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ben Pfaff <blp@ovn.org>
* dpif-netdev: Allow different numbers of rx queues for different ports.Ilya Maximets2016-02-041-2/+1
| | | | | | | | | | | | | | | | | | | Currently, all of the PMD netdevs can only have the same number of rx queues, which is specified in other_config:n-dpdk-rxqs. Fix that by introducing of new option for PMD interfaces: 'n_rxq', which specifies the maximum number of rx queues to be created for this interface. Example: ovs-vsctl set Interface dpdk0 options:n_rxq=8 Old 'other_config:n-dpdk-rxqs' deleted. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ben Pfaff <blp@ovn.org> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>