summaryrefslogtreecommitdiff
path: root/ofproto/ofproto-dpif.c
Commit message (Collapse)AuthorAgeFilesLines
* ofproto: Fix re-creation of tunnel backing interfaces on restart.Ilya Maximets2023-02-271-19/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tunnel OpenFlow ports do not exist in the datapath, instead there is a tunnel backing interface that serves all the tunnels of the same type. For example, if the geneve port 'my_tunnel' is added to OVS, it will create 'geneve_sys_6041' datapath port, if it doesn't already exist, and use this port as a tunnel output. However, while creating/opening a new datapath after re-start, ovs-vswitchd only has a list of names of OpenFlow interfaces. And it thinks that each datapath port, that is not on the list, is a stale port that needs to be removed. This is obviously not correct for tunnel backing interfaces that can serve multiple tunnel ports and do not match OpenFlow port names. This is causing removal and re-creation of all the tunnel backing interfaces in the datapath on OVS restart, causing disruption in existing connections. It's hard to tell by only having a name of the interface if this interface is a tunnel backing interface, or someone just named a normal interface this way. So, instead of trying to determine that, not removing any interfaces at all, while we don't know types of actual ports we need. Assuming that all the ports that are currently not in the list of OF ports are tunnel backing ports. Later, revalidation of tunnel backing ports in type_run() will determine which ports are still needed and which should be removed. It's OK to add even a non-tunnel stale ports into tnl_backers, they will be cleaned up the same way as stale tunnel backers. Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2023-February/052215.html Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* openflow: Add extension to flush CT by generic match.Ales Musil2023-01-161-2/+7
| | | | | | | | | | | Add extension that allows to flush connections from CT by specifying fields that the connections should be matched against. This allows to match only some fields of the connection e.g. source address for orig direction. Reported-at: https://bugzilla.redhat.com/2120546 Signed-off-by: Ales Musil <amusil@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif: Avoid unneccesary backer revalidation.lic1212022-06-281-2/+5
| | | | | | | | | | | | | If lldp didn't change, we are not supposed to trigger backer revalidation. Without this patch, bridge_reconfigure() always trigger udpif revalidator because of lldp. Signed-off-by: lic121 <lic121@chinatelecom.cn> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Co-authored-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Paolo Valerio <pvalerio@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ipfix: Trigger revalidation if ipfix options changes.lic1212022-06-281-4/+3
| | | | | | | | | | | ipfix cfg creation/deleting triggers revalidation. But this does not cover the case where ipfix options changes, which also suppose to trigger revalidation. Fixes: a9f5ee1199e1 ("ofproto-dpif: Trigger revalidation when ipfix config set.") Signed-off-by: lic121 <lic121@chinatelecom.cn> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif: Fix meter use-after-free.Peng He2022-05-301-0/+2
| | | | | | | | | | | | | Add a rcu_barrier before close_dpif_backer to ensure that all meters have been freed before id_pool_destory meter's id-pool. Signed-off-by: Peng He <hepeng.0320@bytedance.com> Tested-by: David Marchand <david.marchand@redhat.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif: Trigger revalidation if ct tp changes.lic1212022-05-251-0/+3
| | | | | | | | | | Once ct_zone timeout policy changes, revalidator is supposed to be triggered. Fixes: 993cae678bca ("ofproto-dpif: Consume CT_Zone, and CT_Timeout_Policy tables") Signed-off-by: lic121 <lic121@chinatelecom.cn> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto/ofproto-dpif: Fix dpif_type for userspace tunnels.Wan Junjie2022-04-041-0/+4
| | | | | | | | | | | When we create two or more tunnels with the same type, only the first tunnel will be added by dpif since they share the same datapath port. Set the dpif_type here will clear the ioctl error logs. Fixes: 4f19a78 ("netdev-vport: Fix userspace tunnel ioctl(SIOCGIFINDEX) info logs.") Signed-off-by: Wan Junjie <wanjunjie@bytedance.com> Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* hmap: use short version of safe loops if possible.Adrian Moreno2022-03-301-2/+2
| | | | | | | | | | | | | | | Using SHORT version of the *_SAFE loops makes the code cleaner and less error prone. So, use the SHORT version and remove the extra variable when possible for hmap and all its derived types. In order to be able to use both long and short versions without changing the name of the macro for all the clients, overload the existing name and select the appropriate version depending on the number of arguments. Acked-by: Dumitru Ceara <dceara@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* list: use short version of safe loops if possible.Adrian Moreno2022-03-301-13/+11
| | | | | | | | | | | | | | | Using the SHORT version of the *_SAFE loops makes the code cleaner and less error-prone. So, use the SHORT version and remove the extra variable when possible. In order to be able to use both long and short versions without changing the name of the macro for all the clients, overload the existing name and select the appropriate version depending on the number of arguments. Acked-by: Dumitru Ceara <dceara@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto: Use xlate map for uuid lookups.Gaetan Rivet2022-03-071-18/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ofproto map 'all_ofproto_dpifs_by_uuid' does not support concurrent accesses. It is however read by upcall handler threads and written by the main thread at the same time. Additionally, handler threads will change the ams_seq while an ofproto is being destroyed, triggering crashes with the following backtrace: (gdb) bt hmap_next (hmap.h:398) seq_wake_waiters (seq.c:326) seq_change_protected (seq.c:134) seq_change (seq.c:144) ofproto_dpif_send_async_msg (ofproto_dpif.c:263) process_upcall (ofproto_dpif_upcall.c:1782) recv_upcalls (ofproto_dpif_upcall.c:1026) udpif_upcall_handler (ofproto/ofproto_dpif_upcall.c:945) ovsthread_wrapper (ovs_thread.c:734) To solve both issues, remove the 'all_ofproto_dpifs_by_uuid'. Instead, another map already storing ofprotos in xlate can be used. During an ofproto destruction, its reference is removed from the current xlate xcfg. Such change is committed only after all threads have quiesced at least once during xlate_txn_commit(). This wait ensures that the removal is seen by all threads, rendering impossible for a thread to still hold a reference while the destruction proceeds. Furthermore, the xlate maps are copied during updates instead of being written in place. It is thus correct to read xcfg->xbridges while inserting or removing from new_xcfg->xbridges. Finally, now that ofproto_dpifs lookups are done through xcfg->xbridges, it is important to use a high level of entropy. As it used the ofproto pointer hashed, fewer bits were random compared to the uuid key used in 'all_ofproto_dpifs_by_uuid'. To solve this, use the ofproto uuid as the key in xbridges as well, improving entropy. Fixes: fcb9579be3c7 ("ofproto: Add 'ofproto_uuid' and 'ofp_in_port' to user action cookie.") Suggested-by: Adrian Moreno <amorenoz@redhat.com> Acked-by: Adrian Moreno <amorenoz@redhat.com> Acked-by: Alin-Gabriel Serdean <aserdean@ovn.org> Tested-by: Alin-Gabriel Serdean <aserdean@ovn.org> Signed-off-by: Gaetan Rivet <grive@u256.net> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Co-authored-by: Yunjian Wang <wangyunjian@huawei.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto: Add refcount to ofproto to fix ofproto use-after-free.Peng He2022-03-071-10/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | From hepeng: https://patchwork.ozlabs.org/project/openvswitch/patch/20200717015041.82746-1-hepeng.0320@bytedance.com/#2487473 also from guohongzhi <guohongzhi1@huawei.com>: http://patchwork.ozlabs.org/project/openvswitch/patch/20200306130555.19884-1-guohongzhi1@huawei.com/ also from a discussion about the mixing use of RCU and refcount in the mail list with Ilya Maximets, William Tu, Ben Pfaf, and Gaëtan Rivet. A summary, as quoted from Ilya: " RCU for ofproto was introduced for one and only one reason - to avoid freeing ofproto while rules are still alive. This was done in commit f416c8d61601 ("ofproto: RCU postpone rule destruction."). The goal was to allow using rules without refcounting them within a single grace period. And that forced us to postpone destruction of the ofproto for a single grace period. Later commit 39c9459355b6 ("Use classifier versioning.") made it possible for rules to be alive for more than one grace period, so the commit made ofproto wait for 2 grace periods by double postponing. As we can see now, that wasn't enough and we have to wait for more than 2 grace periods in certain cases. " In a short, the ofproto should have a longer life time than rule, if the rule lasts for more than 2 grace periods, the ofproto should live longer to ensure rule->ofproto is valid. It's hard to predict how long a ofproto should live, thus we need to use refcount on ofproto to make things easy. The controversial part is that we have already used RCU postpone to delay ofproto destrution, if we have to add refcount, is it simpler to use just refcount without RCU postpone? IMO, I think going back to the pure refcount solution is more complicated than mixing using both. Gaëtan Rive asks some questions on guohongzhi's v2 patch: during ofproto_rule_create, should we use ofproto_ref or ofproto_try_ref? how can we make sure the ofproto is alive? By using RCU, ofproto has three states: state 1: alive, with refcount >= 1 state 2: dying, with refcount == 0, however pointer is valid state 3: died, memory freed, pointer might be dangling. Without using RCU, there is no state 2, thus, we have to be very careful every time we see a ofproto pointer. In contrast, with RCU, we can be sure that it's alive at least in this grace peroid, so we can just check if it is dying by ofproto_try_ref. This shows that by mixing use of RCU and refcount we can save a lot of work worrying if ofproto is dangling. In short, the RCU part makes sure the ofproto is alive when we use it, and the refcount part makes sure it lives longer enough. In this patch, I have merged guohongzhi's patch and mine, and fixes accoring to the previous comments. Acked-by: Adrian Moreno <amorenoz@redhat.com> Acked-by: Gaetan Rivet <grive@u256.net> Acked-by: Mike Pattrick <mkp@redhat.com> Acked-by: Alin-Gabriel Serdean <aserdean@ovn.org> Tested-by: Alin-Gabriel Serdean <aserdean@ovn.org> Signed-off-by: Peng He <hepeng.0320@bytedance.com> Co-authored-by: Hongzhi Guo <guohongzhi1@huawei.com> Signed-off-by: Hongzhi Guo <guohongzhi1@huawei.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif: Trigger revalidation when ipfix config set.lic1212022-03-041-0/+6
| | | | | | | | | | | | | | Currently, ipfix conf creation/deletion don't trigger dpif backer revalidation. This is not expected, as we need the revalidation to commit ipfix into xlate. So that xlate can generate ipfix actions. This patch covers only new creation/deletion of ipfix config. Will upload one more patch to cover ipfix option changes. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: lic121 <lic121@chinatelecom.cn> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif: Fix memory leak in dpif/show-dp-features appctl.Ilya Maximets2022-01-171-0/+1
| | | | | | Fixes: a98b700db617 ("ofproto: Add appctl command to show Datapath features") Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* Encap & Decap actions for MPLS packet type.Martin Varghese2022-01-171-0/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The encap & decap actions are extended to support MPLS packet type. Encap & decap actions adds and removes MPLS header at start of the packet. The existing PUSH MPLS & POP MPLS actions inserts & removes MPLS header between ethernet header and the IP header. Though this behaviour is fine for L3 VPN where an IP packet is encapsulated inside a MPLS tunnel, it does not suffice the L2 VPN requirements. In L2 VPN the ethernet packets must be encapsulated inside MPLS tunnel. In this change the encap & decap actions are extended to support MPLS packet type. The encap & decap adds and removes MPLS header at the start of packet as depicted below. Encapsulation: Actions - encap(mpls),encap(ethernet) Incoming packet -> | ETH | IP | Payload | 1 Actions - encap(mpls) [Datapath action - ADD_MPLS:0x8847] Outgoing packet -> | MPLS | ETH | Payload| 2 Actions - encap(ethernet) [ Datapath action - push_eth ] Outgoing packet -> | ETH | MPLS | ETH | Payload| Decapsulation: Incoming packet -> | ETH | MPLS | ETH | IP | Payload | Actions - decap(),decap(packet_type(ns=0,type=0)) 1 Actions - decap() [Datapath action - pop_eth) Outgoing packet -> | MPLS | ETH | IP | Payload| 2 Actions - decap(packet_type(ns=0,type=0)) [Datapath action - POP_MPLS:0x6558] Outgoing packet -> | ETH | IP | Payload| Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif: Increase dp_hash default max buckets.Mike Pattrick2021-12-031-2/+2
| | | | | | | | | | | | | | | | | | | | Currently when a user creates an openflow group with with multiple buckets without specifying a selection type, the efficient dp_hash is only selected if the user is creating fewer than 64 buckets. But when dp_hash is explicitly selected, up to 256 buckets are supported. While up to 64 buckets seems like a lot, certain OVN/Open Stack workloads could result in the user creating more than 64 buckets. For example, when using OVN to load balance. This patch increases the default maximum from 64 to 256. This change to the default limit doesn't affect how many buckets are actually created, that is specified by the user when the group is created, just how traffic is distributed across buckets. Signed-off-by: Mike Pattrick <mkp@redhat.com> Acked-by: Gaetan Rivet <grive@u256.net> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif: APIs and CLI option to add/delete static fdb entry.Vasu Dasari2021-07-161-4/+107
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently there is an option to add/flush/show ARP/ND neighbor. This covers L3 side. For L2 side, there is only fdb show command. This commit gives an option to add/del an fdb entry via ovs-appctl. CLI command looks like: To add: ovs-appctl fdb/add <bridge> <port> <vlan> <Mac> ovs-appctl fdb/add br0 p1 0 50:54:00:00:00:05 To del: ovs-appctl fdb/del <bridge> <vlan> <Mac> ovs-appctl fdb/del br0 0 50:54:00:00:00:05 Added two new APIs to provide convenient interface to add and delete static-macs. bool xlate_add_static_mac_entry(const struct ofproto_dpif *, ofp_port_t in_port, struct eth_addr dl_src, int vlan); bool xlate_delete_static_mac_entry(const struct ofproto_dpif *, struct eth_addr dl_src, int vlan); 1. Static entry should not age. To indicate that entry being programmed is a static entry, 'expires' field in 'struct mac_entry' will be set to a MAC_ENTRY_AGE_STATIC_ENTRY. A check for this value is made while deleting mac entry as part of regular aging process. 2. Another change to the mac-update logic, when a packet with same dl_src as that of a static-mac entry arrives on any port, the logic will not modify the expires field. 3. While flushing fdb entries, made sure static ones are not evicted. 4. Updated "ovs-appctl fdb/stats-show br0" to display number of static entries in switch Added following tests: ofproto-dpif - static-mac add/del/flush ofproto-dpif - static-mac mac moves Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048894.html Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1597752 Signed-off-by: Vasu Dasari <vdasari@gmail.com> Tested-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* conntrack: Document all-zero IP SNAT behavior and add a test case.Eelco Chaudron2021-07-081-1/+21
| | | | | | | | | | | | | | | | | | | | Currently, conntrack in the kernel has an undocumented feature referred to as all-zero IP address SNAT. Basically, when a source port collision is detected during the commit, the source port will be translated to an ephemeral port. If there is no collision, no SNAT is performed. This patchset documents this behavior and adds a self-test to verify it's not changing. In addition, a datapath feature flag is added for the all-zero IP SNAT case. This will help applications on top of OVS, like OVN, to determine this feature can be used. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Acked-by: Alin-Gabriel Serdean <aserdean@ovn.org> Acked-by: Paolo Valerio <pvalerio@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif: Fix use of uninitialized attributes of timeout policy.Ilya Maximets2021-05-241-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | 'cdtp' is allocated on a stack and it has uninitialized 'present' field that specifies which attributes are actually set. This causes use of uninitialized attributes. Conditional jump or move depends on uninitialised value(s) at 0x539651: dpif_netlink_get_nl_tp_udp_attrs (dpif-netlink.c:3206) by 0x539651: dpif_netlink_get_nl_tp_attrs (dpif-netlink.c:3234) by 0x539651: dpif_netlink_ct_set_timeout_policy (dpif-netlink.c:3370) by 0x42B615: ct_add_timeout_policy_to_dpif (ofproto-dpif.c:5421) by 0x42B615: ct_set_zone_timeout_policy (ofproto-dpif.c:5525) by 0x40BD98: ct_zones_reconfigure (bridge.c:756) by 0x40BD98: datapath_reconfigure (bridge.c:804) by 0x40D737: bridge_reconfigure (bridge.c:903) by 0x411720: bridge_run (bridge.c:3331) by 0x40751C: main (ovs-vswitchd.c:127) Clearing the whole structure to avoid this kind of problems. Fixes: 993cae678bca ("ofproto-dpif: Consume CT_Zone, and CT_Timeout_Policy tables") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
* Eliminate use of term "slave" in bond, LACP, and bundle contexts.Ben Pfaff2020-10-211-23/+24
| | | | | | | | | | | | | The new term is "member". Most of these changes should not change user-visible behavior. One place where they do is in "ovs-ofctl dump-flows", which will now output "members:..." inside "bundle" actions instead of "slaves:...". I don't expect this to cause real problems in most systems. The old syntax is still supported on input for backward compatibility. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
* userspace: Avoid dp_hash recirculation for balance-tcp bond mode.Vishal Deep Ajmera2020-06-221-0/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: In OVS, flows with output over a bond interface of type “balance-tcp” gets translated by the ofproto layer into "HASH" and "RECIRC" datapath actions. After recirculation, the packet is forwarded to the bond member port based on 8-bits of the datapath hash value computed through dp_hash. This causes performance degradation in the following ways: 1. The recirculation of the packet implies another lookup of the packet’s flow key in the exact match cache (EMC) and potentially Megaflow classifier (DPCLS). This is the biggest cost factor. 2. The recirculated packets have a new “RSS” hash and compete with the original packets for the scarce number of EMC slots. This implies more EMC misses and potentially EMC thrashing causing costly DPCLS lookups. 3. The 256 extra megaflow entries per bond for dp_hash bond selection put additional load on the revalidation threads. Owing to this performance degradation, deployments stick to “balance-slb” bond mode even though it does not do active-active load balancing for VXLAN- and GRE-tunnelled traffic because all tunnel packet have the same source MAC address. Proposed optimization: This proposal introduces a new load-balancing output action instead of recirculation. Maintain one table per-bond (could just be an array of uint16's) and program it the same way internal flows are created today for each possible hash value (256 entries) from ofproto layer. Use this table to load-balance flows as part of output action processing. Currently xlate_normal() -> output_normal() -> bond_update_post_recirc_rules() -> bond_may_recirc() and compose_output_action__() generate 'dp_hash(hash_l4(0))' and 'recirc(<RecircID>)' actions. In this case the RecircID identifies the bond. For the recirculated packets the ofproto layer installs megaflow entries that match on RecircID and masked dp_hash and send them to the corresponding output port. Instead, we will now generate action as 'lb_output(<bond id>)' This combines hash computation (only if needed, else re-use RSS hash) and inline load-balancing over the bond. This action is used *only* for balance-tcp bonds in userspace datapath (the OVS kernel datapath remains unchanged). Example: Current scheme: With 8 UDP flows (with random UDP src port): flow-dump from pmd on cpu core: 2 recirc_id(0),in_port(7),<...> actions:hash(hash_l4(0)),recirc(0x1) recirc_id(0x1),dp_hash(0xf8e02b7e/0xff),<...> actions:2 recirc_id(0x1),dp_hash(0xb236c260/0xff),<...> actions:1 recirc_id(0x1),dp_hash(0x7d89eb18/0xff),<...> actions:1 recirc_id(0x1),dp_hash(0xa78d75df/0xff),<...> actions:2 recirc_id(0x1),dp_hash(0xb58d846f/0xff),<...> actions:2 recirc_id(0x1),dp_hash(0x24534406/0xff),<...> actions:1 recirc_id(0x1),dp_hash(0x3cf32550/0xff),<...> actions:1 New scheme: We can do with a single flow entry (for any number of new flows): in_port(7),<...> actions:lb_output(1) A new CLI has been added to dump datapath bond cache as given below. # ovs-appctl dpif-netdev/bond-show [dp] Bond cache: bond-id 1 : bucket 0 - slave 2 bucket 1 - slave 1 bucket 2 - slave 2 bucket 3 - slave 1 Co-authored-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com> Signed-off-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com> Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com> Tested-by: Matteo Croce <mcroce@redhat.com> Tested-by: Adrian Moreno <amorenoz@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* userspace: Add conntrack timeout policy support.William Tu2020-05-011-1/+2
| | | | | | | | | | | Commit 1f1613183733 ("ct-dpif, dpif-netlink: Add conntrack timeout policy support") adds conntrack timeout policy for kernel datapath. This patch enables support for the userspace datapath. I tested using the 'make check-system-userspace' which checks the timeout policies for ICMP and UDP cases. Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
* ofproto: Add support to watch controller port liveness in fast-failover groupVishal Deep Ajmera2020-03-061-0/+10
| | | | | | | | | | | | | | | | | | | | Currently fast-failover group does not support checking liveness of controller port (OFPP_CONTROLLER). However this feature can be useful for selecting alternate pipeline when controller connection itself is down for e.g. by using local DHCP server to reply for any DHCP request originating from VMs. This patch adds the support for watching controller port liveness in fast- failover group. Controller port is considered live when atleast one of-connection is alive. Example usage: ovs-ofctl add-group br-int 'group_id=1234,type=ff, bucket=watch_port:CONTROLLER,actions:<A>, bucket=watch_port:1,actions:<B> Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* ofproto-dpif: Only delete tunnel backer ports along with the dpif.Ben Pfaff2020-02-291-2/+4
| | | | | | | | | | | | | | | | | | The admin can choose whether or not to delete flows from datapaths when they stop ovs-vswitchd. The goal of not deleting flows it to allow existing traffic to continue being forwarded until ovs-vswitchd is restarted. Until now, regardless of this choice, ovs-vswitchd has always deleted tunnel ports from the datapath. When flows are not deleted, this nevertheless prevents tunnel traffic from being forwarded. With this patch, ovs-vswitchd no longer deletes tunnel ports in the case where it does not delete flows, allowing tunnel traffic to continue being forwarded. Reported-by: Antonin Bas <abas@vmware.com> Tested-by: Antonin Bas <abas@vmware.com> Tested-by: txfh2007 <txfh2007@aliyun.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* dpif: Fix dp_extra_info leak by reworking the allocation scheme.Ilya Maximets2020-01-271-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | dpctl module leaks the 'dp_extra_info' in case the dumped flow doesn't fit the dump filter while executing dpctl/dump-flows and also while executing dpctl/get-flow. This is already a 3rd attempt to fix all the leaks and incorrect usage of this string that definitely indicates poor initial design of the feature. Flow dump/get documentation clearly states that the caller does not own the data provided in dpif_flow. Datapath still owns all the data and promises to not free/modify it until the next quiescent period, however we're requesting the caller to free 'dp_extra_info' and this obviously breaks the rules. This patch fixes the issue by by storing 'dp_extra_info' within 'struct dp_netdev_flow' making datapath to own it. 'dp_netdev_flow' is RCU-protected, so it will be valid until the next quiescent period. Fixes: 0e8f5c6a38d0 ("dpif-netdev: Modified ovs-appctl dpctl/dump-flows command") Tested-by: Emma Finn <emma.finn@intel.com> Acked-by: Emma Finn <emma.finn@intel.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif-upcall: Get rid of udpif_synchronize().Ben Pfaff2020-01-241-4/+0
| | | | | | | | | | | | | | | | | | | | | RCU provides the semantics we want from udpif_synchronize() and it should be much more lightweight than killing and restarting all the upcall threads. It looks like udpif_synchronize() was written before the OVS tree had RCU support, which is probably why we didn't use it here from the beginning. So we can just change udpif_synchronize() to a single ovsrcu_synchronize() call. However, udpif_synchronize() only has a single caller, which calls ovsrcu_synchronize() anyway just beforehand, via xlate_txn_commit(). So we can get rid of udpif_synchronize() entirely, which this patch does. As a side effect, this eliminates one reason why terminating OVS cleanly clears the datapath flow table. An upcoming patch will eliminate other reasons. Acked-by: Numan Siddique <numans@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org>
* dpif: Fix memory leak while dumping dpif flows.Damijan Skvarc2020-01-221-0/+3
| | | | | | | | Leak was detected by running test: "ofproto-dpif - balance-tcp bonding" Fixes: 0e8f5c6a38d0 ("dpif-netdev: Modified ovs-appctl dpctl/dump-flows command") Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpif: Turn dpif_flow_hash function into generic odp_flow_key_hash.Ilya Maximets2020-01-081-1/+1
| | | | | | | | | | | | | | | | | | | | Current implementation of dpif_flow_hash() doesn't depend on datapath interface and only complicates the callers by forcing them to figure out what is their current 'dpif'. If we'll need different hashing for different 'dpif's we'll implement an API for dpif-providers and each dpif implementation will be able to use their local function directly without calling it via dpif API. This change will allow us to not store 'dpif' pointer in the userspace datapath implementation which is broken and will be removed in next commits. This patch moves dpif_flow_hash() to odp-util module and replaces unused odp_flow_key_hash() by it, along with removing of unused 'dpif' argument. Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
* userspace: Improved packet drop statistics.Anju Thomas2020-01-071-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently OVS maintains explicit packet drop/error counters only on port level. Packets that are dropped as part of normal OpenFlow processing are counted in flow stats of “drop” flows or as table misses in table stats. These can only be interpreted by controllers that know the semantics of the configured OpenFlow pipeline. Without that knowledge, it is impossible for an OVS user to obtain e.g. the total number of packets dropped due to OpenFlow rules. Furthermore, there are numerous other reasons for which packets can be dropped by OVS slow path that are not related to the OpenFlow pipeline. The generated datapath flow entries include a drop action to avoid further expensive upcalls to the slow path, but subsequent packets dropped by the datapath are not accounted anywhere. Finally, the datapath itself drops packets in certain error situations. Also, these drops are today not accounted for.This makes it difficult for OVS users to monitor packet drop in an OVS instance and to alert a management system in case of a unexpected increase of such drops. Also OVS trouble-shooters face difficulties in analysing packet drops. With this patch we implement following changes to address the issues mentioned above. 1. Identify and account all the silent packet drop scenarios 2. Display these drops in ovs-appctl coverage/show Co-authored-by: Rohith Basavaraja <rohith.basavaraja@gmail.com> Co-authored-by: Keshav Gupta <keshugupta1@gmail.com> Signed-off-by: Anju Thomas <anju.thomas@ericsson.com> Signed-off-by: Rohith Basavaraja <rohith.basavaraja@gmail.com> Signed-off-by: Keshav Gupta <keshugupta1@gmail.com> Acked-by: Eelco Chaudron <echaudro@redhat.com Acked-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif: Fix using uninitialized execute hash.Ilya Maximets2020-01-071-63/+50
| | | | | | | | | | | | | | Most of callers doesn't initialize dpif_execute.hash leaving random value from the stack. And this random value used later while encoding netlink message and might produce unwanted kernel behavior. Fix that by fully initializing dpif_execute structure. Using designated initializers to avoid such issues in the future. Fixes: 0442bfb11d6c ("ofproto-dpif-upcall: Echo HASH attribute back to datapath.") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: William Tu <u9012063@gmail.com> Acked-by: Ben Pfaff <blp@ovn.org>
* Add offload packets statisticszhaozhanxu2019-12-061-11/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add argument '--offload-stats' for command ovs-appctl bridge/dump-flows to display the offloaded packets statistics. The commands display as below: orignal command: ovs-appctl bridge/dump-flows br0 duration=574s, n_packets=1152, n_bytes=110768, priority=0,actions=NORMAL table_id=254, duration=574s, n_packets=0, n_bytes=0, priority=2,recirc_id=0,actions=drop table_id=254, duration=574s, n_packets=0, n_bytes=0, priority=0,reg0=0x1,actions=controller(reason=) table_id=254, duration=574s, n_packets=0, n_bytes=0, priority=0,reg0=0x2,actions=drop table_id=254, duration=574s, n_packets=0, n_bytes=0, priority=0,reg0=0x3,actions=drop new command with argument '--offload-stats' Notice: 'n_offload_packets' are a subset of n_packets and 'n_offload_bytes' are a subset of n_bytes. ovs-appctl bridge/dump-flows --offload-stats br0 duration=582s, n_packets=1152, n_bytes=110768, n_offload_packets=1107, n_offload_bytes=107992, priority=0,actions=NORMAL table_id=254, duration=582s, n_packets=0, n_bytes=0, n_offload_packets=0, n_offload_bytes=0, priority=2,recirc_id=0,actions=drop table_id=254, duration=582s, n_packets=0, n_bytes=0, n_offload_packets=0, n_offload_bytes=0, priority=0,reg0=0x1,actions=controller(reason=) table_id=254, duration=582s, n_packets=0, n_bytes=0, n_offload_packets=0, n_offload_bytes=0, priority=0,reg0=0x2,actions=drop table_id=254, duration=582s, n_packets=0, n_bytes=0, n_offload_packets=0, n_offload_bytes=0, priority=0,reg0=0x3,actions=drop Signed-off-by: zhaozhanxu <zhaozhanxu@163.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
* ofproto-dpif: Refactor the get capability code.William Tu2019-12-021-14/+4
| | | | | | | | | Make the code simpler by removing the use of xasprintf and free, and use smap_add_format. Cc: Ben Pfaff <blp@ovn.org> Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Ben Pfaff <blp@ovn.org>
* ofproto-provider: Move datapath capabilities callback to correct section.Ilya Maximets2019-11-281-1/+1
| | | | | | | | | | | 'get_datapath_cap' callback was mistakenly placed in 'Connection tracking' section of the 'struct dpif_class' while belongs to the 'Datapath information'. CC: William Tu <u9012063@gmail.com> Fixes: 27501802d09f ("ofproto-dpif: Expose datapath capability to ovsdb.") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: William Tu <u9012063@gmail.com>
* ofproto: Fix crash on PACKET_OUT due to recursive locking after upcall.Ilya Maximets2019-11-261-10/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Handling of OpenFlow PACKET_OUT implies pushing the packet through the datapath and packet processing inside the datapath could trigger an upcall. In case of system datapath, 'dpif_execute()' sends packet to the kernel module and returns. If any, upcall will be triggered inside the kernel and handled by a separate thread in userspace. But in case of userspace datapath full processing of the packet happens inside the 'dpif_execute()' in the same thread that handled PACKET_OUT. This causes an issue if upcall will lead to modification of flow rules. For example, it could happen while processing of 'learn' actions. Since whole handling of PACKET_OUT is protected by 'ofproto_mutex', OVS will assert on attempt to take it recursively while processing 'learn' actions: 0 __GI_raise (sig=sig@entry=6) 1 __GI_abort () 2 ovs_abort_valist () 3 ovs_abort () 4 ovs_mutex_lock_at (where=where@entry=0xad4199 "ofproto/ofproto.c:5391") <Trying to acquire ofproto_mutex again> 5 ofproto_flow_mod_learn () at ofproto/ofproto.c:5391 <Trying to modify flows according to 'learn' action> 6 xlate_learn_action () at ofproto/ofproto-dpif-xlate.c:5378 <'learn' action found> 7 do_xlate_actions () at ofproto/ofproto-dpif-xlate.c:6893 8 xlate_recursively () at ofproto/ofproto-dpif-xlate.c:4233 9 xlate_table_action () at ofproto/ofproto-dpif-xlate.c:4361 10 in xlate_ofpact_resubmit () at ofproto/ofproto-dpif-xlate.c:4672 11 do_xlate_actions () at ofproto/ofproto-dpif-xlate.c:6773 12 xlate_actions () at ofproto/ofproto-dpif-xlate.c:7570 <Translating actions> 13 upcall_xlate () at ofproto/ofproto-dpif-upcall.c:1197 14 process_upcall () at ofproto/ofproto-dpif-upcall.c:1413 15 upcall_cb () at ofproto/ofproto-dpif-upcall.c:1315 16 dp_netdev_upcall (DPIF_UC_MISS) at lib/dpif-netdev.c:6236 <Flow cache miss. Making upcall> 17 handle_packet_upcall () at lib/dpif-netdev.c:6591 18 fast_path_processing () at lib/dpif-netdev.c:6709 19 dp_netdev_input__ () at lib/dpif-netdev.c:6797 20 dp_netdev_recirculate () at lib/dpif-netdev.c:6842 21 dp_execute_cb () at lib/dpif-netdev.c:7158 22 odp_execute_actions () at lib/odp-execute.c:794 23 dp_netdev_execute_actions () at lib/dpif-netdev.c:7332 24 dpif_netdev_execute () at lib/dpif-netdev.c:3725 25 dpif_netdev_operate () at lib/dpif-netdev.c:3756 <Packet pushed to userspace datapath for processing> 26 dpif_operate () at lib/dpif.c:1367 27 dpif_execute () at lib/dpif.c:1321 28 packet_execute () at ofproto/ofproto-dpif.c:4760 29 ofproto_packet_out_finish () at ofproto/ofproto.c:3594 <Taking ofproto_mutex> 30 handle_packet_out () at ofproto/ofproto.c:3635 31 handle_single_part_openflow (OFPTYPE_PACKET_OUT) at ofproto/ofproto.c:8400 32 handle_openflow () at ofproto/ofproto.c:8587 33 ofconn_run () 34 connmgr_run () 35 ofproto_run () 36 bridge_run__ () 37 bridge_run () 38 main () Fix that by splitting the 'ofproto_packet_out_finish()' in two parts. First one that translates side-effects and requires holding 'ofproto_mutex' and the second that only pushes packet to datapath. The second part moved out of 'ofproto_mutex' because 'ofproto_mutex' is not required and actually should not be taken in order to avoid recursive locking. Reported-by: Anil Kumar Koli <anilkumar.k@altencalsoftlabs.com> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2019-April/048494.html Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
* ofproto-dpif: Expose datapath ND Extensions capability to ovsdbFlavio Leitner2019-11-221-0/+1
| | | | | | | | Document and expose datapath ND Extensions capability to ovsdb. Fixes: d0d571493 ("ofproto-dpif: Allow IPv6 ND Extensions only if supported") Signed-off-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ben Pfaff <blp@ovn.org>
* ofproto-dpif: Allow IPv6 ND Extensions only if supportedFlavio Leitner2019-11-211-1/+58
| | | | | | | | | | | | | | The IPv6 ND Extensions is only implemented in userspace datapath, but nothing prevents that to be used with other datapaths. This patch probes the datapath and only allows if the support is available. Fixes: 9b2b84973 ("Support for match & set ICMPv6 reserved and options type fields") Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ben Pfaff <blp@ovn.org>
* ofproto-dpif: Expose datapath capability to ovsdb.William Tu2019-11-211-0/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch adds support for fetching the datapath's capabilities from the result of 'check_support()', and write the supported capability to a new database column, called 'capabilities' under Datapath table. To see how it works, run: # ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev # ovs-vsctl -- --id=@m create Datapath datapath_version=0 \ 'ct_zones={}' 'capabilities={}' \ -- set Open_vSwitch . datapaths:"netdev"=@m # ovs-vsctl list-dp-cap netdev ufid=true sample_nesting=true clone=true tnl_push_pop=true \ ct_orig_tuple=true ct_eventmask=true ct_state=true \ ct_clear=true max_vlan_headers=1 recirc=true ct_label=true \ max_hash_alg=1 ct_state_nat=true ct_timeout=true \ ct_mark=true ct_orig_tuple6=true check_pkt_len=true \ masked_set_action=true max_mpls_depth=3 trunc=true ct_zone=true Signed-off-by: William Tu <u9012063@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com> --- v5: Add improved documentation from Ben and fix checkpatch error (tab and line 79 char) v4: rebase to master v3: fix 32-bit build, reported by Greg travis: https://travis-ci.org/williamtu/ovs-travis/builds/599276267 v2: rebase to master
* lacp: warn transmit failure of lacp pduGowrishankar Muthukrishnan2019-10-301-1/+5
| | | | | | | | | | | | | | It might be difficult to trace whether LACP PDU tx (as in response) was successful when the pdu was not transmitted by egress slave for various reasons (including resource contention within NIC) and only way to trace its fate is by looking at ofproto->stats.tx_[packets/bytes] and slave port stats. Adding a warning when there is tx failure could help user debug at the root of this problem. Signed-off-by: Gowrishankar Muthukrishnan <gmuthukr@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* lldp: Fix for OVS crashes when a LLDP-enabled port is deletedSurya Rudra2019-10-241-5/+5
| | | | | | | | | | | | | | | | | | | Issue: When LLDP is enabled on a port, a structure to hold LLDP related state is created and that structure has a reference to the port. The ofproto monitor thread accesses the LLDP structure to periodically send packets over the associated port. When the port is deleted, the LLDP structure is not cleaned up and it continues to refer to the deleted port. When the monitor thread attempts to access the deleted port OVS crashes. Crash can happen with bridge delete and bond delete also. Fix: Remove all references to the LLDP structure and free it when the port is deleted. Signed-off-by: Surya Rudra <rudrasurya.r@altencalsoftlabs.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* ofproto-dpif-xlate: Translate timeout policy in ct actionYi-Hung Wei2019-09-261-0/+97
| | | | | | | | | | | | | | | | | | | | | | This patch derives the timeout policy based on ct zone from the internal data structure that we maintain on dpif layer. It also adds a system traffic test to verify the zone-based conntrack timeout feature. The test uses ovs-vsctl commands to configure the customized ICMP and UDP timeout on zone 5 to a shorter period. It then injects ICMP and UDP traffic to conntrack, and checks if the corresponding conntrack entry expires after the predefined timeout. Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> ofproto-dpif: Checks if datapath supports OVS_CT_ATTR_TIMEOUT This patch checks whether datapath supports OVS_CT_ATTR_TIMEOUT. With this check, ofproto-dpif-xlate can use this information to decide whether to translate the ct timeout policy. Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Justin Pettit <jpettit@ovn.org>
* ofproto-dpif: Consume CT_Zone, and CT_Timeout_Policy tablesYi-Hung Wei2019-09-261-0/+293
| | | | | | | | | | | | | | | | | | | | | | | This patch consumes the CT_Zone and CT_Timeout_Policy tables, maintains the zone-based configuration in the vswitchd. Whenever there is a database change, vswitchd will read the datapath, CT_Zone, and CT_Timeout_Policy tables from ovsdb, builds an internal snapshot of the database configuration in bridge.c, and pushes down the change into ofproto and dpif layer. If a new zone-based timeout policy is added, it updates the zone to timeout policy mapping in the per datapath type datapath structure in dpif-backer, and pushes down the timeout policy into the datapath via dpif interface. If a timeout policy is no longer used, for kernel datapath, vswitchd may not be able to remove it from datapath immediately since datapath flows can still reference the to-be-deleted timeout policies. Thus, we keep an timeout policy kill list, that vswitchd will go back to the list periodically and try to kill the unused timeout policies. Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Justin Pettit <jpettit@ovn.org>
* ofproto-dpif: Free leaked 'webster'Yifeng Sun2019-09-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Valgrind reported: 1122: ofproto-dpif - select group with explicit dp_hash selection method ==16884== 64 bytes in 1 blocks are definitely lost in loss record 320 of 346 ==16884== at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==16884== by 0x532512: xcalloc (util.c:121) ==16884== by 0x4262B9: group_setup_dp_hash_table (ofproto-dpif.c:4846) ==16884== by 0x4267CB: group_set_selection_method (ofproto-dpif.c:4938) ==16884== by 0x4267CB: group_construct (ofproto-dpif.c:4984) ==16884== by 0x417250: init_group (ofproto.c:7286) ==16884== by 0x41B4FC: add_group_start (ofproto.c:7316) ==16884== by 0x42247A: ofproto_group_mod_start (ofproto.c:7589) ==16884== by 0x4250EC: handle_group_mod (ofproto.c:7744) ==16884== by 0x4250EC: handle_single_part_openflow (ofproto.c:8428) ==16884== by 0x4250EC: handle_openflow (ofproto.c:8606) ==16884== by 0x4579E2: ofconn_run (connmgr.c:1318) ==16884== by 0x4579E2: connmgr_run (connmgr.c:355) ==16884== by 0x41E0F5: ofproto_run (ofproto.c:1845) ==16884== by 0x40BA63: bridge_run__ (bridge.c:2971) ==16884== by 0x410CF3: bridge_run (bridge.c:3029) ==16884== by 0x407614: main (ovs-vswitchd.c:127) This patch fixes it. Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* ofproto-dpif: Uninitialize 'xlate_cache' to free resourcesYifeng Sun2019-09-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Valgrind reported: 1210: ofproto-dpif - continuation after clone ==32205== 4,392 (1,440 direct, 2,952 indirect) bytes in 12 blocks are definitely lost in loss record 359 of 362 ==32205== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==32205== by 0x532574: xmalloc (util.c:138) ==32205== by 0x4F98CA: ofpbuf_init (ofpbuf.c:123) ==32205== by 0x42C07B: nxt_resume (ofproto-dpif.c:5110) ==32205== by 0x41796F: handle_nxt_resume (ofproto.c:3677) ==32205== by 0x424583: handle_single_part_openflow (ofproto.c:8473) ==32205== by 0x424583: handle_openflow (ofproto.c:8606) ==32205== by 0x4579E2: ofconn_run (connmgr.c:1318) ==32205== by 0x4579E2: connmgr_run (connmgr.c:355) ==32205== by 0x41E0F5: ofproto_run (ofproto.c:1845) ==32205== by 0x40BA63: bridge_run__ (bridge.c:2971) ==32205== by 0x410CF3: bridge_run (bridge.c:3029) ==32205== by 0x407614: main (ovs-vswitchd.c:127) This is because 'xcache' was not destroyed properly. This patch fixes it. Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* vswitchd: Always cleanup userspace datapath.Ilya Maximets2019-07-021-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'netdev' datapath is implemented within ovs-vswitchd process and can not exist without it, so it should be gracefully terminated with a full cleanup of resources upon ovs-vswitchd exit. This change forces dpif cleanup for 'netdev' datapath regardless of passing '--cleanup' to 'ovs-appctl exit'. Such solution allowes to not pass this additional option everytime for userspace datapath installations and also allowes to not terminate system datapath in setups where both datapaths runs at the same time. The main part is that dpif_port_del() will lead to netdev_close() and subsequent netdev_class->destroy(dev) which will stop HW NICs and free their resources. For vhost-user interfaces it will invoke vhost driver unregistering with a properly closed vhost-user connection. For upcoming AF_XDP netdev this will allow to gracefully destroy xdp sockets and unload xdp programs from linux interfaces. Another important thing is that port deletion will also trigger flushing of flows offloaded to HW NICs. Exception made for 'internal' ports that could have user ip/route configuration. These ports will not be removed without '--cleanup'. This change fixes OVS disappearing from the DPDK point of view (keeping HW NICs improperly configured, sudden closing of vhost-user connections) and will help with linux devices clearing with upcoming AF_XDP netdev support. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Tested-by: William Tu <u9012063@gmail.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Ben Pfaff <blp@ovn.org>
* ofproto-dpif: Fix continuation with patch portYi-Hung Wei2019-06-211-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes the ofp_port to odp_port translation issue on patch port with nxt_resume. When OVS resumes processing a packet from nxt_resume, OVS does not translate the ofp in_port to odp in_port correctly if the packet is originally received from a patch port. Currently,OVS sets the odp in_port for this resume pakcet as ODPP_NONE and push the resume packet back to the datapath. Later on, if the packet goes through a recirc, OVS will generate the following message since it can not translate odp in_port (ODPP_NONE) back to ofp in_port during upcall, and push down a datapath rule to drop the packet. ofproto_dpif_upcall(handler16)|INFO|received packet on unassociated datapath port 4294967295 When OVS revalidates the drop datapath flow with ODPP_NONE in_port, we will see the following warning. ofproto_dpif_upcall(revalidator18)|WARN|Failed to acquire udpif_key corresponding to unexpected flow (Invalid argument): ufid:.... This patch resolves this issue by storing the odp in_port in the continuation messages, and restores the odp in_port before push the packet back to the datapath. VMWare-BZ: 2364696 Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* lacp: Don't send or receive PDUs when carrier state of slave is downNitin Katiyar2019-06-101-7/+7
| | | | | | | | | | | | | | | | | | | Fortville NICs (or their drivers) can get into an inconsistent state, in which the NIC can actually transmit and receive packets even though they report "PHY down". In such a state, OVS can exchange and process LACP messages and enable a LACP slave. However, further packet exchange over the slave fails because OVS sees that the PHY is down. This commit fixes the problem by making OVS ignore received LACP PDUs and suppress transmitting LACP PDUs when carrier is down. In addition, when a LACP PDU is received with carrier down, this commit triggers rechecking the carrier status (by incrementing the connectivity sequence number) to ensure that it is updated as quickly as possible. Signed-off-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com> Co-authored-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com> Signed-off-by: Nitin Katiyar <nitin.katiyar@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* ofproto-dpif-xlate: Change priority tags from boolean to enumEli Britstein2019-05-241-1/+2
| | | | | | | | | | | | | Priority tags is a port configuration to determine how the port treats priority tags, e.g. zero VLAN ID. Change the type from boolean to enum as a pre-step towards introducing additional modes. The new options are "never", equivalent to previously "false", and "if-nonzero", equivalent to previously "true". "true" is still supported for backwards compatibility. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* ofproto: Return error codes for rule insertions.Aravind Prasad S2019-04-231-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | Currently, rule_insert() API does not have return value. There are some possible scenarios where rule insertions can fail at run-time even though the static checks during rule_construct() had passed previously. Some possible scenarios for failure of rule insertions: **) Rule insertions can fail dynamically in Hybrid mode (both Openflow and Normal switch functioning coexist) where the CAM space could get suddenly filled up by Normal switch functioning and Openflow gets devoid of available space. **) Some deployments could have separate independent layers for HW rule insertions and application layer to interact with OVS. HW layer could face any dynamic issue during rule handling which application could not have predicted/captured in rule-construction phase. Rule-insert errors for bundles are handled too. Testing: Tested failures of rule insertions and also with bundles. Signed-off-by: Aravind Prasad S <aravind.sridharan at dell.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* Add a new OVS action check_pkt_largerNuman Siddique2019-04-221-0/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a new action 'check_pkt_larger' which checks if the packet is larger than the given size and stores the result in the destination register. Usage: check_pkt_larger(len)->REGISTER Eg. match=...,actions=check_pkt_larger(1442)->NXM_NX_REG0[0],next; This patch makes use of the new datapath action - 'check_pkt_len' which was recently added in the commit [1]. At the start of ovs-vswitchd, datapath is probed for this action. If the datapath action is present, then 'check_pkt_larger' makes use of this datapath action. Datapath action 'check_pkt_len' takes these nlattrs * OVS_CHECK_PKT_LEN_ATTR_PKT_LEN - 'pkt_len' to check for * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER (optional) - Nested actions to apply if the packet length is greater than the specified 'pkt_len' * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL (optional) - Nested actions to apply if the packet length is lesser or equal to the specified 'pkt_len'. Let's say we have these flows added to an OVS bridge br-int table=0, priority=100 in_port=1,ip,actions=check_pkt_larger:100->NXM_NX_REG0[0],resubmit(,1) table=1, priority=200,in_port=1,ip,reg0=0x1/0x1 actions=output:3 table=1, priority=100,in_port=1,ip,actions=output:4 Then the action 'check_pkt_larger' will be translated as - check_pkt_len(size=100,gt(3),le(4)) datapath will check the packet length and if the packet length is greater than 100, it will output to port 3, else it will output to port 4. In case, datapath doesn't support 'check_pkt_len' action, the OVS action 'check_pkt_larger' sets SLOW_ACTION so that datapath flow is not added. This OVS action is intended to be used by OVN to check the packet length and generate an ICMP packet with type 3, code 4 and next hop mtu in the logical router pipeline if the MTU of the physical interface is lesser than the packet length. More information can be found here [2] [1] - https://kernel.googlesource.com/pub/scm/linux/kernel/git/davem/net-next/+/4d5ec89fc8d14dcdab7214a0c13a1c7321dc6ea9 [2] - https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.html Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.html Suggested-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Numan Siddique <nusiddiq@redhat.com> CC: Ben Pfaff <blp@ovn.org> CC: Gregory Rose <gvrose8192@gmail.com> Acked-by: Mark Michelson <mmichels@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* bridge: Propagate patch port pairing errors to db.Ilya Maximets2019-03-261-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Virtual ports like 'patch' ports that almost fully implemented on 'ofproto' layer could have internal to 'ofproto' statuses that could not be retrieved from 'netdev' or other layers. For example, in current implementation there is no way to get the patch port pairing status (i.e. if it has usable peer?). New 'ofproto-provider' API function 'vport_get_status' introduced to cover this gap. It allowes 'bridge' layer to retrive current status of ofproto virtual ports and propagate it to DB. For now we're only interested in pairing errors of 'patch' ports. That are propagated to the 'error' column of the 'Interface' table. Ex.: $ ovs-vsctl show ... Bridge "br1" ... Port "patch1" Interface "patch1" type: patch options: {peer="patch0"} error: "No usable peer 'patch0' exists in 'system' datapath." Bridge "br0" datapath_type: netdev ... Port "patch0" Interface "patch0" type: patch options: {peer="patch1"} error: "No usable peer 'patch1' exists in 'netdev' datapath." Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* odp-util: Improve log messages and error reporting for Netlink parsing.Ben Pfaff2019-02-251-2/+4
| | | | | | | | As a side effect, this also reduces a lot of log messages' severities from ERR to WARN. They just didn't seem like messages that in general reported anything that would prevent functioning. Signed-off-by: Ben Pfaff <blp@ovn.org>