summaryrefslogtreecommitdiff
path: root/datapath
Commit message (Collapse)AuthorAgeFilesLines
* compat: Fix compile warning.Greg Rose2020-11-161-1/+4
| | | | | | | | | | | | | | | | In ../compat/nf_conntrack_reasm.c nf_frags_cache_name is declared if OVS_NF_DEFRAG6_BACKPORT is defined. However, later in the patch it is only used if HAVE_INET_FRAGS_WITH_FRAGS_WORK is defined and HAVE_INET_FRAGS_RND is not defined. This will cause a compile warning about unused variables. Fix it up by using the same defines that enable its use to decide if it should be declared and avoid the compiler warning. Fixes: 4a90b277baca ("compat: Fixup ipv6 fragmentation on 4.9.135+ kernels") Signed-off-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* compat: Fix build issue on RHEL 7.7.Greg Rose2020-11-162-12/+2
| | | | | | | | | | | | | | | | | | RHEL 7.2 introduced a KABI fixup in struct sk_buff for the name change of l4_rxhash to l4_hash. Then patch 9ba57fc7cccc ("datapath: Add hash info to upcall") introduced a compile error by using l4_hash and not fixing up the HAVE_L4_RXHASH configuration flag. Remove all references to HAVE_L4_RXHASH and always use l4_hash to resolve the issue. This will break compilation on RHEL 7.0 and RHEL 7.1 but dropping support for these older kernels shouldn't be a problem. Fixes: 9ba57fc7cccc ("datapath: Add hash info to upcall") Signed-off-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* compat: Remove stale code.Greg Rose2020-11-162-7/+1
| | | | | | | | | | Remove stale and unused code left over after support for kernels older than 3.10 was removed. Fixes: 8063e0958780 ("datapath: Drop support for kernel older than 3.10") Signed-off-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* datapath: use hlist_for_each_entry_rcu instead of hlist_for_each_entryTonghao Zhang2020-10-171-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 64948427a63f49dd0ce403388d232f22cc1971a8 Author: Tonghao Zhang <xiangxia.m.yue@gmail.com> Date: Thu Mar 26 04:27:24 2020 +0800 net: openvswitch: use hlist_for_each_entry_rcu instead of hlist_for_each_entry The struct sw_flow is protected by RCU, when traversing them, use hlist_for_each_entry_rcu. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Compat fixup - OVS doesn't support lockdep_ovsl_is_held() yet Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* datapath: Distribute switch variables for initializationKees Cook2020-10-171-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 16a556eeb7ed2dc3709fe2c5be76accdfa4901ab Author: Kees Cook <keescook@chromium.org> Date: Wed Feb 19 22:23:09 2020 -0800 openvswitch: Distribute switch variables for initialization Variables declared in a switch statement before any case statements cannot be automatically initialized with compiler instrumentation (as they are not part of any execution flow). With GCC's proposed automatic stack variable initialization feature, this triggers a warning (and they don't get initialized). Clang's automatic stack variable initialization (via CONFIG_INIT_STACK_ALL=y) doesn't throw a warning, but it also doesn't initialize such variables[1]. Note that these warnings (or silent skipping) happen before the dead-store elimination optimization phase, so even when the automatic initializations are later elided in favor of direct initializations, the warnings remain. To avoid these problems, move such variables into the "case" where they're used or lift them up into the main function body. net/openvswitch/flow_netlink.c: In function ‘validate_set’: net/openvswitch/flow_netlink.c:2711:29: warning: statement will never be executed [-Wswitch-unreachable] 2711 | const struct ovs_key_ipv4 *ipv4_key; | ^~~~~~~~ [1] https://bugs.llvm.org/show_bug.cgi?id=44916 Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* datapath: use skb_list_walk_safe helper for gso segmentsJason A. Donenfeld2020-10-172-7/+11
| | | | | | | | | | | | | | | | | | | Upstream commit: commit 2cec4448db38758832c2edad439f99584bb8fa0d Author: Jason A. Donenfeld <Jason@zx2c4.com> Date: Mon Jan 13 18:42:29 2020 -0500 net: openvswitch: use skb_list_walk_safe helper for gso segments This is a straight-forward conversion case for the new function, keeping the flow of the existing code as intact as possible. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* datapath: support asymmetric conntrackaaron conole2020-10-171-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 5d50aa83e2c8e91ced2cca77c198b468ca9210f4 author: aaron conole <aconole@redhat.com> date: tue dec 3 16:34:13 2019 -0500 openvswitch: support asymmetric conntrack the openvswitch module shares a common conntrack and nat infrastructure exposed via netfilter. it's possible that a packet needs both snat and dnat manipulation, due to e.g. tuple collision. netfilter can support this because it runs through the nat table twice - once on ingress and again after egress. the openvswitch module doesn't have such capability. like netfilter hook infrastructure, we should run through nat twice to keep the symmetry. fixes: 05752523e565 ("openvswitch: interface with nat.") signed-off-by: aaron conole <aconole@redhat.com> signed-off-by: david s. miller <davem@davemloft.net> Fixes: c5f6c06b58d6 ("datapath: Interface with NAT.") Acked-by: Aaron Conole <aconole@redhat.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* datapath: remove another BUG_ON()Paolo Abeni2020-10-171-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 8a574f86652a4540a2433946ba826ccb87f398cc Author: Paolo Abeni <pabeni@redhat.com> Date: Sun Dec 1 18:41:25 2019 +0100 openvswitch: remove another BUG_ON() If we can't build the flow del notification, we can simply delete the flow, no need to crash the kernel. Still keep a WARN_ON to preserve debuggability. Note: the BUG_ON() predates the Fixes tag, but this change can be applied only after the mentioned commit. v1 -> v2: - do not leak an skb on error Fixes: aed067783e50 ("openvswitch: Minimize ovs_flow_cmd_del critical section.") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* datapath: drop unneeded BUG_ON() in ovs_flow_cmd_build_info()Paolo Abeni2020-10-171-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 8ffeb03fbba3b599690b361467bfd2373e8c450f Author: Paolo Abeni <pabeni@redhat.com> Date: Sun Dec 1 18:41:24 2019 +0100 openvswitch: drop unneeded BUG_ON() in ovs_flow_cmd_build_info() All the callers of ovs_flow_cmd_build_info() already deal with error return code correctly, so we can handle the error condition in a more gracefull way. Still dump a warning to preserve debuggability. v1 -> v2: - clarify the commit message - clean the skb and report the error (DaveM) Fixes: ccb1352e76cf ("net: Add Open vSwitch kernel components.") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* datapath: fix flow command message sizePaolo Abeni2020-10-171-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 4e81c0b3fa93d07653e2415fa71656b080a112fd Author: Paolo Abeni <pabeni@redhat.com> Date: Tue Nov 26 12:55:50 2019 +0100 openvswitch: fix flow command message size When user-space sets the OVS_UFID_F_OMIT_* flags, and the relevant flow has no UFID, we can exceed the computed size, as ovs_nla_put_identifier() will always dump an OVS_FLOW_ATTR_KEY attribute. Take the above in account when computing the flow command message size. Fixes: 74ed7ab9264c ("openvswitch: Add support for unique flow IDs.") Reported-by: Qi Jun Ding <qding@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* datapath: don't call pad_packet if not necessaryTonghao Zhang2020-10-171-14/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 61ca533c0e94104c35fcb7858a23ec9a05d78143 Author: Tonghao Zhang <xiangxia.m.yue@gmail.com> Date: Thu Nov 14 23:51:08 2019 +0800 net: openvswitch: don't call pad_packet if not necessary The nla_put_u16/nla_put_u32 makes sure that *attrlen is align. The call tree is that: nla_put_u16/nla_put_u32 -> nla_put attrlen = sizeof(u16) or sizeof(u32) -> __nla_put attrlen -> __nla_reserve attrlen -> skb_put(skb, nla_total_size(attrlen)) nla_total_size returns the total length of attribute including padding. Cc: Joe Stringer <joe@ovn.org> Cc: William Tu <u9012063@gmail.com> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* datapath: select vport upcall portid directlyTonghao Zhang2020-10-171-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 90ce9f23a886bdef7a4b7a9bd52c7a50a6a81635 Author: Tonghao Zhang <xiangxia.m.yue@gmail.com> Date: Thu Nov 7 00:34:28 2019 +0800 net: openvswitch: select vport upcall portid directly The commit 69c51582ff786 ("dpif-netlink: don't allocate per thread netlink sockets"), in Open vSwitch ovs-vswitchd, has changed the number of allocated sockets to just one per port by moving the socket array from a per handler structure to a per datapath one. In the kernel datapath, a vport will have only one socket in most case, if so select it directly in fast-path. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* datapath: simplify the ovs_dp_cmd_newTonghao Zhang2020-10-171-22/+38
| | | | | | | | | | | | | | | | | | | | | Upstream commit: commit eec62eadd1d757b0743ccbde55973814f3ad396e Author: Tonghao Zhang <xiangxia.m.yue@gmail.com> Date: Fri Nov 1 22:23:54 2019 +0800 net: openvswitch: simplify the ovs_dp_cmd_new use the specified functions to init resource. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* datapath: fix possible memleak on destroy flow-tableTonghao Zhang2020-10-172-88/+106
| | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 50b0e61b32ee890a75b4377d5fbe770a86d6a4c1 Author: Tonghao Zhang <xiangxia.m.yue@gmail.com> Date: Fri Nov 1 22:23:52 2019 +0800 net: openvswitch: fix possible memleak on destroy flow-table When we destroy the flow tables which may contain the flow_mask, so release the flow mask struct. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Added additional compat layer fixup for WRITE_ONCE() Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* datapath: add likely in flow_lookupTonghao Zhang2020-10-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 0a3e01371db17d753dd92ec4d0fc6247412d3b01 Author: Tonghao Zhang <xiangxia.m.yue@gmail.com> Date: Fri Nov 1 22:23:51 2019 +0800 net: openvswitch: add likely in flow_lookup The most case *index < ma->max, and flow-mask is not NULL. We add un/likely for performance. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* datapath: simplify the flow_hashTonghao Zhang2020-10-171-5/+2
| | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 515b65a4b99197ae062a795ab4de919e6d04be04 Author: Tonghao Zhang <xiangxia.m.yue@gmail.com> Date: Fri Nov 1 22:23:50 2019 +0800 net: openvswitch: simplify the flow_hash Simplify the code and remove the unnecessary BUILD_BUG_ON. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* datapath: optimize flow-mask looking upTonghao Zhang2020-10-171-50/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 57f7d7b9164426c496300d254fd5167fbbf205ea Author: Tonghao Zhang <xiangxia.m.yue@gmail.com> Date: Fri Nov 1 22:23:49 2019 +0800 net: openvswitch: optimize flow-mask looking up The full looking up on flow table traverses all mask array. If mask-array is too large, the number of invalid flow-mask increase, performance will be drop. One bad case, for example: M means flow-mask is valid and NULL of flow-mask means deleted. +-------------------------------------------+ | M | NULL | ... | NULL | M| +-------------------------------------------+ In that case, without this patch, openvswitch will traverses all mask array, because there will be one flow-mask in the tail. This patch changes the way of flow-mask inserting and deleting, and the mask array will be keep as below: there is not a NULL hole. In the fast path, we can "break" "for" (not "continue") in flow_lookup when we get a NULL flow-mask. "break" v +-------------------------------------------+ | M | M | NULL |... | NULL | NULL| +-------------------------------------------+ This patch don't optimize slow or control path, still using ma->max to traverse. Slow path: * tbl_mask_array_realloc * ovs_flow_tbl_lookup_exact * flow_mask_find Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* datapath: don't unlock mutex when changing the user_features failsTonghao Zhang2020-10-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 4c76bf696a608ea5cc555fe97ec59a9033236604 Author: Tonghao Zhang <xiangxia.m.yue@gmail.com> Date: Fri Nov 1 22:23:53 2019 +0800 net: openvswitch: don't unlock mutex when changing the user_features fails Unlocking of a not locked mutex is not allowed. Other kernel thread may be in critical section while we unlock it because of setting user_feature fail. Fixes: 95a7233c4 ("net: openvswitch: Set OvS recirc_id from tc chain index") Cc: Paul Blakey <paulb@mellanox.com> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* datapath: fix GFP flags in rtnl_net_notifyid()Guillaume Nault2020-10-171-9/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit d4e4fdf9e4a27c87edb79b1478955075be141f67 Author: Guillaume Nault <gnault@redhat.com> Date: Wed Oct 23 18:39:04 2019 +0200 netns: fix GFP flags in rtnl_net_notifyid() In rtnl_net_notifyid(), we certainly can't pass a null GFP flag to rtnl_notify(). A GFP_KERNEL flag would be fine in most circumstances, but there are a few paths calling rtnl_net_notifyid() from atomic context or from RCU critical sections. The later also precludes the use of gfp_any() as it wouldn't detect the RCU case. Also, the nlmsg_new() call is wrong too, as it uses GFP_KERNEL unconditionally. Therefore, we need to pass the GFP flags as parameter and propagate it through function calls until the proper flags can be determined. In most cases, GFP_KERNEL is fine. The exceptions are: * openvswitch: ovs_vport_cmd_get() and ovs_vport_cmd_dump() indirectly call rtnl_net_notifyid() from RCU critical section, * rtnetlink: rtmsg_ifinfo_build_skb() already receives GFP flags as parameter. Also, in ovs_vport_cmd_build_info(), let's change the GFP flags used by nlmsg_new(). The function is allowed to sleep, so better make the flags consistent with the ones used in the following ovs_vport_cmd_fill_info() call. Found by code inspection. Fixes: 9a9634545c70 ("netns: notify netns id events") Signed-off-by: Guillaume Nault <gnault@redhat.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Backport the datapath.c portion of this fix. Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* datapath: Set OvS recirc_id from tc chain indexPaul Blakey2020-10-174-5/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 95a7233c452a58a4c2310c456c73997853b2ec46 Author: Paul Blakey <paulb@mellanox.com> Date: Wed Sep 4 16:56:37 2019 +0300 net: openvswitch: Set OvS recirc_id from tc chain index Offloaded OvS datapath rules are translated one to one to tc rules, for example the following simplified OvS rule: recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2) Will be translated to the following tc rule: $ tc filter add dev dev1 ingress \ prio 1 chain 0 proto ip \ flower tcp ct_state -trk \ action ct pipe \ action goto chain 2 Received packets will first travel though tc, and if they aren't stolen by it, like in the above rule, they will continue to OvS datapath. Since we already did some actions (action ct in this case) which might modify the packets, and updated action stats, we would like to continue the proccessing with the correct recirc_id in OvS (here recirc_id(2)) where we left off. To support this, introduce a new skb extension for tc, which will be used for translating tc chain to ovs recirc_id to handle these miss cases. Last tc chain index will be set by tc goto chain action and read by OvS datapath. Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Backport the local datapath changes from this patch and add compat layer fixup for the DECLARE_STATIC_KEY_FALSE macro. Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* datapath: Print error when ovs_execute_actions() failsYifeng Sun2020-10-171-2/+5
| | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit aa733660dbd8d9192b8c528ae0f4b84f3fef74e4 Author: Yifeng Sun <pkusunyifeng@gmail.com> Date: Sun Aug 4 19:56:11 2019 -0700 openvswitch: Print error when ovs_execute_actions() fails Currently in function ovs_dp_process_packet(), return values of ovs_execute_actions() are silently discarded. This patch prints out an debug message when error happens so as to provide helpful hints for debugging. Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* datapath: do not update max_headroom if new headroom is equal to old headroomTaehee Yoo2020-10-171-11/+27
| | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 6b660c4177aaebdc73df7a3378f0e8b110aa4b51 Author: Taehee Yoo <ap420073@gmail.com> Date: Sat Jul 6 01:08:09 2019 +0900 net: openvswitch: do not update max_headroom if new headroom is equal to old headroom When a vport is deleted, the maximum headroom size would be changed. If the vport which has the largest headroom is deleted, the new max_headroom would be set. But, if the new headroom size is equal to the old headroom size, updating routine is unnecessary. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* datapath: drop unneeded likely() call around IS_ERR()Enrico Weigelt2020-10-171-1/+1
| | | | | | | | | | | | | | | | | | | Upstream commit: commit b90f5aa4d6268e81dd1fd51e5ef89d2892bf040d Author: Enrico Weigelt <info@metux.net> Date: Wed Jun 5 23:06:40 2019 +0200 net: openvswitch: drop unneeded likely() call around IS_ERR() IS_ERR() already calls unlikely(), so this extra likely() call around the !IS_ERR() is not needed. Signed-off-by: Enrico Weigelt <info@metux.net> Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* datapath: return an error instead of doing BUG_ON()Eelco Chaudron2020-10-171-2/+5
| | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit a734d1f4c2fc962ef4daa179e216df84a8ec5f84 Author: Eelco Chaudron <echaudro@redhat.com> Date: Thu May 2 16:12:38 2019 -0400 net: openvswitch: return an error instead of doing BUG_ON() For all other error cases in queue_userspace_packet() the error is returned, so it makes sense to do the same for these two error cases. Reported-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* Eliminate "whitelist" and "blacklist" terms.Ben Pfaff2020-10-163-2/+2
| | | | | | | | There is one remaining use under datapath. That change should happen upstream in Linux first according to our usual policy. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
* datapath: Fix exposing OVS_TUNNEL_KEY_ATTR_GTPU_OPTS to kernel module.Ilya Maximets2020-10-081-0/+3
| | | | | | | | | | | | | | | | Kernel module doesn't know about GTPU and it should return correct out-of-range error in case this tunnel attribute passed there for any reason. Current out-of-tree module will pass the range check and will try to access ovs_tunnel_key_lens[] array by index OVS_TUNNEL_KEY_ATTR_GTPU_OPTS. Even though it might not produce issues in current code, this is not a good thing to do since ovs_tunnel_key_lens[] array is not explicitly initialized for OVS_TUNNEL_KEY_ATTR_GTPU_OPTS and we will likely have misleading error about incorrect attribute length in the end. Fixes: 3c6d05a02e0f ("userspace: Add GTP-U support.") Acked-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* datapath: Remove duplicated includesYunjian Wang2020-07-142-2/+0
| | | | | | | | Remove duplicated includes. Acked-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Signed-off-by: William Tu <u9012063@gmail.com>
* userspace: Avoid dp_hash recirculation for balance-tcp bond mode.Vishal Deep Ajmera2020-06-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: In OVS, flows with output over a bond interface of type “balance-tcp” gets translated by the ofproto layer into "HASH" and "RECIRC" datapath actions. After recirculation, the packet is forwarded to the bond member port based on 8-bits of the datapath hash value computed through dp_hash. This causes performance degradation in the following ways: 1. The recirculation of the packet implies another lookup of the packet’s flow key in the exact match cache (EMC) and potentially Megaflow classifier (DPCLS). This is the biggest cost factor. 2. The recirculated packets have a new “RSS” hash and compete with the original packets for the scarce number of EMC slots. This implies more EMC misses and potentially EMC thrashing causing costly DPCLS lookups. 3. The 256 extra megaflow entries per bond for dp_hash bond selection put additional load on the revalidation threads. Owing to this performance degradation, deployments stick to “balance-slb” bond mode even though it does not do active-active load balancing for VXLAN- and GRE-tunnelled traffic because all tunnel packet have the same source MAC address. Proposed optimization: This proposal introduces a new load-balancing output action instead of recirculation. Maintain one table per-bond (could just be an array of uint16's) and program it the same way internal flows are created today for each possible hash value (256 entries) from ofproto layer. Use this table to load-balance flows as part of output action processing. Currently xlate_normal() -> output_normal() -> bond_update_post_recirc_rules() -> bond_may_recirc() and compose_output_action__() generate 'dp_hash(hash_l4(0))' and 'recirc(<RecircID>)' actions. In this case the RecircID identifies the bond. For the recirculated packets the ofproto layer installs megaflow entries that match on RecircID and masked dp_hash and send them to the corresponding output port. Instead, we will now generate action as 'lb_output(<bond id>)' This combines hash computation (only if needed, else re-use RSS hash) and inline load-balancing over the bond. This action is used *only* for balance-tcp bonds in userspace datapath (the OVS kernel datapath remains unchanged). Example: Current scheme: With 8 UDP flows (with random UDP src port): flow-dump from pmd on cpu core: 2 recirc_id(0),in_port(7),<...> actions:hash(hash_l4(0)),recirc(0x1) recirc_id(0x1),dp_hash(0xf8e02b7e/0xff),<...> actions:2 recirc_id(0x1),dp_hash(0xb236c260/0xff),<...> actions:1 recirc_id(0x1),dp_hash(0x7d89eb18/0xff),<...> actions:1 recirc_id(0x1),dp_hash(0xa78d75df/0xff),<...> actions:2 recirc_id(0x1),dp_hash(0xb58d846f/0xff),<...> actions:2 recirc_id(0x1),dp_hash(0x24534406/0xff),<...> actions:1 recirc_id(0x1),dp_hash(0x3cf32550/0xff),<...> actions:1 New scheme: We can do with a single flow entry (for any number of new flows): in_port(7),<...> actions:lb_output(1) A new CLI has been added to dump datapath bond cache as given below. # ovs-appctl dpif-netdev/bond-show [dp] Bond cache: bond-id 1 : bucket 0 - slave 2 bucket 1 - slave 1 bucket 2 - slave 2 bucket 3 - slave 1 Co-authored-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com> Signed-off-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com> Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com> Tested-by: Matteo Croce <mcroce@redhat.com> Tested-by: Adrian Moreno <amorenoz@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* datapath: Add hash info to upcall.Han Zhou2020-05-283-1/+75
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch backports below upstream patches, and add __skb_set_hash to compat for older kernels. commit b5ab1f1be6180a2e975eede18731804b5164a05d Author: Jakub Kicinski <kuba@kernel.org> Date: Mon Mar 2 21:05:18 2020 -0800 openvswitch: add missing attribute validation for hash Add missing attribute validation for OVS_PACKET_ATTR_HASH to the netlink policy. Fixes: bd1903b7c459 ("net: openvswitch: add hash info to upcall") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> commit bd1903b7c4596ba6f7677d0dfefd05ba5876707d Author: Tonghao Zhang <xiangxia.m.yue@gmail.com> Date: Wed Nov 13 23:04:49 2019 +0800 net: openvswitch: add hash info to upcall When using the kernel datapath, the upcall don't include skb hash info relatived. That will introduce some problem, because the hash of skb is important in kernel stack. For example, VXLAN module uses it to select UDP src port. The tx queue selection may also use the hash in stack. Hash is computed in different ways. Hash is random for a TCP socket, and hash may be computed in hardware, or software stack. Recalculation hash is not easy. Hash of TCP socket is computed: tcp_v4_connect -> sk_set_txhash (is random) __tcp_transmit_skb -> skb_set_hash_from_sk There will be one upcall, without information of skb hash, to ovs-vswitchd, for the first packet of a TCP session. The rest packets will be processed in Open vSwitch modules, hash kept. If this tcp session is forward to VXLAN module, then the UDP src port of first tcp packet is different from rest packets. TCP packets may come from the host or dockers, to Open vSwitch. To fix it, we store the hash info to upcall, and restore hash when packets sent back. +---------------+ +-------------------------+ | Docker/VMs | | ovs-vswitchd | +----+----------+ +-+--------------------+--+ | ^ | | | | | | upcall v restore packet hash (not recalculate) | +-+--------------------+--+ | tap netdev | | vxlan module +---------------> +--> Open vSwitch ko +--> or internal type | | +-------------------------+ Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.html Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Tested-by: Aliasgar Ginwala <aginwala@ebay.com> Acked-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: Han Zhou <hzhou@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* compat: Backport ipv6_stub changeGreg Rose2020-05-242-2/+27
| | | | | | | | | | | | | | | | | A patch backported to the Linux stable 4.14 tree and present in the latest stable 4.14.181 kernel breaks ipv6_stub usage. The commit is 8ab8786f78c3 ("net ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup"). Create the compat layer define to check for it and fixup usage in vxlan and geneve modules. Passes Travis here: https://travis-ci.org/github/gvrose8192/ovs-experimental/builds/689798733 Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com>
* compat: Fix ipv6_dst_lookup build errorYi-Hung Wei2020-04-302-10/+15
| | | | | | | | | | | | | | | | | | | The geneve/vxlan compat code base invokes ipv6_dst_lookup() which is recently replaced by ipv6_dst_lookup_flow() in the stable kernel tree. This causes travis build failure: * https://travis-ci.org/github/openvswitch/ovs/builds/681084038 This patch updates the backport logic to invoke the right function. Related patch in git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git b9f3e457098e ("net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup") Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com>
* compat: Fix broken partial backport of extack op parameterGreg Rose2020-04-157-15/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A series of commits added support for the extended ack parameter to the newlink, changelink and validate ops in the rtnl_link_ops structure: a8b8a889e369d ("net: add netlink_ext_ack argument to rtnl_link_ops.validate") 7a3f4a185169b ("net: add netlink_ext_ack argument to rtnl_link_ops.newlink") ad744b223c521 ("net: add netlink_ext_ack argument to rtnl_link_ops.changelink") These commits were all added at the same time and present since the Linux kernel 4.13 release. In our compatiblity layer we have a define HAVE_EXT_ACK_IN_RTNL_LINKOPS that indicates the presence of the extended ack parameter for these three link operations. At least one distro has only backported two of the three patches, for newlink and changelink, while not backporting patch a8b8a889e369d for the validate op. Our compatibility layer code in acinclude.m4 is able to find the presence of the extack within the rtnl_link_ops structure so it defines HAVE_EXT_ACK_IN_RTNL_LINKOPS but since the validate link op does not have the extack parameter the compilation fails on recent kernels for that particular distro. Other kernel distributions based upon this distro will presumably also encounter the compile errors. Introduce a new function in acinclude.m4 that will find function op definitions and then search for the required parameter. Then use this function to define HAVE_RTNLOP_VALIDATE_WITH_EXTACK so that we can detect and enable correct compilation on kernels which have not backported the entire set of patches. This function is generic to any function op - it need not be in a structure. In places where HAVE_EXT_ACK_IN_RTNL_LINKOPS wraps validate functions replace it with the new HAVE_RTNLOP_VALIDATE_WITH_EXTACK define. Passes Travis here: https://travis-ci.org/github/gvrose8192/ovs-experimental/builds/674599698 Passes a kernel check-kmod test on several systems, including sles12 sp4 4.12.14-95.48-default kernel, without any regressions. VMWare-BZ: #2544032 Signed-off-by: Greg Rose <gvrose8192@gmail.com> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com>
* userspace: Add GTP-U support.William Tu2020-03-251-0/+2
| | | | | | | | | | | | | | | | | | | | | | | GTP, GPRS Tunneling Protocol, is a group of IP-based communications protocols used to carry general packet radio service (GPRS) within GSM, UMTS and LTE networks. GTP protocol has two parts: Signalling (GTP-Control, GTP-C) and User data (GTP-User, GTP-U). GTP-C is used for setting up GTP-U protocol, which is an IP-in-UDP tunneling protocol. Usually GTP is used in connecting between base station for radio, Serving Gateway (S-GW), and PDN Gateway (P-GW). This patch implements GTP-U protocol for userspace datapath, supporting only required header fields and G-PDU message type. See spec in: https://tools.ietf.org/html/draft-hmm-dmm-5g-uplane-analysis-00 Tested-at: https://travis-ci.org/github/williamtu/ovs-travis/builds/666518784 Signed-off-by: Feng Yang <yangfengee04@gmail.com> Co-authored-by: Feng Yang <yangfengee04@gmail.com> Signed-off-by: Yi Yang <yangyi01@inspur.com> Co-authored-by: Yi Yang <yangyi01@inspur.com> Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Ben Pfaff <blp@ovn.org>
* compat: Fix nf_ip_hook parameters for RHEL 8Greg Rose2020-03-241-1/+1
| | | | | | | | | | A RHEL release version check was only checking for RHEL releases greater than 7.0 so that ended up including a compat fixup that is not needed for 8.0. Fix up the version check. Signed-off-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com>
* datapath: conntrack: mark expected switch fall-throughGustavo A. R. Silva2020-03-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 279badc2a85be83e0187b8c566e3b476b76a87a2 Author: Gustavo A. R. Silva <garsilva@embeddedor.com> Date: Thu Oct 19 12:55:03 2017 -0500 openvswitch: conntrack: mark expected switch fall-through In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Notice that in this particular case I placed a "fall through" comment on its own line, which is what GCC is expecting to find. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* compat: Use nla_parse deprecated functionsJohannes Berg2020-03-065-13/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 8cb081746c031fb164089322e2336a0bf5b3070c Author: Johannes Berg <johannes.berg@intel.com> Date: Fri Apr 26 14:07:28 2019 +0200 netlink: make validation more configurable for future strictness We currently have two levels of strict validation: 1) liberal (default) - undefined (type >= max) & NLA_UNSPEC attributes accepted - attribute length >= expected accepted - garbage at end of message accepted 2) strict (opt-in) - NLA_UNSPEC attributes accepted - attribute length >= expected accepted Split out parsing strictness into four different options: * TRAILING - check that there's no trailing data after parsing attributes (in message or nested) * MAXTYPE - reject attrs > max known type * UNSPEC - reject attributes with NLA_UNSPEC policy entries * STRICT_ATTRS - strictly validate attribute size The default for future things should be *everything*. The current *_strict() is a combination of TRAILING and MAXTYPE, and is renamed to _deprecated_strict(). The current regular parsing has none of this, and is renamed to *_parse_deprecated(). Additionally it allows us to selectively set one of the new flags even on old policies. Notably, the UNSPEC flag could be useful in this case, since it can be arranged (by filling in the policy) to not be an incompatible userspace ABI change, but would then going forward prevent forgetting attribute entries. Similar can apply to the POLICY flag. We end up with the following renames: * nla_parse -> nla_parse_deprecated * nla_parse_strict -> nla_parse_deprecated_strict * nlmsg_parse -> nlmsg_parse_deprecated * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict * nla_parse_nested -> nla_parse_nested_deprecated * nla_validate_nested -> nla_validate_nested_deprecated Using spatch, of course: @@ expression TB, MAX, HEAD, LEN, POL, EXT; @@ -nla_parse(TB, MAX, HEAD, LEN, POL, EXT) +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT) @@ expression NLH, HDRLEN, TB, MAX, POL, EXT; @@ -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT) +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT) @@ expression NLH, HDRLEN, TB, MAX, POL, EXT; @@ -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT) +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT) @@ expression TB, MAX, NLA, POL, EXT; @@ -nla_parse_nested(TB, MAX, NLA, POL, EXT) +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT) @@ expression START, MAX, POL, EXT; @@ -nla_validate_nested(START, MAX, POL, EXT) +nla_validate_nested_deprecated(START, MAX, POL, EXT) @@ expression NLH, HDRLEN, MAX, POL, EXT; @@ -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT) +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT) For this patch, don't actually add the strict, non-renamed versions yet so that it breaks compile if I get it wrong. Also, while at it, make nla_validate and nla_parse go down to a common __nla_validate_parse() function to avoid code duplication. Ultimately, this allows us to have very strict validation for every new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the next patch, while existing things will continue to work as is. In effect then, this adds fully strict validation for any new command. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Backport portions of this commit applicable to openvswitch and added necessary compatibility layer changes to support older kernels. Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* datapath: Kbuild: Add kcompat.h header to front of NOSTDINCGreg Rose2020-03-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since this commit in the Linux upstream kernel: 'commit 9b9a3f20cbe0 ("kbuild: split final module linking out into Makefile.modfinal")' The openvswitch kernel module fails to build against the upstream Linux kernel. The cause of the build failure is that the include of the KBUILD_EXTMOD variable was dropped in Makefile.modfinal when it was split out from Makefile.modpost. Our Kbuild was setting the ccflags-y variable to include our kcompat.h header as the first header file. The Linux kernel maintainer has said that it is incorrect to rely on the ccflags-y variable for the modfinal phase of the build so that is why KBUILD_EXTMOD is not included. We fix this by breaking a different Linux kernel make rule. We add '-include $(builddir)/kcompat.h' to the front of the NOSTDINC variable setting in our Kbuild makefile. As noted already in the comment for the NOSTDINC setting: \# These include directories have to go before -I$(KSRC)/include. \# NOSTDINC_FLAGS just happens to be a variable that goes in the \# right place, even though it's conceptually incorrect. So we continue the misuse of the NOSTDINC variable to fix this issue as well. The assumption of the Linux kernel maintainers is that any local, out-of-tree build include files can be added to the end of the command line. In our case that is wrong of course, but there is nothing we can do about it that I know of other than using some utility like unifdef to strip out offending chunks of our compatibility layer code before invocation of Makefile.modfinal. That is a big change that would take a lot of work to implement. We could ask the Linux kernel maintainers to provide some way for out-of-tree kernel modules to include their own header files first in a proper manner. I consider that to be a very low probability of success but something we could ask about. For now we cheat and take the easy way out. Reported-by: David Ahern <dsahern@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* datapath: Use sizeof_field macroPankaj Bharadiya2020-03-069-13/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit c593642c8be046915ca3a4a300243a68077cd207 Author: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> Date: Mon Dec 9 10:31:43 2019 -0800 treewide: Use sizeof_field() macro Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except at places where these are defined. Later patches will remove the unused definition of FIELD_SIZEOF(). This patch is generated using following script: EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h" git grep -l -e "\bFIELD_SIZEOF\b" | while read file; do if [[ "$file" =~ $EXCLUDE_FILES ]]; then continue fi sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file; done Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: David Miller <davem@davemloft.net> # for net Also added a compatibility layer macro for older kernels that still use FIELD_SIZEOF Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* compat: Remove flex_array codeGreg Rose2020-03-061-18/+10
| | | | | | | | | Flex array support is removed since kernel 5.1. Convert to use kvmalloc_array instead. Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* compat: Move genl_ops policy to genl_familyJohannes Berg2020-03-063-0/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 3b0f31f2b8c9fb348e4530b88f6b64f9621f83d6 Author: Johannes Berg <johannes.berg@intel.com> Date: Thu Mar 21 22:51:02 2019 +0100 genetlink: make policy common to family Since maxattr is common, the policy can't really differ sanely, so make it common as well. The only user that did in fact manage to make a non-common policy is taskstats, which has to be really careful about it (since it's still using a common maxattr!). This is no longer supported, but we can fake it using pre_doit. This reduces the size of e.g. nl80211.o (which has lots of commands): text data bss dec hex filename 398745 14323 2240 415308 6564c net/wireless/nl80211.o (before) 397913 14331 2240 414484 65314 net/wireless/nl80211.o (after) -------------------------------- -832 +8 0 -824 Which is obviously just 8 bytes for each command, and an added 8 bytes for the new policy pointer. I'm not sure why the ops list is counted as .text though. Most of the code transformations were done using the following spatch: @ops@ identifier OPS; expression POLICY; @@ struct genl_ops OPS[] = { ..., { - .policy = POLICY, }, ... }; @@ identifier ops.OPS; expression ops.POLICY; identifier fam; expression M; @@ struct genl_family fam = { .ops = OPS, .maxattr = M, + .policy = POLICY, ... }; This also gets rid of devlink_nl_cmd_region_read_dumpit() accessing the cb->data as ops, which we want to change in a later genl patch. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Since commit 3b0f31f2b8c9f ("genetlink: make policy common to family") the policy field of the genl_ops structure has been moved into the genl_family structure. Add necessary compat layer infrastructure to still support older kernels. Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* compat: Fix up changes to inet frags in 5.1+Greg Rose2020-03-061-0/+14
| | | | | | | | | | | | Since Linux kernel release 5.1 the fragments field of the inet_frag_queue structure is removed and now only the rb_fragments structure with an rb_node pointer is used for both ipv4 and ipv6. In addition, the atomic_sub and atomic_add functions are replaced with their equivalent long counterparts. Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* compat: Remove HAVE_BOOL_TYPEGreg Rose2020-01-312-11/+0
| | | | | | | | | | | OVS only supports Linux kernels since 3.10 and all kernels since then have the bool type. This check is unnecessary so remove it. Passes Travis: https://travis-ci.org/gvrose8192/ovs-experimental/builds/644103253 Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* userspace: Improved packet drop statistics.Anju Thomas2020-01-071-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently OVS maintains explicit packet drop/error counters only on port level. Packets that are dropped as part of normal OpenFlow processing are counted in flow stats of “drop” flows or as table misses in table stats. These can only be interpreted by controllers that know the semantics of the configured OpenFlow pipeline. Without that knowledge, it is impossible for an OVS user to obtain e.g. the total number of packets dropped due to OpenFlow rules. Furthermore, there are numerous other reasons for which packets can be dropped by OVS slow path that are not related to the OpenFlow pipeline. The generated datapath flow entries include a drop action to avoid further expensive upcalls to the slow path, but subsequent packets dropped by the datapath are not accounted anywhere. Finally, the datapath itself drops packets in certain error situations. Also, these drops are today not accounted for.This makes it difficult for OVS users to monitor packet drop in an OVS instance and to alert a management system in case of a unexpected increase of such drops. Also OVS trouble-shooters face difficulties in analysing packet drops. With this patch we implement following changes to address the issues mentioned above. 1. Identify and account all the silent packet drop scenarios 2. Display these drops in ovs-appctl coverage/show Co-authored-by: Rohith Basavaraja <rohith.basavaraja@gmail.com> Co-authored-by: Keshav Gupta <keshugupta1@gmail.com> Signed-off-by: Anju Thomas <anju.thomas@ericsson.com> Signed-off-by: Rohith Basavaraja <rohith.basavaraja@gmail.com> Signed-off-by: Keshav Gupta <keshugupta1@gmail.com> Acked-by: Eelco Chaudron <echaudro@redhat.com Acked-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* compat: Include confirm_neigh parameter if neededGreg Rose2020-01-072-0/+9
| | | | | | | | | | | | | | A change backported to the Linux 4.14.162 LTS kernel requires a boolean parameter. Check for the presence of the parameter and adjust the caller in that case. Passes check-kmod test with no regressions. Passes Travis build here: https://travis-ci.org/gvrose8192/ovs-experimental/builds/633461320 Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
* dpif: Add support to set user featuresPaul Blakey2019-12-221-0/+3
| | | | | | | | | | | | This enables user features on the kernel datapath via the DP_CMD_SET command, and also retrieves them to check for actual support and not just an older kernel ignoring the requested features. This will be used in next patch to enable recirc_id sharing with tc. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
* datapath: make generic netlink group constGreg Rose2019-12-022-11/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 48e48a70c08a8a68f8697f8b30cb83775bda8001 Author: stephen hemminger <stephen@networkplumber.org> Date: Wed Jul 16 11:25:52 2014 -0700 openvswitch: make generic netlink group const Generic netlink tables can be const. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net> The equivalent tables in meter.c and conntrack.c are constified so it should be safe to do the same for these and will improve security as well. Original patch slightly modified for out of tree module. Passes check-kmod. Passes Travis. https://travis-ci.org/gvrose8192/ovs-experimental/builds/616880002 Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* ofproto-dpif-upcall: Echo HASH attribute back to datapath.Tonghao Zhang2019-11-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kernel datapath may sent upcall with hash info, ovs-vswitchd should get it from upcall and then send it back. The reason is that: | When using the kernel datapath, the upcall don't | include skb hash info relatived. That will introduce | some problem, because the hash of skb is important | in kernel stack. For example, VXLAN module uses | it to select UDP src port. The tx queue selection | may also use the hash in stack. | | Hash is computed in different ways. Hash is random | for a TCP socket, and hash may be computed in hardware, | or software stack. Recalculation hash is not easy. | | There will be one upcall, without information of skb | hash, to ovs-vswitchd, for the first packet of a TCP | session. The rest packets will be processed in Open vSwitch | modules, hash kept. If this tcp session is forward to | VXLAN module, then the UDP src port of first tcp packet | is different from rest packets. | | TCP packets may come from the host or dockers, to Open vSwitch. | To fix it, we store the hash info to upcall, and restore hash | when packets sent back. Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.html Link: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=bd1903b7c4596ba6f7677d0dfefd05ba5876707d Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* Datapath: Change in openvswitch kernel module to support MPLS label depth of ↵Martin Varghese2019-11-224-32/+83
| | | | | | | | | | | | | | | | | | | | | | | | | | 3 in ingress direction. Upstream commit: commit fbdcdd78da7c95f1b970d371e1b23cbd3aa990f3 Author: Martin Varghese <martin.varghese@nokia.com> Date: Mon Nov 4 07:27:44 2019 +0530 Change in Openvswitch to support MPLS label depth of 3 in ingress direction The openvswitch was supporting a MPLS label depth of 1 in the ingress direction though the userspace OVS supports a max depth of 3 labels.This change enables openvswitch module to support a max depth of 3 labels in the ingress. Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Tested-by: Greg Rose <gvrose8192@gmail.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* compat: Add missing inline keywordGreg Rose2019-11-201-1/+1
| | | | | | | | | | The missing inline keyword before the definition of the rpl_nf_ct_tmpl_free() function causes spurious warnings about the function not being used on some older kernels. Add the keyword to suppress the warning. Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* ip_gre: Remove even more unused codeGreg Rose2019-11-051-102/+0
| | | | | | | | | | | | | | There is a confusing mix of ipgre and gretap functions with some needed for gretap still having ipgre_ prefixes. This time though I think I got the rest of the unused ipgre code. Passes Travis here and this time I made sure the patch passing Travis is the same one I'm mailing. https://travis-ci.org/gvrose8192/ovs-experimental/builds/607296133 Fixes: d5822f428814 ("gre: Remove dead ipgre code") Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>