path: root/datapath
Commit message (Author, Age, Files, Lines)
* datapath: Add support for kernel 4.16.x & 4.17.x (Yifeng Sun, 2018-08-24, 3 files, -46/+9)
    Add support for kernel versions up to 4.17.x. On Travis, the build
    passed for all kernel versions, and this patch introduces no new test
    failures.

    Cleaned up file datapath/linux/compat/include/net/ip6_fib.h, which has
    no effect on the kernel module but adds complexity to porting.

    Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
    Reviewed-by: Greg Rose <gvrose8192@gmail.com>
    Tested-by: Greg Rose <gvrose8192@gmail.com>
* datapath: conntrack: Support conntrack zone limit (Yi-Hung Wei, 2018-08-17, 5 files, -5/+573)
    Upstream commit:
        commit 11efd5cb04a184eea4f57b68ea63dddd463158d1
        Author: Yi-Hung Wei <yihung.wei@gmail.com>
        Date:   Thu May 24 17:56:43 2018 -0700

        openvswitch: Support conntrack zone limit

        Currently, nf_conntrack_max is used to limit the maximum number of
        conntrack entries in the conntrack table for every network
        namespace. VMs and containers that reside in the same namespace
        share the same conntrack table, and the total number of conntrack
        entries for all of them is limited by nf_conntrack_max. In this
        case, if one VM or container abuses the conntrack entries, it
        blocks the others from committing valid conntrack entries into the
        conntrack table. Even if we can put each VM in a different network
        namespace, the current nf_conntrack_max configuration is too rigid
        to give different VMs or containers different numbers of conntrack
        entries.

        To address this issue, this patch proposes a fine-grained mechanism
        that further limits the number of conntrack entries per zone. For
        example, we can assign a different zone to each VM and set a
        conntrack limit on each zone. With this isolation, a misbehaving VM
        only consumes the conntrack entries in its own zone and does not
        affect other well-behaved VMs. Moreover, users can set a different
        conntrack limit for each zone based on their preference.

        The proposed implementation uses Netfilter's nf_conncount backend
        to count the number of connections in a particular zone. If the
        number of connections is above the configured limit, ovs returns
        ENOMEM to userspace. If userspace does not configure the zone
        limit, the limit defaults to zero, which means no limit and is
        backward compatible with the behavior without this patch.

        The following high-level APIs are provided to userspace:
          - OVS_CT_LIMIT_CMD_SET:
            * set default connection limit for all zones
            * set the connection limit for a particular zone
          - OVS_CT_LIMIT_CMD_DEL:
            * remove the connection limit for a particular zone
          - OVS_CT_LIMIT_CMD_GET:
            * get the default connection limit for all zones
            * get the connection limit for a particular zone

        Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
        Acked-by: Pravin B Shelar <pshelar@ovn.org>
        Signed-off-by: David S. Miller <davem@davemloft.net>

    Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
    Signed-off-by: Justin Pettit <jpettit@ovn.org>
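The per-zone limit semantics described above can be modeled in a few lines of Python. This is only a sketch of the behavior stated in the commit message (default limit of zero means unlimited, per-zone limits override the default, exceeding a limit yields ENOMEM). The class and method names are hypothetical and do not correspond to the kernel code or the netlink API.

```python
import errno


class ZoneLimits:
    """Hypothetical userspace model of per-zone conntrack limits."""

    def __init__(self, default_limit=0):
        self.default_limit = default_limit  # 0 means "no limit"
        self.per_zone = {}                  # zone id -> configured limit
        self.counts = {}                    # zone id -> committed entries

    def set_limit(self, zone, limit):
        # Models OVS_CT_LIMIT_CMD_SET for a particular zone.
        self.per_zone[zone] = limit

    def del_limit(self, zone):
        # Models OVS_CT_LIMIT_CMD_DEL: the zone falls back to the default.
        self.per_zone.pop(zone, None)

    def get_limit(self, zone):
        # Models OVS_CT_LIMIT_CMD_GET for a particular zone.
        return self.per_zone.get(zone, self.default_limit)

    def commit(self, zone):
        """Try to commit one new connection in `zone`.

        Returns 0 on success or -ENOMEM when the zone is at its limit.
        """
        limit = self.get_limit(zone)
        current = self.counts.get(zone, 0)
        if limit != 0 and current >= limit:
            return -errno.ENOMEM
        self.counts[zone] = current + 1
        return 0
```

With a limit of 2 on zone 1, the third commit in that zone is rejected while other zones remain unaffected, which is the isolation property the commit message argues for.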
* datapath: Add conntrack limit netlink definition (Yi-Hung Wei, 2018-08-17, 1 file, -0/+28)
    Upstream commit:
        commit 5972be6b2495c6bffbf444497517fd1c070eef78
        Author: Yi-Hung Wei <yihung.wei@gmail.com>
        Date:   Thu May 24 17:56:42 2018 -0700

        openvswitch: Add conntrack limit netlink definition

        Define netlink messages and attributes to support user kernel
        communication that uses the conntrack limit feature.

        Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
        Acked-by: Pravin B Shelar <pshelar@ovn.org>
        Signed-off-by: David S. Miller <davem@davemloft.net>

    Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
    Signed-off-by: Justin Pettit <jpettit@ovn.org>
* datapath: compat: Introduce static key support (Yi-Hung Wei, 2018-08-17, 2 files, -0/+71)
    Static keys allow the inclusion of seldom used features in
    performance-sensitive fast-path kernel code, via a GCC feature and a
    code patching technique.

    For more information:
      * https://www.kernel.org/doc/Documentation/static-keys.txt

    Since the upstream ovs kernel module now uses some static key API that
    was introduced in the v4.3 kernel, we backport it to the compat module
    to support older kernels.

    This backport is based on upstream net-next commit 11276d5306b8
    ("locking/static_keys: Add a new static_key interface").

    Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
    Signed-off-by: Justin Pettit <jpettit@ovn.org>
* datapath: compat: Backports nf_conncount (Yi-Hung Wei, 2018-08-17, 3 files, -0/+700)
    This patch backports the nf_conncount backend that counts the number
    of connections matching an arbitrary key. The following patch will
    use the feature to support connection tracking zone limit in the ovs
    kernel datapath.

    This backport is based on the following upstream net-next commits:
      5c789e131cbb ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search")
      34848d5c896e ("netfilter: nf_conncount: Split insert and traversal")
      2ba39118c10a ("netfilter: nf_conncount: Move locking into count_tree()")
      976afca1ceba ("netfilter: nf_conncount: Early exit in nf_conncount_lookup() and cleanup")
      cb2b36f5a97d ("netfilter: nf_conncount: Switch to plain list")
      2a406e8ac7c3 ("netfilter: nf_conncount: Early exit for garbage collection")
      b36e4523d4d5 ("netfilter: nf_conncount: fix garbage collection confirm race")
      21ba8847f857 ("netfilter: nf_conncount: Fix garbage collection with zones")
      5e5cbc7b23ea ("netfilter: nf_conncount: expose connection list interface")
      35d8deb80c30 ("netfilter: conncount: Support count only use case")
      6aec208786c2 ("netfilter: Refactor nf_conncount")
      d384e65f1e75 ("netfilter: return booleans instead of integers")
      625c556118f3 ("netfilter: connlimit: split xt_connlimit into front and backend")

    The upstream nf_conncount exports several functions, while this patch
    only exports the ones that the ovs kernel module needs.

    Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
    Signed-off-by: Justin Pettit <jpettit@ovn.org>
* compat: Backport nf_ct_netns_{get, put}() (Yi-Hung Wei, 2018-08-17, 4 files, -1/+137)
    This patch backports nf_ct_netns_get/put() in order to support a
    feature in the follow-up patch.

    nf_ct_netns_{get,put} were first introduced in upstream net-next commit
    ecb2421b5ddf ("netfilter: add and use nf_ct_netns_get/put") in kernel
    v4.10, and then updated in commit 7e35ec0e8044 ("netfilter: conntrack:
    move nf_ct_netns_{get,put}() to core") in kernel v4.15. We need to
    invoke nf_ct_netns_get/put() when the underlying nf_conntrack_l3proto
    supports net_ns_{get,put}().

    Therefore, there are 3 cases that we need to consider.
    1) Before nf_ct_netns_{get,put}() is introduced.
        We just mock nf_ct_netns_{get,put}() and do nothing.
    2) After 1) and before v4.15
        Backports based on commit 7e35ec0e8044.
    3) Starting from v4.15
        Use the upstream version.

    Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
    Signed-off-by: Justin Pettit <jpettit@ovn.org>
* porting: Add fixes to support kernel 4.15.x (Yifeng Sun, 2018-08-16, 1 file, -0/+10)
    This patch enables the OVS kernel module to run on kernel 4.15.x.
    Two conntrack-related tests failed:
      - conntrack - multiple zones, local
      - conntrack - multi-stage pipeline, local

    This might be due to conntrack policy changes for packets coming
    from local ports on kernel 4.15. Further investigation will follow.

    Signed-off-by: Greg Rose <gvrose8192@gmail.com>
    Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
    Co-authored-by: Greg Rose <gvrose8192@gmail.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
    Tested-by: Gregory Rose <gvrose8192@gmail.com>
    Reviewed-by: Gregory Rose <gvrose8192@gmail.com>
* ip6_gre: Fix a bug that clears address bits (Yifeng Sun, 2018-08-16, 1 file, -4/+0)
    In the compat gre module, skb->cb is solely used as ovs_gso_cb.
    However, IPCB(skb) also points to skb->cb, and IPCB(skb)->flags
    overlaps with ovs_gso_cb.tun_dst. As a result, this bug clears
    bits 16-23 in the address of ovs_gso_cb.tun_dst and causes
    the kernel to crash.

    Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
    Tested-by: Greg Rose <gvrose8192@gmail.com>
    Reviewed-by: Greg Rose <gvrose8192@gmail.com>
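To see why clearing bits 16-23 of a stored pointer is fatal, the effect can be reproduced on a plain integer. The sketch below only illustrates the bit arithmetic; the example address is made up, and the function name is hypothetical rather than anything from the ip6_gre code.

```python
def clear_bits_16_23(addr):
    """Return `addr` with bits 16 through 23 forced to zero."""
    return addr & ~(0xFF << 16)


# A made-up 64-bit kernel-style address, for illustration only.
original = 0xFFFF880012ABCDEF
corrupted = clear_bits_16_23(original)
print(hex(original), "->", hex(corrupted))
```

The corrupted value differs from the original only in one byte, yet it points at unrelated memory, so dereferencing it as a tun_dst pointer can crash the kernel.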
* ip_tunnel: Fix bugs that could crash kernel (Yifeng Sun, 2018-08-15, 1 file, -3/+6)
    Without this patch, the OVS kernel module can delete
    itn->fb_tunnel_dev one more time than necessary, which causes a
    kernel crash.

    On kernel 4.4.0-116-generic, the crash can be reproduced by running
    the simple test provided below through check-kernel.

    make & make modules_install
    rmmod ip_gre gre ip_tunnel
    modprobe openvswitch
    make check-kernel TESTSUITEFLAGS=x
    dmesg

    Simple test:
    AT_SETUP([datapath - crash test])
    OVS_CHECK_GRE()
    ip link del gre0
    OVS_TRAFFIC_VSWITCHD_START()
    AT_CHECK([ovs-vsctl -- set bridge br0])
    ADD_BR([br-underlay], [set bridge br-underlay])
    AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
    AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
    ADD_NAMESPACES(at_ns0)
    ADD_VETH(p0, at_ns0, br-underlay, "172.31.1.1/24")
    AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
    AT_CHECK([ip link set dev br-underlay up])
    ADD_OVS_TUNNEL([gre], [br0], [at_gre0], [172.31.1.1], [10.1.1.100/24])
    tcpdump -U -i br-underlay -w underlay.pcap &
    sleep 1
    OVS_TRAFFIC_VSWITCHD_STOP
    AT_CLEANUP

    Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
    Acked-by: William Tu <u9012063@gmail.com>
* compat: Substitute more dependable define (Greg Rose, 2018-08-13, 2 files, -3/+3)
    The compat layer ip_tunnel_get_stats64 function was checking the
    Linux kernel version to determine whether the return type was void
    or a pointer. This is not very reliable and caused compile warnings
    on SLES 12 SP3.

    In acinclude.m4, create a more reliable method of determining when to
    use a void return vs. a pointer return.

    Signed-off-by: Greg Rose <gvrose8192@gmail.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
* datapath: meter: Fix setting meter id for new entries (Justin Pettit, 2018-08-07, 1 file, -5/+5)
    Upstream commit:
        From: Justin Pettit <jpettit@ovn.org>
        Date: Sat, 28 Jul 2018 15:26:01 -0700
        Subject: [PATCH] openvswitch: meter: Fix setting meter id for new entries

        The meter code would create an entry for each new meter. However, it
        would not set the meter id in the new entry, so every meter would
        appear to have a meter id of zero. This commit properly sets the
        meter id when adding the entry.

        Fixes: 96fbc13d7e77 ("openvswitch: Add meter infrastructure")
        Signed-off-by: Justin Pettit <jpettit@ovn.org>
        Cc: Andy Zhou <azhou@ovn.org>
        Signed-off-by: David S. Miller <davem@davemloft.net>

    Cc: Justin Pettit <jpettit@ovn.org>
    Signed-off-by: Greg Rose <gvrose8192@gmail.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
* datapath: support upstream ndo_udp_tunnel_add in net_device_ops (wenxu, 2018-08-07, 3 files, -6/+101)
    This lets the datapath support both ndo_add_udp_tunnel_port and
    ndo_add_vxlan/geneve_port. Newer kernels no longer support the
    vxlan/geneve-specific NDOs.

    Signed-off-by: wenxu <wenxu@ucloud.cn>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
    Reviewed-by: Greg Rose <gvrose8192@gmail.com>
    Tested-by: Greg Rose <gvrose8192@gmail.com>
* ip_gre: remove redundant variables t_hlen (YueHaibing, 2018-08-07, 1 file, -5/+0)
    Upstream commit:
        From: YueHaibing <yuehaibing@huawei.com>
        Date: Wed, 1 Aug 2018 10:04:02 +0800
        Subject: [PATCH] ip_gre: remove redundant variables t_hlen

        After commit ffc2b6ee4174 ("ip_gre: fix IFLA_MTU ignored on NEWLINK")
        variable t_hlen is assigned values that are never read,
        hence they are redundant and can be removed.

        Signed-off-by: YueHaibing <yuehaibing@huawei.com>
        Signed-off-by: David S. Miller <davem@davemloft.net>

    Cc: YueHaibing <yuehaibing@huawei.com>
    Signed-off-by: Greg Rose <gvrose8192@gmail.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
* ip_gre: fix IFLA_MTU ignored on NEWLINK (Xin Long, 2018-08-07, 1 file, -2/+0)
    Upstream commit:
        From: Xin Long <lucien.xin@gmail.com>
        Date: Tue, 27 Feb 2018 19:19:39 +0800
        Subject: [PATCH] ip_gre: fix IFLA_MTU ignored on NEWLINK

        It's safe to remove the setting of dev's needed_headroom and mtu in
        __gre_tunnel_init, as discussed in [1], ip_tunnel_newlink can do it
        properly.

        Now Eric noticed that it could override the mtu value set in
        do_setlink when creating an ip_gre dev. That keeps the IFLA_MTU
        param from taking effect.

        So this patch removes them to make IFLA_MTU work, as in other ipv4
        tunnels.

          [1]: https://patchwork.ozlabs.org/patch/823504/

        Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
        Reported-by: Eric Garver <e@erig.me>
        Signed-off-by: Xin Long <lucien.xin@gmail.com>
        Signed-off-by: David S. Miller <davem@davemloft.net>

    Part of this commit already made it into __gre_tunnel_init, but the
    piece for erspan_tunnel_init did not make it in, so fix that now.

    Cc: Xin Long <lucien.xin@gmail.com>
    Signed-off-by: Greg Rose <gvrose8192@gmail.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
* datapath: add transport ports in route lookup for stt (Qiuyu Xiao, 2018-07-31, 1 file, -5/+10)
    This patch adds transport port information to the route lookup so
    that IPsec can select stt tunnel traffic for encryption.

    Signed-off-by: Qiuyu Xiao <qiuyu.xiao.qyx@gmail.com>
    Reviewed-by: Greg Rose <gvrose8192@gmail.com>
    Tested-by: Greg Rose <gvrose8192@gmail.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
* datapath: add transport ports in route lookup for vxlan (Qiuyu Xiao, 2018-07-31, 1 file, -2/+12)
    This patch adds transport port information to the route lookup so
    that IPsec can select vxlan tunnel traffic for encryption.

    Signed-off-by: Qiuyu Xiao <qiuyu.xiao.qyx@gmail.com>
    Reviewed-by: Greg Rose <gvrose8192@gmail.com>
    Tested-by: Greg Rose <gvrose8192@gmail.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
* erspan: set bso bit based on mirrored packet's len (Greg Rose, 2018-07-31, 1 file, -0/+28)
    Upstream commit:
        Before the patch, the erspan BSO bit (Bad/Short/Oversized) is not
        handled. BSO has 4 possible values:
          00 --> Good frame with no error, or unknown integrity
          11 --> Payload is a Bad Frame with CRC or Alignment Error
          01 --> Payload is a Short Frame
          10 --> Payload is an Oversized Frame

        Based on the short/oversized definitions in RFC1757, the patch
        sets the bso bit based on the mirrored packet's size.

        Reported-by: Xiaoyan Jin <xiaoyanj@vmware.com>
        Signed-off-by: William Tu <u9012063@gmail.com>
        Signed-off-by: David S. Miller <davem@davemloft.net>

    Cc: William Tu <u9012063@gmail.com>
    Signed-off-by: Greg Rose <gvrose8192@gmail.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
    Acked-by: William Tu <u9012063@gmail.com>
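The size-based classification can be sketched as below. The four BSO encodings come from the commit message. The two thresholds are assumptions: 60 and 1514 bytes are the usual Ethernet minimum and maximum frame lengths without the 4-byte FCS (the counterparts of the 64 and 1518 octet bounds in RFC 1757), and the patch itself may use different constants. The "bad frame" value is listed but not produced, since a CRC or alignment error cannot be derived from length alone.

```python
# BSO encodings as listed in the commit message.
BSO_NOERROR = 0b00
BSO_SHORT = 0b01
BSO_OVERSIZED = 0b10
BSO_BAD = 0b11  # CRC/alignment error; not detectable from length

# Assumed thresholds: Ethernet frame lengths excluding the FCS.
MIN_FRAME_LEN = 60
MAX_FRAME_LEN = 1514


def detect_bso(frame_len):
    """Classify a mirrored frame by its length in bytes."""
    if frame_len < MIN_FRAME_LEN:
        return BSO_SHORT
    if frame_len > MAX_FRAME_LEN:
        return BSO_OVERSIZED
    return BSO_NOERROR
```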
* compat: ip6_tunnel: improve error message. (William Tu, 2018-07-31, 1 file, -3/+9)
    When loading the compat ip6 tunnel, if the system has already loaded
    the upstream kernel's ip6 tunnel, print an error message before
    returning.

    Signed-off-by: William Tu <u9012063@gmail.com>
    Cc: Greg Rose <gvrose8192@gmail.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
* datapath: Don't swap table in nlattr_set() after OVS_ATTR_NESTED is found (Stefano Brivio, 2018-07-30, 1 file, -6/+3)
    Upstream commit:
        commit 72f17baf2352ded6a1d3f4bb2d15da8c678cd2cb
        Author: Stefano Brivio <sbrivio@redhat.com>
        Date:   Thu May 3 18:13:25 2018 +0200

        openvswitch: Don't swap table in nlattr_set() after OVS_ATTR_NESTED is found

        If an OVS_ATTR_NESTED attribute type is found while walking
        through netlink attributes, we call nlattr_set() recursively
        passing the length table for the following nested attributes, if
        different from the current one.

        However, once we're done with those sub-nested attributes, we
        should continue walking through attributes using the current
        table, instead of using the one related to the sub-nested
        attributes.

        For example, given this sequence:

         1  OVS_KEY_ATTR_PRIORITY
         2  OVS_KEY_ATTR_TUNNEL
         3      OVS_TUNNEL_KEY_ATTR_ID
         4      OVS_TUNNEL_KEY_ATTR_IPV4_SRC
         5      OVS_TUNNEL_KEY_ATTR_IPV4_DST
         6      OVS_TUNNEL_KEY_ATTR_TTL
         7      OVS_TUNNEL_KEY_ATTR_TP_SRC
         8      OVS_TUNNEL_KEY_ATTR_TP_DST
         9  OVS_KEY_ATTR_IN_PORT
        10  OVS_KEY_ATTR_SKB_MARK
        11  OVS_KEY_ATTR_MPLS

        we switch to the 'ovs_tunnel_key_lens' table on attribute #3,
        and we don't switch back to 'ovs_key_lens' while setting
        attributes #9 to #11 in the sequence. As OVS_KEY_ATTR_MPLS
        evaluates to 21, and the array size of 'ovs_tunnel_key_lens' is
        15, we also get this kind of KASan splat while accessing the
        wrong table:

        [ 7654.586496] ==================================================================
        [ 7654.594573] BUG: KASAN: global-out-of-bounds in nlattr_set+0x164/0xde9 [openvswitch]
        [ 7654.603214] Read of size 4 at addr ffffffffc169ecf0 by task handler29/87430
        [ 7654.610983]
        [ 7654.612644] CPU: 21 PID: 87430 Comm: handler29 Kdump: loaded Not tainted 3.10.0-866.el7.test.x86_64 #1
        [ 7654.623030] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
        [ 7654.631379] Call Trace:
        [ 7654.634108]  [<ffffffffb65a7c50>] dump_stack+0x19/0x1b
        [ 7654.639843]  [<ffffffffb53ff373>] print_address_description+0x33/0x290
        [ 7654.647129]  [<ffffffffc169b37b>] ? nlattr_set+0x164/0xde9 [openvswitch]
        [ 7654.654607]  [<ffffffffb53ff812>] kasan_report.part.3+0x242/0x330
        [ 7654.661406]  [<ffffffffb53ff9b4>] __asan_report_load4_noabort+0x34/0x40
        [ 7654.668789]  [<ffffffffc169b37b>] nlattr_set+0x164/0xde9 [openvswitch]
        [ 7654.676076]  [<ffffffffc167ef68>] ovs_nla_get_match+0x10c8/0x1900 [openvswitch]
        [ 7654.684234]  [<ffffffffb61e9cc8>] ? genl_rcv+0x28/0x40
        [ 7654.689968]  [<ffffffffb61e7733>] ? netlink_unicast+0x3f3/0x590
        [ 7654.696574]  [<ffffffffc167dea0>] ? ovs_nla_put_tunnel_info+0xb0/0xb0 [openvswitch]
        [ 7654.705122]  [<ffffffffb4f41b50>] ? unwind_get_return_address+0xb0/0xb0
        [ 7654.712503]  [<ffffffffb65d9355>] ? system_call_fastpath+0x1c/0x21
        [ 7654.719401]  [<ffffffffb4f41d79>] ? update_stack_state+0x229/0x370
        [ 7654.726298]  [<ffffffffb4f41d79>] ? update_stack_state+0x229/0x370
        [ 7654.733195]  [<ffffffffb53fe4b5>] ? kasan_unpoison_shadow+0x35/0x50
        [ 7654.740187]  [<ffffffffb53fe62a>] ? kasan_kmalloc+0xaa/0xe0
        [ 7654.746406]  [<ffffffffb53fec32>] ? kasan_slab_alloc+0x12/0x20
        [ 7654.752914]  [<ffffffffb53fe711>] ? memset+0x31/0x40
        [ 7654.758456]  [<ffffffffc165bf92>] ovs_flow_cmd_new+0x2b2/0xf00 [openvswitch]

        [snip]

        [ 7655.132484] The buggy address belongs to the variable:
        [ 7655.138226]  ovs_tunnel_key_lens+0xf0/0xffffffffffffd400 [openvswitch]
        [ 7655.145507]
        [ 7655.147166] Memory state around the buggy address:
        [ 7655.152514]  ffffffffc169eb80: 00 00 00 00 00 00 00 00 00 00 fa fa fa fa fa fa
        [ 7655.160585]  ffffffffc169ec00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        [ 7655.168644] >ffffffffc169ec80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa fa
        [ 7655.176701]                                                              ^
        [ 7655.184372]  ffffffffc169ed00: fa fa fa fa 00 00 00 00 fa fa fa fa 00 00 00 05
        [ 7655.192431]  ffffffffc169ed80: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
        [ 7655.200490] ==================================================================

        Reported-by: Hangbin Liu <liuhangbin@gmail.com>
        Fixes: 982b52700482 ("openvswitch: Fix mask generation for nested attributes.")
        Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
        Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
        Signed-off-by: David S. Miller <davem@davemloft.net>

    Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
    Reviewed-by: Greg Rose <gvrose8192@gmail.com>
    Tested-by: Greg Rose <gvrose8192@gmail.com>
    Signed-off-by: Justin Pettit <jpettit@ovn.org>
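The table-swap bug is easy to reproduce in a small model. The Python sketch below is not the kernel code: the tables are plain lists, the walkers only record which attribute types they visit, and a Python IndexError stands in for the out-of-bounds read that KASan reported. The table size of 15 and the value 21 for OVS_KEY_ATTR_MPLS come from the commit message; the value 16 for OVS_KEY_ATTR_TUNNEL and all per-attribute lengths are assumptions made for illustration.

```python
NESTED = -1  # stand-in for OVS_ATTR_NESTED

# Each table entry is (expected length, nested table or None).
TUNNEL_KEY_LENS = [(4, None)] * 15   # 15 entries, as in the commit message
KEY_LENS = [(4, None)] * 22          # large enough to index type 21
KEY_ATTR_TUNNEL = 16                 # assumed value
KEY_ATTR_MPLS = 21                   # value stated in the commit message
KEY_LENS[KEY_ATTR_TUNNEL] = (NESTED, TUNNEL_KEY_LENS)


def walk_buggy(attrs, table):
    """Pre-fix behavior: the nested table replaces the current one."""
    visited = []
    for attr_type, children in attrs:
        length, next_table = table[attr_type]
        if length == NESTED:
            if next_table is not None:
                table = next_table  # bug: never switched back
            visited += walk_buggy(children, table)
        else:
            visited.append(attr_type)
    return visited


def walk_fixed(attrs, table):
    """Post-fix behavior: the nested table is used only for the recursion."""
    visited = []
    for attr_type, children in attrs:
        length, next_table = table[attr_type]
        if length == NESTED:
            nested = next_table if next_table is not None else table
            visited += walk_fixed(children, nested)
        else:
            visited.append(attr_type)
    return visited


# (type, children) pairs: two top-level keys around one nested tunnel key.
ATTRS = [(2, None), (KEY_ATTR_TUNNEL, [(1, None), (2, None)]),
         (3, None), (KEY_ATTR_MPLS, None)]
```

Calling walk_fixed(ATTRS, KEY_LENS) visits every attribute, while walk_buggy(ATTRS, KEY_LENS) fails when it looks up type 21 in the 15-entry tunnel table.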
* datapath: NAT support for shifted portmap ranges (Yi-Hung Wei, 2018-07-30, 1 file, -2/+6)
    This patch backports the following upstream commit from net-next, and
    defines HAVE_NF_NAT_RANGE2 to determine whether to use
    'struct nf_nat_range2'.

    Upstream commit:
        commit 2eb0f624b709e78ec8e2f4c3412947703db99301
        Author: Thierry Du Tre <thierry@dtsystems.be>
        Date:   Wed Apr 4 15:38:22 2018 +0200

        netfilter: add NAT support for shifted portmap ranges

        This is a patch proposal to support shifted ranges in portmaps.
        (i.e. tcp/udp incoming port 5000-5100 on WAN redirected to LAN
        192.168.1.5:2000-2100)

        Currently DNAT only works for single port or identical port ranges.
        (i.e. ports 5000-5100 on WAN interface redirected to a LAN host
        while original destination port is not altered)
        When different port ranges are configured, either 'random' mode
        should be used, or else all incoming connections are mapped onto
        the first port in the redirect range. (in described example
        WAN:5000-5100 will all be mapped to 192.168.1.5:2000)

        This patch introduces a new mode indicated by flag
        NF_NAT_RANGE_PROTO_OFFSET which uses a base port value to calculate
        an offset with the destination port present in the incoming stream.
        That offset is then applied as index within the redirect port range
        (index modulo rangewidth to handle range overflow).

        In described example the base port would be 5000. An incoming stream
        with destination port 5004 would result in an offset value 4 which
        means that the NAT'ed stream will be using destination port 2004.

        Other possibilities include deterministic mapping of larger or
        multiple ranges to a smaller range : WAN:5000-5999 -> LAN:5000-5099
        (maps WAN port 5*xx to port 51xx)

        This patch does not change any current behavior. It just adds new
        NAT proto range functionality which must be selected via the
        specific flag when intended to use.

        A patch for iptables (libipt_DNAT.c + libip6t_DNAT.c) will also be
        proposed which makes this functionality immediately available.

        Signed-off-by: Thierry Du Tre <thierry@dtsystems.be>
        Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

    Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
    Reviewed-by: Greg Rose <gvrose8192@gmail.com>
    Signed-off-by: Justin Pettit <jpettit@ovn.org>
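The offset calculation described for NF_NAT_RANGE_PROTO_OFFSET reduces to one line of arithmetic. The sketch below follows the wording of the commit message (offset from the base port, applied modulo the width of the redirect range); the function and parameter names are hypothetical, and the inclusive range width is an assumption.

```python
def map_shifted_port(dst_port, base_port, range_min, range_max):
    """Map an incoming destination port into a shifted redirect range."""
    width = range_max - range_min + 1          # inclusive range (assumed)
    offset = (dst_port - base_port) % width    # modulo handles overflow
    return range_min + offset


# Example from the commit message: WAN 5000-5100 -> LAN 2000-2100.
print(map_shifted_port(5004, 5000, 2000, 2100))
```

With base port 5000, destination port 5004 yields offset 4 and therefore port 2004, matching the example in the text.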
* datapath: Introduce net_rwsem and remove rtnl_lock() (Yi-Hung Wei, 2018-07-30, 1 file, -0/+8)
    This patch backports the following two upstream commits and adds a
    new symbol HAVE_NET_RWSEM in acinclude.m4 to determine whether to use
    the newly introduced rw_semaphore, net_rwsem.

    Upstream commit:
        commit f0b07bb151b098d291fd1fd71ef7a2df56fb124a
        Author: Kirill Tkhai <ktkhai@virtuozzo.com>
        Date:   Thu Mar 29 19:20:32 2018 +0300

        net: Introduce net_rwsem to protect net_namespace_list

        rtnl_lock() is used everywhere, and contention is very high.
        When someone wants to iterate over alive net namespaces,
        there is no way to do that without the exclusive lock. But the
        exclusive rtnl_lock() in such places is overkill, and it just
        increases the contention. Yes, there is already
        for_each_net_rcu() in the kernel, but it requires rcu_read_lock(),
        and this can't be sleepable. Also, sometimes it may be necessary
        to really prevent net_namespace_list growth, so for_each_net_rcu()
        does not fit there.

        This patch introduces a new rw_semaphore, which will be used
        instead of rtnl_mutex to protect net_namespace_list. It is
        sleepable and allows non-exclusive iterations over the net
        namespaces list. It allows us to stop using rtnl_lock() in
        several places (which is done in the next patches) and reduces
        the time we hold rtnl_mutex. Here we just add the new lock, while
        the explanation of why we can remove rtnl_lock() is in the next
        patches.

        Fine-grained locks are generally better than one big lock,
        so let's do that with net_namespace_list, while the situation
        allows that.

        Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
        Signed-off-by: David S. Miller <davem@davemloft.net>

    Upstream commit:
        commit ec9c780925c57588637e1dbd8650d294107311c0
        Author: Kirill Tkhai <ktkhai@virtuozzo.com>
        Date:   Thu Mar 29 19:21:09 2018 +0300

        ovs: Remove rtnl_lock() from ovs_exit_net()

        Here we iterate for_each_net() and remove vports from alive nets
        to the exiting net.

        ovs_net::dps are protected by ovs_mutex(), and the others, who
        change it (ovs_dp_cmd_new(), __dp_destroy()) also take it.
        The same with datapath::ports list.

        So, we remove rtnl_lock() here.

        Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
        Signed-off-by: David S. Miller <davem@davemloft.net>

    Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
    Reviewed-by: Greg Rose <gvrose8192@gmail.com>
    Signed-off-by: Justin Pettit <jpettit@ovn.org>
* datapath: meter: fix the incorrect calculation of max delta_t (zhangliping, 2018-07-30, 1 file, -3/+9)
    Upstream commit:
        commit ddc502dfed600bff0b61d899f70d95b76223fdfc
        Author: zhangliping <zhangliping02@baidu.com>
        Date:   Fri Mar 9 10:08:50 2018 +0800

        openvswitch: meter: fix the incorrect calculation of max delta_t

        Max delta_t should be the full_bucket/rate instead of the
        full_bucket. Also report EINVAL if the rate is zero.

        Fixes: 96fbc13d7e77 ("openvswitch: Add meter infrastructure")
        Cc: Andy Zhou <azhou@ovn.org>
        Signed-off-by: zhangliping <zhangliping02@baidu.com>
        Acked-by: Pravin B Shelar <pshelar@ovn.org>
        Signed-off-by: David S. Miller <davem@davemloft.net>

    Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
    Reviewed-by: Greg Rose <gvrose8192@gmail.com>
    Tested-by: Greg Rose <gvrose8192@gmail.com>
    Signed-off-by: Justin Pettit <jpettit@ovn.org>
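The corrected bound follows from simple token-bucket arithmetic: a bucket that refills at `rate` tokens per time unit is full after `full_bucket / rate` units, so any longer idle period adds nothing. The sketch below illustrates that reasoning under assumed units and hypothetical function names; it is not the kernel's meter code, and Python's OSError stands in for the EINVAL return.

```python
import errno


def max_delta_t(full_bucket, rate):
    """Longest elapsed time that can still add tokens to the bucket."""
    if rate == 0:
        # The commit reports EINVAL for a zero rate; modeled as OSError.
        raise OSError(errno.EINVAL, "meter band rate must be non-zero")
    return full_bucket // rate


def refill(tokens, full_bucket, rate, elapsed):
    """Refill a token bucket, clamping the elapsed time first."""
    delta_t = min(elapsed, max_delta_t(full_bucket, rate))
    return min(full_bucket, tokens + delta_t * rate)
```

Clamping the elapsed time before multiplying by the rate keeps the intermediate product bounded by the bucket size, however long the meter has been idle.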
* compat: Allow IPv6 GRE/ERSPAN Tx when ip6_gre is loaded (Greg Rose, 2018-07-27, 2 files, -12/+46)
    If the built-in kernel ip6_gre module is loaded for some reason, it
    prevents the openvswitch kernel driver from loading. Even when the
    built-in kernel ip6_gre module is loaded, we can still perform port
    mirroring via Tx.

    Adjust the error handling and detect when the ip6_gre kernel module
    is loaded, and in that case still enable IPv6 GRE/ERSPAN Tx.

    Signed-off-by: Greg Rose <gvrose8192@gmail.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
    Acked-by: William Tu <u9012063@gmail.com>
* compat: Initialize IPv4 reassembly secret timer (Greg Rose, 2018-07-27, 3 files, -10/+6)
    The RHEL 7 kernels expect the secret timer interval to be initialized
    before calling the inet_frags_init() function. Because it was not
    initialized, the inet_frags_secret_rebuild() function was running on
    every tick rather than on the expected interval. This caused
    occasional panics from page faults when inet_frags_secret_rebuild()
    would try to rearm a timer from the openvswitch kernel module, which
    had just been removed.

    Also remove the prior, and now unnecessary, workaround.

    VMware BZ 2094203

    Fixes: 595e069a ("compat: Backport IPv4 reassembly.")
    Signed-off-by: Greg Rose <gvrose8192@gmail.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
    Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
* datapath: work around the single GRE receive limitation. (William Tu, 2018-07-11, 2 files, -16/+56)
    Commit 9f57c67c379d ("gre: Remove support for sharing GRE protocol
    hook") allows only a single GRE packet receiver. When the upstream
    kernel's gre module is loaded, gre.ko becomes the only gre packet
    receiver, preventing the OVS kernel module from registering another
    gre receiver.

    We can either try to unload gre.ko by removing its dependencies, or,
    as in this patch, register OVS for only the GRE transmit portion when
    we detect that another GRE receiver already exists.

    Signed-off-by: William Tu <u9012063@gmail.com>
    Tested-by: Greg Rose <gvrose8192@gmail.com>
    Reviewed-by: Greg Rose <gvrose8192@gmail.com>
    Cc: Greg Rose <gvrose8192@gmail.com>
    Cc: Yifeng Sun <pkusunyifeng@gmail.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
* datapath: Add missing code in ip_tunnel_lookup() (Greg Rose, 2018-06-29, 2 files, -2/+5)
    The compat rpl_ip_tunnel_lookup() function was missing some code
    added in Linux kernel release 4.3 but not backported in the initial
    commit. This also allows us to remove an old hack in erspan_rcv()
    that was zeroing out the key parameter so that the tunnel lookups
    wouldn't fail.

    Fixes: 8e53509c ("gre: introduce native tunnel support for ERSPAN")
    Reported-by: William Tu <u9012063@gmail.com>
    Signed-off-by: Greg Rose <gvrose8192@gmail.com>
    Acked-by: William Tu <u9012063@gmail.com>
    Signed-off-by: Justin Pettit <jpettit@ovn.org>
* compat: Fix gre header bug (Greg Rose, 2018-06-29, 1 file, -4/+6)
    Commit 436d36db introduced a bug into the gre header build for gre
    and ip gre type tunnels. __vlan_hwaccel_push_inside does not check
    whether the vlan tag is even present. So check first and avoid
    padding space for a vlan tag that isn't present.

    Fixes: 436d36db ("compat: Fixups for newer kernels")
    Signed-off-by: Greg Rose <gvrose8192@gmail.com>
    Acked-by: William Tu <u9012063@gmail.com>
    Signed-off-by: Justin Pettit <jpettit@ovn.org>
* datapath: stt: linearize in SKIP_ZERO_COPY case (Neal Shrader via dev, 2018-06-25, 1 file, -7/+0)
    While investigating a kernel panic, we found that it was triggered
    by a large skb with an unusual geometry. Inside the STT codepath, an
    effort is made to linearize such packets to avoid trouble during
    both fragment reassembly and segmentation in the Linux networking
    core.

    As currently implemented, kernels with CONFIG_SLUB defined skip this
    process because the code does not expect an skb with a frag_list to
    be present. This patch removes that assumption and allows these skbs
    to be linearized as intended. We confirmed this corrects the panic we
    encountered.

    Reported-by: Johannes Erdfelt <johannes@erdfelt.com>
    Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-May/046800.html
    Requested-by: Pravin Shelar <pshelar@ovn.org>
    Signed-off-by: Neal Shrader <neal@digitalocean.com>
    Signed-off-by: Pravin Shelar <pshelar@ovn.org>
* datapath: Add meter action support. (Andy Zhou, 2018-06-20, 4 files, -1/+14)
    Upstream commit:
        commit cd8a6c33693c1b89d2737ffdbf9611564e9ac907
        Author: Andy Zhou <azhou@ovn.org>
        Date:   Fri Nov 10 12:09:43 2017 -0800

        openvswitch: Add meter action support

        Implements OVS kernel meter action support.

        Signed-off-by: Andy Zhou <azhou@ovn.org>
        Signed-off-by: David S. Miller <davem@davemloft.net>

    Signed-off-by: Justin Pettit <jpettit@ovn.org>
    Reviewed-by: Greg Rose <gvrose8192@gmail.com>
    Tested-by: Greg Rose <gvrose8192@gmail.com>
* datapath: Fix compiler warning for HAVE_RHEL7_MAX_MTU. (Justin Pettit, 2018-06-20, 1 file, -1/+1)
    Fixes: 1e40b541bc ("datapath: Fix max MTU size on RHEL 7.5 kernel")
    Signed-off-by: Justin Pettit <jpettit@ovn.org>
    Reviewed-by: Greg Rose <gvrose8192@gmail.com>
* datapath: compat: Fix RHEL 7.5 build warning from ip_tunnel_get_stats64() (Yi-Hung Wei, 2018-06-14, 2 files, -3/+3)
    This patch fixes the following warnings on the RHEL 7.5 kernel.

      CC [M]  /root/git/ovs/datapath/linux/geneve.o
    /root/git/ovs/datapath/linux/geneve.c:1273:2: warning: initialization from incompatible pointer type [enabled by default]
      .ndo_get_stats64 = ip_tunnel_get_stats64,
      ^
    /root/git/ovs/datapath/linux/geneve.c:1273:2: warning: (near initialization for ‘geneve_netdev_ops.<anonymous>.ndo_get_stats64’) [enabled by default]
    /root/git/ovs/datapath/linux/ip_gre.c:1162:2: warning: initialization from incompatible pointer type [enabled by default]
      .ndo_get_stats64 = ip_tunnel_get_stats64,
      ^
    /root/git/ovs/datapath/linux/ip_gre.c:1162:2: warning: (near initialization for ‘ipgre_netdev_ops.<anonymous>.ndo_get_stats64’) [enabled by default]
    /root/git/ovs/datapath/linux/ip_gre.c:1180:2: warning: initialization from incompatible pointer type [enabled by default]
      .ndo_get_stats64 = ip_tunnel_get_stats64,
      ^

    Fixes: 436d36db ("compat: Fixups for newer kernels")
    Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
    Reviewed-by: Greg Rose <gvrose8192@gmail.com>
    Tested-by: Greg Rose <gvrose8192@gmail.com>
* datapath: Fix ip6_gre, ip6_tunnel, and ip_gre backportYi-Hung Wei2018-06-143-0/+30
| | | | | | | | | | | The recently added ERSPAN feature introduced changes in ip6_gre, ip6_tunnel, and ip_gre that break the build on the RHEL 7.5 kernel because of ndo_change_mtu(). This patch fixes the issue on the RHEL 7.5 kernel. Fixes: 8e53509c ("gre: introduce native tunnel support for ERSPAN") Fixes: c387d817 ("compat: Add ipv6 GRE and IPV6 Tunneling") Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com>
* datapath: Fix max MTU size on RHEL 7.5 kernelYi-Hung Wei2018-06-141-0/+2
| | | | | | | | | | | Without this patch, on RHEL 7.5 the maximum configurable MTU of a vport internal device is 1500; it should be 65535. This patch fixes the issue. Fixes: 39ca338374ab ("datapath: compat: Fix build on RHEL 7.5") Reported-by: Lucas Alvares Gomes <lucasagomes@gmail.com> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com>
* datapath: Check if gre kernel module is loadedGreg Rose2018-06-071-0/+4
| | | | | | | | | | Before attempting to add a gre tunnel to OVS via the vport gre kernel interface make sure that the openvswitch kernel module has been able to grab the gre protocol entry point. If OVS does not own the gre protocol then report address family not supported. Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
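The check described above can be modelled in plain C. This is a userspace sketch under stated assumptions, not the datapath code: `gre_proto_owned`, `try_grab_gre_proto()`, and `gre_vport_create()` are hypothetical names invented for illustration. The only detail taken from the commit message is that the reported error is "address family not supported", which corresponds to `EAFNOSUPPORT`.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag: true once the module has obtained the GRE protocol
 * entry point for itself. */
static bool gre_proto_owned;

/* Hypothetical stand-in for module initialisation. It records whether the
 * GRE entry point could be obtained, but it does not fail the load when
 * another module already owns it. */
static int try_grab_gre_proto(bool another_module_owns_gre)
{
    gre_proto_owned = !another_module_owns_gre;
    return 0;
}

/* Hypothetical stand-in for adding a gre vport: refuse early when the
 * module does not own the GRE protocol. */
static int gre_vport_create(void)
{
    if (!gre_proto_owned)
        return -EAFNOSUPPORT;
    return 0;
}
```

In this model, a caller that sees `-EAFNOSUPPORT` knows the tunnel was rejected before any device was created, rather than failing later on the receive path.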
* datapath: Do not fail to load on gre protocol conflictGreg Rose2018-06-052-10/+19
| | | | | | | | | | | | | | | | | | | | The ERSPAN feature depends on the gre kernel module so on systems where the ERSPAN feature isn't supported the openvswitch kernel module would attempt to grab the ipv4 GRE protocol entry point and would fail to load if it could not. This patch modifies openvswitch to not fail to load when the gre kernel module is loaded and instead it will print a warning message to the kernel system log indicating that the ERSPAN feature may not be available. We need this patch because users are experiencing failures due to the conflicts and high priority bugs are resulting. Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* compat: Fix compile warningGreg Rose2018-06-041-1/+2
| | | | | | | | | Fix compile warning about redefined symbol Fixes: 10f242363d ("compat: Add skb_checksum_simple_complete()") Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* compat: Add skb_checksum_simple_complete()Greg Rose2018-06-041-0/+19
| | | | | | | | A recent patch to gre.c added a call to skb_checksum_simple_complete() which is not present in kernels before 3.16. Fix up the compatibility layer to allow compiling on older kernels that do not have it. Signed-off-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* compat: Fixups for newer kernelsGreg Rose2018-05-3113-286/+161
| | | | | | | | | | | | | | | | | | | | | | A recent patch series added support for ERSPAN but left some problems remaining for kernel releases from 4.10 to 4.14. This patch addresses those problems. Of note is that the old cisco gre compat layer code is gone for good. Also, several compat defines in acinclude.m4 were looking for keys in .c source files - this does not work on distros without source code. A more reliable key was already defined so we use that instead. We have pared support for the Linux kernel releases in .travis.yml to reflect that 4.15 is no longer in the LTS list. With this patch the Out of Tree OVS datapath kernel modules can build on kernels up to 4.14.47. Support for kernels up to 4.16.x will be added later. Signed-off-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* datapath: ip6_gre: fix tunnel metadata device sharing.William Tu2018-05-291-31/+87
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit b80d0b93b991e551a32157e0d9d38fc5bc9348a7 Author: William Tu <u9012063@gmail.com> Date: Fri May 18 19:22:28 2018 -0700 net: ip6_gre: fix tunnel metadata device sharing. Currently ip6gre and ip6erspan share a single metadata mode device, using 'collect_md_tun'. Thus, when doing: ip link add dev ip6gre11 type ip6gretap external ip link add dev ip6erspan12 type ip6erspan external RTNETLINK answers: File exists this simply fails because the second command tries to create the same collect_md_tun. The patch fixes it by adding a separate collect md tunnel device for the ip6erspan, 'collect_md_tun_erspan'. As a result, a couple of places need to be refactored/split up in order to distinguish ip6gre and ip6erspan. First, move the collect_md check at ip6gre_tunnel_{unlink,link} and create separate functions {ip6gre,ip6erspan}_tunnel_{link_md,unlink_md}. Then before link/unlink, make sure the link_md/unlink_md is called. Finally, a separate ndo_uninit is created for ip6erspan. Tested it using the samples/bpf/test_tunnel_bpf.sh. Fixes: ef7baf5e083c ("ip6_gre: add ip6 erspan collect_md mode") Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Greg Rose <gvrose8192@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com>
* datapath: ip6_gre: Fix ip6erspan hlen calculationWilliam Tu2018-05-291-9/+84
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 2d665034f239412927b1e71329f20f001c92da09 Author: Petr Machata <petrm@mellanox.com> Date: Thu May 17 16:36:51 2018 +0200 net: ip6_gre: Fix ip6erspan hlen calculation Even though ip6erspan_tap_init() sets up hlen and tun_hlen according to what ERSPAN needs, it goes ahead to call ip6gre_tnl_link_config() which overwrites these settings with GRE-specific ones. The same holds for the changelink callbacks: ip6gre_changelink() calls ip6gre_tnl_change(), which calls ip6gre_tnl_link_config() as well. The difference ends up being 12 vs. 20 bytes, and this is generally not a problem, because a 12-byte request likely ends up allocating more and the extra 8 bytes are thus available. However correct it is not. So replace the newlink and changelink callbacks with ERSPAN-specific ones, reusing the newly-introduced _common() functions. Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support") Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Greg Rose <gvrose8192@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com>
* datapath: ip6_gre: Split up ip6gre_changelink()William Tu2018-05-291-11/+37
| | | | | | | | | | | | | | | | | | | | | | | | commit c8632fc30bb03aa0c3bd7bcce85355a10feb8149 Author: Petr Machata <petrm@mellanox.com> Date: Thu May 17 16:36:45 2018 +0200 net: ip6_gre: Split up ip6gre_changelink() Extract from ip6gre_changelink() a reusable function ip6gre_changelink_common(). This will allow introduction of ERSPAN-specific _changelink() function with not a lot of code duplication. Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support") Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Greg Rose <gvrose8192@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com>
* datapath: ip6_gre: Split up ip6gre_newlink()William Tu2018-05-291-5/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | commit 7fa38a7c852ec99e3a7fc375eb2c21c50c2e46b8 Author: Petr Machata <petrm@mellanox.com> Date: Thu May 17 16:36:39 2018 +0200 net: ip6_gre: Split up ip6gre_newlink() Extract from ip6gre_newlink() a reusable function ip6gre_newlink_common(). The ip6gre_tnl_link_config() call needs to be made customizable for ERSPAN, thus reorder it with calls to ip6_tnl_change_mtu() and dev_hold(), and extract the whole tail to the caller, ip6gre_newlink(). Thus enable an ERSPAN-specific _newlink() function without a lot of duplication. Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support") Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Greg Rose <gvrose8192@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com>
* datapath: ip6_gre: Split up ip6gre_tnl_change()William Tu2018-05-291-2/+8
| | | | | | | | | | | | | | | | | | | | | | | commit a6465350ef495f5cbd76a3e505d25a01d648477e Author: Petr Machata <petrm@mellanox.com> Date: Thu May 17 16:36:33 2018 +0200 net: ip6_gre: Split up ip6gre_tnl_change() Split a reusable function ip6gre_tnl_copy_tnl_parm() from ip6gre_tnl_change(). This will allow ERSPAN-specific code to reuse the common parts while customizing the behavior for ERSPAN. Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support") Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Greg Rose <gvrose8192@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com>
* datapath: ip6_gre: Split up ip6gre_tnl_link_config()William Tu2018-05-291-12/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | commit a483373ead61e6079bc8ebe27e2dfdb2e3c1559f Author: Petr Machata <petrm@mellanox.com> Date: Thu May 17 16:36:27 2018 +0200 net: ip6_gre: Split up ip6gre_tnl_link_config() The function ip6gre_tnl_link_config() is used for setting up configuration of both ip6gretap and ip6erspan tunnels. Split the function into the common part and the route-lookup part. The latter then takes the calculated header length as an argument. This split will allow the patches down the line to sneak in a custom header length computation for the ERSPAN tunnel. Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support") Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Greg Rose <gvrose8192@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com>
* datapath: ip6_gre: Fix headroom request in ip6erspan_tunnel_xmit()William Tu2018-05-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 5691484df961aff897d824bcc26cd1a2aa036b5b Author: Petr Machata <petrm@mellanox.com> Date: Thu May 17 16:36:15 2018 +0200 net: ip6_gre: Fix headroom request in ip6erspan_tunnel_xmit() dev->needed_headroom is not primed until ip6_tnl_xmit(), so it starts out zero. Thus the call to skb_cow_head() fails to actually make sure there's enough headroom to push the ERSPAN headers to. That can lead to the panic cited below. (Reproducer below that). Fix by requesting either needed_headroom if already primed, or just the bare minimum needed for the header otherwise. [ 190.703567] kernel BUG at net/core/skbuff.c:104! [ 190.708384] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI [ 190.714007] Modules linked in: act_mirred cls_matchall ip6_gre ip6_tunnel tunnel6 gre sch_ingress vrf veth x86_pkg_t emp_thermal mlx_platform nfsd e1000e leds_mlxcpld [ 190.728975] CPU: 1 PID: 959 Comm: kworker/1:2 Not tainted 4.17.0-rc4-net_master-custom-139 #10 [ 190.737647] Hardware name: Mellanox Technologies Ltd. 
"MSN2410-CB2F"/"SA000874", BIOS 4.6.5 03/08/2016 [ 190.747006] Workqueue: ipv6_addrconf addrconf_dad_work [ 190.752222] RIP: 0010:skb_panic+0xc3/0x100 [ 190.756358] RSP: 0018:ffff8801d54072f0 EFLAGS: 00010282 [ 190.761629] RAX: 0000000000000085 RBX: ffff8801c1a8ecc0 RCX: 0000000000000000 [ 190.768830] RDX: 0000000000000085 RSI: dffffc0000000000 RDI: ffffed003aa80e54 [ 190.776025] RBP: ffff8801bd1ec5a0 R08: ffffed003aabce19 R09: ffffed003aabce19 [ 190.783226] R10: 0000000000000001 R11: ffffed003aabce18 R12: ffff8801bf695dbe [ 190.790418] R13: 0000000000000084 R14: 00000000000006c0 R15: ffff8801bf695dc8 [ 190.797621] FS: 0000000000000000(0000) GS:ffff8801d5400000(0000) knlGS:0000000000000000 [ 190.805786] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 190.811582] CR2: 000055fa929aced0 CR3: 0000000003228004 CR4: 00000000001606e0 [ 190.818790] Call Trace: [ 190.821264] <IRQ> [ 190.823314] ? ip6erspan_tunnel_xmit+0x5e4/0x1982 [ip6_gre] [ 190.828940] ? ip6erspan_tunnel_xmit+0x5e4/0x1982 [ip6_gre] [ 190.834562] skb_push+0x78/0x90 [ 190.837749] ip6erspan_tunnel_xmit+0x5e4/0x1982 [ip6_gre] [ 190.843219] ? ip6gre_tunnel_ioctl+0xd90/0xd90 [ip6_gre] [ 190.848577] ? debug_check_no_locks_freed+0x210/0x210 [ 190.853679] ? debug_check_no_locks_freed+0x210/0x210 [ 190.858783] ? print_irqtrace_events+0x120/0x120 [ 190.863451] ? sched_clock_cpu+0x18/0x210 [ 190.867496] ? cyc2ns_read_end+0x10/0x10 [ 190.871474] ? skb_network_protocol+0x76/0x200 [ 190.875977] dev_hard_start_xmit+0x137/0x770 [ 190.880317] ? do_raw_spin_trylock+0x6d/0xa0 [ 190.884624] sch_direct_xmit+0x2ef/0x5d0 [ 190.888589] ? pfifo_fast_dequeue+0x3fa/0x670 [ 190.892994] ? pfifo_fast_change_tx_queue_len+0x810/0x810 [ 190.898455] ? __lock_is_held+0xa0/0x160 [ 190.902422] __qdisc_run+0x39e/0xfc0 [ 190.906041] ? _raw_spin_unlock+0x29/0x40 [ 190.910090] ? pfifo_fast_enqueue+0x24b/0x3e0 [ 190.914501] ? sch_direct_xmit+0x5d0/0x5d0 [ 190.918658] ? pfifo_fast_dequeue+0x670/0x670 [ 190.923047] ? 
__dev_queue_xmit+0x172/0x1770 [ 190.927365] ? preempt_count_sub+0xf/0xd0 [ 190.931421] __dev_queue_xmit+0x410/0x1770 [ 190.935553] ? ___slab_alloc+0x605/0x930 [ 190.939524] ? print_irqtrace_events+0x120/0x120 [ 190.944186] ? memcpy+0x34/0x50 [ 190.947364] ? netdev_pick_tx+0x1c0/0x1c0 [ 190.951428] ? __skb_clone+0x2fd/0x3d0 [ 190.955218] ? __copy_skb_header+0x270/0x270 [ 190.959537] ? rcu_read_lock_sched_held+0x93/0xa0 [ 190.964282] ? kmem_cache_alloc+0x344/0x4d0 [ 190.968520] ? cyc2ns_read_end+0x10/0x10 [ 190.972495] ? skb_clone+0x123/0x230 [ 190.976112] ? skb_split+0x820/0x820 [ 190.979747] ? tcf_mirred+0x554/0x930 [act_mirred] [ 190.984582] tcf_mirred+0x554/0x930 [act_mirred] [ 190.989252] ? tcf_mirred_act_wants_ingress.part.2+0x10/0x10 [act_mirred] [ 190.996109] ? __lock_acquire+0x706/0x26e0 [ 191.000239] ? sched_clock_cpu+0x18/0x210 [ 191.004294] tcf_action_exec+0xcf/0x2a0 [ 191.008179] tcf_classify+0xfa/0x340 [ 191.011794] __netif_receive_skb_core+0x8e1/0x1c60 [ 191.016630] ? debug_check_no_locks_freed+0x210/0x210 [ 191.021732] ? nf_ingress+0x500/0x500 [ 191.025458] ? process_backlog+0x347/0x4b0 [ 191.029619] ? print_irqtrace_events+0x120/0x120 [ 191.034302] ? lock_acquire+0xd8/0x320 [ 191.038089] ? process_backlog+0x1b6/0x4b0 [ 191.042246] ? process_backlog+0xc2/0x4b0 [ 191.046303] process_backlog+0xc2/0x4b0 [ 191.050189] net_rx_action+0x5cc/0x980 [ 191.053991] ? napi_complete_done+0x2c0/0x2c0 [ 191.058386] ? mark_lock+0x13d/0xb40 [ 191.062001] ? clockevents_program_event+0x6b/0x1d0 [ 191.066922] ? print_irqtrace_events+0x120/0x120 [ 191.071593] ? __lock_is_held+0xa0/0x160 [ 191.075566] __do_softirq+0x1d4/0x9d2 [ 191.079282] ? ip6_finish_output2+0x524/0x1460 [ 191.083771] do_softirq_own_stack+0x2a/0x40 [ 191.087994] </IRQ> [ 191.090130] do_softirq.part.13+0x38/0x40 [ 191.094178] __local_bh_enable_ip+0x135/0x190 [ 191.098591] ip6_finish_output2+0x54d/0x1460 [ 191.102916] ? ip6_forward_finish+0x2f0/0x2f0 [ 191.107314] ? ip6_mtu+0x3c/0x2c0 [ 191.110674] ? 
ip6_finish_output+0x2f8/0x650 [ 191.114992] ? ip6_output+0x12a/0x500 [ 191.118696] ip6_output+0x12a/0x500 [ 191.122223] ? ip6_route_dev_notify+0x5b0/0x5b0 [ 191.126807] ? ip6_finish_output+0x650/0x650 [ 191.131120] ? ip6_fragment+0x1a60/0x1a60 [ 191.135182] ? icmp6_dst_alloc+0x26e/0x470 [ 191.139317] mld_sendpack+0x672/0x830 [ 191.143021] ? igmp6_mcf_seq_next+0x2f0/0x2f0 [ 191.147429] ? __local_bh_enable_ip+0x77/0x190 [ 191.151913] ipv6_mc_dad_complete+0x47/0x90 [ 191.156144] addrconf_dad_completed+0x561/0x720 [ 191.160731] ? addrconf_rs_timer+0x3a0/0x3a0 [ 191.165036] ? mark_held_locks+0xc9/0x140 [ 191.169095] ? __local_bh_enable_ip+0x77/0x190 [ 191.173570] ? addrconf_dad_work+0x50d/0xa20 [ 191.177886] ? addrconf_dad_work+0x529/0xa20 [ 191.182194] addrconf_dad_work+0x529/0xa20 [ 191.186342] ? addrconf_dad_completed+0x720/0x720 [ 191.191088] ? __lock_is_held+0xa0/0x160 [ 191.195059] ? process_one_work+0x45d/0xe20 [ 191.199302] ? process_one_work+0x51e/0xe20 [ 191.203531] ? rcu_read_lock_sched_held+0x93/0xa0 [ 191.208279] process_one_work+0x51e/0xe20 [ 191.212340] ? pwq_dec_nr_in_flight+0x200/0x200 [ 191.216912] ? get_lock_stats+0x4b/0xf0 [ 191.220788] ? preempt_count_sub+0xf/0xd0 [ 191.224844] ? worker_thread+0x219/0x860 [ 191.228823] ? do_raw_spin_trylock+0x6d/0xa0 [ 191.233142] worker_thread+0xeb/0x860 [ 191.236848] ? process_one_work+0xe20/0xe20 [ 191.241095] kthread+0x206/0x300 [ 191.244352] ? process_one_work+0xe20/0xe20 [ 191.248587] ? 
kthread_stop+0x570/0x570 [ 191.252459] ret_from_fork+0x3a/0x50 [ 191.256082] Code: 14 3e ff 8b 4b 78 55 4d 89 f9 41 56 41 55 48 c7 c7 a0 cf db 82 41 54 44 8b 44 24 2c 48 8b 54 24 30 48 8b 74 24 20 e8 16 94 13 ff <0f> 0b 48 c7 c7 60 8e 1f 85 48 83 c4 20 e8 55 ef a6 ff 89 74 24 [ 191.275327] RIP: skb_panic+0xc3/0x100 RSP: ffff8801d54072f0 [ 191.281024] ---[ end trace 7ea51094e099e006 ]--- [ 191.285724] Kernel panic - not syncing: Fatal exception in interrupt [ 191.292168] Kernel Offset: disabled [ 191.295697] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Reproducer: ip link add h1 type veth peer name swp1 ip link add h3 type veth peer name swp3 ip link set dev h1 up ip address add 192.0.2.1/28 dev h1 ip link add dev vh3 type vrf table 20 ip link set dev h3 master vh3 ip link set dev vh3 up ip link set dev h3 up ip link set dev swp3 up ip address add dev swp3 2001:db8:2::1/64 ip link set dev swp1 up tc qdisc add dev swp1 clsact ip link add name gt6 type ip6erspan \ local 2001:db8:2::1 remote 2001:db8:2::2 oseq okey 123 ip link set dev gt6 up sleep 1 tc filter add dev swp1 ingress pref 1000 matchall skip_hw \ action mirred egress mirror dev gt6 ping -I h1 192.0.2.2 Fixes: e41c7c68ea77 ("ip6erspan: make sure enough headroom at xmit.") Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Greg Rose <gvrose8192@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com>
* datapath: ip6_gre: Request headroom in __gre6_xmit()William Tu2018-05-291-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 01b8d064d58b4c1f0eff47f8fe8a8508cb3b3840 Author: Petr Machata <petrm@mellanox.com> Date: Thu May 17 16:36:10 2018 +0200 net: ip6_gre: Request headroom in __gre6_xmit() __gre6_xmit() pushes GRE headers before handing over to ip6_tnl_xmit() for generic IP-in-IP processing. However it doesn't make sure that there is enough headroom to push the header to. That can lead to the panic cited below. (Reproducer below that). Fix by requesting either needed_headroom if already primed, or just the bare minimum needed for the header otherwise. [ 158.576725] kernel BUG at net/core/skbuff.c:104! [ 158.581510] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI [ 158.587174] Modules linked in: act_mirred cls_matchall ip6_gre ip6_tunnel tunnel6 gre sch_ingress vrf veth x86_pkg_t emp_thermal mlx_platform nfsd e1000e leds_mlxcpld [ 158.602268] CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 4.17.0-rc4-net_master-custom-139 #10 [ 158.610938] Hardware name: Mellanox Technologies Ltd. 
"MSN2410-CB2F"/"SA000874", BIOS 4.6.5 03/08/2016 [ 158.620426] RIP: 0010:skb_panic+0xc3/0x100 [ 158.624586] RSP: 0018:ffff8801d3f27110 EFLAGS: 00010286 [ 158.629882] RAX: 0000000000000082 RBX: ffff8801c02cc040 RCX: 0000000000000000 [ 158.637127] RDX: 0000000000000082 RSI: dffffc0000000000 RDI: ffffed003a7e4e18 [ 158.644366] RBP: ffff8801bfec8020 R08: ffffed003aabce19 R09: ffffed003aabce19 [ 158.651574] R10: 000000000000000b R11: ffffed003aabce18 R12: ffff8801c364de66 [ 158.658786] R13: 000000000000002c R14: 00000000000000c0 R15: ffff8801c364de68 [ 158.666007] FS: 0000000000000000(0000) GS:ffff8801d5400000(0000) knlGS:0000000000000000 [ 158.674212] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 158.680036] CR2: 00007f4b3702dcd0 CR3: 0000000003228002 CR4: 00000000001606e0 [ 158.687228] Call Trace: [ 158.689752] ? __gre6_xmit+0x246/0xd80 [ip6_gre] [ 158.694475] ? __gre6_xmit+0x246/0xd80 [ip6_gre] [ 158.699141] skb_push+0x78/0x90 [ 158.702344] __gre6_xmit+0x246/0xd80 [ip6_gre] [ 158.706872] ip6gre_tunnel_xmit+0x3bc/0x610 [ip6_gre] [ 158.711992] ? __gre6_xmit+0xd80/0xd80 [ip6_gre] [ 158.716668] ? debug_check_no_locks_freed+0x210/0x210 [ 158.721761] ? print_irqtrace_events+0x120/0x120 [ 158.726461] ? sched_clock_cpu+0x18/0x210 [ 158.730572] ? sched_clock_cpu+0x18/0x210 [ 158.734692] ? cyc2ns_read_end+0x10/0x10 [ 158.738705] ? skb_network_protocol+0x76/0x200 [ 158.743216] ? netif_skb_features+0x1b2/0x550 [ 158.747648] dev_hard_start_xmit+0x137/0x770 [ 158.752010] sch_direct_xmit+0x2ef/0x5d0 [ 158.755992] ? pfifo_fast_dequeue+0x3fa/0x670 [ 158.760460] ? pfifo_fast_change_tx_queue_len+0x810/0x810 [ 158.765975] ? __lock_is_held+0xa0/0x160 [ 158.770002] __qdisc_run+0x39e/0xfc0 [ 158.773673] ? _raw_spin_unlock+0x29/0x40 [ 158.777781] ? pfifo_fast_enqueue+0x24b/0x3e0 [ 158.782191] ? sch_direct_xmit+0x5d0/0x5d0 [ 158.786372] ? pfifo_fast_dequeue+0x670/0x670 [ 158.790818] ? __dev_queue_xmit+0x172/0x1770 [ 158.795195] ? 
preempt_count_sub+0xf/0xd0 [ 158.799313] __dev_queue_xmit+0x410/0x1770 [ 158.803512] ? ___slab_alloc+0x605/0x930 [ 158.807525] ? ___slab_alloc+0x605/0x930 [ 158.811540] ? memcpy+0x34/0x50 [ 158.814768] ? netdev_pick_tx+0x1c0/0x1c0 [ 158.818895] ? __skb_clone+0x2fd/0x3d0 [ 158.822712] ? __copy_skb_header+0x270/0x270 [ 158.827079] ? rcu_read_lock_sched_held+0x93/0xa0 [ 158.831903] ? kmem_cache_alloc+0x344/0x4d0 [ 158.836199] ? skb_clone+0x123/0x230 [ 158.839869] ? skb_split+0x820/0x820 [ 158.843521] ? tcf_mirred+0x554/0x930 [act_mirred] [ 158.848407] tcf_mirred+0x554/0x930 [act_mirred] [ 158.853104] ? tcf_mirred_act_wants_ingress.part.2+0x10/0x10 [act_mirred] [ 158.860005] ? __lock_acquire+0x706/0x26e0 [ 158.864162] ? mark_lock+0x13d/0xb40 [ 158.867832] tcf_action_exec+0xcf/0x2a0 [ 158.871736] tcf_classify+0xfa/0x340 [ 158.875402] __netif_receive_skb_core+0x8e1/0x1c60 [ 158.880334] ? nf_ingress+0x500/0x500 [ 158.884059] ? process_backlog+0x347/0x4b0 [ 158.888241] ? lock_acquire+0xd8/0x320 [ 158.892050] ? process_backlog+0x1b6/0x4b0 [ 158.896228] ? process_backlog+0xc2/0x4b0 [ 158.900291] process_backlog+0xc2/0x4b0 [ 158.904210] net_rx_action+0x5cc/0x980 [ 158.908047] ? napi_complete_done+0x2c0/0x2c0 [ 158.912525] ? rcu_read_unlock+0x80/0x80 [ 158.916534] ? __lock_is_held+0x34/0x160 [ 158.920541] __do_softirq+0x1d4/0x9d2 [ 158.924308] ? trace_event_raw_event_irq_handler_exit+0x140/0x140 [ 158.930515] run_ksoftirqd+0x1d/0x40 [ 158.934152] smpboot_thread_fn+0x32b/0x690 [ 158.938299] ? sort_range+0x20/0x20 [ 158.941842] ? preempt_count_sub+0xf/0xd0 [ 158.945940] ? schedule+0x5b/0x140 [ 158.949412] kthread+0x206/0x300 [ 158.952689] ? sort_range+0x20/0x20 [ 158.956249] ? 
kthread_stop+0x570/0x570 [ 158.960164] ret_from_fork+0x3a/0x50 [ 158.963823] Code: 14 3e ff 8b 4b 78 55 4d 89 f9 41 56 41 55 48 c7 c7 a0 cf db 82 41 54 44 8b 44 24 2c 48 8b 54 24 30 48 8b 74 24 20 e8 16 94 13 ff <0f> 0b 48 c7 c7 60 8e 1f 85 48 83 c4 20 e8 55 ef a6 ff 89 74 24 [ 158.983235] RIP: skb_panic+0xc3/0x100 RSP: ffff8801d3f27110 [ 158.988935] ---[ end trace 5af56ee845aa6cc8 ]--- [ 158.993641] Kernel panic - not syncing: Fatal exception in interrupt [ 159.000176] Kernel Offset: disabled [ 159.003767] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Reproducer: ip link add h1 type veth peer name swp1 ip link add h3 type veth peer name swp3 ip link set dev h1 up ip address add 192.0.2.1/28 dev h1 ip link add dev vh3 type vrf table 20 ip link set dev h3 master vh3 ip link set dev vh3 up ip link set dev h3 up ip link set dev swp3 up ip address add dev swp3 2001:db8:2::1/64 ip link set dev swp1 up tc qdisc add dev swp1 clsact ip link add name gt6 type ip6gretap \ local 2001:db8:2::1 remote 2001:db8:2::2 ip link set dev gt6 up sleep 1 tc filter add dev swp1 ingress pref 1000 matchall skip_hw \ action mirred egress mirror dev gt6 ping -I h1 192.0.2.2 Fixes: c12b395a4664 ("gre: Support GRE over IPv6") Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Greg Rose <gvrose8192@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com>
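Both headroom commits above describe the same rule: request `needed_headroom` if it has already been primed, and otherwise request only the bare minimum needed for the header. A minimal userspace sketch of that choice follows. `struct dev_model`, `struct tunnel_model`, and `headroom_to_request()` are hypothetical names, and using the tunnel's `hlen` as the "bare minimum" is an assumption made for illustration; in the kernel the chosen value would be handed to a helper such as skb_cow_head(), which the ip6erspan commit names.

```c
/* Hypothetical model of the device field the commit messages refer to.
 * It starts at zero and is only primed later, on the generic transmit
 * path. */
struct dev_model {
    unsigned short needed_headroom;
};

/* Hypothetical model of per-tunnel state; hlen stands for the number of
 * bytes of tunnel header that will be pushed (an assumption). */
struct tunnel_model {
    int hlen;
};

/* Choose how much headroom to request before pushing the header: the
 * primed device value when it is non-zero, otherwise just enough for the
 * header itself. */
static unsigned int headroom_to_request(const struct dev_model *dev,
                                        const struct tunnel_model *t)
{
    if (dev->needed_headroom)
        return dev->needed_headroom;
    return (unsigned int)t->hlen;
}
```

The point of the fallback is that a request of zero bytes always succeeds without reserving anything, so the later header push can run off the front of the buffer, which is the failure shown in the traces.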
* userspace datapath: Add OVS_HASH_L4_SYMMETRIC dp_hash algorithmJan Scheurich2018-05-251-0/+4
| | | | | | | | | | | | | | | | This commit implements a new dp_hash algorithm OVS_HASH_L4_SYMMETRIC in the netdev datapath. It will be used as default hash algorithm for the dp_hash-based select groups in a subsequent commit to maintain compatibility with the symmetry property of the current default hash selection method. A new dpif_backer_support field 'max_hash_alg' is introduced to reflect the highest hash algorithm a datapath supports in the dp_hash action. Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Signed-off-by: Nitin Katiyar <nitin.katiyar@ericsson.com> Co-authored-by: Nitin Katiyar <nitin.katiyar@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
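The symmetry property mentioned above means that both directions of a connection hash to the same value. The sketch below shows one common way to get that property, by combining source and destination fields with XOR before mixing. It is an illustration only and is not claimed to be the function OVS uses for OVS_HASH_L4_SYMMETRIC; `struct l4_flow`, `mix32()`, and `l4_symmetric_hash()` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical 5-tuple used only for this illustration. */
struct l4_flow {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t proto;
};

/* 32-bit avalanche step (the MurmurHash3 finalizer). */
static uint32_t mix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* XOR is commutative, so swapping source and destination leaves the
 * folded address and port values, and therefore the hash, unchanged. */
static uint32_t l4_symmetric_hash(const struct l4_flow *f, uint32_t basis)
{
    uint32_t ips = f->src_ip ^ f->dst_ip;
    uint32_t ports = (uint32_t)(f->src_port ^ f->dst_port);
    uint32_t h = basis;

    h = mix32(h ^ ips);
    h = mix32(h ^ ports);
    h = mix32(h ^ f->proto);
    return h;
}
```

A known trade-off of XOR folding is that distinct flows whose address pairs or port pairs XOR to the same value fold together before mixing, so the approach gives up some distribution quality in exchange for symmetry.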
* compati/ip_gre: remove duplicate vport definition.William Tu2018-05-251-4/+0
| | | | | | | | | Clean up the duplicate definition of OVS_VPORT_TYPE_ERSPAN since it is defined in openvswitch.h. Cc: Greg Rose <gvrose8192@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* datapath: compat: Fix ndo_size in RHEL 7.5 backportYi-Hung Wei2018-05-235-0/+6
| | | | | | | | | | | | If 'ndo_size' is not set in 'struct net_device_ops', RHEL kernel will not make use of functions in 'struct net_device_ops_extended'. Fixes: 39ca338374ab ("datapath: compat: Fix build on RHEL 7.5") Reported-by: Jiri Benc <jbenc@redhat.com> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/347070.html Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Jiri Benc <jbenc@redhat.com>
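The behaviour described above, where the extended callbacks are ignored unless 'ndo_size' is set, can be modelled in userspace C. This is a sketch under stated assumptions and does not reproduce the RHEL kernel's structures: `struct ops_model`, `struct ops_extended_model`, the two handlers, and `pick_handler()` are hypothetical, and the handlers return tags only so that the selected path is visible.

```c
#include <stddef.h>

/* Hypothetical model of the extended callback table. */
struct ops_extended_model {
    int (*handler)(void);
};

/* Hypothetical model of the main callback table. A zero ndo_size means
 * "the extended part was never declared". */
struct ops_model {
    size_t ndo_size;
    int (*legacy_handler)(void);
    struct ops_extended_model extended;
};

static int legacy_path(void)   { return 1; }
static int extended_path(void) { return 2; }

/* Use the extended callback only when ndo_size is set; otherwise fall
 * back to the legacy callback even if an extended one was filled in. */
static int pick_handler(const struct ops_model *ops)
{
    if (ops->ndo_size != 0 && ops->extended.handler)
        return ops->extended.handler();
    return ops->legacy_handler();
}
```

In the model, filling in `extended.handler` has no effect until `ndo_size` is also assigned, for example with `sizeof(struct ops_model)`.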
* erspan: fix invalid erspan version.William Tu2018-05-211-1/+3
| | | | | | | | | | | | | | ERSPAN only supports versions 1 and 2. When packets are sent to an erspan device that does not have a proper version number set, drop the packet. In practice, we observed multicast packets sent to the erspan pernet device, erspan0, which does not have an erspan version configured. Without this patch, ovs-vswitchd logs the warning below, caused by receiving a malformed erspan packet: odp_util|WARN|odp_tun_key_from_attr__ invalid erspan version Reported-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
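The rule in this commit message (only versions 1 and 2 are valid, anything else is dropped on transmit) can be sketched as a small userspace check. The names `erspan_version_valid()`, `erspan_xmit_verdict()`, and `enum xmit_verdict` are hypothetical and exist only for this illustration; the real transmit path operates on socket buffers rather than returning a verdict.

```c
#include <stdbool.h>

/* Hypothetical verdict type for the illustration. */
enum xmit_verdict {
    XMIT_SEND,
    XMIT_DROP
};

/* Only ERSPAN versions 1 and 2 are considered valid. */
static bool erspan_version_valid(int version)
{
    return version == 1 || version == 2;
}

/* Drop the packet when the device has no valid version configured, for
 * example a device whose version field was left at 0. */
static enum xmit_verdict erspan_xmit_verdict(int configured_version)
{
    return erspan_version_valid(configured_version) ? XMIT_SEND : XMIT_DROP;
}
```

Dropping at transmit time keeps a packet with an unset version from reaching userspace, where it would otherwise trigger the "invalid erspan version" warning quoted above.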