path: root/datapath
Commit message | Author | Age | Files | Lines
* datapath: Ensure correct L4 checksum with NAT helpers.John Hurley2017-01-061-30/+17
| | | | | | | | | | | | | | | | Setting the CHECKSUM_PARTIAL flag before sending to helper mods was missing the checksum update call ('csum_*_magic()'), which caused checksum failures with kernels <4.6. This can mean that the L4 checksum is incorrect when the packet egresses the system. Rather than adding the missing (IP version dependent) calls, give the packet a temp skb_dst with RTCF_LOCAL flag not set, which ensures the skb is properly changed to CHECKSUM_PARTIAL if required and the modified packet will get the correct checksum when fully processed. This has been tested with FTP NAT helpers on kernel version 3.13. Signed-off-by: John Hurley <john.hurley@netronome.com> Acked-by: Jarno Rajahalme <jarno@ovn.org>
* datapath: compat: Fix build on RHEL 7.3Yi-Hung Wei2016-12-146-4/+43
| | | | | | | | | | | | RHEL 7.3 provides upstream tunnel but it does not support name_assign_type attribute in net-device. This patch fixes the build problem by backporting functions with name_assign_type, and using proper flags in acinclude.m4 to invoke backport functions. Tested on RHEL 7.3 with kernel 3.10.0-514.el7.x86_64 Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Joe Stringer <joe@ovn.org>
* doc: Populate 'topics' sectionStephen Finucane2016-12-122-268/+0
| | | | | | | | | | | There are many docs that don't need to be kept at the top level, along with many more hidden in random folders. Move them all. This also allows us to add the '-W' flag to Sphinx, ensuring unindexed docs result in build failures. Signed-off-by: Stephen Finucane <stephen@that.guru> Signed-off-by: Ben Pfaff <blp@ovn.org>
* datapath: Fix compile time assertion.Jarno Rajahalme2016-12-091-1/+3
| | | | | | | | compiletime_assert() cannot be used in file scope, so use preprocessor directives instead. Reported-by: Joe Stringer <joe@ovn.org> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
* datapath: Allow compile against current net-next.Jarno Rajahalme2016-12-094-9/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows openvswitch kernel module in the OVS tree to be compiled against the current net-next Linux kernel. The changes are due to these upstream commits: 56989f6d856 ("genetlink: mark families as __ro_after_init") 489111e5c25 ("genetlink: statically initialize families") a07ea4d9941 ("genetlink: no longer support using static family IDs") struct genl_family initialization is changed to be completely static and to include the new (in Linux 4.6) __ro_after_init attribute. Compat code defines it as an empty macro if not defined already. GENL_ID_GENERATE is no longer defined, but since it was defined as 0, it is safe to drop it from all initializers also on older Linux versions. A compiletime_assert is added to make sure this is true whenever GENL_ID_GENERATE is defined. Tested with current Linux net-next (4.9) and 3.16. It should be noted that there are still a number of fixes and new features in upstream net-next that are yet to be backported. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Andy Zhou <azhou@ovn.org>
* datapath: backport: openvswitch: Fix skb leak in IPv6 reassembly.Daniele Di Proietto2016-11-301-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | commit f92a80a9972175a6a1d36c6c44be47fb0efd020d Author: Daniele Di Proietto <diproiettod@ovn.org> Date: Mon Nov 28 15:43:53 2016 -0800 openvswitch: Fix skb leak in IPv6 reassembly. If nf_ct_frag6_gather() returns an error other than -EINPROGRESS, it means that we still have a reference to the skb. We should free it before returning from handle_fragments, as stated in the comment above. Fixes: daaa7d647f81 ("netfilter: ipv6: avoid nf_iterate recursion") CC: Florian Westphal <fw@strlen.de> CC: Pravin B Shelar <pshelar@ovn.org> CC: Joe Stringer <joe@ovn.org> Signed-off-by: Daniele Di Proietto <diproiettod@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> VMware-BZ: #1728498 Fixes: 2e602ea3dafa("compat: nf_defrag_ipv6: avoid nf_iterate recursion.") Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Joe Stringer <joe@ovn.org>
* datapath: compat: vxlan: Avoid possible NULL dereference in vxlan_gro_receive.Zhang Dongya2016-11-131-1/+1
| | | | | | | | | | | With a Linux kernel for which the HAVE_UDP_OFFLOAD_ARG_UOFF macro is not detected, struct vxlan_sock *vs will be NULL, which crashes the kernel when it receives a VXLAN packet that has the RCO flag turned on, or even an invalid packet destined to the VXLAN port that has the bit set in the RCO flag position. Signed-off-by: Zhang Dongya <fortitude.zhang@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org>
* doc: Convert datapath/README to rSTStephen Finucane2016-11-033-266/+266
| | | | | Signed-off-by: Stephen Finucane <stephen@that.guru> Signed-off-by: Russell Bryant <russell@ovn.org>
* datapath: geneve: Handle vlan tagPravin B Shelar2016-11-011-2/+31
| | | | | | | | | The compat vlan code ignores vlan tag for inner packet on egress path. Following patch fixes this by inserting the tag for inner packet before tunnel encapsulation. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
* datapath: backport: vxlan: avoid using stale vxlan socket.Pravin B Shelar2016-10-312-38/+42
| | | | | | | | | | | | | | | | | | | | Upstream commit: commit c6fcc4fc5f8b592600c7409e769ab68da0fb1eca Author: pravin shelar <pshelar@ovn.org> Date: Fri Oct 28 09:59:15 2016 -0700 vxlan: avoid using stale vxlan socket. When vxlan device is closed vxlan socket is freed. This operation can race with vxlan-xmit function which dereferences vxlan socket. Following patch uses RCU mechanism to avoid this situation. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
* lisp: avoid using stale lisp socket.Pravin B Shelar2016-10-311-9/+25
| | | | | | | | | | | This patch is similar to earlier vxlan patch. Lisp device close operation frees lisp socket. This operation can race with lisp-xmit function which dereferences lisp socket. Following patch uses RCU mechanism to avoid this situation. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
* datapath: backport: geneve: avoid using stale geneve socket.Pravin B Shelar2016-10-311-11/+34
| | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit fceb9c3e38252992bbf1a3028cc2f7b871211533 Author: pravin shelar <pshelar@ovn.org> Date: Fri Oct 28 09:59:16 2016 -0700 geneve: avoid using stale geneve socket. This patch is similar to earlier vxlan patch. Geneve device close operation frees geneve socket. This operation can race with geneve-xmit function which dereferences geneve socket. Following patch uses RCU mechanism to avoid this situation. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
* datapath: Support a fixed size of 128 distinct labels.Jarno Rajahalme2016-10-201-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Port upstream change in conntrack labels extension. Add a new configure macro HAVE_NF_CONN_LABELS_WITH_WORDS to detect the old definition. Unfortunately there is no conntrack API to hide the difference, so this makes conntrack.c deviate from upstream source a bit. Upstream commit: commit 23014011ba4209a086931ff402eac1c41abbe456 Author: Florian Westphal <fw@strlen.de> Date: Thu Jul 21 12:51:16 2016 +0200 netfilter: conntrack: support a fixed size of 128 distinct labels The conntrack label extension is currently variable-sized, e.g. if only 2 labels are used by iptables rules then the labels->bits[] array will only contain one element. We track size of each label storage area in the 'words' member. But in nftables and openvswitch we always have to ask for worst-case since we don't know what bit will be used at configuration time. As most arches are 64bit we need to allocate 24 bytes in this case: struct nf_conn_labels { u8 words; /* 0 1 */ /* XXX 7 bytes hole, try to pack */ long unsigned bits[2]; /* 8 24 */ Make bits a fixed size and drop the words member, it simplifies the code and only increases memory requirements on x86 when less than 64bit labels are required. We still only allocate the extension if its needed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org>
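Editorial sketch (not part of the commit): the 24-byte figure quoted above follows from ordinary C struct padding on a 64-bit target. The small layout calculator below is a hypothetical helper written for this illustration; the field sizes and alignments assume an LP64 ABI where `unsigned long` is 8 bytes with 8-byte alignment.

```python
def struct_layout(fields):
    """Compute C-style field offsets and total size.

    fields is a list of (name, size, align) tuples; each field is placed at
    the next offset that is a multiple of its alignment, and the total size
    is rounded up to the largest alignment in the struct.
    """
    offset = 0
    max_align = 1
    offsets = {}
    for name, size, align in fields:
        offset = (offset + align - 1) // align * align  # pad up to alignment
        offsets[name] = offset
        offset += size
        max_align = max(max_align, align)
    total = (offset + max_align - 1) // max_align * max_align  # tail padding
    return offsets, total


# Assumed LP64 sizes: u8 is 1 byte, unsigned long bits[2] is 16 bytes, 8-aligned.
OLD_NF_CONN_LABELS = [("words", 1, 1), ("bits", 16, 8)]
NEW_NF_CONN_LABELS = [("bits", 16, 8)]

if __name__ == "__main__":
    print(struct_layout(OLD_NF_CONN_LABELS))
    print(struct_layout(NEW_NF_CONN_LABELS))
```

Under these assumptions the old layout has a 7-byte hole after `words`, while the fixed-size layout holds exactly 128 label bits in 16 bytes.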
* datapath: avoid deferred execution of recirc actionsLance Richardson2016-09-201-3/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Port upstream fix to datapath module. The only notable difference between this patch and the upstream version is that the value of ovs_recursion_limit (5 for upstream kernel, 4 for out-of-tree module) is maintained in this patch. Upstream commit: commit f43e6dfb056b58628e43179d8f6b59eae417754d Author: Lance Richardson <lrichard@redhat.com> Date: Mon Sep 12 17:07:23 2016 -0400 openvswitch: avoid deferred execution of recirc actions The ovs kernel data path currently defers the execution of all recirc actions until stack utilization is at a minimum. This is too limiting for some packet forwarding scenarios due to the small size of the deferred action FIFO (10 entries). For example, broadcast traffic sent out more than 10 ports with recirculation results in packet drops when the deferred action FIFO becomes full, as reported here: http://openvswitch.org/pipermail/dev/2016-March/067672.html Since the current recursion depth is available (it is already tracked by the exec_actions_level pcpu variable), we can use it to determine whether to execute recirculation actions immediately (safe when recursion depth is low) or defer execution until more stack space is available. With this change, the deferred action fifo size becomes a non-issue for currently failing scenarios because it is no longer used when there are three or fewer recursions through ovs_execute_actions(). Suggested-by: Pravin Shelar <pshelar@ovn.org> Signed-off-by: Lance Richardson <lrichard@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org>
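Editorial sketch (not part of the commit): a toy model of the policy described above, in which recirculation actions run immediately while the recursion depth is low and are only queued on the deferred FIFO at deeper levels. The FIFO size of 10 comes from the commit message; the function name, the `IMMEDIATE_LIMIT` constant and the exact comparison are assumptions made for illustration and are not taken from the kernel source.

```python
from collections import deque

DEFERRED_FIFO_SIZE = 10  # FIFO size quoted in the commit message
IMMEDIATE_LIMIT = 3      # assumed from "three or fewer recursions"


def handle_recircs(n_recircs, level, always_defer):
    """Model n_recircs recirc actions seen at a given recursion level.

    Returns (executed_immediately, deferred, dropped).
    always_defer=True models the behaviour before the change, where every
    recirc action goes through the deferred-action FIFO.
    """
    fifo = deque()
    executed = 0
    dropped = 0
    for i in range(n_recircs):
        if not always_defer and level <= IMMEDIATE_LIMIT:
            executed += 1          # shallow stack: safe to run right away
        elif len(fifo) < DEFERRED_FIFO_SIZE:
            fifo.append(i)         # run later, once the stack has unwound
        else:
            dropped += 1           # FIFO full: the packet copy is lost
    return executed, len(fifo), dropped


if __name__ == "__main__":
    # Broadcast to 12 ports, each output followed by a recirculation.
    print("before:", handle_recircs(12, level=1, always_defer=True))
    print("after: ", handle_recircs(12, level=1, always_defer=False))
```

In this model the 12-port broadcast loses two packets when everything is deferred and none when shallow recirculations execute immediately, which is the scenario the commit message describes.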
* datapath: backport: openvswitch: use alias for genetlink family namesThadeu Lima de Souza Cascardo2016-09-161-0/+4
| | | | | | | | | | | | | | | | | | | | Upstream commit: commit ed227099dac95128e2aecd62af51bb9d922e5977 Author: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Date: Fri Sep 9 17:42:30 2016 -0300 openvswitch: use alias for genetlink family names When userspace tries to create datapaths and the module is not loaded, it will simply fail. With this patch, the module will be automatically loaded. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Signed-off-by: Jesse Gross <jesse@kernel.org>
* datapath: compat: tunnels: Log error during initialization.Pravin B Shelar2016-09-155-0/+5
| | | | | | | | | | At present OVS compat tunneling can fail due to a conflict with an already loaded tunneling kernel module. In this case openvswitch kernel module loading fails silently. The following patch gives more clues about what went wrong. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
* datapath: compat: geneve: use ovs specific device type for compat geneve module.Pravin B Shelar2016-09-121-1/+1
| | | | | | | | | This allows the openvswitch and geneve modules to co-exist on newer kernels. Reported-by: Joe Stringer <joe@ovn.org> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
* datapath: Use pre-routing hook for conntrack.Joe Stringer2016-09-091-1/+1
| | | | | | | | | | | | | | | | The upstream code uses NF_INET_PRE_ROUTING hook for the nf_conntrack_in() call, which does deeper (eg l4proto) validation. It was previously thought that using the NF_INET_PRE_ROUTING hook for this function on older kernels would trigger kernel panics due to a dependency on the unpopulated skb->dev, however during recent testing on a variety of platforms (Centos7.[12], Ubuntu 1[46].04, Fedora23) using the latest distribution kernels and the OVS kernel module testsuite, no such kernel panics were observed. Therefore it appears to be safe to bring this in line with upstream without any other workarounds. Reported-by: Jesse Gross <jesse@kernel.org> Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
* datapath: compat: handle_offloads: remove csum_help param.Pravin B Shelar2016-08-185-13/+5
| | | | | | | | | | | | | | | | | | Related to following upstream commit: commit 6fa79666e24d32be1b709f5269af41ed9e829e7e Author: Edward Cree <ecree@solarflare.com> Date: Thu Feb 11 21:02:31 2016 +0000 net: ip_tunnel: remove 'csum_help' argument to iptunnel_handle_offloads All users now pass false, so we can remove it, and remove the code that was conditional upon it. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
* datapath: compat: backport LCO optimization.Pravin B Shelar2016-08-185-19/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | This basically backports commit: commit 179bc67f69b6cb53ad68cfdec5a917c2a2248355 Author: Edward Cree <ecree@solarflare.com> Date: Thu Feb 11 20:48:04 2016 +0000 net: local checksum offload for encapsulation The arithmetic properties of the ones-complement checksum mean that a correctly checksummed inner packet, including its checksum, has a ones complement sum depending only on whatever value was used to initialise the checksum field before checksumming (in the case of TCP and UDP, this is the ones complement sum of the pseudo header, complemented). Consequently, if we are going to offload the inner checksum with CHECKSUM_PARTIAL, we can compute the outer checksum based only on the packed data not covered by the inner checksum, and the initial value of the inner checksum field. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
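Editorial sketch (not part of the commit): the arithmetic property described above can be checked with a few lines of 16-bit ones-complement arithmetic. All function names here are hypothetical, and the model is simplified: the packet is a list of 16-bit words, the inner checksum covers all inner words, and the outer checksum covers the outer words plus the whole inner segment.

```python
def csum_add(a, b):
    """Ones-complement addition of two 16-bit values (end-around carry)."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def csum_words(words, init=0):
    """Ones-complement sum of 16-bit words, starting from init."""
    total = init
    for w in words:
        total = csum_add(total, w)
    return total


def csum_equal(a, b):
    """Compare sums modulo 0xFFFF, since 0x0000 and 0xFFFF both mean zero."""
    return a % 0xFFFF == b % 0xFFFF


def finish_inner_checksum(inner_words, init_field):
    """Final inner checksum when the field initially held init_field."""
    return ~csum_words(inner_words, init_field) & 0xFFFF


def outer_csum_full(outer_words, inner_words, inner_csum):
    """Outer checksum computed the slow way, over every word."""
    return ~csum_words(outer_words + inner_words + [inner_csum]) & 0xFFFF


def outer_csum_lco(outer_words, init_field):
    """Outer checksum from the outer words and the initial field value only.

    The checksummed inner segment always sums to the complement of
    init_field, so its contribution is known without reading the payload.
    """
    return ~csum_words(outer_words, ~init_field & 0xFFFF) & 0xFFFF


if __name__ == "__main__":
    inner = [0x1234, 0xABCD, 0x0F0F]   # arbitrary inner words
    outer = [0x0800, 0x1111]           # arbitrary outer header words
    init = 0x4321                      # arbitrary initial checksum field
    inner_csum = finish_inner_checksum(inner, init)
    print(hex(outer_csum_full(outer, inner, inner_csum)))
    print(hex(outer_csum_lco(outer, init)))
```

The shortcut works because complementing is negation in ones-complement arithmetic: the final field value is `-(init + data)`, so `data + checksum` collapses to `-init` regardless of the payload.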
* datapath: backport: openvswitch: do not ignore netdev errors when creating ↵Pravin B Shelar2016-08-156-6/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | tunnel vports Upstream commit: commit 4b5b9ba553f9aa5f484ab972fc9b58061885ceca Author: Martynas Pumputis <martynas@weave.works> Date: Tue Aug 9 16:24:50 2016 +0100 openvswitch: do not ignore netdev errors when creating tunnel vports The creation of a tunnel vport (geneve, gre, vxlan) brings up a corresponding netdev, a multi-step operation which can fail. For example, changing a vxlan vport's netdev state to 'up' binds the vport's socket to a UDP port - if the binding fails (e.g. due to the port being in use), the error is currently ignored giving the appearance that the tunnel vport creation completed successfully. Signed-off-by: Martynas Pumputis <martynas@weave.works> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
* datapath: backport: OVS: Ignore negative headroom valuePravin B Shelar2016-08-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 5ef9f289c4e698054e5687edb54f0da3cdc9173a Author: Ian Wienand <iwienand@redhat.com> Date: Wed Aug 3 15:44:57 2016 +1000 OVS: Ignore negative headroom value net_device->ndo_set_rx_headroom (introduced in 871b642adebe300be2e50aa5f65a418510f636ec) says "Setting a negtaive value reset the rx headroom to the default value". It seems that the OVS implementation in 3a927bc7cf9d0fbe8f4a8189dd5f8440228f64e7 overlooked this and sets dev->needed_headroom unconditionally. This doesn't have an immediate effect, but can mess up later LL_RESERVED_SPACE calculations, such as done in net/ipv6/mcast.c:mld_newpack. For reference, this issue was found from a skb_panic raised there after the length calculations had given the wrong result. Note the other current users of this interface (drivers/net/tun.c:tun_set_headroom and drivers/net/veth.c:veth_set_rx_headroom) are both checking this correctly thus need no modification. Thanks to Ben for some pointers from the crash dumps! Cc: Benjamin Poirier <bpoirier@suse.com> Cc: Paolo Abeni <pabeni@redhat.com> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1361414 Signed-off-by: Ian Wienand <iwienand@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
* datapath: compat: keep skb mark across tunnel devices.Pravin B Shelar2016-08-122-2/+2
| | | | | | | | | | | | | skb_scrub_packet() in older kernels has a bug that resets the skb mark for all packets. It was fixed during the 3.18 release, where the mark is reset only for packets crossing namespaces. So OVS is forced to use the compat skb_scrub_packet() on older kernels. This is related to upstream bug fix commit ca7c7b9059e3 ("skbuff: Do not scrub skb mark within the same name space"). VMware-BZ: #1710701 Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
* datapath: compat: keep skb encapsulation zero on older kernel.Pravin B Shelar2016-08-111-2/+5
| | | | | | | | | When using compat GSO there is no need to turn on skb encapsulation bit since OVS does not use any tunnel GSO functionality from the networking stack. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
* datapath: compat: vxlan: fix vxlan_notify_add_rx_port().Pravin B Shelar2016-08-111-26/+17
| | | | | | | Same as the earlier patch, this fixes the vxlan receive offload implementation. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
* datapath: compat: geneve: fix geneve_notify_add_rx_port()Pravin B Shelar2016-08-113-27/+37
| | | | | | | | Remove mutual exclusion between udp-gro registration and geneve receive port registration. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
* datapath: backport: net: vxlan: lwt: Fix vxlan local traffic.Pravin B Shelar2016-08-091-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit bbec7802c6948c8626b71a4fe31283cb4691c358 Author: pravin shelar <pshelar@ovn.org> Date: Fri Aug 5 17:45:37 2016 -0700 net: vxlan: lwt: Fix vxlan local traffic. vxlan driver has bypass for local vxlan traffic, but that depends on information about all VNIs on local system in vxlan driver. This is not available in case of LWT. Therefore following patch disable encap bypass for LWT vxlan traffic. Fixes: ee122c79d42 ("vxlan: Flow based tunneling"). Reported-by: Jakub Libosvar <jlibosva@redhat.com> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
* datapath: backport: net: vxlan: lwt: Use source ip address during route lookup.Pravin B Shelar2016-08-091-12/+18
| | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 272d96a5ab10662691b4ec90c4a66fdbf30ea7ba Author: pravin shelar <pshelar@ovn.org> Date: Fri Aug 5 17:45:36 2016 -0700 net: vxlan: lwt: Use source ip address during route lookup. LWT user can specify destination as well as source ip address for given tunnel endpoint. But vxlan is ignoring given source ip address. Following patch uses both ip address to route the tunnel packet. This consistent with other LWT implementations, like GENEVE and GRE. Fixes: ee122c79d42 ("vxlan: Flow based tunneling"). Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
* datapath: Remove incorrect WARN_ONCE().Jarno Rajahalme2016-08-041-7/+1
| | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit c6b2aafffc6934be72d96855c9a1d88970597fbc Author: Jarno Rajahalme <jarno@ovn.org> Date: Mon Aug 1 19:08:29 2016 -0700 openvswitch: Remove incorrect WARN_ONCE(). ovs_ct_find_existing() issues a warning if an existing conntrack entry classified as IP_CT_NEW is found, with the premise that this should not happen. However, a newly confirmed, non-expected conntrack entry remains IP_CT_NEW as long as no reply direction traffic is seen. This has resulted into somewhat confusing kernel log messages. This patch removes this check and warning. Fixes: 289f2253 ("openvswitch: Find existing conntrack entry after upcall.") Suggested-by: Joe Stringer <joe@ovn.org> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
* datapath: compat: Use checksum offload for outer header.Pravin B Shelar2016-08-032-32/+2
| | | | | | | | | Following patch simplifies UDP-checksum routine by unconditionally using checksum offload for non GSO packets. We might get some performance improvement due to code simplification. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
* datapath: compat: gso: tighten checks for compat GSO code.Pravin B Shelar2016-08-034-4/+11
| | | | | | | | A few functions can be compiled out for the non-GSO case. This patch makes the GSO compat code a bit cleaner to understand. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
* datapath: backport: geneve: fix max_mtu settingPravin B Shelar2016-08-031-2/+7
| | | | | | | | | | | | | | | | | | Upstream commit: commit d5d5e8d55732c7c35c354e45e3b0af2795978a57 Author: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Date: Sat Jul 2 15:02:48 2016 +0800 geneve: fix max_mtu setting For ipv6+udp+geneve encapsulation data, the max_mtu should subtract sizeof(ipv6hdr), instead of sizeof(iphdr). Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
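Editorial sketch (not part of the commit): the effect of subtracting the wrong IP header size is a 20-byte difference in the largest MTU the device accepts. The IPv4 and IPv6 header sizes (20 and 40 bytes) are fixed by the protocols; the other constants and the shape of the formula are assumptions made for this illustration and may not match the kernel code exactly.

```python
IP_MAX_MTU = 0xFFFF        # assumed upper bound on the IP datagram length
IPV4_HDR_LEN = 20          # sizeof(struct iphdr), no options
IPV6_HDR_LEN = 40          # sizeof(struct ipv6hdr)
GENEVE_BASE_HLEN = 8 + 8   # assumed: UDP header plus base Geneve header
ETH_HLEN = 14              # assumed link-layer header length of the device


def geneve_max_mtu(ipv6, hard_header_len=ETH_HLEN):
    """Largest inner MTU that still fits once tunnel headers are added."""
    ip_hdr_len = IPV6_HDR_LEN if ipv6 else IPV4_HDR_LEN
    return IP_MAX_MTU - GENEVE_BASE_HLEN - ip_hdr_len - hard_header_len


if __name__ == "__main__":
    print("IPv4 underlay:", geneve_max_mtu(ipv6=False))
    print("IPv6 underlay:", geneve_max_mtu(ipv6=True))
```

With these assumed constants, using sizeof(iphdr) for an IPv6 underlay would overstate the limit by exactly 20 bytes.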
* datapath: backport: openvswitch: fix conntrack netlink event deliveryPravin B Shelar2016-08-031-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit d913d3a763a6f66a862a6eafcf6da89a7905832a Author: Samuel Gauthier <samuel.gauthier@6wind.com> Date: Tue Jun 28 17:22:26 2016 +0200 openvswitch: fix conntrack netlink event delivery Only the first and last netlink message for a particular conntrack are actually sent. The first message is sent through nf_conntrack_confirm when the conntrack is committed. The last one is sent when the conntrack is destroyed on timeout. The other conntrack state change messages are not advertised. When the conntrack subsystem is used from netfilter, nf_conntrack_confirm is called for each packet, from the postrouting hook, which in turn calls nf_ct_deliver_cached_events to send the state change netlink messages. This commit fixes the problem by calling nf_ct_deliver_cached_events in the non-commit case as well. Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") CC: Joe Stringer <joestringer@nicira.com> CC: Justin Pettit <jpettit@nicira.com> CC: Andy Zhou <azhou@nicira.com> CC: Thomas Graf <tgraf@suug.ch> Signed-off-by: Samuel Gauthier <samuel.gauthier@6wind.com> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
* datapath: compat: vxlan: fix udp-csum typoPravin B Shelar2016-08-031-1/+1
| | | | | Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
* datapath: fix size of struct ovs_gso_cbPravin B Shelar2016-08-032-1/+2
| | | | | | | | struct ovs_gso_cb is stored in skb->cb. Avoid going beyond the size of skb->cb. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
* datapath: compat: Use udp-checksum function for compat case.Pravin B Shelar2016-08-032-2/+2
| | | | | | | | | | | | | udp_set_csum() has a bug fix that is not relevant for upstream (commit c77d947191b0), so OVS needs to use the compat function. This function is also used from the UDP xmit path, so we have to check USE_UPSTREAM_TUNNEL. The following patch couples this function to the USE_UPSTREAM_TUNNEL symbol rather than to the kernel version. This is not a bug fix; the patch helps code readability. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
* datapath: compat: remove duplicate check.Pravin B Shelar2016-08-031-4/+0
| | | | | | | | The check for tunnel GSO packet is done at ip-handle-offloads. Remove same check from udp-handle-offloads. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
* datapath: compat: fix SKB_INIT_FILL_METADATA_DST definitionPravin B Shelar2016-08-031-2/+2
| | | | | | | | When OVS uses the compat fill metadata dst implementation, we need to set up a temporary dst. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
* datapath: compat: cleanup ip-tunnelsPravin B Shelar2016-08-031-8/+0
| | | | | | | Remove kernel version check related to unsupported kernel. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
* datapath: compat: Detect GSO support at ovs configurePravin B Shelar2016-08-034-8/+9
| | | | | | | | | | OVS statically turns on tunnel GSO for kernels older than 3.18. Some distribution kernels could backport tunnel GSO. To make use of device offload on such kernels, detect the support at configure stage. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
* compat: Properly handle fragment lru.Joe Stringer2016-08-013-0/+8
| | | | | | | | | | | In kernels <=3.16 there is an LRU for managing fragment queues for IPv4 and IPv6. Because the backport code comes from more recent upstream versions of Linux, this LRU management was missing from ip_frag_queue() and nf_ct_frag6_queue(). Fixes: 595e069a0634 ("compat: Backport IPv4 reassembly.") Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org>
* compat: Only call nf_defrag_ipv[46]_enable() once.Joe Stringer2016-08-012-16/+2
| | | | | | | | | | This function is just a dummy to ensure that the corresponding netfilter fragment module is loaded, to initialize the shared structures. But it doesn't need to be invoked once per namespace; one call per protocol should do the trick. Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org>
* compat: Remove inet_frag_evictor backport.Joe Stringer2016-08-011-9/+0
| | | | | | | Kernel 3.7 and lower are now unsupported, remove this fragment. Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org>
* compat: IPv6 fragmentation backport cleanups.Joe Stringer2016-08-011-36/+3
| | | | | | | | Remove a couple of functions that are available on all supported kernel versions. Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org>
* compat: Remove ip6_expire_frag_queue().Joe Stringer2016-08-013-127/+0
| | | | | | | | This was previously backported to fix issues with our inet_fragment backport; with that largely gone, we can get rid of this too. Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org>
* compat: Simplify inet_fragment backports.Joe Stringer2016-08-014-507/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The core fragmentation handling logic is exported on all supported kernels, so it's not necessary to backport the latest version of this. This greatly simplifies the code due to inconsistencies between the old per-lookup garbage collection and the newer workqueue based garbage collection. As a result of simplifying and removing unnecessary backport code, a few bugs are fixed for corner cases such as when some fragments remain in the fragment cache when openvswitch is unloaded. Some backported ip functions need a little extra logic than what is seen on the latest code due to this, for instance on kernels <3.17: * Call inet_frag_evictor() before defrag * Limit hashsize in ip{,6}_fragment logic The pernet init/exit logic also differs a little from upstream. Upstream ipv[46]_defrag logic initializes the various pernet fragment parameters and its own global fragments cache. In the OVS backport, the pernet parameters are shared while the fragments cache is separate. The backport relies upon upstream pernet initialization to perform the shared setup, and performs no pernet initialization of its own. When it comes to pernet exit however, the backport must ensure that all OVS-specific fragment state is cleared, while the shared state remains untouched so that the regular ipv[46] logic may do its own cleanup. In practice this means that OVS must have its own divergent implementation of inet_frags_exit_net(). Fixes the following crash: Call Trace: <IRQ> [<ffffffff810744f6>] ? call_timer_fn+0x36/0x100 [<ffffffff8107548f>] run_timer_softirq+0x1ef/0x2f0 [<ffffffff8106cccc>] __do_softirq+0xec/0x2c0 [<ffffffff8106d215>] irq_exit+0x105/0x110 [<ffffffff81737095>] smp_apic_timer_interrupt+0x45/0x60 [<ffffffff81735a1d>] apic_timer_interrupt+0x6d/0x80 <EOI> [<ffffffff8104f596>] ? native_safe_halt+0x6/0x10 [<ffffffff8101cb2f>] default_idle+0x1f/0xc0 [<ffffffff8101d406>] arch_cpu_idle+0x26/0x30 [<ffffffff810bf3a5>] cpu_startup_entry+0xc5/0x290 [<ffffffff810415ed>] start_secondary+0x21d/0x2d0 Code: Bad RIP value. RIP [<ffffffffa0177480>] 0xffffffffa0177480 RSP <ffff88003f703e78> CR2: ffffffffa0177480 ---[ end trace eb98ca80ba07bd9c ]--- Kernel panic - not syncing: Fatal exception in interrupt Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org>
* compat: Fix IPv6 frag expiry crash.Joe Stringer2016-08-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a user sends some fragments of an IPv6 message through OVS, but OVS fails to assemble the IPv6 message and the OVS module is then unloaded before the fragments expire, it could lead to a kernel panic like the following: Call Trace: <IRQ> [<ffffffff810e1919>] ? call_timer_fn+0x39/0x130 [<ffffffff810e31fe>] run_timer_softirq+0x20e/0x2c0 [<ffffffff8107dd0d>] __do_softirq+0xdd/0x290 [<ffffffff817c5bdc>] do_softirq_own_stack+0x1c/0x30 <EOI> [<ffffffff8107df5f>] do_softirq+0x4f/0x60 [<ffffffff8107dff5>] __local_bh_enable_ip+0x85/0x90 [<ffffffff8173994f>] inet_frags_exit_net+0x6f/0xc0 [<ffffffffc00c02a3>] nf_ct_net_exit+0x43/0x50 [nf_defrag_ipv6] [<ffffffff816ae528>] ops_exit_list.isra.4+0x38/0x60 [<ffffffff816ae656>] unregister_pernet_operations+0x96/0xe0 [<ffffffff816ae6c5>] unregister_pernet_subsys+0x25/0x40 [<ffffffffc00c1315>] nf_ct_frag6_cleanup+0x15/0x23 [nf_defrag_ipv6] [<ffffffffc00c133d>] nf_defrag_fini+0x1a/0xcdd [nf_defrag_ipv6] [<ffffffff810fbedd>] SyS_delete_module+0x18d/0x220 [<ffffffff817c40b2>] entry_SYSCALL_64_fastpath+0x16/0x75 Code: Bad RIP value. RIP [<ffffffffc030f990>] 0xffffffffc030f990 RSP <ffff88007a043e90> CR2: ffffffffc030f990 ---[ end trace 3bd8c1bbc4478fe2 ]--- Kernel panic - not syncing: Fatal exception in interrupt Fixes: 73b09aff14c7 ("compat: Backport IPv6 reassembly.") Reported-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org>
* datapath: Add support for kernel 4.6Pravin B Shelar2016-07-267-30/+45
| | | | | | | | | Most of the patch irons out the USE_UPSTREAM_TUNNEL case, where the datapath directly uses the upstream tunneling modules. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org> Acked-by: Amitabha Biswas <abiswas@us.ibm.com>
* datapath: compat: simplify ip_local_out().Pravin B Shelar2016-07-261-49/+33
| | | | | Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
* datapath: compat: unset skb encapsulation bitPravin B Shelar2016-07-261-0/+2
| | | | | | | | | | | | | | The OVS compat layer can handle tunnel GSO packets, but it keeps the skb encapsulation bit on for packets handled in GSO. This can confuse some NIC drivers. I have seen this issue on Intel devices: >>> i40e 0000:42:00.0: TX driver issue detected, PF reset issued The following patch resets this bit when the compat layer handles the packet. VMware-BZ: 1698877 Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>