path: root/datapath/flow.c
Commit message | Author | Age | Files | Lines
* Revert "datapath: Derive IP protocol number for IPv6 later frags" | Greg Rose | 2018-12-18 | 1 | -9/+13
  This reverts commit 2f748bf8016c ("datapath: Derive IP protocol...").

  This commit is causing some IPv6 fragmentation errors in some older kernels. Revert it for now; we can then determine how to implement this patch with the compatibility-layer changes needed to prevent errors on older kernels.

  CC: Yi-Hung Wei <yihung.wei@gmail.com>
  Signed-off-by: Greg Rose <gvrose8192@gmail.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* datapath: Derive IP protocol number for IPv6 later frags | Yi-Hung Wei | 2018-12-15 | 1 | -13/+9
  Upstream commit:
      commit fa642f08839bf2ff35b2f6c6a6c062aee8121ba8
      Author: Yi-Hung Wei <yihung.wei@gmail.com>
      Date: Tue Sep 4 15:33:41 2018 -0700

      openvswitch: Derive IP protocol number for IPv6 later frags

      Currently, OVS only parses the IP protocol number for the first IPv6 fragment, but sets the IP protocol number for the later fragments to NEXTHDR_FRAGMENT. This patch tries to derive the IP protocol number for IPv6 later fragments so that we can match on it.

      Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  CC: Yi-Hung Wei <yihung.wei@gmail.com>
  Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
  Signed-off-by: Greg Rose <gvrose8192@gmail.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
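  To make the idea of this patch concrete (note that it was later reverted, as the entry above records), here is a minimal userspace sketch: walk the IPv6 extension-header chain and, when the Fragment header carries a non-zero offset, report the Fragment header's own Next Header value instead of 44 (NEXTHDR_FRAGMENT). It parses a raw byte buffer rather than an skb and does not use the kernel's ipv6_skip_exthdr(); the function, struct, and enum names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IPv6 Next Header values (IANA protocol numbers). */
enum {
	NH_HOPOPTS  = 0,   /* Hop-by-Hop Options */
	NH_TCP      = 6,
	NH_UDP      = 17,
	NH_ROUTING  = 43,
	NH_FRAGMENT = 44,
	NH_DSTOPTS  = 60,  /* Destination Options */
};

struct l4_info {
	uint8_t  proto;     /* protocol number exposed for matching */
	uint16_t frag_off;  /* offset in 8-octet units; 0 if unfragmented or first fragment */
};

/* Walk the extension-header chain starting at buf[0], whose type is given by
 * `nexthdr` (the Next Header field of the fixed IPv6 header).
 * Returns 0 on success, -1 if a header runs past `len`. */
static int ipv6_l4_proto(const uint8_t *buf, size_t len, uint8_t nexthdr,
			 struct l4_info *out)
{
	size_t off = 0;

	assert(buf != NULL && out != NULL);
	out->frag_off = 0;
	for (;;) {
		if (nexthdr == NH_HOPOPTS || nexthdr == NH_ROUTING ||
		    nexthdr == NH_DSTOPTS) {
			if (off + 2 > len)
				return -1;
			/* Hdr Ext Len counts 8-octet units, not including the first 8. */
			size_t hdrlen = ((size_t)buf[off + 1] + 1) * 8;
			nexthdr = buf[off];
			off += hdrlen;
		} else if (nexthdr == NH_FRAGMENT) {
			if (off + 8 > len)
				return -1;
			/* Bytes 2-3: 13-bit offset, 2 reserved bits, M flag. */
			uint16_t fo = (uint16_t)(((buf[off + 2] << 8) | buf[off + 3]) >> 3);
			nexthdr = buf[off];  /* protocol of the original payload */
			off += 8;
			out->frag_off = fo;
			if (fo != 0)
				break;  /* later fragment: no upper-layer header follows */
		} else {
			break;  /* reached an upper-layer protocol */
		}
	}
	out->proto = nexthdr;
	return 0;
}
```

  For a later fragment the walk stops at the Fragment header, so the reported protocol is whatever that header says the reassembled payload carries; for a first fragment the walk simply continues past it.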
* datapath: Fix pop_vlan action for double tagged frames | Eric Garver | 2018-02-16 | 1 | -3/+12
  Upstream commit:
      commit c48e74736fccf25fb32bb015426359e1c2016e3b
      Author: Eric Garver <e@erig.me>
      Date: Wed Dec 20 15:09:22 2017 -0500

      openvswitch: Fix pop_vlan action for double tagged frames

      skb_vlan_pop() expects skb->protocol to be a valid TPID for double tagged frames. So set skb->protocol to the TPID and let skb_vlan_pop() shift the true ethertype into position for us.

      Fixes: 5108bbaddc37 ("openvswitch: add processing of L3 packets")
      Signed-off-by: Eric Garver <e@erig.me>
      Reviewed-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Cc: Eric Garver <e@erig.me>
  Fixes: a27c454ee0 ("datapath: add processing of L3 packets")
  Signed-off-by: Greg Rose <gvrose8192@gmail.com>
  Acked-by: Pravin B Shelar <pshelar@ovn.org>
* datapath: use ktime_get_ts64() instead of ktime_get_ts() | Arnd Bergmann | 2018-02-12 | 1 | -3/+4
  Upstream commit:
      commit 311af51dcb5629f04976a8e451673f77e3301041
      Author: Arnd Bergmann <arnd@arndb.de>
      Date: Mon Nov 27 12:41:38 2017 +0100

      openvswitch: use ktime_get_ts64() instead of ktime_get_ts()

      timespec is deprecated because of the y2038 overflow, so let's convert this one to ktime_get_ts64(). The code is already safe even on 32-bit architectures, since it uses monotonic times. On 64-bit architectures, nothing changes, while on 32-bit architectures this avoids one type conversion.

      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Added a compatibility check for whether ktime_get_ts64() exists. If it does not, continue using ktime_get_ts(). I added a new compatibility header file, "timekeeping.h".

  Cc: Arnd Bergmann <arnd@arndb.de>
  Signed-off-by: Greg Rose <gvrose8192@gmail.com>
  Acked-by: Pravin B Shelar <pshelar@ovn.org>
* datapath: enable NSH support | Yi Yang | 2018-02-07 | 1 | -0/+51
  Upstream commit:
      commit b2d0f5d5dc53532e6f07bc546a476a55ebdfe0f3
      Author: Yi Yang <yi.y.yang@intel.com>
      Date: Tue Nov 7 21:07:02 2017 +0800

      openvswitch: enable NSH support

      The OVS master and 2.8 branches have merged the NSH userspace patch series. This patch enables NSH support in the kernel datapath so that OVS can support NSH in compat mode by porting it.

      Signed-off-by: Yi Yang <yi.y.yang@intel.com>
      Acked-by: Jiri Benc <jbenc@redhat.com>
      Acked-by: Eric Garver <e@erig.me>
      Acked-by: Pravin Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Signed-off-by: Yi Yang <yi.y.yang@intel.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Reviewed-by: Greg Rose <gvrose8192@gmail.com>
* datapath: Optimize operations for OvS flow_stats. | Tonghao Zhang | 2017-09-22 | 1 | -3/+4
  Upstream commit:
      commit c4b2bf6b4a35348fe6d1eb06928eb68d7b9d99a9
      Author: Tonghao Zhang <xiangxia.m.yue@gmail.com>
      Date: Mon Jul 17 23:28:06 2017 -0700

      openvswitch: Optimize operations for OvS flow_stats.

      When flow_free() is called to free a flow, it calls cpumask_next() many times, once for every CPU in cpu_possible_mask (e.g. 128 by default). That consumes noticeable CPU time if flow_free() is called frequently: when all packets are sent to userspace via upcall, OvS sends them back via netlink to ovs_packet_cmd_execute(), which calls flow_free().

      The test topology is shown below. VM01 sends TCP packets to VM02, and OvS forwards the packets. During testing, we used perf to report system performance.

        VM01 --- OvS-VM --- VM02

      Without this patch, perf-top shows the following; flow_free() accounts for 3.02% of CPU usage:

        4.23% [kernel] [k] _raw_spin_unlock_irqrestore
        3.62% [kernel] [k] __do_softirq
        3.16% [kernel] [k] __memcpy
        3.02% [kernel] [k] flow_free
        2.42% libc-2.17.so [.] __memcpy_ssse3_back
        2.18% [kernel] [k] copy_user_generic_unrolled
        2.17% [kernel] [k] find_next_bit

      With this patch applied, perf-top shows the following; flow_free() no longer appears in the list:

        4.11% [kernel] [k] _raw_spin_unlock_irqrestore
        3.79% [kernel] [k] __do_softirq
        3.46% [kernel] [k] __memcpy
        2.73% libc-2.17.so [.] __memcpy_ssse3_back
        2.25% [kernel] [k] copy_user_generic_unrolled
        1.89% libc-2.17.so [.] _int_malloc
        1.53% ovs-vswitchd [.] xlate_actions

      With this patch, the TCP throughput between VMs (with neither the megaflow cache nor the microflow cache in use) increases from 1.18 Gb/s to 1.30 Gb/s, roughly a 10% improvement.

      This patch adds a cpumask, cpu_used_mask, which stores the IDs of the CPUs that the flow has used. Only the flow_stats of those CPUs are checked, so it is unnecessary to check every possible CPU when getting, clearing, and updating the flow_stats. Adding cpu_used_mask to struct sw_flow doesn't increase the number of cache lines.

      Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
  Signed-off-by: Andy Zhou <azhou@ovn.org>
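  The cpu_used_mask idea can be modeled in a few lines of userspace C. This is a sketch under stated assumptions, not the datapath code: it assumes at most 64 CPUs so the mask fits in one uint64_t (the kernel uses struct cpumask and cpumask_next()), it models no locking or per-CPU allocation, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CPUS 64  /* assumption: one uint64_t is enough for the CPU mask */

struct flow_stats {
	uint64_t packets;
	uint64_t bytes;
};

struct flow {
	uint64_t cpu_used_mask;             /* bit n set once stats[n] is written */
	struct flow_stats stats[MAX_CPUS];  /* one slot per possible CPU */
};

/* Count one packet on `cpu` and remember that this CPU touched the flow. */
static void flow_stats_update(struct flow *f, unsigned cpu, uint64_t len)
{
	assert(cpu < MAX_CPUS);
	f->cpu_used_mask |= UINT64_C(1) << cpu;
	f->stats[cpu].packets++;
	f->stats[cpu].bytes += len;
}

/* Sum the per-CPU slots, skipping every CPU whose bit is clear. */
static struct flow_stats flow_stats_get(const struct flow *f)
{
	struct flow_stats total = { 0, 0 };
	uint64_t mask = f->cpu_used_mask;

	for (unsigned cpu = 0; mask != 0; cpu++, mask >>= 1) {
		if (mask & 1) {
			total.packets += f->stats[cpu].packets;
			total.bytes += f->stats[cpu].bytes;
		}
	}
	return total;
}

/* Reset only the slots that were ever used. */
static void flow_stats_clear(struct flow *f)
{
	uint64_t mask = f->cpu_used_mask;

	for (unsigned cpu = 0; mask != 0; cpu++, mask >>= 1)
		if (mask & 1)
			f->stats[cpu] = (struct flow_stats){ 0, 0 };
}
```

  The cost of get and clear now scales with the CPUs that actually forwarded packets for the flow rather than with the number of possible CPUs, which is the effect the perf numbers above attribute to the patch.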
* datapath: Optimize updating for OvS flow_stats. | Tonghao Zhang | 2017-09-22 | 1 | -2/+1
  Upstream commit:
      commit c57c054eb5b1ccf230c49f736f7a018fcbc3e952
      Author: Tonghao Zhang <xiangxia.m.yue@gmail.com>
      Date: Mon Jul 17 23:28:05 2017 -0700

      openvswitch: Optimize updating for OvS flow_stats.

      In ovs_flow_stats_update(), the node variable is used only to allocate a flow_stats struct. Since that is not the common case, it is unnecessary to call numa_node_id() every time. This patch is not a bug fix, but it may give a small performance improvement.

      Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
  Signed-off-by: Andy Zhou <azhou@ovn.org>
* datapath: Remove all references to SKB_GSO_UDP. | Greg Rose | 2017-09-22 | 1 | -0/+6
  Upstream commit:
      commit 880388aa3c07fdea4f9b85e35641753017b1852f
      Author: David S. Miller <davem@davemloft.net>
      Date: Mon Jul 3 07:29:12 2017 -0700

      net: Remove all references to SKB_GSO_UDP.

      Such packets are no longer possible.

      Signed-off-by: David S. Miller <davem@davemloft.net>

  SKB_GSO_UDP has been removed from the upstream kernel. Use the HAVE_SKB_GSO_UDP define from acinclude to detect whether SKB_GSO_UDP exists and, if so, apply the openvswitch section of this upstream patch.

  Signed-off-by: Greg Rose <gvrose8192@gmail.com>
  Signed-off-by: Andy Zhou <azhou@ovn.org>
* datapath: Fix ovs_flow_key_update() | Yi-Hung Wei | 2017-05-03 | 1 | -2/+8
  Upstream commit:
      commit 6f56f6186c18e3fd54122b73da68e870687b8c59
      Author: Yi-Hung Wei <yihung.wei@gmail.com>
      Date: Thu Mar 30 12:36:03 2017 -0700

      ovs_flow_key_update() is called when the flow key is invalid, and it is used to update and revalidate the flow key. Commit 329f45bc4f19 ("openvswitch: add mac_proto field to the flow key") introduces the mac_proto field to the flow key and uses it to determine whether the flow key is valid. However, that commit does not update the code path in ovs_flow_key_update() to revalidate the flow key, which may trigger the BUG_ON() in execute_recirc(). This patch addresses that issue.

      Fixes: 329f45bc4f19 ("openvswitch: add mac_proto field to the flow key")
      Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
      Acked-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
  Acked-by: Simon Horman <simon.horman@netronome.com>
  Signed-off-by: Simon Horman <simon.horman@netronome.com>
* datapath: set network header correctly on key extract | Yi-Hung Wei | 2017-05-03 | 1 | -8/+3
  Upstream commit:
      commit f7d49bce8e741e1e6aa14ce4db1b6cea7e4be4e8
      Author: Jiri Benc <jbenc@redhat.com>
      Date: Fri Sep 30 19:08:05 2016 +0200

      openvswitch: mpls: set network header correctly on key extract

      After 48d2ab609b6b ("net: mpls: Fixups for GSO"), MPLS handling in openvswitch was changed to have the network header point to the start of the MPLS headers and inner_network_header point after the MPLS headers.

      However, key_extract was missed by the mentioned commit, causing incorrect headers to be set when an MPLS packet first enters the bridge or after it is recirculated.

      Fixes: 48d2ab609b6b ("net: mpls: Fixups for GSO")
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
  Signed-off-by: Simon Horman <simon.horman@netronome.com>
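  Under the header layout described above, an untagged MPLS frame has its network header right after the 14-byte Ethernet header, and inner_network_header sits where the label stack ends. The sketch below finds that end by scanning for the bottom-of-stack (S) bit in each 4-byte label stack entry. It works on a raw byte buffer rather than an skb, is not the kernel's key_extract(), and the helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN 14  /* untagged Ethernet header */

/* Each MPLS label stack entry is 4 bytes:
 * label (20 bits) | traffic class (3) | bottom-of-stack S (1) | TTL (8).
 * Returns the offset just past the entry whose S bit is set, or -1 if the
 * stack is not terminated within `len` bytes. */
static long mpls_stack_end(const uint8_t *pkt, size_t len, size_t start)
{
	size_t off = start;

	assert(pkt != NULL);
	while (off + 4 <= len) {
		int bottom = pkt[off + 2] & 0x01;  /* S bit: lowest bit of byte 2 */
		off += 4;
		if (bottom)
			return (long)off;
	}
	return -1;
}
```

  Here `start` plays the role of the network header offset, and the returned value is where the inner (for example IPv4) header would begin.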
* datapath: Add original direction conntrack tuple to sw_flow_key. | Jarno Rajahalme | 2017-03-08 | 1 | -5/+29
  Upstream commit:
      commit 9dd7f8907c3705dc7a7a375d1c6e30b06e6daffc
      Author: Jarno Rajahalme <jarno@ovn.org>
      Date: Thu Feb 9 11:21:59 2017 -0800

      openvswitch: Add original direction conntrack tuple to sw_flow_key.

      Add the fields of the conntrack original direction 5-tuple to struct sw_flow_key. The new fields are initially marked as non-existent, and are populated whenever a conntrack action is executed and either finds or generates a conntrack entry. This means that these fields exist for all packets that were not rejected by conntrack as untrackable.

      The original tuple fields in the sw_flow_key are filled from the original direction tuple of the conntrack entry relating to the current packet, or from the original direction tuple of the master conntrack entry, if the current conntrack entry has a master. Generally, expected connections of connections having an assigned helper (e.g., FTP) have a master conntrack entry.

      The main purpose of the new conntrack original tuple fields is to allow matching on them for policy decision purposes, with the premise that the admissibility of tracked connections' reply packets (as well as original direction packets), and of both directions' packets of any related connections, may be based on ACL rules applying to the master connection's original direction 5-tuple. This also makes it easier to make policy decisions when the actual packet headers might have been transformed by NAT, as the original direction 5-tuple represents the packet headers before any such transformation.

      When using the original direction 5-tuple, the admissibility of return and/or related packets need not be based on the mere existence of a conntrack entry, allowing separation of admission policy from the established conntrack state. While existence of a conntrack entry is required for admission of the return or related packets, policy changes can cause connections that were initially admitted to be rejected or dropped afterwards. If the admission of the return and related packets was based on mere conntrack state (e.g., the connection being in an established state), a policy change that would make the connection rejected or dropped would need to find and delete all conntrack entries affected by such a change. When matching on the original direction 5-tuple, the affected conntrack entries can instead be allowed to time out, as the established state of the connection would no longer need to be the basis for packet admission.

      It should be noted that the directionality of related connections may be the same as or different from that of the master connection, and neither the original direction 5-tuple nor the conntrack state bits carry this information. If needed, the directionality of the master connection can be stored in the master's conntrack mark or labels, which are automatically inherited by the expected related connections.

      The fact that neither ARP nor ND packets are trackable by conntrack allows mutual exclusion between ARP/ND and the new conntrack original tuple fields. Hence, the IP addresses are overlaid in union with ARP and ND fields. This keeps the sw_flow_key from growing much due to this patch, but it also means that we must be careful never to use the new key fields with ARP or ND packets. ARP is easy to distinguish and keep mutually exclusive based on the Ethernet type, but ND, being an ICMPv6 protocol, requires a bit more attention.

      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  This patch squashes in a minimal amount of OVS userspace code so as not to break the build. Later patches contain the full userspace support.

  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Acked-by: Joe Stringer <joe@ovn.org>
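  The union overlay mentioned in the last paragraph of the upstream message can be illustrated with a simplified IPv4-only key. This is a sketch, not the real struct sw_flow_key: the field and type names are illustrative, only the IPv4/ARP half is modeled, and the accessor's assert stands in for the "never use the new fields with ARP" rule.

```c
#include <assert.h>
#include <stdint.h>

#define ETH_P_ARP 0x0806

/* Conntrack never tracks ARP, so the original-direction addresses can
 * share storage with the ARP hardware addresses. */
union ct_or_arp {
	struct { uint32_t src, dst; } ct_orig;   /* 8 bytes */
	struct { uint8_t sha[6], tha[6]; } arp;  /* 12 bytes */
};

/* Simplified IPv4 portion of a flow key (illustrative names). */
struct ipv4_key {
	uint16_t eth_type;                   /* decides which union member is live */
	struct { uint32_t src, dst; } addr;  /* packet's own IP addresses */
	union ct_or_arp u;
};

/* Read the original-direction source address; meaningless for ARP keys. */
static uint32_t ct_orig_src(const struct ipv4_key *k)
{
	assert(k->eth_type != ETH_P_ARP);
	return k->u.ct_orig.src;
}
```

  Because the union is only as large as its biggest member, adding ct_orig costs no extra space here; the price is that the EtherType (and, for IPv6, the ICMPv6 type) must be checked before the fields are read.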
* datapath: netlink: support L3 packets | Yang, Yi Y | 2017-03-02 | 1 | -27/+27
  Upstream commit:
      commit 0a6410fbde597ebcf82dda4a0b0e889e82242678
      Author: Jiri Benc <jbenc@redhat.com>
      Date: Thu Nov 10 16:28:22 2016 +0100

      openvswitch: netlink: support L3 packets

      Extend the ovs flow netlink protocol to support L3 packets. Packets without the OVS_KEY_ATTR_ETHERNET attribute specify L3 packets; for those, the OVS_KEY_ATTR_ETHERTYPE attribute is mandatory.

      Push/pop vlan actions are only supported for Ethernet packets.

      Based on previous versions by Lorand Jakab and Simon Horman.

      Signed-off-by: Lorand Jakab <lojakab@cisco.com>
      Signed-off-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Upstream commit:
      commit 87e159c59d9f325d571689d4027115617adb32e6
      Author: Jarno Rajahalme <jarno@ovn.org>
      Date: Mon Dec 19 17:06:33 2016 -0800

      openvswitch: Add a missing break statement.

      Add a break statement to prevent fall-through from OVS_KEY_ATTR_ETHERNET to OVS_KEY_ATTR_TUNNEL. Without the break, actions setting Ethernet addresses fail to validate, with log messages complaining about invalid tunnel attributes.

      Fixes: 0a6410fbde ("openvswitch: netlink: support L3 packets")
      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Acked-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Upstream commit:
      commit df30f7408b187929dbde72661c7f7c615268f1d0
      Author: pravin shelar <pshelar@ovn.org>
      Date: Mon Dec 26 08:31:27 2016 -0800

      openvswitch: upcall: Fix vlan handling.

      The networking stack accelerates VLAN tag handling by keeping the topmost VLAN header in the skb. This works as long as the packet remains in the OVS datapath, but during an OVS upcall the VLAN header is pushed onto the packet. When such a packet is sent back to the OVS datapath, the core networking stack might not handle it correctly. The following patch avoids this issue by accelerating the VLAN tag during flow key extraction. This simplifies the datapath by bringing uniform packet processing to packets from all code paths.

      Fixes: 5108bbaddc ("openvswitch: add processing of L3 packets").
      CC: Jarno Rajahalme <jarno@ovn.org>
      CC: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  [Committer Notes]
  Squashed in the following upstream commits to retain bisectability:
      87e159c59d9f ("openvswitch: Add a missing break statement.")
      df30f7408b18 ("openvswitch: upcall: Fix vlan handling.")

  Signed-off-by: Yi Yang <yi.y.yang@intel.com>
  Signed-off-by: Joe Stringer <joe@ovn.org>
* datapath: add processing of L3 packets | Yang, Yi Y | 2017-03-02 | 1 | -22/+84
  Upstream commit:
      commit 5108bbaddc37c1c8583f0cf2562d7d3463cd12cb
      Author: Jiri Benc <jbenc@redhat.com>
      Date: Thu Nov 10 16:28:21 2016 +0100

      openvswitch: add processing of L3 packets

      Support receiving, extracting the flow key from, and sending L3 packets (packets without an Ethernet header).

      Note that even after this patch, non-Ethernet interfaces are still not allowed to be added to bridges. Similarly, the netlink interface for sending and receiving L3 packets to/from user space is not in place yet.

      Based on previous versions by Lorand Jakab and Simon Horman.

      Signed-off-by: Lorand Jakab <lojakab@cisco.com>
      Signed-off-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Signed-off-by: Yi Yang <yi.y.yang@intel.com>
  Signed-off-by: Joe Stringer <joe@ovn.org>
* datapath: add mac_proto field to the flow key | Yang, Yi Y | 2017-03-02 | 1 | -0/+1
  Upstream commit:
      commit 329f45bc4f191c663dc156c510816411a4310578
      Author: Jiri Benc <jbenc@redhat.com>
      Date: Thu Nov 10 16:28:18 2016 +0100

      openvswitch: add mac_proto field to the flow key

      Use a hole in the structure. We support only Ethernet so far and will add support for L2-less packets shortly. We could use a bool to indicate whether the Ethernet header is present or not, but the approach with the mac_proto field is more generic and occupies the same number of bytes in the struct, while allowing later extensibility. It also makes the code in the next patches more self-explanatory.

      It would be nice to use ARPHRD_ constants, but those are u16, which would be wasteful. Thus define our own constants.

      Another upside of this is that we can overload this new field to also denote whether the flow key is valid. This has the advantage that on refragmentation, we don't have to reparse the packet but can rely on the stored eth.type. This is especially important for the next patches in this series: instead of adding another branch for L2-less packets before calling ovs_fragment, we can just remove all those branches completely.

      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Signed-off-by: Yi Yang <yi.y.yang@intel.com>
  Signed-off-by: Joe Stringer <joe@ovn.org>
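  The "overload mac_proto to mark the key invalid" idea, together with the revalidation step that the later "Fix ovs_flow_key_update()" entry adds, can be sketched as follows. The constant values and names here are assumptions chosen for illustration (a spare high bit reused as the invalid flag); key_update() only stands in for re-parsing the packet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAC_PROTO_NONE     0
#define MAC_PROTO_ETHERNET 1
#define KEY_INVALID        0x80  /* assumed: spare high bit reused as "stale" */

struct flow_key {
	uint8_t  mac_proto;  /* MAC_PROTO_* in the low bits, KEY_INVALID on top */
	uint16_t eth_type;
};

static uint8_t key_mac_proto(const struct flow_key *k)
{
	return (uint8_t)(k->mac_proto & ~KEY_INVALID);
}

static bool key_is_valid(const struct flow_key *k)
{
	return (k->mac_proto & KEY_INVALID) == 0;
}

/* Mark the key stale, e.g. after an action rewrote the packet headers. */
static void key_invalidate(struct flow_key *k)
{
	k->mac_proto |= KEY_INVALID;
}

/* Stand-in for re-parsing the packet: store the freshly parsed fields and
 * clear KEY_INVALID. Forgetting this last step leaves the key marked
 * invalid even though it has been rebuilt. */
static void key_update(struct flow_key *k, uint16_t parsed_eth_type)
{
	k->eth_type = parsed_eth_type;
	k->mac_proto &= (uint8_t)~KEY_INVALID;
}
```

  Because the MAC protocol survives in the low bits while the key is invalid, code such as refragmentation can still read the stored values without reparsing, which is the advantage the message describes.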
* datapath: avoid resetting flow key while installing new flow. | pravin shelar | 2017-03-02 | 1 | -2/+0
  Upstream commit:
      commit 2279994d07ab67ff7a1d09bfbd65588332dfb6d8
      Author: pravin shelar <pshelar@ovn.org>
      Date: Mon Sep 19 13:51:00 2016 -0700

      openvswitch: avoid resetting flow key while installing new flow.

      Since commit db74a3335e0f6 ("openvswitch: use percpu flow stats"), flow alloc resets the flow key. So there is no need to reset the flow key again if OVS is using a newly allocated flow key.

      Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Signed-off-by: Joe Stringer <joe@ovn.org>
* datapath: use percpu flow stats | Thadeu Lima de Souza Cascardo | 2017-03-02 | 1 | -20/+22
  Upstream commit:
      commit db74a3335e0f645e3139c80bcfc90feb01d8e304
      Author: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
      Date: Thu Sep 15 19:11:53 2016 -0300

      openvswitch: use percpu flow stats

      Instead of keeping flow stats per NUMA node, keep them per CPU. When using megaflows, the stats lock can be a scalability bottleneck.

      On an E5-2690 12-core system, usual throughput went from ~4Mpps to ~15Mpps when forwarding between two 40GbE ports with a single flow configured on the datapath.

      This has been tested on a system with possible CPUs 0-7,16-23. After module removal, there was no corruption of the slab cache.

      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
      Cc: pravin shelar <pshelar@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Signed-off-by: Joe Stringer <joe@ovn.org>
* datapath: fix flow stats accounting when node 0 is not possible | Thadeu Lima de Souza Cascardo | 2017-03-02 | 1 | -2/+4
  Upstream commit:
      commit 40773966ccf1985a1b2bb570a03cbeaf1cbd4e00
      Author: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
      Date: Thu Sep 15 19:11:52 2016 -0300

      openvswitch: fix flow stats accounting when node 0 is not possible

      On a system with only node 1 as possible, all statistics are going to be accounted on node 0, as it will have a single writer.

      However, when getting and clearing the statistics, node 0 is not going to be considered, as it's not a possible node.

      Tested that statistics are not zero on a system with only node 1 possible. Also compile-tested with CONFIG_NUMA off.

      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  This patch contained a memory leak that is fixed in this backport. The next patch silently fixed it upstream, too.

  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Signed-off-by: Joe Stringer <joe@ovn.org>
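  One way to express "always consider slot 0, then every possible node" is a loop that starts at 0 and then advances with a next-possible-node helper, instead of iterating only over possible nodes. The sketch below models that loop shape in userspace; MAX_NODES, the bitmask representation, and the helper names are assumptions for illustration, not the datapath's code.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 8  /* assumption for this sketch */

/* Next set bit in `possible` after `n`, or MAX_NODES if none
 * (a stand-in for the kernel's next-node iteration). */
static unsigned next_possible_node(unsigned n, uint32_t possible)
{
	for (n++; n < MAX_NODES; n++)
		if (possible & (1u << n))
			return n;
	return MAX_NODES;
}

/* Sum per-node counters. The loop starts at 0 rather than at the first
 * possible node, so slot 0 (where the first writer accounts its stats)
 * is read even when node 0 itself is not possible. */
static uint64_t stats_sum(const uint64_t stats[MAX_NODES], uint32_t possible)
{
	uint64_t total = 0;

	assert(stats != 0);
	for (unsigned node = 0; node < MAX_NODES;
	     node = next_possible_node(node, possible))
		total += stats[node];
	return total;
}
```

  With only node 1 possible, a loop over possible nodes alone would read stats[1] and miss everything accounted in stats[0]; this shape reads both.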
* datapath: 802.1AD Flow handling, actions, vlan parsing, netlink attributes | Yang, Yi Y | 2017-03-02 | 1 | -18/+47
  Upstream commit:
      commit 018c1dda5ff1e7bd1fe2d9fd1d0f5b82dc6fc0cd
      Author: Eric Garver <e@erig.me>
      Date: Wed Sep 7 12:56:59 2016 -0400

      openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes

      Add support for 802.1ad, including the ability to push and pop double tagged VLANs. Add support for 802.1ad to netlink parsing and flow conversion. Double nested encap attributes are used to represent a double tagged VLAN. The inner TPID is encoded along with the ctci in nested attributes.

      This is based on Thomas F Herbert's original v20 patch. I made some small cleanups and bug fixes.

      Signed-off-by: Thomas F Herbert <thomasfherbert@gmail.com>
      Signed-off-by: Eric Garver <e@erig.me>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Upstream commit:
      commit 20ecf1e4e30005ad50f561a92c888b6477f99341
      Author: Jiri Benc <jbenc@redhat.com>
      Date: Mon Oct 10 17:02:42 2016 +0200

      openvswitch: vlan: remove wrong likely statement

      This code is called whenever the flow key is being extracted from the packet. The packet is as likely to be VLAN tagged as not.

      Fixes: 018c1dda5ff1 ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes")
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Acked-by: Eric Garver <e@erig.me>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Upstream commit:
      commit 72ec108d701506fa6cd2f66ec5b15ea71df3c464
      Author: Jiri Benc <jbenc@redhat.com>
      Date: Mon Oct 10 17:02:43 2016 +0200

      openvswitch: fix vlan subtraction from packet length

      When the packet has its VLAN tag in skb->vlan_tci, the length of the VLAN header is not counted in skb->len. It doesn't make sense to subtract it.

      Fixes: 018c1dda5ff1 ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes")
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Acked-by: Eric Garver <e@erig.me>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  [Committer notes]
  The following upstream commits fix bugs in this patch, so to retain bisectability of the OVS tree they were rolled into this commit:
      20ecf1e4e300 openvswitch: vlan: remove wrong likely statement
      72ec108d7015 openvswitch: fix vlan subtraction from packet length

  Signed-off-by: Yi Yang <yi.y.yang@intel.com>
  Acked-by: Eric Garver <e@erig.me>
  Signed-off-by: Joe Stringer <joe@ovn.org>
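  To illustrate what "double tagged" means on the wire, the following sketch parses up to two in-band VLAN tags (802.1ad S-TAG 0x88A8 and/or 802.1Q C-TAG 0x8100) that follow the MAC addresses, recording each tag's TPID alongside its TCI. It is an illustration over a raw frame buffer only: it ignores the case where the kernel holds the outer tag in skb->vlan_tci, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_P_8021Q  0x8100  /* customer tag (C-TAG) */
#define ETH_P_8021AD 0x88A8  /* service tag (S-TAG) */

struct vlan_info {
	int      ntags;     /* 0, 1 or 2 */
	uint16_t tpid[2];   /* outermost first */
	uint16_t tci[2];
	uint16_t eth_type;  /* EtherType following the tags */
};

static uint16_t rd16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Parse at most two VLAN tags after the destination and source MACs.
 * Returns 0 on success or -1 if the frame is truncated. */
static int parse_vlans(const uint8_t *frame, size_t len, struct vlan_info *v)
{
	size_t off = 12;  /* skip the two 6-byte MAC addresses */

	assert(frame != NULL && v != NULL);
	v->ntags = 0;
	for (;;) {
		if (off + 2 > len)
			return -1;
		uint16_t t = rd16(frame + off);
		if (v->ntags == 2 || (t != ETH_P_8021Q && t != ETH_P_8021AD)) {
			v->eth_type = t;  /* first non-tag value is the EtherType */
			return 0;
		}
		if (off + 4 > len)
			return -1;
		v->tpid[v->ntags] = t;
		v->tci[v->ntags] = rd16(frame + off + 2);
		v->ntags++;
		off += 4;
	}
}
```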
* datapath: Move key memset to ovs_flow_key_extract_userspace() | Pravin B Shelar | 2016-07-17 | 1 | -0/+2
  Synchronize code with upstream ovs_nla_get_flow_metadata().

  Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
  Acked-by: Jesse Gross <jesse@kernel.org>
* datapath: backport: retain parsed IPv6 header fields in flow on error skipping extension headers | Pravin B Shelar | 2016-07-17 | 1 | -6/+15
  Upstream commit:
      commit c30da497893718abc6cec4f1d34d35875200edee
      Author: Simon Horman <simon.horman@netronome.com>

      openvswitch: retain parsed IPv6 header fields in flow on error skipping extension headers

      When an error occurs skipping IPv6 extension headers, retain the already parsed IP protocol and IPv6 addresses in the flow. Also assume that the packet is not a fragment in the absence of information to the contrary; that is, always use the frag_off value set by ipv6_skip_exthdr().

      This allows matching on the IP protocol and IPv6 addresses of packets with malformed extension headers.

      Signed-off-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
  Acked-by: Jesse Gross <jesse@kernel.org>
* datapath: Add support for IPv6 tunnels. | Pravin B Shelar | 2016-07-08 | 1 | -4/+3
  Mostly backports the upstream commit below, along with other pieces needed to make IPv6 tunneling work.

      commit 6b26ba3a7d952e611dcde1f3f77ce63bcc70540a
      Author: Jiri Benc <jbenc@redhat.com>

      openvswitch: netlink attributes for IPv6 tunneling

      Add netlink attributes for IPv6 tunnel addresses. This enables IPv6 support for tunnels.

      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
  Acked-by: Jesse Gross <jesse@kernel.org>
* datapath: Drop support for kernel older than 3.10 | Pravin B Shelar | 2016-03-14 | 1 | -2/+1
  Currently, the OVS out-of-tree datapath supports a large number of kernel versions: from 2.6.32 to 4.3, plus various distribution-specific kernels. But at this point major features are only available on more recent kernels. For example, stateful services are only available starting with kernel 3.10, and STT is available starting with 3.5.

  Since these features are becoming essential to many OVS deployments, and the effort of maintaining the backports is high, we have decided to drop support for older kernels. The following patch drops support for kernels older than 3.10.

  Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
  Acked-by: Jesse Gross <jesse@kernel.org>
* datapath: Allow matching on conntrack label | Joe Stringer | 2015-12-03 | 1 | -2/+2
  Allow matching and setting the ct_label field. As with ct_mark, this is populated by executing the CT action. The label field may be modified by specifying a label and mask nested under the CT action. It is stored as metadata attached to the connection. Label modification occurs after lookup, and will only persist when the conntrack entry is committed by providing the COMMIT flag to the CT action. Labels are currently fixed to 128 bits in size.

  Upstream: c2ac667 "openvswitch: Allow matching on conntrack label"

  Signed-off-by: Joe Stringer <joestringer@nicira.com>
  Acked-by: Pravin B Shelar <pshelar@nicira.com>
* datapath: Add conntrack action | Joe Stringer | 2015-12-03 | 1 | -0/+2
  Expose the kernel connection tracker via OVS. Userspace components can make use of the CT action to populate the connection state (ct_state) field for a flow. This state can be subsequently matched.

  Exposed connection states are OVS_CS_F_*:
  - NEW (0x01) - Beginning of a new connection.
  - ESTABLISHED (0x02) - Part of an existing connection.
  - RELATED (0x04) - Related to an established connection.
  - INVALID (0x20) - Could not track the connection for this packet.
  - REPLY_DIR (0x40) - This packet is in the reply direction for the flow.
  - TRACKED (0x80) - This packet has been sent through conntrack.

  When the CT action is executed by itself, it will send the packet through the connection tracker and populate the ct_state field with one or more of the connection state flags above. The CT action will always set the TRACKED bit.

  When the COMMIT flag is passed to the conntrack action, this specifies that information about the connection should be stored. This allows subsequent packets for the same (or related) connections to be correlated with this connection. Sending subsequent packets for the connection through conntrack allows the connection tracker to consider the packets as ESTABLISHED, RELATED, and/or REPLY_DIR.

  The CT action may optionally take a zone to track the flow within. This allows connections with the same 5-tuple to be kept logically separate from connections in other zones. If the zone is specified, then the "ct_zone" match field will be subsequently populated with the zone id.

  IP fragments are handled by transparently reassembling them as part of the CT action. The maximum receive unit (MRU) size is tracked so that refragmentation can occur during output.

  IP frag handling contributed by Andy Zhou.

  Based on original design by Justin Pettit.

  Upstream: 7f8a436 "openvswitch: Add conntrack action"

  Signed-off-by: Joe Stringer <joestringer@nicira.com>
  Signed-off-by: Justin Pettit <jpettit@nicira.com>
  Signed-off-by: Andy Zhou <azhou@nicira.com>
  Acked-by: Pravin B Shelar <pshelar@nicira.com>
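  Since ct_state is a bitmap, a flow matches it with a value and a mask: masked bits must equal the wanted value, and all other bits are wildcarded. The sketch below shows that check with the flag values copied from the list above; the CS_* names are hypothetical, and the OVS_CS_F_* macros in the kernel's uapi openvswitch.h are authoritative and should be used in real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values copied from the list above; prefer the OVS_CS_F_* macros
 * from the kernel's uapi openvswitch.h in real code. */
#define CS_NEW         0x01
#define CS_ESTABLISHED 0x02
#define CS_RELATED     0x04
#define CS_INVALID     0x20
#define CS_REPLY_DIR   0x40
#define CS_TRACKED     0x80

/* Masked match: bits in `mask` must equal the corresponding bits in `want`;
 * bits outside `mask` are wildcarded. */
static bool ct_state_matches(uint8_t state, uint8_t want, uint8_t mask)
{
	assert((want & ~mask) == 0);  /* wanted bits must be covered by the mask */
	return (state & mask) == want;
}
```

  A rule meaning "tracked and established, but not new" would use want = CS_TRACKED | CS_ESTABLISHED and mask = CS_TRACKED | CS_ESTABLISHED | CS_NEW; a rule for packets not yet sent through conntrack would use want = 0 and mask = CS_TRACKED.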
* datapath: Add support for lwtunnel | Pravin B Shelar | 2015-12-03 | 1 | -7/+9
  The following patch adds support for lwtunnel to the OVS datapath. With this change, the OVS datapath detects lwtunnel support and makes use of the new APIs if they are available. On older kernels where the support is not there, the backported tunnel modules are used. These backported tunnel devices act as lwtunnel devices. I tried to keep the backported modules the same as upstream for easier bug-fix backports. Since STT and LISP are not upstream, OVS always needs to use the respective modules from the tunnel compat layer. To make this work on kernel 4.3, I have converted the STT and LISP modules to the lwtunnel API model.

  lwtunnel makes use of skb-dst to pass tunnel information to the tunnel module. On older kernels this is not possible. So in the case of an old kernel, the metadata ref is stored in OVS_CB, and a direct call to the tunnel transmit function is made by the respective tunnel vport modules. Similarly, on the receive side, tunnel recv directly calls netdev-vport-receive to pass the skb to OVS.

  Major backported components include: Geneve, GRE, VXLAN, ip_tunnel, udp-tunnels GRO.

  Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
  Acked-by: Joe Stringer <joe@ovn.org>
  Acked-by: Jesse Gross <jesse@kernel.org>
* datapath: Add support for 4.1 kernel. | Joe Stringer | 2015-09-18 | 1 | -1/+3
  Signed-off-by: Joe Stringer <joestringer@nicira.com>
  Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
  Acked-by: Jesse Gross <jesse@nicira.com>
* datapath: Use eth_proto_is_802_3. (Alexander Duyck, 2015-07-30, 1 file, -2/+2 lines)
Replace "ntohs(proto) >= ETH_P_802_3_MIN" with eth_proto_is_802_3(proto).

Backport of upstream commit 6713fc9b8fa33444aa000f0f31076f6a859ccb34: "openvswitch: Use eth_proto_is_802_3"

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
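The replacement works because an EtherType is at least ETH_P_802_3_MIN (0x0600) exactly when its high-order byte is at least 0x06, so the test can be done on the network-order value by masking instead of byte-swapping. The userspace sketch below models that idea and compares it with the original ntohs() form. ex_proto_is_802_3() is a hypothetical stand-in, not the kernel helper; it applies the mask unconditionally (a no-op difference on big-endian hosts) rather than guarding it by endianness.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Smallest EtherType value; smaller values are 802.3 length fields. */
#define EX_ETH_P_802_3_MIN 0x0600

/* Original form: convert to host order, then compare. */
static bool ex_proto_is_802_3_swap(uint16_t proto_be)
{
    return ntohs(proto_be) >= EX_ETH_P_802_3_MIN;
}

/* Swap-free form: keep only the EtherType's high-order byte, wherever it
 * sits in the host's view of the network-order value, and compare it with
 * the constant transformed the same way. The low byte can be ignored
 * because the low byte of 0x0600 is zero. */
static bool ex_proto_is_802_3(uint16_t proto_be)
{
    uint16_t high_byte = proto_be & htons(0xFF00);
    return high_byte >= htons(EX_ETH_P_802_3_MIN);
}
```

Exhaustively comparing both forms over all 65536 values is a cheap way to convince yourself the masked version is equivalent on a given host.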
* datapath: Rename GENEVE_TUN_OPTS() to TUN_METADATA_OPTS() (Thomas Graf, 2015-02-03, 1 file, -1/+1 lines)
Backport of upstream commit: openvswitch: Rename GENEVE_TUN_OPTS() to TUN_METADATA_OPTS()

Also factors out the Geneve validation code into a new separate function, validate_and_copy_geneve_opts().

A subsequent patch will introduce VXLAN options. Rename the existing GENEVE_TUN_OPTS() to reflect its extended purpose of carrying generic tunnel metadata options.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream: d91641d ("openvswitch: Rename GENEVE_TUN_OPTS() to TUN_METADATA_OPTS()")
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
* datapath: Account for "rename vlan_tx_* helpers since "tx" is misleading there" (Thomas Graf, 2015-02-03, 1 file, -2/+2 lines)
Upstream commit: net: rename vlan_tx_* helpers since "tx" is misleading there

The same macros are used for rx as well, so rename them.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upstream: df8a39d ("net: rename vlan_tx_* helpers since "tx" is misleading there")
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
* datapath: Consistently include VLAN header in flow and port stats. (Ben Pfaff, 2015-01-06, 1 file, -2/+3 lines)
Until now, when VLAN acceleration was in use, the bytes of the VLAN header were not included in port or flow byte counters. They were, however, included when VLAN acceleration was not used. This commit corrects the inconsistency by always including the VLAN header in byte counters.

Previous discussion at http://openvswitch.org/pipermail/dev/2014-December/049521.html

Already committed to upstream Linux netdev tree as 24cc59d1ebaac54d933dc0b30abcd8bd86193eef.

Reported-by: Motonori Shindo <mshindo@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Reviewed-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
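With VLAN acceleration the 802.1Q tag is held in packet metadata instead of the packet data, so the data length comes up 4 bytes (VLAN_HLEN) short. A minimal sketch of consistent accounting, assuming a simplified ex_pkt structure whose fields are hypothetical stand-ins for the relevant parts of struct sk_buff:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_VLAN_HLEN 4 /* size of an 802.1Q tag: TPID + TCI */

/* Hypothetical, simplified view of the packet for byte accounting. */
struct ex_pkt {
    uint32_t data_len;     /* bytes present in the packet buffer */
    bool vlan_tag_present; /* tag was stripped into metadata (accelerated) */
};

struct ex_stats {
    uint64_t packets;
    uint64_t bytes;
};

/* Count bytes as if the tag were inline in the data, so the result is the
 * same whether or not VLAN acceleration stripped the tag. */
static void ex_stats_update(struct ex_stats *s, const struct ex_pkt *p)
{
    s->packets++;
    s->bytes += p->data_len + (p->vlan_tag_present ? EX_VLAN_HLEN : 0);
}
```

An accelerated 60-byte packet and the same frame with its 4-byte tag left inline (64 bytes) now contribute the same byte count.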
* datapath: fix coding style. (Pravin B Shelar, 2014-11-09, 1 file, -8/+7 lines)
Kernel datapath code has diverged from upstream code. This makes porting patches between these two code bases harder than it needs to be. The following patch fixes this by fixing coding style issues on this branch.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
* datapath: Fix few mpls issues. (Pravin B Shelar, 2014-11-09, 1 file, -2/+3 lines)
Found during MPLS upstreaming. Also sync up MPLS header files with upstream code.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
* datapath: Fix comment style. (Pravin B Shelar, 2014-10-23, 1 file, -2/+4 lines)
Use netdev comment style.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
* datapath: fix a use after free (Li RongQing, 2014-10-17, 1 file, -5/+6 lines)
pskb_may_pull(), called by arphdr_ok(), can change skb->data, so move the ARP setting after arphdr_ok() to avoid using freed memory.

Fixes: 0714812134d7d ("openvswitch: Eliminate memset() from flow_extract.")
Cc: Jesse Gross <jesse@nicira.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
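The bug class here is holding a pointer into a packet buffer across a call that may reallocate it. The userspace sketch below shows the safe ordering: pull first, then derive the header pointer from the (possibly new) data pointer. ex_may_pull() is a hypothetical stand-in that, like pskb_may_pull(), may move the data to a new allocation; the ex_pkt layout is simplified and assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified packet: the buffer always holds 'total' bytes,
 * but only the first 'linear' bytes are treated as directly readable. */
struct ex_pkt {
    unsigned char *data;
    size_t linear;
    size_t total;
};

/* Make at least 'len' bytes readable. Like pskb_may_pull(), this may
 * replace pkt->data with a new buffer, so old pointers into it dangle. */
static bool ex_may_pull(struct ex_pkt *pkt, size_t len)
{
    if (len > pkt->total)
        return false;              /* packet too short */
    if (len <= pkt->linear)
        return true;               /* already readable, data unchanged */
    unsigned char *moved = malloc(pkt->total);
    if (!moved)
        return false;
    memcpy(moved, pkt->data, pkt->total);
    free(pkt->data);               /* anything pointing here is now stale */
    pkt->data = moved;
    pkt->linear = pkt->total;
    return true;
}

/* Return the ARP opcode (bytes 6-7 of the ARP header, network order),
 * or -1 if the packet is too short. */
static int ex_arp_opcode(struct ex_pkt *pkt, size_t arp_off)
{
    if (!ex_may_pull(pkt, arp_off + 8))
        return -1;
    /* Taken only after the pull, so it points into the current buffer. */
    const unsigned char *arp = pkt->data + arp_off;
    return (arp[6] << 8) | arp[7];
}
```

Computing `arp` before calling ex_may_pull() would reproduce the bug: the pointer would refer to the freed buffer whenever the pull had to move the data.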
* datapath: Add support for OVS_FLOW_ATTR_PROBE. (Jarno Rajahalme, 2014-10-03, 1 file, -2/+2 lines)
This new flag is useful for suppressing error logging while probing for datapath features using flow commands. For backwards compatibility reasons the commands are executed normally, but error logging is suppressed.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
* datapath: Constify various function arguments (Thomas Graf, 2014-09-23, 1 file, -1/+1 lines)
Helps the compiler produce better-optimized code.

Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
* datapath: Remove pkt_key from OVS_CB. (Pravin B Shelar, 2014-09-20, 1 file, -1/+0 lines)
OVS keeps a pointer to the packet key in skb->cb, but the packet key is stored on the stack. This makes the code a bit tricky, so it is better to get rid of the pointer.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
* datapath: Replace rcu_dereference() with rcu_access_pointer() (Andreea-Cristina Bernat, 2014-09-12, 1 file, -1/+1 lines)
The "rcu_dereference()" call is used directly in a condition. Since its return value is never dereferenced, it is recommended to use "rcu_access_pointer()" instead of "rcu_dereference()". Therefore, this patch makes the replacement.

The following Coccinelle semantic patch was used:

@@
@@

(
 if(
 (<+...
- rcu_dereference
+ rcu_access_pointer
 (...)
 ...+>)) {...}
|
 while(
 (<+...
- rcu_dereference
+ rcu_access_pointer
 (...)
 ...+>)) {...}
)

Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
* datapath: Update flow key before recirc (Andy Zhou, 2014-08-12, 1 file, -10/+5 lines)
When the flow key becomes invalid due to push or pop actions, the current implementation leaves it invalid and only rebuilds the flow key used for recirculation. This works, but it is less efficient in the case of multiple recirc actions: each recirc action has to re-extract its own flow key.

This patch updates the original flow key as soon as the first recirc action is encountered, avoiding the expensive flow extract call for any future recirc actions as long as the flow key remains valid.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
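The change amounts to caching the extracted key and refreshing it only after an action has invalidated it. A minimal sketch of that pattern, assuming a hypothetical ex_ctx that tracks key validity and counts how often the expensive extraction runs; none of these names come from the datapath:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-packet context. */
struct ex_ctx {
    bool key_valid;    /* does the cached key still describe the headers? */
    int key_version;   /* stand-in for the extracted flow key contents */
    int extract_calls; /* how many times the expensive extract ran */
};

/* Stand-in for the expensive flow extraction. */
static void ex_flow_extract(struct ex_ctx *c)
{
    c->extract_calls++;
    c->key_version++;
    c->key_valid = true;
}

/* push/pop actions change the header layout, so the cached key goes stale. */
static void ex_push_or_pop(struct ex_ctx *c)
{
    c->key_valid = false;
}

/* Refresh the original key in place on the first recirc after a change;
 * later recircs reuse it while it stays valid. */
static int ex_recirc(struct ex_ctx *c)
{
    if (!c->key_valid)
        ex_flow_extract(c);
    return c->key_version;
}
```

With this pattern, one pop followed by three recirc actions costs a single extraction, where re-extracting a private copy per recirc would cost three.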
* datapath: Use tun_info only for egress tunnel path. (Pravin B Shelar, 2014-08-06, 1 file, -5/+5 lines)
Currently tun_info is used for passing tunnel information on both the ingress and egress paths, which causes confusion. The following patch removes its use on the ingress path, making it an egress-only parameter.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
* datapath: Avoid using wrong metadata for recic action. (Pravin B Shelar, 2014-08-06, 1 file, -0/+10 lines)
The recirc action needs to extract the flow key from the packet, and it uses tun_info from OVS_CB to set the tunnel metadata in the flow key. But tun_info can be overwritten by a tunnel send action, which would result in a wrong flow key for the recirculation. The following patch copies the flow-key metadata from the OVS_CB packet key itself, thus avoiding this bug.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
* datapath: refactor ovs flow extract API. (Pravin B Shelar, 2014-08-06, 1 file, -29/+53 lines)
OVS flow extract is called on both the packet receive and packet execute code paths. The following patch defines a separate API for extracting the flow key in the packet execute code path.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
* datapath: Add basic MPLS support to kernel (Simon Horman, 2014-06-24, 1 file, -0/+29 lines)
Allow the datapath to recognize and extract MPLS labels into flow keys and to execute actions which push, pop, and set labels on packets.

Based heavily on work by Leo Alterman, Ravi K, Isaku Yamahata and Joe Stringer.

Cc: Ravi K <rkerur@gmail.com>
Cc: Leo Alterman <lalterman@nicira.com>
Cc: Isaku Yamahata <yamahata@valinux.co.jp>
Cc: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jesse Gross <jesse@nicira.com>
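For reference, each MPLS label stack entry is a 32-bit big-endian word laid out as in RFC 3032: a 20-bit label, a 3-bit traffic class (formerly EXP), a 1-bit bottom-of-stack flag, and an 8-bit TTL. The userspace sketch below decodes and re-encodes one entry; the ex_mpls_lse type and helpers are hypothetical and are not the datapath's own structures.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded form of one label stack entry. */
struct ex_mpls_lse {
    uint32_t label; /* 20 bits */
    uint8_t tc;     /* 3 bits, traffic class */
    bool bos;       /* bottom of stack */
    uint8_t ttl;    /* 8 bits */
};

/* Decode an entry given in network byte order. */
static struct ex_mpls_lse ex_mpls_decode(uint32_t lse_be)
{
    uint32_t v = ntohl(lse_be);
    struct ex_mpls_lse e = {
        .label = v >> 12,
        .tc = (v >> 9) & 0x7,
        .bos = (v >> 8) & 0x1,
        .ttl = v & 0xFF,
    };
    return e;
}

/* Encode back to network byte order, masking each field to its width. */
static uint32_t ex_mpls_encode(struct ex_mpls_lse e)
{
    uint32_t v = ((e.label & 0xFFFFFu) << 12) |
                 ((uint32_t)(e.tc & 0x7) << 9) |
                 ((uint32_t)e.bos << 8) |
                 e.ttl;
    return htonl(v);
}
```

For example, label 16 with TC 0, bottom-of-stack set and TTL 64 is the word 0x00010140.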
* datapath: Add support for Geneve tunneling. (Jesse Gross, 2014-06-20, 1 file, -0/+10 lines)
This adds support for Geneve - Generic Network Virtualization Encapsulation. The protocol is documented at http://tools.ietf.org/html/draft-gross-geneve-00

The kernel implementation is completely agnostic to the options that are in use and can handle newly defined options without further work. It does this by simply matching on a byte array of options and allowing userspace to set up flows on this array.

Userspace currently implements support only for a basic version of Geneve. It can work with the base header (including the VNI) and is capable of parsing options, but does not currently support any particular option definitions. Over time, the intention is to allow options to be matched through OpenFlow without requiring explicit support in OVS userspace.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
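Treating options as an opaque byte array only requires knowing where they start and how long they are, which the fixed 8-byte base header provides. The sketch below parses that base header using the field layout of the published Geneve specification (2-bit version, 6-bit option length in 4-byte words, O and C bits, 16-bit protocol type, 24-bit VNI); the ex_geneve_hdr type and ex_geneve_parse() are hypothetical, not the kernel's geneve code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_GENEVE_BASE_LEN 8 /* fixed part of the header, in bytes */

/* Hypothetical parsed view of a Geneve header. */
struct ex_geneve_hdr {
    uint8_t ver;            /* 2 bits */
    uint8_t opt_len;        /* options length in 4-byte words (6 bits) */
    uint8_t oam;            /* O bit */
    uint8_t critical;       /* C bit */
    uint16_t protocol_type; /* EtherType of the inner payload */
    uint32_t vni;           /* 24-bit virtual network identifier */
    const uint8_t *opts;    /* options, kept as an opaque byte array */
    size_t opts_len;        /* options length in bytes */
};

/* Parse a header from buf (network byte order). Returns 0 on success,
 * -1 if buf is too short for the base header plus its options. */
static int ex_geneve_parse(const uint8_t *buf, size_t len, struct ex_geneve_hdr *h)
{
    if (len < EX_GENEVE_BASE_LEN)
        return -1;
    h->ver = buf[0] >> 6;
    h->opt_len = buf[0] & 0x3F;
    h->oam = (buf[1] >> 7) & 1;
    h->critical = (buf[1] >> 6) & 1;
    h->protocol_type = (uint16_t)((buf[2] << 8) | buf[3]);
    h->vni = ((uint32_t)buf[4] << 16) | ((uint32_t)buf[5] << 8) | buf[6];
    h->opts_len = (size_t)h->opt_len * 4;
    if (len < EX_GENEVE_BASE_LEN + h->opts_len)
        return -1;
    h->opts = buf + EX_GENEVE_BASE_LEN; /* matched as raw bytes, not interpreted */
    return 0;
}
```

Because `opts` is never interpreted, a newly defined option type needs no parser change here, which mirrors the option-agnostic design described above.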
* datapath: Wrap struct ovs_key_ipv4_tunnel in a new structure. (Jesse Gross, 2014-06-19, 1 file, -4/+7 lines)
Currently, the flow information that is matched for tunnels and the tunnel data passed around with packets are the same. However, as additional information is added, this is not necessarily desirable, as in the case of pointers.

This adds a new structure for tunnel metadata which currently contains only the existing struct. This change is purely internal to the kernel since the current OVS_KEY_ATTR_IPV4_TUNNEL is simply a compressed version of OVS_KEY_ATTR_TUNNEL that is translated at flow setup.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
* datapath: Eliminate memset() from flow_extract. (Jesse Gross, 2014-06-19, 1 file, -5/+44 lines)
As new protocols are added, the size of the flow key tends to increase, although few protocols care about all of the fields. In order to optimize this for hashing and matching, OVS uses a variable-length portion of the key. However, when fields are extracted from the packet we must still zero out the entire key.

This is no longer necessary now that OVS implements masking. Any fields (or holes in the structure) which are not part of a given protocol will by definition not be part of the mask and are zeroed out during lookup. Furthermore, since masking already uses variable-length keys, this zeroing operation automatically benefits as well.

In principle, the only thing that needs to be done at this point is to remove the memset() at the beginning of flow. However, some fields assume that they are initialized to zero, which now must be done explicitly. In addition, in the event of an error we must also zero out the corresponding fields to signal that no valid data is present. These changes increase the total amount of code, but very little of it is executed in non-error situations.

Removing the memset() reduces the profile of ovs_flow_extract() from 0.64% to 0.56% when tested with large packets on a 10G link.

Suggested-by: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
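The argument rests on masked lookup: the key compared against a flow is the packet key ANDed with the flow's mask, so bytes outside the mask, including stale data left in an un-zeroed key, never reach the comparison. A minimal sketch with hypothetical fixed-size byte-array keys (the real sw_flow_key and its variable-length range handling are not modeled):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EX_KEY_LEN 16 /* hypothetical key size */

/* dst = src & mask, byte by byte: anything outside the mask becomes zero. */
static void ex_mask_key(uint8_t dst[EX_KEY_LEN], const uint8_t src[EX_KEY_LEN],
                        const uint8_t mask[EX_KEY_LEN])
{
    for (size_t i = 0; i < EX_KEY_LEN; i++)
        dst[i] = src[i] & mask[i];
}

/* Does a packet key match a flow stored as (already-masked key, mask)? */
static bool ex_masked_match(const uint8_t pkt_key[EX_KEY_LEN],
                            const uint8_t flow_key[EX_KEY_LEN],
                            const uint8_t mask[EX_KEY_LEN])
{
    uint8_t tmp[EX_KEY_LEN];

    ex_mask_key(tmp, pkt_key, mask);
    return memcmp(tmp, flow_key, EX_KEY_LEN) == 0;
}
```

A packet key whose unused tail contains garbage matches exactly like one whose tail was zeroed, as long as the mask does not cover that tail.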
* datapath: Fix tracking of flags seen in TCP flows. (Ben Pfaff, 2014-04-08, 1 file, -2/+2 lines)
Flow statistics need to take into account the TCP flags from the packet currently being processed (in 'key'), not the TCP flags matched by the flow found in the kernel flow table (in 'flow').

This bug made the Open vSwitch userspace fin_timeout action have no effect in many cases.

Bug #1219516.
Reported-by: Len Gao <leng@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
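A minimal sketch of the corrected accumulation, with hypothetical structure and function names: the stats OR in the flags carried by each packet's own key. One plausible reading of why the old code broke fin_timeout is that a flow which does not match on TCP flags stores them masked to zero, so a FIN would never be recorded; that explanation is an inference, not something the commit states.

```c
#include <assert.h>
#include <stdint.h>

/* Standard TCP header flag bits. */
#define EX_TCP_FIN 0x01
#define EX_TCP_SYN 0x02
#define EX_TCP_ACK 0x10

/* Hypothetical per-flow statistics. */
struct ex_flow_stats {
    uint64_t packets;
    uint16_t tcp_flags; /* OR of flags seen in packets that hit this flow */
};

/* Accumulate the flags from the packet being processed (its key), not
 * from the flow entry that matched it. */
static void ex_flow_stats_update(struct ex_flow_stats *s, uint16_t pkt_tcp_flags)
{
    s->packets++;
    s->tcp_flags |= pkt_tcp_flags;
}
```

After a SYN, an ACK and a FIN|ACK hit the flow, the accumulated flags include FIN, which is what an idle-timeout policy keyed on connection teardown needs to see.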
* datapath/flow: Fix ovs_flow_stats_get/clear RCU dereference. (Jarno Rajahalme, 2014-04-02, 1 file, -4/+6 lines)
For ovs_flow_stats_get(), using ovsl_dereference() was wrong, since flow dumps call it with the RCU read lock held. ovs_flow_stats_clear() is always called with ovs_mutex held, so it can use ovsl_dereference().

Also, make the ovs_flow_stats_get() 'flow' argument const to make later patches cleaner.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
* datapath: Clarify locking. (Jarno Rajahalme, 2014-03-25, 1 file, -1/+2 lines)
Remove unnecessary locking from functions that are always called with appropriate locking.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Thomas Graf <tgraf@redhat.com>
* datapath: Compact sw_flow_key. (Jarno Rajahalme, 2014-03-24, 1 file, -25/+19 lines)
Minimize padding in sw_flow_key and move 'tp' to the main struct. These changes simplify code when accessing the transport port numbers and the TCP flags, and make the sw_flow_key 8 bytes smaller on 64-bit systems (128->120 bytes). These changes also make the keys for IPv4 packets fit in one cache line.

There is a valid concern about the safety of packing struct ovs_key_ipv4_tunnel, as it would be possible to take the address of the tun_id member as a __be64 *, which could result in unaligned access on some systems. However:

- sw_flow_key itself is 64-bit aligned, so the tun_id within is always 64-bit aligned.
- We never make arrays of ovs_key_ipv4_tunnel (which would force every second tun_key to be misaligned).
- We never take the address of the tun_id into a __be64 *.
- Wherever we use struct ovs_key_ipv4_tunnel outside the sw_flow_key, it is on the stack (in tunnel input functions), where the compiler has full control of the alignment.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
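The "every second tun_key" remark in the array bullet follows from simple arithmetic: if a packed struct's size is a multiple of 4 but not of 8, the 64-bit member of consecutive array elements alternates between aligned and misaligned offsets. The sketch below checks this for a hypothetical 20-byte packed struct (not the real ovs_key_ipv4_tunnel layout), using the GCC/Clang packed attribute that the kernel wraps as __packed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packed tunnel key: 8 + 4 + 4 + 4 = 20 bytes, no tail padding. */
struct ex_tun_key {
    uint64_t tun_id;
    uint32_t src;
    uint32_t dst;
    uint32_t flags;
} __attribute__((packed));

/* Offset of element i's tun_id from the start of an array of ex_tun_key
 * (the array itself is assumed to start 8-byte aligned). */
static size_t ex_tun_id_offset(size_t i)
{
    return i * sizeof(struct ex_tun_key) + offsetof(struct ex_tun_key, tun_id);
}

/* Would element i's tun_id be naturally aligned for a 64-bit access? */
static int ex_tun_id_aligned(size_t i)
{
    return ex_tun_id_offset(i) % sizeof(uint64_t) == 0;
}
```

Elements 0 and 2 land on offsets 0 and 40 (aligned) while elements 1 and 3 land on 20 and 60 (misaligned), which is why avoiding arrays of the packed struct matters.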