path: root/datapath
Commit message | Author | Age | Files | Lines
* datapath: Add a missing comment. | Jarno Rajahalme | 2017-03-08 | 1 | -0/+2

  Make openvswitch.h better match upstream by adding a missing comment.

  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Acked-by: Joe Stringer <joe@ovn.org>
* datapath: Add force commit. | Jarno Rajahalme | 2017-03-08 | 2 | -2/+29

  Upstream patch:

      commit dd41d33f0b033885211a5d6f3ee19e73238aa9ee
      Author: Jarno Rajahalme <jarno@ovn.org>
      Date: Thu Feb 9 11:22:00 2017 -0800

      openvswitch: Add force commit.

      Stateful network admission policy may allow connections to one
      direction and reject connections initiated in the other direction.
      After policy change it is possible that for a new connection an
      overlapping conntrack entry already exists, where the original
      direction of the existing connection is opposed to the new
      connection's initial packet.

      Most importantly, conntrack state relating to the current packet gets
      the "reply" designation based on whether the original direction tuple
      or the reply direction tuple matched. If this "directionality" is
      wrong w.r.t. the stateful network admission policy it may happen
      that packets in neither direction are correctly admitted.

      This patch adds a new "force commit" option to the OVS conntrack
      action that checks the original direction of an existing conntrack
      entry. If that direction is opposed to the current packet, the
      existing conntrack entry is deleted and a new one is subsequently
      created in the correct direction.

      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Acked-by: Joe Stringer <joe@ovn.org>
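The force-commit decision described in this commit message can be modelled in a few lines of user-space C. This is a minimal sketch under stated assumptions, not the datapath code: `struct ct_entry`, `ct_packet_dir()` and `ct_commit()` are hypothetical names, addresses are plain integers, and a single slot stands in for the conntrack table.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for conntrack concepts. */
enum ct_dir { CT_DIR_ORIGINAL, CT_DIR_REPLY };

struct ct_entry {
    bool in_use;            /* does an entry exist in this slot? */
    unsigned int src, dst;  /* original direction tuple of the entry */
};

/* Direction of a packet relative to an existing entry. */
static enum ct_dir ct_packet_dir(const struct ct_entry *e,
                                 unsigned int src, unsigned int dst)
{
    return (e->src == src && e->dst == dst) ? CT_DIR_ORIGINAL : CT_DIR_REPLY;
}

/*
 * Commit a connection for the packet (src, dst).  With 'force' set, an
 * existing entry whose original direction is opposed to the packet is
 * deleted first, so that a new entry is created in the packet's direction.
 * Returns true if a new entry was created.
 */
static bool ct_commit(struct ct_entry *e, unsigned int src, unsigned int dst,
                      bool force)
{
    if (e->in_use && force && ct_packet_dir(e, src, dst) == CT_DIR_REPLY)
        e->in_use = false;  /* delete the wrongly directed entry */

    if (!e->in_use) {
        e->src = src;
        e->dst = dst;
        e->in_use = true;
        return true;
    }
    return false;
}
```

Without `force`, a packet travelling against the entry's original direction leaves the entry untouched and is seen as a reply; with `force`, the same packet replaces the entry and becomes its original direction.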
* compat: nf_ct_delete compat. | Jarno Rajahalme | 2017-03-08 | 1 | -0/+37

  Upstream commit:

      commit f330a7fdbe1611104622faff7e614a246a7d20f0
      Author: Florian Westphal <fw@strlen.de>
      Date: Thu Aug 25 15:33:31 2016 +0200

      netfilter: conntrack: get rid of conntrack timer

      With stats enabled this eats 80 bytes on x86_64 per nf_conn entry, as
      Eric Dumazet pointed out during netfilter workshop 2016.

      Eric also says: "Another reason was the fact that Thomas was about to
      change max timer range [..]" (500462a9de657f8, 'timers: Switch to
      a non-cascading wheel').

      Remove the timer and use a 32bit jiffies value containing timestamp
      until entry is valid.

      During conntrack lookup, even before doing tuple comparison, check
      the timeout value and evict the entry in case it is too old.

      The dying bit is used as a synchronization point to avoid races where
      multiple cpus try to evict the same entry.

      Because lookup is always lockless, we need to bump the refcnt once
      when we evict, else we could try to evict already-dead entry that
      is being recycled.

      This is the standard/expected way when conntrack entries are destroyed.

      Followup patches will introduce garbage collection via work queue
      and further places where we can reap obsoleted entries (e.g. during
      netlink dumps), this is needed to avoid expired conntracks from
      hanging around for too long when lookup rate is low after a busy
      period.

      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

  Upstream commit f330a7fdbe16 ("netfilter: conntrack: get rid of
  conntrack timer") changes the way nf_ct_delete() is called. Prior to
  that commit the call pattern was like this:

      if (del_timer(&ct->timeout))
          nf_ct_delete(ct, ...);

  After this change nf_ct_delete() is called directly:

      nf_ct_delete(ct, ...);

  This patch provides a replacement implementation for nf_ct_delete()
  that first calls del_timer(). This replacement is only used if
  struct nf_conn has a member 'timeout' of type 'struct timer_list'.

  The following patch introduces the first caller to nf_ct_delete() in
  the OVS kernel module.

  Linux <3.12 does not have nf_ct_delete() at all, so we inline it if it
  does not exist. The inlined code is from 3.11 death_by_timeout(),
  which in later versions simply calls nf_ct_delete().

  Upstream commit 02982c27ba1e1bd9f9d4747214e19ca83aa88d0e introduced
  nf_ct_delete() in Linux 3.12. This commit has the original code that
  is being inlined here.

  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Acked-by: Joe Stringer <joe@ovn.org>
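The wrapper idea in this commit (cancel the timer first, delete only if the timer was still pending) can be illustrated with a small user-space model. This is only a sketch: `mock_timer`, `mock_conn`, `mock_del_timer()`, `mock_nf_ct_delete()` and `compat_nf_ct_delete()` are hypothetical stand-ins, not the kernel or OVS compat API.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for struct timer_list and struct nf_conn. */
struct mock_timer { bool pending; };

struct mock_conn {
    struct mock_timer timeout;  /* per-entry timer, as on older kernels */
    int delete_calls;           /* how often the entry was really deleted */
};

/* Models del_timer(): deactivate the timer, report whether it was pending. */
static bool mock_del_timer(struct mock_timer *t)
{
    bool was_pending = t->pending;

    t->pending = false;
    return was_pending;
}

/* Models the underlying delete operation. */
static void mock_nf_ct_delete(struct mock_conn *ct)
{
    ct->delete_calls++;
}

/*
 * Replacement in the spirit of the compat code: callers use the "direct"
 * calling convention, and the wrapper restores the old pattern of
 * cancelling the timer first.  Returns true if the entry was deleted.
 */
static bool compat_nf_ct_delete(struct mock_conn *ct)
{
    if (mock_del_timer(&ct->timeout)) {
        mock_nf_ct_delete(ct);
        return true;
    }
    return false;  /* timer already fired or was cancelled elsewhere */
}
```

Calling the wrapper twice on the same entry deletes it only once, which is the property the timer check provides on kernels that still have the per-entry timer.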
* datapath: Add original direction conntrack tuple to sw_flow_key. | Jarno Rajahalme | 2017-03-08 | 8 | -46/+245

  Upstream commit:

      commit 9dd7f8907c3705dc7a7a375d1c6e30b06e6daffc
      Author: Jarno Rajahalme <jarno@ovn.org>
      Date: Thu Feb 9 11:21:59 2017 -0800

      openvswitch: Add original direction conntrack tuple to sw_flow_key.

      Add the fields of the conntrack original direction 5-tuple to struct
      sw_flow_key. The new fields are initially marked as non-existent, and
      are populated whenever a conntrack action is executed and either finds
      or generates a conntrack entry. This means that these fields exist
      for all packets that were not rejected by conntrack as untrackable.

      The original tuple fields in the sw_flow_key are filled from the
      original direction tuple of the conntrack entry relating to the
      current packet, or from the original direction tuple of the master
      conntrack entry, if the current conntrack entry has a master.
      Generally, expected connections of connections having an assigned
      helper (e.g., FTP), have a master conntrack entry.

      The main purpose of the new conntrack original tuple fields is to
      allow matching on them for policy decision purposes, with the premise
      that the admissibility of tracked connections reply packets (as well
      as original direction packets), and both direction packets of any
      related connections may be based on ACL rules applying to the master
      connection's original direction 5-tuple. This also makes it easier to
      make policy decisions when the actual packet headers might have been
      transformed by NAT, as the original direction 5-tuple represents the
      packet headers before any such transformation.

      When using the original direction 5-tuple the admissibility of return
      and/or related packets need not be based on the mere existence of a
      conntrack entry, allowing separation of admission policy from the
      established conntrack state. While existence of a conntrack entry is
      required for admission of the return or related packets, policy
      changes can render connections that were initially admitted to be
      rejected or dropped afterwards. If the admission of the return and
      related packets was based on mere conntrack state (e.g., connection
      being in an established state), a policy change that would make the
      connection rejected or dropped would need to find and delete all
      conntrack entries affected by such a change. When using the original
      direction 5-tuple matching the affected conntrack entries can be
      allowed to time out instead, as the established state of the
      connection would not need to be the basis for packet admission any
      more.

      It should be noted that the directionality of related connections may
      be the same or different than that of the master connection, and
      neither the original direction 5-tuple nor the conntrack state bits
      carry this information. If needed, the directionality of the master
      connection can be stored in master's conntrack mark or labels, which
      are automatically inherited by the expected related connections.

      The fact that neither ARP nor ND packets are trackable by conntrack
      allows mutual exclusion between ARP/ND and the new conntrack original
      tuple fields. Hence, the IP addresses are overlaid in union with ARP
      and ND fields. This allows the sw_flow_key to not grow much due to
      this patch, but it also means that we must be careful to never use the
      new key fields with ARP or ND packets. ARP is easy to distinguish and
      keep mutually exclusive based on the ethernet type, but ND being an
      ICMPv6 protocol requires a bit more attention.

      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  This patch squashes in minimal amount of OVS userspace code to not
  break the build. Later patches contain the full userspace support.

  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Acked-by: Joe Stringer <joe@ovn.org>
* datapath: Inherit master's labels. | Jarno Rajahalme | 2017-03-08 | 1 | -14/+31

  Upstream commit:

      commit 09aa98ad496d6b11a698b258bc64d7f64c55d682
      Author: Jarno Rajahalme <jarno@ovn.org>
      Date: Thu Feb 9 11:21:58 2017 -0800

      openvswitch: Inherit master's labels.

      We avoid calling into nf_conntrack_in() for expected connections, as
      that would remove the expectation that we want to stick around until
      we are ready to commit the connection. Instead, we do a lookup in the
      expectation table directly. However, after a successful expectation
      lookup we have set the flow key label field from the master
      connection, whereas nf_conntrack_in() does not do this. This leads to
      master's labels being inherited after an expectation lookup, but those
      labels not being inherited after the corresponding conntrack action
      with a commit flag.

      This patch resolves the problem by changing the commit code path to
      also inherit the master's labels to the expected connection.
      Resolving this conflict in favor of inheriting the labels allows more
      information to be passed from the master connection to related
      connections, which would otherwise be much harder if the 32 bits in
      the connmark are not enough. Labels can still be set explicitly, so
      this change only affects the default values of the labels in presence
      of a master connection.

      Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Fixes: a94ebc39996b ("datapath: Add conntrack action")
  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Acked-by: Joe Stringer <joe@ovn.org>
* datapath: Refactor labels initialization. | Jarno Rajahalme | 2017-03-08 | 1 | -56/+64

  Upstream commit:

      Refactoring conntrack labels initialization makes changes in later
      patches easier to review.

      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Acked-by: Joe Stringer <joe@ovn.org>
* datapath: Simplify labels length logic. | Jarno Rajahalme | 2017-03-08 | 1 | -10/+8

  Upstream commit:

      commit b87cec3814ccc7f6afb0a1378ee7e5110d07cdd3
      Author: Jarno Rajahalme <jarno@ovn.org>
      Date: Thu Feb 9 11:21:56 2017 -0800

      openvswitch: Simplify labels length logic.

      Since 23014011ba42 ("netfilter: conntrack: support a fixed size of 128
      distinct labels"), the size of conntrack labels extension has been
      fixed to 128 bits, so we do not need to check for labels sizes shorter
      than 128 at run-time. This patch simplifies labels length logic
      accordingly, but allows the conntrack labels size to be increased in
      the future without breaking the build. In the event of conntrack
      labels increasing in size OVS would still be able to deal with the
      128 first label bits.

      Suggested-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Acked-by: Joe Stringer <joe@ovn.org>
* datapath: Unionize ovs_key_ct_label with a u32 array. | Jarno Rajahalme | 2017-03-08 | 2 | -9/+14

  Upstream commit:

      commit cb80d58fae76d8ea93555149b2b16e19b89a1f4f
      Author: Jarno Rajahalme <jarno@ovn.org>
      Date: Thu Feb 9 11:21:55 2017 -0800

      openvswitch: Unionize ovs_key_ct_label with a u32 array.

      Make the array of labels in struct ovs_key_ct_label a union, adding
      a u32 array of the same byte size as the existing u8 array. It is
      faster to loop through the labels 32 bits at a time, which is also
      the alignment of netlink attributes.

      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Acked-by: Joe Stringer <joe@ovn.org>
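The layout change in this commit can be pictured with a small stand-alone union. The sketch below assumes a 16-byte (128-bit) label field; `ct_labels_sketch`, `CT_LABELS_LEN` and `labels_nonzero()` are illustrative names, not the identifiers from openvswitch.h.

```c
#include <stddef.h>
#include <stdint.h>

#define CT_LABELS_LEN    16  /* assumed label size in bytes (128 bits) */
#define CT_LABELS_LEN_32 (CT_LABELS_LEN / sizeof(uint32_t))

/* The same 16 bytes, viewed either byte-wise or as four 32-bit words. */
union ct_labels_sketch {
    uint8_t  bytes[CT_LABELS_LEN];
    uint32_t words[CT_LABELS_LEN_32];
};

/* Check whether any label bit is set, 32 bits per loop iteration. */
static int labels_nonzero(const union ct_labels_sketch *l)
{
    size_t i;

    for (i = 0; i < CT_LABELS_LEN_32; i++)
        if (l->words[i])
            return 1;
    return 0;
}
```

A byte-wise loop would need 16 iterations for the same check; the word view needs four, and any bit set through `bytes` is visible through `words` because both members share storage.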
* datapath: Do not trigger events for unconfirmed connections. | Jarno Rajahalme | 2017-03-08 | 1 | -6/+27

  Upstream commit:

      commit 193e30967897f3a8b6f9f137ac30571d832c2c5c
      Author: Jarno Rajahalme <jarno@ovn.org>
      Date: Thu Feb 9 11:21:54 2017 -0800

      openvswitch: Do not trigger events for unconfirmed connections.

      Receiving change events before the 'new' event for the connection has
      been received can be confusing. Avoid triggering change events for
      setting conntrack mark or labels before the conntrack entry has been
      confirmed.

      Fixes: 182e3042e15d ("openvswitch: Allow matching on conntrack mark")
      Fixes: c2ac66735870 ("openvswitch: Allow matching on conntrack label")
      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Upstream commit:

      commit 2317c6b51e4249dbfa093e1b88cab0a9f0564b7f
      Author: Jarno Rajahalme <jarno@ovn.org>
      Date: Fri Feb 17 18:11:58 2017 -0800

      openvswitch: Set event bit after initializing labels.

      Connlabels are included in conntrack netlink event messages only if
      the IPCT_LABEL bit is set in the event cache (see
      ctnetlink_conntrack_event()). Set it after initializing labels for a
      new connection.

      Found upon further system testing, where it was noticed that labels
      were missing from the conntrack events.

      Fixes: 193e30967897 ("openvswitch: Do not trigger events for unconfirmed connections.")
      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Fixes: 372ce9737d2b ("datapath: Allow matching on conntrack mark")
  Fixes: 038e34abaa31 ("datapath: Allow matching on conntrack label")
  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Acked-by: Joe Stringer <joe@ovn.org>
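The rule from the first upstream commit (update the mark always, but raise a change event only once the entry is confirmed) might look roughly like this in a user-space model. `struct conn_sketch` and `set_mark()` are hypothetical, and the `events` counter stands in for the kernel's event cache.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified connection state. */
struct conn_sketch {
    bool confirmed;       /* has the 'new' event been sent yet? */
    uint32_t mark;        /* conntrack mark */
    unsigned int events;  /* number of change events raised */
};

/* Set the masked bits of the mark; raise an event only when appropriate. */
static void set_mark(struct conn_sketch *ct, uint32_t mark, uint32_t mask)
{
    uint32_t new_mark = mark | (ct->mark & ~mask);

    if (ct->mark != new_mark) {
        ct->mark = new_mark;
        /* No change event before the entry has been confirmed. */
        if (ct->confirmed)
            ct->events++;
    }
}
```

The mark itself is written in both cases; only the notification is suppressed for unconfirmed entries, and an unchanged mark never raises an event.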
* datapath: Use inverted tuple in ovs_ct_find_existing() if NATted. | Jarno Rajahalme | 2017-03-08 | 1 | -2/+22

  Upstream commit:

      commit 9ff464db50e437eef131f719cc2e9902eea9c607
      Author: Jarno Rajahalme <jarno@ovn.org>
      Date: Thu Feb 9 11:21:53 2017 -0800

      openvswitch: Use inverted tuple in ovs_ct_find_existing() if NATted.

      The conntrack lookup for existing connections fails to invert the
      packet 5-tuple for NATted packets, and therefore fails to find the
      existing conntrack entry. Conntrack only stores 5-tuples for incoming
      packets, and there are various situations where a lookup on a packet
      that has already been transformed by NAT needs to be made. Looking up
      an existing conntrack entry upon executing packet received from the
      userspace is one of them.

      This patch fixes ovs_ct_find_existing() to invert the packet 5-tuple
      for the conntrack lookup whenever the packet has already been
      transformed by conntrack from its input form as evidenced by one of
      the NAT flags being set in the conntrack state metadata.

      Fixes: 05752523e565 ("openvswitch: Interface with NAT.")
      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  This patch also adds a test case to OVS system tests to verify the
  behavior.

  The following is a more thorough explanation of what is going on:

  When we have evidence that an existing conntrack entry could exist, we
  must invert the tuple if NAT has already been applied, as the current
  packet headers do not match any tuple stored in conntrack. For
  example, if a packet from private address X to a public address B is
  source-NATted to A, the conntrack entry will have the following tuples
  (ignoring the protocol and port numbers) after the conntrack entry is
  committed:

      Original direction tuple: (X,B)
      Reply direction tuple: (B,A)

  Now, if a reply packet is already transformed back to the private
  address space (e.g., with a CT(nat) action), the tuple corresponding
  to the current packet headers is:

      Current packet tuple: (B,X)

  This does not match either of the conntrack tuples above. Normally
  this does not matter, as the conntrack lookup was already done using
  the tuple (B,A), but if the current packet does not match any flow in
  the OVS datapath, the packet is sent to userspace via an upcall,
  during which the packet's skb is freed, and the conntrack entry
  pointer in the skb is lost. When the packet is reintroduced to the
  datapath, any further conntrack action will need to perform a new
  conntrack lookup to find the entry again. Prior to this patch this
  second lookup failed. The datapath flow setup corresponding to the
  upcall can succeed, however, allowing all further packets in the reply
  direction to re-use the conntrack entry pointer in the skb, so
  typically the lookup failure only causes a packet drop.

  The solution is to invert the tuple derived from the current packet
  headers in case the conntrack state stored in the packet metadata
  indicates that the packet has been transformed by NAT:

      Inverted tuple: (X,B)

  With this the conntrack entry can be found, matching the original
  direction tuple.

  This same logic also works for the original direction packets:

      Current packet tuple (after reverse NAT): (A,B)
      Inverted tuple: (B,A)

  While the current packet tuple (A,B) does not match either of the
  conntrack tuples, the inverted one (B,A) does match the reply
  direction tuple.

  Since the inverted tuple matches the reverse direction tuple the
  direction of the packet must be reversed as well.

  Fixes: c5f6c06b58d6 ("datapath: Interface with NAT.")
  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Acked-by: Joe Stringer <joe@ovn.org>
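The worked example in this commit message, with addresses X, A and B, can be replayed with a toy lookup. This is a sketch under simplifying assumptions: tuples are reduced to (source, destination) characters, one entry stands in for the whole table, and `tuple_invert()`, `ct_match()` and `ct_find_existing()` are hypothetical helpers, not the kernel functions of similar name.

```c
#include <stdbool.h>

/* Tuples reduced to (src, dst); protocol and ports are ignored. */
struct tuple { char src, dst; };

/* One conntrack entry with its two direction tuples. */
struct ct_entry { struct tuple orig, reply; };

enum lookup_result { CT_MISS, CT_HIT_ORIGINAL, CT_HIT_REPLY };

static bool tuple_eq(struct tuple a, struct tuple b)
{
    return a.src == b.src && a.dst == b.dst;
}

static struct tuple tuple_invert(struct tuple t)
{
    struct tuple inv = { t.dst, t.src };

    return inv;
}

/* Plain lookup: which stored tuple, if any, does 't' match? */
static enum lookup_result ct_match(const struct ct_entry *e, struct tuple t)
{
    if (tuple_eq(e->orig, t))
        return CT_HIT_ORIGINAL;
    if (tuple_eq(e->reply, t))
        return CT_HIT_REPLY;
    return CT_MISS;
}

/*
 * Lookup for a packet that may already have been transformed by NAT.
 * If so, look up the inverted tuple and flip the resulting direction,
 * because the inverted tuple matches the opposite direction's tuple.
 */
static enum lookup_result ct_find_existing(const struct ct_entry *e,
                                           struct tuple pkt, bool natted)
{
    enum lookup_result r;

    if (!natted)
        return ct_match(e, pkt);

    r = ct_match(e, tuple_invert(pkt));
    if (r == CT_HIT_ORIGINAL)
        return CT_HIT_REPLY;
    if (r == CT_HIT_REPLY)
        return CT_HIT_ORIGINAL;
    return CT_MISS;
}
```

With the entry {(X,B), (B,A)}, the already-translated reply packet (B,X) misses in a plain lookup but is found as a reply-direction packet once the NAT flag triggers the inversion; the translated original-direction packet (A,B) is found as an original-direction packet in the same way.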
* datapath: Fix comments for skb->_nfct | Jarno Rajahalme | 2017-03-08 | 1 | -7/+7

  Upstream commit:

      commit 5e17da634a21b1200853fe82ba67d6571f2beabe
      Author: Jarno Rajahalme <jarno@ovn.org>
      Date: Thu Feb 9 11:21:52 2017 -0800

      openvswitch: Fix comments for skb->_nfct

      Fix comments referring to skb 'nfct' and 'nfctinfo' fields now that
      they are combined into '_nfct'.

      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Acked-by: Joe Stringer <joe@ovn.org>
* datapath: add and use nf_ct_set helper | Florian Westphal | 2017-03-08 | 2 | -4/+10

  Upstream commit:

      commit c74454fadd5ea6fc866ffe2c417a0dba56b2bf1c
      Author: Florian Westphal <fw@strlen.de>
      Date: Mon Jan 23 18:21:57 2017 +0100

      netfilter: add and use nf_ct_set helper

      Add a helper to assign a nf_conn entry and the ctinfo bits to an
      sk_buff.

      This avoids changing code in followup patch that merges skb->nfct and
      skb->nfctinfo into skb->_nfct.

      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Acked-by: Joe Stringer <joe@ovn.org>
* datapath: add and use skb_nfct helper | Florian Westphal | 2017-03-08 | 2 | -3/+14

  Upstream commit:

      commit cb9c68363efb6d1f950ec55fb06e031ee70db5fc
      Author: Florian Westphal <fw@strlen.de>
      Date: Mon Jan 23 18:21:56 2017 +0100

      skbuff: add and use skb_nfct helper

      Followup patch renames skb->nfct and changes its type so add a helper
      to avoid intrusive rename change later.

      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Acked-by: Joe Stringer <joe@ovn.org>
* dpif: Meter framework. | Jarno Rajahalme | 2017-03-08 | 1 | -1/+3

  Add DPIF-level infrastructure for meters. Allow meter_set to modify
  the meter configuration (e.g. set the burst size if unspecified).

  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Signed-off-by: Andy Zhou <azhou@ovn.org>
* datapath: Simplify do_execute_actions(). | andy zhou | 2017-03-03 | 1 | -22/+20

  Upstream commit:

      commit 5b8784aaf29be20ba8d363e1124d7436d42ef9bf
      Author: Andy Zhou <azhou@ovn.org>
      Date: Fri Jan 27 13:45:28 2017 -0800

      openvswitch: Simplify do_execute_actions().

      do_execute_actions() implements a worthwhile optimization: in case
      an output action is the last action in an action list, skb_clone()
      can be avoided by outputting the current skb. However, the
      implementation is more complicated than necessary. This patch
      simplifies this logic.

      Signed-off-by: Andy Zhou <azhou@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Upstream: 5b8784aaf29b ("openvswitch: Simplify do_execute_actions().")
  Signed-off-by: Joe Stringer <joe@ovn.org>
  Acked-by: Jarno Rajahalme <jarno@ovn.org>
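The optimization mentioned in this commit (no clone when an output action is the last action in the list) can be shown by counting the clones an action list would need. This is an illustrative model only: `enum action` and `count_clones()` are hypothetical and do not reflect the actual structure of do_execute_actions().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced action set. */
enum action { ACT_OUTPUT, ACT_OTHER };

/*
 * Count how many packet clones an action list needs.  Every output
 * action consumes a packet, so it needs a clone unless it is the last
 * action, in which case the current packet itself can be sent.
 */
static size_t count_clones(const enum action *acts, size_t n)
{
    size_t i, clones = 0;

    for (i = 0; i < n; i++) {
        bool last = (i + 1 == n);

        if (acts[i] == ACT_OUTPUT && !last)
            clones++;
    }
    return clones;
}
```

For a list consisting of a single output action, the common case for simple forwarding, the count is zero.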
* datapath: maintain correct checksum state in conntrack actions. | Lance Richardson | 2017-03-03 | 1 | -2/+4

  Upstream commit:

      commit 75f01a4c9cc291ff5cb28ca1216adb163b7a20ee
      Author: Lance Richardson <lrichard@redhat.com>
      Date: Thu Jan 12 19:33:18 2017 -0500

      openvswitch: maintain correct checksum state in conntrack actions

      When executing conntrack actions on skbuffs with checksum mode
      CHECKSUM_COMPLETE, the checksum must be updated to account for
      header pushes and pulls. Otherwise we get "hw csum failure"
      logs similar to this (ICMP packet received on geneve tunnel
      via ixgbe NIC):

      [ 405.740065] genev_sys_6081: hw csum failure
      [ 405.740106] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G I 4.10.0-rc3+ #1
      [ 405.740108] Call Trace:
      [ 405.740110] <IRQ>
      [ 405.740113] dump_stack+0x63/0x87
      [ 405.740116] netdev_rx_csum_fault+0x3a/0x40
      [ 405.740118] __skb_checksum_complete+0xcf/0xe0
      [ 405.740120] nf_ip_checksum+0xc8/0xf0
      [ 405.740124] icmp_error+0x1de/0x351 [nf_conntrack_ipv4]
      [ 405.740132] nf_conntrack_in+0xe1/0x550 [nf_conntrack]
      [ 405.740137] ? find_bucket.isra.2+0x62/0x70 [openvswitch]
      [ 405.740143] __ovs_ct_lookup+0x95/0x980 [openvswitch]
      [ 405.740145] ? netif_rx_internal+0x44/0x110
      [ 405.740149] ovs_ct_execute+0x147/0x4b0 [openvswitch]
      [ 405.740153] do_execute_actions+0x22e/0xa70 [openvswitch]
      [ 405.740157] ovs_execute_actions+0x40/0x120 [openvswitch]
      [ 405.740161] ovs_dp_process_packet+0x84/0x120 [openvswitch]
      [ 405.740166] ovs_vport_receive+0x73/0xd0 [openvswitch]
      [ 405.740168] ? udp_rcv+0x1a/0x20
      [ 405.740170] ? ip_local_deliver_finish+0x93/0x1e0
      [ 405.740172] ? ip_local_deliver+0x6f/0xe0
      [ 405.740174] ? ip_rcv_finish+0x3a0/0x3a0
      [ 405.740176] ? ip_rcv_finish+0xdb/0x3a0
      [ 405.740177] ? ip_rcv+0x2a7/0x400
      [ 405.740180] ? __netif_receive_skb_core+0x970/0xa00
      [ 405.740185] netdev_frame_hook+0xd3/0x160 [openvswitch]
      [ 405.740187] __netif_receive_skb_core+0x1dc/0xa00
      [ 405.740194] ? ixgbe_clean_rx_irq+0x46d/0xa20 [ixgbe]
      [ 405.740197] __netif_receive_skb+0x18/0x60
      [ 405.740199] netif_receive_skb_internal+0x40/0xb0
      [ 405.740201] napi_gro_receive+0xcd/0x120
      [ 405.740204] gro_cell_poll+0x57/0x80 [geneve]
      [ 405.740206] net_rx_action+0x260/0x3c0
      [ 405.740209] __do_softirq+0xc9/0x28c
      [ 405.740211] irq_exit+0xd9/0xf0
      [ 405.740213] do_IRQ+0x51/0xd0
      [ 405.740215] common_interrupt+0x93/0x93

      Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
      Signed-off-by: Lance Richardson <lrichard@redhat.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Upstream: 75f01a4c9cc2 ("openvswitch: maintain correct checksum state in conntrack actions")
  Signed-off-by: Joe Stringer <joe@ovn.org>
  Acked-by: Jarno Rajahalme <jarno@ovn.org>
* datapath: make ndo_get_stats64 a void function | stephen hemminger | 2017-03-03 | 1 | -4/+2

  Upstream commit:

      commit bc1f44709cf27fb2a5766cadafe7e2ad5e9cb221
      Author: stephen hemminger <stephen@networkplumber.org>
      Date: Fri Jan 6 19:12:52 2017 -0800

      net: make ndo_get_stats64 a void function

      The network device operation for reading statistics is only called
      in one place, and it ignores the return value. Having a structure
      return value is potentially confusing because some future driver could
      incorrectly assume that the return value was used.

      Fix all drivers with ndo_get_stats64 to have a void function.

      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  This seems to be fine for all prior Linux versions as well.

  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Signed-off-by: Joe Stringer <joe@ovn.org>
* datapath: netns: make struct pernet_operations::id unsigned int. | Alexey Dobriyan | 2017-03-02 | 2 | -2/+2

  Upstream commit:

      commit c7d03a00b56fc23c3a01a8353789ad257363e281
      Author: Alexey Dobriyan <adobriyan@gmail.com>
      Date: Thu Nov 17 04:58:21 2016 +0300

      netns: make struct pernet_operations::id unsigned int

      Make struct pernet_operations::id unsigned.

      There are 2 reasons to do so:

      1) This field is really an index into a zero-based array and
         thus is unsigned entity. Using negative value is out-of-bound
         access by definition.

      2) On x86_64 unsigned 32-bit data which are mixed with pointers
         via array indexing or offsets added or subtracted to pointers
         are preferred to signed 32-bit data.

         "int" being used as an array index needs to be sign-extended
         to 64-bit before being used.

             void f(long *p, int i)
             {
                 g(p[i]);
             }

         roughly translates to

             movsx   rsi, esi
             mov     rdi, [rsi+...]
             call    g

         MOVSX is 3 byte instruction which isn't necessary if the variable
         is unsigned because x86_64 is zero extending by default.

      Now, there is net_generic() function which, you guessed it right, uses
      "int" as an array index:

          static inline void *net_generic(const struct net *net, int id)
          {
              ...
              ptr = ng->ptr[id - 1];
              ...
          }

      And this function is used a lot, so those sign extensions add up.

      Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
      messing with code generation):

          add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

      Unfortunately some functions actually grow bigger.
      This is a seemingly random artefact of code generation with register
      allocator being used differently. gcc decides that some variable
      needs to live in new r8+ registers and every access now requires REX
      prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
      used which is longer than [r8]

      However, overall balance is in negative direction:

          add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
          function                       old     new   delta
          nfsd4_lock                    3886    3959     +73
          tipc_link_build_proto_msg     1096    1140     +44
          mac80211_hwsim_new_radio      2776    2808     +32
          tipc_mon_rcv                  1032    1058     +26
          svcauth_gss_legacy_init       1413    1429     +16
          tipc_bcbase_select_primary     379     392     +13
          nfsd4_exchange_id             1247    1260     +13
          nfsd4_setclientid_confirm      782     793     +11
          ...
          put_client_renew_locked        494     480     -14
          ip_set_sockfn_get              730     716     -14
          geneve_sock_add                829     813     -16
          nfsd4_sequence_done            721     703     -18
          nlmclnt_lookup_host            708     686     -22
          nfsd4_lockt                   1085    1063     -22
          nfs_get_client                1077    1050     -27
          tcf_bpf_init                  1106    1076     -30
          nfsd4_encode_fattr            5997    5930     -67
          Total: Before=154856051, After=154854321, chg -0.00%

      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  [Committer notes]
  It looks like changing the type of this doesn't affect the build on
  older kernels, so we can just make the change. I didn't go through all
  of the compat code to update the net_id variables there as none of
  that code should be enabled on kernels with this patch.

  Upstream: c7d03a00b56f ("netns: make struct pernet_operations::id unsigned int")
  Signed-off-by: Joe Stringer <joe@ovn.org>
  Acked-by: Jarno Rajahalme <jarno@ovn.org>
* datapath: allow L3 netdev ports | Yang, Yi Y | 2017-03-02 | 1 | -3/+6

  Upstream commit:

      commit 217ac77a3c2524d999730b2a80b61fcc2d0f734a
      Author: Jiri Benc <jbenc@redhat.com>
      Date: Thu Nov 10 16:28:24 2016 +0100

      openvswitch: allow L3 netdev ports

      Allow ARPHRD_NONE interfaces to be added to ovs bridge.

      Based on previous versions by Lorand Jakab and Simon Horman.

      Signed-off-by: Lorand Jakab <lojakab@cisco.com>
      Signed-off-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Signed-off-by: Yi Yang <yi.y.yang@intel.com>
  Signed-off-by: Joe Stringer <joe@ovn.org>
* datapath: add Ethernet push and pop actions | Yang, Yi Y | 2017-03-02 | 3 | -0/+82

  Upstream commit:

      commit 91820da6ae85904d95ed53bf3a83f9ec44a6b80a
      Author: Jiri Benc <jbenc@redhat.com>
      Date: Thu Nov 10 16:28:23 2016 +0100

      openvswitch: add Ethernet push and pop actions

      It's not allowed to push Ethernet header in front of another Ethernet
      header.

      It's not allowed to pop Ethernet header if there's a vlan tag. This
      preserves the invariant that L3 packet never has a vlan tag.

      Based on previous versions by Lorand Jakab and Simon Horman.

      Signed-off-by: Lorand Jakab <lojakab@cisco.com>
      Signed-off-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  [Committer notes]
  Fix build with the upstream commit by folding in the required switch
  case enum handlers.

  Signed-off-by: Yi Yang <yi.y.yang@intel.com>
  Signed-off-by: Joe Stringer <joe@ovn.org>
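The two restrictions stated in this commit amount to a small state check during action validation. The following sketch tracks only whether the packet currently has an Ethernet header and a vlan tag; `struct pkt_state`, `validate_push_eth()` and `validate_pop_eth()` are hypothetical names, and rejecting a pop when no Ethernet header is present is an assumption added here for completeness.

```c
#include <stdbool.h>

/* Hypothetical per-packet state tracked while validating actions. */
struct pkt_state {
    bool has_eth;   /* packet currently starts with an Ethernet header */
    bool has_vlan;  /* packet currently carries a vlan tag */
};

/* push_eth: not allowed in front of another Ethernet header. */
static bool validate_push_eth(struct pkt_state *s)
{
    if (s->has_eth)
        return false;
    s->has_eth = true;
    return true;
}

/*
 * pop_eth: not allowed while a vlan tag is present, so that an L3
 * packet never carries a vlan tag.  Popping a header that is not there
 * is also rejected (an assumption of this sketch).
 */
static bool validate_pop_eth(struct pkt_state *s)
{
    if (!s->has_eth || s->has_vlan)
        return false;
    s->has_eth = false;
    return true;
}
```

A tagged Ethernet packet therefore has to lose its vlan tag before the Ethernet header can be popped, which keeps the invariant from the commit message intact.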
* datapath: netlink: support L3 packetsYang, Yi Y2017-03-023-88/+126
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 0a6410fbde597ebcf82dda4a0b0e889e82242678 Author: Jiri Benc <jbenc@redhat.com> Date: Thu Nov 10 16:28:22 2016 +0100 openvswitch: netlink: support L3 packets Extend the ovs flow netlink protocol to support L3 packets. Packets without OVS_KEY_ATTR_ETHERNET attribute specify L3 packets; for those, the OVS_KEY_ATTR_ETHERTYPE attribute is mandatory. Push/pop vlan actions are only supported for Ethernet packets. Based on previous versions by Lorand Jakab and Simon Horman. Signed-off-by: Lorand Jakab <lojakab@cisco.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Upstream commit: commit 87e159c59d9f325d571689d4027115617adb32e6 Author: Jarno Rajahalme <jarno@ovn.org> Date: Mon Dec 19 17:06:33 2016 -0800 openvswitch: Add a missing break statement. Add a break statement to prevent fall-through from OVS_KEY_ATTR_ETHERNET to OVS_KEY_ATTR_TUNNEL. Without the break actions setting ethernet addresses fail to validate with log messages complaining about invalid tunnel attributes. Fixes: 0a6410fbde ("openvswitch: netlink: support L3 packets") Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Upstream commit: commit df30f7408b187929dbde72661c7f7c615268f1d0 Author: pravin shelar <pshelar@ovn.org> Date: Mon Dec 26 08:31:27 2016 -0800 openvswitch: upcall: Fix vlan handling. Networking stack accelerate vlan tag handling by keeping topmost vlan header in skb. This works as long as packet remains in OVS datapath. But during OVS upcall vlan header is pushed on to the packet. 
When such packet is sent back to OVS datapath, core networking stack might not handle it correctly. Following patch avoids this issue by accelerating the vlan tag during flow key extract. This simplifies datapath by bringing uniform packet processing for packets from all code paths. Fixes: 5108bbaddc ("openvswitch: add processing of L3 packets"). CC: Jarno Rajahalme <jarno@ovn.org> CC: Jiri Benc <jbenc@redhat.com> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> [Committer Notes] Squashed in the following upstream commits to retain bisectability: 87e159c59d9f ("openvswitch: Add a missing break statement.") df30f7408b18 ("openvswitch: upcall: Fix vlan handling.") Signed-off-by: Yi Yang <yi.y.yang@intel.com> Signed-off-by: Joe Stringer <joe@ovn.org>
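The fall-through fixed by 87e159c59d9f ("openvswitch: Add a missing break statement.") can be illustrated with a minimal sketch. The enum and function names below are hypothetical stand-ins, not the kernel's validation code; the sketch only shows why an Ethernet attribute without tunnel data fails validation when the break is missing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical attribute types standing in for OVS_KEY_ATTR_ETHERNET and
 * OVS_KEY_ATTR_TUNNEL; this is not the kernel's validation code. */
enum sketch_attr { SKETCH_ATTR_ETHERNET, SKETCH_ATTR_TUNNEL };

/* Buggy shape: the Ethernet case falls into the tunnel check. */
static bool sketch_validate_buggy(enum sketch_attr type, bool tunnel_ok)
{
    switch (type) {
    case SKETCH_ATTR_ETHERNET:
        /* Ethernet-specific checks would go here. */
        /* fall through - the missing break is the bug */
    case SKETCH_ATTR_TUNNEL:
        if (!tunnel_ok)
            return false;   /* reported as an invalid tunnel attribute */
        break;
    }
    return true;
}

/* Fixed shape: each case ends with its own break. */
static bool sketch_validate_fixed(enum sketch_attr type, bool tunnel_ok)
{
    switch (type) {
    case SKETCH_ATTR_ETHERNET:
        /* Ethernet-specific checks would go here. */
        break;
    case SKETCH_ATTR_TUNNEL:
        if (!tunnel_ok)
            return false;
        break;
    }
    return true;
}

/* Usage: setting Ethernet addresses carries no tunnel data. */
static void sketch_break_example(void)
{
    assert(!sketch_validate_buggy(SKETCH_ATTR_ETHERNET, false));
    assert(sketch_validate_fixed(SKETCH_ATTR_ETHERNET, false));
}
```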
* datapath: add processing of L3 packets (Author: Yang, Yi Y; Age: 2017-03-02; Files: 5; Lines: -38/+114)
| | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 5108bbaddc37c1c8583f0cf2562d7d3463cd12cb Author: Jiri Benc <jbenc@redhat.com> Date: Thu Nov 10 16:28:21 2016 +0100 openvswitch: add processing of L3 packets Support receiving, extracting flow key and sending of L3 packets (packets without an Ethernet header). Note that even after this patch, non-Ethernet interfaces are still not allowed to be added to bridges. Similarly, netlink interface for sending and receiving L3 packets to/from user space is not in place yet. Based on previous versions by Lorand Jakab and Simon Horman. Signed-off-by: Lorand Jakab <lojakab@cisco.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Yi Yang <yi.y.yang@intel.com> Signed-off-by: Joe Stringer <joe@ovn.org>
* datapath: support MPLS push and pop for L3 packets (Author: Yang, Yi Y; Age: 2017-03-02; Files: 1; Lines: -7/+11)
| | | | | | | | | | | | | | | | | | Upstream commit: commit 1560a074df6297e76278e459ca3eb9ff83a6f878 Author: Jiri Benc <jbenc@redhat.com> Date: Thu Nov 10 16:28:20 2016 +0100 openvswitch: support MPLS push and pop for L3 packets Update Ethernet header only if there is one. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Yi Yang <yi.y.yang@intel.com> Signed-off-by: Joe Stringer <joe@ovn.org>
* datapath: pass mac_proto to ovs_vport_send (Author: Yang, Yi Y; Age: 2017-03-02; Files: 3; Lines: -14/+19)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit e2d9d8358cb961340ef88620b6a25ba4557033d5 Author: Jiri Benc <jbenc@redhat.com> Date: Thu Nov 10 16:28:19 2016 +0100 openvswitch: pass mac_proto to ovs_vport_send We'll need it to alter packets sent to ARPHRD_NONE interfaces. Change do_output() to use the actual L2 header size of the packet when deciding on the minimum cutlen. The assumption here is that what matters is not the output interface hard_header_len but rather the L2 header of the particular packet. For example, ARPHRD_NONE tunnels that encapsulate Ethernet should get at least the Ethernet header. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> [Committer notes] This is not identical to upstream, because the OVS tree is missing upstream commit c66549ffd666 ("openvswitch: correctly fragment packet with mpls headers") Signed-off-by: Yi Yang <yi.y.yang@intel.com> Signed-off-by: Joe Stringer <joe@ovn.org>
* datapath: add mac_proto field to the flow key (Author: Yang, Yi Y; Age: 2017-03-02; Files: 4; Lines: -11/+31)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 329f45bc4f191c663dc156c510816411a4310578 Author: Jiri Benc <jbenc@redhat.com> Date: Thu Nov 10 16:28:18 2016 +0100 openvswitch: add mac_proto field to the flow key Use a hole in the structure. We support only Ethernet so far and will add a support for L2-less packets shortly. We could use a bool to indicate whether the Ethernet header is present or not but the approach with the mac_proto field is more generic and occupies the same number of bytes in the struct, while allowing later extensibility. It also makes the code in the next patches more self explaining. It would be nice to use ARPHRD_ constants but those are u16 which would be waste. Thus define our own constants. Another upside of this is that we can overload this new field to also denote whether the flow key is valid. This has the advantage that on refragmentation, we don't have to reparse the packet but can rely on the stored eth.type. This is especially important for the next patches in this series - instead of adding another branch for L2-less packets before calling ovs_fragment, we can just remove all those branches completely. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Yi Yang <yi.y.yang@intel.com> Signed-off-by: Joe Stringer <joe@ovn.org>
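The idea of a one-byte mac_proto field that also carries a "key is invalid" marker can be sketched as follows. This is an illustration under stated assumptions: the struct layout, the constant names and their values are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants; names and values are chosen for illustration. */
#define SKETCH_MAC_PROTO_NONE     0x00  /* L3 packet, no Ethernet header */
#define SKETCH_MAC_PROTO_ETHERNET 0x01
#define SKETCH_KEY_INVALID        0x80  /* overloaded "key not valid" bit */

/* Hypothetical flow key: mac_proto occupies a single byte, whereas a u16
 * ARPHRD_ constant would need two. */
struct sketch_flow_key {
    uint32_t in_port;
    uint8_t  mac_proto;
    uint8_t  ip_proto;
    uint16_t eth_type;
};

static bool sketch_key_is_valid(const struct sketch_flow_key *key)
{
    return !(key->mac_proto & SKETCH_KEY_INVALID);
}

/* The protocol value with the validity bit masked off. */
static uint8_t sketch_key_mac_proto(const struct sketch_flow_key *key)
{
    return (uint8_t)(key->mac_proto & ~SKETCH_KEY_INVALID);
}

static void sketch_key_invalidate(struct sketch_flow_key *key)
{
    key->mac_proto |= SKETCH_KEY_INVALID;
}

/* Usage: invalidating the key keeps mac_proto and eth_type readable, so a
 * later refragmentation step would not need to reparse the packet. */
static void sketch_mac_proto_example(void)
{
    struct sketch_flow_key key = {
        .in_port = 1,
        .mac_proto = SKETCH_MAC_PROTO_ETHERNET,
        .ip_proto = 6,
        .eth_type = 0x0800,
    };

    assert(sketch_key_is_valid(&key));
    sketch_key_invalidate(&key);
    assert(!sketch_key_is_valid(&key));
    assert(sketch_key_mac_proto(&key) == SKETCH_MAC_PROTO_ETHERNET);
    assert(key.eth_type == 0x0800);
}
```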
* datapath: use hard_header_len instead of hardcoded ETH_HLEN (Author: Yang, Yi Y; Age: 2017-03-02; Files: 2; Lines: -5/+8)
  Upstream commit:
    commit 738314a084aae5f76ff760279034b39d52c42e8b
    Author: Jiri Benc <jbenc@redhat.com>
    Date:   Thu Nov 10 16:28:17 2016 +0100

    openvswitch: use hard_header_len instead of hardcoded ETH_HLEN

    On tx, use hard_header_len while deciding whether to refragment or
    drop the packet. That way, all combinations are calculated correctly:

    * L2 packet going to L2 interface (the L2 header len is subtracted),
    * L2 packet going to L3 interface (the L2 header is included in the
      packet length),
    * L3 packet going to L3 interface.

    Signed-off-by: Jiri Benc <jbenc@redhat.com>
    Acked-by: Pravin B Shelar <pshelar@ovn.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

  Signed-off-by: Yi Yang <yi.y.yang@intel.com>
  Signed-off-by: Joe Stringer <joe@ovn.org>
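The three combinations listed in this commit can be checked with a small userspace sketch. It assumes that hard_header_len is 14 for an Ethernet (L2) device and 0 for an L3 device; the function names are hypothetical and the real datapath code works on skbs and net devices rather than plain integers.

```c
#include <assert.h>
#include <stdbool.h>

/* Length that counts against the MTU: packet bytes minus the L2 header
 * size of the output device (assumed 0 for an L3 device). */
static int sketch_packet_length(int pkt_len, int hard_header_len)
{
    int len;

    assert(pkt_len >= 0 && hard_header_len >= 0);
    len = pkt_len - hard_header_len;
    return len > 0 ? len : 0;
}

/* True when the packet would have to be refragmented or dropped. */
static bool sketch_needs_frag_or_drop(int pkt_len, int hard_header_len,
                                      int mtu)
{
    return sketch_packet_length(pkt_len, hard_header_len) > mtu;
}
```

With an MTU of 1500, a 1514-byte Ethernet frame fits an L2 device (14 bytes are subtracted) but exceeds the limit on an L3 device, where the Ethernet header counts as payload; a 1500-byte L3 packet fits an L3 device.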
* datapath: handle NF_REPEAT from nf_conntrack_in() (Author: Pablo Neira Ayuso; Age: 2017-03-02; Files: 2; Lines: -6/+23)
| | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 08733a0cb7decce40bbbd0331a0449465f13c444 Author: Pablo Neira Ayuso <pablo@netfilter.org> Date: Thu Nov 3 10:56:43 2016 +0100 netfilter: handle NF_REPEAT from nf_conntrack_in() NF_REPEAT is only needed from nf_conntrack_in() under a very specific case required by the TCP protocol tracker, we can handle this case without returning to the core hook path. Handling of NF_REPEAT from the nf_reinject() is left untouched. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> [Committer notes] Shift the functionality into the compat code, protected by v4.10 version check. This allows the datapath/conntrack.c to match upstream. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: Joe Stringer <joe@ovn.org>
* datapath: use core MTU range checking in core net infra (Author: Jarod Wilson; Age: 2017-03-02; Files: 2; Lines: -3/+27)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 61e84623ace35ce48975e8f90bbbac7557c43d61 Author: Jarod Wilson <jarod@redhat.com> Date: Fri Oct 7 22:04:33 2016 -0400 net: centralize net_device min/max MTU checking While looking into an MTU issue with sfc, I started noticing that almost every NIC driver with an ndo_change_mtu function implemented almost exactly the same range checks, and in many cases, that was the only practical thing their ndo_change_mtu function was doing. Quite a few drivers have either 68, 64, 60 or 46 as their minimum MTU value checked, and then various sizes from 1500 to 65535 for their maximum MTU value. We can remove a whole lot of redundant code here if we simple store min_mtu and max_mtu in net_device, and check against those in net/core/dev.c's dev_set_mtu(). In theory, there should be zero functional change with this patch, it just puts the infrastructure in place. Subsequent patches will attempt to start using said infrastructure, with theoretically zero change in functionality. CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Upstream commit: commit 91572088e3fdbf4fe31cf397926d8b890fdb3237 Author: Jarod Wilson <jarod@redhat.com> Date: Thu Oct 20 13:55:20 2016 -0400 net: use core MTU range checking in core net infra ... openvswitch: - set min/max_mtu, remove internal_dev_change_mtu - note: max_mtu wasn't checked previously, it's been set to 65535, which is the largest possible size supported ... Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. 
Miller <davem@davemloft.net> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Upstream commit: commit 425df17ce3a26d98f76e2b6b0af2acf4aeb0b026 Author: Jarno Rajahalme <jarno@ovn.org> Date: Tue Feb 14 21:16:28 2017 -0800 openvswitch: Set internal device max mtu to ETH_MAX_MTU. Commit 91572088e3fd ("net: use core MTU range checking in core net infra") changed the openvswitch internal device to use the core net infra for controlling the MTU range, but failed to actually set the max_mtu as described in the commit message, which now defaults to ETH_DATA_LEN. This patch fixes this by setting max_mtu to ETH_MAX_MTU after ether_setup() call. Fixes: 91572088e3fd ("net: use core MTU range checking in core net infra") Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> This backport detects the new max_mtu field in the struct netdevice and uses the upstream code if it exists, and local backport code if not. The latter case is amended with bounds checks with new upstream macros ETH_MIN_MTU and ETH_MAX_MTU and the corresponding error messages from the upstream commit. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: Joe Stringer <joe@ovn.org>
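The interaction between the centralized range check and the max_mtu fix can be sketched in userspace C. The struct and function names are hypothetical; the limits mirror the values named in the commit messages (a minimum of 68, a default maximum of ETH_DATA_LEN, i.e. 1500, and ETH_MAX_MTU, i.e. 65535).

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_ETH_MIN_MTU  68
#define SKETCH_ETH_DATA_LEN 1500
#define SKETCH_ETH_MAX_MTU  65535

/* Hypothetical device with the limits stored next to the MTU. */
struct sketch_netdev {
    unsigned int mtu;
    unsigned int min_mtu;
    unsigned int max_mtu;
};

/* Generic Ethernet defaults: max_mtu starts out at ETH_DATA_LEN. */
static void sketch_ether_setup(struct sketch_netdev *dev)
{
    dev->mtu = SKETCH_ETH_DATA_LEN;
    dev->min_mtu = SKETCH_ETH_MIN_MTU;
    dev->max_mtu = SKETCH_ETH_DATA_LEN;
}

/* Internal device setup: raise max_mtu after the generic setup call. */
static void sketch_internal_dev_setup(struct sketch_netdev *dev)
{
    sketch_ether_setup(dev);
    dev->max_mtu = SKETCH_ETH_MAX_MTU;
}

/* Central range check, done once instead of in every driver. */
static int sketch_set_mtu(struct sketch_netdev *dev, unsigned int new_mtu)
{
    assert(dev->min_mtu <= dev->max_mtu);
    if (new_mtu < dev->min_mtu || new_mtu > dev->max_mtu)
        return -EINVAL;
    dev->mtu = new_mtu;
    return 0;
}
```

Without the explicit max_mtu assignment, a request for a 9000-byte MTU is rejected because the default ceiling is 1500; with it, any value from 68 to 65535 is accepted.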
* datapath: remove unnecessary EXPORT_SYMBOLs (Author: Jiri Benc; Age: 2017-03-02; Files: 3; Lines: -4/+0)
| | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 76e4cc7731a1e0c07e202999b9834f9d9be66de4 Author: Jiri Benc <jbenc@redhat.com> Date: Wed Oct 19 11:26:37 2016 +0200 openvswitch: remove unnecessary EXPORT_SYMBOLs Some symbols exported to other modules are really used only by openvswitch.ko. Remove the exports. Tested by loading all 4 openvswitch modules, nothing breaks. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: Joe Stringer <joe@ovn.org>
* datapath: remove unused functions (Author: Jiri Benc; Age: 2017-03-02; Files: 2; Lines: -17/+0)
| | | | | | | | | | | | | | | | | | | | Upstream commit: commit f33eb0cf9984f79e8643eaac888e4b6a06a8e221 Author: Jiri Benc <jbenc@redhat.com> Date: Wed Oct 19 11:26:36 2016 +0200 openvswitch: remove unused functions ovs_vport_deferred_free is not used anywhere. It's the only caller of free_vport_rcu thus this one can be removed, too. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: Joe Stringer <joe@ovn.org>
* datapath: add NETIF_F_HW_VLAN_STAG_TX to internal dev. (Author: Jiri Benc; Age: 2017-03-02; Files: 1; Lines: -1/+1)
| | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 3145c037e74926dea9241a3f68ada6f294b0119a Author: Jiri Benc <jbenc@redhat.com> Date: Mon Oct 10 17:02:44 2016 +0200 openvswitch: add NETIF_F_HW_VLAN_STAG_TX to internal dev The internal device does support 802.1AD offloading since 018c1dda5ff1 ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes"). Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Eric Garver <e@erig.me> Signed-off-by: David S. Miller <davem@davemloft.net> Upstream: 3145c037e749 ("openvswitch: add NETIF_F_HW_VLAN_STAG_TX to internal dev") Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>
* datapath: avoid resetting flow key while installing new flow. (Author: pravin shelar; Age: 2017-03-02; Files: 4; Lines: -9/+10)
  Upstream commit:
    commit 2279994d07ab67ff7a1d09bfbd65588332dfb6d8
    Author: pravin shelar <pshelar@ovn.org>
    Date:   Mon Sep 19 13:51:00 2016 -0700

    openvswitch: avoid resetting flow key while installing new flow.

    Since commit db74a3335e0f6 ("openvswitch: use percpu flow stats"),
    flow alloc resets the flow key. So there is no need to reset the
    flow key again if OVS is using a newly allocated flow key.

    Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Signed-off-by: Joe Stringer <joe@ovn.org>
* datapath: Fix Frame-size larger than 1024 bytes warning. (Author: pravin shelar; Age: 2017-03-02; Files: 1; Lines: -6/+9)
| | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 190aa3e77880a05332ea1ccb382a51285d57adb5 Author: pravin shelar <pshelar@ovn.org> Date: Mon Sep 19 13:50:59 2016 -0700 openvswitch: Fix Frame-size larger than 1024 bytes warning. There is no need to declare separate key on stack, we can just use sw_flow->key to store the key directly. This commit fixes following warning: net/openvswitch/datapath.c: In function ‘ovs_flow_cmd_new’: net/openvswitch/datapath.c:1080:1: warning: the frame size of 1040 bytes is larger than 1024 bytes [-Wframe-larger-than=] Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: Joe Stringer <joe@ovn.org>
* datapath: use percpu flow stats (Author: Thadeu Lima de Souza Cascardo; Age: 2017-03-02; Files: 3; Lines: -39/+33)
  Upstream commit:
    commit db74a3335e0f645e3139c80bcfc90feb01d8e304
    Author: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
    Date:   Thu Sep 15 19:11:53 2016 -0300

    openvswitch: use percpu flow stats

    Instead of using flow stats per NUMA node, use them per CPU. When
    using megaflows, the stats lock can be a bottleneck in scalability.

    On an E5-2690 12-core system, usual throughput went from ~4Mpps to
    ~15Mpps when forwarding between two 40GbE ports with a single flow
    configured on the datapath.

    This has been tested on a system with possible CPUs 0-7,16-23. After
    module removal, there was no corruption on the slab cache.

    Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
    Cc: pravin shelar <pshelar@ovn.org>
    Acked-by: Pravin B Shelar <pshelar@ovn.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Signed-off-by: Joe Stringer <joe@ovn.org>
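The per-CPU layout can be sketched as one counter slot per CPU, written only by that CPU and summed on read. This is a simplified userspace illustration: the names and the fixed CPU count are hypothetical, and the kernel implementation additionally deals with allocation, locking and the set of possible CPUs.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4   /* hypothetical fixed CPU count */

struct sketch_stats {
    uint64_t packets;
    uint64_t bytes;
};

/* One slot per CPU, so writers on different CPUs never share a slot. */
struct sketch_flow {
    struct sketch_stats stats[SKETCH_NR_CPUS];
};

/* Called on the packet path: touches only the slot of the current CPU. */
static void sketch_stats_update(struct sketch_flow *flow, unsigned int cpu,
                                uint64_t len)
{
    assert(cpu < SKETCH_NR_CPUS);
    flow->stats[cpu].packets++;
    flow->stats[cpu].bytes += len;
}

/* Called when reading statistics: sums all per-CPU slots. */
static struct sketch_stats sketch_stats_get(const struct sketch_flow *flow)
{
    struct sketch_stats total = { 0, 0 };

    for (unsigned int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
        total.packets += flow->stats[cpu].packets;
        total.bytes += flow->stats[cpu].bytes;
    }
    return total;
}
```

The read side does more work than a single shared counter would, which is the usual trade for removing contention from the per-packet update.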
* datapath: fix flow stats accounting when node 0 is not possible (Author: Thadeu Lima de Souza Cascardo; Age: 2017-03-02; Files: 2; Lines: -3/+6)
  Upstream commit:
    commit 40773966ccf1985a1b2bb570a03cbeaf1cbd4e00
    Author: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
    Date:   Thu Sep 15 19:11:52 2016 -0300

    openvswitch: fix flow stats accounting when node 0 is not possible

    On a system with only node 1 as possible, all statistics are going to
    be accounted on node 0 as it will have a single writer.

    However, when getting and clearing the statistics, node 0 is not going
    to be considered, as it's not a possible node.

    Tested that statistics are not zero on a system with only node 1
    possible. Also compile-tested with CONFIG_NUMA off.

    Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
    Acked-by: Pravin B Shelar <pshelar@ovn.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

  This patch contained a memory leak that is fixed in this backport.
  The next patch silently fixed that in upstream, too.

  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Signed-off-by: Joe Stringer <joe@ovn.org>
* datapath: 802.1AD Flow handling, actions, vlan parsing, netlink attributes (Author: Yang, Yi Y; Age: 2017-03-02; Files: 5; Lines: -125/+283)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit 018c1dda5ff1e7bd1fe2d9fd1d0f5b82dc6fc0cd Author: Eric Garver <e@erig.me> Date: Wed Sep 7 12:56:59 2016 -0400 openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes Add support for 802.1ad including the ability to push and pop double tagged vlans. Add support for 802.1ad to netlink parsing and flow conversion. Uses double nested encap attributes to represent double tagged vlan. Inner TPID encoded along with ctci in nested attributes. This is based on Thomas F Herbert's original v20 patch. I made some small clean ups and bug fixes. Signed-off-by: Thomas F Herbert <thomasfherbert@gmail.com> Signed-off-by: Eric Garver <e@erig.me> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Upstream commit: commit 20ecf1e4e30005ad50f561a92c888b6477f99341 Author: Jiri Benc <jbenc@redhat.com> Date: Mon Oct 10 17:02:42 2016 +0200 openvswitch: vlan: remove wrong likely statement This code is called whenever flow key is being extracted from the packet. The packet may be as likely vlan tagged as not. Fixes: 018c1dda5ff1 ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes") Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Eric Garver <e@erig.me> Signed-off-by: David S. Miller <davem@davemloft.net> Upstream commit: commit 72ec108d701506fa6cd2f66ec5b15ea71df3c464 Author: Jiri Benc <jbenc@redhat.com> Date: Mon Oct 10 17:02:43 2016 +0200 openvswitch: fix vlan subtraction from packet length When the packet has its vlan tag in skb->vlan_tci, the length of the VLAN header is not counted in skb->len. It doesn't make sense to subtract it. 
Fixes: 018c1dda5ff1 ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes") Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Eric Garver <e@erig.me> Signed-off-by: David S. Miller <davem@davemloft.net> [Committer notes] The following commits upstream fix bugs in this patch, so to retain bisectability of the OVS tree they were rolled into this commit: 20ecf1e4e300 openvswitch: vlan: remove wrong likely statement 72ec108d7015 openvswitch: fix vlan subtraction from packet length Signed-off-by: Yi Yang <yi.y.yang@intel.com> Acked-by: Eric Garver <e@erig.me> Signed-off-by: Joe Stringer <joe@ovn.org>
* datapath: backport: vlan: Check for vlan ethernet types for 8021.q or 802.1ad (Author: Yang, Yi Y; Age: 2017-03-01; Files: 1; Lines: -8/+24)
| | | | | | | | | | | | | | | | | | | | | | | Upstream commit: commit fe19c4f971a55cea3be442d8032a5f6021702791 Author: Eric Garver <e@erig.me> Date: Wed Sep 7 12:56:58 2016 -0400 This is to simplify using double tagged vlans. This function allows all valid vlan ethertypes to be checked in a single function call. Also replace some instances that check for both ETH_P_8021Q and ETH_P_8021AD. Patch based on one originally by Thomas F Herbert. Signed-off-by: Thomas F Herbert <thomasfherbert@gmail.com> Signed-off-by: Eric Garver <e@erig.me> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Yi Yang <yi.y.yang@intel.com> Acked-by: Eric Garver <e@erig.me> Signed-off-by: Joe Stringer <joe@ovn.org>
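The single-call ethertype check described above can be sketched as follows. The EtherType values 0x8100 (802.1Q) and 0x88A8 (802.1ad) are the standard ones; the function name is hypothetical, and unlike kernel code this sketch takes the value in host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ETH_P_8021Q  0x8100  /* 802.1Q VLAN tag */
#define SKETCH_ETH_P_8021AD 0x88A8  /* 802.1ad service VLAN tag */

/* True for every valid VLAN ethertype, so callers need one call instead
 * of comparing against both constants. Host byte order is assumed. */
static bool sketch_eth_type_vlan(uint16_t ethertype)
{
    switch (ethertype) {
    case SKETCH_ETH_P_8021Q:
    case SKETCH_ETH_P_8021AD:
        return true;
    default:
        return false;
    }
}

/* Usage: both tag types match, plain IPv4 (0x0800) does not. */
static void sketch_vlan_example(void)
{
    assert(sketch_eth_type_vlan(SKETCH_ETH_P_8021Q));
    assert(sketch_eth_type_vlan(SKETCH_ETH_P_8021AD));
    assert(!sketch_eth_type_vlan(0x0800));
}
```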
* datapath: backport: openvswitch: 802.1ad uapi changes. (Author: Yang, Yi Y; Age: 2017-03-01; Files: 1; Lines: -8/+9)
| | | | | | | | | | | | | | | | | | | Upstream commit: commit 8c146bb9d59aa2ac45222171916ece186c4b3943 Author: Thomas F Herbert <thomasfherbert@gmail.com> Date: Wed Sep 7 12:56:57 2016 -0400 openvswitch: Add support for 8021.AD Change the description of the VLAN tpid field. Signed-off-by: Thomas F Herbert <thomasfherbert@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Yi Yang <yi.y.yang@intel.com> Acked-by: Eric Garver <e@erig.me> Signed-off-by: Joe Stringer <joe@ovn.org>
* datapath: backport: vlan: Introduce helper functions to check if skb is tagged (Author: Yang, Yi Y; Age: 2017-03-01; Files: 1; Lines: -0/+49)
| | | | | | | | | | | | | | | | | | | | Upstream commit: commit f5a7fb88e1f82542ca14ba93a1d4fa35471c60ca Author: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Date: Fri Mar 27 14:31:11 2015 +0900 vlan: Introduce helper functions to check if skb is tagged Separate the two checks for single vlan and multiple vlans in netif_skb_features(). This allows us to move the check for multiple vlans to another function later. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Yi Yang <yi.y.yang@intel.com> Acked-by: Eric Garver <e@erig.me> Signed-off-by: Joe Stringer <joe@ovn.org>
* datapath: backport: Fix vlan_insert_tag_set_proto(). (Author: Yang, Yi Y; Age: 2017-03-01; Files: 1; Lines: -3/+4)
  Fix cvlan test failure on old kernel versions with 802.1ad.

  The root cause is that the upcall re-inserts the VLAN tag back into the
  raw packet data, but the TPID is hard coded to 0x8100. This affects
  kernels for which HAVE_VLAN_INSERT_TAG_SET_PROTO is not set.

  The patch below allows the cvlan and 802.1ad tests to pass on Debian
  with a 3.16 kernel.

  Signed-off-by: Eric Garver <e@erig.me>
  Signed-off-by: Yi Yang <yi.y.yang@intel.com>
  Acked-by: Eric Garver <e@erig.me>
  Signed-off-by: Joe Stringer <joe@ovn.org>
* dpif-netdev: Add clone action (Author: Andy Zhou; Age: 2017-01-23; Files: 1; Lines: -1/+2)
  Add support for userspace datapath clone action. The clone action
  provides an action envelope to enclose an action list.
  For example, with actions A, B, C and D, and an action list:

      A, clone(B, C), D

  The clone action will ensure that:

  - D will see the same packet, and any meta states, such as flow, as
    action B.

  - D will be executed regardless of whether B or C drops a packet. They
    can only drop a clone.

  - When B drops a packet, clone will skip all remaining actions
    within the clone envelope. This feature is useful when we add
    meter action later: The meter action can be implemented as a
    simple action without its own envelope (unlike the sample action).
    When necessary, the flow translation layer can enclose a meter
    action in clone.

  The clone action is very similar to the OpenFlow clone action.
  This is by design to simplify vswitchd flow translation logic.

  Without datapath clone, vswitchd simulates the effect by inserting
  datapath actions to "undo" clone actions. The above flow will be
  translated into A, B, C, -C, -B, D.

  However, there are two issues:

  - The resulting datapath action list may be longer without using
    clone.

  - Some actions, such as NAT, may not be possible to reverse.

  This patch implements clone() simply with packet copy. The performance
  can be improved with later patches, for example, to delay or avoid
  packet copy if possible. It seems datapath should have enough context
  to carry out such optimization without the userspace context.

  Signed-off-by: Andy Zhou <azhou@ovn.org>
  Acked-by: Jarno Rajahalme <jarno@ovn.org>
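The copy-based semantics of "A, clone(B, C), D" can be sketched with a toy packet model. Everything here is hypothetical (the packet struct, the action type, and the helper names); it is not the dpif-netdev implementation, only an illustration of why actions inside the envelope cannot affect D.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical packet and action types for illustration only. */
struct sketch_packet {
    int  ttl;
    bool dropped;
};

typedef void (*sketch_action)(struct sketch_packet *);

/* Run actions in order; stop as soon as one of them drops the packet. */
static void sketch_run(struct sketch_packet *pkt,
                       const sketch_action *acts, size_t n)
{
    for (size_t i = 0; i < n && !pkt->dropped; i++)
        acts[i](pkt);
}

/* clone(...): run the enclosed list on a copy, so the original packet and
 * its metadata are untouched no matter what the enclosed actions do. */
static void sketch_clone(const struct sketch_packet *pkt,
                         const sketch_action *acts, size_t n)
{
    struct sketch_packet copy = *pkt;

    sketch_run(&copy, acts, n);
}

static void sketch_dec_ttl(struct sketch_packet *pkt) { pkt->ttl--; }
static void sketch_drop(struct sketch_packet *pkt)    { pkt->dropped = true; }

/* Models "A, clone(B, C), D" with A = D = dec_ttl, B = drop, C = dec_ttl. */
static void sketch_clone_example(void)
{
    struct sketch_packet pkt = { .ttl = 64, .dropped = false };
    const sketch_action inner[] = { sketch_drop, sketch_dec_ttl };

    sketch_dec_ttl(&pkt);            /* A */
    sketch_clone(&pkt, inner, 2);    /* clone(B, C): B drops only the copy */
    assert(pkt.ttl == 63 && !pkt.dropped);
    sketch_dec_ttl(&pkt);            /* D still runs on the original */
    assert(pkt.ttl == 62);
}
```

Because B drops the copy, C is skipped inside the envelope, and D sees the packet exactly as B first saw it.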
* datapath: Ensure correct L4 checksum with NAT helpers. (Author: John Hurley; Age: 2017-01-06; Files: 1; Lines: -30/+17)
  Setting the CHECKSUM_PARTIAL flag before sending to helper mods
  was missing the checksum update call ('csum_*_magic()'), which caused
  checksum failures with kernels older than 4.6. This can mean that the
  L4 checksum is incorrect when the packet egresses the system.

  Rather than adding the missing (IP version dependent) calls, give the
  packet a temp skb_dst with RTCF_LOCAL flag not set, which ensures the
  skb is properly changed to CHECKSUM_PARTIAL if required and the
  modified packet will get the correct checksum when fully processed.

  This has been tested with FTP NAT helpers on kernel version 3.13.

  Signed-off-by: John Hurley <john.hurley@netronome.com>
  Acked-by: Jarno Rajahalme <jarno@ovn.org>
* datapath: compat: Fix build on RHEL 7.3 (Author: Yi-Hung Wei; Age: 2016-12-14; Files: 6; Lines: -4/+43)
| | | | | | | | | | | | RHEL 7.3 provides upstream tunnel but it does not support name_assign_type attribute in net-device. This patch fixes the build problem by backporting functions with name_assign_type, and using proper flags in acinclude.m4 to invoke backport functions. Tested on RHEL 7.3 with kernel 3.10.0-514.el7.x86_64 Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Joe Stringer <joe@ovn.org>
* doc: Populate 'topics' section (Author: Stephen Finucane; Age: 2016-12-12; Files: 2; Lines: -268/+0)
  There are many docs that don't need to be kept at the top level, along
  with many more hidden in random folders. Move them all.

  This also allows us to add the '-W' flag to Sphinx, ensuring unindexed
  docs result in build failures.

  Signed-off-by: Stephen Finucane <stephen@that.guru>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* datapath: Fix compile time assertion. (Author: Jarno Rajahalme; Age: 2016-12-09; Files: 1; Lines: -1/+3)
| | | | | | | | compiletime_assert() cannot be used in file scope, so use preprocessor directives instead. Reported-by: Joe Stringer <joe@ovn.org> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>
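A file-scope check built from preprocessor directives might look like the sketch below. The macro SKETCH_ID_GENERATE is a hypothetical stand-in for GENL_ID_GENERATE, and the struct is invented for illustration; the point is that `#if`/`#error` works outside any function and only fires when the macro exists with a nonzero value.

```c
#include <assert.h>

/* Hypothetical stand-in for a macro that older headers define as 0 and
 * newer headers no longer define at all. */
#define SKETCH_ID_GENERATE 0

/* File-scope check: only evaluated when the macro is defined. */
#ifdef SKETCH_ID_GENERATE
#if SKETCH_ID_GENERATE != 0
#error "SKETCH_ID_GENERATE must be 0 so initializers can omit it"
#endif
#endif

/* Hypothetical family descriptor; .id is left out of the initializer and
 * is therefore zero-initialized. */
struct sketch_family {
    int id;
    const char *name;
};

static const struct sketch_family sketch_fam = {
    .name = "sketch",
};

/* Usage: the omitted field equals the value the macro used to supply. */
static void sketch_family_example(void)
{
    assert(sketch_fam.id == 0);
    assert(sketch_fam.name[0] == 's');
}
```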
* datapath: Allow compile against current net-next. (Author: Jarno Rajahalme; Age: 2016-12-09; Files: 4; Lines: -9/+24)
  This patch allows the openvswitch kernel module in the OVS tree to be
  compiled against the current net-next Linux kernel. The changes are
  due to these upstream commits:

  56989f6d856 ("genetlink: mark families as __ro_after_init")
  489111e5c25 ("genetlink: statically initialize families")
  a07ea4d9941 ("genetlink: no longer support using static family IDs")

  struct genl_family initialization is changed to be completely static
  and to include the new (in Linux 4.6) __ro_after_init attribute. Compat
  code defines it as an empty macro if not defined already.

  GENL_ID_GENERATE is no longer defined, but since it was defined as 0,
  it is safe to drop it from all initializers also on older Linux
  versions. A compiletime_assert is added to make sure this is true
  whenever GENL_ID_GENERATE is defined.

  Tested with current Linux net-next (4.9) and 3.16.

  It should be noted that there are still a number of fixes and new
  features in upstream net-next that are yet to be backported.

  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Acked-by: Andy Zhou <azhou@ovn.org>
* datapath: backport: openvswitch: Fix skb leak in IPv6 reassembly. (Author: Daniele Di Proietto; Age: 2016-11-30; Files: 1; Lines: -1/+4)
| | | | | | | | | | | | | | | | | | | | | | | | | commit f92a80a9972175a6a1d36c6c44be47fb0efd020d Author: Daniele Di Proietto <diproiettod@ovn.org> Date: Mon Nov 28 15:43:53 2016 -0800 openvswitch: Fix skb leak in IPv6 reassembly. If nf_ct_frag6_gather() returns an error other than -EINPROGRESS, it means that we still have a reference to the skb. We should free it before returning from handle_fragments, as stated in the comment above. Fixes: daaa7d647f81 ("netfilter: ipv6: avoid nf_iterate recursion") CC: Florian Westphal <fw@strlen.de> CC: Pravin B Shelar <pshelar@ovn.org> CC: Joe Stringer <joe@ovn.org> Signed-off-by: Daniele Di Proietto <diproiettod@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> VMware-BZ: #1728498 Fixes: 2e602ea3dafa("compat: nf_defrag_ipv6: avoid nf_iterate recursion.") Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Joe Stringer <joe@ovn.org>
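The ownership rule behind this fix can be sketched in userspace C. The names are hypothetical and the gather step is a stub that returns whatever result it is given; the sketch assumes, as the commit message implies, that -EINPROGRESS means the fragment was queued and is no longer the caller's to free, while any other error leaves the reference with the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical buffer: "freed" records whether the caller released it. */
struct sketch_skb {
    bool freed;
};

static void sketch_kfree_skb(struct sketch_skb *skb)
{
    skb->freed = true;
}

/* Stub for the gather step: simply returns the result it is told to. */
static int sketch_gather(struct sketch_skb *skb, int result)
{
    (void)skb;
    return result;
}

/* On errors other than -EINPROGRESS the caller still holds the buffer,
 * so it must be freed before returning. */
static int sketch_handle_fragments(struct sketch_skb *skb, int gather_result)
{
    int err = sketch_gather(skb, gather_result);

    if (err) {
        if (err != -EINPROGRESS)
            sketch_kfree_skb(skb);
        return err;
    }
    return 0;
}

/* Usage: a hard error frees the buffer, a queued fragment does not. */
static void sketch_fragments_example(void)
{
    struct sketch_skb failed = { .freed = false };
    struct sketch_skb queued = { .freed = false };

    assert(sketch_handle_fragments(&failed, -EINVAL) == -EINVAL);
    assert(failed.freed);
    assert(sketch_handle_fragments(&queued, -EINPROGRESS) == -EINPROGRESS);
    assert(!queued.freed);
}
```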
* datapath: compat: vxlan: Avoid possible NULL dereference in vxlan_gro_receive. (Author: Zhang Dongya; Age: 2016-11-13; Files: 1; Lines: -1/+1)
  On Linux kernels for which the HAVE_UDP_OFFLOAD_ARG_UOFF macro is not
  detected, struct vxlan_sock *vs will be NULL. This makes the kernel
  crash when it receives a VXLAN packet that has the RCO flag turned on,
  or even an invalid packet destined to the VXLAN port that has the bit
  in the RCO flag position set.

  Signed-off-by: Zhang Dongya <fortitude.zhang@gmail.com>
  Acked-by: Pravin B Shelar <pshelar@ovn.org>
* doc: Convert datapath/README to rST (Author: Stephen Finucane; Age: 2016-11-03; Files: 3; Lines: -266/+266)
| | | | | Signed-off-by: Stephen Finucane <stephen@that.guru> Signed-off-by: Russell Bryant <russell@ovn.org>
* datapath: geneve: Handle vlan tag (Author: Pravin B Shelar; Age: 2016-11-01; Files: 1; Lines: -2/+31)
| | | | | | | | | The compat vlan code ignores vlan tag for inner packet on egress path. Following patch fixes this by inserting the tag for inner packet before tunnel encapsulation. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>