path: root/datapath
Commit message | Author | Age | Files | Lines
* datapath: compat: Fix build on RHEL 7.4 | Yi-Hung Wei | 2017-08-23 | 1 | -0/+2
  RHEL 7.4 introduces netdev_master_upper_dev_link_rh() that breaks the backport of OVS kernel module on RHEL 7.4. This patch fixes that issue.

  Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
  Signed-off-by: Joe Stringer <joe@ovn.org>
* Generic encap and decap support for NSH | Jan Scheurich | 2017-08-07 | 1 | -0/+23
  This commit adds translation and netdev datapath support for generic encap and decap actions for the NSH MD1 header. The generic encap and decap actions are mapped to specific encap_nsh and decap_nsh actions in the datapath.

  The translation follows the general scheme that decap() of an NSH packet triggers recirculation after decapsulation, while encap(nsh) just modifies struct flow and sets the ctx->pending_encap flag to generate the encap_nsh action at the next commit, so that subsequent set_field actions for NSH headers can be included.

  Support for the flexible MD2 format using TLV properties is foreseen in encap(nsh), but not yet fully implemented.

  The CLI syntax for encap of NSH is

    encap(nsh(md_type=1))
    encap(nsh(md_type=2[,tlv(<tlv_class>,<tlv_type>,<hex_string>),...]))

  Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
  Signed-off-by: Yi Yang <yi.y.yang@intel.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
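The CLI syntax quoted in the commit above can be illustrated with a small string builder. This is only a sketch: `encap_nsh` is a hypothetical helper, not part of OVS, and the TLV values in the usage note are made up. The three TLV items are passed through as text because the commit message does not specify their notation.

```python
def encap_nsh(md_type=1, tlvs=()):
    """Build an encap(nsh(...)) action string (hypothetical helper).

    tlvs is a sequence of (tlv_class, tlv_type, hex_string) tuples whose
    items are inserted verbatim.
    """
    tlvs = list(tlvs)
    if md_type not in (1, 2):
        raise ValueError("md_type must be 1 or 2")
    if md_type == 1 and tlvs:
        raise ValueError("TLVs only apply to md_type=2")
    parts = ["md_type=%d" % md_type]
    for tlv_class, tlv_type, hex_string in tlvs:
        parts.append("tlv(%s,%s,%s)" % (tlv_class, tlv_type, hex_string))
    return "encap(nsh(%s))" % ",".join(parts)
```

For example, `encap_nsh(2, [("0x1000", "10", "12345678")])` yields an `md_type=2` action with one TLV, while `encap_nsh()` yields the MD1 form.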
* userspace: Add support for NSH MD1 match fields | Jan Scheurich | 2017-08-07 | 1 | -0/+10
  This patch adds support for NSH packet header fields to the OVS control plane and the userspace datapath. Initially we support the fields of the NSH base header as defined in https://www.ietf.org/id/draft-ietf-sfc-nsh-13.txt and the fixed context headers specified for metadata format MD1. The variable length MD2 format is parsed but the TLV context headers are not yet available for matching.

  The NSH fields are modelled as experimenter fields with the dedicated experimenter class 0x005ad650 proposed for NSH in ONF. The following fields are defined:

    NXOXM code            ofctl name    Size    Comment
    =====================================================================
    NXOXM_NSH_FLAGS       nsh_flags       8     Bits 2-9 of 1st NSH word
    (0x005ad650,1)
    NXOXM_NSH_MDTYPE      nsh_mdtype      8     Bits 16-23
    (0x005ad650,2)
    NXOXM_NSH_NEXTPROTO   nsh_np          8     Bits 24-31
    (0x005ad650,3)
    NXOXM_NSH_SPI         nsh_spi        24     Bits 0-23 of 2nd NSH word
    (0x005ad650,4)
    NXOXM_NSH_SI          nsh_si          8     Bits 24-31
    (0x005ad650,5)
    NXOXM_NSH_C1          nsh_c1         32     Maskable, nsh_mdtype==1
    (0x005ad650,6)
    NXOXM_NSH_C2          nsh_c2         32     Maskable, nsh_mdtype==1
    (0x005ad650,7)
    NXOXM_NSH_C3          nsh_c3         32     Maskable, nsh_mdtype==1
    (0x005ad650,8)
    NXOXM_NSH_C4          nsh_c4         32     Maskable, nsh_mdtype==1
    (0x005ad650,9)

  Co-authored-by: Johnson Li <johnson.li@intel.com>
  Signed-off-by: Yi Yang <yi.y.yang@intel.com>
  Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
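The bit positions in the field table above can be turned into shifts and masks. The following is a minimal sketch rather than OVS code: `parse_nsh_base` is a hypothetical name, and it assumes network byte order with IETF bit numbering, where bit 0 is the most significant bit of each 32-bit word.

```python
import struct


def parse_nsh_base(header: bytes) -> dict:
    """Split the first two 32-bit NSH words into the fields from the table.

    Assumption: bit 0 is the most significant bit of each word.
    """
    if len(header) < 8:
        raise ValueError("need at least 8 bytes")
    word1, word2 = struct.unpack("!II", header[:8])
    return {
        "nsh_flags": (word1 >> 22) & 0xFF,   # bits 2-9 of the 1st word
        "nsh_mdtype": (word1 >> 8) & 0xFF,   # bits 16-23
        "nsh_np": word1 & 0xFF,              # bits 24-31
        "nsh_spi": word2 >> 8,               # bits 0-23 of the 2nd word
        "nsh_si": word2 & 0xFF,              # bits 24-31
    }
```

The context headers C1 to C4 are not decoded here; for MD1 they would be the four 32-bit words that follow these two.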
* datapath: fix potential out of bound access in parse_ct | Greg Rose | 2017-07-26 | 1 | -2/+5
  Upstream commit:
      commit 69ec932e364b1ba9c3a2085fe96b76c8a3f71e7c
      Author: Liping Zhang <zlpnobody@gmail.com>
      Date: Sun Jul 23 17:52:23 2017 +0800

      openvswitch: fix potential out of bound access in parse_ct

      Before the 'type' is validated, we shouldn't use it to fetch the ovs_ct_attr_lens's minlen and maxlen, else, out of bound access may happen.

      Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
      Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Pick up an upstream bug fix.

  Fixes: a94ebc39996b ("datapath: Add conntrack action")
  Signed-off-by: Greg Rose <gvrose8192@gmail.com>
  Signed-off-by: Joe Stringer <joe@ovn.org>
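The ordering problem this fix addresses (validate the attribute type before using it as a table index) can be modelled in a few lines. This is a sketch under stated assumptions: the table contents and the names `CT_ATTR_LENS` and `validate_attr` are illustrative and are not the kernel's `ovs_ct_attr_lens`. In C, the unchecked lookup reads past the array; in Python it would merely raise, but the order of the checks is the same.

```python
# Illustrative table only: attribute type -> (minlen, maxlen).
CT_ATTR_LENS = [(0, 0), (0, 0), (2, 2), (4, 4)]
CT_ATTR_MAX = len(CT_ATTR_LENS) - 1


def validate_attr(attr_type: int, payload_len: int) -> bool:
    """Return True if the attribute type is known and its length is in range."""
    # Check the type first; only then is it safe to index the table.
    if attr_type < 0 or attr_type > CT_ATTR_MAX:
        return False
    minlen, maxlen = CT_ATTR_LENS[attr_type]
    return minlen <= payload_len <= maxlen
```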
* datapath: Fix for force/commit action failures | Greg Rose | 2017-07-24 | 1 | -15/+36
  Upstream commit:
      commit 8b97ac5bda17cfaa257bcab6180af0f43a2e87e0
      Author: Greg Rose <gvrose8192@gmail.com>
      Date: Fri Jul 14 12:42:49 2017 -0700

      openvswitch: Fix for force/commit action failures

      When there is an established connection in direction A->B, it is possible to receive a packet on port B which then executes ct(commit,force) without first performing ct() - ie, a lookup. In this case, we would expect that this packet can delete the existing entry so that we can commit a connection with direction B->A. However, currently we only perform a check in skb_nfct_cached() for whether OVS_CS_F_TRACKED is set and OVS_CS_F_INVALID is not set, ie that a lookup previously occurred. In the above scenario, a lookup has not occurred but we should still be able to statelessly look up the existing entry and potentially delete the entry if it is in the opposite direction.

      This patch extends the check to also hint that if the action has the force flag set, then we will lookup the existing entry so that the force check at the end of skb_nfct_cached has the ability to delete the connection.

      Fixes: dd41d330b03 ("openvswitch: Add force commit.")
      CC: Pravin Shelar <pshelar@nicira.com>
      CC: dev@openvswitch.org
      Signed-off-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: Greg Rose <gvrose8192@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Co-authored-by: Joe Stringer <joe@ovn.org>
  Signed-off-by: Joe Stringer <joe@ovn.org>
  Signed-off-by: Greg Rose <gvrose8192@gmail.com>
* datapath: fix mis-ordered comment lines for ovs_skb_cb | Greg Rose | 2017-07-24 | 1 | -1/+1
  Upstream commit:
      commit 52427fa0631269c62885dc48e0c32e2ad6e17f8c
      Author: Daniel Axtens <dja@axtens.net>
      Date: Mon Jul 3 21:46:43 2017 +1000

      openvswitch: fix mis-ordered comment lines for ovs_skb_cb

      I was trying to wrap my head around meaning of mru, and realised that the second line of the comment defining it had somehow ended up after the line defining cutlen, leading to much confusion.

      Reorder the lines to make sense.

      Signed-off-by: Daniel Axtens <dja@axtens.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Signed-off-by: Greg Rose <gvrose8192@gmail.com>
  Signed-off-by: Joe Stringer <joe@ovn.org>
* datapath: Avoid using stack larger than 1024. | Tonghao Zhang | 2017-07-24 | 1 | -23/+58
  Upstream commit:
      commit 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f
      Author: Tonghao Zhang <xiangxia.m.yue@gmail.com>
      Date: Thu Jun 29 17:27:44 2017 -0700

      datapath: Avoid using stack larger than 1024.

      When compiling OvS-master on 4.4.0-81 kernel, there is a warning:

        CC [M] /root/ovs/datapath/linux/datapath.o
        /root/ovs/datapath/linux/datapath.c: In function 'ovs_flow_cmd_set':
        /root/ovs/datapath/linux/datapath.c:1221:1: warning: the frame size of 1040 bytes is larger than 1024 bytes [-Wframe-larger-than=]

      This patch factors out match-init and action-copy to avoid the "Wframe-larger-than=1024" warning. Because mask is only used to get actions, we add a new function to save some stack space.

      Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
  Signed-off-by: Joe Stringer <joe@ovn.org>
* compat: net: store port/representator id in metadata_dst. | Joe Stringer | 2017-07-24 | 2 | -1/+18
  Upstream commit:
      commit 3fcece12bc1b6dcdf0986f2cd9e8f63b1f9b6aa0
      Author: Jakub Kicinski <jakub.kicinski@netronome.com>
      Date: Fri Jun 23 22:11:58 2017 +0200

      net: store port/representator id in metadata_dst

      Switches and modern SR-IOV enabled NICs may multiplex traffic from Port representators and control messages over single set of hardware queues. Control messages and muxed traffic may need ordered delivery.

      Those requirements make it hard to comfortably use TC infrastructure today unless we have a way of attaching metadata to skbs at the upper device. Because single set of queues is used for many netdevs stopping TC/sched queues of all of them reliably is impossible and lower device has to retreat to returning NETDEV_TX_BUSY and usually has to take extra locks on the fastpath.

      This patch attempts to enable port/representative devs to attach metadata to skbs which carry port id. This way representatives can be queueless and all queuing can be performed at the lower netdev in the usual way.

      Traffic arriving on the port/representative interfaces will have metadata attached and will subsequently be queued to the lower device for transmission. The lower device should recognize the metadata and translate it to HW specific format which is most likely either a special header inserted before the network headers or descriptor/metadata fields.

      Metadata is associated with the lower device by storing the netdev pointer along with port id so that if TC decides to redirect or mirror the new netdev will not try to interpret it.

      This is mostly for SR-IOV devices since switches don't have lower netdevs today.

      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Upstream: 3fcece12bc1b ("net: store port/representator id in metadata_dst")
  Signed-off-by: Joe Stringer <joe@ovn.org>
  Acked-by: Greg Rose <gvrose8192@gmail.com>
* datapath: get rid of redundant vxlan_dev.flags | Greg Rose | 2017-07-24 | 1 | -0/+8
  Upstream commit:
      commit dc5321d79697db1b610c25fa4fad1aec7533ea3e
      Author: Matthias Schiffer <mschiffer@universe-factory.net>
      Date: Mon Jun 19 10:03:56 2017 +0200

      vxlan: get rid of redundant vxlan_dev.flags

      There is no good reason to keep the flags twice in vxlan_dev and vxlan_config.

      Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Applied using HAVE_VXLAN_DEV_CFG compatibility flag defined in acinclude.m4.

  Signed-off-by: Greg Rose <gvrose8192@gmail.com>
  Signed-off-by: Joe Stringer <joe@ovn.org>
* compat: Implement upstream net device free change. | Greg Rose | 2017-07-24 | 2 | -0/+8
  Upstream commit cf124db566e6 ("net: Fix inconsistent teardown and release of private netdev state.") removed the destructor member of the net_device structure and replaced it with a boolean flag indicating that the net device resource needs freeing. Use compat flag HAVE_NEEDS_FREE_NETDEV to indicate whether the new flag should be used.

  Signed-off-by: Greg Rose <gvrose8192@gmail.com>
  Signed-off-by: Joe Stringer <joe@ovn.org>
* compat: convert many more places to skb_put_zero(). | Joe Stringer | 2017-07-24 | 2 | -1/+13
  Upstream commit:
      commit de77b966ce8adcb4c58d50e2f087320d5479812a
      Author: Johannes Berg <johannes.berg@intel.com>
      Date: Fri Jun 16 14:29:19 2017 +0200

      networking: convert many more places to skb_put_zero()

      There were many places that my previous spatch didn't find, as pointed out by yuan linyu in various patches.

      The following spatch found many more and also removes the now unnecessary casts:

          @@
          identifier p, p2;
          expression len;
          expression skb;
          type t, t2;
          @@
          (
          -p = skb_put(skb, len);
          +p = skb_put_zero(skb, len);
          |
          -p = (t)skb_put(skb, len);
          +p = skb_put_zero(skb, len);
          )
          ... when != p
          (
          p2 = (t2)p;
          -memset(p2, 0, len);
          |
          -memset(p, 0, len);
          )

          @@
          type t, t2;
          identifier p, p2;
          expression skb;
          @@
          t *p;
          ...
          (
          -p = skb_put(skb, sizeof(t));
          +p = skb_put_zero(skb, sizeof(t));
          |
          -p = (t *)skb_put(skb, sizeof(t));
          +p = skb_put_zero(skb, sizeof(t));
          )
          ... when != p
          (
          p2 = (t2)p;
          -memset(p2, 0, sizeof(*p));
          |
          -memset(p, 0, sizeof(*p));
          )

          @@
          expression skb, len;
          @@
          -memset(skb_put(skb, len), 0, len);
          +skb_put_zero(skb, len);

      Apply it to the tree (with one manual fixup to keep the comment in vxlan.c, which spatch removed.)

      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Use e45a79da863c ("skbuff/mac80211: introduce and use skb_put_zero()") as the basis for the backported function.

  Upstream: de77b966ce8a ("networking: convert many more places to skb_put_zero()")
  Signed-off-by: Joe Stringer <joe@ovn.org>
  Acked-by: Greg Rose <gvrose8192@gmail.com>
* datapath: Fix inconsistent teardown and release of private netdev state. | Greg Rose | 2017-07-24 | 1 | -0/+7
  Upstream commit:
      commit cf124db566e6b036b8bcbe8decbed740bdfac8c6
      Author: David S. Miller <davem@davemloft.net>
      Date: Mon May 8 12:52:56 2017 -0400

      net: Fix inconsistent teardown and release of private netdev state.

      Network devices can allocate resources and private memory using netdev_ops->ndo_init(). However, the release of these resources can occur in one of two different places.

      Either netdev_ops->ndo_uninit() or netdev->destructor().

      The decision of which operation frees the resources depends upon whether it is necessary for all netdev refs to be released before it is safe to perform the freeing.

      netdev_ops->ndo_uninit() presumably can occur right after the NETDEV_UNREGISTER notifier completes and the unicast and multicast address lists are flushed.

      netdev->destructor(), on the other hand, does not run until the netdev references all go away.

      Further complicating the situation is that netdev->destructor() almost universally does also a free_netdev().

      This creates a problem for the logic in register_netdevice(). Because all callers of register_netdevice() manage the freeing of the netdev, and invoke free_netdev(dev) if register_netdevice() fails.

      If netdev_ops->ndo_init() succeeds, but something else fails inside of register_netdevice(), it does call ndo_ops->ndo_uninit(). But it is not able to invoke netdev->destructor().

      This is because netdev->destructor() will do a free_netdev() and then the caller of register_netdevice() will do the same.

      However, this means that the resources that would normally be released by netdev->destructor() will not be.

      Over the years drivers have added local hacks to deal with this, by invoking their destructor parts by hand when register_netdevice() fails.

      Many drivers do not try to deal with this, and instead we have leaks.

      Let's close this hole by formalizing the distinction between what private things need to be freed up by netdev->destructor() and whether the driver needs unregister_netdevice() to perform the free_netdev().

      netdev->priv_destructor() performs all actions to free up the private resources that used to be freed by netdev->destructor(), except for free_netdev().

      netdev->needs_free_netdev is a boolean that indicates whether free_netdev() should be done at the end of unregister_netdevice().

      Now, register_netdevice() can sanely release all resources after ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit() and netdev->priv_destructor().

      And at the end of unregister_netdevice(), we invoke netdev->priv_destructor() and optionally call free_netdev().

      Signed-off-by: David S. Miller <davem@davemloft.net>

  Applied the portion of the commit applicable to openvswitch.

  Signed-off-by: Greg Rose <gvrose8192@gmail.com>
  Signed-off-by: Joe Stringer <joe@ovn.org>
* datapath: more accurate checksumming in queue_userspace_packet() | Joe Stringer | 2017-07-24 | 2 | -1/+16
  Upstream commit:
      commit 7529390d08f07fbf9b0174c5a87600b5caa1a8e8
      Author: Davide Caratti <dcaratti@redhat.com>
      Date: Thu May 18 15:44:42 2017 +0200

      openvswitch: more accurate checksumming in queue_userspace_packet()

      if skb carries an SCTP packet and ip_summed is CHECKSUM_PARTIAL, it needs CRC32c in place of Internet Checksum: use skb_csum_hwoffload_help to avoid corrupting such packets while queueing them towards userspace.

      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Signed-off-by: Joe Stringer <joe@ovn.org>
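To make the distinction in the commit above concrete, here is a userspace sketch of the two algorithms it names: the 16-bit ones' complement Internet checksum and CRC32c (the Castagnoli polynomial, which SCTP uses). The function names are illustrative, and none of this is the kernel's `skb_csum_hwoffload_help`. It only shows that the two checksums are different computations, so finishing a partial checksum with the wrong one corrupts the packet.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input
    total = sum(int.from_bytes(data[i:i + 2], "big")
                for i in range(0, len(data), 2))
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def crc32c(data: bytes) -> int:
    """Bitwise CRC32c using the reflected Castagnoli polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF
```

The bitwise CRC loop favours clarity over speed; a production implementation would be table-driven or hardware-assisted.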
* datapath: introduce nf_conntrack_helper_put function | Greg Rose | 2017-07-24 | 3 | -2/+15
  Upstream commit:
      commit d91fc59cd77c719f33eda65c194ad8f95a055190
      Author: Liping Zhang <zlpnobody@gmail.com>
      Date: Sun May 7 22:01:55 2017 +0800

      netfilter: introduce nf_conntrack_helper_put helper function

      And convert module_put invocation to nf_conntrack_helper_put, this is prepared for the followup patch, which will add a refcnt for cthelper, so we can reject the deleting request when cthelper is in use.

      Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

  Applied with additional use of HAVE_NF_CONNTRACK_HELPER_PUT compatibility flag defined in acinclude.m4.

  Signed-off-by: Greg Rose <gvrose8192@gmail.com>
  Signed-off-by: Joe Stringer <joe@ovn.org>
* datapath: Fix kernel panic for ovs reassemble. | wangzhike | 2017-07-21 | 8 | -123/+138
  OVS and the kernel stack would add frag_queue entries to the same netns_frags list. As a result, OVS and the kernel may access a frag_queue without the correct lock. Also, struct ipq may be different on kernels older than 4.3, which leads to invalid pointer access.

  The fix creates a specific netns_frags for OVS.

  Signed-off-by: wangzhike <wangzhike@jd.com>
  Signed-off-by: Joe Stringer <joe@ovn.org>
* datapath: enable VxLAN-gpe port creation in compat mode | Yang, Yi Y | 2017-07-19 | 1 | -0/+15
  In compat mode, OVS can't create an L3 VxLAN-gpe port on old kernels if port creation by rtnetlink fails. This patch enables old kernels to create an L3 VxLAN-gpe port.

  Signed-off-by: Yi Yang <yi.y.yang@intel.com>
  Signed-off-by: Joe Stringer <joe@ovn.org>
* netdev: fix missing shifts of VXLAN_EXT_GPE | Eric Garver | 2017-07-13 | 1 | -2/+3
  Contrary to the comment by the enum value, these are actually regular enum values that need to be shifted. VXLAN_EXT_GBP for example is used as a netlink value for vports.

  Fixes: 875ab13020b1 ("userspace: Handling of versatile tunnel ports")
  Signed-off-by: Eric Garver <e@erig.me>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
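The difference between an enum value and a flag bit, which is the point of the commit above, can be sketched as follows. The numeric values are illustrative placeholders and are not copied from openvswitch.h; `ext_mask` and `has_ext` are hypothetical helpers.

```python
# Placeholder enum values for illustration; the real numbers live in openvswitch.h.
VXLAN_EXT_GBP = 1
VXLAN_EXT_GPE = 2


def ext_mask(*exts: int) -> int:
    """Turn plain enum values into a bitmask by shifting each one."""
    mask = 0
    for ext in exts:
        mask |= 1 << ext
    return mask


def has_ext(mask: int, ext: int) -> bool:
    """Test a mask built by ext_mask() for one extension."""
    return bool(mask & (1 << ext))
```

With these placeholder values, OR-ing the unshifted enum values gives `1 | 2 == 3`, which does not correspond to the same bits as `ext_mask(VXLAN_EXT_GBP, VXLAN_EXT_GPE)`; a missing shift produces that kind of mismatch.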
* datapath: Fix missing "_ATTR" docstrings from some actions. | Justin Pettit | 2017-06-28 | 1 | -2/+2
  Signed-off-by: Justin Pettit <jpettit@ovn.org>
  Acked-by: Joe Stringer <joe@ovn.org>
* openvswitch.h: OVS_KEY_ATTR_PACKET_TYPE is userspace-only. | Ben Pfaff | 2017-06-27 | 1 | -0/+4
  This wasn't clear before.

  Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2017-June/334271.html
  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Greg Rose <gvrose8192@gmail.com>
* openvswitch.h: Use odp_port_t for port numbers in userspace-only structs. | Ben Pfaff | 2017-06-20 | 1 | -2/+2
  Using the correct type reduces the need for type conversions.

  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
  Reviewed-by: nickcooper-zhangtonghao <nic@opencloud.tech>
* compat: Restrict __ro_after_init usage | Greg Rose | 2017-06-19 | 1 | -1/+14
  The attribute __ro_after_init was introduced in Linux kernel 4.5. If a data structure is given this attribute then after the driver module loads the memory page where the data resides will be marked read only.

  The compat code in cache.h always defines __ro_after_init if it is not already defined so that it can be used as an attribute for the datapath genl_family structure definitions. If __ro_after_init is defined then it is used "as-is" where it will apply the read only attribute after driver initialization.

  This is incorrect usage for the Generic Netlink genl_family structure definitions prior to Linux kernel 4.10. The genl_family structure in those kernels includes a list header member that will be written to when the generic netlink family is unregistered. This will cause a subsequent page fault and kernel panic because at this time the genl_family structure data has been marked read only in the page descriptor.

  A new compat macro is introduced in acinclude.m4 to detect when the genl_family structure has the family_list list header as a member. In this case HAVE_GENL_FAMILY_LIST is defined and if __ro_after_init is also defined then it is undefined and redefined as empty. This will prevent the genl_family data structure from being marked read only in kernels 4.5 through 4.9 and thus prevent the page fault when the generic netlink families in datapath.c are unregistered.

  [Committer notes]
  * Rolled a short explanation comment into the code.

  Fixes: ba63fe260bd5 ("datapath: Allow compile against current net-next.")
  CC: Jarno Rajahalme <jarno@ovn.org>
  Signed-off-by: Greg Rose <gvrose8192@gmail.com>
  Signed-off-by: Joe Stringer <joe@ovn.org>
* userspace: add vxlan gpe support to vport | Georg Schmuecking | 2017-06-02 | 1 | -0/+1
  This patch is based on the "datapath: enable vxlangpe creation in compat mode" from Yi Yang. It introduces an extension option "gpe" to the vxlan port in the netdev-dpdk datapath. Description of vxlan gpe protocol was added to header file lib/packets.h. In the vxlan specific methods the different packets are introduced and handled.

  Added VXLAN GPE tunnel push test.

  Signed-off-by: Yi Yang <yi.y.yang at intel.com>
  Signed-off-by: Georg Schmuecking <georg.schmuecking@ericsson.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* userspace: Switching of L3 packets in L2 pipeline | Jan Scheurich | 2017-06-02 | 1 | -0/+2
  Ports have a new layer3 attribute if they send/receive L3 packets.

  The packet_type included in structs dp_packet and flow is considered in ofproto-dpif. The classical L2 match fields (dl_src, dl_dst, dl_type, and vlan_tci, vlan_vid, vlan_pcp) now have Ethernet as pre-requisite.

  A dummy ethernet header is pushed to L3 packets received from L3 ports before the pipeline processing starts. The ethernet header is popped before sending a packet to a L3 port.

  For datapath ports that can receive L2 or L3 packets, the packet_type becomes part of the flow key for datapath flows and is handled appropriately in dpif-netdev.

  In the 'else' branch in flow_put_on_pmd() function, the additional check flow_equal(&match.flow, &netdev_flow->flow) was removed, as a) the dpcls lookup is sufficient to uniquely identify a flow and b) it caused false negatives because the flow in netdev->flow may not be properly masked.

  In dpif_netdev_flow_put() we now use the same method for constructing the netdev_flow_key as the one used when adding the flow to the dpcls to make sure these always match. The function netdev_flow_key_from_flow() used so far was not only inefficient but sometimes caused mismatches and subsequent flow update failures.

  The kernel datapath does not support the packet_type match field. Instead it encodes the packet type implicitly by the presence or absence of the Ethernet attribute in the flow key and mask. This patch filters the PACKET_TYPE attribute out of netlink flow key and mask to be sent to the kernel datapath.

  Signed-off-by: Lorand Jakab <lojakab@cisco.com>
  Signed-off-by: Simon Horman <simon.horman@netronome.com>
  Signed-off-by: Jiri Benc <jbenc@redhat.com>
  Signed-off-by: Yi Yang <yi.y.yang@intel.com>
  Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
  Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* packets: Remove unnecessary "packed" annotations. | Ben Pfaff | 2017-05-30 | 1 | -1/+1
  I know of two reasons to mark a structure as "packed". The first is because the structure must match some defined interface and therefore compiler-inserted padding between or after members would cause its layout to diverge from that interface. This is not a problem in a structure that follows the general alignment rules that are seen in ABIs for all the architectures that OVS cares about: basically, that a struct member needs to be aligned on a boundary that is a multiple of the member's size.

  The second reason is because instances of the struct tend to be at misaligned addresses.

  struct eth_header and struct vlan_eth_header are normally aligned on 16-bit boundaries (at least), and they contain only 16-bit members, so there's no need to pack them. This commit removes the packed annotation.

  This commit also removes the packed annotation from struct llc_header. Since that struct only contains 8-bit members, I don't know of any benefit to packing it, period.

  This commit also removes a few more packed annotations that are much less important.

  When these packed annotations were removed, it caused a few warnings related to casts from 'uint8_t *' to more strictly aligned pointer types, related to struct ovs_action_push_tnl. That's because that struct had a trailing member used to store packet headers, that was declared as a uint8_t[]. Before, when this was cast to 'struct eth_header *', there was no change in alignment since eth_header was packed; now that eth_header is not packed, the compiler considers it suspicious. This commit avoids that problem by changing the member from uint8_t[] to uint32_t[], which assures the compiler that it is properly aligned.

  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Joe Stringer <joe@ovn.org>
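The claim that a struct made only of 16-bit members gains nothing in size from packing, while its required alignment drops, can be checked from userspace with `ctypes`. This is a sketch: the field layout below is an assumed stand-in for an Ethernet header built from 16-bit words, not a copy of the OVS `struct eth_header` definition.

```python
import ctypes

# Assumed layout: two 6-byte addresses stored as three 16-bit words each,
# followed by a 16-bit type field (14 bytes in total).
ETH_FIELDS = [
    ("eth_dst", ctypes.c_uint16 * 3),
    ("eth_src", ctypes.c_uint16 * 3),
    ("eth_type", ctypes.c_uint16),
]


class EthHeader(ctypes.Structure):
    """Natural alignment: no padding is needed between 16-bit members."""
    _fields_ = ETH_FIELDS


class PackedEthHeader(ctypes.Structure):
    """Same members, but packed: the size is unchanged, the alignment drops."""
    _pack_ = 1
    _fields_ = ETH_FIELDS
```

Comparing `ctypes.sizeof()` and `ctypes.alignment()` for the two classes shows the effect: packing changes the alignment the compiler may assume, which is why casting a `uint8_t[]` buffer to the unpacked struct began to trigger warnings.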
* userspace: Support for push_eth and pop_eth actions | Jan Scheurich | 2017-05-08 | 1 | -3/+1
  Add support for actions push_eth and pop_eth to the netdev datapath and the supporting libraries. This patch relies on the support for these actions in the kernel datapath to be present.

  Signed-off-by: Lorand Jakab <lojakab@cisco.com>
  Signed-off-by: Simon Horman <simon.horman@netronome.com>
  Signed-off-by: Jiri Benc <jbenc@redhat.com>
  Signed-off-by: Yi Yang <yi.y.yang@intel.com>
  Signed-off-by: Jean Tourrilhes <jt@labs.hpe.com>
  Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
  Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* datapath: backport: vxlan: do not output confusing error message | Jiri Benc | 2017-05-04 | 1 | -2/+0
  Upstream commit:
      commit baf4d7860771287f30fbe9b6b2dc18b04361439d
      Author: Jiri Benc <jbenc@redhat.com>
      Date: Thu Apr 27 21:24:36 2017 +0200

      vxlan: do not output confusing error message

      The message "Cannot bind port X, err=Y" creates only confusion. In metadata based mode, failure of IPv6 socket creation is okay if IPv6 is disabled and no error message should be printed. But when IPv6 tunnel was requested, such failure is fatal. The vxlan_socket_create does not know when the error is harmless and when it's not.

      Instead of passing such information down to vxlan_socket_create, remove the message completely. It's not useful. We propagate the error code up to the user space and the port number comes from the user space. There's nothing in the message that the process creating vxlan interface does not know.

      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
  Acked-by: Joe Stringer <joe@ovn.org>
* datapath: backport: vxlan: correctly handle ipv6.disable module parameter | Jiri Benc | 2017-05-03 | 1 | -3/+7
  Upstream commit:
      commit d074bf9600443403aa24fbc12c1f18eadc90f5aa
      Author: Jiri Benc <jbenc@redhat.com>
      Date: Thu Apr 27 21:24:35 2017 +0200

      vxlan: correctly handle ipv6.disable module parameter

      When IPv6 is compiled but disabled at runtime, __vxlan_sock_add returns -EAFNOSUPPORT. For metadata based tunnels, this causes failure of the whole operation of bringing up the tunnel.

      Ignore failure of IPv6 socket creation for metadata based tunnels caused by IPv6 not being available.

      Fixes: b1be00a6c39f ("vxlan: support both IPv4 and IPv6 sockets in a single vxlan device")
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
  Acked-by: Joe Stringer <joe@ovn.org>
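The behaviour described above (an unavailable IPv6 stack is tolerated only for metadata-based tunnels) can be modelled roughly as follows. This is a simplified sketch and not the kernel function: `vxlan_sock_add` here is a userspace stand-in, and `create_sock` is a hypothetical callback that returns 0 or a negative errno, mirroring the kernel convention.

```python
import errno


def vxlan_sock_add(metadata: bool, want_ipv6: bool, create_sock) -> int:
    """Open the sockets a tunnel needs; return 0 or a negative errno.

    create_sock(ipv6) is a hypothetical callback standing in for the
    per-family socket setup.
    """
    want_ipv4 = metadata or not want_ipv6
    ret = 0
    if want_ipv6 or metadata:
        ret = create_sock(True)
        # For metadata-based tunnels a missing IPv6 stack is not fatal.
        if ret == -errno.EAFNOSUPPORT and metadata:
            ret = 0
    if ret == 0 and want_ipv4:
        ret = create_sock(False)
    return ret
```

With a callback that fails only for IPv6, a metadata-based tunnel still comes up over IPv4, whereas an explicitly requested IPv6 tunnel reports `-EAFNOSUPPORT`.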
* datapath: Remove untracked CT on newer kernels. | Joe Stringer | 2017-05-03 | 1 | -0/+2
  Upstream commits cc41c84b7e7f ("netfilter: kill the fake untracked conntrack objects") and ab8bc7ed864b ("netfilter: remove nf_ct_is_untracked") removed the 'untracked' conntrack objects and functions. The latter commit removes the usage of nf_ct_is_untracked() from OVS. However, older kernels still have a representation of 'untracked' CT objects so the code needs to remain until the kernel support is bumped to Linux 4.12 or newer. Introduce a macro to detect this symbol and wrap these lines in the macro check.

  Signed-off-by: Joe Stringer <joe@ovn.org>
  Acked-by: Greg Rose <gvrose8192@gmail.com>
* datapath: Fix ovs_flow_key_update() | Yi-Hung Wei | 2017-05-03 | 1 | -2/+8
  Upstream commit:
      commit 6f56f6186c18e3fd54122b73da68e870687b8c59
      Author: Yi-Hung Wei <yihung.wei@gmail.com>
      Date: Thu Mar 30 12:36:03 2017 -0700

      ovs_flow_key_update() is called when the flow key is invalid, and it is used to update and revalidate the flow key. Commit 329f45bc4f19 ("openvswitch: add mac_proto field to the flow key") introduces mac_proto field to flow key and use it to determine whether the flow key is valid. However, the commit does not update the code path in ovs_flow_key_update() to revalidate the flow key which may cause BUG_ON() on execute_recirc(). This patch addresses the aforementioned issue.

      Fixes: 329f45bc4f19 ("openvswitch: add mac_proto field to the flow key")
      Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
      Acked-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
  Acked-by: Simon Horman <simon.horman@netronome.com>
  Signed-off-by: Simon Horman <simon.horman@netronome.com>
* datapath: correctly fragment packet with mpls headers | Yi-Hung Wei | 2017-05-03 | 1 | -4/+20
  Upstream commit:
      commit c66549ffd666605831abf6cf19ce0571ad868e39
      Author: Jiri Benc <jbenc@redhat.com>
      Date: Wed Oct 5 15:01:57 2016 +0200

      openvswitch: correctly fragment packet with mpls headers

      If mpls headers were pushed to a defragmented packet, the refragmentation no longer works correctly after 48d2ab609b6b ("net: mpls: Fixups for GSO"). The network header has to be shifted after the mpls headers for the fragmentation and restored afterwards.

      Fixes: 48d2ab609b6b ("net: mpls: Fixups for GSO")
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
  Signed-off-by: Simon Horman <simon.horman@netronome.com>
* datapath: set network header correctly on key extract | Yi-Hung Wei | 2017-05-03 | 1 | -8/+3
  Upstream commit:
      commit f7d49bce8e741e1e6aa14ce4db1b6cea7e4be4e8
      Author: Jiri Benc <jbenc@redhat.com>
      Date: Fri Sep 30 19:08:05 2016 +0200

      openvswitch: mpls: set network header correctly on key extract

      After the 48d2ab609b6b ("net: mpls: Fixups for GSO"), MPLS handling in openvswitch was changed to have network header pointing to the start of the MPLS headers and inner_network_header pointing after the MPLS headers.

      However, key_extract was missed by the mentioned commit, causing incorrect headers to be set when a MPLS packet just enters the bridge or after it is recirculated.

      Fixes: 48d2ab609b6b ("net: mpls: Fixups for GSO")
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>

  Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
  Signed-off-by: Simon Horman <simon.horman@netronome.com>
* datapath: Fixups for MPLS GSO    Yi-Hung Wei    2017-05-03    2    -15/+45
| This patch backports the following two upstream commits to fix MPLS GSO in
| the ovs datapath.
|
| Starting from upstream commit 48d2ab609b6b ("net: mpls: Fixups for GSO"),
| the mpls_gso kernel module relies on the fact that skb_network_header()
| points to the mpls header and skb_inner_network_header() points to the L3
| header, so that it can derive the length of the mpls header correctly. The
| upstream commit also updates how the ovs datapath marks the skb header on
| push and pop mpls. However, the old mpls_gso kernel module assumes that
| skb_network_header() points to the L3 header, and it will misbehave if the
| ovs datapath marks skb_network_header() in the new way, since it will
| treat the mpls header as the L3 header.
|
| Because the function signature of mpls_gso_segment() does not change, this
| backport patch uses the new mpls_hdr() helper to determine whether the
| kernel that the ovs datapath is compiled with has the new or the legacy
| mpls_gso kernel module. It has been tested on kernels 4.4 and 4.9.
|
| Upstream commit:
|
|     commit 48d2ab609b6bbecb7698487c8579bc40de9d6dfa
|     Author: David Ahern <dsa@cumulusnetworks.com>
|     Date:   Wed Aug 24 20:10:44 2016 -0700
|
|     net: mpls: Fixups for GSO
|
|     As reported by Lennert the MPLS GSO code is failing to properly
|     segment large packets. There are a couple of problems:
|
|     1. the inner protocol is not set so the gso segment functions for
|        inner protocol layers are not getting run, and
|
|     2. MPLS labels for packets that use the "native" (non-OVS) MPLS code
|        are not properly accounted for in mpls_gso_segment.
|
|     The MPLS GSO code was added for OVS. It is re-using
|     skb_mac_gso_segment to call the gso segment functions for the higher
|     layer protocols. That means skb_mac_gso_segment is called twice --
|     once with the network protocol set to MPLS and again with the network
|     protocol set to the inner protocol.
|
|     This patch sets the inner skb protocol addressing item 1 above and
|     sets the network_header and inner_network_header to mark where the
|     MPLS labels start and end. The MPLS code in OVS is also updated to set
|     the two network markers.
|
|     From there the MPLS GSO code uses the difference between the network
|     header and the inner network header to know the size of the MPLS
|     header that was pushed. It then pulls the MPLS header, resets the
|     mac_len and protocol for the inner protocol and then calls
|     skb_mac_gso_segment to segment the skb.
|
|     After the inner protocol segmentation is done the skb protocol is set
|     to mpls for each segment and the network and mac headers restored.
|
|     Reported-by: Lennert Buytenhek <buytenh@wantstofly.org>
|     Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
|     Signed-off-by: David S. Miller <davem@davemloft.net>
|
| Upstream commit:
|
|     commit 85de4a2101acb85c3b1dde465e84596ccca99f2c
|     Author: Jiri Benc <jbenc@redhat.com>
|     Date:   Fri Sep 30 19:08:07 2016 +0200
|
|     openvswitch: use mpls_hdr
|
|     skb_mpls_header is equivalent to mpls_hdr now. Use the existing helper
|     instead.
|
|     Signed-off-by: Jiri Benc <jbenc@redhat.com>
|     Acked-by: Pravin B Shelar <pshelar@ovn.org>
|     Signed-off-by: David S. Miller <davem@davemloft.net>
|
| Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
| Signed-off-by: Simon Horman <simon.horman@netronome.com>
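The header arithmetic described in the upstream commit above can be sketched with plain offsets. This is only an illustration: `struct fake_skb`, `mpls_stack_len()` and `mpls_label_count()` are hypothetical names that stand in for the real `struct sk_buff` accessors, and the code is not taken from the kernel or from this patch.

```c
#include <stddef.h>

#define MPLS_HLEN 4 /* one MPLS label stack entry is 4 bytes */

/* Hypothetical stand-in for the two header offsets kept in an skb. */
struct fake_skb {
    size_t network_header;       /* new convention: start of the MPLS labels */
    size_t inner_network_header; /* start of the L3 header */
};

/* Length of the pushed MPLS header: the gap between the two markers. */
static size_t mpls_stack_len(const struct fake_skb *skb)
{
    if (skb->inner_network_header < skb->network_header)
        return 0; /* markers not set up the new way; treat as no labels */
    return skb->inner_network_header - skb->network_header;
}

/* Number of label stack entries covered by that gap. */
static size_t mpls_label_count(const struct fake_skb *skb)
{
    return mpls_stack_len(skb) / MPLS_HLEN;
}
```

With a 14-byte Ethernet header followed by two labels, the markers would sit at offsets 14 and 22, giving a length of 8 bytes. A module that assumes the legacy convention would instead read the first marker as the L3 header, which is the mismatch this backport has to detect.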
* compat: Remove unused netdevice backport code.    Joe Stringer    2017-04-28    2    -76/+0
| Commit 8063e0958780 ("datapath: Drop support for kernel older than 3.10")
| dropped support for these kernels, remove the old compat code.
|
| Signed-off-by: Joe Stringer <joe@ovn.org>
| Acked-by: Jarno Rajahalme <jarno@ovn.org>
* compat: Fix build error in kernels 4.10    Greg Rose    2017-04-28    2    -0/+24
| Use the acinclude.m4 configuration file to check for the net parameter
| that was added to the ipv4 and ipv6 frags init functions in the 4.10 Linux
| kernel. The result determines whether DEFRAG_ENABLE_TAKES_NET is set,
| which is then checked at compile time.
|
| This is an alternative solution patch for the issue reported by Raymond
| Burkholder and the patch submitted by Guoshuai Li.
|
| [Committer notes]
| Squash in "acinclude.m4: Add check for struct net parameter", which
| provides HAVE_DEFRAG_ENABLE_TAKES_NET.
|
| Reported-by: Raymond Burkholder <ray@oneunified.net>
| CC: Guoshuai Li <ligs@dtdream.com>
| Signed-off-by: Greg Rose <gvrose8192@gmail.com>
| Signed-off-by: Joe Stringer <joe@ovn.org>
* datapath: Delete conntrack entry clashing with an expectation.    Jarno Rajahalme    2017-04-27    1    -1/+29
| Upstream commit:
|
|     commit cf5d70918877c6a6655dc1e92e2ebb661ce904fd
|     Author: Jarno Rajahalme <jarno@ovn.org>
|     Date:   Fri Apr 14 14:26:38 2017 -0700
|
|     openvswitch: Delete conntrack entry clashing with an expectation.
|
|     Conntrack helpers do not check for a potentially clashing conntrack
|     entry when creating a new expectation. Also, nf_conntrack_in() will
|     check expectations (via init_conntrack()) only if a conntrack entry
|     can not be found. The expectation for a packet which also matches an
|     existing conntrack entry will not be removed by conntrack, and is
|     currently handled inconsistently by OVS, as OVS expects the
|     expectation to be removed when the connection tracking entry matching
|     that expectation is confirmed.
|
|     It should be noted that normally an IP stack would not allow reuse of
|     a 5-tuple of an old (possibly lingering) connection for a new data
|     connection, so this is a somewhat unlikely corner case. However, it is
|     possible that a misbehaving source could cause conntrack entries to be
|     created that could then interfere with new related connections.
|
|     Fix this in the OVS module by deleting the clashing conntrack entry
|     after an expectation has been matched. This causes the following
|     nf_conntrack_in() call to also find the expectation and remove it when
|     creating the new conntrack entry, and causes the forthcoming reply
|     direction packets to match the new related connection instead of the
|     old clashing conntrack entry.
|
|     Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
|     Reported-by: Yang Song <yangsong@vmware.com>
|     Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
|     Acked-by: Joe Stringer <joe@ovn.org>
|     Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
| Acked-by: Joe Stringer <joe@ovn.org>
* datapath: nf_connlabels_replace() backport.    Jarno Rajahalme    2017-04-27    1    -0/+47
| Upstream commit 5a8145f7b222 ("netfilter: labels: don't emit ct event if
| labels were not changed"), released in Linux 4.7, changed
| nf_connlabels_replace() to trigger a conntrack event for a label change
| only when the labels actually changed. Without this change an update
| event is triggered even if the labels already have the values they are
| being set to.
|
| There is no way we can detect this functional change from Linux headers,
| so provide replacements that work the same for older Linux releases,
| regardless of whether a distribution provides backports or not.
|
| VMware-BZ: #1837218
| Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
| Acked-by: Joe Stringer <joe@ovn.org>
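The behaviour described above, where an event is raised only when the labels really change, can be modelled in a few lines. This is a minimal sketch and not the backported function: `labels_replace()` is a hypothetical name, and its boolean return value stands in for the decision to emit a conntrack event.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative model only: update the masked bits of each label word and
 * report whether any word actually changed, so the caller can skip the
 * update event when the labels already hold the requested values.
 */
static bool labels_replace(uint32_t *labels, const uint32_t *data,
                           const uint32_t *mask, size_t words)
{
    bool changed = false;

    for (size_t i = 0; i < words; i++) {
        /* Keep the bits outside the mask, take the bits inside it. */
        uint32_t updated = (labels[i] & ~mask[i]) | (data[i] & mask[i]);

        if (updated != labels[i]) {
            labels[i] = updated;
            changed = true;
        }
    }
    return changed;
}
```

Calling the function twice with the same data and mask returns true the first time and false the second time, which is the case where the older kernels would still have sent an update event.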
* datapath: Add eventmask support to CT action.    Jarno Rajahalme    2017-04-27    2    -0/+39
| Upstream commit:
|
|     commit 120645513f55a4ac5543120d9e79925d30a0156f
|     Author: Jarno Rajahalme <jarno@ovn.org>
|     Date:   Fri Apr 21 16:48:06 2017 -0700
|
|     openvswitch: Add eventmask support to CT action.
|
|     Add a new optional conntrack action attribute OVS_CT_ATTR_EVENTMASK,
|     which can be used in conjunction with the commit flag
|     (OVS_CT_ATTR_COMMIT) to set the mask of bits specifying which
|     conntrack events (IPCT_*) should be delivered via the Netfilter
|     netlink multicast groups. Default behavior depends on the system
|     configuration, but typically a lot of events are delivered. This can
|     be very chatty for the NFNLGRP_CONNTRACK_UPDATE group, even if only
|     some types of events are of interest.
|
|     Netfilter core init_conntrack() adds the event cache extension, so we
|     only need to set the ctmask value. However, if the system is
|     configured without support for events, the setting will be skipped due
|     to extension not being found.
|
|     Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
|     Reviewed-by: Greg Rose <gvrose8192@gmail.com>
|     Acked-by: Joe Stringer <joe@ovn.org>
|     Signed-off-by: David S. Miller <davem@davemloft.net>
|
| Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
| Acked-by: Joe Stringer <joe@ovn.org>
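An event mask of this kind is a bitmap with one bit per event type. The sketch below shows how such a mask could be built and tested. The `enum ct_event` values are hypothetical placeholders chosen for the example and are not the kernel's IPCT_* numbering.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event numbers for illustration; not the IPCT_* values. */
enum ct_event {
    CT_EV_NEW = 0,
    CT_EV_DESTROY = 1,
    CT_EV_MARK = 2,
    CT_EV_LABEL = 3
};

/* Bit in the mask that corresponds to one event type. */
static uint32_t ct_event_bit(enum ct_event ev)
{
    return UINT32_C(1) << ev;
}

/* An event is delivered only if its bit is set in the mask. */
static bool ct_event_enabled(uint32_t eventmask, enum ct_event ev)
{
    return (eventmask & ct_event_bit(ev)) != 0;
}
```

A mask of `ct_event_bit(CT_EV_NEW) | ct_event_bit(CT_EV_DESTROY)` would let connection creation and teardown through while filtering out the more frequent mark and label updates.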
* datapath: Typo fix.    Jarno Rajahalme    2017-04-27    1    -1/+1
| Upstream commit:
|
|     commit abd0a4f2b41812e9ba334945e256909e3d28da57
|     Author: Jarno Rajahalme <jarno@ovn.org>
|     Date:   Fri Apr 21 16:48:05 2017 -0700
|
|     openvswitch: Typo fix.
|
|     Fix typo in a comment.
|
|     Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
|     Acked-by: Greg Rose <gvrose8192@gmail.com>
|     Signed-off-by: David S. Miller <davem@davemloft.net>
|
| Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
| Acked-by: Joe Stringer <joe@ovn.org>
* datapath: pass extended ACK struct to parsing functions    Johannes Berg    2017-04-20    5    -9/+32
| Upstream commit:
|
|     commit fceb6435e85298f747fee938415057af837f5a8a
|     Author: Johannes Berg <johannes.berg@intel.com>
|     Date:   Wed Apr 12 14:34:07 2017 +0200
|
|     netlink: pass extended ACK struct to parsing functions
|
|     Pass the new extended ACK reporting struct to all of the generic
|     netlink parsing functions. For now, pass NULL in almost all callers
|     (except for some in the core.)
|
|     Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|     Signed-off-by: David S. Miller <davem@davemloft.net>
|
| Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
| Acked-by: Joe Stringer <joe@ovn.org>
* datapath: Fix refcount leak on force commit.    Jarno Rajahalme    2017-04-19    1    -2/+2
| Upstream commit:
|
|     commit b768b16de58d5e0b1d7c3f936825b25327ced20c
|     Author: Jarno Rajahalme <jarno@ovn.org>
|     Date:   Tue Mar 28 11:25:26 2017 -0700
|
|     openvswitch: Fix refcount leak on force commit.
|
|     The reference count held for skb needs to be released when the skb's
|     nfct pointer is cleared, regardless of whether nf_ct_delete() is
|     called or not.
|
|     Failing to release the skb's reference count led to deferred conntrack
|     cleanup spinning forever within nf_conntrack_cleanup_net_list() when
|     cleaning up a network namespace:
|
|        kworker/u16:0-19025 [004] 45981067.173642: sched_switch: kworker/u16:0:19025 [120] R ==> rcu_preempt:7 [120]
|        kworker/u16:0-19025 [004] 45981067.173651: kernel_stack: <stack trace>
|     => ___preempt_schedule (ffffffffa001ed36)
|     => _raw_spin_unlock_bh (ffffffffa0713290)
|     => nf_ct_iterate_cleanup (ffffffffc00a4454)
|     => nf_conntrack_cleanup_net_list (ffffffffc00a5e1e)
|     => nf_conntrack_pernet_exit (ffffffffc00a63dd)
|     => ops_exit_list.isra.1 (ffffffffa06075f3)
|     => cleanup_net (ffffffffa0607df0)
|     => process_one_work (ffffffffa0084c31)
|     => worker_thread (ffffffffa008592b)
|     => kthread (ffffffffa008bee2)
|     => ret_from_fork (ffffffffa071b67c)
|
|     Fixes: dd41d33f0b03 ("openvswitch: Add force commit.")
|     Reported-by: Yang Song <yangsong@vmware.com>
|     Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
|     Acked-by: Joe Stringer <joe@ovn.org>
|     Signed-off-by: David S. Miller <davem@davemloft.net>
|
| Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
| Acked-by: Joe Stringer <joe@ovn.org>
* datapath: Openvswitch: Refactor sample and recirc actions implementation    Andy Zhou    2017-04-19    1    -79/+93
| Upstream commit:
|
|     Openvswitch: Refactor sample and recirc actions implementation
|
|     Added clone_execute() that both the sample and the recirc action
|     implementation can use.
|
|     Signed-off-by: Andy Zhou <azhou@ovn.org>
|     Acked-by: Pravin B Shelar <pshelar@ovn.org>
|     Signed-off-by: David S. Miller <davem@davemloft.net>
|
| Upstream: bef7f7567a10 ("Openvswitch: Refactor sample and recirc actions implementation")
| Signed-off-by: Andy Zhou <azhou@ovn.org>
| Acked-by: Joe Stringer <joe@ovn.org>
* datapath: openvswitch: Optimize sample action for the clone use cases    Andy Zhou    2017-04-19    5    -98/+172
| Upstream commit:
|
|     openvswitch: Optimize sample action for the clone use cases
|
|     With the introduction of the OpenFlow 'clone' action, the OVS user
|     space can now translate the 'clone' action into the kernel datapath
|     'sample' action, with 100% probability, to ensure that the clone
|     semantics, which is that the packet seen by the clone action is the
|     same as the packet seen by the action after clone, is faithfully
|     carried out in the datapath.
|
|     While the sample action in the datapath has the matching semantics,
|     its implementation is only optimized for its original use.
|     Specifically, there are two limitations: First, there is a 3-level
|     nesting restriction, enforced at flow downloading time. This limit
|     turns out to be too restrictive for the 'clone' use case. Second, the
|     implementation avoids a recursive call only if the sample action list
|     has a single userspace action.
|
|     The main optimization implemented in this series removes the static
|     nesting limit check and instead implements a run-time recursion limit
|     check, with recursion avoidance similar to that of the 'recirc'
|     action. This optimization solves both issues #1 and #2 above.
|
|     One related optimization attempts to avoid copying the flow key as
|     long as the enclosed actions do not change the flow key. The detection
|     is performed only once, at flow downloading time.
|
|     Another related optimization is to rewrite the action list at flow
|     downloading time in order to save the fast path from parsing the
|     sample action list in its original form repeatedly.
|
|     Signed-off-by: Andy Zhou <azhou@ovn.org>
|     Acked-by: Pravin B Shelar <pshelar@ovn.org>
|     Signed-off-by: David S. Miller <davem@davemloft.net>
|
| Upstream: 798c166173ff ("openvswitch: Optimize sample action for the clone use cases")
| Signed-off-by: Andy Zhou <azhou@ovn.org>
| Acked-by: Joe Stringer <joe@ovn.org>
* datapath: openvswitch: Refactor recirc key allocation.    Andy Zhou    2017-04-19    1    -26/+40
| Upstream commit:
|
|     openvswitch: Refactor recirc key allocation.
|
|     The logic of allocating and copying the key for each
|     'exec_actions_level' was specific to execute_recirc(). However, future
|     patches will reuse it as well. Refactor the logic into its own
|     function, clone_key().
|
|     Signed-off-by: Andy Zhou <azhou@ovn.org>
|     Acked-by: Pravin B Shelar <pshelar@ovn.org>
|     Signed-off-by: David S. Miller <davem@davemloft.net>
|
| Upstream: 4572ef52a00b ("openvswitch: Refactor recirc key allocation.")
| Signed-off-by: Andy Zhou <azhou@ovn.org>
| Acked-by: Joe Stringer <joe@ovn.org>
* datapath: openvswitch: Deferred fifo API change.    Andy Zhou    2017-04-19    1    -7/+11
| Upstream commit:
|
|     openvswitch: Deferred fifo API change.
|
|     The add_deferred_actions() API currently requires actions to be passed
|     in as a fully encoded netlink message. So far both the 'sample' and
|     'recirc' actions happen to carry actions as fully encoded netlink
|     messages. However, this requirement is more restrictive than
|     necessary; a future patch will need to pass in action lists that are
|     not fully encoded by themselves.
|
|     Signed-off-by: Andy Zhou <azhou@ovn.org>
|     Acked-by: Joe Stringer <joe@ovn.org>
|     Acked-by: Pravin B Shelar <pshelar@ovn.org>
|     Signed-off-by: David S. Miller <davem@davemloft.net>
|
| Upstream: 47c697aa2d07 ("openvswitch: Deferred fifo API change.")
| Signed-off-by: Andy Zhou <azhou@ovn.org>
| Acked-by: Joe Stringer <joe@ovn.org>
* datapath: openvswitch: Add missing case OVS_TUNNEL_KEY_ATTR_PAD    Kris Murphy    2017-04-19    1    -0/+2
|     openvswitch: Add missing case OVS_TUNNEL_KEY_ATTR_PAD
|
|     Added a case for OVS_TUNNEL_KEY_ATTR_PAD to the switch statement in
|     ip_tun_from_nlattr in order to prevent the default case returning an
|     error.
|
|     Fixes: b46f6ded906e ("libnl: nla_put_be64(): align on a 64-bit area")
|     Signed-off-by: Kris Murphy <kriskend@linux.vnet.ibm.com>
|     Acked-by: Joe Stringer <joe@ovn.org>
|     Signed-off-by: David S. Miller <davem@davemloft.net>
|
| Upstream: 8f3dbfd79ed9 ("openvswitch: Add missing case OVS_TUNNEL_KEY_ATTR_PAD")
| Fixes: f34648187b03 ("datapath: backport: libnl: nla_put_be64(): align on a 64-bit area")
| Signed-off-by: Andy Zhou <azhou@ovn.org>
| Acked-by: Joe Stringer <joe@ovn.org>
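The shape of this fix can be shown with a small stand-alone parser. The sketch below is an assumption-laden model, not the code in ip_tun_from_nlattr: `enum tun_attr` and `tun_attr_parse()` are hypothetical names invented for the example.

```c
#include <errno.h>

/* Hypothetical attribute types standing in for OVS_TUNNEL_KEY_ATTR_*. */
enum tun_attr {
    TUN_ATTR_ID,
    TUN_ATTR_PAD,
    TUN_ATTR_UNKNOWN
};

/* Returns 0 on success, or -EINVAL for an attribute it does not know. */
static int tun_attr_parse(enum tun_attr type, int *seen_id)
{
    switch (type) {
    case TUN_ATTR_ID:
        *seen_id = 1;
        break;
    case TUN_ATTR_PAD:
        /* Padding carries no data: accept it and do nothing. */
        break;
    default:
        /* Without the PAD case above, padding would land here. */
        return -EINVAL;
    }
    return 0;
}
```

The point of the explicit empty case is that a padding attribute, which exists only for alignment, is accepted rather than rejected along with attributes that are really unknown.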
* datapath: net/openvswitch: Set the ipv6 source tunnel key address attribute correctly    Or Gerlitz    2017-04-19    1    -1/+1
| Upstream commit:
|
|     net/openvswitch: Set the ipv6 source tunnel key address attribute correctly
|
|     When dealing with ipv6 source tunnel key address attribute
|     (OVS_TUNNEL_KEY_ATTR_IPV6_SRC) we are wrongly setting the tunnel dst
|     ip, fix that.
|
|     Fixes: 6b26ba3a7d95 ('openvswitch: netlink attributes for IPv6 tunneling')
|     Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
|     Reported-by: Paul Blakey <paulb@mellanox.com>
|     Acked-by: Jiri Benc <jbenc@redhat.com>
|     Acked-by: Joe Stringer <joe@ovn.org>
|     Signed-off-by: David S. Miller <davem@davemloft.net>
|
| Upstream: 3d20f1f7bd575 ("net/openvswitch: Set the ipv6 source tunnel key address attribute correctly")
| Fixes: 8a2d4905a00f ("datapath: Add support for IPv6 tunnels.")
| Signed-off-by: Andy Zhou <azhou@ovn.org>
| Acked-by: Joe Stringer <joe@ovn.org>
* datapath: actions: fixed a brace coding style warning.    Peter Downs    2017-04-19    1    -2/+1
| Upstream commit:
|
|     openvswitch: actions: fixed a brace coding style warning
|
|     Fixed a brace coding style warning reported by checkpatch.pl
|
|     Signed-off-by: Peter Downs <padowns@gmail.com>
|     Signed-off-by: David S. Miller <davem@davemloft.net>
|
| Upstream: f1304f7ba398 ("openvswitch: actions: fixed a brace coding style warning")
| Signed-off-by: Joe Stringer <joe@ovn.org>
| Signed-off-by: Andy Zhou <azhou@ovn.org>
* compat: ipv6: orphan skbs in reassembly unit.    Eric Dumazet    2017-04-19    3    -6/+9
| Upstream commit:
|
|     ipv6: orphan skbs in reassembly unit
|
|     Andrey reported a use-after-free in IPv6 stack.
|
|     Issue here is that we free the socket while it still has skb in TX
|     path and in some queues.
|
|     It happens here because IPv6 reassembly unit messes skb->truesize,
|     breaking skb_set_owner_w() badly.
|
|     We fixed a similar issue for IPV4 in commit 8282f27449bf ("inet: frag:
|     Always orphan skbs inside ip_defrag()")
|     Acked-by: Joe Stringer <joe@ovn.org>
|
|     ==================================================================
|     BUG: KASAN: use-after-free in sock_wfree+0x118/0x120
|     Read of size 8 at addr ffff880062da0060 by task a.out/4140
|
|     page:ffffea00018b6800 count:1 mapcount:0 mapping: (null)
|     index:0x0 compound_mapcount: 0
|     flags: 0x100000000008100(slab|head)
|     raw: 0100000000008100 0000000000000000 0000000000000000 0000000180130013
|     raw: dead000000000100 dead000000000200 ffff88006741f140 0000000000000000
|     page dumped because: kasan: bad access detected
|
|     CPU: 0 PID: 4140 Comm: a.out Not tainted 4.10.0-rc3+ #59
|     Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
|     Call Trace:
|      __dump_stack lib/dump_stack.c:15
|      dump_stack+0x292/0x398 lib/dump_stack.c:51
|      describe_address mm/kasan/report.c:262
|      kasan_report_error+0x121/0x560 mm/kasan/report.c:370
|      kasan_report mm/kasan/report.c:392
|      __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:413
|      sock_flag ./arch/x86/include/asm/bitops.h:324
|      sock_wfree+0x118/0x120 net/core/sock.c:1631
|      skb_release_head_state+0xfc/0x250 net/core/skbuff.c:655
|      skb_release_all+0x15/0x60 net/core/skbuff.c:668
|      __kfree_skb+0x15/0x20 net/core/skbuff.c:684
|      kfree_skb+0x16e/0x4e0 net/core/skbuff.c:705
|      inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
|      inet_frag_put ./include/net/inet_frag.h:133
|      nf_ct_frag6_gather+0x1125/0x38b0 net/ipv6/netfilter/nf_conntrack_reasm.c:617
|      ipv6_defrag+0x21b/0x350 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
|      nf_hook_entry_hookfn ./include/linux/netfilter.h:102
|      nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
|      nf_hook ./include/linux/netfilter.h:212
|      __ip6_local_out+0x52c/0xaf0 net/ipv6/output_core.c:160
|      ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
|      ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
|      ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
|      rawv6_push_pending_frames net/ipv6/raw.c:613
|      rawv6_sendmsg+0x2cff/0x4130 net/ipv6/raw.c:927
|      inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
|      sock_sendmsg_nosec net/socket.c:635
|      sock_sendmsg+0xca/0x110 net/socket.c:645
|      sock_write_iter+0x326/0x620 net/socket.c:848
|      new_sync_write fs/read_write.c:499
|      __vfs_write+0x483/0x760 fs/read_write.c:512
|      vfs_write+0x187/0x530 fs/read_write.c:560
|      SYSC_write fs/read_write.c:607
|      SyS_write+0xfb/0x230 fs/read_write.c:599
|      entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203
|     RIP: 0033:0x7ff26e6f5b79
|     RSP: 002b:00007ff268e0ed98 EFLAGS: 00000206 ORIG_RAX: 0000000000000001
|     RAX: ffffffffffffffda RBX: 00007ff268e0f9c0 RCX: 00007ff26e6f5b79
|     RDX: 0000000000000010 RSI: 0000000020f50fe1 RDI: 0000000000000003
|     RBP: 00007ff26ebc1220 R08: 0000000000000000 R09: 0000000000000000
|     R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
|     R13: 00007ff268e0f9c0 R14: 00007ff26efec040 R15: 0000000000000003
|
|     The buggy address belongs to the object at ffff880062da0000
|      which belongs to the cache RAWv6 of size 1504
|     The buggy address ffff880062da0060 is located 96 bytes inside
|      of 1504-byte region [ffff880062da0000, ffff880062da05e0)
|     Freed by task 4113:
|      save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
|      save_stack+0x43/0xd0 mm/kasan/kasan.c:502
|      set_track mm/kasan/kasan.c:514
|      kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:578
|      slab_free_hook mm/slub.c:1352
|      slab_free_freelist_hook mm/slub.c:1374
|      slab_free mm/slub.c:2951
|      kmem_cache_free+0xb2/0x2c0 mm/slub.c:2973
|      sk_prot_free net/core/sock.c:1377
|      __sk_destruct+0x49c/0x6e0 net/core/sock.c:1452
|      sk_destruct+0x47/0x80 net/core/sock.c:1460
|      __sk_free+0x57/0x230 net/core/sock.c:1468
|      sk_free+0x23/0x30 net/core/sock.c:1479
|      sock_put ./include/net/sock.h:1638
|      sk_common_release+0x31e/0x4e0 net/core/sock.c:2782
|      rawv6_close+0x54/0x80 net/ipv6/raw.c:1214
|      inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
|      inet6_release+0x50/0x70 net/ipv6/af_inet6.c:431
|      sock_release+0x8d/0x1e0 net/socket.c:599
|      sock_close+0x16/0x20 net/socket.c:1063
|      __fput+0x332/0x7f0 fs/file_table.c:208
|      ____fput+0x15/0x20 fs/file_table.c:244
|      task_work_run+0x19b/0x270 kernel/task_work.c:116
|      exit_task_work ./include/linux/task_work.h:21
|      do_exit+0x186b/0x2800 kernel/exit.c:839
|      do_group_exit+0x149/0x420 kernel/exit.c:943
|      SYSC_exit_group kernel/exit.c:954
|      SyS_exit_group+0x1d/0x20 kernel/exit.c:952
|      entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203
|
|     Allocated by task 4115:
|      save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
|      save_stack+0x43/0xd0 mm/kasan/kasan.c:502
|      set_track mm/kasan/kasan.c:514
|      kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:605
|      kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:544
|      slab_post_alloc_hook mm/slab.h:432
|      slab_alloc_node mm/slub.c:2708
|      slab_alloc mm/slub.c:2716
|      kmem_cache_alloc+0x1af/0x250 mm/slub.c:2721
|      sk_prot_alloc+0x65/0x2a0 net/core/sock.c:1334
|      sk_alloc+0x105/0x1010 net/core/sock.c:1396
|      inet6_create+0x44d/0x1150 net/ipv6/af_inet6.c:183
|      __sock_create+0x4f6/0x880 net/socket.c:1199
|      sock_create net/socket.c:1239
|      SYSC_socket net/socket.c:1269
|      SyS_socket+0xf9/0x230 net/socket.c:1249
|      entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203
|
|     Memory state around the buggy address:
|      ffff880062d9ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
|      ffff880062d9ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
|     >ffff880062da0000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
|                                                            ^
|      ffff880062da0080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
|      ffff880062da0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
|     ==================================================================
|
|     Reported-by: Andrey Konovalov <andreyknvl@google.com>
|     Signed-off-by: Eric Dumazet <edumazet@google.com>
|     Signed-off-by: David S. Miller <davem@davemloft.net>
|
| This patch is a bugfix, and will be progressively backported to earlier
| kernels. If it is backported to any kernel from 4.5 through 4.10 and users
| run that updated kernel with an OVS kernel module that predates this
| patch, it could cause a crash. The compat code here resolves such issues.
|
| Upstream: 48cac18ecf1d ("ipv6: orphan skbs in reassembly unit")
| Signed-off-by: Joe Stringer <joe@ovn.org>
| Signed-off-by: Andy Zhou <azhou@ovn.org>
| Acked-by: Joe Stringer <joe@ovn.org>
* datapath: Pack struct sw_flow_key.    Jarno Rajahalme    2017-04-19    4    -34/+39
| Upstream commit:
|
|     openvswitch: Pack struct sw_flow_key.
|
|     struct sw_flow_key has two 16-bit holes. Move the most matched
|     conntrack match fields there. In some typical cases this halves the
|     size of the key that needs to be hashed, so that it fits into one
|     cache line.
|
|     Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
|     Acked-by: Joe Stringer <joe@ovn.org>
|     Acked-by: Pravin B Shelar <pshelar@ovn.org>
|     Signed-off-by: David S. Miller <davem@davemloft.net>
|
| Upstream: 316d4d78cf9b ("openvswitch: Pack struct sw_flow_key.")
| Signed-off-by: Joe Stringer <joe@ovn.org>
| Signed-off-by: Andy Zhou <azhou@ovn.org>
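The effect of filling 16-bit holes can be seen on a toy structure. The two structs below are hypothetical and far smaller than the real struct sw_flow_key, and the field names are chosen only for illustration. The sizes noted in the comments assume a typical ABI where uint32_t is 4-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout with two 16-bit holes (24 bytes on typical ABIs). */
struct key_with_holes {
    uint32_t recirc_id;
    uint16_t eth_type;  /* 2 bytes of padding follow */
    uint32_t ipv4_src;
    uint16_t tp_src;    /* 2 bytes of padding follow */
    uint32_t ct_mark;
    uint16_t ct_state;
    uint16_t ct_zone;
};

/* Same fields, with the 16-bit conntrack fields moved into the holes. */
struct key_packed {
    uint32_t recirc_id;
    uint16_t eth_type;
    uint16_t ct_state;  /* fills the first hole */
    uint32_t ipv4_src;
    uint16_t tp_src;
    uint16_t ct_zone;   /* fills the second hole */
    uint32_t ct_mark;
};

/* Bytes saved by the reordering; 4 on typical ABIs. */
static size_t key_bytes_saved(void)
{
    return sizeof(struct key_with_holes) - sizeof(struct key_packed);
}
```

Besides shrinking the struct, the reordering places the frequently matched fields near the front, so a hash over the leading part of the key covers them without reaching into the tail.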
* datapath: Always define NF_CT_LABELS_MAX_SIZE    Andy Zhou    2017-04-19    1    -3/+3
| When CONFIG_NF_CONNTRACK_LABELS is not set, upstream code still makes use
| of NF_CT_LABELS_MAX_SIZE. Always define it in the compat code to keep
| backports close to the upstream.
|
| Signed-off-by: Andy Zhou <azhou@ovn.org>
| Acked-by: Joe Stringer <joe@ovn.org>
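The "always define" approach amounts to a guarded fallback definition. The sketch below shows the pattern only: the value 16 is a placeholder assumed for the example, not the value used by the kernel or by this patch, and `struct labels_buf` is a hypothetical consumer of the macro.

```c
#include <stddef.h>

/*
 * If the header that normally provides the macro was not pulled in
 * (for example because the feature is configured out), fall back to a
 * local definition so that code using the macro still compiles.
 * The value below is a placeholder for illustration only.
 */
#ifndef NF_CT_LABELS_MAX_SIZE
#define NF_CT_LABELS_MAX_SIZE 16
#endif

/* Hypothetical user of the macro: a buffer sized by it. */
struct labels_buf {
    unsigned char bits[NF_CT_LABELS_MAX_SIZE];
};

static size_t labels_buf_size(void)
{
    return sizeof(struct labels_buf);
}
```

Because the guard is `#ifndef`, a kernel header that does define the macro takes precedence, and the fallback only applies when nothing else provides it.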