Commit message | Author | Age | Files | Lines
* Set release dates for 2.9.0. (tag: v2.9.0) | Justin Pettit | 2018-02-19 | 2 | -2/+2
    Signed-off-by: Justin Pettit <jpettit@ovn.org>
* ofp-meter: Fix use-after-free for decoding meter mods. | Ben Pfaff | 2018-02-16 | 1 | -1/+1
    ofputil_pull_bands() may change bands->data. Found by libfuzzer-ngram.

    Reported-by: Bhargava Shastry <bshastry@sect.tu-berlin.de>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
    Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
* datapath: Remove padding from packet before L3+ conntrack processing | Ed Swierk | 2018-02-16 | 1 | -0/+34
    Upstream commit:
        commit 9382fe71c0058465e942a633869629929102843d
        Author: Ed Swierk <eswierk@skyportsystems.com>
        Date: Wed Jan 31 18:48:02 2018 -0800

        openvswitch: Remove padding from packet before L3+ conntrack processing

        IPv4 and IPv6 packets may arrive with lower-layer padding that is not included in the L3 length. For example, a short IPv4 packet may have up to 6 bytes of padding following the IP payload when received on an Ethernet device with a minimum packet length of 64 bytes. Higher-layer processing functions in netfilter (e.g. nf_ip_checksum(), and help() in nf_conntrack_ftp) assume skb->len reflects the length of the L3 header and payload, rather than referring back to ip_hdr->tot_len or ipv6_hdr->payload_len, and get confused by lower-layer padding.

        In the normal IPv4 receive path, ip_rcv() trims the packet to ip_hdr->tot_len before invoking netfilter hooks. In the IPv6 receive path, ip6_rcv() does the same using ipv6_hdr->payload_len. Similarly in the br_netfilter receive path, br_validate_ipv4() and br_validate_ipv6() trim the packet to the L3 length before invoking netfilter hooks.

        Currently in the OVS conntrack receive path, ovs_ct_execute() pulls the skb to the L3 header but does not trim it to the L3 length before calling nf_conntrack_in(NF_INET_PRE_ROUTING). When nf_conntrack_proto_tcp encounters a packet with lower-layer padding, nf_ip_checksum() fails causing a "nf_ct_tcp: bad TCP checksum" log message. While extra zero bytes don't affect the checksum, the length in the IP pseudoheader does. That length is based on skb->len, and without trimming, it doesn't match the length the sender used when computing the checksum.

        In ovs_ct_execute(), trim the skb to the L3 length before higher-layer processing.

        Signed-off-by: Ed Swierk <eswierk@skyportsystems.com>
        Acked-by: Pravin B Shelar <pshelar@ovn.org>
        Signed-off-by: David S. Miller <davem@davemloft.net>

    Cc: Ed Swierk <eswierk@skyportsystems.com>
    Signed-off-by: Greg Rose <gvrose8192@gmail.com>
    Acked-by: Pravin B Shelar <pshelar@ovn.org>
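The trimming step described above can be illustrated outside the kernel. The following is a minimal sketch in Python, not the datapath code: the helper names `ipv4_total_length` and `trim_to_l3_length` are hypothetical, and the example assumes a 40-byte IPv4/TCP packet padded to the 46-byte minimum Ethernet payload, which gives the 6 bytes of padding mentioned in the message.

```python
import struct

ETH_MIN_PAYLOAD = 46  # 64-byte minimum frame minus 14-byte header and 4-byte FCS


def ipv4_total_length(l3: bytes) -> int:
    """Return the Total Length field (bytes 2-3) of an IPv4 header."""
    if len(l3) < 20:
        raise ValueError("truncated IPv4 header")
    (tot_len,) = struct.unpack_from("!H", l3, 2)
    return tot_len


def trim_to_l3_length(l3: bytes) -> bytes:
    """Drop lower-layer padding that follows the IP payload (hypothetical helper)."""
    tot_len = ipv4_total_length(l3)
    if tot_len < 20 or tot_len > len(l3):
        raise ValueError("bad IPv4 total length")
    return l3[:tot_len]


# 20-byte IPv4 header (Total Length = 40) followed by a 20-byte TCP header.
header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, 6, 0,
                     bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
packet = header + bytes(20)

# The sending NIC pads the payload up to the Ethernet minimum.
padded = packet + bytes(ETH_MIN_PAYLOAD - len(packet))

trimmed = trim_to_l3_length(padded)
print(len(padded), len(trimmed))
```

Any length-dependent computation, such as the TCP pseudoheader checksum, would then see the trimmed length rather than the padded one.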
* datapath: Fix pop_vlan action for double tagged frames | Eric Garver | 2018-02-16 | 1 | -3/+12
    Upstream commit:
        commit c48e74736fccf25fb32bb015426359e1c2016e3b
        Author: Eric Garver <e@erig.me>
        Date: Wed Dec 20 15:09:22 2017 -0500

        openvswitch: Fix pop_vlan action for double tagged frames

        skb_vlan_pop() expects skb->protocol to be a valid TPID for double tagged frames. So set skb->protocol to the TPID and let skb_vlan_pop() shift the true ethertype into position for us.

        Fixes: 5108bbaddc37 ("openvswitch: add processing of L3 packets")
        Signed-off-by: Eric Garver <e@erig.me>
        Reviewed-by: Jiri Benc <jbenc@redhat.com>
        Signed-off-by: David S. Miller <davem@davemloft.net>

    Cc: Eric Garver <e@erig.me>
    Fixes: a27c454ee0 ("datapath: add processing of L3 packets")
    Signed-off-by: Greg Rose <gvrose8192@gmail.com>
    Acked-by: Pravin B Shelar <pshelar@ovn.org>
* datapath: do not propagate headroom updates to internal port | paolo abeni | 2018-02-16 | 1 | -18/+1
    Upstream commit:
        commit 183dea5818315c0a172d21ecbcd2554894bf01e3
        Author: Paolo Abeni <pabeni@redhat.com>
        Date: Thu Nov 30 15:35:33 2017 +0100

        openvswitch: do not propagate headroom updates to internal port

        After commit 3a927bc7cf9d ("ovs: propagate per dp max headroom to all vports") the need_headroom for the internal vport is updated according to the max needed headroom in its datapath.

        That avoids the pskb_expand_head() costs when sending/forwarding packets towards tunnel devices, at least for some scenarios.

        We still require such copy when using the ovs-preferred configuration for vxlan tunnels:

            br_int
           /      \
        tap      vxlan
                 (remote_ip:X)

            br_phy
                 \
                NIC

        where the route towards the IP 'X' is via 'br_phy'.

        When forwarding traffic from the tap towards the vxlan device, we will call pskb_expand_head() in vxlan_build_skb() because br-phy->needed_headroom is equal to tun->needed_headroom.

        With this change we avoid updating the internal vport needed_headroom, so that in the above scenario no head copy is needed, giving 5% performance improvement in UDP throughput test.

        As a trade-off, packets sent from the internal port towards a tunnel device will now experience the head copy overhead. The rationale is that the latter use-case is less relevant performance-wise.

        Signed-off-by: paolo abeni <pabeni@redhat.com>
        Acked-by: Pravin B Shelar <pshelar@ovn.org>
        Signed-off-by: David S. Miller <davem@davemloft.net>

    Cc: paolo abeni <pabeni@redhat.com>
    Signed-off-by: Greg Rose <gvrose8192@gmail.com>
    Acked-by: Pravin B Shelar <pshelar@ovn.org>
* ovn-controller: Fix crash when sending GARP when openflow disconnection. | Guoshuai Li | 2018-02-15 | 2 | -18/+33
    This is the call stack:

    Program received signal SIGABRT, Aborted.
    1 0x00007ffff6a4f8e8 in __GI_abort () at abort.c:90
    2 0x00000000004765d6 in ofputil_protocol_to_ofp_version (protocol=<optimized out>) at lib/ofp-util.c:769
    3 0x000000000047c19e in ofputil_encode_packet_out (po=po@entry=0x7fffffffa0e0, protocol=<optimized out>) at lib/ofp-util.c:7060
    4 0x0000000000410870 in send_garp (garp=0x83cfe0, current_time=current_time@entry=1200375400) at ovn/controller/pinctrl.c:1738
    5 0x000000000041430f in send_garp_run (active_tunnels=<optimized out>, local_datapaths=0x7fffffffc0a0, chassis_index=<optimized out>, chassis=0x8194d0, br_int=<optimized out>, ctx=0x7fffffffc080) at ovn/controller/pinctrl.c:2069

    Signed-off-by: Guoshuai Li <ligs@dtdream.com>
    Acked-by: Mark Michelson <mmichels@redhat.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
* ofproto-dpif-ipfix: Fix an issue in flow key part | Benli Ye | 2018-02-15 | 1 | -3/+21
    Because the length of struct ipfix_data_record_flow_key_iface was not included when calculating the length of the flow key part, problems may occur when the flow key part is not long enough. Use MAX_IF_LEN and MAX_IF_DESCR to pre-allocate memory for ipfix_data_record_flow_key_iface.

    Signed-off-by: Daniel Benli Ye <daniely@vmware.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
* ovsdb-tool: Indicate "db" and "schema" are optional in man page. | Justin Pettit | 2018-02-14 | 1 | -13/+18
    Signed-off-by: Justin Pettit <jpettit@ovn.org>
    Acked-by: Ben Pfaff <blp@ovn.org>
* netdev-dpdk: Reintroduce shared mempools. | Ian Stokes | 2018-02-13 | 1 | -108/+138
    This commit manually reverts the current per port mempool model to the previous shared mempool model for DPDK ports.

    OVS previously used a shared mempool model for ports with the same MTU configuration. This was replaced by a per port mempool model to address issues flagged by users such as:

    https://mail.openvswitch.org/pipermail/ovs-discuss/2016-September/042560.html

    However the per port model has a number of issues including:

    1. Requires an increase in memory resource requirements to support the same number of ports as the shared port model.
    2. Incorrect algorithm for mbuf provisioning for each mempool.

    These are considered blocking factors for current deployments of OVS when upgrading to OVS 2.9 as a user may have to redimension memory for the same deployment configuration. This may not be possible for users.

    For clarity, the commits whose changes are removed include the following:

    netdev-dpdk: Create separate memory pool for each port: d555d9b
    netdev-dpdk: fix management of pre-existing mempools: b6b26021d
    Fix mempool names to reflect socket id: f06546a
    netdev-dpdk: skip init for existing mempools: 837c176
    netdev-dpdk: manage failure in mempool name creation: 65056fd
    netdev-dpdk: Reword mp_size as n_mbufs: ad9b5b9
    netdev-dpdk: Rename dpdk_mp_put as dpdk_mp_free: a08a115
    netdev-dpdk: Fix mp_name leak on snprintf failure: ec6edc8
    netdev-dpdk: Fix dpdk_mp leak in case of EEXIST: 173ef76
    netdev-dpdk: Factor out struct dpdk_mp: 24e78f9
    netdev-dpdk: Remove unused MAX_NB_MBUF: bc57ed9
    netdev-dpdk: Fix mempool creation with large MTU: af5b0da

    Due to the number of commits and period of time they were introduced over, a simple revert was not possible. All code from the commits above is removed and the shared mempool code reintroduced as it was before its replacement.

    Code introduced by commit

    netdev-dpdk: Add debug appctl to get mempool information: be48173

    has been modified to work with the shared mempool model.

    Cc: Antonio Fischetti <antonio.fischetti@gmail.com>
    Cc: Ilya Maximets <i.maximets@samsung.com>
    Cc: Kevin Traynor <ktraynor@redhat.com>
    Cc: Jan Scheurich <jan.scheurich@ericsson.com>
    Signed-off-by: Ian Stokes <ian.stokes@intel.com>
    Acked-by: Kevin Traynor <ktraynor@redhat.com>
    Tested-by: Kevin Traynor <ktraynor@redhat.com>
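To make the memory argument concrete, here is a rough Python sketch of the shared model, in which ports with the same MTU reuse one reference-counted pool. It is not the netdev-dpdk implementation: the `SharedMempools` class and its `get`/`put` methods are hypothetical, and keying on the socket id in addition to the MTU is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Mempool:
    socket_id: int
    mtu: int
    refcount: int = 0


class SharedMempools:
    """Hypothetical registry: one pool per (socket_id, mtu), shared by ports."""

    def __init__(self):
        self._pools = {}

    def get(self, socket_id, mtu):
        """Return the pool for this key, creating it on first use."""
        key = (socket_id, mtu)
        pool = self._pools.get(key)
        if pool is None:
            pool = Mempool(socket_id, mtu)
            self._pools[key] = pool
        pool.refcount += 1
        return pool

    def put(self, pool):
        """Drop one reference and free the pool when no port uses it."""
        pool.refcount -= 1
        if pool.refcount == 0:
            del self._pools[(pool.socket_id, pool.mtu)]

    def __len__(self):
        return len(self._pools)


registry = SharedMempools()
port_a = registry.get(socket_id=0, mtu=1500)
port_b = registry.get(socket_id=0, mtu=1500)  # same MTU: reuses port_a's pool
port_c = registry.get(socket_id=0, mtu=9000)  # different MTU: new pool
print(len(registry), port_a.refcount)
```

Under a per port model each of the three ports above would own a separate pool, which is the extra memory requirement the message refers to.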
* docs: Update supported DPDK versions. | Ian Stokes | 2018-02-13 | 1 | -1/+1
    Update the OVS to DPDK release table to use the latest stable DPDK 16.11.4 for OVS 2.7.

    Signed-off-by: Ian Stokes <ian.stokes@intel.com>
    Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
* datapath: use ktime_get_ts64() instead of ktime_get_ts() | Arnd Bergmann | 2018-02-12 | 4 | -3/+19
    Upstream commit:
        commit 311af51dcb5629f04976a8e451673f77e3301041
        Author: Arnd Bergmann <arnd@arndb.de>
        Date: Mon Nov 27 12:41:38 2017 +0100

        openvswitch: use ktime_get_ts64() instead of ktime_get_ts()

        timespec is deprecated because of the y2038 overflow, so let's convert this one to ktime_get_ts64(). The code is already safe even on 32-bit architectures, since it uses monotonic times. On 64-bit architectures, nothing changes, while on 32-bit architectures this avoids one type conversion.

        Signed-off-by: Arnd Bergmann <arnd@arndb.de>
        Signed-off-by: David S. Miller <davem@davemloft.net>

    Added a compatibility check for whether ktime_get_ts64() exists. If it does not, just continue using ktime_get_ts(). I added a new compatibility header file "timekeeping.h".

    Cc: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Greg Rose <gvrose8192@gmail.com>
    Acked-by: Pravin B Shelar <pshelar@ovn.org>
* datapath: fix the incorrect flow action alloc size | zhangliping | 2018-02-12 | 1 | -8/+8
    Upstream commit:
        commit 67c8d22a73128ff910e2287567132530abcf5b71
        Author: zhangliping <zhangliping02@baidu.com>
        Date: Sat Nov 25 22:02:12 2017 +0800

        openvswitch: fix the incorrect flow action alloc size

        If we want to add a datapath flow which has more than 500 vxlan output actions, we will get the following error reports:

            openvswitch: netlink: Flow action size 32832 bytes exceeds max
            openvswitch: netlink: Flow action size 32832 bytes exceeds max
            openvswitch: netlink: Actions may not be safe on all matching packets
            ... ...

        It seems that we can simply enlarge the MAX_ACTIONS_BUFSIZE to fix it, but this is not the root cause. For example, for a vxlan output action, we need about 60 bytes for the nlattr, but after it is converted to the flow action, it only occupies 24 bytes. This means that we can still support more than 1000 vxlan output actions for a single datapath flow under the current 32k max limitation.

        So even if the nla_len(attr) is larger than MAX_ACTIONS_BUFSIZE, we shouldn't report EINVAL and should let it proceed, as the check can be done by reserve_sfa_size.

        Signed-off-by: zhangliping <zhangliping02@baidu.com>
        Acked-by: Pravin B Shelar <pshelar@ovn.org>
        Signed-off-by: David S. Miller <davem@davemloft.net>

    Cc: zhangliping <zhangliping02@baidu.com>
    Signed-off-by: Greg Rose <gvrose8192@gmail.com>
    Acked-by: Pravin B Shelar <pshelar@ovn.org>
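The size argument in this message can be checked with a few lines of arithmetic. The Python sketch below only restates the figures quoted above (roughly 60 bytes per vxlan output as a netlink attribute, 24 bytes once converted to a flow action, and a 32k limit); the function name `max_outputs` is hypothetical.

```python
MAX_ACTIONS_BUFSIZE = 32 * 1024    # the "32k max limitation" from the message
NLATTR_BYTES_PER_OUTPUT = 60       # approximate netlink size of one vxlan output
FLOW_ACTION_BYTES_PER_OUTPUT = 24  # size after conversion to a flow action


def max_outputs(bytes_per_output, limit=MAX_ACTIONS_BUFSIZE):
    """How many output actions of the given size fit under the limit."""
    return limit // bytes_per_output


# Checking the netlink attribute length rejects flows far too early.
by_nlattr = max_outputs(NLATTR_BYTES_PER_OUTPUT)

# Checking the converted size admits well over 1000 outputs.
by_flow_action = max_outputs(FLOW_ACTION_BYTES_PER_OUTPUT)

print(by_nlattr, by_flow_action)
```

This is why the limit is better enforced on the converted actions than on nla_len(attr).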
* datapath: fix data type in queue_gso_packets | Gustavo A. R. Silva | 2018-02-12 | 2 | -2/+2
    Upstream commit:
        commit 2734166e89639c973c6e125ac8bcfc2d9db72b70
        Author: Gustavo A. R. Silva <garsilva@embeddedor.com>
        Date: Sat Nov 25 13:14:40 2017 -0600

        net: openvswitch: datapath: fix data type in queue_gso_packets

        gso_type is being used in binary AND operations together with SKB_GSO_UDP. The issue is that variable gso_type is of type unsigned short and SKB_GSO_UDP expands to more than 16 bits:

            SKB_GSO_UDP = 1 << 16

        this makes any binary AND operation between gso_type and SKB_GSO_UDP to be always zero, hence making some code unreachable and likely causing undesired behavior.

        Fix this by changing the data type of variable gso_type to unsigned int.

        Addresses-Coverity-ID: 1462223
        Fixes: 0c19f846d582 ("net: accept UFO datagrams from tuntap and packet")
        Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
        Acked-by: Willem de Bruijn <willemb@google.com>
        Signed-off-by: David S. Miller <davem@davemloft.net>

    While backporting this I found another couple of instances of the same issue so I fixed them up as well.

    Cc: Gustavo A. R. Silva <garsilva@embeddedor.com>
    Signed-off-by: Greg Rose <gvrose8192@gmail.com>
    Acked-by: Pravin B Shelar <pshelar@ovn.org>
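The truncation is easy to reproduce by modelling the C integer widths with bit masks. This Python sketch assumes a 16-bit unsigned short and a 32-bit unsigned int; the helper names are hypothetical and the extra low flag bit is arbitrary.

```python
SKB_GSO_UDP = 1 << 16  # value quoted in the commit message


def as_unsigned_short(value):
    """Model storing a value in a 16-bit unsigned short."""
    return value & 0xFFFF


def as_unsigned_int(value):
    """Model storing a value in a 32-bit unsigned int."""
    return value & 0xFFFFFFFF


# A gso_type with the UDP flag set plus one arbitrary low flag bit.
gso_type = SKB_GSO_UDP | 0x1

narrow = as_unsigned_short(gso_type)  # bit 16 is lost on assignment
wide = as_unsigned_int(gso_type)      # bit 16 survives

print(bool(narrow & SKB_GSO_UDP), bool(wide & SKB_GSO_UDP))
```

With the narrow type the UDP test can never be true, which is why the guarded code was unreachable.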
* datapath: Fix an error handling path in 'ovs_nla_init_match_and_action()' | Christophe JAILLET | 2018-02-12 | 1 | -1/+2
    Upstream commit:
        commit 5829e62ac17a40ab08c1b905565604a4b5fa7af6
        Author: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
        Date: Mon Sep 11 21:56:20 2017 +0200

        openvswitch: Fix an error handling path in 'ovs_nla_init_match_and_action()'

        All other error handling paths in this function go through the 'error' label. This one should do the same.

        Fixes: 9cc9a5cb176c ("datapath: Avoid using stack larger than 1024.")
        Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
        Acked-by: Pravin B Shelar <pshelar@ovn.org>
        Signed-off-by: David S. Miller <davem@davemloft.net>

    Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
    Fixes: 850c2a4d1a ("datapath: Avoid using stack larger than 1024.")
    Signed-off-by: Greg Rose <gvrose8192@gmail.com>
    Acked-by: Pravin B Shelar <pshelar@ovn.org>
* compat: Fix compiler headers | Greg Rose | 2018-02-12 | 2 | -0/+5
    Since Linux kernel upstream commit d15155824c50 ("linux/compiler.h: Split into compiler.h and compiler_types.h") this error check for the gcc compiler header is no longer valid. Remove it so that openvswitch builds for Linux kernels 4.14.8 and later.

    Signed-off-by: Greg Rose <gvrose8192@gmail.com>
    Acked-by: Pravin B Shelar <pshelar@ovn.org>
* datapath: Fix SKB_GSO_UDP usage | Greg Rose | 2018-02-12 | 2 | -4/+16
    Using SKB_GSO_UDP breaks the compilation on Linux 4.14. Check for the HAVE_SKB_GSO_UDP compiler #define.

    Signed-off-by: Greg Rose <gvrose8192@gmail.com>
    Acked-by: Pravin B Shelar <pshelar@ovn.org>
* datapath: conntrack: make protocol tracker pointers const | Florian Westphal | 2018-02-12 | 1 | -2/+2
    Upstream commit:
        commit b3480fe059ac9121b5714205b4ddae14b59ef4be
        Author: Florian Westphal <fw@strlen.de>
        Date: Sat Aug 12 00:57:08 2017 +0200

        netfilter: conntrack: make protocol tracker pointers const

        Doesn't change generated code, but will make it easier to eventually make the actual trackers themselves const.

        Signed-off-by: Florian Westphal <fw@strlen.de>
        Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

    Cc: Florian Westphal <fw@strlen.de>
    Signed-off-by: Greg Rose <gvrose8192@gmail.com>
    Acked-by: Pravin B Shelar <pshelar@ovn.org>
* compat:inet_frag.h: Check for frag_percpu_counter_batch | Greg Rose | 2018-02-12 | 1 | -0/+14
    Fix up the compat layer to check for frag_percpu_counter_batch and if not present then use atomic_sub and atomic_add as per the backport in the 3.16.50 LTS kernel.

    Signed-off-by: Greg Rose <gvrose8192@gmail.com>
    Acked-by: Pravin B Shelar <pshelar@ovn.org>
* compat: Do not include headers when not compiling | Greg Rose | 2018-02-12 | 2 | -2/+2
    If the entire file is not going to be compiled because OVS is using upstream tunnel support then also don't bother pulling in the headers.

    Signed-off-by: Greg Rose <gvrose8192@gmail.com>
    Acked-by: Pravin B Shelar <pshelar@ovn.org>
* datapath: Fix netdev_master_upper_dev_link for 4.14 | Greg Rose | 2018-02-12 | 3 | -5/+31
    An extended netlink ack has been added for 4.14 - add compat layer changes so that it compiles for all kernels up to and including 4.14.

    Signed-off-by: Greg Rose <gvrose8192@gmail.com>
    Acked-by: Pravin B Shelar <pshelar@ovn.org>
* ovn: Allow DNS lookups over IPv6 | Mark Michelson | 2018-02-09 | 2 | -5/+63
    There was a bug in DNS request handling where the incoming packet was assumed to be IPv4. The result was that for the outgoing packet, we would attempt to write the IPv4 checksum and total length into what was actually an IPv6 header. This resulted in the source IPv6 address getting corrupted. Later, the source and destination IPv6 addresses would get swapped, resulting in the DNS response being sent to a nonsense destination.

    With this change, we check the ethertype of the packet to determine what l3 information to write, and where to write it. A test is also included that verifies that this works as expected.

    Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1539608
    Signed-off-by: Mark Michelson <mmichels@redhat.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
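The ethertype dispatch described above can be sketched as follows. This is an illustration in Python rather than the ovn-controller code: the function `l3_fields_to_update` and the field names it returns are hypothetical. The ethertype values are the standard ones, and the IPv6 branch reflects the fact that an IPv6 header carries a payload length but no header checksum.

```python
ETH_TYPE_IP = 0x0800
ETH_TYPE_IPV6 = 0x86DD


def l3_fields_to_update(eth_type):
    """Pick which L3 header fields a rewritten reply must refresh."""
    if eth_type == ETH_TYPE_IP:
        # IPv4: the total length changes with the payload, and the
        # header checksum covers the total length.
        return ("total_length", "header_checksum")
    if eth_type == ETH_TYPE_IPV6:
        # IPv6: only the payload length; there is no header checksum.
        return ("payload_length",)
    raise ValueError(f"unsupported ethertype 0x{eth_type:04x}")


print(l3_fields_to_update(ETH_TYPE_IP))
print(l3_fields_to_update(ETH_TYPE_IPV6))
```

Writing the IPv4 fields at IPv4 offsets into an IPv6 header lands inside the address bytes, which matches the corrupted source address reported in the bug.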
* datapath: enable NSH support | Yi Yang | 2018-02-08 | 11 | -5/+684
    Upstream commit:
        commit b2d0f5d5dc53532e6f07bc546a476a55ebdfe0f3
        Author: Yi Yang <yi.y.yang@intel.com>
        Date: Tue Nov 7 21:07:02 2017 +0800

        openvswitch: enable NSH support

        OVS master and the 2.8 branch have merged the NSH userspace patch series. This patch enables NSH support in the kernel data path so that OVS can support NSH in compat mode by porting this.

        Signed-off-by: Yi Yang <yi.y.yang@intel.com>
        Acked-by: Jiri Benc <jbenc@redhat.com>
        Acked-by: Eric Garver <e@erig.me>
        Acked-by: Pravin Shelar <pshelar@ovn.org>
        Signed-off-by: David S. Miller <davem@davemloft.net>

    Signed-off-by: Yi Yang <yi.y.yang@intel.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
    Reviewed-by: Greg Rose <gvrose8192@gmail.com>
* datapath: nsh: add GSO support | Yi Yang | 2018-02-08 | 3 | -1/+10
    Upstream commit:
        commit c411ed854584a71b0e86ac3019b60e4789d88086
        Author: Jiri Benc <jbenc@redhat.com>
        Date: Mon Aug 28 21:43:24 2017 +0200

        nsh: add GSO support

        Add a new nsh/ directory. It currently holds only GSO functions but more will come: in particular, code shared by openvswitch and tc to manipulate NSH headers.

        For now, assume there's no hardware support for NSH segmentation. We can always introduce netdev->nsh_features later.

        Signed-off-by: Jiri Benc <jbenc@redhat.com>
        Signed-off-by: David S. Miller <davem@davemloft.net>

    Signed-off-by: Yi Yang <yi.y.yang@intel.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
    Reviewed-by: Greg Rose <gvrose8192@gmail.com>
* datapath: net: add NSH header structures and helpers | Yi Yang | 2018-02-08 | 2 | -0/+308
    Upstream commit:
        commit 1f0b7744c50573df464ca33d8e5275be509f852b
        Author: Yi Yang <yi.y.yang@intel.com>
        Date: Mon Aug 28 21:43:23 2017 +0200

        net: add NSH header structures and helpers

        NSH (Network Service Header)[1] is a new protocol for service function chaining, it can be handled as a L3 protocol like IPv4 and IPv6, Eth + NSH + Inner packet or VxLAN-gpe + NSH + Inner packet are two typical use cases.

        This patch adds NSH header structures and helpers for NSH GSO support and Open vSwitch NSH support.

        [1] https://datatracker.ietf.org/doc/draft-ietf-sfc-nsh/

        [Jiri: added nsh_hdr() helper and renamed the header struct to "struct nshhdr" to match the usual pattern. Removed packet type defines, these are now shared with VXLAN-GPE.]

        Signed-off-by: Yi Yang <yi.y.yang@intel.com>
        Signed-off-by: Jiri Benc <jbenc@redhat.com>
        Signed-off-by: David S. Miller <davem@davemloft.net>

    Signed-off-by: Yi Yang <yi.y.yang@intel.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
    Reviewed-by: Greg Rose <gvrose8192@gmail.com>
* datapath: vxlan: factor out VXLAN-GPE next protocol | Yi Yang | 2018-02-08 | 4 | -31/+57
    Upstream commit:
        commit fa20e0e32cb3dfc1760b6254b64977f2fb5bd851
        Author: Jiri Benc <jbenc@redhat.com>
        Date: Mon Aug 28 21:43:22 2017 +0200

        vxlan: factor out VXLAN-GPE next protocol

        The values are shared between VXLAN-GPE and NSH. Originally probably by coincidence but I notified both working groups about this last year and they seem to keep the values in sync since then. Hopefully they'll get a single IANA registry for the values, too. (I asked them for that.)

        Factor out the code to be shared by the NSH implementation.

        NSH and MPLS values are added in this patch, too. For MPLS, the drafts incorrectly assign only a single value, while we have two MPLS ethertypes. I raised the problem with both groups. For now, I assume the value is for unicast.

        Signed-off-by: Jiri Benc <jbenc@redhat.com>
        Signed-off-by: David S. Miller <davem@davemloft.net>

    Signed-off-by: Yi Yang <yi.y.yang@intel.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
    Reviewed-by: Greg Rose <gvrose8192@gmail.com>
* datapath: ether: add NSH ethertype | Yi Yang | 2018-02-08 | 1 | -0/+4
    Upstream commit:
        commit 155e6f649757c902901e599c268f8b575ddac1f8
        Author: Jiri Benc <jbenc@redhat.com>
        Date: Mon Aug 28 21:43:21 2017 +0200

        ether: add NSH ethertype

        The NSH draft says:

            An IEEE EtherType, 0x894F, has been allocated for NSH.

        Signed-off-by: Jiri Benc <jbenc@redhat.com>
        Signed-off-by: David S. Miller <davem@davemloft.net>

    Signed-off-by: Yi Yang <yi.y.yang@intel.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
    Reviewed-by: Greg Rose <gvrose8192@gmail.com>
* netdev-linux: Report netdev change events when mac changed. | Tonghao Zhang | 2018-02-05 | 5 | -3/+15
    When the MAC address of a port on a bridge is changed, for example,

        $ ip link set dev eth0 address 00:11:22:33:44:55

    we should reconfigure the datapath id and MAC address of the local port. Currently Open vSwitch doesn't do that as expected.

    A simple example of how to reproduce it:

        $ ovs-vsctl add-br br0
        $ ifconfig br0 # for example, mac is c6:c6:d7:46:b4:4b
        $ ip link set dev br0 address 00:11:22:33:44:55
        $ ifconfig br0 # mac of br0 will be 00:11:22:33:44:55

    then repeat:

        $ ip link set dev br0 address 00:11:22:33:44:55
        $ ifconfig br0 # mac of br0 will be c6:c6:d7:46:b4:4b

    This patch reports the MAC changed event when ports change, so that Open vSwitch will reconfigure the datapath id and MAC address of the local port.

    Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
* Makefile.am: Use correct path separator for Windows | Shashank Ram | 2018-02-05 | 1 | -2/+2
    Signed-off-by: Shashank Ram <rams@vmware.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
* Merge branch 'dpdk_merge_2_9' of https://github.com/istokes/ovs into HEAD | Ben Pfaff | 2018-02-01 | 5 | -0/+105
|\
| * netdev-dpdk: Add support for vHost dequeue zero copy (experimental) | Ciara Loftus | 2018-01-31 | 5 | -0/+105
|     Zero copy is disabled by default. To enable it, set the 'dq-zero-copy' option to 'true' when configuring the Interface:
|
|         ovs-vsctl set Interface dpdkvhostuserclient0 options:vhost-server-path=/tmp/dpdkvhostuserclient0 options:dq-zero-copy=true
|
|     When packets from a vHost device with zero copy enabled are destined for a single 'dpdk' port, the number of tx descriptors on that 'dpdk' port must be set to a smaller value. 128 is recommended. This can be achieved like so:
|
|         ovs-vsctl set Interface dpdkport options:n_txq_desc=128
|
|     Note: The sum of the tx descriptors of all 'dpdk' ports the VM will send to should not exceed 128. Due to this requirement, the feature is considered 'experimental'.
|
|     Testing of the patch showed a ~8% improvement when switching 512B packets between vHost devices on different VMs on the same host when zero copy was enabled on the transmitting device.
|
|     Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
|     Acked-by: Ilya Maximets <i.maximets@samsung.com>
|     Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* | xlate: fix packets loopback caused by duplicate read of xcfgp. | Huanle Han | 2018-02-01 | 1 | -29/+14
| |     Some functions, such as xlate_normal_mcast_send_mrouters, test xbundle pointers for equality to avoid sending a packet back to the in bundle. However, xbundle pointers obtained from different xcfgp reads for the same port are not equal. This may lead to packet loopback.
| |
| |     This commit stores xcfgp on ctx at first and always uses the same xcfgp during the processing of one packet.
| |
| |     Signed-off-by: Huanle Han <hanxueluo@gmail.com>
| |     Signed-off-by: Ben Pfaff <blp@ovn.org>
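The pointer-equality problem can be modelled with object identity. The Python sketch below is only an analogy for the C code: `XBundle` and `build_config` are hypothetical stand-ins, and each call to `build_config` plays the role of one read of xcfgp that yields freshly allocated bundle objects.

```python
class XBundle:
    def __init__(self, name):
        self.name = name


def build_config(port_names):
    """Each call builds a fresh snapshot with new XBundle objects."""
    return {name: XBundle(name) for name in port_names}


first_read = build_config(["p1", "p2"])
second_read = build_config(["p1", "p2"])

# The packet arrived on p1, looked up in the first snapshot.
in_bundle = first_read["p1"]

# Flooding from a second snapshot: the identity test no longer excludes p1,
# so the packet is sent back out of the port it came in on.
leaked = [b.name for b in second_read.values() if b is not in_bundle]

# Flooding from the same snapshot that produced in_bundle excludes p1.
pinned = [b.name for b in first_read.values() if b is not in_bundle]

print(leaked, pinned)
```

Storing one snapshot for the whole lifetime of a packet keeps every identity comparison within the same set of objects.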
* | ovn-nbctl: update manpage for lsp-set-type. | Han Zhou | 2018-02-01 | 1 | -1/+43
|/
    Signed-off-by: Han Zhou <zhouhan@gmail.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
* netdev-dpdk: Fix xstats leak on port destruction. | Ilya Maximets | 2018-01-26 | 1 | -1/+4
    CC: Michal Weglicki <michalx.weglicki@intel.com>
    Fixes: 971f4b394c6e ("netdev: Custom statistics.")
    Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
    Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* netdev-dpdk: Fix memory leak in netdev_dpdk_configure_xstats(). | Ilya Maximets | 2018-01-26 | 1 | -0/+2
    CC: Michal Weglicki <michalx.weglicki@intel.com>
    Fixes: 971f4b394c6e ("netdev: Custom statistics.")
    Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
    Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* netdev-dpdk: Fix memory leak in netdev_dpdk_get_custom_stats(). | Ilya Maximets | 2018-01-26 | 1 | -0/+2
    CC: Michal Weglicki <michalx.weglicki@intel.com>
    Fixes: 971f4b394c6e ("netdev: Custom statistics.")
    Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
    Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* vswitchd: show DPDK version | Matteo Croce | 2018-01-26 | 5 | -1/+16
    Show DPDK version if Open vSwitch is compiled with DPDK support. Version can be retrieved with `ovs-vswitchd --version` or from OVS logs. Small change in ovs-ctl to avoid breakage on output change.

    Signed-off-by: Matteo Croce <mcroce@redhat.com>
    Acked-by: Kevin Traynor <ktraynor@redhat.com>
    Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* netdev-dpdk: fix port addition for ports sharing same PCI id | Yuanhan Liu | 2018-01-26 | 2 | -15/+67
    Some NICs have only one PCI address associated with multiple ports. This patch extends the dpdk-devargs option's format to cater for such devices.

    To achieve that, this patch uses a new syntax that will be adapted and implemented in a future DPDK release (likely, v18.05):

    http://dpdk.org/ml/archives/dev/2017-December/084234.html

    And since it's the DPDK duty to parse the (complete and full) syntax and this patch is more likely to serve as an intermediate workaround, here I take a simpler and shorter syntax from it (note it's allowed to have only one category being provided):

        class=eth,mac=00:11:22:33:44:55

    Also, old compatibility is kept. Users can still go on with using the PCI id to add a port (if that's enough for them). Meaning, this patch will not break anything.

    This patch is basically based on the one from Ciara:

    https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/339496.html

    Cc: Loftus Ciara <ciara.loftus@intel.com>
    Cc: Thomas Monjalon <thomas@monjalon.net>
    Cc: Kevin Traynor <ktraynor@redhat.com>
    Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
    Signed-off-by: Ian Stokes <ian.stokes@intel.com>
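As an illustration of the shortened syntax, the Python sketch below splits a `class=eth,mac=...` string into key/value pairs and checks that the MAC has six octets. It is not the netdev-dpdk parser: `parse_devargs` and `is_mac` are hypothetical helpers, and rejecting items without a value is an assumption about how strict such a parser would be.

```python
import string


def parse_devargs(devargs):
    """Split 'key=value,key=value' into a dict (hypothetical helper)."""
    fields = {}
    for item in devargs.split(","):
        key, sep, value = item.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"malformed devargs item: {item!r}")
        fields[key] = value
    return fields


def is_mac(text):
    """True for six colon-separated two-digit hex octets."""
    parts = text.split(":")
    return len(parts) == 6 and all(
        len(p) == 2 and all(c in string.hexdigits for c in p) for p in parts
    )


fields = parse_devargs("class=eth,mac=00:11:22:33:44:55")
print(fields["class"], is_mac(fields["mac"]))
```

A plain PCI id contains no `=` and would be rejected by this sketch, so a caller wanting the old behaviour would try the PCI form first and fall back to this parser.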
* netdev-dpdk: Fix requested MTU size validation. | Ian Stokes | 2018-01-26 | 2 | -1/+24
    This commit replaces MTU_TO_FRAME_LEN(mtu) with MTU_TO_MAX_FRAME_LEN(mtu) in netdev_dpdk_set_mtu(), in order to determine if the total length of the L2 frame with an MTU of 'mtu' exceeds NETDEV_DPDK_MAX_PKT_LEN.

    When setting an MTU we first check if the requested total frame length (which includes associated L2 overhead) will exceed the maximum frame length supported in netdev_dpdk_set_mtu(). The frame length is calculated by MTU_TO_FRAME_LEN as MTU + ETHER_HEADER + ETHER_CRC. The MTU for the device will be set at a later stage in dpdk_eth_dev_init() using rte_eth_dev_set_mtu(mtu).

    However when using rte_eth_dev_set_mtu(mtu) the calculation used to check that the frame does not exceed the max frame length for that device varies between DPDK device drivers. For example, the ixgbe driver calculates the frame length for a given MTU as

        mtu + ETHER_HDR_LEN + ETHER_CRC_LEN

    the i40e driver calculates it as

        mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + I40E_VLAN_TAG_SIZE * 2

    and the em driver calculates it as

        mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + VLAN_TAG_SIZE

    Currently it is possible to set an MTU for a netdev_dpdk device that exceeds the upper limit MTU for that device's DPDK driver. This leads to a segfault. This is because the frame length comparison as is does not take into account the addition of the vlan tag overhead expected in the drivers. The netdev_dpdk_set_mtu() call will incorrectly succeed but the subsequent dpdk_eth_dev_init() will fail before the queues have been created for the DPDK device. This coupled with assumptions regarding reconfiguration requirements for the netdev will lead to a segfault when the rxq is polled for this device.

    A simple way to avoid this is by using MTU_TO_MAX_FRAME_LEN(mtu) when validating a requested MTU in netdev_dpdk_set_mtu().

    MTU_TO_MAX_FRAME_LEN(mtu) is equivalent to the following:

        mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + (2 * VLAN_HEADER_LEN)

    By using MTU_TO_MAX_FRAME_LEN at the netdev_dpdk_set_mtu() stage, OvS now takes into account the maximum L2 overhead that a DPDK driver could allow for in its frame size calculation. This allows OVS to flag an error rather than the DPDK driver if the frame length exceeds the max DPDK frame length. OVS can fail gracefully at this point and use the default MTU of 1500 to continue to configure the port.

    Note: this fix is a workaround; a better approach would be if DPDK devices could report the maximum MTU value that can be requested on a per device basis. This capability however is not currently available. A downside of this patch is that the MTU upper limit will be reduced by 8 bytes for DPDK devices that do not need to account for vlan tags in the frame length driver calculations, e.g. the upper MTU limit for ixgbe devices is reduced from the OVS point of view from 9710 to 9702.

    CC: Mark Kavanagh <mark.b.kavanagh@intel.com>
    Fixes: 0072e931 ("netdev-dpdk: add support for jumbo frames")
    Signed-off-by: Ian Stokes <ian.stokes@intel.com>
    Co-authored-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
    Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
    Acked-by: Flavio Leitner <fbl@sysclose.org>
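The two frame length formulas and the resulting MTU limits can be checked with a short calculation. This Python sketch mirrors the macros by name only; the 14-byte header, 4-byte CRC and 4-byte VLAN header are the standard Ethernet sizes, and the value 9728 for NETDEV_DPDK_MAX_PKT_LEN is an assumption inferred from the 9710 limit quoted in the message.

```python
ETHER_HDR_LEN = 14
ETHER_CRC_LEN = 4
VLAN_HEADER_LEN = 4
NETDEV_DPDK_MAX_PKT_LEN = 9728  # assumed: 9710 + 14 + 4, per the message


def mtu_to_frame_len(mtu):
    """Old check: header and CRC only."""
    return mtu + ETHER_HDR_LEN + ETHER_CRC_LEN


def mtu_to_max_frame_len(mtu):
    """New check: also allow for two VLAN tags."""
    return mtu_to_frame_len(mtu) + 2 * VLAN_HEADER_LEN


def largest_accepted_mtu(frame_len_fn):
    """Largest MTU whose computed frame length still fits the limit."""
    return NETDEV_DPDK_MAX_PKT_LEN - frame_len_fn(0)


old_limit = largest_accepted_mtu(mtu_to_frame_len)
new_limit = largest_accepted_mtu(mtu_to_max_frame_len)
print(old_limit, new_limit, old_limit - new_limit)
```

The 8-byte difference is the cost of the workaround for drivers such as ixgbe that do not count VLAN tags themselves.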
* ofproto: Fix double-unref of temporary rule when learning. | Ben Pfaff | 2018-01-26 | 2 | -10/+8
    When ofproto_flow_mod_init() accepts a rule, it takes ownership of it and either unrefs it on error or transfers ownership to the struct it initializes on success, but ofproto_flow_mod_init_for_learn() was unref-ing it a second time if it reported an error.

    Signed-off-by: Ben Pfaff <blp@ovn.org>
    Acked-by: William Tu <u9012063@gmail.com>
* openvswitch/types.h: Drop the member name in initializer macro | Shashank Ram | 2018-01-25 | 1 | -3/+2
    MSVC++ compiler does not allow initializing a struct while explicitly initializing a member in the struct.

    Not allowed:

        static const struct eth_addr a = {{ .ea= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff }}};

    Allowed:

        static const struct eth_addr b = {{{ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff }}};

    *An extra curly brace is required for GCC in case the struct contains a union.

    Signed-off-by: Shashank Ram <rams@vmware.com>
    Tested-by: Yi-Hung Wei <yihung.wei@gmail.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>
* gre: strip gre-tso offload flags  wenxu  2018-01-25  1  -0/+2
| | | | | | | | | | | If GRO is enabled, ipgre can receive a GRE-TSO packet. After the GRE tunnel header is popped, the encapsulation and GSO_ENCAP flags should be stripped. Otherwise, when the packet is encapsulated again, it will be dropped in ovs_iptunnel_handle_offloads(). Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Ben Pfaff <blp@ovn.org> Tested-by: Greg Rose <gvrose8192@gmail.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com>
* ovn: OVN Support QoS meter  Guoshuai Li  2018-01-24  16  -76/+418
| | | | | | | | | | | | | | | | | This feature is used to limit the bandwidth of flows, such as floating IP. ovn-northd changes: 1. add bandwidth column in NB's QOS table. 2. add QOS_METER stages in Logical switch ingress/egress. 3. add set_meter() action in SB's LFlow table. ovn-controller changes: add meter_table so that the meter action can process the OpenFlow meter table. Currently, this feature is only supported with DPDK. Signed-off-by: Guoshuai Li <ligs@dtdream.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* ovn-controller: Add extend_table instead of group_table to expand meter.  Guoshuai Li  2018-01-24  11  -212/+321
| | | | | | | | | | The structure and function of the group table and the meter table are similar, so the code is refactored into a common extend_table that also supports the meter table. The following functions are provided as a library: table init/destroy/clear/lookup/remove, assigning an ID to contents, and moving the contents of desired to existing. Signed-off-by: Guoshuai Li <ligs@dtdream.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* Revert "compat:inet_frag.h: Check for frag_percpu_counter_batch"  Ben Pfaff  2018-01-24  1  -14/+0
| | | | | | | | This reverts commit 822afef74f5e65af0cdc3916249ce85a70ae7b83. Requested-at: https://mail.openvswitch.org/pipermail/ovs-dev/2018-January/343674.html Requested-by: Gregory Rose <gvrose8192@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* netdev-linux: do not send packets to down tap ifaces.  Flavio Leitner  2018-01-24  3  -1/+30
| | | | | | | | | | | | | | | | | | | | | | Today OVS pushes packets to the TAP interface, ignoring its current state. That works because the kernel will return -EIO when it's not UP and OVS will just ignore that, as it is not an OVS issue. However, it causes a huge impact when broadcasts happen when using the userspace datapath accelerated with DPDK (e.g.: action NORMAL). This patch improves the situation by checking the TAP interface's state before issuing any syscall. However, there might be use-cases moving interfaces to other networking namespaces, and in that case OVS can't retrieve the iface state (sets it to DOWN). That would stop the traffic, breaking the use-case. This patch relies on netlink notifications to find out if the device is local or not. When it's local, the device state is checked; otherwise it will behave as before. Signed-off-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ben Pfaff <blp@ovn.org>
* tc flower: reorder tunnel encap/decap actions  John Hurley  2018-01-24  1  -5/+5
| | | | | | | | | | | The tc_flower conversion struct does not consider the order of actions. If an OvS rule matches on a tunnel (decap) and outputs to a new tunnel, the netlink conversion to TC will add the set tunnel key action before the unset, leading to an incorrect TC rule. This patch reorders the netlink generation to ensure a decap is done before an encap if both exist. Signed-off-by: John Hurley <john.hurley@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
* docs: Fix formatting in fedora.rst  Yi-Hung Wei  2018-01-23  1  -3/+3
| | | | | | | | Fix rst formatting in fedora.rst so that the commands render correctly on the web. Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* LACP: Check active partner sys id  Róbert Mulik  2018-01-23  1  -6/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | A reboot of one switch in an MC-LAG bond makes all bond links go down, causing a total connectivity loss for 3 seconds. Packet capture shows that spurious LACP PDUs are sent to OVS with a different MAC address (partner system id) during the final stages of the MC-LAG switch reboot. The current implementation doesn't care about the partner sys_id (MAC address). The code change is based on the following: - If an interface (lead interface) on a bond has an "attached" LACP connection, then any other slave on that bond is allowed to become active only when its partner's sys_id is the same as the partner's sys_id of the lead interface. - So, when a slave interface of a bond becomes "current" (it gets valid LACP information), it first checks if there is already an active interface on the bond. - If there is a lead, the slave checks the partner sys_ids, and becomes active only when they are the same; otherwise it remains in "current" state, but "detached". - If there is no lead, it follows the old way, and accepts any partner sys_id. Signed-off-by: Robert Mulik <robert.mulik@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
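The decision rule listed above can be sketched as a single predicate. The struct layout and the name slave_may_attach() are hypothetical and only illustrate the rule; the actual change to the LACP code is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SYS_ID_LEN 6

/* Hypothetical per-slave LACP state, reduced to what the rule needs. */
struct slave {
    bool current;                        /* Has valid LACP information. */
    bool attached;                       /* Active on the bond. */
    uint8_t partner_sys_id[SYS_ID_LEN];  /* Partner system id (MAC). */
};

/* Returns true if 'candidate' may become active.  'lead' is the already
 * attached interface on the bond, or NULL if there is none. */
static bool
slave_may_attach(const struct slave *lead, const struct slave *candidate)
{
    if (!candidate->current) {
        return false;           /* No valid LACP information yet. */
    }
    if (lead == NULL) {
        return true;            /* No lead: accept any partner sys_id. */
    }
    /* With a lead, the partner sys_ids must match; otherwise the slave
     * stays "current" but "detached". */
    return memcmp(lead->partner_sys_id, candidate->partner_sys_id,
                  SYS_ID_LEN) == 0;
}
```

In this sketch a slave that sees a spurious partner MAC during the switch reboot stays detached, so the lead interface's link is not disturbed.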
* compat:inet_frag.h: Check for frag_percpu_counter_batch  Greg Rose  2018-01-23  1  -0/+14
| | | | | | | | | | Fix up the compat layer to check for frag_percpu_counter_batch and if not present then use atomic_sub and atomic_add as per the backport in the 3.16.50 LTS kernel. Fixes compile errors on 3.16 series kernels from 3.16.50 on. Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Justin Pettit <jpettit@ovn.org>
* tests: Fix non-canonical MAC addresses in ovn.at.  Leonid Ryzhyk  2018-01-23  1  -2/+2
| | | | Signed-off-by: Ben Pfaff <blp@ovn.org>