path: root/lib
Columns: commit message, author, age, files changed, lines removed/added
* cmap: Fix example provided for CMAP_FOR_EACH. (Justin Pettit, 2018-02-28, 1 file, -4/+3)
| | | | | Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
* Refer to database manpages in *ctl manpages (Mark Michelson, 2018-02-26, 2 files, -11/+72)
| | | | | | | | | | | | | | The ovn-nbctl, ovn-sbctl, and ovs-vsctl manpages are inconsistent in their "Database Commands" section when it comes to referring to what database tables exist. This commit amends this by making each *ctl manpage reference the corresponding database manpage instead. To aid in having a more handy list, the --help text of ovn-nbctl, ovn-sbctl, and ovs-vsctl have been modified to list the available tables. This is also referenced in the manpages for those applications. Signed-off-by: Mark Michelson <mmichels@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* vlog: fix the incorrect zero padding in format_log_message (zhangliping, 2018-02-26, 1 file, -1/+1)
| | | | | | | | | | If the format specifier does not have the 0 flag, we should pad with blanks instead of zeroes. Signed-off-by: zhangliping <zhangliping02@baidu.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Mark Michelson <mmichels@redhat.com> Tested-by: Mark Michelson <mmichels@redhat.com>
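The padding rule described above can be sketched in a few lines. This is a minimal Python illustration, not the C code in vlog; the function name `pad_field` and the assumption that fields are right-justified are hypothetical.

```python
def pad_field(text: str, width: int, zero_flag: bool) -> str:
    """Pad text on the left up to width.

    Hypothetical helper: zeroes are used only when the format
    specifier carried the 0 flag, otherwise blanks are used.
    """
    pad_char = "0" if zero_flag else " "
    # rjust() leaves the text unchanged when it is already wide enough.
    return text.rjust(width, pad_char)
```

With this rule, a width of 5 without the 0 flag pads "42" with three blanks, while the same width with the 0 flag pads it with three zeroes.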
* cmap: Fix bug in CMAP_FOR_EACH_WITH_HASH_PROTECTED. (zhangliping, 2018-02-26, 1 file, -1/+1)
| | | | | | | | | | | cmap_find_locked() should be cmap_find_protected(). This does not fix a user-visible bug because this macro did not have any users. Signed-off-by: zhangliping <zhangliping02@baidu.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Mark Michelson <mmichels@redhat.com>
* ofp-parse: Include missing ofp-actions.h. (Ilya Maximets, 2018-02-21, 1 file, -0/+1)
| | | | | | | | | | | | | | | This fixes MacOS build: lib/ofp-parse.c:167:16: error: use of undeclared identifier 'IPPORT_FTP' lib/ofp-parse.c:171:16: error: use of undeclared identifier 'IPPORT_TFTP' CC: Ben Pfaff <blp@ovn.org> Fixes: 0d71302e36c4 ("ofp-util, ofp-parse: Break up into many separate modules.") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* ovsdb-idlc: Implement synthetic columns. (Ben Pfaff, 2018-02-16, 2 files, -3/+12)
| | | | | | | | A synthetic column is one that is not present in the actual database but instead calculated by code in the client based on columns in the row. This can be useful to avoid repeatedly calculating the same function of a row. Signed-off-by: Ben Pfaff <blp@ovn.org>
* ofp-meter: Fix use-after-free for decoding meter mods. (Ben Pfaff, 2018-02-16, 1 file, -1/+1)
| | | | | | | | | | ofputil_pull_bands() may change bands->data. Found by libfuzzer-ngram. Reported-by: Bhargava Shastry <bshastry@sect.tu-berlin.de> Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun<pkusunyifeng@gmail.com>
* conntrack: Support conntrack flush by ct 5-tuple (Yi-Hung Wei, 2018-02-14, 3 files, -1/+76)
| | | | | | | | | This patch adds support of flushing a conntrack entry specified by the conntrack 5-tuple in dpif-netdev. Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Darrell Ball <dlu998@gmail.com>
* ofp-flow: Fix return value for ofputil_decode_flow_stats_reply(). (Ben Pfaff, 2018-02-13, 1 file, -16/+20)
| | | | | | | | | This function returned errno values for some errors and OFPERR_* values for others. The callers all expected OFPERR_* values. This fixes the problem. Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
* Implement OF1.3 extension for OF1.4 role status feature. (Ben Pfaff, 2018-02-13, 1 file, -17/+16)
| | | | | | | | | | ONF extension pack 1 for OpenFlow 1.3 defines how to implement the OpenFlow 1.4 "role status" message in OpenFlow 1.3. This commit implements that feature. ONF-JIRA: EXT-191 Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: William Tu <u9012063@gmail.com>
* ofp-util, ofp-parse: Break up into many separate modules. (Ben Pfaff, 2018-02-13, 40 files, -13048/+13495)
| | | | | | | | | | | | ofp-util had been far too large and monolithic for a long time. This commit breaks it up into units that make some logical sense. It also moves the pieces of ofp-parse that were specific to each unit into the relevant unit. Most of this commit is just moving code around. Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
* json: Make it safe to pass null pointers to json_equal(). (Ben Pfaff, 2018-02-06, 1 file, -3/+3)
| | | | | | Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Acked-by: Justin Pettit <jpettit@ovn.org>
* jsonrpc: Add comment for jsonrpc_msg_to_json(). (Ben Pfaff, 2018-02-06, 1 file, -0/+3)
| | | | | | | From a glance at the prototype it wasn't obvious that it destroyed its argument. Signed-off-by: Ben Pfaff <blp@ovn.org>
* odp-util: Always report ODP_FIT_TOO_LITTLE for IGMP. (Ben Pfaff, 2018-02-06, 1 file, -0/+5)
| | | | | | | | | | OVS datapaths don't understand or parse IGMP fields, but OVS userspace does, so this commit updates odp_flow_key_to_flow() to report that properly to the caller. Reported-by: Huanle Han <hanxueluo@gmail.com> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2018-January/343665.html Signed-off-by: Ben Pfaff <blp@ovn.org>
* ofproto-dpif-upcall: Slow path flows that datapath can't fully match. (Ben Pfaff, 2018-02-06, 1 file, -2/+3)
| In the OVS architecture, when a datapath doesn't have a match for a packet, it sends the packet and the flow that it extracted from it to userspace. Userspace then examines the packet and the flow and compares them. Commonly, the flow is the same as what userspace expects, given the packet, but there are two other possibilities:
|
| - The flow lacks one or more fields that userspace expects to be there, that is, the datapath doesn't understand or parse them but userspace does. This is, for example, what would happen if current OVS userspace, which understands and extracts TCP flags, were to be paired with an older OVS kernel module, which does not. Internally OVS uses the name ODP_FIT_TOO_LITTLE for this situation.
|
| - The flow includes fields that userspace does not know about, that is, the datapath understands and parses them but userspace does not. This is, for example, what would happen if an old OVS userspace that does not understand or extract TCP flags were to be paired with a recent OVS kernel module that does. Internally, OVS uses the name ODP_FIT_TOO_MUCH for this situation.
|
| The latter is not a big deal and OVS doesn't have to do much to cope with it.
|
| The former is more of a problem. When the datapath can't match on all the fields that OVS supports, it means that OVS can't safely install a flow at all, other than one that directs packets to the slow path. Otherwise, if OVS did install a flow, it could match a packet that does not match the flow that OVS intended to match and could cause the wrong behavior.
|
| Somehow, this nuance was lost a long time ago. From about 2013 until today, it seems that OVS has ignored ODP_FIT_TOO_LITTLE. Instead, it happily installs a flow regardless of whether the datapath can actually fully match it.
|
| I imagine that this is rarely a problem because most of the time the datapath and userspace are well matched, but it is still an important problem to fix. This commit fixes it by forcing flows into the slow path when the datapath cannot match specifically enough.
|
| CC: Ethan Jackson <ejj@eecs.berkeley.edu>
| Fixes: e79a6c833e0d ("ofproto: Handle flow installation and eviction in upcall.")
| Reported-by: Huanle Han <hanxueluo@gmail.com>
| Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2018-January/343665.html
| Signed-off-by: Ben Pfaff <blp@ovn.org>
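The decision described above can be sketched as a small classification step followed by a slow-path check. This is a Python illustration under stated assumptions, not the OVS C code: the `Fitness` enum, `classify_fit()` and `must_slow_path()` are hypothetical names, fields are modeled as plain sets of strings, and giving "too little" precedence when both conditions hold is an assumption.

```python
from enum import Enum


class Fitness(Enum):
    """Hypothetical stand-ins for the ODP_FIT_* outcomes named above."""
    PERFECT = "perfect"
    TOO_MUCH = "too_much"
    TOO_LITTLE = "too_little"


def classify_fit(datapath_fields: set, userspace_fields: set) -> Fitness:
    """Compare the fields the datapath parsed with those userspace expects."""
    if userspace_fields - datapath_fields:
        # Userspace expects fields the datapath did not provide.
        return Fitness.TOO_LITTLE
    if datapath_fields - userspace_fields:
        # The datapath provided fields userspace does not know about.
        return Fitness.TOO_MUCH
    return Fitness.PERFECT


def must_slow_path(fit: Fitness) -> bool:
    """Only a 'too little' fit forces the flow into the slow path."""
    return fit is Fitness.TOO_LITTLE
```

For instance, a datapath that parses `{"eth", "ip", "tcp"}` paired with a userspace that also expects `"tcp_flags"` yields `Fitness.TOO_LITTLE`, so the flow would be slow-pathed rather than installed as a regular datapath flow.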
* Remove last mentions of 'facet' from comments. (Ben Pfaff, 2018-02-06, 1 file, -1/+1)
| | | | | | How did these survive so long?! OVS hasn't had facets since 2013. Signed-off-by: Ben Pfaff <blp@ovn.org>
* netdev-linux: Report netdev change events when mac changed. (Tonghao Zhang, 2018-02-05, 5 files, -3/+15)
| When the MAC address of a port on a bridge is changed, for example:
|
|   $ ip link set dev eth0 address 00:11:22:33:44:55
|
| we should reconfigure the datapath ID and the MAC address of the local port. Currently Open vSwitch doesn't do that as expected.
|
| A simple example of how to reproduce it:
|
|   $ ovs-vsctl add-br br0
|   $ ifconfig br0    # for example, mac is c6:c6:d7:46:b4:4b
|   $ ip link set dev br0 address 00:11:22:33:44:55
|   $ ifconfig br0    # mac of br0 will be 00:11:22:33:44:55
|
| then repeat:
|
|   $ ip link set dev br0 address 00:11:22:33:44:55
|   $ ifconfig br0    # mac of br0 will be c6:c6:d7:46:b4:4b
|
| This patch reports the MAC change event when ports change, so that Open vSwitch reconfigures the datapath ID and the MAC address of the local port.
|
| Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
| Signed-off-by: Ben Pfaff <blp@ovn.org>
* util: Use lookup table to optimize hexit_value(). (Ben Pfaff, 2018-02-05, 2 files, -29/+16)
| | | | | | | | | | | | Daniel Alvarez Sanchez reported a significant overall speedup in ovn-northd due to a similar patch. Reported-by: Daniel Alvarez Sanchez <dalvarez@redhat.com> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-February/046120.html Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Daniel Alvarez <dalvarez@redhat.com> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
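A lookup table for hex digits can be sketched as follows. This is a Python illustration of the general technique, not the C implementation in lib/util.c; the table name, the byte-valued argument and the -1 result for non-hex input are assumptions made for the sketch.

```python
def _build_hexit_table() -> list:
    """Build a 256-entry table mapping byte values to hex digit values."""
    table = [-1] * 256  # -1 marks bytes that are not hex digits
    for value, ch in enumerate("0123456789"):
        table[ord(ch)] = value
    for value, ch in enumerate("abcdef", start=10):
        table[ord(ch)] = value
        table[ord(ch.upper())] = value
    return table


# Built once at import time; every later lookup is a single index operation.
HEXIT_TABLE = _build_hexit_table()


def hexit_value(byte: int) -> int:
    """Return 0..15 for a hex digit byte (0..255), or -1 otherwise."""
    return HEXIT_TABLE[byte]
```

The design trade is a small fixed table in exchange for replacing a chain of range comparisons with one indexed load per character.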
* ovs-router: fix router entry cast (William Tu, 2018-02-01, 1 file, -5/+1)
| | | | | | | | The offsetof(struct ovs_router_entry, cr) should always be 0, thus the else statement should never be reached. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* Add unixctl option for ovn-northd (Venkata Anil, 2018-02-01, 2 files, -1/+28)
| | | | | Signed-off-by: Venkata Anil <vkommadi@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* Merge branch 'dpdk_merge' of https://github.com/istokes/ovs into HEAD (Ben Pfaff, 2018-02-01, 1 file, -0/+18)
|\
| * netdev-dpdk: Add support for vHost dequeue zero copy (experimental) (Ciara Loftus, 2018-01-31, 1 file, -0/+18)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Zero copy is disabled by default. To enable it, set the 'dq-zero-copy' option to 'true' when configuring the Interface: ovs-vsctl set Interface dpdkvhostuserclient0 options:vhost-server-path=/tmp/dpdkvhostuserclient0 options:dq-zero-copy=true When packets from a vHost device with zero copy enabled are destined for a single 'dpdk' port, the number of tx descriptors on that 'dpdk' port must be set to a smaller value. 128 is recommended. This can be achieved like so: ovs-vsctl set Interface dpdkport options:n_txq_desc=128 Note: The sum of the tx descriptors of all 'dpdk' ports the VM will send to should not exceed 128. Due to this requirement, the feature is considered 'experimental'. Testing of the patch showed a ~8% improvement when switching 512B packets between vHost devices on different VMs on the same host when zero copy was enabled on the transmitting device. Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* | ovs-vswitchd: Avoid or suppress memory leak warning for glibc aio. (Ben Pfaff, 2018-02-01, 1 file, -0/+10)
| | | | | | | | | | | | | | | | | | | | | | | | The asynchronous IO library in glibc starts threads that show up as memory leaks in valgrind. This commit attempts to avoid the warnings by flushing all the asynchronous I/O to the log file before exiting. This only does part of the job for glibc since it keeps the threads around for some undefined idle time before killing them, so in addition this commit adds a valgrind suppression to stop displaying these warnings in any case. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: William Tu <u9012063@gmai.com>
* | ovs-vswitchd: Fire RCU callbacks before exit to reduce memory leak warnings. (Ben Pfaff, 2018-02-01, 2 files, -2/+55)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ovs-vswitchd makes extensive use of RCU to defer freeing memory past the latest time that it could be in use by a thread. Until now, ovs-vswitchd has not waited for RCU callbacks to fire before exiting. This meant that in many cases, when ovs-vswitchd exits, many blocks of memory are stuck in RCU callback queues, which valgrind often reports as "possible" memory leaks. This commit adds a new function ovsrcu_exit() that waits and fires as many RCU callbacks as it reasonably can. It can only do so for the thread that calls it and the thread that calls the callbacks, but generally speaking ovs-vswitchd shuts down other threads before it exits anyway, so this is pretty good. In my testing this eliminates most valgrind warnings for tests that run ovs-vswitchd. This ought to make it easier to distinguish new leaks that are real from existing non-leaks. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: William Tu <u9012063@gmai.com>
* | util: Document and rely on ovs_assert() always evaluating its argument. (Ben Pfaff, 2018-02-01, 10 files, -60/+24)
| | | | | | | | | | | | | | | | | | | | The ovs_assert() macro always evaluates its argument, even when NDEBUG is defined so that failure is ignored. This behavior wasn't documented, and thus a lot of code didn't rely on it. This commit documents the behavior and simplifies bits of code that heretofore didn't rely on it. Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
* | Support accepting and displaying table names in OVS tools. (Ben Pfaff, 2018-02-01, 8 files, -146/+328)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | OpenFlow has little-known support for naming tables. Open vSwitch has supported table names for ages, but it has never used or displayed them outside of commands dedicated to table manipulation. This commit adds support for table names in ovs-ofctl. When a table has a name, it displays that name in flows and actions, so that, for example, the following: table=1, arp, actions=resubmit(,2) might become: table=ingress_acl, arp, actions=resubmit(,mac_learning) given appropriately named tables. For backward compatibility, only interactive ovs-ofctl commands by default display table names; to display them in scripts, use the new --names option. This feature was inspired by a talk that Kei Nohguchi presented at Open vSwitch 2017 Fall Conference. CC: Kei Nohguchi <kei@nohguchi.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Mark Michelson <mmichels@redhat.com> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
* | ofp-util: New data structure for mapping between table names and numbers. (Ben Pfaff, 2018-01-31, 1 file, -41/+109)
| | | | | | | | | | | | | | | | | | This shares the infrastructure for mapping port names and numbers. It will be used in an upcoming commit. Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Acked-by: Mark Michelson <mmichels@redhat.com>
* | ofp-actions: Make formatting and parsing functions take a struct argument. (Ben Pfaff, 2018-01-31, 3 files, -694/+507)
| | | | | | | | | | | | | | | | | | | | An upcoming commit will add another parameter for parsing and formatting actions. It is much easier to add these parameters if they are encapsulated in a struct, so this commit first makes that change. Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Acked-by: Mark Michelson <mmichels@redhat.com>
* | classifier: Refactor interface for classifier_remove(). (Ben Pfaff, 2018-01-31, 4 files, -24/+28)
|/ | | | | | | | | | | | | | Until now, classifier_remove() returned either null or the classifier rule passed to it, which is an unusual interface. This commit changes it to return true if it succeeds or false on failure. In addition, most of classifier_remove()'s callers know ahead of time that it must succeed, even though most of them didn't bother with an assertion, so this commit adds a classifier_remove_assert() function as a helper. Signed-off-by: Ben Pfaff <blp@ovn.org> Tested-by: Yifeng Sun <pkusunyifeng@gmail.com> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
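The interface change described above, a boolean result plus an asserting helper for callers that know removal must succeed, can be sketched like this. It is a Python illustration only; the `Classifier` class and its set-based storage are hypothetical and unrelated to the real C data structure.

```python
class Classifier:
    """Toy rule container used only to illustrate the remove() contract."""

    def __init__(self) -> None:
        self._rules = set()

    def insert(self, rule) -> None:
        self._rules.add(rule)

    def remove(self, rule) -> bool:
        """Return True if the rule was present and removed, else False."""
        if rule in self._rules:
            self._rules.remove(rule)
            return True
        return False

    def remove_assert(self, rule) -> None:
        """Remove a rule that the caller knows must be present."""
        # The removal is performed unconditionally; only the check on its
        # result is the assertion, so it cannot be optimized away together
        # with the side effect.
        if not self.remove(rule):
            raise AssertionError("rule was not in the classifier")
```

Callers that can legitimately race with another removal would test the boolean from `remove()`, while callers with a guaranteed-present rule would use `remove_assert()`.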
* classifier: Fix typo in comment. (Ben Pfaff, 2018-01-30, 1 file, -1/+1)
| | | | | Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
* Merge branch 'dpdk_merge' of https://github.com/istokes/ovs into HEAD (Ben Pfaff, 2018-01-27, 4 files, -17/+89)
|\
| * netdev-dpdk: Fix xstats leak on port destruction. (Ilya Maximets, 2018-01-26, 1 file, -1/+4)
| | | | | | | | | | | | | | CC: Michal Weglicki <michalx.weglicki@intel.com> Fixes: 971f4b394c6e ("netdev: Custom statistics.") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
| * netdev-dpdk: Fix memory leak in netdev_dpdk_configure_xstats(). (Ilya Maximets, 2018-01-26, 1 file, -0/+2)
| | | | | | | | | | | | | | CC: Michal Weglicki <michalx.weglicki@intel.com> Fixes: 971f4b394c6e ("netdev: Custom statistics.") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
| * netdev-dpdk: Fix memory leak in netdev_dpdk_get_custom_stats(). (Ilya Maximets, 2018-01-26, 1 file, -0/+2)
| | | | | | | | | | | | | | CC: Michal Weglicki <michalx.weglicki@intel.com> Fixes: 971f4b394c6e ("netdev: Custom statistics.") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
| * vswitchd: show DPDK version (Matteo Croce, 2018-01-26, 3 files, -0/+14)
| | | | | | | | | | | | | | | | | | | | Show DPDK version if Open vSwitch is compiled with DPDK support. Version can be retrieved with `ovs-vswitchd --version` or from OVS logs. Small change in ovs-ctl to avoid breakage on output change. Signed-off-by: Matteo Croce <mcroce@redhat.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
| * netdev-dpdk: fix port addition for ports sharing same PCI id (Yuanhan Liu, 2018-01-26, 1 file, -15/+55)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some NICs have only one PCI address associated with multiple ports. This patch extends the dpdk-devargs option's format to cater for such devices. To achieve that, this patch uses a new syntax that will be adapted and implemented in future DPDK release (likely, v18.05): http://dpdk.org/ml/archives/dev/2017-December/084234.html And since it's the DPDK duty to parse the (complete and full) syntax and this patch is more likely to serve as an intermediate workaround, here I take a simpler and shorter syntax from it (note it's allowed to have only one category being provided): class=eth,mac=00:11:22:33:44:55:66 Also, old compatibility is kept. Users can still go on with using the PCI id to add a port (if that's enough for them). Meaning, this patch will not break anything. This patch is basically based on the one from Ciara: https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/339496.html Cc: Loftus Ciara <ciara.loftus@intel.com> Cc: Thomas Monjalon <thomas@monjalon.net> Cc: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
| * netdev-dpdk: Fix requested MTU size validation. (Ian Stokes, 2018-01-26, 1 file, -1/+12)
| | This commit replaces MTU_TO_FRAME_LEN(mtu) with MTU_TO_MAX_FRAME_LEN(mtu) in netdev_dpdk_set_mtu(), in order to determine if the total length of the L2 frame with an MTU of 'mtu' exceeds NETDEV_DPDK_MAX_PKT_LEN.
| |
| | When setting an MTU we first check if the requested total frame length (which includes associated L2 overhead) will exceed the maximum frame length supported in netdev_dpdk_set_mtu(). The frame length is calculated by MTU_TO_FRAME_LEN as MTU + ETHER_HEADER + ETHER_CRC.
| |
| | The MTU for the device will be set at a later stage in dpdk_eth_dev_init() using rte_eth_dev_set_mtu(mtu). However, when using rte_eth_dev_set_mtu(mtu), the calculation used to check that the frame does not exceed the max frame length for that device varies between DPDK device drivers. For example:
| |
| |   ixgbe driver calculates the frame length for a given MTU as
| |     mtu + ETHER_HDR_LEN + ETHER_CRC_LEN
| |   i40e driver calculates it as
| |     mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + I40E_VLAN_TAG_SIZE * 2
| |   em driver calculates it as
| |     mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + VLAN_TAG_SIZE
| |
| | Currently it is possible to set an MTU for a netdev_dpdk device that exceeds the upper limit MTU for that device's DPDK driver. This leads to a segfault, because the frame length comparison as it stands does not take into account the VLAN tag overhead expected in the drivers. The netdev_dpdk_set_mtu() call will incorrectly succeed, but the subsequent dpdk_eth_dev_init() will fail before the queues have been created for the DPDK device. This, coupled with assumptions regarding reconfiguration requirements for the netdev, will lead to a segfault when the rxq is polled for this device.
| |
| | A simple way to avoid this is by using MTU_TO_MAX_FRAME_LEN(mtu) when validating a requested MTU in netdev_dpdk_set_mtu(). MTU_TO_MAX_FRAME_LEN(mtu) is equivalent to the following:
| |
| |   mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + (2 * VLAN_HEADER_LEN)
| |
| | By using MTU_TO_MAX_FRAME_LEN at the netdev_dpdk_set_mtu() stage, OVS now takes into account the maximum L2 overhead that a DPDK driver could allow for in its frame size calculation. This allows OVS, rather than the DPDK driver, to flag an error if the frame length exceeds the max DPDK frame length. OVS can fail gracefully at this point and use the default MTU of 1500 to continue to configure the port.
| |
| | Note: this fix is a workaround; a better approach would be for DPDK devices to report the maximum MTU value that can be requested on a per-device basis. This capability, however, is not currently available. A downside of this patch is that the MTU upper limit will be reduced by 8 bytes for DPDK devices that do not need to account for VLAN tags in the frame length driver calculations, e.g. the upper MTU limit of ixgbe devices is reduced, from the OVS point of view, from 9710 to 9702.
| |
| | CC: Mark Kavanagh <mark.b.kavanagh@intel.com>
| | Fixes: 0072e931 ("netdev-dpdk: add support for jumbo frames")
| | Signed-off-by: Ian Stokes <ian.stokes@intel.com>
| | Co-authored-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
| | Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
| | Acked-by: Flavio Leitner <fbl@sysclose.org>
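The arithmetic above can be checked with a short sketch. This is Python, not the C macros themselves; the header sizes are the standard Ethernet values (14-byte header, 4-byte CRC, 4-byte VLAN tag), and the value 9728 for NETDEV_DPDK_MAX_PKT_LEN is an assumption inferred from the 9710 limit quoted in the message (9710 + 14 + 4).

```python
ETHER_HDR_LEN = 14      # destination MAC + source MAC + EtherType
ETHER_CRC_LEN = 4       # frame check sequence
VLAN_HEADER_LEN = 4     # one 802.1Q tag

# Assumed value, inferred from the 9710-byte MTU limit mentioned above.
NETDEV_DPDK_MAX_PKT_LEN = 9728


def mtu_to_frame_len(mtu: int) -> int:
    """Frame length without any VLAN tags (the old check)."""
    return mtu + ETHER_HDR_LEN + ETHER_CRC_LEN


def mtu_to_max_frame_len(mtu: int) -> int:
    """Frame length allowing for two VLAN tags (the new check)."""
    return mtu_to_frame_len(mtu) + 2 * VLAN_HEADER_LEN


def mtu_is_valid(mtu: int) -> bool:
    """Validate a requested MTU against the maximum packet length."""
    return mtu_to_max_frame_len(mtu) <= NETDEV_DPDK_MAX_PKT_LEN
```

Under these assumptions an MTU of 9710 passes the old check (9728 bytes exactly) but fails the new one (9736 bytes), while 9702 is the largest MTU that the new check accepts, which matches the 8-byte reduction the message describes.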
* | flow: Add some L7 payload data to most L4 protocols that accept it. (Ben Pfaff, 2018-01-27, 3 files, -41/+66)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This makes traffic generated by flow_compose() look slightly more realistic. It requires lots of updates to tests, but at least the tests themselves should be slightly more realistic too. At the same time, add --l7 and --l7-len options to ofproto/trace to allow users to specify the amount or contents of payloads that they want. Suggested-by: Brad Cowie <brad@cowie.nz> Signed-off-by: Ben Pfaff <blp@ovn.org> Tested-by: Yifeng Sun <pkusunyifeng@gmail.com> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
* | flow: Simplify flow_compose_l4(). (Ben Pfaff, 2018-01-26, 1 file, -30/+10)
|/ | | | | | | | | Each of the cases in flow_compose_l4() separately tracked the number of bytes of L4 data added to the packet. This commit makes the function do that in a single place without per-protocol bookkeeping. Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
* ovs-atomic: Fix typo in comment. (Ben Pfaff, 2018-01-26, 1 file, -1/+1)
| | | | | Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>
* tc flower: reorder tunnel encap/decap actions (John Hurley, 2018-01-24, 1 file, -5/+5)
| | | | | | | | | | | The tc_flower conversion struct does not consider the order of actions. If an OvS rule matches on a tunnel (decap) and outputs to a new tunnel, the netlink conversion to TC will add the set tunnel key action before the unset, leading to an incorrect TC rule. This patch reorders the netlink generation to ensure a decap is done before an encap if both exist. Signed-off-by: John Hurley <john.hurley@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
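The ordering constraint above, decap before encap when both are present, can be sketched as a stable reordering. This is a Python illustration only; the action labels `tunnel_key_unset` and `tunnel_key_set` and the function name are hypothetical stand-ins, and the real patch reorders netlink attribute generation rather than sorting a list.

```python
DECAP = "tunnel_key_unset"  # hypothetical label for the decap action
ENCAP = "tunnel_key_set"    # hypothetical label for the encap action


def order_tunnel_actions(actions: list) -> list:
    """Return the actions with any decap moved ahead of everything else.

    sorted() is stable, so the relative order of all other actions
    (including the encap and the output) is preserved.
    """
    return sorted(actions, key=lambda action: 0 if action == DECAP else 1)
```

Given `[ENCAP, DECAP, "output"]`, the sketch yields `[DECAP, ENCAP, "output"]`, so the tunnel key is unset before the new one is set.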
* LACP: Check active partner sys id (Róbert Mulik, 2018-01-23, 1 file, -6/+32)
| A reboot of one switch in an MC-LAG bond makes all bond links go down, causing a total connectivity loss for 3 seconds. Packet capture shows that spurious LACP PDUs are sent to OVS with a different MAC address (partner system id) during the final stages of the MC-LAG switch reboot. The current implementation doesn't care about the partner sys_id (MAC address).
|
| The code change is based on the following:
|
| - If an interface (lead interface) on a bond has an "attached" LACP connection, then any other slave on that bond is allowed to become active only when its partner's sys_id is the same as the partner's sys_id of the lead interface.
|
| - So, when a slave interface of a bond becomes "current" (it gets valid LACP information), it first checks if there is already an active interface on the bond.
|
| - If there is a lead, the slave checks the partner sys_ids, and becomes active only when they are the same; otherwise it remains in "current" state, but "detached".
|
| - If there is no lead, it follows the old way, and accepts any partner sys_id.
|
| Signed-off-by: Robert Mulik <robert.mulik@ericsson.com>
| Signed-off-by: Ben Pfaff <blp@ovn.org>
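The attach rule listed above reduces to one comparison. The sketch below is a Python illustration with hypothetical names (`may_attach`, string-valued system IDs); it does not reproduce the LACP state machine in lib/lacp.c.

```python
from typing import Optional


def may_attach(candidate_partner_sys_id: str,
               lead_partner_sys_id: Optional[str]) -> bool:
    """Decide whether a slave that just became "current" may become active.

    lead_partner_sys_id is None when no interface on the bond is
    attached yet, in which case any partner sys_id is accepted.
    """
    if lead_partner_sys_id is None:
        return True
    # With a lead interface present, the partner system IDs must match;
    # otherwise the slave stays "current" but "detached".
    return candidate_partner_sys_id == lead_partner_sys_id
```

In the reboot scenario from the message, a PDU carrying a different partner MAC address would make `may_attach()` return False while a lead interface exists, so the spurious partner cannot take over the bond.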
* bfd: Send BFD packets with DSCP CS6 (Venkatesan Pradeep, 2018-01-23, 1 file, -1/+1)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Send BFD packets with TOS value equivalent to DSCP CS6 so that the network can apply the right QoS for those packets. This can help avoid BFD flaps due to network congestion. For a reference on this being the right choice, here is a short declaration: http://www.ciscopress.com/articles/article.asp?p=357102&seqNum=4 A long dissertation: https://www.cisco.com/c/en/us/td/docs/solutions/Enterprise/WAN_and_MAN/QoS_SRND/QoS-SRND-Book/QoSIntro.html But in a nutshell: Network engineers create various queue/drop policies based upon precedence. Routing protocols are considered high priority/high precedence. During link saturation events, packets will get dropped. By creating an egress policy where packets marked by CS6 are allowed front-of-the-queue status, one can be sure that hello's from the various protocols arrive when they need to, without delay and without loss. On the other hand, if the hellos are dropped as part of normal traffic operations, then traffic routing will flap, leading to further congestion and drops. CS6 is a 'well known' marker to network engineers. In many vendor's gear, it is automatically assigned to routing protocol packets. Since OVS does not perform queuing, and leaves that to the kernel edge operations, the queue policies can be used to ensure timely egress of the BFD packets during high utilization events. See also: https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/339784.html https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/339785.html Thanks to Raymond Burkholder <ray@oneunified.net> for much of the above information. Signed-off-by: Venkatesan Pradeep <venkatesan.pradeep@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
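For reference, the TOS byte that corresponds to DSCP CS6 can be derived as follows. This is a small Python sketch of the standard bit layout (DSCP in the upper six bits, ECN in the lower two); the function name is hypothetical and the code is not taken from lib/bfd.c.

```python
DSCP_CS6 = 48  # class selector 6, binary 110000


def dscp_to_tos(dscp: int, ecn: int = 0) -> int:
    """Place a 6-bit DSCP value and a 2-bit ECN value into a TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    if not 0 <= ecn <= 3:
        raise ValueError("ECN must fit in 2 bits")
    return (dscp << 2) | ecn
```

`dscp_to_tos(DSCP_CS6)` evaluates to 192 (0xC0), which is the TOS value equivalent to DSCP CS6 with the ECN bits cleared.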
* netdev-linux: do not send packets to down tap ifaces. (Flavio Leitner, 2018-01-22, 1 file, -0/+16)
| Today OVS pushes packets to the TAP interface regardless of its current state. That works because the kernel will return -EIO when it's not UP and OVS will just ignore that as it is not an OVS issue. However, it has a huge impact when broadcasts happen while using the userspace datapath accelerated with DPDK (e.g.: action NORMAL).
|
| This patch improves the situation by checking the TAP interface's state before issuing any syscall. However, there might be use-cases that move interfaces to other networking namespaces, and in that case OVS can't retrieve the iface state (it sets it to DOWN). That would stop the traffic, breaking the use-case.
|
| This patch relies on netlink notifications to find out if the device is local or not. When it's local, the device state is checked; otherwise it will behave as before.
|
| Signed-off-by: Flavio Leitner <fbl@sysclose.org>
| Signed-off-by: Ben Pfaff <blp@ovn.org>
* dpif: geneve: supply dpif function to get ifindex (John Hurley, 2018-01-22, 1 file, -1/+1)
| | | | | | | | | | | | | Geneve tunnels are not given a netdev_class function to determine their ifindex. This means when ofproto-dpif attempts to add a geneve netdev it fails in 'netdev_ports_insert' in netdev.c. Failure to add this means that further operations like offloading a rule that egresses to a geneve port will be rejected as the egress port cannot be found. This patch applies the same ifindex function to geneve as is used in vxlan. Signed-off-by: John Hurley <john.hurley@netronome.com> Acked-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
* dpif: Add support for OVS_ACTION_ATTR_CT_CLEAR (Eric Garver, 2018-01-20, 7 files, -0/+30)
| | | | | | | | | | | | This supports using the ct_clear action in the kernel datapath. To preserve compatibility with current ct_clear behavior on old kernels, we only pass this action down to the datapath if a probe reveals the datapath actually supports it. Signed-off-by: Eric Garver <e@erig.me> Acked-by: William Tu <u9012063@gmail.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Justin Pettit <jpettit@ovn.org>
* Merge branch 'dpdk_merge' of https://github.com/istokes/ovs into HEAD (Ben Pfaff, 2018-01-19, 5 files, -325/+686)
|\
| * netdev-dpdk: add vhost-user get_status. (Flavio Leitner, 2018-01-17, 1 file, -2/+60)
| | | | | | | | | | | | | | | | | | Expose relevant vhost-user information in status. Signed-off-by: Flavio Leitner <fbl@sysclose.org> Tested-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
| * dpif-netdev: Add percentage of pmd/core used by each rxq. (Kevin Traynor, 2018-01-17, 1 file, -13/+40)
| | It is based on the length of history that is stored about an rxq (currently 1 min).
| |
| |   $ ovs-appctl dpif-netdev/pmd-rxq-show
| |   pmd thread numa_id 0 core_id 4:
| |     isolated : false
| |     port: dpdkphy1    queue-id: 0  pmd usage: 70 %
| |     port: dpdkvhost0  queue-id: 0  pmd usage: 0 %
| |   pmd thread numa_id 0 core_id 6:
| |     isolated : false
| |     port: dpdkphy0    queue-id: 0  pmd usage: 64 %
| |     port: dpdkvhost1  queue-id: 0  pmd usage: 0 %
| |
| | These values are what would be used as part of rxq to pmd assignment due to a reconfiguration event e.g. adding pmds, adding rxqs or with the command:
| |
| |   ovs-appctl dpif-netdev/pmd-rxq-rebalance
| |
| | Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
| | Co-authored-by: Jan Scheurich <jan.scheurich@ericsson.com>
| | Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
| | Signed-off-by: Ian Stokes <ian.stokes@intel.com>
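A usage percentage of this kind can be sketched as the share of a pmd's cycles that one rxq consumed over the stored history. This is a Python illustration under assumptions: the function name, the per-interval list of cycle counts and the integer truncation are all hypothetical and are not taken from dpif-netdev.c.

```python
from typing import Sequence


def rxq_usage_percent(rxq_cycles: Sequence[int], pmd_total_cycles: int) -> int:
    """Percentage of the pmd's cycles spent on one rxq.

    rxq_cycles holds one processing-cycle count per stored interval;
    pmd_total_cycles is the pmd's total over the same history window.
    """
    if pmd_total_cycles <= 0:
        return 0  # no history yet, report no usage
    return sum(rxq_cycles) * 100 // pmd_total_cycles
```

For example, an rxq that used 700 cycles in each of two intervals on a pmd that ran 2000 cycles in total would be reported at 70 %.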
| * dpif-netdev: Reset the rxq current cycle counter on reload. (Kevin Traynor, 2018-01-17, 1 file, -0/+2)
| | | | | | | | | | | | | | | | | | | | | | An rxq may have processing cycles counted in the current counter when a reload happens. That could temporarily create a small skew on the stats for an rxq. Reset the counter after reload. Fixes: 4809891b2e01 ("dpif-netdev: Count the rxq processing cycles for an rxq.") Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>