path: root/vswitchd
Commit message | Author | Age | Files | Lines
* netdev-afxdp: add new netdev type for AF_XDP. | William Tu | 2019-07-19 | 1 | -0/+15
| The patch introduces experimental AF_XDP support for OVS netdev.
| AF_XDP, the Address Family of the eXpress Data Path, is a new Linux
| socket type built upon the eBPF and XDP technology. It aims to have
| comparable performance to DPDK but cooperate better with the existing
| kernel's networking stack. An AF_XDP socket receives and sends packets
| from an eBPF/XDP program attached to the netdev, bypassing a couple of
| the Linux kernel's subsystems. As a result, an AF_XDP socket shows much
| better performance than AF_PACKET.
|
| For more details about AF_XDP, please see the Linux kernel's
| Documentation/networking/af_xdp.rst. Note that by default, this feature
| is not compiled in.
|
| Signed-off-by: William Tu <u9012063@gmail.com>
| Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
* netdev-dpdk: Enable tx-retries-max config. | Kevin Traynor | 2019-07-08 | 1 | -0/+12
| vhost tx retries can provide some mitigation against dropped packets due
| to a temporarily slow guest/limited queue size for an interface, but on
| the other hand when a system is fully loaded those extra cycles retrying
| could mean packets are dropped elsewhere.
|
| Up to now max vhost tx retries have been hardcoded, which meant no tuning
| and no way to disable for debugging to see if extra cycles spent retrying
| resulted in rx drops on some other interface.
|
| Add an option to change the max retries, with a value of 0 effectively
| disabling vhost tx retries.
|
| Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
| Acked-by: Eelco Chaudron <echaudro@redhat.com>
| Acked-by: Flavio Leitner <fbl@sysclose.org>
| Acked-by: Ilya Maximets <i.maximets@samsung.com>
| Signed-off-by: Ian Stokes <ian.stokes@intel.com>
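The effect of a configurable retry limit can be pictured with a minimal sketch. This is not the OVS implementation: `send_burst`, `try_send`, and `make_queue` are hypothetical names, and the only assumptions are that each attempt may accept a prefix of the batch and that a limit of 0 means one attempt with no retries.

```python
def send_burst(try_send, packets, max_retries):
    """Send a batch, retrying at most max_retries times (hypothetical sketch).

    try_send(batch) is an assumed callback that returns how many packets
    from the front of the batch the guest queue accepted.
    Returns (sent, dropped).
    """
    remaining = list(packets)
    retries = 0
    while remaining:
        accepted = try_send(remaining)
        remaining = remaining[accepted:]
        # Stop when everything went out or the retry budget is used up.
        if not remaining or retries >= max_retries:
            break
        retries += 1
    return len(packets) - len(remaining), len(remaining)


def make_queue(slots_per_call):
    """Toy guest queue that accepts a fixed number of packets per attempt."""
    def try_send(batch):
        return min(slots_per_call, len(batch))
    return try_send
```

With a queue that takes two packets per attempt and a batch of five, a limit of 0 drops three packets, while a limit of 2 gets all five through at the cost of two extra attempts.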
* tunnel: Add layer 2 IPv6 GRE encapsulation support. | William Tu | 2019-07-03 | 1 | -12/+20
| The patch adds ip6gre support. Tunnel type 'ip6gre' with
| packet_type=legacy_l2 is a layer 2 GRE tunnel over IPv6, carrying inner
| Ethernet packets encapsulated with a GRE header and an outer IPv6 header.
| Encapsulation of layer 3 packets over IPv6 GRE, ip6gre, is not supported
| yet. I tested it by running:
|   # make check-kernel TESTSUITEFLAGS='-k ip6gre'
| under kernel 5.2 and for userspace:
|   # make check TESTSUITEFLAGS='-k ip6gre'
|
| Tested-by: Greg Rose <gvrose8192@gmail.com>
| Tested-at: https://travis-ci.org/gvrose8192/ovs-experimental/builds/552977116
| Reviewed-by: Greg Rose <gvrose8192@gmail.com>
| Reviewed-by: Eli Britstein <elibr@mellanox.com>
| Signed-off-by: William Tu <u9012063@gmail.com>
| Signed-off-by: Ben Pfaff <blp@ovn.org>
* vswitchd: Always cleanup userspace datapath. | Ilya Maximets | 2019-07-02 | 1 | -0/+4
| The 'netdev' datapath is implemented within the ovs-vswitchd process and
| cannot exist without it, so it should be gracefully terminated with a
| full cleanup of resources upon ovs-vswitchd exit.
|
| This change forces dpif cleanup for the 'netdev' datapath regardless of
| passing '--cleanup' to 'ovs-appctl exit'. This solution makes it
| unnecessary to pass this additional option every time for userspace
| datapath installations, and also avoids terminating the system datapath
| in setups where both datapaths run at the same time.
|
| The main part is that dpif_port_del() will lead to netdev_close() and
| subsequent netdev_class->destroy(dev), which will stop HW NICs and free
| their resources. For vhost-user interfaces it will invoke vhost driver
| unregistering with a properly closed vhost-user connection. For the
| upcoming AF_XDP netdev this will allow gracefully destroying xdp sockets
| and unloading xdp programs from linux interfaces. Another important
| thing is that port deletion will also trigger flushing of flows
| offloaded to HW NICs.
|
| An exception is made for 'internal' ports that could have user ip/route
| configuration. These ports will not be removed without '--cleanup'.
|
| This change fixes OVS disappearing from the DPDK point of view (keeping
| HW NICs improperly configured, sudden closing of vhost-user connections)
| and will help with clearing linux devices with the upcoming AF_XDP
| netdev support.
|
| Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
| Tested-by: William Tu <u9012063@gmail.com>
| Acked-by: Flavio Leitner <fbl@sysclose.org>
| Acked-by: Ben Pfaff <blp@ovn.org>
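The exit-time rule stated in this commit message can be summarised as a small decision function. This is only an illustrative sketch: `should_delete_port` is a hypothetical name, not a function in OVS, and it encodes nothing beyond what the message says.

```python
def should_delete_port(datapath_type, port_type, cleanup_requested):
    """Decide whether a port is removed when ovs-vswitchd exits (sketch).

    datapath_type: "netdev" (userspace) or "system" (kernel).
    port_type: e.g. "internal", "dpdk", "dpdkvhostuserclient".
    cleanup_requested: True if '--cleanup' was passed to 'ovs-appctl exit'.
    """
    if cleanup_requested:
        # An explicit '--cleanup' removes everything, as before.
        return True
    # Without '--cleanup', only the userspace datapath is cleaned up,
    # and its 'internal' ports are kept because they may carry user
    # ip/route configuration.
    return datapath_type == "netdev" and port_type != "internal"
```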
* vswitchd: Separate disable system and route. | William Tu | 2019-06-26 | 1 | -0/+5
| Previously, '--disable-system' disabled both the system dp and the
| system routing table. The patch makes '--disable-system' only disable
| the system dp and adds '--disable-system-route' for disabling the route
| table. This fixes failures in 'make check-system-userspace' for tunnel
| cases. As a consequence, the tests hit errors because OVS userspace
| parses the IGMP packet but its datapaths do not, so
| odp_flow_key_to_flow() returns ODP_FIT_TOO_LITTLE; see commit
| c645550bb249 ("odp-util: Always report ODP_FIT_TOO_LITTLE for IGMP.").
| Fix it by filtering out the IGMP-related error message.
|
| Signed-off-by: William Tu <u9012063@gmail.com>
| Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
| Co-authored-by: Yi-Hung Wei <yihung.wei@gmail.com>
| Signed-off-by: Ben Pfaff <blp@ovn.org>
* OpenFlow: Enable OpenFlow 1.5 by default. | Ben Pfaff | 2019-06-20 | 1 | -10/+3
| Open vSwitch now supports all OpenFlow 1.5 required features, so enable
| it by default.
|
| Acked-by: Numan Siddique <nusiddiq@redhat.com>
| Signed-off-by: Ben Pfaff <blp@ovn.org>
* netdev: Split up netdev offloading to separate module. | Ilya Maximets | 2019-06-11 | 1 | -0/+1
| New module 'netdev-offload' created to manage different flow API
| implementations. All the generic and provider independent code moved
| there from the 'netdev' module.
|
| Flow API providers further encapsulated.
|
| The only function that was changed is 'netdev_any_oor'. Now it uses
| offloading related hmap instead of common 'netdev_shash'.
|
| Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
| Acked-by: Ben Pfaff <blp@ovn.org>
| Acked-by: Roi Dayan <roid@mellanox.com>
* dpctl: Update docs about dump-flows and HW offloading. | Ilya Maximets | 2019-06-11 | 1 | -0/+5
| Since the introduction of the dynamic flow API for netdevs, tricky
| accesses to an uninitialized flow API are no longer possible. So,
| ovs-dpctl doesn't support dumping HW offloaded flows now. State this in
| the docs and man pages. Additionally, forbid the 'type' argument for
| 'ovs-dpctl dump-flows'.
|
| Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
| Acked-by: Roi Dayan <roid@mellanox.com>
* ofproto-dpif-xlate: Add "always" mode to priority tags | Eli Britstein | 2019-05-24 | 2 | -3/+7
| Configure "if-nonzero" priority tags to retain the 802.1Q header when
| the VLAN ID is zero, except when both the VLAN ID and priority are zero.
| Add an "always" configuration option to retain the 802.1Q header in such
| frames as well.
|
| Signed-off-by: Eli Britstein <elibr@mellanox.com>
| Reviewed-by: Roi Dayan <roid@mellanox.com>
| Signed-off-by: Ben Pfaff <blp@ovn.org>
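The three priority-tag modes named in these two commits ("never", "if-nonzero", "always") can be compared side by side in a short sketch. `retain_priority_tag` is a hypothetical helper, not OVS code; it only covers frames whose VLAN ID is zero, and raising an error for unknown mode strings is an assumption made for the example.

```python
def retain_priority_tag(mode, pcp):
    """Return True if a frame with VLAN ID 0 keeps its 802.1Q header.

    mode: "never", "if-nonzero" or "always"; the legacy value "true"
          is treated as "if-nonzero" for backwards compatibility.
    pcp:  the 3-bit 802.1Q priority of the frame.
    """
    if mode == "true":
        mode = "if-nonzero"
    if mode == "never":
        return False
    if mode == "if-nonzero":
        # Keep the header only when it actually carries a priority.
        return pcp != 0
    if mode == "always":
        # Keep the header even when VLAN ID and priority are both zero.
        return True
    raise ValueError("unknown priority-tags mode: %r" % (mode,))
```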
* ofproto-dpif-xlate: Change priority tags from boolean to enum | Eli Britstein | 2019-05-24 | 2 | -4/+9
| Priority tags is a port configuration to determine how the port treats
| priority tags, e.g. zero VLAN ID. Change the type from boolean to enum
| as a pre-step towards introducing additional modes. The new options are
| "never", equivalent to previously "false", and "if-nonzero", equivalent
| to previously "true". "true" is still supported for backwards
| compatibility.
|
| Signed-off-by: Eli Britstein <elibr@mellanox.com>
| Reviewed-by: Roi Dayan <roid@mellanox.com>
| Signed-off-by: Ben Pfaff <blp@ovn.org>
* netdev-dpdk: Post-copy Live Migration support for vhost-user-client. | Liliia Butorina | 2019-05-24 | 1 | -0/+16
| Post-copy Live Migration for vHost supported since DPDK 18.11 and
| QEMU 2.12. New global config option 'vhost-postcopy-support' added
| to control this feature. Ex.:
|
|   ovs-vsctl set Open_vSwitch . other_config:vhost-postcopy-support=true
|
| Changing this value requires restarting the daemon.
|
| It's safe to enable this knob even if QEMU doesn't support post-copy LM.
|
| Feature marked as experimental and disabled by default because it may
| cause PMD thread hang on destination host on page fault for the time
| of page downloading from the source.
|
| Feature is not compatible with 'mlockall' and 'dequeue zero-copy'.
| Support added only for vhost-user-client.
|
| Signed-off-by: Liliia Butorina <l.butorina@partner.samsung.com>
| Co-authored-by: Ilya Maximets <i.maximets@samsung.com>
| Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
| Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
* vswitchd: Track status of memory locking. | Ilya Maximets | 2019-05-24 | 1 | -0/+2
| Needed for the future post-copy live migration support for vhost-user
| ports.
|
| Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
| Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
* ovs-vswitchd: Update limits section in manpage. | Ben Pfaff | 2019-05-10 | 1 | -5/+7
| Reported-by: William Konitzer <wkonitzer@mirantis.com>
| Acked-by: Numan Siddique <nusiddiq@redhat.com>
| Signed-off-by: Ben Pfaff <blp@ovn.org>
* bridge: Propagate patch port pairing errors to db. | Ilya Maximets | 2019-03-26 | 1 | -0/+13
| Virtual ports like 'patch' ports, which are almost fully implemented at
| the 'ofproto' layer, can have statuses internal to 'ofproto' that cannot
| be retrieved from 'netdev' or other layers. For example, in the current
| implementation there is no way to get the patch port pairing status
| (i.e. whether it has a usable peer).
|
| A new 'ofproto-provider' API function 'vport_get_status' is introduced
| to cover this gap. It allows the 'bridge' layer to retrieve the current
| status of ofproto virtual ports and propagate it to the DB. For now
| we're only interested in pairing errors of 'patch' ports. These are
| propagated to the 'error' column of the 'Interface' table.
|
| Ex.:
|
|   $ ovs-vsctl show
|     ...
|     Bridge "br1"
|       ...
|       Port "patch1"
|           Interface "patch1"
|               type: patch
|               options: {peer="patch0"}
|               error: "No usable peer 'patch0' exists in 'system' datapath."
|     Bridge "br0"
|       datapath_type: netdev
|       ...
|       Port "patch0"
|           Interface "patch0"
|               type: patch
|               options: {peer="patch1"}
|               error: "No usable peer 'patch1' exists in 'netdev' datapath."
|
| Acked-by: Eelco Chaudron <echaudro@redhat.com>
| Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
| Signed-off-by: Ben Pfaff <blp@ovn.org>
* netdev-linux: netem QoS support | Sharon K | 2019-03-14 | 1 | -0/+28
| Signed-off-by: Sharon Krendel <thekafkaf@gmail.com>
| Signed-off-by: Ben Pfaff <blp@ovn.org>
* vswitchd: Allow user to configure controllers as "primary" or "service". | Ben Pfaff | 2019-02-05 | 3 | -59/+69
| Normally it makes sense for an active connection to be primary and a
| passive connection to be a service connection, but I've run into a
| corner case where it is better for a passive connection to be a primary
| connection. This specific case is for use with OFtest, which expects to
| be a primary controller. However, it also wants to reconnect frequently,
| which is slow for active connections because of the backoff; by
| configuring a passive, primary controller, OFtest can reconnect as
| frequently and as quickly as it wants, making the overall test much
| faster.
|
| Acked-by: Justin Pettit <jpettit@ovn.org>
| Signed-off-by: Ben Pfaff <blp@ovn.org>
* Remove support for OpenFlow 1.6 (draft). | Ben Pfaff | 2019-02-05 | 2 | -7/+5
| ONF abandoned the OpenFlow specification, so that OpenFlow 1.6 will
| never be completed. It did not contain much in the way of useful
| features, so remove what support Open vSwitch already had.
|
| Signed-off-by: Ben Pfaff <blp@ovn.org>
| Acked-by: Justin Pettit <jpettit@ovn.org>
* connmgr: Make treatment of active and passive connections more uniform. | Ben Pfaff | 2019-02-05 | 1 | -1/+1
| Until now, connmgr has handled active and passive OpenFlow connections
| in quite different ways. Any active connection, whether it was currently
| connected or not, was always maintained as an ofconn. Whenever such a
| connection (re)connected, its settings were cleared. On the other hand,
| passive connections had a separate listener which created an ofconn when
| a new connection came in, and these ofconns would be deleted when such a
| connection was closed. This approach is inelegant and has occasionally
| led to bugs when reconnection didn't clear all of the state that it
| should have.
|
| There's another motivation here. Currently, active connections are
| always primary controllers and passive connections are always service
| controllers (as documented in ovs-vswitchd.conf.db(5)). Sometimes it
| would be useful to have passive primary controllers (maybe active
| service controllers too but I haven't personally run into that use
| case). As is, this is difficult to implement because there is so much
| different code in use between active and passive connections. This
| commit will make it easier.
|
| Signed-off-by: Ben Pfaff <blp@ovn.org>
* dpdk: Limit DPDK memory usage. | Ilya Maximets | 2019-02-01 | 1 | -0/+22
| Since the 18.05 release, DPDK has moved to a dynamic memory model in
| which hugepages can be allocated on demand. At the same time the
| '--socket-mem' option was re-defined as the size of pre-allocated
| memory, i.e. memory that should be allocated at startup and cannot be
| freed. So, DPDK with the new memory model could allocate more hugepage
| memory than specified in the '--socket-mem' or '-m' options.
|
| This change adds a new configurable 'other_config:dpdk-socket-limit'
| which can be used to limit the amount of memory DPDK may use. It uses
| the new DPDK option '--socket-limit'. Ex.:
|
|   ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-limit="1024,1024"
|
| Also, in order to preserve the old behaviour, if '--socket-limit' is not
| specified, it defaults to the amount of memory specified by the
| '--socket-mem' option, i.e. OVS will not be able to allocate more. This
| is needed, for example, to prevent OVS from allocating more memory than
| reserved for it by Nova in OpenStack installations.
|
| Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
| Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* dpif-netdev: Per-port configurable EMC. | Ilya Maximets | 2019-01-18 | 1 | -0/+19
| Conditional EMC insert helps a lot in scenarios with high numbers of
| parallel flows, but in the current implementation this option affects
| all the threads and ports at once. There are scenarios where we have
| different numbers of flows on different ports. For example, if one of
| the VMs encapsulates traffic using additional headers, it will receive
| a large number of flows but only a few flows will come out of this VM.
| In this scenario it's much faster to use EMC instead of the classifier
| for traffic from the VM, but it's better to disable EMC for the traffic
| which flows to the VM.
|
| To handle the above issue, introduce the 'emc-enable' configurable to
| enable/disable EMC on a per-port basis. Ex.:
|
|   ovs-vsctl set interface dpdk0 other_config:emc-enable=false
|
| EMC probability is kept as is and it works for all the ports with
| 'emc-enable=true'.
|
| Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
| Acked-by: Kevin Traynor <ktraynor@redhat.com>
| Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* Adding support for PMD auto load balancing | Nitin Katiyar | 2019-01-16 | 1 | -0/+41
| Port rx queues that have not been statically assigned to PMDs are
| currently assigned based on periodically sampled load measurements.
| The assignment is performed at specific instances – port addition, port
| deletion, upon reassignment request via CLI etc.
|
| Due to changes in traffic pattern over time this can cause uneven load
| among the PMDs, resulting in lower overall throughput.
|
| This patch enables the support of auto load balancing of PMDs based on
| measured load of RX queues. Each PMD measures the processing load for
| each of its associated queues every 10 seconds. If the aggregated PMD
| load reaches 95% for 6 consecutive intervals then the PMD considers
| itself to be overloaded.
|
| If any PMD is overloaded, a dry-run of the PMD assignment algorithm is
| performed by the OVS main thread. The dry-run does NOT change the
| existing queue to PMD assignments.
|
| If the resultant mapping of the dry-run indicates an improved
| distribution of the load then the actual reassignment will be performed.
|
| The automatic rebalancing will be disabled by default and has to be
| enabled via configuration option. The interval (in minutes) between two
| consecutive rebalancings can also be configured via CLI; the default is
| 1 min.
|
| The following example commands can be used to set the auto-lb params:
|   ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
|   ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-rebalance-intvl="5"
|
| Co-authored-by: Rohith Basavaraja <rohith.basavaraja@gmail.com>
| Co-authored-by: Venkatesan Pradeep <venkatesan.pradeep@ericsson.com>
| Signed-off-by: Rohith Basavaraja <rohith.basavaraja@gmail.com>
| Signed-off-by: Venkatesan Pradeep <venkatesan.pradeep@ericsson.com>
| Signed-off-by: Nitin Katiyar <nitin.katiyar@ericsson.com>
| Acked-by: Kevin Traynor <ktraynor@redhat.com>
| Tested-by: Kevin Traynor <ktraynor@redhat.com>
| Signed-off-by: Ian Stokes <ian.stokes@intel.com>
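The overload condition from this commit message (95% load for 6 consecutive 10-second intervals) can be sketched as a small counter. `PmdOverloadDetector` is a hypothetical class written for illustration only; resetting the counter as soon as one interval falls below the threshold is an assumption that follows from the word "consecutive".

```python
class PmdOverloadDetector:
    """Track whether one PMD has been overloaded long enough (sketch)."""

    def __init__(self, threshold_pct=95, intervals_needed=6):
        self.threshold_pct = threshold_pct
        self.intervals_needed = intervals_needed
        self.consecutive = 0

    def record_interval(self, load_pct):
        """Feed the aggregated PMD load for one measurement interval.

        Returns True once the load has reached the threshold for the
        required number of consecutive intervals.
        """
        if load_pct >= self.threshold_pct:
            self.consecutive += 1
        else:
            # A single quieter interval breaks the streak.
            self.consecutive = 0
        return self.consecutive >= self.intervals_needed
```

In the scheme described above, a True result would only trigger a dry-run of the assignment algorithm; the actual reassignment happens only if the dry-run shows an improved load distribution.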
* ovs-actions: New document describing OVS actions in detail. | Ben Pfaff | 2019-01-10 | 1 | -1/+1
| Acked-by: Mark Michelson <mmichels@redhat.com>
| Signed-off-by: Ben Pfaff <blp@ovn.org>
* docs: Fix cross-references that referred to discussions that have moved. | Ben Pfaff | 2018-11-15 | 1 | -5/+5
| Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
| Signed-off-by: Ben Pfaff <blp@ovn.org>
* Documentation: IPsec tunnel tutorial and documentation. | Qiuyu Xiao | 2018-11-09 | 1 | -8/+147
| tutorials/index.rst gives a step-by-step guide to set up an OVS IPsec
| tunnel.
|
| tutorials/ipsec.rst gives a detailed explanation of the IPsec tunnel
| configuration methods and forwarding modes.
|
| Signed-off-by: Qiuyu Xiao <qiuyu.xiao.qyx@gmail.com>
| Signed-off-by: Ansis Atteka <aatteka@ovn.org>
| Co-authored-by: Ansis Atteka <aatteka@ovn.org>
| Signed-off-by: Ben Pfaff <blp@ovn.org>
* vswitchd: Update documentation for legacy_l3 type packets | Greg Rose | 2018-11-09 | 1 | -0/+6
| The documentation needs to specify that for GRE tunnels there is no
| support for legacy_l3 type packets in the kernel datapath.
|
| Acked-by: William Tu <u9012063@gmail.com>
| Signed-off-by: Greg Rose <gvrose8192@gmail.com>
| Signed-off-by: Ben Pfaff <blp@ovn.org>
* documentation: man vswitchd.conf.db(5) updated flow-restore-wait | Zak Whittington | 2018-11-02 | 1 | -0/+10
| Commit 7ed73428a changed the behavior of flow-restore-wait to also
| prevent the switch from connecting to controllers in the controller
| table, but failed to update the man page documentation generated by
| vswitchd/vswitch.xml to reflect this. This commit adds that
| documentation.
|
| Signed-off-by: Zak Whittington <zwhitt.vmware@gmail.com>
| Signed-off-by: Ben Pfaff <blp@ovn.org>
* connmgr: Improve interface for setting controllers. | Ben Pfaff | 2018-10-31 | 1 | -61/+41
| Using an shash instead of an array simplifies the code for both the
| caller and the callee. Putting the set of allowed OpenFlow versions into
| the ofproto_controller data structure also simplifies the overall
| function interface slightly.
|
| Tested-by: Yifeng Sun <pkusunyifeng@gmail.com>
| Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
| Signed-off-by: Ben Pfaff <blp@ovn.org>
* bridge.c: prevent controller connects while flow-restore-wait | Zak Whittington | 2018-10-25 | 1 | -1/+2
| When force-reload-kmod is used, it shows an error when reinstalling TLVs
| during the "Restoring saved flows" step:
|
|   OFPT_ERROR (xid=0x4): NXTTMFC_ALREADY_MAPPED
|
| This is caused by a race condition between the restore script, which
| calls ofctl, and the connected controllers both adding back the same
| TLVs.
|
| The restore script already sets flow-restore-wait to true while doing
| flow restoration, and sets it back to false after it is done, and this
| patch utilizes that fact to prevent the TLV race. It does this by
| preventing vswitchd from connecting to controllers in the controller
| table while it is in a flow-restore-wait state.
|
| With this patch, when bridge_configure_remotes() calls
| bridge_get_controllers(), it first checks if flow-restore-wait has been
| set, and if so, it ignores any controllers in the controller database
| and sets n_controllers to 0.
|
| This solution does preserve the management service controller, which is
| added via bridge_ofproto_controller_for_mgmt() after checking whether we
| should call bridge_get_controllers() (and thus n_controllers is properly
| set to 1, etc).
|
| VMware-BZ: 2195377
| Signed-off-by: Zak Whittington <zwhitt.vmware@gmail.com>
| Signed-off-by: Ben Pfaff <blp@ovn.org>
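The controller selection described in this commit message can be modelled in a few lines. This is a simplified sketch rather than the C code in bridge.c: `controllers_to_configure` and its arguments are hypothetical, and controllers are represented as plain strings.

```python
def controllers_to_configure(db_controllers, flow_restore_wait,
                             mgmt_controller=None):
    """Pick the controllers vswitchd should connect to (sketch).

    db_controllers: controllers listed in the controller table.
    flow_restore_wait: True while the restore script is reinstalling flows.
    mgmt_controller: the management service controller, if any.
    """
    # While flows are being restored, controllers from the database are
    # ignored so they cannot race with the restore script.
    controllers = [] if flow_restore_wait else list(db_controllers)
    # The management service controller is added afterwards, so it is
    # preserved in both states.
    if mgmt_controller is not None:
        controllers.append(mgmt_controller)
    return controllers
```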
* revalidator: Rebalance offloaded flows based on the pps rate | Sriharsha Basavapatna via dev | 2018-10-19 | 1 | -0/+21
| This is the third patch in the patch-set to support dynamic rebalancing
| of offloaded flows.
|
| The dynamic rebalancing functionality is implemented in this patch. The
| ukeys that are not scheduled for deletion are obtained and passed as
| input to the rebalancing routine. The rebalancing is done in the context
| of the revalidation leader thread, after all other revalidator threads
| are done with gathering rebalancing data for flows.
|
| For each netdev that is in OOR state, a list of flows - both offloaded
| and non-offloaded (pending) - is obtained using the ukeys. For each
| netdev that is in OOR state, the flows are grouped and sorted into
| offloaded and pending flows. The offloaded flows are sorted in
| descending order of pps-rate, while pending flows are sorted in
| ascending order of pps-rate.
|
| The rebalancing is done in two phases. In the first phase, we try to
| offload all pending flows and if that succeeds, the OOR state on the
| device is cleared. If some (or none) of the pending flows could not be
| offloaded, then we start replacing an offloaded flow that has a lower
| pps-rate than a pending flow, until there are no more pending flows with
| a higher rate than an offloaded flow. The flows that are replaced from
| the device are added into the kernel datapath.
|
| A new OVS configuration parameter, "offload-rebalance", is added to
| ovsdb. The default value of this is "false". To enable this feature, set
| the value of this parameter to "true", which provides a
| packets-per-second rate based policy to dynamically offload and
| un-offload flows.
|
| Note: This option can be enabled only when the 'hw-offload' policy is
| enabled. It also requires 'tc-policy' to be set to 'skip_sw'; otherwise,
| flow offload errors (specifically the ENOSPC error this feature depends
| on) reported by an offloaded device are suppressed by the TC-Flower
| kernel module.
|
| Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
| Co-authored-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
| Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
| Reviewed-by: Sathya Perla <sathya.perla@broadcom.com>
| Reviewed-by: Ben Pfaff <blp@ovn.org>
| Signed-off-by: Simon Horman <simon.horman@netronome.com>
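The two-phase rebalancing described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the revalidator code: `rebalance` is a hypothetical function, flows are `(name, pps)` tuples, and a fixed `capacity` stands in for the device reporting ENOSPC. Offloading the highest-rate pending flows first in phase one is also a choice made for the example.

```python
def rebalance(offloaded, pending, capacity):
    """Rebalance flows for one out-of-resources device (sketch).

    Returns (offloaded_flows, software_flows, still_oor).
    """
    def rate(flow):
        return flow[1]

    # Sort orders follow the commit message: offloaded flows descending,
    # pending flows ascending, so the tail of each list is the lowest-rate
    # offloaded flow and the highest-rate pending flow respectively.
    offloaded = sorted(offloaded, key=rate, reverse=True)
    pending = sorted(pending, key=rate)

    # Phase 1: try to offload every pending flow.
    while pending and len(offloaded) < capacity:
        offloaded.append(pending.pop())
    if not pending:
        return offloaded, [], False  # everything fits, OOR is cleared

    # Phase 2: replace offloaded flows that are slower than pending ones.
    offloaded.sort(key=rate, reverse=True)
    evicted = []
    while pending and offloaded and rate(pending[-1]) > rate(offloaded[-1]):
        evicted.append(offloaded.pop())
        offloaded.append(pending.pop())
        offloaded.sort(key=rate, reverse=True)

    # Evicted flows and the remaining pending flows stay in software.
    return offloaded, evicted + pending, True
```

For a device that can hold two flows, with flows at 100 and 10 pps offloaded and flows at 50 and 5 pps pending, the sketch swaps the 10 pps flow for the 50 pps one and leaves the device in OOR state.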
* dpif-netdev-perf: Clarify frequency number. | Ilya Maximets | 2018-10-12 | 1 | -5/+1
| The 'dpif-netdev/pmd-perf-show' command prints the frequency number
| calculated from the total number of cycles spent on iterations during
| the measured period. This number could be confusing, because users may
| think that it should be equal to the CPU frequency, especially on
| non-x86 systems where the TSC frequency likely does not match the CPU
| one.
|
| Moreover, counted TSC cycles could differ from the HW TSC cycles in case
| of a large number of PMD reloads, because cycles spent outside of the
| main polling loop are not taken into account anywhere. In this case the
| frequency will not match even the TSC frequency.
|
| Let's clarify the meaning in order to avoid this misunderstanding.
| 'Cycles' is replaced with 'Used TSC cycles', which describes how many
| TSC cycles were consumed by the main polling loop. The % of the total
| TSC cycles is now printed instead of the GHz frequency, because GHz is
| hard to interpret, especially without knowing the exact TSC frequency.
|
| Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
| Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* Revert "bridge: Fix ovs-appctl qos/show repeated queue information" | Ben Pfaff | 2018-10-03 | 1 | -1/+0
| This reverts commit 6b4d0211e84a ("bridge: Fix ovs-appctl qos/show
| repeated queue information"), which is no longer necessary now that
| commit 65f3c34c7417 ("netdev: Properly clear 'details' when iterating in
| NETDEV_QOS_FOR_EACH.") has been applied. The former commit fixed a
| symptom of the root cause fixed by the latter.
|
| Signed-off-by: Ben Pfaff <blp@ovn.org>
| Acked-by: Eelco Chaudron <echaudro@redhat.com>
| Acked-by: Flavio Leitner <fbl@sysclose.org>
* bridge: Fix ovs-appctl qos/show repeated queue information | Eelco Chaudron | 2018-10-02 | 1 | -0/+1
| The patch below stops qos/show from repeating information from the
| previous queues. See below an example before and after the fix:
|
| Before:
|
|   $ ovs-appctl qos/show p5p2
|   QoS: p5p2 linux-htb
|   max-rate: 2428800
|
|   Default:
|     burst: 12512
|     min-rate: 12000
|     max-rate: 2428800
|     tx_packets: 0
|     tx_bytes: 0
|     tx_errors: 0
|
|   Queue 20:
|     burst: 12512
|     burst: 12512
|     min-rate: 12000
|     min-rate: 12000
|     max-rate: 607200
|     max-rate: 2428800
|     tx_packets: 28780
|     tx_bytes: 43572920
|     tx_errors: 17611
|
|   Queue 10:
|     burst: 12512
|     burst: 12512
|     burst: 12512
|     max-rate: 2428800
|     max-rate: 607200
|     max-rate: 2428800
|     min-rate: 12000
|     min-rate: 12000
|     min-rate: 12000
|     tx_packets: 71751
|     tx_bytes: 108631014
|     tx_errors: 18503
|
| After:
|
|   $ ovs-appctl qos/show p5p2
|   QoS: p5p2 linux-htb
|   max-rate: 2428800
|
|   Default:
|     burst: 12512
|     min-rate: 12000
|     max-rate: 2428800
|     tx_packets: 0
|     tx_bytes: 0
|     tx_errors: 0
|
|   Queue 20:
|     burst: 12512
|     min-rate: 12000
|     max-rate: 607200
|     tx_packets: 28780
|     tx_bytes: 43572920
|     tx_errors: 17611
|
|   Queue 10:
|     burst: 12512
|     min-rate: 12000
|     max-rate: 2428800
|     tx_packets: 71751
|     tx_bytes: 108631014
|     tx_errors: 18503
|
| Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
| Signed-off-by: Ben Pfaff <blp@ovn.org>
* dpif-netdev: Add round-robin based rxq to pmd assignment. | Kevin Traynor | 2018-09-14 | 1 | -0/+24
| Prior to OVS 2.9 automatic assignment of Rxqs to PMDs (i.e. CPUs) was
| done by round-robin.
|
| That was changed in OVS 2.9 to ordering the Rxqs based on their measured
| processing cycles. This was to assign the busiest Rxqs to different
| PMDs, improving aggregate throughput.
|
| For the most part the new scheme should be better, but there could be
| situations where a user prefers a simple round-robin scheme because Rxqs
| from a single port are more likely to be spread across multiple PMDs,
| and/or traffic is very bursty/unpredictable.
|
| Add 'pmd-rxq-assign' config to allow a user to select round-robin based
| assignment.
|
| Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
| Acked-by: Eelco Chaudron <echaudro@redhat.com>
| Acked-by: Ilya Maximets <i.maximets@samsung.com>
| Signed-off-by: Ian Stokes <ian.stokes@intel.com>
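The difference between the two assignment schemes can be shown with a toy model. This is a deliberately simplified sketch, not the dpif-netdev algorithm: `assign_rxqs` is a hypothetical function, the mode names "cycles" and "roundrobin" are assumed values for 'pmd-rxq-assign', and both modes hand queues out by simple wrap-around over the PMD list.

```python
def assign_rxqs(rxqs, pmds, mode="cycles"):
    """Map rx queues to PMDs (simplified sketch).

    rxqs: list of (queue_name, measured_cycles) in port order.
    pmds: list of PMD identifiers.
    mode: "cycles" orders queues busiest-first before distributing them;
          "roundrobin" keeps port order and ignores the measurements.
    Returns {pmd: [queue_name, ...]}.
    """
    if mode == "cycles":
        ordered = sorted(rxqs, key=lambda q: q[1], reverse=True)
    elif mode == "roundrobin":
        ordered = list(rxqs)
    else:
        raise ValueError("unknown assignment mode: %r" % (mode,))

    assignment = {pmd: [] for pmd in pmds}
    for i, (name, _cycles) in enumerate(ordered):
        assignment[pmds[i % len(pmds)]].append(name)
    return assignment
```

In "cycles" mode the two busiest queues land on different PMDs, while in "roundrobin" mode consecutive queues of the same port are spread across PMDs regardless of load.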
* vswitch.xml: Better explain vlan-limit. | Ben Pfaff | 2018-09-07 | 1 | -4/+4
| CC: Eric Garver <e@erig.me>
| Requested-by: Jerry Lilijun <jerry.lilijun@huawei.com>
| Signed-off-by: Ben Pfaff <blp@ovn.org>
| Acked-by: Eric Garver <e@erig.me>
* vswitch.xml: Fix key type and description style of tc-policy. | Ilya Maximets | 2018-08-30 | 1 | -10/+17
| The set of supported values specified. Style fixed to look good in man
| page. Fixed indents.
|
| CC: Paul Blakey <paulb@mellanox.com>
| Fixes: 691d20cbdcf3 ("other-config: Add tc-policy switch to control tc flower flag")
| Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
| Signed-off-by: Simon Horman <simon.horman@netronome.com>
* vswitch.xml: Fix type of dpdk-init key. | Ilya Maximets | 2018-08-27 | 1 | -1/+2
| This adds available modes to the man page.
|
| CC: Kevin Traynor <ktraynor@redhat.com>
| Fixes: 6d947d508a51 ("vswitch.xml: Update dpdk-init documentation.")
| Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
| Acked-by: Kevin Traynor <ktraynor@redhat.com>
| Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* vswitch.xml: Update dpdk-init documentation. | Kevin Traynor | 2018-08-10 | 1 | -4/+10
| dpdk-init is now a string. Add description of 'true' and 'try'.
|
| Fixes: 3e52fa5644cd ("dpdk: reflect status and version in the database")
| Cc: aconole@redhat.com
| Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
| Acked-by: Aaron Conole <aconole@redhat.com>
| Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* dpif-netdev: Add SMC cache after EMC cache | Yipeng Wang | 2018-07-24 | 1 | -0/+13
| This patch adds a signature match cache (SMC) after the exact match
| cache (EMC). The difference between SMC and EMC is that SMC only stores
| a signature of a flow, thus it is much more memory efficient. With the
| same memory space, EMC can store 8k flows while SMC can store 1M flows.
| It is generally beneficial to turn on SMC but turn off EMC when the
| traffic flow count is much larger than the EMC size.
|
| The SMC cache will map a signature to a dp_netdev_flow index in
| flow_table. Thus, we add two new APIs in cmap for lookup key by index
| and lookup index by key.
|
| For now, SMC is an experimental feature that is turned off by default.
| One can turn it on using ovsdb options.
|
| Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
| Co-authored-by: Jan Scheurich <jan.scheurich@ericsson.com>
| Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
| Acked-by: Billy O'Mahony <billy.o.mahony@intel.com>
| Signed-off-by: Ian Stokes <ian.stokes@intel.com>
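The idea of caching only a signature and an index, rather than the full flow key, can be illustrated with a small model. This sketch is not the dpif-netdev data structure: `SignatureMatchCache` is a hypothetical class, the direct-mapped layout and the 16-bit signature taken from the upper bits of the hash are assumptions, and `flow_table` is a plain list standing in for the real flow table.

```python
class SignatureMatchCache:
    """Toy signature cache: bucket -> (signature, flow_table index)."""

    def __init__(self, n_buckets=1024):
        self.n_buckets = n_buckets
        self.entries = [None] * n_buckets

    @staticmethod
    def _signature(flow_hash):
        # Assumed: a 16-bit signature taken from the upper hash bits.
        return (flow_hash >> 16) & 0xFFFF

    def insert(self, flow_hash, flow_index):
        bucket = flow_hash % self.n_buckets
        self.entries[bucket] = (self._signature(flow_hash), flow_index)

    def lookup(self, flow_hash, key, flow_table):
        """Return the flow index on a hit, or None on a miss."""
        entry = self.entries[flow_hash % self.n_buckets]
        if entry is None or entry[0] != self._signature(flow_hash):
            return None
        index = entry[1]
        # Signatures can collide, so the full key stored in the flow
        # table is compared before the hit is trusted.
        return index if flow_table[index] == key else None
```

Because each entry holds only a short signature and an index, far more flows fit in the same memory than in a cache that stores complete keys, at the cost of one extra indirection into the flow table.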
* bridge: Clean leaking netdevs when route is added. | Tiago Lam | 2018-07-10 | 1 | -0/+3
| When adding a route to a bridge, by executing "$appctl ovs/route/add
| $IP/$MASK $BR", a reference to the existing netdev is taken and stored
| in an instantiated ip_dev struct which is then stored in an addr_list
| list in tnl-ports.c. When OvS is signaled to exit, as a result of a
| "$appctl $OVS_PID exit --cleanup", for example, the bridge takes care of
| destroying its allocated port and iface structs. While destroying and
| freeing an iface, the netdev associated with it is also destroyed.
| However, for this to happen its ref_cnt must be 0. Otherwise the
| destructor of the netdev (specific to each datapath) won't be called. On
| the userspace datapath this means a system interface, such as "br0",
| wouldn't get deleted upon exit of OvS (when a route happens to be
| associated).
|
| This was first observed in the "ptap - triangle bridge setup with L2 and
| L3 GRE tunnels" test, which runs as part of the system userspace
| testsuite and uses the netdev datapath (as opposed to several tests
| which use the dummy datapath, where this issue isn't seen). The test
| would pass every other time and fail the rest of the times because the
| needed system interfaces (br-p1, br-p2 and br-p3) were already present
| (from the previous successful run which didn't clean up properly),
| leading to a failure.
|
| To fix the leak and clean up the interfaces upon exit, on its final
| stage before destroying a netdev, in iface_destroy__(), the bridge calls
| tnl_port_map_delete_ipdev() which takes care of freeing the instantiated
| ip_dev structs that refer to a specific netdev.
|
| An extra test is also introduced which verifies that the resources used
| by the OvS netdev datapath have been correctly cleaned up between
| OVS_TRAFFIC_VSWITCHD_STOP and AT_CLEANUP.
|
| Signed-off-by: Tiago Lam <tiago.lam@intel.com>
| Signed-off-by: Ben Pfaff <blp@ovn.org>
* vswitchd: Document new fdb statistics commands | Eelco Chaudron | 2018-07-06 | 1 | -0/+5
| Document the new fdb/stats-clear and fdb/stats-show commands.
|
| Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
| Signed-off-by: Ben Pfaff <blp@ovn.org>
* Merge branch 'dpdk_merge' of https://github.com/istokes/ovs into HEAD | Ben Pfaff | 2018-07-06 | 1 | -0/+17
|\
| * dpdk: Support both shared and per port mempools.  Ian Stokes  2018-07-06  1  -0/+17
This commit re-introduces the concept of shared mempools as the default memory model for DPDK devices. Per port mempools are still available but must be enabled explicitly by a user.

OVS previously used a shared mempool model for ports with the same MTU and socket configuration. This was replaced by a per port mempool model to address issues flagged by users such as:

https://mail.openvswitch.org/pipermail/ovs-discuss/2016-September/042560.html

However, the per port model potentially requires more memory to support the same number of ports and configuration as the shared mempool model. This is considered a blocking factor for current deployments of OVS when upgrading to future OVS releases, as a user may have to redimension memory for the same deployment configuration. This may not be possible for users.

This commit resolves the issue by re-introducing shared mempools as the default memory behaviour in OVS DPDK, but also refactors the memory configuration code to allow for per port mempools.

This patch adds a new global config option, per-port-memory, that controls the enablement of per port mempools for DPDK devices.

    ovs-vsctl set Open_vSwitch . other_config:per-port-memory=true

This value defaults to false; to enable per port memory support, this field should be set to true when setting other global parameters on init (such as "dpdk-socket-mem", for example). Changing the value at runtime is not supported, and requires restarting the vswitch daemon.

The mempool sweep functionality is also replaced with the sweep functionality from OVS 2.9 found in commits

    c77f692 (netdev-dpdk: Free mempool only when no in-use mbufs.)
    a7fb0a4 (netdev-dpdk: Add mempool reuse/free debug.)

A new document to discuss the specifics of the memory models and example memory requirement calculations is also added.

Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Tiago Lam <tiago.lam@intel.com>
Tested-by: Tiago Lam <tiago.lam@intel.com>
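To make the difference between the two memory models concrete, the sketch below counts how many mempools a set of ports would need under each model. It assumes only what the message states, namely that shared mempools are shared between ports with the same MTU and socket. The `Port` tuple, the port names, and `count_mempools` are hypothetical, and real mempool sizing is not modelled.

```python
from collections import namedtuple

# Hypothetical description of a DPDK port: name, MTU and NUMA socket.
Port = namedtuple("Port", ["name", "mtu", "socket"])


def count_mempools(ports, per_port_memory=False):
    """Count the mempools needed for `ports` under the chosen model."""
    if per_port_memory:
        # Per port model: every port gets its own mempool.
        return len(ports)
    # Shared model: one mempool per distinct (MTU, socket) pair.
    return len({(p.mtu, p.socket) for p in ports})


ports = [
    Port("dpdk0", 1500, 0),
    Port("dpdk1", 1500, 0),
    Port("dpdk2", 9000, 0),
    Port("vhost0", 1500, 1),
]
print(count_mempools(ports))                        # shared (default)
print(count_mempools(ports, per_port_memory=True))  # per port
```

In this example the two ports with MTU 1500 on socket 0 share one mempool in the default model, which is why the per port model can need more memory for the same deployment.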
* | DNS: Add basic support for asynchronous DNS resolving  Yifeng Sun  2018-07-06  2  -47/+50
|/
This patch is a simple implementation for the proposal discussed in https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/337038.html and https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/340013.html.

It enables ovs-vswitchd and other utilities to use DNS names when specifying OpenFlow and OVSDB remotes.

Below are some of the features and limitations of this patch:
- Resolving is asynchronous in daemon context, avoiding blocking the main loop;
- Resolving is synchronous in general utility context;
- Both IPv4 and IPv6 are supported;
- The resolving API is thread-safe;
- Depends on the unbound library;
- When multiple IP addresses are returned, only the first one is used;
- /etc/nsswitch.conf isn't respected, as the unbound library doesn't look at it;
- For async resolving, the caller needs to retry later; there is no callback.

Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
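The "retry later, no callback" behaviour listed above can be sketched as follows. This is a minimal Python illustration and not the OVS or unbound API: `AsyncResolver` and its `lookup` argument are hypothetical, and a plain worker thread stands in for the asynchronous resolver.

```python
import threading


class AsyncResolver:
    """Non-blocking resolver sketch: callers poll until a result is ready."""

    def __init__(self, lookup):
        self._lookup = lookup   # blocking function: name -> list of addresses
        self._lock = threading.Lock()
        self._results = {}      # name -> first address, or None on failure
        self._pending = {}      # name -> worker thread

    def resolve(self, name):
        """Return (done, address) without blocking.

        While done is False the caller is expected to retry later.
        """
        with self._lock:
            if name in self._results:
                return True, self._results[name]
            if name not in self._pending:
                worker = threading.Thread(
                    target=self._work, args=(name,), daemon=True)
                self._pending[name] = worker
                worker.start()
            return False, None

    def _work(self, name):
        try:
            addrs = self._lookup(name)
        except OSError:
            addrs = []
        with self._lock:
            # Only the first returned address is kept.
            self._results[name] = addrs[0] if addrs else None
            del self._pending[name]
```

A daemon main loop would call `resolve()` on each iteration and simply move on while it returns `(False, None)`, so the loop never blocks on DNS.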
* mac-learning: Increase default mac table size to 8K from 2K  Eelco Chaudron  2018-06-27  1  -1/+1
In field deployments of OVS (mostly in combination with OpenStack) we see that the 2K default MAC forwarding table is too small. On average this table holds around 5K entries, hence this patch to increase the default value to the next power of 2, i.e. 8K.

This increase in size does not automatically increase the memory footprint, as the memory for the MAC entries is allocated only when needed.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
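The "next power of 2" arithmetic behind the new default can be checked with a short sketch. The helper name is hypothetical; it only illustrates how roughly 5K observed entries round up to 8K (8192).

```python
def next_power_of_two(n):
    """Return the smallest power of two that is >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


# Around 5K observed entries rounds up to 8K, while the old 2K default
# is already a power of two.
print(next_power_of_two(5000))
print(next_power_of_two(2048))
```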
* Merge branch 'dpdk_merge' of https://github.com/istokes/ovs into HEAD  Ben Pfaff  2018-06-12  2  -3/+6
|\
| * ovs-thread: Fix thread id for threads not started with ovs_thread_create()  Eelco Chaudron  2018-06-08  1  -0/+2
When ping-ponging a live VM migration between two machines running OVS-DPDK, every now and then the ping misses would increase dramatically. For example:

    ===========Stream Rate: 3Mpps===========
    No Stream_Rate Downtime Totaltime Ping_Loss Moongen_Loss
     0       3Mpps      128     13974       115      7168374
     1       3Mpps      145     13620        17      1169770
     2       3Mpps      140     14499       116      7141175
     3       3Mpps      142     13358        16      1150606
     4       3Mpps      136     14004        16      1124020
     5       3Mpps      139     15494       214     13170452
     6       3Mpps      136     15610       217     13282413
     7       3Mpps      146     13194        17      1167512
     8       3Mpps      148     12871        16      1162655
     9       3Mpps      137     15615       214     13170656

I identified this issue as being introduced by OVS commit f3e7ec254738 ("Update relevant artifacts to add support for DPDK 17.05.1.") and, more specifically, by DPDK commit af1475918124 ("vhost: introduce API to start a specific driver").

The combined changes no longer have OVS start the vhost socket polling thread at startup; instead, DPDK will do it on its own when the first vhost client is started.

Figuring out the reason why this happens kept me puzzled for quite some time... What happens is that the callbacks called from the vhost thread are calling ovsrcu_synchronize() as part of destroy_device(). This will end up calling seq_wait__().

By default, all threads created outside of OVS will get thread id 0, which is equal to the main OVS thread. So, for example, in the seq_wait__() function mentioned above, if the main thread is already waiting we won't add ourselves as a waiter.

The fix below assigns OVSTHREAD_ID_UNSET to non-OVS-created threads, which will get updated to a valid ID on the first call to ovsthread_id_self().
Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Fixes: f3e7ec254738 ("Update relevant artifacts to add support for DPDK 17.05.1.") Acked-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
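The idea of the fix, giving foreign threads an "unset" id that is replaced by a unique one on first use, can be sketched in Python. This is an illustration, not the OVS implementation: the value chosen for `OVSTHREAD_ID_UNSET`, the counter, and `mark_main_thread()` are assumptions made for the example.

```python
import itertools
import threading

MAIN_THREAD_ID = 0
OVSTHREAD_ID_UNSET = -1          # assumed sentinel value for this sketch

_local = threading.local()       # per-thread storage for the id
_counter = itertools.count(1)    # ids handed out to non-main threads
_lock = threading.Lock()


def mark_main_thread():
    """Hypothetical helper: give the calling thread the main thread id."""
    _local.id = MAIN_THREAD_ID


def ovsthread_id_self():
    """Return this thread's id, assigning a unique one on first call."""
    tid = getattr(_local, "id", OVSTHREAD_ID_UNSET)
    if tid == OVSTHREAD_ID_UNSET:
        with _lock:
            tid = next(_counter)
        _local.id = tid
    return tid
```

Because a thread that was not started by this code no longer defaults to id 0, it can no longer be mistaken for the main thread when deciding whether a waiter is already registered.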
| * OVS-DPDK: Change "dpdk-socket-mem" default value.  Marcin Rybka  2018-06-08  1  -3/+4
When "dpdk-socket-mem" and "dpdk-alloc-mem" are not specified, "dpdk-socket-mem" will be set to allocate 1024MB on each NUMA node. This change prevents OVS from failing when a NIC is attached to NUMA node 1 or higher. The patch also contains a documentation update.

Signed-off-by: Marcin Rybka <marcinx.rybka@intel.com>
Co-authored-by: Billy O'Mahony <billy.o.mahony@intel.com>
Signed-off-by: Billy O'Mahony <billy.o.mahony@intel.com>
Tested-by: Hariprasad Govindharajan <hariprasad.govindharajan@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
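The defaulting rule described above can be sketched as a small function. It assumes that "dpdk-socket-mem" takes a comma-separated list of per-node sizes in MB; the function name, the dictionary representation of other_config, and the explicit NUMA node count are hypothetical.

```python
def effective_socket_mem(other_config, numa_nodes, mb_per_node=1024):
    """Return the dpdk-socket-mem value to use, or None if not applicable.

    The default is applied only when neither "dpdk-socket-mem" nor
    "dpdk-alloc-mem" has been specified.
    """
    if numa_nodes < 1:
        raise ValueError("at least one NUMA node is required")
    if "dpdk-socket-mem" in other_config:
        return other_config["dpdk-socket-mem"]
    if "dpdk-alloc-mem" in other_config:
        return None
    # 1024MB on each NUMA node, so NICs on node 1 and higher have memory.
    return ",".join([str(mb_per_node)] * numa_nodes)


print(effective_socket_mem({}, numa_nodes=2))
```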
* | treewide: Convert leading tabs to spaces.  Ben Pfaff  2018-06-11  3  -16/+16
|/
It's always been OVS coding style to use spaces rather than tabs for indentation, but some tabs have snuck in over time. This commit converts them to spaces.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
* dpdk: reflect status and version in the database  Aaron Conole  2018-05-25  3  -3/+24
The normal way of retrieving the running DPDK status involves parsing log files and issuing various incantations of ovs-vsctl and ovs-appctl commands to determine whether rte_eal_init completed successfully.

This commit adds two new records to reflect the DPDK version and the DPDK initialization status.

To support this, the other_config:dpdk-init configuration block now supports the 'true' and 'try' keywords, instead of just 'true'.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* userspace: add erspan tunnel support.  William Tu  2018-05-21  1  -0/+34
ERSPAN is a tunneling protocol based on the GRE tunnel. The patch adds erspan tunnel support for ovs-vswitchd with the userspace datapath. Configuring an erspan tunnel is similar to configuring a gre tunnel, but with additional erspan parameters. Matching a flow on erspan's metadata is also supported; see ovs-fields for more details.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>