| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The patch introduces experimental AF_XDP support for OVS netdev.
AF_XDP, the Address Family of the eXpress Data Path, is a new Linux socket
type built upon the eBPF and XDP technology. It is aims to have comparable
performance to DPDK but cooperate better with existing kernel's networking
stack. An AF_XDP socket receives and sends packets from an eBPF/XDP program
attached to the netdev, by-passing a couple of Linux kernel's subsystems
As a result, AF_XDP socket shows much better performance than AF_PACKET
For more details about AF_XDP, please see linux kernel's
Documentation/networking/af_xdp.rst. Note that by default, this feature is
not compiled in.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
vhost tx retries can provide some mitigation against
dropped packets due to a temporarily slow guest/limited queue
size for an interface, but on the other hand when a system
is fully loaded those extra cycles retrying could mean
packets are dropped elsewhere.
Up to now max vhost tx retries have been hardcoded, which meant
no tuning and no way to disable for debugging to see if extra
cycles spent retrying resulted in rx drops on some other
interface.
Add an option to change the max retries, with a value of
0 effectively disabling vhost tx retries.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The patch adds ip6gre support. Tunnel type 'ip6gre' with packet_type=
legacy_l2 is a layer 2 GRE tunnel over IPv6, carrying inner ethernet packets
and encap with GRE header with outer IPv6 header. Encapsulation of layer 3
packet over IPv6 GRE, ip6gre, is not supported yet. I tested it by running:
# make check-kernel TESTSUITEFLAGS='-k ip6gre'
under kernel 5.2 and for userspace:
# make check TESTSUITEFLAGS='-k ip6gre'
Tested-by: Greg Rose <gvrose8192@gmail.com>
Tested-at: https://travis-ci.org/gvrose8192/ovs-experimental/builds/552977116
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
'netdev' datapath is implemented within ovs-vswitchd process and can
not exist without it, so it should be gracefully terminated with a
full cleanup of resources upon ovs-vswitchd exit.
This change forces dpif cleanup for 'netdev' datapath regardless of
passing '--cleanup' to 'ovs-appctl exit'. Such solution allowes to
not pass this additional option everytime for userspace datapath
installations and also allowes to not terminate system datapath in
setups where both datapaths runs at the same time.
The main part is that dpif_port_del() will lead to netdev_close()
and subsequent netdev_class->destroy(dev) which will stop HW NICs
and free their resources. For vhost-user interfaces it will invoke
vhost driver unregistering with a properly closed vhost-user
connection. For upcoming AF_XDP netdev this will allow to gracefully
destroy xdp sockets and unload xdp programs from linux interfaces.
Another important thing is that port deletion will also trigger
flushing of flows offloaded to HW NICs.
Exception made for 'internal' ports that could have user ip/route
configuration. These ports will not be removed without '--cleanup'.
This change fixes OVS disappearing from the DPDK point of view
(keeping HW NICs improperly configured, sudden closing of vhost-user
connections) and will help with linux devices clearing with upcoming
AF_XDP netdev support.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Tested-by: William Tu <u9012063@gmail.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, '--disable-system' disables both system dp and the system
routing table. The patch makes '--disable-system' only disable system
dp and adds '--disable-system-route' for disabling the route table.
This fixes failures when 'make check-system-userspace' for tunnel cases.
As a consequence, hitting errors due to OVS userspace parses the IGMP packet
but its datapaths do not, so odp_flow_key_to_flow() return ODP_FIT_TOO_LITTLE.
commit c645550bb249 ("odp-util: Always report ODP_FIT_TOO_LITTLE for IGMP.")
Fix it by filtering out the IGMP-related error message.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Co-authored-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
| |
Open vSwitch now supports all OpenFlow 1.5 required features, so enable
it by default.
Acked-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
New module 'netdev-offload' created to manage different flow API
implementations. All the generic and provider independent code moved
there from the 'netdev' module.
Flow API providers further encapsulated.
The only function that was changed is 'netdev_any_oor'.
Now it uses offloading related hmap instead of common 'netdev_shash'.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Roi Dayan <roid@mellanox.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Since introduction of dynamic flow API for netdevs, tricky
accesses to uninitialized flow API are no longer possible.
So, ovs-dpctl doesn't support dumping HW offloaded flows now.
Claim this in docs and man pages. Additionally forbidden
'type' argument for 'ovs-dpctl dump-flows'.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Roi Dayan <roid@mellanox.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Configure "if-nonzero" priority tags to retain the 802.1Q header
when the VLAN ID is zero, except both the VLAN ID and priority are zero.
Add a "always" configuration option to retain the 802.1Q header in such
frames as well.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Priority tags is a port configuration to determine how the port treats
priority tags, e.g. zero VLAN ID. Change the type from boolean to enum
as a pre-step towards introducing additional modes. The new options are
"never", equivalent to previously "false", and "if-nonzero",
equivalent to previously "true". "true" is still supported for backwards
compatibility.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Post-copy Live Migration for vHost supported since DPDK 18.11 and
QEMU 2.12. New global config option 'vhost-postcopy-support' added
to control this feature. Ex.:
ovs-vsctl set Open_vSwitch . other_config:vhost-postcopy-support=true
Changing this value requires restarting the daemon. It's safe to
enable this knob even if QEMU doesn't support post-copy LM.
Feature marked as experimental and disabled by default because it may
cause PMD thread hang on destination host on page fault for the time
of page downloading from the source.
Feature is not compatible with 'mlockall' and 'dequeue zero-copy'.
Support added only for vhost-user-client.
Signed-off-by: Liliia Butorina <l.butorina@partner.samsung.com>
Co-authored-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
|
|
|
|
|
|
|
|
| |
Needed for the future post-copy live migration support for
vhost-user ports.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
|
|
|
|
|
|
| |
Reported-by: William Konitzer <wkonitzer@mirantis.com>
Acked-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Virtual ports like 'patch' ports that almost fully implemented on
'ofproto' layer could have internal to 'ofproto' statuses that
could not be retrieved from 'netdev' or other layers. For example,
in current implementation there is no way to get the patch port
pairing status (i.e. if it has usable peer?).
New 'ofproto-provider' API function 'vport_get_status' introduced to
cover this gap. It allowes 'bridge' layer to retrive current status
of ofproto virtual ports and propagate it to DB.
For now we're only interested in pairing errors of 'patch' ports.
That are propagated to the 'error' column of the 'Interface' table.
Ex.:
$ ovs-vsctl show
...
Bridge "br1"
...
Port "patch1"
Interface "patch1"
type: patch
options: {peer="patch0"}
error: "No usable peer 'patch0' exists in 'system' datapath."
Bridge "br0"
datapath_type: netdev
...
Port "patch0"
Interface "patch0"
type: patch
options: {peer="patch1"}
error: "No usable peer 'patch1' exists in 'netdev' datapath."
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
| |
Signed-off-by: Sharon Krendel <thekafkaf@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Normally it makes sense for an active connection to be primary and a
passive connection to be a service connection, but I've run into a corner
case where it is better for a passive connection to be a primary
connection. This specific case is for use with OFtest, which expects to be
a primary controller. However, it also wants to reconnect frequently,
which is slow for active connections because of the backoff; by
configuring a passive, primary controller, OFtest can reconnect as
frequently and as quickly as it wants, making the overall test much faster.
Acked-by: Justin Pettit <jpettit@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
| |
ONF abandoned the OpenFlow specification, so that OpenFlow 1.6 will never
be completed. It did not contain much in the way of useful features, so
remove what support Open vSwitch already had.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Until now, connmgr has handled active and passive OpenFlow connections in
quite different ways. Any active connection, whether it was currently
connected or not, was always maintained as an ofconn. Whenever such a
connection (re)connected, its settings were cleared. On the other hand,
passive connections had a separate listener which created an ofconn when
a new connection came in, and these ofconns would be deleted when such a
connection was closed. This approach is inelegant and has occasionally
led to bugs when reconnection didn't clear all of the state that it
should have.
There's another motivation here. Currently, active connections are
always primary controllers and passive connections are always service
controllers (as documented in ovs-vswitchd.conf.db(5)). Sometimes it would
be useful to have passive primary controllers (maybe active service
controllers too but I haven't personally run into that use case). As is,
this is difficult to implement because there is so much different code in
use between active and passive connections. This commit will make it
easier.
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since 18.05 release, DPDK moved to dynamic memory model in which
hugepages could be allocated on demand. At the same time '--socket-mem'
option was re-defined as a size of pre-allocated memory, i.e. memory
that should be allocated at startup and could not be freed.
So, DPDK with a new memory model could allocate more hugepage memory
than specified in '--socket-mem' or '-m' options.
This change adds new configurable 'other_config:dpdk-socket-limit'
which could be used to limit the ammount of memory DPDK could use.
It uses new DPDK option '--socket-limit'.
Ex.:
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-limit="1024,1024"
Also, in order to preserve old behaviour, if '--socket-limit' is not
specified, it will be defaulted to the amount of memory specified by
'--socket-mem' option, i.e. OVS will not be able to allocate more.
This is needed, for example, to disallow OVS to allocate more memory
than reserved for it by Nova in OpenStack installations.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Conditional EMC insert helps a lot in scenarios with high numbers
of parallel flows, but in current implementation this option affects
all the threads and ports at once. There are scenarios where we have
different number of flows on different ports. For example, if one
of the VMs encapsulates traffic using additional headers, it will
receive large number of flows but only few flows will come out of
this VM. In this scenario it's much faster to use EMC instead of
classifier for traffic from the VM, but it's better to disable EMC
for the traffic which flows to VM.
To handle above issue introduced 'emc-enable' configurable to
enable/disable EMC on a per-port basis. Ex.:
ovs-vsctl set interface dpdk0 other_config:emc-enable=false
EMC probability kept as is and it works for all the ports with
'emc-enable=true'.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Port rx queues that have not been statically assigned to PMDs are currently
assigned based on periodically sampled load measurements.
The assignment is performed at specific instances – port addition, port
deletion, upon reassignment request via CLI etc.
Due to change in traffic pattern over time it can cause uneven load among
the PMDs and thus resulting in lower overall throughout.
This patch enables the support of auto load balancing of PMDs based on
measured load of RX queues. Each PMD measures the processing load for each
of its associated queues every 10 seconds. If the aggregated PMD load reaches
95% for 6 consecutive intervals then PMD considers itself to be overloaded.
If any PMD is overloaded, a dry-run of the PMD assignment algorithm is
performed by OVS main thread. The dry-run does NOT change the existing
queue to PMD assignments.
If the resultant mapping of dry-run indicates an improved distribution
of the load then the actual reassignment will be performed.
The automatic rebalancing will be disabled by default and has to be
enabled via configuration option. The interval (in minutes) between
two consecutive rebalancing can also be configured via CLI, default
is 1 min.
Following example commands can be used to set the auto-lb params:
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-rebalance-intvl="5"
Co-authored-by: Rohith Basavaraja <rohith.basavaraja@gmail.com>
Co-authored-by: Venkatesan Pradeep <venkatesan.pradeep@ericsson.com>
Signed-off-by: Rohith Basavaraja <rohith.basavaraja@gmail.com>
Signed-off-by: Venkatesan Pradeep <venkatesan.pradeep@ericsson.com>
Signed-off-by: Nitin Katiyar <nitin.katiyar@ericsson.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Tested-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
|
|
|
|
|
| |
Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
| |
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
tutorials/index.rst gives a step-by-setp guide to set up OVS IPsec
tunnel.
tutorials/ipsec.rst gives detailed explanation on the IPsec tunnel
configuration methods and forwarding modes.
Signed-off-by: Qiuyu Xiao <qiuyu.xiao.qyx@gmail.com>
Signed-off-by: Ansis Atteka <aatteka@ovn.org>
Co-authored-by: Ansis Atteka <aatteka@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
| |
The documentation needs to specify that for GRE tunnels there is no
support for legacy_l3 type packets in the kernel datapath.
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 7ed73428a changed the behavior of flow-restore-wait to
also prevent the switch from connecting to controllers in the
controller table, but failed to update the man page documentation
generated by vswitchd/vswitch.xml to reflect this.
This commit adds that documentation.
Signed-off-by: Zak Whittington <zwhitt.vmware@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Using an shash instead of an array simplifies the code for both the caller
and the callee. Putting the set of allowed OpenFlow versions into the
ofproto_controller data structure also simplifies the overall function
interface slightly.
Tested-by: Yifeng Sun <pkusunyifeng@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When force-reload-kmod is used, it shows an error when reinstalling
tlvs during "Restoring saved flows" step:
OFPT_ERROR (xid=0x4): NXTTMFC_ALREADY_MAPPED
This is caused by a race condition between the restore script,
which calls ofctl, and the connected controllers both adding back
the same TLVs.
The restore script already sets flow-restore-wait to true while
doing flow restoration, and sets it back to false after it is
done, and this patch utilizes that fact to prevent the TLV race.
It does this by preventing vswitchd from connecting to
controllers in the controller table while it is in a
flow-restore-wait state.
With this patch, when bridge_configure_remotes() calls
bridge_get_controllers(), it first checks if flow-restore-wait
has been set, and if so, it ignores any controllers in the
controller database and sets n_controllers to 0.
This solution does preserve the management service controller
which is added via bridge_ofproto_controller_for_mgmt() after
checking whether we should call bridge_get_controllers()
(and thus n_controllers is properly set to 1, etc)
VMware-BZ: 2195377
Signed-off-by: Zak Whittington <zwhitt.vmware@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the third patch in the patch-set to support dynamic rebalancing
of offloaded flows.
The dynamic rebalancing functionality is implemented in this patch. The
ukeys that are not scheduled for deletion are obtained and passed as input
to the rebalancing routine. The rebalancing is done in the context of
revalidation leader thread, after all other revalidator threads are
done with gathering rebalancing data for flows.
For each netdev that is in OOR state, a list of flows - both offloaded
and non-offloaded (pending) - is obtained using the ukeys. For each netdev
that is in OOR state, the flows are grouped and sorted into offloaded and
pending flows. The offloaded flows are sorted in descending order of
pps-rate, while pending flows are sorted in ascending order of pps-rate.
The rebalancing is done in two phases. In the first phase, we try to
offload all pending flows and if that succeeds, the OOR state on the device
is cleared. If some (or none) of the pending flows could not be offloaded,
then we start replacing an offloaded flow that has a lower pps-rate than
a pending flow, until there are no more pending flows with a higher rate
than an offloaded flow. The flows that are replaced from the device are
added into kernel datapath.
A new OVS configuration parameter "offload-rebalance", is added to ovsdb.
The default value of this is "false". To enable this feature, set the
value of this parameter to "true", which provides packets-per-second
rate based policy to dynamically offload and un-offload flows.
Note: This option can be enabled only when 'hw-offload' policy is enabled.
It also requires 'tc-policy' to be set to 'skip_sw'; otherwise, flow
offload errors (specifically ENOSPC error this feature depends on) reported
by an offloaded device are supressed by TC-Flower kernel module.
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Co-authored-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Reviewed-by: Sathya Perla <sathya.perla@broadcom.com>
Reviewed-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
'dpif-netdev/pmd-perf-show' command prints the frequency number
calculated from the total number of cycles spent for iterations
for the measured period. This number could be confusing, because
users may think that it should be equal to CPU frequency, especially
on non-x86 systems where TSC frequency likely does not match with
CPU one.
Moreover, counted TSC cycles could differ from the HW TSC cycles
in case of a large number of PMD reloads, because cycles spent
outside of the main polling loop are not taken into account anywhere.
In this case the frequency will not match even TSC frequency.
Let's clarify the meaning in order to avoid this misunderstanding.
'Cycles' replaced with 'Used TSC cycles', which describes how many TSC
cycles consumed by the main polling loop. % of the total TSC cycles
now printed instead of GHz frequency, because GHz is unclear for
understanding, especially without knowing the exact TSC frequency.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 6b4d0211e84a ("bridge: Fix ovs-appctl qos/show
repeated queue information"), which is no longer necessary now that
commit 65f3c34c7417 ("netdev: Properly clear 'details' when iterating
in NETDEV_QOS_FOR_EACH.") has been applied. The former commit fixed
a symptom of the root cause fixed by the latter.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The patch below would stop qos/show to repeat information from the previous queues.
See below an example before and after the fix:
Before:
$ ovs-appctl qos/show p5p2
QoS: p5p2 linux-htb
max-rate: 2428800
Default:
burst: 12512
min-rate: 12000
max-rate: 2428800
tx_packets: 0
tx_bytes: 0
tx_errors: 0
Queue 20:
burst: 12512
burst: 12512
min-rate: 12000
min-rate: 12000
max-rate: 607200
max-rate: 2428800
tx_packets: 28780
tx_bytes: 43572920
tx_errors: 17611
Queue 10:
burst: 12512
burst: 12512
burst: 12512
max-rate: 2428800
max-rate: 607200
max-rate: 2428800
min-rate: 12000
min-rate: 12000
min-rate: 12000
tx_packets: 71751
tx_bytes: 108631014
tx_errors: 18503
After:
$ ovs-appctl qos/show p5p2
QoS: p5p2 linux-htb
max-rate: 2428800
Default:
burst: 12512
min-rate: 12000
max-rate: 2428800
tx_packets: 0
tx_bytes: 0
tx_errors: 0
Queue 20:
burst: 12512
min-rate: 12000
max-rate: 607200
tx_packets: 28780
tx_bytes: 43572920
tx_errors: 17611
Queue 10:
burst: 12512
min-rate: 12000
max-rate: 2428800
tx_packets: 71751
tx_bytes: 108631014
tx_errors: 18503
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Prior to OVS 2.9 automatic assignment of Rxqs to PMDs
(i.e. CPUs) was done by round-robin.
That was changed in OVS 2.9 to ordering the Rxqs based on
their measured processing cycles. This was to assign the
busiest Rxqs to different PMDs, improving aggregate
throughput.
For the most part the new scheme should be better, but
there could be situations where a user prefers a simple
round-robin scheme because Rxqs from a single port are
more likely to be spread across multiple PMDs, and/or
traffic is very bursty/unpredictable.
Add 'pmd-rxq-assign' config to allow a user to select
round-robin based assignment.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
|
|
|
|
|
|
|
| |
CC: Eric Garver <e@erig.me>
Requested-by: Jerry Lilijun <jerry.lilijun@huawei.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Eric Garver <e@erig.me>
|
|
|
|
|
|
|
|
|
|
|
| |
The set of supported values specified.
Style fixed to look good in man page. Fixed indents.
CC: Paul Blakey <paulb@mellanox.com>
Fixes: 691d20cbdcf3 ("other-config: Add tc-policy switch to
control tc flower flag")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
|
|
|
|
|
|
|
|
|
|
| |
This adds available modes to the man page.
CC: Kevin Traynor <ktraynor@redhat.com>
Fixes: 6d947d508a51 ("vswitch.xml: Update dpdk-init documentation.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
|
|
|
|
|
|
|
|
|
|
| |
dpdk-init is now a string. Add description of 'true' and 'try'.
Fixes: 3e52fa5644cd ("dpdk: reflect status and version in the database")
Cc: aconole@redhat.com
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a signature match cache (SMC) after exact match
cache (EMC). The difference between SMC and EMC is SMC only stores
a signature of a flow thus it is much more memory efficient. With
same memory space, EMC can store 8k flows while SMC can store 1M
flows. It is generally beneficial to turn on SMC but turn off EMC
when traffic flow count is much larger than EMC size.
SMC cache will map a signature to an dp_netdev_flow index in
flow_table. Thus, we add two new APIs in cmap for lookup key by
index and lookup index by key.
For now, SMC is an experimental feature that it is turned off by
default. One can turn it on using ovsdb options.
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Co-authored-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Acked-by: Billy O'Mahony <billy.o.mahony@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When adding a route to a bridge, by executing "$appctl ovs/route/add
$IP/$MASK $BR", a reference to the existing netdev is taken and stored
in an instantiated ip_dev struct which is then stored in an addr_list
list in tnl-ports.c. When OvS is signaled to exit, as a result of a
"$appctl $OVS_PID exit --cleanup", for example, the bridge takes care of
destroying its allocated port and iface structs. While destroying and
freeing an iface, the netdev associated with it is also destroyed.
However, for this to happen its ref_cnt must be 0. Otherwise the
destructor of the netdev (specific to each datapath) won't be called. On
the userspace datapath this means a system interface, such as "br0",
wouldn't get deleted upon exit of OvS (when a route happens to be
assocaited).
This was first observed in the "ptap - triangle bridge setup with L2 and
L3 GRE tunnels" test, which runs as part of the system userspace
testsuite and uses the netdev datapath (as opoosed to several tests
which use the dummy datapath, where this issue isn't seen). The test
would pass every other time and fail the rest of the times because the
needed system interfaces (br-p1, br-p2 and br-p3) were already present
(from the previous successfull run which didn't clean up properly),
leading to a failure.
To fix the leak and clean up the interfaces upon exit, on its final
stage before destroying a netdev, in iface_destroy__(), the bridge calls
tnl_port_map_delete_ipdev() which takes care of freeing the instatiated
ip_dev structs that refer to a specific netdev.
An extra test is also introduced which verifies that the resources used
by OvS netdev datapath have been correctly cleaned up between
OVS_TRAFFIC_VSWITCHD_STOP and AT_CLEANUP.
Signed-off-by: Tiago Lam <tiago.lam@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
| |
Document the new fdb/stats-clear and fdb/stats-show commands
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|\ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This commit re-introduces the concept of shared mempools as the default
memory model for DPDK devices. Per port mempools are still available but
must be enabled explicitly by a user.
OVS previously used a shared mempool model for ports with the same MTU
and socket configuration. This was replaced by a per port mempool model
to address issues flagged by users such as:
https://mail.openvswitch.org/pipermail/ovs-discuss/2016-September/042560.html
However the per port model potentially requires an increase in memory
resource requirements to support the same number of ports and configuration
as the shared port model.
This is considered a blocking factor for current deployments of OVS
when upgrading to future OVS releases as a user may have to redimension
memory for the same deployment configuration. This may not be possible for
users.
This commit resolves the issue by re-introducing shared mempools as
the default memory behaviour in OVS DPDK but also refactors the memory
configuration code to allow for per port mempools.
This patch adds a new global config option, per-port-memory, that
controls the enablement of per port mempools for DPDK devices.
ovs-vsctl set Open_vSwitch . other_config:per-port-memory=true
This value defaults to false; to enable per port memory support,
this field should be set to true when setting other global parameters
on init (such as "dpdk-socket-mem", for example). Changing the value at
runtime is not supported, and requires restarting the vswitch
daemon.
The mempool sweep functionality is also replaced with the
sweep functionality from OVS 2.9 found in commits
c77f692 (netdev-dpdk: Free mempool only when no in-use mbufs.)
a7fb0a4 (netdev-dpdk: Add mempool reuse/free debug.)
A new document to discuss the specifics of the memory models and example
memory requirement calculations is also added.
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Tiago Lam <tiago.lam@intel.com>
Tested-by: Tiago Lam <tiago.lam@intel.com>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch is a simple implementation for the proposal discussed in
https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/337038.html and
https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/340013.html.
It enables ovs-vswitchd and other utilities to use DNS names when specifying
OpenFlow and OVSDB remotes.
Below are some of the features and limitations of this patch:
- Resolving is asynchornous in daemon context, avoiding blocking main loop;
- Resolving is synchronous in general utility context;
- Both IPv4 and IPv6 are supported;
- The resolving API is thread-safe;
- Depends on the unbound library;
- When multiple ip addresses are returned, only the first one is used;
- /etc/nsswitch.conf isn't respected as unbound library doesn't look at it;
- For async-resolving, caller need to retry later; there is no callback.
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In field deployments of OVS (mostly in combination with OpenStack) we
see that the 2K default MAC forwarding table is too small.
On average this tables is around 5k entries, hence this patch to
increase the default value to the next power of 2, i.e. 8K.
This increase in size does not automatically increase the memory
footprint, as the memory for the MAC entries, are allocated only when
needed.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|\ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When ping-pong'in a live VM migration between two machines running
OVS-DPDK every now and then the ping misses would increase
dramatically. For example:
Acked-by: Ilya Maximets <i.maximets@samsung.com>
===========Stream Rate: 3Mpps===========
No Stream_Rate Downtime Totaltime Ping_Loss Moongen_Loss
0 3Mpps 128 13974 115 7168374
1 3Mpps 145 13620 17 1169770
2 3Mpps 140 14499 116 7141175
3 3Mpps 142 13358 16 1150606
4 3Mpps 136 14004 16 1124020
5 3Mpps 139 15494 214 13170452
6 3Mpps 136 15610 217 13282413
7 3Mpps 146 13194 17 1167512
8 3Mpps 148 12871 16 1162655
9 3Mpps 137 15615 214 13170656
I identified this issue being introduced in OVS commit,
f3e7ec254738 ("Update relevant artifacts to add support for DPDK 17.05.1.")
and more specific due to DPDK commit,
af1475918124 ("vhost: introduce API to start a specific driver").
The combined changes no longer have OVS start the vhost socket polling
thread at startup, but DPDK will do it on its own when the first vhost
client is started.
Figuring out the reason why this happens kept me puzzled for quite some time...
What happens is that the callbacks called from the vhost thread are
calling ovsrcu_synchronize() as part of destroy_device(). This will
end-up calling seq_wait__().
By default, all created threads outside of OVS will get thread id 0,
which is equal to the main ovs thread. So for example in the
seq_wait__() function above if the main thread is waiting already we
won't add ourselves as a waiter.
The fix below assigns OVSTHREAD_ID_UNSET to none OVS created threads,
which will get updated to a valid ID on the first call to
ovsthread_id_self().
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Fixes: f3e7ec254738 ("Update relevant artifacts to add support for DPDK
17.05.1.")
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When "dpdk-socket-mem" and "dpdk-alloc-mem" are not specified,
"dpdk-socket-mem" will be set to allocate 1024MB on each NUMA node.
This change will prevent OVS from failing when NIC is attached on
NUMA node 1 and higher. Patch contains documentation update.
Signed-off-by: Marcin Rybka <marcinx.rybka@intel.com>
Co-authored-by: Billy O'Mahony <billy.o.mahony@intel.com>
Signed-off-by: Billy O'Mahony <billy.o.mahony@intel.com>
Tested-by: Hariprasad Govindharajan <hariprasad.govindharajan@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
|
|/
|
|
|
|
|
|
|
| |
It's always been OVS coding style to use spaces rather than tabs for
indentation, but some tabs have snuck in over time. This commit converts
them to spaces.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The normal way of retrieving the running DPDK status involves parsing
log files and issuing various incantations of ovs-vsctl and ovs-appctl
commands to determine whether the rte_eal_init successfully started.
This commit adds two new records to reflect the dpdk version, and
the dpdk initialization status.
To support this, the other_config:dpdk-init configuration block supports
the 'true' and 'try' keywords now, instead of just 'true'.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
ERSPAN is a tunneling protocol based on GRE tunnel. The patch
add erspan tunnel support for ovs-vswitchd with userspace datapath.
Configuring erspan tunnel is similar to gre tunnel, but with
additional erspan's parameters. Matching a flow on erspan's
metadata is also supported, see ovs-fields for more details.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|