| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
min-revalidate-pps.
Depending on the driver implementation, it can take from 0.2 seconds
up to 2 seconds before offloaded flow statistics are updated. This is
true for both TC and rte_flow-based offloading. This is causing a
problem with min-revalidate-pps, as old statistic values are used
during this period.
This fix will wait for at least 2 seconds, by default, before assuming no
packets where received during this period.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
| |
Add options to the IPFIX table configure the interval to send statistics
and template information.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
| |
Add support to count upcall packets per port, both succeed and failed,
which is a better way to see how many packets upcalled on each interface.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: wangchuanlei <wangchuanlei@inspur.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
100 Mbps was a fair assumption 13 years ago. Modern days 10 Gbps seems
like a good value in case no information is available otherwise.
The change mainly affects QoS which is currently limited to 100 Mbps if
the user didn't specify 'max-rate' and the card doesn't report the
speed or OVS doesn't have a predefined enumeration for the speed
reported by the NIC.
Calculation of the path cost for STP/RSTP is also affected if OVS is
unable to determine the link speed.
Lower link speed adapters are typically good at reporting their speed,
so chances for overshoot should be low. But newer high-speed adapters,
for which there is no speed enumeration or if there are some other
issues, will not suffer that much.
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
| |
The count of received multicast packets has been computed internally,
but not exposed to ovsdb. Fix this.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Michael Santana <msantana@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove the current xenserver implementation - it is obsolete and
since 3.0 we do not support kernel module builds [1].
1. https://mail.openvswitch.org/pipermail/ovs-dev/2022-July/395789.html
[i.maximets]
Can be added back if people willing to maintain it will be found.
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
| |
This config param allows the delivery of broadcast and multicast
packets to the secondary interface of non-lacp bonds, equivalent
to the option 'all_slaves_active' for Linux kernel bonds.
Reported-at: https://bugzilla.redhat.com/1720935
Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
implementations.
This commit introduces the initial infrastructure required to allow
different implementations for OvS actions. The patch introduces action
function pointers which allows user to switch between different action
implementations available. This will allow for more performance and flexibility
so the user can choose the action implementation to best suite their use case.
Signed-off-by: Emma Finn <emma.finn@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using SHORT version of the *_SAFE loops makes the code cleaner and less
error prone. So, use the SHORT version and remove the extra variable
when possible for hmap and all its derived types.
In order to be able to use both long and short versions without changing
the name of the macro for all the clients, overload the existing name
and select the appropriate version depending on the number of arguments.
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using the SHORT version of the *_SAFE loops makes the code cleaner
and less error-prone. So, use the SHORT version and remove the extra
variable when possible.
In order to be able to use both long and short versions without changing
the name of the macro for all the clients, overload the existing name
and select the appropriate version depending on the number of arguments.
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
netdev_set_dpif_type() can only be used with a normalized dpif type
as an argument, which is a constant static string derived from a type
of a dpif_class or a constant string "system". Usage of a same
constant string allows netdev-offload module to compare types by
simply comparing pointers.
OTOH, 'br->ofproto->type' is a dynamic string that:
a. Can be NULL.
b. Even if not NULL and equal, can be a different dynamically
allocated string.
Both these qualities breaks assumptions made by all other modules
related to HW offload, breaking the functionality.
Fix that by moving netdev_set_dpif_type() to dpif.c and calling with
a correct constant string as an argument.
The call moved from bridge.c to dpif.c, because we need to have access
to the dpif class, but bridge.c should not.
Not trying to set the dpif_type inside the netdev_ports_insert(),
because it's used now outside the offloading context. So, it's
cleaner to move the netdev_set_dpif_type() call outside of the
netdev-offload module.
Additionally removed the redundant call from the netdev_ports_insert()
and refactored the function, since it doesn't need an extra argument
anymore.
Fixes: 4f19a78a61c5 ("netdev-vport: Fix userspace tunnel ioctl(SIOCGIFINDEX) info logs.")
Reported-by: Roi Dayan <roid@nvidia.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-December/390117.html
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Lin Huang <linhuang@ruijie.com.cn>
Acked-by: Roi Dayan <roid@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Userspace tunnel doesn't have a valid device in the kernel. So
get_ifindex() function (ioctl) always get error during
adding a port, deleting a port or updating a port status.
The info log is
"2021-08-29T09:17:39.830Z|00059|netdev_linux|INFO|ioctl(SIOCGIFINDEX)
on vxlan_sys_4789 device failed: No such device"
If there are a lot of userspace tunnel ports on a bridge, the
iface_refresh_netdev_status() function will spend a lot of time.
So ignore userspace tunnel port ioctl(SIOCGIFINDEX) operation, just
return -ENODEV.
Signed-off-by: Lin Huang <linhuang@ruijie.com.cn>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added new function to return memory usage statistics for database
objects inside IDL. Statistics similar to what ovsdb-server reports.
Not counting _Server database as it should be small, hence doesn't
worth adding extra code to the ovsdb-cs module. Can be added later
if needed.
ovs-vswitchd is a user in OVS, but this API will be mostly useful for
OVN daemons.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Current algorithm of ovsdb_datum_union looks like this:
for-each atom in b:
if not bin_search(a, atom):
push(a, clone(atom))
quicksort(a)
So, the complexity looks like this:
Nb * log2(Na) + Nb + (Na + Nb) * log2(Na + Nb)
Comparisons clones Comparisons for quicksort
for search
ovsdb_datum_union() is heavily used in database transactions while
new element is added to a set. For example, if new logical switch
port is added to a logical switch in OVN. This is a very common
use case where CMS adds one new port to an existing switch that
already has, let's say, 100 ports. For this case ovsdb-server will
have to perform:
1 * log2(100) + 1 clone + 101 * log2(101)
Comparisons Comparisons for
for search quicksort.
~7 1 ~707
Roughly 714 comparisons of atoms and 1 clone.
Since binary search can give us position, where new atom should go
(it's the 'low' index after the search completion) for free, the
logic can be re-worked like this:
copied = 0
for-each atom in b:
desired_position = bin_search(a, atom)
push(result, a[ copied : desired_position - 1 ])
copied = desired_position
push(result, clone(atom))
push(result, a[ copied : Na ])
swap(a, result)
Complexity of this schema:
Nb * log2(Na) + Nb + Na
Comparisons clones memory copy on push
for search
'swap' is just a swap of a few pointers. 'push' is not a 'clone',
but a simple memory copy of 'union ovsdb_atom'.
In general, this schema substitutes complexity of a quicksort
with complexity of a memory copy of Na atom structures, where we're
not even copying strings that these atoms are pointing to.
Complexity in the example above goes down from 714 comparisons
to 7 comparisons and memcpy of 100 * sizeof (union ovsdb_atom) bytes.
General complexity of a memory copy should always be lower than
complexity of a quicksort, especially because these copies usually
performed in bulk, so this new schema should work faster for any input.
All in all, this change allows to execute several times more
transactions per second for transactions that adds new entries to sets.
Alternatively, union can be implemented as a linear merge of two
sorted arrays, but this will result in O(Na) comparisons, which
is more than Nb * log2(Na) in common case, since Na is usually
far bigger than Nb. Linear merge will also mean per-atom memory
copies instead of copying in bulk.
'replace' functionality of ovsdb_datum_union() had no users, so it
just removed. But it can easily be added back if needed in the future.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
The vswitchd database schema requires role names to be "master" or
"slave", but this code tried to use "primary" and "secondary".
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reported-at: https://github.com/openvswitch/ovs-issues/issues/218
Tested-at: https://github.com/openvswitch/ovs-issues/issues/218#issuecomment-875374045
Fixes: 807152a4ddfb ("Use primary/secondary, not master/slave, as names for OpenFlow roles.")
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the function ofproto_set_flow_limit() was not checking
'limit' value. It maybe negative, which will be lead to a big
unsigned value. The 'limit' should never be negative so it's better
to just use smap_get_uint() to get it right.
And fix ofproto_set_max_idle(), ofproto_set_min_revalidate_pps(),
ofproto_set_max_revalidator() and ofproto_set_bundle_idle_timeout()
together.
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The documentation for inactivity_probe says this:
inactivity_probe: optional integer
Maximum number of milliseconds of idle time on connec‐
tion to controller before sending an inactivity probe
message. If Open vSwitch does not communicate with the
controller for the specified number of seconds, it will
send a probe. If a response is not received for the same
additional amount of time, Open vSwitch assumes the con‐
nection has been broken and attempts to reconnect. De‐
fault is implementation-specific. A value of 0 disables
inactivity probes.
This means that a value of 0 should disable inactivity probes and any
other value should be in milliseconds. The code in bridge.c was
actually interpreting it as any value between 0 and 999 disabling
inactivity probes. That was surprising when I accidentally configured
it to 5 or to 10, not remembering that it was in milliseconds, and
disabled them entirely. This fixes the problem.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
OVS has support for using policing to enforce a rate limit in
kilobits per second. This is configured using OVSDB. f.e.
$ ovs-vsctl set interface tap0 ingress_policing_rate=1000
$ ovs-vsctl set interface tap0 ingress_policing_burst=100
This patch adds a related feature, allowing policing to enforce a rate
limit in kilo-packets per second. This is also configured using OVSDB.
$ ovs-vsctl set interface tap0 ingress_policing_kpkts_rate=1000
$ ovs-vsctl set interface tap0 ingress_policing_kpkts_burst=100
The kilo-bit and kilo-packet rate limits may be used separately or in
combination.
Add separate action for BPS and PPS in netlink message.
Revise code and change action result to pipe to allow
traffic pipe into second action.
This patch implements the feature for:
* OVSDB (northbound API)
* TC policer when used both with and without TC offload (kernel API)
Signed-off-by: Yong Xu <yong.xu@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new term is "member".
Most of these changes should not change user-visible behavior. One
place where they do is in "ovs-ofctl dump-flows", which will now output
"members:..." inside "bundle" actions instead of "slaves:...". I don't
expect this to cause real problems in most systems. The old syntax
is still supported on input for backward compatibility.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
's->lacp_fallback_ab_cfg' initialized down below in the code, so
we're using it uninitialized to detect if we need to get 'bond-primary'
configuration.
Found by valgrind:
Conditional jump or move depends on uninitialised value(s)
at 0x409114: port_configure_bond (bridge.c:4569)
by 0x409114: port_configure (bridge.c:1284)
by 0x40F6E6: bridge_reconfigure (bridge.c:917)
by 0x411425: bridge_run (bridge.c:3330)
by 0x406D84: main (ovs-vswitchd.c:127)
Uninitialised value was created by a stack allocation
at 0x408C53: port_configure (bridge.c:1190)
Fix that by moving this code to the point where 'lacp_fallback_ab_cfg'
already initialized. Additionally clarified behavior of 'bond-primary'
in manpages for the fallback to AB case.
Fixes: b4e50218a0f8 ("bond: Add 'primary' interface concept for active-backup mode.")
Acked-by: Jeff Squyres <jsquyres@cisco.com>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
| |
There is one remaining use under datapath. That change should happen
upstream in Linux first according to our usual policy.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
|
|
|
|
|
| |
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In AB bonding, if the current active slave becomes disabled, a
replacement slave is arbitrarily picked from the remaining set of
enabled slaves. This commit adds the concept of a "primary" slave: an
interface that will always be (or become) the current active slave if
it is enabled.
The rationale for this functionality is to allow the designation of a
preferred interface for a given bond. For example:
1. Bond is created with interfaces p1 (primary) and p2, both enabled.
2. p1 becomes the current active slave (because it was designated as
the primary).
3. Later, p1 fails/becomes disabled.
4. p2 is chosen to become the current active slave.
5. Later, p1 becomes re-enabled.
6. p1 is chosen to become the current active slave (because it was
designated as the primary)
Note that p1 becomes the active slave once it becomes re-enabled, even
if nothing has happened to p2.
This "primary" concept exists in Linux kernel network interface
bonding, but did not previously exist in OVS bonding.
Only one primary slave interface is supported per bond, and is only
supported for active/backup bonding.
The primary slave interface is designated via
"other_config:bond-primary" when creating a bond.
Also, while adding tests for the "primary" concept, make a few small
improvements to the non-primary AB bonding test.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Accoridng to vswitch.ovsschema, each CT_Zone record may have
zero or one associcated CT_Timeout_policy. Thus, this patch
checks if ovsrec_ct_timeout_policy exist before accesses the
record.
VMWare-BZ: 2585825
Fixes: 45339539f69d ("ovs-vsctl: Add conntrack zone commands.")
Fixes: 993cae678bca ("ofproto-dpif: Consume CT_Zone, and CT_Timeout_Policy tables")
Reported-by: Yang Song <yangsong@vmware.com>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
In OVS, flows with output over a bond interface of type “balance-tcp”
gets translated by the ofproto layer into "HASH" and "RECIRC" datapath
actions. After recirculation, the packet is forwarded to the bond
member port based on 8-bits of the datapath hash value computed through
dp_hash. This causes performance degradation in the following ways:
1. The recirculation of the packet implies another lookup of the
packet’s flow key in the exact match cache (EMC) and potentially
Megaflow classifier (DPCLS). This is the biggest cost factor.
2. The recirculated packets have a new “RSS” hash and compete with the
original packets for the scarce number of EMC slots. This implies more
EMC misses and potentially EMC thrashing causing costly DPCLS lookups.
3. The 256 extra megaflow entries per bond for dp_hash bond selection
put additional load on the revalidation threads.
Owing to this performance degradation, deployments stick to “balance-slb”
bond mode even though it does not do active-active load balancing for
VXLAN- and GRE-tunnelled traffic because all tunnel packet have the
same source MAC address.
Proposed optimization:
This proposal introduces a new load-balancing output action instead of
recirculation.
Maintain one table per-bond (could just be an array of uint16's) and
program it the same way internal flows are created today for each
possible hash value (256 entries) from ofproto layer. Use this table to
load-balance flows as part of output action processing.
Currently xlate_normal() -> output_normal() ->
bond_update_post_recirc_rules() -> bond_may_recirc() and
compose_output_action__() generate 'dp_hash(hash_l4(0))' and
'recirc(<RecircID>)' actions. In this case the RecircID identifies the
bond. For the recirculated packets the ofproto layer installs megaflow
entries that match on RecircID and masked dp_hash and send them to the
corresponding output port.
Instead, we will now generate action as
'lb_output(<bond id>)'
This combines hash computation (only if needed, else re-use RSS hash)
and inline load-balancing over the bond. This action is used *only* for
balance-tcp bonds in userspace datapath (the OVS kernel datapath
remains unchanged).
Example:
Current scheme:
With 8 UDP flows (with random UDP src port):
flow-dump from pmd on cpu core: 2
recirc_id(0),in_port(7),<...> actions:hash(hash_l4(0)),recirc(0x1)
recirc_id(0x1),dp_hash(0xf8e02b7e/0xff),<...> actions:2
recirc_id(0x1),dp_hash(0xb236c260/0xff),<...> actions:1
recirc_id(0x1),dp_hash(0x7d89eb18/0xff),<...> actions:1
recirc_id(0x1),dp_hash(0xa78d75df/0xff),<...> actions:2
recirc_id(0x1),dp_hash(0xb58d846f/0xff),<...> actions:2
recirc_id(0x1),dp_hash(0x24534406/0xff),<...> actions:1
recirc_id(0x1),dp_hash(0x3cf32550/0xff),<...> actions:1
New scheme:
We can do with a single flow entry (for any number of new flows):
in_port(7),<...> actions:lb_output(1)
A new CLI has been added to dump datapath bond cache as given below.
# ovs-appctl dpif-netdev/bond-show [dp]
Bond cache:
bond-id 1 :
bucket 0 - slave 2
bucket 1 - slave 1
bucket 2 - slave 2
bucket 3 - slave 1
Co-authored-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com>
Signed-off-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com>
Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>
Tested-by: Matteo Croce <mcroce@redhat.com>
Tested-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
| |
Signed-off-by: Kirill A. Kornilov <kornilov@zelax.ru>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Abbreviated as TSO, TCP Segmentation Offload is a feature which enables
the network stack to delegate the TCP segmentation to the NIC reducing
the per packet CPU overhead.
A guest using vhostuser interface with TSO enabled can send TCP packets
much bigger than the MTU, which saves CPU cycles normally used to break
the packets down to MTU size and to calculate checksums.
It also saves CPU cycles used to parse multiple packets/headers during
the packet processing inside virtual switch.
If the destination of the packet is another guest in the same host, then
the same big packet can be sent through a vhostuser interface skipping
the segmentation completely. However, if the destination is not local,
the NIC hardware is instructed to do the TCP segmentation and checksum
calculation.
It is recommended to check if NIC hardware supports TSO before enabling
the feature, which is off by default. For additional information please
check the tso.rst document.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Tested-by: Ciara Loftus <ciara.loftus.intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
|
|
|
|
|
|
|
|
|
|
| |
Split the update of rstp_statistics column and rstp_status column in
Port table into two different functions. This helps in controlling the
number of times the rstp_statistics column is updated with the key
"stats-update_interval" in Open_vSwitch table.
Signed-off-by: Krishna Kolakaluri <kkolakaluri@plume.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Sometimes interface updates could happen in a way ifnotifier is not
able to catch. For example some heavy operations (device reset) in
netdev-dpdk could require re-applying of the bridge configuration.
For this purpose new manual notifier introduced. Its function
'if_notifier_manual_report()' could be called directly by the code
that aware about changes. This new notifier is thread-safe.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add argument '--offload-stats' for command ovs-appctl bridge/dump-flows
to display the offloaded packets statistics.
The commands display as below:
orignal command:
ovs-appctl bridge/dump-flows br0
duration=574s, n_packets=1152, n_bytes=110768, priority=0,actions=NORMAL
table_id=254, duration=574s, n_packets=0, n_bytes=0, priority=2,recirc_id=0,actions=drop
table_id=254, duration=574s, n_packets=0, n_bytes=0, priority=0,reg0=0x1,actions=controller(reason=)
table_id=254, duration=574s, n_packets=0, n_bytes=0, priority=0,reg0=0x2,actions=drop
table_id=254, duration=574s, n_packets=0, n_bytes=0, priority=0,reg0=0x3,actions=drop
new command with argument '--offload-stats'
Notice: 'n_offload_packets' are a subset of n_packets and 'n_offload_bytes' are
a subset of n_bytes.
ovs-appctl bridge/dump-flows --offload-stats br0
duration=582s, n_packets=1152, n_bytes=110768, n_offload_packets=1107, n_offload_bytes=107992, priority=0,actions=NORMAL
table_id=254, duration=582s, n_packets=0, n_bytes=0, n_offload_packets=0, n_offload_bytes=0, priority=2,recirc_id=0,actions=drop
table_id=254, duration=582s, n_packets=0, n_bytes=0, n_offload_packets=0, n_offload_bytes=0, priority=0,reg0=0x1,actions=controller(reason=)
table_id=254, duration=582s, n_packets=0, n_bytes=0, n_offload_packets=0, n_offload_bytes=0, priority=0,reg0=0x2,actions=drop
table_id=254, duration=582s, n_packets=0, n_bytes=0, n_offload_packets=0, n_offload_bytes=0, priority=0,reg0=0x3,actions=drop
Signed-off-by: zhaozhanxu <zhaozhanxu@163.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The patch adds support for fetching the datapath's capabilities
from the result of 'check_support()', and write the supported capability
to a new database column, called 'capabilities' under Datapath table.
To see how it works, run:
# ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
# ovs-vsctl -- --id=@m create Datapath datapath_version=0 \
'ct_zones={}' 'capabilities={}' \
-- set Open_vSwitch . datapaths:"netdev"=@m
# ovs-vsctl list-dp-cap netdev
ufid=true sample_nesting=true clone=true tnl_push_pop=true \
ct_orig_tuple=true ct_eventmask=true ct_state=true \
ct_clear=true max_vlan_headers=1 recirc=true ct_label=true \
max_hash_alg=1 ct_state_nat=true ct_timeout=true \
ct_mark=true ct_orig_tuple6=true check_pkt_len=true \
masked_set_action=true max_mpls_depth=3 trunc=true ct_zone=true
Signed-off-by: William Tu <u9012063@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
---
v5:
Add improved documentation from Ben and
fix checkpatch error (tab and line 79 char)
v4:
rebase to master
v3:
fix 32-bit build, reported by Greg
travis: https://travis-ci.org/williamtu/ovs-travis/builds/599276267
v2:
rebase to master
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch consumes the CT_Zone and CT_Timeout_Policy tables, maintains
the zone-based configuration in the vswitchd. Whenever there is a
database change, vswitchd will read the datapath, CT_Zone, and
CT_Timeout_Policy tables from ovsdb, builds an internal snapshot of the
database configuration in bridge.c, and pushes down the change into
ofproto and dpif layer.
If a new zone-based timeout policy is added, it updates the zone to
timeout policy mapping in the per datapath type datapath structure in
dpif-backer, and pushes down the timeout policy into the datapath via
dpif interface.
If a timeout policy is no longer used, for kernel datapath, vswitchd
may not be able to remove it from datapath immediately since
datapath flows can still reference the to-be-deleted timeout policies.
Thus, we keep an timeout policy kill list, that vswitchd will go
back to the list periodically and try to kill the unused timeout policies.
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Hi all:
Currently OVS maintains several Statistics counters per interface. "rx_missed_errors" counter is amount them and collects pkts not received due to local resource constaints. Many ovs netdevs support collecting this counter, such as netdev-linux, netdev-dpdk, netdev-bsd and so on. But as far as I know, this counter can't be read by command "ovs-vsctl list interface <int-name>|grep statistics". I have found the root cause(may be I was wrong) is in task "iface_refresh_stats", the "rx_missed_errors" is not in the macro IFACE_STATS. So even if this counter is updated by netdev, it woundn't be read by users.
This simple patch tries to solve this problem, many thanks for your kindly reminder.
Signed-off-by: Liu Chang <liuchang@cmss.chinamobile.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The ofconn packet-in queue for packets that can't be immediately sent
on the rconn connection was limited to 100 packets (hardcoded value).
While increasing this limit is usually not recommended as it might
create buffer bloat and increase latency, in scaled scenarios it is
useful if the administrator (or CMS) can adjust the queue size.
One such situation was noticed while performing scale testing of the
OVN IGMP functionality: triggering ~200 simultaneous IGMP reports
was causing tail drops on the packet-in queue towards ovn-controller.
This commit adds the possibility to configure the queue size for:
- management controller (br-int.mgmt): through the
other_config:controller-queue-size column of the Bridge table. This
value is limited to 512 as large queues definitely affect latency. If
not present the default value of 100 is used. This is done in order to
maintain the same default behavior as before the commit.
- other controllers: through the controller_queue_size column of the
Controller table. This value is also limited to 512. If not present
the code uses the Bridge:other_config:controller-queue-size
configuration.
Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's possible that a port added to the system with certain kinds
of invalid parameters will cause the 'could not add' log to be
triggered. When this happens, the vswitch run loop can continually
re-attempt adding the port. While the parameters remain invalid
the vswitch run loop will re-trigger the warning, flooding the
syslog.
This patch adds a simple rate limit to the log.
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a new configuration option, "min-revalidate-pps" to the
Open_vSwitch "other-config" column. This sets minimum pps that flow must
have in order to be revalidated when revalidation duration exceeds half of
max-revalidator config variable.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a new configuration option, "max-revalidator" to the
Open_vSwitch "other-config" column. This sets maximum allowed ravalidator
timeout. Actual timeout value is determined at runtime as minimum of
"max-idle" and "max-revalidator".
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
New module 'netdev-offload' created to manage different flow API
implementations. All the generic and provider independent code moved
there from the 'netdev' module.
Flow API providers further encapsulated.
The only function that was changed is 'netdev_any_oor'.
Now it uses offloading related hmap instead of common 'netdev_shash'.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Roi Dayan <roid@mellanox.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Configure "if-nonzero" priority tags to retain the 802.1Q header
when the VLAN ID is zero, except both the VLAN ID and priority are zero.
Add a "always" configuration option to retain the 802.1Q header in such
frames as well.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Priority tags is a port configuration to determine how the port treats
priority tags, e.g. zero VLAN ID. Change the type from boolean to enum
as a pre-step towards introducing additional modes. The new options are
"never", equivalent to previously "false", and "if-nonzero",
equivalent to previously "true". "true" is still supported for backwards
compatibility.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Virtual ports like 'patch' ports that almost fully implemented on
'ofproto' layer could have internal to 'ofproto' statuses that
could not be retrieved from 'netdev' or other layers. For example,
in current implementation there is no way to get the patch port
pairing status (i.e. if it has usable peer?).
New 'ofproto-provider' API function 'vport_get_status' introduced to
cover this gap. It allowes 'bridge' layer to retrive current status
of ofproto virtual ports and propagate it to DB.
For now we're only interested in pairing errors of 'patch' ports.
That are propagated to the 'error' column of the 'Interface' table.
Ex.:
$ ovs-vsctl show
...
Bridge "br1"
...
Port "patch1"
Interface "patch1"
type: patch
options: {peer="patch0"}
error: "No usable peer 'patch0' exists in 'system' datapath."
Bridge "br0"
datapath_type: netdev
...
Port "patch0"
Interface "patch0"
type: patch
options: {peer="patch1"}
error: "No usable peer 'patch1' exists in 'netdev' datapath."
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Normally it makes sense for an active connection to be primary and a
passive connection to be a service connection, but I've run into a corner
case where it is better for a passive connection to be a primary
connection. This specific case is for use with OFtest, which expects to be
a primary controller. However, it also wants to reconnect frequently,
which is slow for active connections because of the backoff; by
configuring a passive, primary controller, OFtest can reconnect as
frequently and as quickly as it wants, making the overall test much faster.
Acked-by: Justin Pettit <jpettit@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Until now, connmgr has handled active and passive OpenFlow connections in
quite different ways. Any active connection, whether it was currently
connected or not, was always maintained as an ofconn. Whenever such a
connection (re)connected, its settings were cleared. On the other hand,
passive connections had a separate listener which created an ofconn when
a new connection came in, and these ofconns would be deleted when such a
connection was closed. This approach is inelegant and has occasionally
led to bugs when reconnection didn't clear all of the state that it
should have.
There's another motivation here. Currently, active connections are
always primary controllers and passive connections are always service
controllers (as documented in ovs-vswitchd.conf.db(5)). Sometimes it would
be useful to have passive primary controllers (maybe active service
controllers too but I haven't personally run into that use case). As is,
this is difficult to implement because there is so much different code in
use between active and passive connections. This commit will make it
easier.
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Using an shash instead of an array simplifies the code for both the caller
and the callee. Putting the set of allowed OpenFlow versions into the
ofproto_controller data structure also simplifies the overall function
interface slightly.
Tested-by: Yifeng Sun <pkusunyifeng@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When force-reload-kmod is used, it shows an error when reinstalling
tlvs during "Restoring saved flows" step:
OFPT_ERROR (xid=0x4): NXTTMFC_ALREADY_MAPPED
This is caused by a race condition between the restore script,
which calls ofctl, and the connected controllers both adding back
the same TLVs.
The restore script already sets flow-restore-wait to true while
doing flow restoration, and sets it back to false after it is
done, and this patch utilizes that fact to prevent the TLV race.
It does this by preventing vswitchd from connecting to
controllers in the controller table while it is in a
flow-restore-wait state.
With this patch, when bridge_configure_remotes() calls
bridge_get_controllers(), it first checks if flow-restore-wait
has been set, and if so, it ignores any controllers in the
controller database and sets n_controllers to 0.
This solution does preserve the management service controller
which is added via bridge_ofproto_controller_for_mgmt() after
checking whether we should call bridge_get_controllers()
(and thus n_controllers is properly set to 1, etc)
VMware-BZ: 2195377
Signed-off-by: Zak Whittington <zwhitt.vmware@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 6b4d0211e84a ("bridge: Fix ovs-appctl qos/show
repeated queue information"), which is no longer necessary now that
commit 65f3c34c7417 ("netdev: Properly clear 'details' when iterating
in NETDEV_QOS_FOR_EACH.") has been applied. The former commit fixed
a symptom of the root cause fixed by the latter.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The patch below would stop qos/show to repeat information from the previous queues.
See below an example before and after the fix:
Before:
$ ovs-appctl qos/show p5p2
QoS: p5p2 linux-htb
max-rate: 2428800
Default:
burst: 12512
min-rate: 12000
max-rate: 2428800
tx_packets: 0
tx_bytes: 0
tx_errors: 0
Queue 20:
burst: 12512
burst: 12512
min-rate: 12000
min-rate: 12000
max-rate: 607200
max-rate: 2428800
tx_packets: 28780
tx_bytes: 43572920
tx_errors: 17611
Queue 10:
burst: 12512
burst: 12512
burst: 12512
max-rate: 2428800
max-rate: 607200
max-rate: 2428800
min-rate: 12000
min-rate: 12000
min-rate: 12000
tx_packets: 71751
tx_bytes: 108631014
tx_errors: 18503
After:
$ ovs-appctl qos/show p5p2
QoS: p5p2 linux-htb
max-rate: 2428800
Default:
burst: 12512
min-rate: 12000
max-rate: 2428800
tx_packets: 0
tx_bytes: 0
tx_errors: 0
Queue 20:
burst: 12512
min-rate: 12000
max-rate: 607200
tx_packets: 28780
tx_bytes: 43572920
tx_errors: 17611
Queue 10:
burst: 12512
min-rate: 12000
max-rate: 2428800
tx_packets: 71751
tx_bytes: 108631014
tx_errors: 18503
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When adding a route to a bridge, by executing "$appctl ovs/route/add
$IP/$MASK $BR", a reference to the existing netdev is taken and stored
in an instantiated ip_dev struct which is then stored in an addr_list
list in tnl-ports.c. When OvS is signaled to exit, as a result of a
"$appctl $OVS_PID exit --cleanup", for example, the bridge takes care of
destroying its allocated port and iface structs. While destroying and
freeing an iface, the netdev associated with it is also destroyed.
However, for this to happen its ref_cnt must be 0. Otherwise the
destructor of the netdev (specific to each datapath) won't be called. On
the userspace datapath this means a system interface, such as "br0",
wouldn't get deleted upon exit of OvS (when a route happens to be
assocaited).
This was first observed in the "ptap - triangle bridge setup with L2 and
L3 GRE tunnels" test, which runs as part of the system userspace
testsuite and uses the netdev datapath (as opoosed to several tests
which use the dummy datapath, where this issue isn't seen). The test
would pass every other time and fail the rest of the times because the
needed system interfaces (br-p1, br-p2 and br-p3) were already present
(from the previous successfull run which didn't clean up properly),
leading to a failure.
To fix the leak and clean up the interfaces upon exit, on its final
stage before destroying a netdev, in iface_destroy__(), the bridge calls
tnl_port_map_delete_ipdev() which takes care of freeing the instatiated
ip_dev structs that refer to a specific netdev.
An extra test is also introduced which verifies that the resources used
by OvS netdev datapath have been correctly cleaned up between
OVS_TRAFFIC_VSWITCHD_STOP and AT_CLEANUP.
Signed-off-by: Tiago Lam <tiago.lam@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
| |
It's always been OVS coding style to use spaces rather than tabs for
indentation, but some tabs have snuck in over time. This commit converts
them to spaces.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The normal way of retrieving the running DPDK status involves parsing
log files and issuing various incantations of ovs-vsctl and ovs-appctl
commands to determine whether the rte_eal_init successfully started.
This commit adds two new records to reflect the dpdk version, and
the dpdk initialization status.
To support this, the other_config:dpdk-init configuration block supports
the 'true' and 'try' keywords now, instead of just 'true'.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
|