| Commit message (Collapse) | Author | Age | Files | Lines |
| |
By default Open vSwitch tries to configure the MTU of internal interfaces
to match the bridge minimum, overriding any attempt by the user to
configure it through standard system tools or the database.
While this works in many simple cases (there are probably many users
that rely on this) it may create problems for more advanced use cases
(like any overlay networks).
This commit allows the user to override the default behavior by
providing an explicit MTU in the mtu_request column in the Interface
table.
This means that Open vSwitch will now treat MTU requests from the
database differently from MTU requests made with standard system tools
(`ip link` or `ifconfig`), but this seems the best way to remain
compatible with old users while providing a more powerful interface.
Suggested-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Tested-by: Joe Stringer <joe@ovn.org>
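The precedence described in this message can be sketched as follows. This is a minimal illustration in Python, not the ovs-vswitchd code; the function name `effective_mtu` and its parameters are hypothetical.

```python
from typing import Optional, Sequence


def effective_mtu(mtu_request: Optional[int],
                  bridge_port_mtus: Sequence[int]) -> int:
    """Return the MTU an internal interface would end up with.

    Hypothetical helper: an explicit mtu_request from the database wins;
    otherwise the interface follows the smallest MTU among the bridge's
    ports (the "bridge minimum" default).
    """
    if mtu_request is not None:
        return mtu_request
    return min(bridge_port_mtus)


# Default behavior: follow the bridge minimum.
default_mtu = effective_mtu(None, [1500, 9000])
# Overlay use case: the user pins a smaller MTU through the database.
pinned_mtu = effective_mtu(1450, [1500, 9000])
```

In this sketch a value set with `ip link` is not an input at all, which mirrors the message's point that only the database request overrides the default.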
| |
struct ofpact_learn_spec is variable-length. The 'n_specs' member of
struct ofpact_learn counted the number of specs, but the iteration loops
over struct ofpact_learn_spec only iterated as far as the *minimum* length
of 'n_specs' specs.
This fixes the problem, which exhibited as consistent failures for test 431
(learning action - TCPv6 port learning), seemingly only on i386 since it
shows up for my personal development machine but appears to not happen for
anyone else.
Fixes: dfe191d5faa6 ("ofp-actions: Waste less memory in learn actions.")
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Jarno Rajahalme <jarno@ovn.org>
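The iteration problem can be illustrated with a small Python sketch. The byte layout below (a 3-byte header followed by optional immediate data) is invented for the example and is not the real struct ofpact_learn_spec layout; all function names are hypothetical.

```python
import struct

# Hypothetical layout: n_bits (uint16), has_imm (uint8), then immediate data.
HEADER = struct.Struct("<HB")


def encode_spec(n_bits, imm=None):
    """Encode one variable-length spec; immediate data is sized by n_bits."""
    if imm is None:
        return HEADER.pack(n_bits, 0)
    return HEADER.pack(n_bits, 1) + imm.to_bytes((n_bits + 7) // 8, "big")


def spec_len(buf, ofs):
    """Actual length of the spec starting at offset ofs."""
    n_bits, has_imm = HEADER.unpack_from(buf, ofs)
    return HEADER.size + ((n_bits + 7) // 8 if has_imm else 0)


def iter_specs_buggy(buf, n_specs):
    """Stops at the *minimum* total length of n_specs specs."""
    end = n_specs * HEADER.size
    offsets, ofs = [], 0
    while ofs < end:
        offsets.append(ofs)
        ofs += spec_len(buf, ofs)
    return offsets


def iter_specs_fixed(buf, n_specs):
    """Visits exactly n_specs specs, advancing by each spec's real length."""
    offsets, ofs = [], 0
    for _ in range(n_specs):
        offsets.append(ofs)
        ofs += spec_len(buf, ofs)
    return offsets


buf = encode_spec(16, 0x1234) + encode_spec(32, 0xDEADBEEF) + encode_spec(16)
buggy = iter_specs_buggy(buf, 3)   # misses the last spec
fixed = iter_specs_fixed(buf, 3)   # sees all three
```

Counting specs is only one way to bound the loop; bounding it by the real end of the action data would work equally well.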
| |
This patch changes the order of the steps that are followed
every second in the sFlow agent. By moving the receiver_tick()
step to the end, we ensure that any counters that were polled
during the poller_tick() step are flushed immediately to the
sFlow collector. This eliminates what was a variable time-delay
between counters being polled and being flushed.
The variable time-delay that this eliminates could be up to
a second because counters lingering in the output buffer could be
flushed at any time by the arrival of random packet-samples.
Since the sFlow standard does not require that a poll-timestamp be sent
along with the counters, the collector must use its receive-time as the
timestamp, so that extra second of variable delay was "stretching or
shrinking" the time between successive counter readings. This
affected any counter-rate calculation that was based only on the delta
between successive samples. The effect was small with a polling
interval of 60 seconds: just +/- 2%. But the effect grew larger
when faster polling was configured. For example, if the counters
were pushed every 5 seconds then the instantaneous rate
calculations could wander by +/- 20%. For a thorough analysis
of this problem, see Rick Jones' paper:
"High Frequency sFlow v5 Counter Sampling"
ftp://ftp.netperf.org/papers/high_freq_sflow/hf_sflow_counters.pdf
So this patch makes it possible to obtain usable results even
when high-frequency polling is configured.
Signed-off-by: Neil McKee <neil.mckee@inmon.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
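The percentages quoted above follow from a first-order estimate: up to one second of uncertainty divided by the polling interval. A short Python sketch of that arithmetic (the function name is hypothetical):

```python
def worst_case_rate_error(polling_interval_s: float,
                          jitter_s: float = 1.0) -> float:
    """Fraction by which a computed rate can be off when the collector
    misjudges the interval between two counter readings by jitter_s."""
    return jitter_s / polling_interval_s


for interval in (60, 5):
    print(f"{interval:>2}s polling: +/- {worst_case_rate_error(interval):.1%}")
```

For a 60 second interval this gives roughly 1.7%, which the message rounds to 2%, and for a 5 second interval it gives 20%.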
| |
Avoid using nested zero-sized arrays to allow compilation with MSVC.
Also, make sure the immediate data is accessed only if it exists, and
that the size is always calculated from struct learn_spec field
'n_bits'.
Fixes: dfe191d5faa6 ("ofp-actions: Waste less memory in learn actions.")
Reported-by: Alin Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
| |
Change the value and mask to be added to the end of the set field
action without any extra bytes, except for the usual ofp-actions
padding to 8 bytes. Together with some structure member packing this
saves on average about 256 bytes for each set field and load action
(as set field internal representation is also used for load actions).
On a specific production data set each flow entry uses on average
about 4.2 load or set field actions. This means that with this patch
an average of more than 1kb can be saved for each flow with such a
flow table.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
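The per-flow figure is simply the product of the two averages quoted in the message. As a quick check in Python (the constant names are illustrative):

```python
BYTES_SAVED_PER_ACTION = 256   # average saving per set field or load action
ACTIONS_PER_FLOW = 4.2         # average from the production data set

bytes_saved_per_flow = BYTES_SAVED_PER_ACTION * ACTIONS_PER_FLOW
print(f"about {bytes_saved_per_flow:.0f} bytes saved per flow")
```

The result is a little over 1024 bytes, which is where "more than 1kb" comes from.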
| |
Make the immediate data member 'src_imm' of a learn spec allocated at
the end of the action for just the right size. This, together with
some structure packing, saves an average of ~128 bytes for each learn
spec in each learn action. Typical learn actions have about 4 specs
each, so this amounts to saving about 0.5kb for each learn action.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
| |
Better not to access *_collection_stub() directly, as it is an internal
implementation detail.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
| |
The OVS implementation of buffering packets that are sent to the
controller is not compliant with the OpenFlow specifications after
OpenFlow 1.0, which is possibly true since OpenFlow 1.0 does not really
specify the packet buffering behavior.
The OVS implementation executes the buffered packet against the actions
of the modified or added rule, whereas OpenFlow (since 1.1) specifies
that the packet should be matched against flow table 0 and
processed accordingly.
Rather than fix this behavior, and potentially break OVS users, the
packet buffering feature is removed altogether. After all, such
packet buffering is an optional OpenFlow feature, and as such any
possible users should continue to work without this feature.
This patch also makes OVS check the received 'buffer_id' values more
rigorously, and fixes some internal users accordingly.
Found by inspection.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
| |
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
| |
Disconnect named pipes that failed connection.
Found by testing.
Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Signed-off-by: Gurucharan Shetty <guru@ovn.org>
| |
Some tunnel flags are purely internal implementation details (primarily
FLOW_TNL_F_UDPIF). These shouldn't be output when we format tunnel
flows, so this masks them out.
Signed-off-by: Jesse Gross <jesse@kernel.org>
Acked-by: Ben Pfaff <blp@ovn.org>
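Masking internal flags before formatting can be sketched like this in Python. Only the name FLOW_TNL_F_UDPIF comes from the message; the other flag names, all bit values, and the output strings are assumptions made for illustration.

```python
# Illustrative bit assignments; not taken from the OVS headers.
FLOW_TNL_F_OAM = 1 << 0
FLOW_TNL_F_DONT_FRAGMENT = 1 << 1
FLOW_TNL_F_CSUM = 1 << 2
FLOW_TNL_F_KEY = 1 << 3
FLOW_TNL_F_UDPIF = 1 << 4          # internal implementation detail

INTERNAL_FLAGS = FLOW_TNL_F_UDPIF

FLAG_NAMES = {
    FLOW_TNL_F_OAM: "oam",
    FLOW_TNL_F_DONT_FRAGMENT: "df",
    FLOW_TNL_F_CSUM: "csum",
    FLOW_TNL_F_KEY: "key",
}


def format_tnl_flags(flags: int) -> str:
    """Format only the externally meaningful tunnel flags."""
    public = flags & ~INTERNAL_FLAGS   # drop purely internal bits first
    return "|".join(name for bit, name in FLAG_NAMES.items() if public & bit)
```

With this approach a flow that carries only internal flags formats as an empty flag list, so the internal bit never leaks into user-visible output.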
| |
If NUMA information can't be derived from a vHost User device, only
print an error if the VHOST_NUMA option is enabled in DPDK. Otherwise
'fail' silently.
Fixes: 0a0f39df1d5a ("netdev-dpdk: Add support for DPDK 16.07")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Reported-by: Ian Stokes <ian.stokes@intel.com>
Tested-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
| |
'netdev_dpdk_send__()' function can be greatly simplified by using
recently introduced 'netdev_dpdk_filter_packet_len()'.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
| |
This patch introduces function 'netdev_dpdk_filter_packet_len()' which is
intended to find and remove all packets with 'pkt_len > max_packet_len'
from the Tx batch.
It fixes inaccurate counting of 'tx_bytes' in the vHost case if there
were dropped packets, and allows the send function to be simplified.
Fixes: 0072e931b207 ("netdev-dpdk: add support for jumbo frames")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
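The idea of filtering the batch before counting can be sketched in Python. This models packets as plain lengths and is not the DPDK code; the function name `filter_packet_len` and its return shape are hypothetical.

```python
from typing import List, Tuple


def filter_packet_len(pkt_lens: List[int],
                      max_packet_len: int) -> Tuple[List[int], int]:
    """Remove oversized packets from a Tx batch.

    Returns the lengths of the packets that remain and the number dropped.
    """
    kept = [n for n in pkt_lens if n <= max_packet_len]
    return kept, len(pkt_lens) - len(kept)


kept, dropped = filter_packet_len([64, 1500, 9000, 128], max_packet_len=1518)
# Count bytes only for packets that are actually sent, so dropped packets
# no longer inflate the statistic.
tx_bytes = sum(kept)
```

Computing `tx_bytes` from the filtered batch rather than the original one is what keeps the counter accurate when some packets are dropped.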
| |
This reverts commit d2fa6c676a13e86acc7f17261b2d87484f625d45.
When doing a restart, the routing table will open ports as system, which
prevents internal ports from being opened with the right type. That causes
failures in creating the ports.
We should revisit this patch after finding a proper fix on the routing table
layer.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
| |
This new utility is intended to fulfill for OVN the purpose that
"ofproto/trace" has for Open vSwitch. First, it's meant to be a useful
tool for troubleshooting and diagnosis and in general for improving one's
understanding of the emergent properties of a flow table. Second, it
simplifies and increases the practical scope of testing, as well as making
testing more reliable and repeatable and failures easier to interpret.
This commit adds only a single test that uses the new utility, based on the
oldest OVN end-to-end test "ovn -- 3 HVs, 1 LS, 3 lports/HV". The
differences between the old and the new test illustrate properties of
tracing. First, the new test does not start any ovn-controller processes
or simulate any hypervisors in a nontrivial way. This is because ovn-trace
does not actually forward packets or rely on the physical structure of the
system. Second, because the old test tested not just the logical but also
the physical structure of the system, it needed to have several logical
ports, a total of 9 (3 on each of 3 HVs), whereas this test only tests the
logical network implementation, so it can use a smaller number. This
property also means that the new test runs significantly faster than the old
one (less than a second on my laptop).
In my opinion this approach points the way toward the future of OVN
testing. Certainly, we need end-to-end tests. However, I believe that the
bulk of our tests can be broken into ones that test the logical network
implementation (using tracing) and ones that test physical/logical
translation.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
| |
The function nxm_execute_reg_move() was almost a general-purpose function
for manipulating subfields, except for its awkward interface that took a
struct ofpact_reg_move instead of a plain source and destination. This
commit introduces a general-purpose function in meta-flow that corrects
this flaw, and updates the callers. An upcoming commit will introduce a
new user of the function.
This commit also introduces a related function mf_subfield_swap() to swap
the contents of subfields. An upcoming commit will introduce the first
user.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Acked-by: Justin Pettit <jpettit@ovn.org>
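What swapping two subfields means can be shown with plain integers in Python. This is only a sketch of the bit manipulation; it does not use struct mf_subfield, and the helper names are hypothetical.

```python
def get_bits(value: int, ofs: int, n_bits: int) -> int:
    """Extract n_bits starting at bit offset ofs."""
    return (value >> ofs) & ((1 << n_bits) - 1)


def set_bits(value: int, ofs: int, n_bits: int, new: int) -> int:
    """Return value with n_bits at offset ofs replaced by new."""
    mask = ((1 << n_bits) - 1) << ofs
    return (value & ~mask) | ((new << ofs) & mask)


def subfield_swap(a: int, a_ofs: int, b: int, b_ofs: int, n_bits: int):
    """Swap a[a_ofs : a_ofs + n_bits] with b[b_ofs : b_ofs + n_bits]."""
    a_part = get_bits(a, a_ofs, n_bits)
    b_part = get_bits(b, b_ofs, n_bits)
    return (set_bits(a, a_ofs, n_bits, b_part),
            set_bits(b, b_ofs, n_bits, a_part))


# Swap bits 4..11 of the first value with bits 0..7 of the second.
new_a, new_b = subfield_swap(0xABCD, 4, 0x1234, 0, 8)
```

A copy between subfields is the same operation with only one of the two `set_bits` calls, which is why a single pair of helpers can serve both the move and the swap.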
| |
Until now, vHost ports in OVS have only been able to operate in 'server'
mode whereby OVS creates and manages the vHost socket and essentially
acts as the vHost 'server'. With this commit a new mode, 'client' mode,
is available. In this mode, OVS acts as the vHost 'client' and connects
to the socket created and managed by QEMU which now acts as the vHost
'server'. This mode allows for reconnect capability, which allows a
vHost port to resume normal connectivity in the event of a switch reset.
By default dpdkvhostuser ports still operate in 'server' mode. That is
unless a valid 'vhost-server-path' is specified for a device like so:
    ovs-vsctl set Interface dpdkvhostuser0 \
        options:vhost-server-path=/path/to/socket
'vhost-server-path' represents the full path of the vhost user socket
that has been or will be created by QEMU. Once specified, the port stays
in 'client' mode for the remainder of its lifetime.
QEMU v2.7.0+ is required when using OVS in vHost client mode and QEMU in
vHost server mode.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
| |
A mix of vhost_user_ and vhost_ is used when naming vhost functions. The
'user_' has been dropped for consistency. Also remove empty init
functions for netdev dpdk classes.
Suggested-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
| |
This commit removes the 'dpdkvhostcuse' port type from the userspace
datapath. vhost-cuse ports are quickly becoming obsolete as the
vhost-user port type begins to support a greater feature-set thanks to
the addition of things like vhost-user multiqueue and potential
upcoming features like vhost-user client-mode and vhost-user reconnect.
The feature is also expected to be removed from DPDK soon.
One potential drawback of the removal of this support is that a
userspace vHost port type is not available in OVS for use with older
versions of QEMU (pre v2.2). Considering v2.2 is nearly two years old
this should however be a low impact change.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
| |
ovs-dpctl and ovs-ofctl lack a read-only option to prevent
running of commands that perform read-write operations. Add
it and the necessary scaffolding to each.
Signed-off-by: Ryan Moats <rmoats@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
| |
The conditional replication code had hardly any comments. This adds some.
This commit also fixes a number of style problems, factors out some code
into a helper function, and moves some struct declarations from a public
header, that were not used by client code, into more private locations.
Signed-off-by: Ben Pfaff <blp@ovn.org>
| |
The function always allocated a clause but didn't use it if it was
going to be a duplicate.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Flavio Fernandes <flavio@flaviof.com>
| |
Both ovsdb_idl_condition_reset() and ovsdb_idl_clause_free() call
ovs_list_remove() on the clause's 'node' member, but it should only be
called once.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
| |
Only 'dpdk' ports support flow control. This patch stops 'dpdkr' ports
from attempting to initialise this feature as this port type does not
support it.
Fixes: 9fd39370c12c ("netdev-dpdk: Add Flow Control support.")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
| |
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Mauricio Vásquez B <mauricio.vasquez@polito.it>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
| |
Also, netdev-dummy needs to call netdev_change_seq_changed() in
set_mtu().
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
| |
"internal" netdevs are treated specially in OVS (e.g. for MTU), but
the dummy datapath remaps both "system" and "internal" devices to the
same "dummy" netdev class, so there's no way to discern those in tests.
This commit adds a new "dummy-internal" netdev type, which will be used
by the dummy datapath for internal ports, so that other parts of the
code can understand which ports are internal just by looking at the
netdev object.
The alternative solution, using the original interface type ("internal")
instead of the translated netdev type ("dummy"), is harder to implement,
because in so many places only the netdev object is available.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
| |
This will allow run() and wait() methods to be shared between different
classes and still perform class-specific work.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
| |
Commit f1ab6e06 ("Add/user partial set updates.") incorrectly
did not include HPE attribution for derived files
lib/ovsdb-set-op.[ch]. Add the attribution to correct this.
Signed-off-by: Ryan Moats <rmoats@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
| |
This patchset mimics the changes introduced in
f199df26 (ovsdb-idl: Add partial map updates functionality.)
010fe7ae (ovsdb-idlc.in: Autogenerate partial map updates functions.)
7251075c (tests: Add test for partial map updates.)
b1048e6a (ovsdb-idl: Fix issues detected in Partial Map Update feature)
but for columns that store sets of values rather than key-value
pairs. These columns will now be able to use the OVSDB mutate
operation to transmit deltas on the wire rather than use
verify/update and transmit wait/update operations on the wire.
As a side effect, this modifies the comments in the partial map update
tests.
Signed-off-by: Ryan Moats <rmoats@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
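To show what a delta on the wire looks like, the Python sketch below builds an OVSDB "mutate" operation that inserts elements into a set column. The operation shape follows the OVSDB protocol (RFC 7047); the helper name, the table and column chosen, and the all-zero UUID are placeholders for illustration, not output of the IDL.

```python
import json


def set_insert_mutation(table, row_uuid, column, values):
    """Build a "mutate" operation that adds values to a set column."""
    return {
        "op": "mutate",
        "table": table,
        "where": [["_uuid", "==", ["uuid", row_uuid]]],
        "mutations": [[column, "insert", ["set", list(values)]]],
    }


# Placeholder row UUID; a real client would use the row's actual UUID.
op = set_insert_mutation("Bridge",
                         "00000000-0000-0000-0000-000000000000",
                         "flood_vlans", [10, 20])
wire = json.dumps(op)
```

Only the two new elements travel in the request, regardless of how many values the column already holds, which is the advantage over rewriting the whole column.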
| |
OVN implements native DHCPv6. DHCPv6 options are stored
in the 'DHCP_Options' NB table and logical ports refer to this
table to configure the DHCPv6 options.
For each logical port configured with DHCPv6 options, the following flows
are added:
- A logical flow which copies the DHCPv6 options to the DHCPv6
request packets using the 'put_dhcpv6_opts' action and advances the
packet to the next stage.
- A logical flow which implements the DHCPv6 responder by sending
the DHCPv6 reply back to the inport once the 'put_dhcpv6_opts' action
is applied.
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
| |
Call poll_immediate_wake() when the condition is changed.
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
| |
Add support for Jumbo Frames to DPDK-enabled port types,
using single-segment-mbufs.
Using this approach, the amount of memory allocated to each mbuf
to store frame data is increased to a value greater than 1518B
(typical Ethernet maximum frame length). The increased space
available in the mbuf means that an entire Jumbo Frame of a specific
size can be carried in a single mbuf, as opposed to partitioning
it across multiple mbuf segments.
The amount of space allocated to each mbuf to hold frame data is
defined dynamically by the user with ovs-vsctl, via the 'mtu_request'
parameter.
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
[diproiettod@vmware.com rebased]
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
| |
Every provider silently drops the const attribute when converting the
parameter to the appropriate subclass. Might as well drop the const
attribute from the parameter, since this is a "set" function.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
| |
The 'mtu_request' column can be used to set the MTU of a specific
interface.
This column is useful because it will allow changing the MTU of DPDK
devices (implemented in a future commit), which are not accessible
outside the ovs-vswitchd process, but it can be used for kernel
interfaces as well.
The current implementation of set_mtu() in netdev-dpdk is removed
because it's broken. It will be reintroduced by a subsequent commit on
this series.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
| |
Use the appropriate format specifier for size_t, otherwise the 32-bit
build fails.
Reported-at: https://travis-ci.org/openvswitch/ovs/jobs/151938383
Fixes: 3453b4d62a98("dpif-netdev: dpcls per in_port with sorted
subtables")
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Joe Stringer <joe@ovn.org>
| |
This commit provides the ability to 'listen' on DPDK ports and save
packets to a pcap file with a DPDK app that uses the librte_pdump
library. One such app is the 'pdump' app that can be found in the DPDK
'app' directory. Instructions on how to use this can be found in
INSTALL.DPDK-ADVANCED.md
Pdump capability in OVS with DPDK will only be initialised if the
CONFIG_RTE_LIBRTE_PMD_PCAP=y and CONFIG_RTE_LIBRTE_PDUMP=y options are
set in DPDK. libpcap is required if the above configuration is used.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
| |
The user-space datapath (dpif-netdev) consists of a first level "exact match
cache" (EMC) matching on 5-tuples and the normal megaflow classifier. With
many parallel packet flows (e.g. TCP connections) the EMC becomes inefficient
and the OVS forwarding performance is determined by the megaflow classifier.
The megaflow classifier (dpcls) consists of a variable number of hash tables
(aka subtables), each containing megaflow entries with the same mask of
packet header and metadata fields to match upon. A dpcls lookup matches a
given packet against all subtables in sequence until it hits a match. As
megaflow cache entries are by construction non-overlapping, the first match
is the only match.
Today the order of the subtables in the dpcls is essentially random so that
on average a dpcls lookup has to visit N/2 subtables for a hit, when N is the
total number of subtables. Even though every single hash-table lookup is
fast, the performance of the current dpcls degrades when there are many
subtables.
How does the patch address this issue:
In reality there is often a strong correlation between the ingress port and a
small subset of subtables that have hits. The entire megaflow cache typically
decomposes nicely into partitions that are hit only by packets entering from
a range of similar ports (e.g. traffic from Phy -> VM vs. traffic from VM ->
Phy).
Therefore, maintaining a separate dpcls instance per ingress port with its
subtable vector sorted by frequency of hits reduces the average number of
subtables lookups in the dpcls to a minimum, even if the total number of
subtables gets large. This is possible because megaflows always have an exact
match on in_port, so every megaflow belongs to a unique dpcls instance.
For thread safety, the PMD thread needs to block out revalidators during the
periodic optimization. We use ovs_mutex_trylock() to avoid blocking the PMD.
To monitor the effectiveness of the patch we have enhanced the ovs-appctl
dpif-netdev/pmd-stats-show command with an extra line "avg. subtable lookups
per hit" to report the average number of subtable lookups needed for a
megaflow match. Ideally, this should be close to 1 and in almost all cases
much smaller than N/2.
The PMD tests have been adjusted to the additional line in pmd-stats-show.
We have benchmarked a L3-VPN pipeline on top of a VXLAN overlay mesh.
With pure L3 tenant traffic between VMs on different nodes the resulting
netdev dpcls contains N=4 subtables. Each packet traversing the OVS
datapath is subject to dpcls lookup twice due to the tunnel termination.
Disabling the EMC, we have measured a baseline performance (in+out) of ~1.45
Mpps (64 bytes, 10K L4 packet flows). The average number of subtable lookups
per dpcls match is 2.5. With the patch the average number of subtable lookups
per dpcls match is reduced to 1 and the forwarding performance grows by ~50%
to 2.13 Mpps.
Even with EMC enabled, the patch improves the performance by 9% (for 1000 L4
flows) and 34% (for 50K+ L4 flows).
As the actual number of subtables will often be higher in reality, we can
assume that this is at the lower end of the speed-up one can expect from this
optimization. Just running a parallel ping between the VXLAN tunnel endpoints
increases the number of subtables and hence the average number of subtable
lookups from 2.5 to 3.5 on master with a corresponding decrease of throughput
to 1.2 Mpps. With the patch the parallel ping has no impact on average number
of subtable lookups and performance. The performance gain is then ~75%.
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Acked-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
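The effect of keeping one classifier per ingress port with subtables sorted by hit count can be sketched in Python. This is a toy model under stated assumptions: packets are dictionaries, a subtable's mask is a tuple of field names, and all class and method names are hypothetical rather than taken from dpif-netdev.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Packet = Dict[str, int]


class Subtable:
    """All flows that share one mask (here: a tuple of field names)."""

    def __init__(self, mask: Tuple[str, ...]) -> None:
        self.mask = mask
        self.flows: Dict[Tuple[int, ...], str] = {}
        self.hits = 0

    def key(self, pkt: Packet) -> Tuple[int, ...]:
        return tuple(pkt.get(field, 0) for field in self.mask)


class Dpcls:
    def __init__(self) -> None:
        self.subtables: List[Subtable] = []
        self.lookups = 0   # subtables visited on successful lookups
        self.matches = 0

    def insert(self, mask: Tuple[str, ...], fields: Packet, action: str) -> None:
        for st in self.subtables:
            if st.mask == mask:
                break
        else:
            st = Subtable(mask)
            self.subtables.append(st)
        st.flows[st.key(fields)] = action

    def lookup(self, pkt: Packet) -> Optional[str]:
        for visited, st in enumerate(self.subtables, start=1):
            action = st.flows.get(st.key(pkt))
            if action is not None:   # flows do not overlap: first hit wins
                st.hits += 1
                self.lookups += visited
                self.matches += 1
                return action
        return None

    def optimize(self) -> None:
        """Periodic step: most frequently hit subtables go first."""
        self.subtables.sort(key=lambda st: st.hits, reverse=True)
        for st in self.subtables:
            st.hits = 0


classifiers: Dict[int, Dpcls] = defaultdict(Dpcls)   # one dpcls per in_port

cls = classifiers[1]
cls.insert(("ip_dst",), {"ip_dst": 1}, "to_vm")
cls.insert(("tcp_dst",), {"tcp_dst": 80}, "to_phy")
pkt = {"ip_dst": 2, "tcp_dst": 80}

for _ in range(3):
    cls.lookup(pkt)
before = cls.lookups / cls.matches   # avg. subtable lookups per hit

cls.optimize()
cls.lookups = cls.matches = 0
for _ in range(3):
    cls.lookup(pkt)
after = cls.lookups / cls.matches
```

In this toy run the traffic on port 1 only ever hits the second subtable, so sorting moves it to the front and the average number of lookups per hit drops from 2 to 1. The sketch leaves out the locking between the PMD thread and the revalidators that the message describes.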
| |
On Windows if a file path contains ":" we can safely say it is an absolute
file name.
This patch allows file_name checks to report correctly when using
"abs_file_name".
Found by testing.
Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Acked-by: Sairam Venugopal <vsairam@vmware.com>
Signed-off-by: Gurucharan Shetty <guru@ovn.org>
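The rule stated in this message can be written as a small predicate. The Python sketch below is an assumption-laden illustration: the function name and the explicit `windows` parameter are hypothetical, and the non-Windows branch (a leading "/") reflects the usual POSIX convention rather than anything stated in the message.

```python
def is_absolute_file_name(file_name: str, windows: bool) -> bool:
    """Decide whether file_name should be treated as absolute."""
    if windows:
        # Per the message: a ":" in the path marks it as absolute.
        return ":" in file_name
    # Assumed POSIX convention for the non-Windows case.
    return file_name.startswith("/")
```

For example, `is_absolute_file_name("C:\\ovs\\conf.db", windows=True)` is true, while a bare `conf.db` is treated as relative on either platform.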
| |
While using QoS with vHost interfaces 'netdev_dpdk_qos_run__()' will
free mbufs while executing 'netdev_dpdk_policer_run()'. After
that, the same mbufs will be freed at the end of
'__netdev_dpdk_vhost_send()' if 'may_steal == true'. This behaviour will
break the mempool.
Also 'netdev_dpdk_qos_run__()' will free packets even if we shouldn't
do this ('may_steal == false'). This will lead to use of already freed
packets by the upper layers.
Fix that by copying all packets that we can't steal, as is done
for DPDK_DEV_ETH devices, and freeing only packets not freed by QoS.
Fixes: 0bf765f753fd ("netdev_dpdk.c: Add QoS functionality.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
| |
This reverts commit 8bdfe1313894047d44349fa4cf4402970865950f.
I failed to see that lib/dpif-netdev.c actually needs the concurrency
provided by pvector prior to this change. More specifically, when a
subtable is removed, concurrent lookups may skip over another subtable
swapped in to the place of the removed subtable in the vector.
Since this was the only use of the non-concurrent pvector, it is
cleaner to revert the whole patch.
Reported-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
| |
With incremental processing of logical flows desired conntrack groups
are not being persisted. This patch adds this capability, with the
side effect of adding a ds_clone method that this capability leverages.
Signed-off-by: Ryan Moats <rmoats@us.ibm.com>
Reported-by: Guru Shetty <guru@ovn.org>
Reported-at: http://openvswitch.org/pipermail/dev/2016-July/076320.html
Fixes: 70c7cfe ("ovn-controller: Add incremental processing to lflow_run and physical_run")
Acked-by: Flavio Fernandes <flavio@flaviof.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
| |
The function netdev_vport_get_dpif_port_strdup is not
used anymore. So we can remove it now.
Signed-off-by: Binbin Xu <xu.binbin1@zte.com.cn>
Signed-off-by: Ben Pfaff <blp@ovn.org>
| |
Binding/unbinding of virtio driver inside VM leads to reconfiguration
of PMD threads. This behaviour may be abused by executing bind/unbind
in an infinite loop to break normal networking on all ports attached
to the same instance of Open vSwitch.
Fix that by avoiding reconfiguration if it's not necessary.
The number of queues will not be decreased to 1 on device disconnection,
but this is not very important in comparison with a possible DoS attack
from inside the guest OS.
Fixes: 81acebdaaf27 ("netdev-dpdk: Obtain number of queues for vhost
ports from attached virtio.")
Reported-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
| |
When egress policer is set as a QoS type for a port, an error may occur during
setup if incorrect parameters are used for the rte_meter. If this occurs
the egress policer construct and set functions should free any allocated
memory relevant to the policer and set the QoS configuration pointer to
null. The netdev_dpdk_set_qos function should check the error value returned
for any QoS construct/set calls with an assertion to avoid segfault.
Also this commit modifies egress_policer_qos_set() to correctly lock the QoS
spinlock while the egress policer configuration is updated to avoid
segfault.
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
| |
Clang reports that value stored to 'tok' during initialization is never
read.
Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
| |
netdev_dpdk_vhost_destruct() calls rte_vhost_driver_unregister(), which
can trigger the destroy_device() callback. destroy_device() will try to
take two mutexes already held by netdev_dpdk_vhost_destruct(), causing a
deadlock.
This problem can be solved by dropping the mutexes before calling
rte_vhost_driver_unregister(). The netdev_dpdk_vhost_destruct() and
construct() calls are already serialized by netdev_mutex.
This commit also makes clear that dev->vhost_id is constant and can be
accessed without taking any mutexes in the lifetime of the devices.
Fixes: 8d38823bdf8b("netdev-dpdk: fix memory leak")
Reported-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
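The deadlock pattern and its fix can be modeled with a non-reentrant lock in Python. This is a sketch, not the netdev-dpdk code: all function names are hypothetical stand-ins, a single lock stands in for the two mutexes, and the callback uses a timeout so that the buggy variant reports failure instead of hanging.

```python
import threading

dev_mutex = threading.Lock()   # non-reentrant, like a plain mutex


def destroy_device_callback() -> bool:
    """Stand-in for the callback; it needs the device mutex."""
    acquired = dev_mutex.acquire(timeout=0.1)
    if acquired:
        dev_mutex.release()
    return acquired


def driver_unregister() -> bool:
    """Stand-in for the library call that may trigger the callback."""
    return destroy_device_callback()


def destruct_buggy() -> bool:
    with dev_mutex:
        # The callback runs while we still hold the mutex, so it can never
        # take it. With a blocking acquire this would be a deadlock.
        return driver_unregister()


def destruct_fixed() -> bool:
    with dev_mutex:
        pass   # do the teardown that really needs the mutex
    # Mutex dropped: the callback can now take it safely.
    return driver_unregister()
```

Calling `destruct_buggy()` returns False because the callback times out waiting for the lock, while `destruct_fixed()` returns True.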
| |
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
| |
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
|