| Commit message (Collapse) | Author | Age | Files | Lines |
| |
The following tests use the nc command and should be skipped if
nc is not present.
- "offloads - check interface meter offloading - offloads disabled"
- "offloads - check interface meter offloading - offloads enabled"
Fixes: 5660b89a309d ("dpif-netlink: Offloading meter to tc police action")
Reported-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>
| |
The "conntrack - ICMP related to original direction" test does not
use nc and therefore does not need to be skipped if nc is not present.
Fixes: d0e4206230b3 ("tests: ICMP related to original direction test.")
Reported-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>
| |
UndefinedBehaviorSanitizer:
lib/netdev-offload-tc.c:1356:50: runtime error:
member access within misaligned address 0x60700001a89c for type
'const struct (unnamed struct at lib/netdev-offload-tc.c:1350:27)',
which requires 8 byte alignment 0x60700001a89c: note: pointer points here
24 00 04 00 01 00 00 05 00 00 0d 00 0a 00 00 00 00 00 00 00 ...
^
0 0xd5d183 in parse_put_flow_ct_action lib/netdev-offload-tc.c:1356:50
1 0xd5783f in netdev_tc_parse_nl_actions lib/netdev-offload-tc.c:2015:19
2 0xd4027c in netdev_tc_flow_put lib/netdev-offload-tc.c:2355:11
3 0x9666d7 in netdev_flow_put lib/netdev-offload.c:318:14
4 0xcd4c0a in parse_flow_put lib/dpif-netlink.c:2297:11
5 0xcd4c0a in try_send_to_netdev lib/dpif-netlink.c:2384:15
6 0xcd4c0a in dpif_netlink_operate lib/dpif-netlink.c:2455:23
7 0x87d40e in dpif_operate lib/dpif.c:1372:13
8 0x6d43e9 in handle_upcalls ofproto/ofproto-dpif-upcall.c:1674:5
9 0x6d43e9 in recv_upcalls ofproto/ofproto-dpif-upcall.c:905:9
10 0x6cf6ea in udpif_upcall_handler ofproto/ofproto-dpif-upcall.c:801:13
11 0xb6d7ea in ovsthread_wrapper lib/ovs-thread.c:423:12
12 0x7f5ccf017801 in start_thread
13 0x7f5ccefb744f in __GI___clone3
Fixes: 9221c721bec0 ("netdev-offload-tc: Add conntrack label and mark support")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
pmd-perf-show with pmd-perf-metrics=true displays a histogram
with averages. However, averages were not displayed when there
are no iterations.
They will be all zero so it is not hiding useful information
but the stats look incomplete without them, especially when
they are displayed for some PMD thread cores and not others.
The histogram print is large and this is just an extra couple
of lines, so might as well print them all the time to ensure
that the user does not think there is something missing from
the display.
Before patch:
Histograms
cycles/it
499 0
716 0
1025 0
1469 0
<snip>
After patch:
Histograms
cycles/it
499 0
716 0
1025 0
1469 0
<snip>
---------------
cycles/it
0
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
Some stats in pmd-perf-show don't check for divide by zero
which results in not a number (-nan).
This is a normal case for some of the stats when there are
no Rx queues assigned to the PMD thread core.
It is not obvious what -nan is to a user so add a check for
divide by zero and set stat to 0 if present.
Before patch:
pmd thread numa_id 1 core_id 9:
Iterations: 0 (-nan us/it)
- Used TSC cycles: 0 ( 0.0 % of total cycles)
- idle iterations: 0 ( -nan % of used cycles)
- busy iterations: 0 ( -nan % of used cycles)
After patch:
pmd thread numa_id 1 core_id 9:
Iterations: 0 (0.00 us/it)
- Used TSC cycles: 0 ( 0.0 % of total cycles)
- idle iterations: 0 ( 0.0 % of used cycles)
- busy iterations: 0 ( 0.0 % of used cycles)
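The divide-by-zero check can be sketched as a small helper. This is
only an illustration in Python, not the C code of the patch; the names
safe_div and format_iterations are hypothetical.

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    if denominator == 0:
        return 0.0
    return numerator / denominator


def format_iterations(iterations, total_us):
    """Format a line similar to the 'Iterations' line of pmd-perf-show."""
    per_iteration = safe_div(total_us, iterations)
    return "Iterations: %d (%.2f us/it)" % (iterations, per_iteration)
```

With zero iterations the helper yields 0.00 instead of -nan, which
matches the "After patch" output.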
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
Test fails if 'nc' is not available; it should be skipped instead.
Fixes: b020a416e24c ("System Tests: Enhance NAT tests.")
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
This patch adds a Python script that can be used to analyze the
revalidator runs by providing statistics (including some real time
graphs).
The USDT events can also be captured to a file and used for
later offline analysis.
The following blog explains the Open vSwitch revalidator
implementation and how this tool can help you understand what is
happening in your system.
https://developers.redhat.com/articles/2022/10/19/open-vswitch-revalidator-process-explained
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
On Fedora 37 (at least), MFEX unit tests are failing because of
deprecation warnings:
$ python3 tests/mfex_fuzzy.py test_traffic.pcap 2000
/usr/lib/python3.11/site-packages/scapy/layers/ipsec.py:471:
CryptographyDeprecationWarning: Blowfish has been deprecated
cipher=algorithms.Blowfish,
/usr/lib/python3.11/site-packages/scapy/layers/ipsec.py:485:
CryptographyDeprecationWarning: CAST5 has been deprecated
cipher=algorithms.CAST5,
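The message above does not say how the warnings are handled. One
possible approach, shown here purely as an assumption, is to filter
these specific warnings with Python's warnings module. The warning
class and the import_noisy_module() function below are stand-ins so
that the sketch runs without scapy or cryptography installed.

```python
import warnings


class CryptographyDeprecationWarning(UserWarning):
    """Stand-in for the warning class raised by the cryptography package."""


def import_noisy_module():
    # Hypothetical stand-in for importing a module that emits the warnings.
    warnings.warn("Blowfish has been deprecated",
                  CryptographyDeprecationWarning)
    warnings.warn("CAST5 has been deprecated",
                  CryptographyDeprecationWarning)


def import_with_filter(suppress):
    """Run the noisy import and return the warnings that got through."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        if suppress:
            # Inserted at the front of the filter list, so it wins over
            # the "always" filter for the matching messages only.
            warnings.filterwarnings(
                "ignore",
                message=r"(Blowfish|CAST5) has been deprecated",
                category=CryptographyDeprecationWarning)
        import_noisy_module()
    return caught
```

Filtering by both category and message keeps unrelated deprecation
warnings visible.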
Signed-off-by: Robin Jarry <rjarry@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
Today the minimum value for this setting is 1. This patch allows it to
be 0, meaning that pps is not checked at all and revalidation is always
done.
This is particularly useful for environments where some applications
with long-lived connections may have very low traffic for certain
periods but send high-rate bursts periodically. It is desirable to
keep the datapath flows instead of periodically deleting them, to
avoid bursts of packet misses to userspace.
When setting to 0, there may be more datapath flows to be revalidated,
resulting in higher CPU cost of revalidator threads. This is the
downside but in certain cases this is still more desirable than packet
misses to user space.
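A heavily simplified sketch of the decision described above, in Python.
The real check in the revalidator takes more inputs; the function name
and the single packets-per-second argument are assumptions made for
illustration only.

```python
def should_revalidate(flow_pps, min_revalidate_pps):
    """Decide whether a datapath flow is worth revalidating.

    A setting of 0 disables the pps check, so every flow is revalidated
    and kept. Otherwise only flows at or above the minimum rate are.
    """
    if min_revalidate_pps == 0:
        return True
    return flow_pps >= min_revalidate_pps
```

With the setting at 0, a long-lived flow that is currently idle still
gets revalidated instead of being removed, at the cost of more work
for the revalidator threads.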
Signed-off-by: Han Zhou <hzhou@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
rte_pktmbuf_free_bulk() function was introduced in 19.11 and became
stable in 21.11. Use it to free arrays of mbufs instead of freeing
packets one by one.
In simple V2V testing with 64B packets, 2 PMD threads and bidirectional
traffic this change improves performance by 3.5 - 4.5 %.
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
Column conversion involves converting it to json and back. These are
heavy operations and completely unnecessary if the column type didn't
change. Most of the time schema changes only add new columns/tables
without changing existing ones at all. Clone the column instead to
save some time.
This will also save time while destroying the original database since
we will only need to reduce reference counters on unchanged datum
objects that were cloned instead of actually freeing them.
Additionally, moving the column lookup into a separate loop, so we
don't perform an shash lookup for each column of each row.
Testing with 440 MB OVN_Southbound database shows 70% speed up of the
ovsdb_convert() function. Execution time reduced from 15 to 4.4
seconds, 3.5 of which is a post-conversion transaction replay. Overall
time required for the online database conversion reduced from 37 to 25
seconds.
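The idea can be sketched in Python as follows. This is not the ovsdb
code: rows are plain dicts, column types are plain strings, and the
JSON round trip merely stands in for the expensive conversion path.
All names are hypothetical.

```python
import json


def build_plan(old_types, new_types):
    """Decide once per table which columns can simply be cloned."""
    plan = []
    for column, new_type in new_types.items():
        if column in old_types:
            unchanged = old_types[column] == new_type
            plan.append((column, unchanged))
    return plan


def convert_rows(rows, plan):
    """Convert rows using the precomputed plan, with no per-row lookups."""
    converted = []
    for row in rows:
        new_row = {}
        for column, unchanged in plan:
            if unchanged:
                # Share the existing value, like taking a reference.
                new_row[column] = row[column]
            else:
                # Expensive path: serialize to JSON and parse it back.
                new_row[column] = json.loads(json.dumps(row[column]))
        converted.append(new_row)
    return converted
```

Building the plan once mirrors moving the column lookup out of the
per-row loop; sharing unchanged values mirrors cloning the column.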
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
Will be used in the next commit to optimize database conversion.
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
Now that the timer slack for the PMD threads is reduced we can also
reduce the start/increment for PMD load based sleeping to match it.
This will further reduce initial sleep times making it more resilient
to interfaces that might be sensitive to large sleep times.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
The default Linux timer slack groups timer expirations into 50 us
intervals. With some traffic patterns this can mean that returning to
process packets after a sleep takes too long and packets are dropped.
Add a helper to util.c and use it to reduce the timer slack
for PMD threads, so that sleeps with smaller resolutions can be done
to prevent sleeping for too long.
Fixes: de3bbdc479a9 ("dpif-netdev: Add PMD load based sleeping.")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2023-January/401121.html
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Co-authored-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
As Ilya reported, we have an ABBA deadlock between DPDK vq->access_lock
and OVS dev->mutex when the OVS main thread refreshes statistics while a
vring state change event is being processed for the same vhost port.
To break from this situation, move vring state change notifications
handling from the vhost-events DPDK thread to a dedicated thread
using a lockless queue.
Besides, for the case when a bogus/malicious guest is sending continuous
updates, add a counter of pending updates in the queue and warn if a
threshold of 1000 entries is reached.
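The hand-off can be sketched in Python as follows. This is only an
illustration of the pattern: Python's queue.SimpleQueue is not the
lockless queue used in OVS, and the class and method names are
hypothetical. Only the threshold of 1000 comes from the text above.

```python
import queue
import threading

PENDING_WARN_THRESHOLD = 1000  # threshold named in the commit message


class VringStateNotifier:
    def __init__(self, handler):
        self._queue = queue.SimpleQueue()
        self._lock = threading.Lock()
        self._handler = handler
        self.pending = 0
        self.warnings = 0

    def notify(self, port, queue_id, enabled):
        """Called from the event thread: enqueue and return at once."""
        with self._lock:
            self.pending += 1
            if self.pending >= PENDING_WARN_THRESHOLD:
                self.warnings += 1  # a real implementation would log here
        self._queue.put((port, queue_id, enabled))

    def drain(self):
        """Called from the dedicated thread: handle all queued updates."""
        handled = 0
        while True:
            try:
                port, queue_id, enabled = self._queue.get_nowait()
            except queue.Empty:
                return handled
            with self._lock:
                self.pending -= 1
            self._handler(port, queue_id, enabled)
            handled += 1
```

Because notify() never calls the handler itself, the event thread does
not need the lock that the handler takes, which is what breaks the
lock ordering cycle.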
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2023-January/401101.html
Fixes: 3b29286db1c5 ("netdev-dpdk: Add per virtqueue statistics.")
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
The counter for the number of atoms has to be re-set to the number from
the new database, otherwise the value will be incorrect. For example,
this causes the atom counter to double after online conversion of
a clustered database.
Miscounting may also lead to increased memory consumption by the
transaction history or otherwise too aggressive transaction history
sweep.
Fixes: 317b1bfd7dd3 ("ovsdb: Don't let transaction history grow larger than the database.")
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
ovs-vsctl's connections are short-lived, so it doesn't care about db
status changes.
Reported-by: Tobias Hofmann <tohofman@cisco.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2021-February/050914.html
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
For ovsdb clients that are short-lived, e.g. when using
ovn-nbctl/ovn-sbctl to read some metrics from the OVN NB/SB server, they
don't really need to be aware of db changes, because they exit
immediately after getting the initial response for the requested data.
In such use cases, however, the clients still send a 'set_db_change_aware'
request, which results in server side error logs when the server tries
to send out the response for the 'set_db_change_aware' request, because
by that time the client that is supposed to receive the response has
already closed the connection and exited. E.g.:
2023-01-10T18:23:29.431Z|00007|jsonrpc|WARN|unix#3: receive error: Connection reset by peer
2023-01-10T18:23:29.431Z|00008|reconnect|WARN|unix#3: connection dropped (Connection reset by peer)
To avoid such problems, this patch provides an API to allow a client to
choose to not send the 'set_db_change_aware' request.
There was an earlier attempt to fix this [0], but it was not accepted
back then as discussed in the email [1]. It was also discussed in the
emails that an alternative approach is to use notification instead of
request, but that would require protocol changes and taking backward
compatibility into consideration. So this patch takes a different
approach and tries to keep the change small.
[0] http://patchwork.ozlabs.org/project/openvswitch/patch/1594380801-32134-1-git-send-email-dceara@redhat.com/
[1] https://mail.openvswitch.org/pipermail/ovs-discuss/2021-February/050919.html
Reported-by: Girish Moodalbail <gmoodalbail@nvidia.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-July/050343.html
Reported-by: Tobias Hofmann <tohofman@cisco.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2021-February/050914.html
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
Add an extension that allows flushing connections from CT
by specifying fields that the connections should be
matched against. This allows matching on only some fields
of the connection, e.g. the source address for the orig direction.
Reported-at: https://bugzilla.redhat.com/2120546
Signed-off-by: Ales Musil <amusil@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
Currently, the CT can be flushed by dpctl only by specifying
the whole 5-tuple. This is not very convenient when only some
fields are known to the user of CT flush. Add a new struct
ofp_ct_match which represents the generic filtering that can
be done for CT flush. The match is done only on fields that are
non-zero, with the exception of the ICMP fields.
This allows the filtering just within dpctl, however it is a
preparation for OpenFlow extension.
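The "match only on non-zero fields" rule can be sketched in Python as
follows. This is not struct ofp_ct_match: entries and filters are
plain dicts with hypothetical field names, and the special handling
of the ICMP fields is left out.

```python
def ct_entry_matches(entry, match):
    """Return True if every non-zero field of 'match' equals the entry's."""
    for field, wanted in match.items():
        if not wanted:
            continue  # zero or empty means "do not filter on this field"
        if entry.get(field) != wanted:
            return False
    return True


def ct_flush(entries, match):
    """Return the entries that survive a flush with the given filter."""
    return [e for e in entries if not ct_entry_matches(e, match)]
```

A filter that sets only the source address removes every connection
from that source, whatever the other tuple fields are.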
Reported-at: https://bugzilla.redhat.com/2120546
Signed-off-by: Ales Musil <amusil@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
Sleep for an incremental amount of time if none of the Rx queues
assigned to a PMD have at least half a batch of packets (i.e. 16 pkts)
on a polling iteration of the PMD.
Upon detecting the threshold of >= 16 pkts on an Rxq, reset the
sleep time to zero (i.e. no sleep).
Sleep time will be increased on each iteration where the low load
conditions remain up to a total of the max sleep time which is set
by the user e.g:
ovs-vsctl set Open_vSwitch . other_config:pmd-maxsleep=500
The default pmd-maxsleep value is 0, which means that no sleeps
will occur and the default behaviour is unchanged from previously.
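The sleep time update described above can be sketched in Python. The
threshold of 16 packets comes from the text; the function name and the
increment_us step size are hypothetical, and the real code lives in
the PMD main loop in C.

```python
SLEEP_THRESHOLD_PKTS = 16  # half a batch, as stated above


def next_sleep_us(current_sleep_us, max_sleep_us, max_rxq_pkts,
                  increment_us=1):
    """Compute the sleep time to use after one PMD polling iteration.

    max_rxq_pkts is the largest number of packets received from any
    single Rx queue during the iteration.
    """
    if max_sleep_us == 0:
        return 0  # pmd-maxsleep=0: never sleep (the default)
    if max_rxq_pkts >= SLEEP_THRESHOLD_PKTS:
        return 0  # load detected: reset to no sleep
    return min(current_sleep_us + increment_us, max_sleep_us)
```

Under sustained low load the value climbs step by step until it is
capped at pmd-maxsleep; a single busy iteration resets it to zero.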
Also add new stats to pmd-perf-show to get visibility of operation
e.g.
...
- sleep iterations: 153994 ( 76.8 % of iterations)
Sleep time (us): 9159399 ( 59 us/iteration avg.)
...
Reviewed-by: Robin Jarry <rjarry@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
xnanosleep forces the thread into quiesce state in anticipation that
it will be sleeping for a considerable time and that the thread may
need to quiesce before the sleep is finished.
In some cases, a very short sleep may be requested and in that case
the overhead of going into quiesce state may be unnecessary.
To allow for those cases add a xnanosleep_no_quiesce() variant.
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
This archive website disappeared.
On the other hand, the link to an obsolete dpif-provider man page
probably did not provide much info and we can simply mention the current
file.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
rst.ninjs.org is not available anymore, but there are alternatives
listed in this doc.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
netperf.org was shut down in favor of some HP related resources.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
Sphinx linkcheck complains with:
Warning, treated as error:
.../Documentation/intro/install/windows.rst:1093:broken link:
www.appveyor.com ()
Add an https scheme to the link to the AppVeyor website.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
iproute2 git repositories were split and moved around v4.15 [1].
It is time to fix the link in OVS documentation.
1: https://lore.kernel.org/netdev/20180129082052.0eb85e9b@xeon-e3/
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
Following DPDK commit bd2a4d4b2e3a ("ethdev: forbid direction attribute
in transfer flow rules"), the ingress attribute presence is rejected for
transfer flows.
Fixes: a77c7796f23a ("dpdk: Update to use v22.11.1.")
Acked-by: Eli Britstein <elibr@nvidia.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
Low test coverage on this area caused some errors to remain unnoticed.
Add basic functional test of rculist.
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
12.4 was released in December. That means that 12.3
will become unavailable in the near future. Updating.
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
In some environments, ovs-vswitchd gets shut down before the pkill of
testpmd has been completed, which results in the following error messages:
Removing port 'dpdkvhostuser0' while vhost device still attached.
To restore connectivity after re-adding of port, VM on socket '' must be restarted.
This patch will wait for the socket disconnect to be handled by the
vhost-user before shutting down OVS.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Co-authored-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
The vhost library now provides fine-grained statistics for guest
notifications:
- notifications for buffer reclaim by the guest,
- notifications for buffer availability to the guest.
Example before this patch:
$ ovs-appctl coverage/show |
grep vhost_notification
vhost_notification 0.0/sec 0.000/sec 2.0283/sec total: 7302
$ ovs-vsctl get interface vhost4 statistics |
sed -e 's#[{}]##g' -e 's#, #\n#g' |
grep guest_notifications
rx_q0_guest_notifications=66
tx_q0_guest_notifications=7236
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
The DPDK vhost-user library maintains more granular per queue stats
which can replace what OVS was providing for vhost-user ports.
The benefits for OVS:
- OVS can skip parsing packet sizes on the rx side,
- dev->stats_lock won't be taken in rx/tx code unless some packet is
dropped,
- vhost-user is aware of which packets are transmitted to the guest,
so per *transmitted* packet size stats can be reported,
- more internal stats from vhost-user may be exposed, without OVS
needing to understand them.
Note: the vhost-user library does not provide global stats for a port.
The proposed implementation is to have the global stats (exposed via
netdev_get_stats()) computed by querying and aggregating all per queue
stats.
Since per queue stats are exposed via another netdev ops
(netdev_get_custom_stats()), this may lead to some race and small
discrepancies.
This issue might already affect other netdev classes.
Example:
$ ovs-vsctl get interface vhost4 statistics |
sed -e 's#[{}]##g' -e 's#, #\n#g' |
grep -v =0$
rx_1_to_64_packets=12
rx_256_to_511_packets=15
rx_65_to_127_packets=21
rx_broadcast_packets=15
rx_bytes=7497
rx_multicast_packets=33
rx_packets=48
rx_q0_good_bytes=242
rx_q0_good_packets=3
rx_q0_guest_notifications=3
rx_q0_multicast_packets=3
rx_q0_size_65_127_packets=2
rx_q0_undersize_packets=1
rx_q1_broadcast_packets=15
rx_q1_good_bytes=7255
rx_q1_good_packets=45
rx_q1_guest_notifications=45
rx_q1_multicast_packets=30
rx_q1_size_256_511_packets=15
rx_q1_size_65_127_packets=19
rx_q1_undersize_packets=11
tx_1_to_64_packets=36
tx_256_to_511_packets=45
tx_65_to_127_packets=63
tx_broadcast_packets=45
tx_bytes=22491
tx_multicast_packets=99
tx_packets=144
tx_q0_broadcast_packets=30
tx_q0_good_bytes=14994
tx_q0_good_packets=96
tx_q0_guest_notifications=96
tx_q0_multicast_packets=66
tx_q0_size_256_511_packets=30
tx_q0_size_65_127_packets=42
tx_q0_undersize_packets=24
tx_q1_broadcast_packets=15
tx_q1_good_bytes=7497
tx_q1_good_packets=48
tx_q1_guest_notifications=48
tx_q1_multicast_packets=33
tx_q1_size_256_511_packets=15
tx_q1_size_65_127_packets=21
tx_q1_undersize_packets=12
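The aggregation of per queue stats into global stats can be sketched
in Python. Which per queue counters feed which global counter is an
assumption inferred from the numbers in the example above (the
*_good_packets and *_good_bytes counters); the function name is
hypothetical.

```python
import re

QUEUE_STAT = re.compile(r"^(rx|tx)_q\d+_good_(packets|bytes)$")


def aggregate_queue_stats(custom_stats):
    """Sum per queue packet and byte counters into port level totals."""
    totals = {"rx_packets": 0, "rx_bytes": 0,
              "tx_packets": 0, "tx_bytes": 0}
    for name, value in custom_stats.items():
        found = QUEUE_STAT.match(name)
        if found:
            direction, unit = found.groups()
            totals["%s_%s" % (direction, unit)] += value
    return totals
```

Since the per queue values are read at a different time than the
totals are computed, a reader comparing both outputs may see small
differences, as noted above.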
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
Currently tc offload flow packet counters will roll over every ~4
billion packets. This is because the packet counter in struct
tc_stats provided by TCA_STATS_BASIC is a 32bit integer.
Now we check for the optional TCA_STATS_PKT64 attribute which provides
the full 64bit packet counter if the 32bit one has rolled over. Because
the TCA_STATS_PKT64 attribute may appear multiple times in a netlink
message, the method of parsing attributes was changed.
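The rollover and the preference for the 64bit counter can be sketched
in Python. This does not parse netlink; the function names are
hypothetical and the optional attribute is modelled as a value that
may be None.

```python
UINT32_MAX = 2**32 - 1


def wrap32(packets):
    """Value a 32bit counter would report for a given true count."""
    return packets & UINT32_MAX


def packet_count(basic_packets, pkt64=None):
    """Prefer the 64bit counter when the optional attribute is present."""
    if pkt64 is not None:
        return pkt64
    return basic_packets
```

A flow that has seen 2**32 + 5 packets reports 5 through the 32bit
field, while the 64bit attribute still carries the full count.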
Fixes: f98e418fbdb6 ("tc: Add tc flower functions")
Reported-at: https://bugzilla.redhat.com/1776816
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
GitHub and Sphinx are parsing links differently. Sphinx knows about
the overall documentation structure and all the sections defined in
other docs, while GitHub is using direct rst 2 html conversion and
doesn't know any of that. Sphinx wants links to sections in other
docs to be defined with a :doc: field, but GitHub can't parse that
and requires having a direct link to the other rST document.
The problem is that we have a top level MAINTAINERS.rst, that should
be parseable by GitHub, included in the maintainers.rst in the
main documentation section that is used by Sphinx to generate html,
pdf and other docs. So, it's hard to make links work in both.
Working around that limitation by using rST substitutions for the
links. Cutting off the substitutions for actual links and adding
:doc: links instead during the file inclusion for Sphinx.
Reported-by: Igor Zhukov <ivzhukov@sbercloud.ru>
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
The text enclosed in '<...>' is supposed to be an actual link and not the
name of the link. This generates incorrect links that lead nowhere.
Also, a single underscore is supposed to be used for external links.
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
'==' is not defined by POSIX and not supported by some shells.
This is causing test failures and potential other issues:
./tests/testsuite: 54: test: X2: unexpected operator
./tests/testsuite: 54: test: X157: unexpected operator
./tests/testsuite: 54: test: X116: unexpected operator
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-December/052157.html
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
The dpif_execute_helper_cb() function is supposed to add the
OVS_ACTION_ATTR_SET(OVS_KEY_ATTR_TUNNEL()) action to the
list of actions when passing it down to the kernel.
This function was only checking if the IPv4 destination
address was set, not both the IPv4 and IPv6 destination
addresses. This patch fixes this, including a datapath testcase.
Fixes: 076caa2fb077 ("ofproto: Meter translation.")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
This patch adds the dpif_nl_exec_monitor.py script that uses the
existing dpif_netlink_operate__:op_flow_execute USDT probe to show
all DPIF_OP_EXECUTE operations being queued for transmission over
the netlink interface.
Here is an example, truncated output:
Display DPIF_OP_EXECUTE operations being queued for transmission...
TIME CPU COMM PID NL_SIZE
3124.516679897 1 ovs-vswitchd 8219 180
nlmsghdr : len = 0, type = 36, flags = 1, seq = 0, pid = 0
genlmsghdr: cmd = 3, version = 1, reserver = 0
ovs_header: dp_ifindex = 21
> Decode OVS_PACKET_ATTR_* TLVs:
nla_len 46, nla_type OVS_PACKET_ATTR_PACKET[1], data: 00 00 00...
nla_len 20, nla_type OVS_PACKET_ATTR_KEY[2], data: 08 00 02 00...
> Decode OVS_KEY_ATTR_* TLVs:
nla_len 8, nla_type OVS_KEY_ATTR_PRIORITY[2], data: 00 00...
nla_len 8, nla_type OVS_KEY_ATTR_SKB_MARK[15], data: 00 00...
nla_len 88, nla_type OVS_PACKET_ATTR_ACTIONS[3], data: 4c 00 03...
> Decode OVS_ACTION_ATTR_* TLVs:
nla_len 76, nla_type OVS_ACTION_ATTR_SET[3], data: 48 00...
> Decode OVS_TUNNEL_KEY_ATTR_* TLVs:
nla_len 12, nla_type OVS_TUNNEL_KEY_ATTR_ID[0], data:...
nla_len 20, nla_type OVS_TUNNEL_KEY_ATTR_IPV6_DST[13], ...
nla_len 5, nla_type OVS_TUNNEL_KEY_ATTR_TTL[4], data: 40
nla_len 4, nla_type OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT[5]...
nla_len 4, nla_type OVS_TUNNEL_KEY_ATTR_CSUM[6], data:
nla_len 6, nla_type OVS_TUNNEL_KEY_ATTR_TP_DST[10],...
nla_len 12, nla_type OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS[8],...
nla_len 8, nla_type OVS_ACTION_ATTR_OUTPUT[1], data: 02 00 00 00
- Dumping OVS_PACKET_ATR_PACKET data:
###[ Ethernet ]###
dst = 00:00:00:00:ec:01
src = 04:f4:bc:28:57:00
type = IPv4
###[ IP ]###
version = 4
ihl = 5
tos = 0x0
len = 50
id = 0
flags =
frag = 0
ttl = 127
proto = icmp
chksum = 0x2767
src = 10.0.0.1
dst = 10.0.0.100
\options \
###[ ICMP ]###
type = echo-request
code = 0
chksum = 0xf7f3
id = 0x0
seq = 0xc
Acked-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
All supported versions of Fedora do package libxdp and libbpf, so it
makes sense to enable AF_XDP support.
Control files for debian packaging are much less flexible, so it's hard
to enable AF_XDP builds while not breaking builds for versions of Ubuntu
and Debian that do not package libbpf or libxdp.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
With this change we will try to detect all the netdev-afxdp
dependencies and enable AF_XDP support by default if they are
present at the build time.
Configuration script behaves in the following way:
- ./configure --enable-afxdp
Will check for AF_XDP dependencies and fail if they are
not available.
- ./configure --disable-afxdp
Disables checking for AF_XDP. Build will not support
AF_XDP even if all dependencies are installed.
- Just ./configure or ./configure --enable-afxdp=auto
Will check for AF_XDP dependencies. Will print a warning
if they are not available, but will continue without AF_XDP
support. If dependencies are available in a system, this
option is equal to --enable-afxdp.
'--disable-afxdp' added to the debian and fedora package builds
to keep predictable behavior.
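The three modes can be summarised with a small Python sketch. It only
models the decision described above and is not the autoconf macro;
the function name and the returned tuple are hypothetical.

```python
def resolve_afxdp(option, deps_available):
    """Model the configure decision for AF_XDP support.

    option is "yes" (--enable-afxdp), "no" (--disable-afxdp) or
    "auto" (plain ./configure or --enable-afxdp=auto).
    Returns (enabled, warning), where warning is None or a string.
    """
    if option == "no":
        return False, None  # dependencies are not even checked
    if option == "yes":
        if not deps_available:
            raise RuntimeError("AF_XDP dependencies are not available")
        return True, None
    if option == "auto":
        if deps_available:
            return True, None
        return False, "AF_XDP dependencies not found, building without"
    raise ValueError("unknown option: %r" % (option,))
```

Only the explicit "yes" mode turns missing dependencies into a hard
failure; "auto" degrades to a build without AF_XDP and a warning.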
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
Necessary bits were removed from the kernel's libbpf in the 6.0 release,
so the instructions on how to build libbpf from kernel sources are
now incorrect. Suggest using libbpf and libxdp packaged by
distributions instead.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
AF_XDP bits were removed from the kernel's libbpf in 6.0. libbpf
and libxdp are now the primary way to build AF_XDP applications.
Most modern distributions are already packaging some version
of libbpf, so it's better to test building with it instead
of building an old unsupported kernel tree.
Ubuntu started packaging libxdp only in 22.10, so not using
it for now.
Kernel build infrastructure in CI scripts is not needed anymore.
Removed.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
Sparse complains about 64M umem initialization. Hide it from
the checker instead of disabling a warning globally.
SPARSE_FLAGS are kept in the CI script even though they are
empty at the moment.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
AF_XDP functions were deprecated in libbpf 0.7 and moved to libxdp.
Functions bpf_get/set_link_xdp_id() were deprecated in libbpf 0.8
and replaced with bpf_xdp_query_id() and bpf_xdp_attach/detach().
Updating configuration and source code to accommodate above changes
and allow building OVS with AF_XDP support on newer systems:
- Checking the version of libbpf by detecting availability
of bpf_xdp_detach.
- Checking availability of the libxdp in a system by looking
for a library providing libxdp_strerror(), if libbpf is
newer than 0.6. And checking for xsk.h header provided by
libxdp-dev[el].
- Use xsk.h from libbpf if it is older than 0.7 and not linking
with libxdp in this case as there are known incompatible
versions of libxdp in distributions.
- Check for the NEED_WAKEUP feature replaced with direct checking
in the source code if XDP_USE_NEED_WAKEUP is defined.
- Checking availability of bpf_xdp_query_id and bpf_xdp_detach
and using them instead of deprecated APIs. Fall back to old
functions if not found.
- Dropped LIBBPF_LDADD variable as it makes library and function
detection much harder without providing any actual benefits.
AC_SEARCH_LIBS is used instead and it allows use of AC_CHECK_FUNCS.
- Header includes moved around to files where they are actually used.
- Removed libelf dependency as it is not really used.
With these changes it should be possible to build OVS with either:
- libbpf built from the kernel sources (5.19 or older).
- libbpf < 0.7 provided in distributions.
- libxdp and libbpf >= 0.7 provided in newer distributions.
While it is technically possible to build with libbpf 0.7+ without
libxdp at the moment we're not allowing that for a few reasons.
First, required functions in libbpf are deprecated and can be removed
in future releases. Second, support for all these combinations makes
the detection code fairly complex.
AFAIK, most of the distributions packaging libbpf 0.7+ do package
libxdp as well.
libxdp added as a build dependency for Fedora build since all
supported versions of Fedora are packaging this library.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
GCC 11+ generates a warning:
In file included from lib/netdev-linux-private.h:30,
from lib/netdev-afxdp.c:19:
In function 'dp_packet_delete',
inlined from 'dp_packet_delete' at lib/dp-packet.h:246:1,
inlined from 'dp_packet_batch_add__' at lib/dp-packet.h:775:9,
inlined from 'dp_packet_batch_add' at lib/dp-packet.h:783:5,
inlined from 'netdev_afxdp_rxq_recv' at lib/netdev-afxdp.c:898:9:
lib/dp-packet.h:260:9: warning: 'free' called on pointer
'*umem.xpool.array' with nonzero offset [8, 2558044588346441168]
[-Wfree-nonheap-object]
260 | free(b);
| ^~~~~~~
But it is a false positive since the code path is not possible.
In this call chain the packet will always have source DPBUF_AFXDP
and the free() will never be called. GCC doesn't see that, because
initialization function dp_packet_use_afxdp() is part of a different
translation unit.
Disabling a warning in this particular place to avoid build failures.
Older versions of clang do not have the -Wfree-nonheap-object, so we
need to additionally guard the pragmas. Clang is using GCC pragmas
and complains about unknown ones.
Reported-at: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108187
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
For GCC builds we're overriding --disable-ssl or --enable-shared
options set up in the GHA yml file.
Fix that by adding to EXTRA_OPTS instead.
Fixes: 2581b0ad1159 ("travis: Combine kernel builds.")
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
Currently, pmd_rebalance_dry_run() calculates the overall variance of
all pmds regardless of their numa location. The overall result may
hide imbalance within an individual numa node.
Consider the following case. Numa0 is free because VMs on numa0
are not sending pkts, while numa1 is busy. Within numa1, pmd
workloads are not balanced. Obviously, moving 500 kpps of workload from
pmd 126 to pmd 62 will make numa1 much more balanced. For numa1
the variance improvement will be almost 100%, because after the rebalance
each pmd in numa1 holds the same workload (variance ~= 0). But the overall
variance improvement is only about 20%, which may not trigger auto_lb.
```
numa_id core_id kpps
0 30 0
0 31 0
0 94 0
0 95 0
1 126 1500
1 127 1000
1 63 1000
1 62 500
```
auto_lb doesn't balance workload across numa nodes, so it makes
more sense to calculate variance improvement per numa node.
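The numbers in the example can be checked with a short Python sketch.
It uses the population variance and hypothetical function names; it
is not the code of pmd_rebalance_dry_run().

```python
def variance(loads):
    """Population variance of a list of per-pmd loads (in kpps)."""
    mean = sum(loads) / len(loads)
    return sum((x - mean) ** 2 for x in loads) / len(loads)


def improvement(before, after):
    """Relative variance improvement, as a fraction between 0 and 1."""
    var_before = variance(before)
    if var_before == 0:
        return 0.0
    return (var_before - variance(after)) / var_before


numa0 = [0, 0, 0, 0]
numa1_before = [1500, 1000, 1000, 500]
# Move 500 kpps from pmd 126 to pmd 62.
numa1_after = [1000, 1000, 1000, 1000]

overall = improvement(numa0 + numa1_before, numa0 + numa1_after)
per_numa = improvement(numa1_before, numa1_after)
```

Here overall comes out at about 0.2 (variance 312500 before, 250000
after) while per_numa is 1.0, which matches the 20% and 100% figures
in the description.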
Signed-off-by: Cheng Li <lic121@chinatelecom.cn>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Co-authored-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>