| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
Command with empty string following "dpctl/ct-get-limits zone="
such as "ovs-appctl dpctl/ct-get-limits zone=" will cause
parse_ct_limit_zones() dereferencing null.
Signed-off-by: Zhiqi Chen <chenzhiqi.123@bytedance.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When doing performance testing with OVS v3.1 we ran into a deadlock
situation with the netdev_hmap_rwlock read/write lock. After some
debugging, it was discovered that the netdev_hmap_rwlock read lock
was taken recursively. And well in the following sequence of events:
netdev_ports_flow_get()
It takes the read lock, while it walks all the ports
in the port_to_netdev hmap and calls:
- netdev_flow_get() which will call:
- netdev_tc_flow_get() which will call:
- netdev_ifindex_to_odp_port()
This function also takes the same read lock to
walk the ifindex_to_port hmap.
In OVS a read/write lock does not support recursive readers. For details
see the comments in ovs-thread.h. If you do this, it will lock up,
mainly due to OVS setting the PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP
attribute to the lock.
The solution with this patch is to use two separate read/write
locks, with an order guarantee to avoid another potential deadlock.
Fixes: 9fe21a4fc12a ("netdev-offload: replace netdev_hmap_mutex to netdev_hmap_rwlock")
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2182541
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Sometimes there is a need to clean empty chains as done in
delete_chains_from_netdev(). The cited commit doesn't remove
the chain completely which cause adding ingress_block later to fail.
This can be reproduced with adding bond as ovs port which makes ovs
use ingress_block for it.
While at it add the netdev name that fails to the log.
Fixes: e1e5eac5b016 ("tc: Add TCA_KIND flower to delete and get operation to avoid rtnl_lock().")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the specific call to dpif_netlink_dp_transact() (line 398) in
dpif_netlink_open(), the 'dp' content is not being used in the branch
when no error is returned (starting line 430). Furthermore, the 'dp'
and 'buf' variables are overwritten later in this same branch when a
new netlink request is sent (line 437), which results in a memory leak.
Reported by Address Sanitizer.
Indirect leak of 1024 byte(s) in 1 object(s) allocated from:
0 0x7fe09d3bfe70 in __interceptor_malloc (/usr/lib64/libasan.so.4+0xe0e70)
1 0x8759be in xmalloc__ lib/util.c:140
2 0x875a9a in xmalloc lib/util.c:175
3 0x7ba0d2 in ofpbuf_init lib/ofpbuf.c:141
4 0x7ba1d6 in ofpbuf_new lib/ofpbuf.c:169
5 0x9057f9 in nl_sock_transact lib/netlink-socket.c:1113
6 0x907a7e in nl_transact lib/netlink-socket.c:1817
7 0x8b5abe in dpif_netlink_dp_transact lib/dpif-netlink.c:5007
8 0x89a6b5 in dpif_netlink_open lib/dpif-netlink.c:398
9 0x5de16f in do_open lib/dpif.c:348
10 0x5de69a in dpif_open lib/dpif.c:393
11 0x5de71f in dpif_create_and_open lib/dpif.c:419
12 0x47b918 in open_dpif_backer ofproto/ofproto-dpif.c:764
13 0x483e4a in construct ofproto/ofproto-dpif.c:1658
14 0x441644 in ofproto_create ofproto/ofproto.c:556
15 0x40ba5a in bridge_reconfigure vswitchd/bridge.c:885
16 0x41f1a9 in bridge_run vswitchd/bridge.c:3313
17 0x42d4fb in main vswitchd/ovs-vswitchd.c:132
18 0x7fe09cc03c86 in __libc_start_main (/usr/lib64/libc.so.6+0x25c86)
Fixes: b841e3cd4a28 ("dpif-netlink: Fix feature negotiation for older kernels.")
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
| |
An unnecessarily overflow would occurs when the 'value' is longer than
4294967295. So it's required to check ranges to avoid uint32_t overflow.
Reported-by: Nan Zhou <zhounan14@huawei.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
| |
The name of the current thread consists of a name with a maximum
length of 16 bytes and a thread ID. The final name may be longer
than 16 bytes. If the name is longer than 16 bytes, the thread
name will fail to be set
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Songtao Zhan <zhanst1@chinatelecom.cn>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The OpenFlow15 Packet-Out message contains the match instead of the
in_port. The flow.tunnel.metadata.tab is not inited but used in the
loop of tun_metadata_to_nx_match.
The coredump gdb backtrace is:
0 memcpy_from_metadata (dst=0x2f060, src=0x30880, loc=0x10) at lib/tun-metadata.c:467
1 metadata_loc_from_match_read (match=0x30598, is_masked=<..>,
mask=0x30838, idx=0, map=0x0)
at lib/tun-metadata.c:865
2 metadata_loc_from_match_read (is_masked=<...>, mask=0x30838, idx=0,
match=0x30598, map=0x0)
at lib/tun-metadata.c:854
3 tun_metadata_to_nx_match (b=0x892260, oxm=OFP15_VERSION, match=0x30598)
at lib/tun-metadata.c:888
4 nx_put_raw (b=0x892260, oxm=OFP15_VERSION, match=0x30598,
cookie=<...>, cookie=0, cookie_mask=<...>, cookie_mask=0)
at lib/nx-match.c:1186
5 oxm_put_match (b=0x892260, match=0x30598, version=OFP15_VERSION)
at lib/nx-match.c:1343
6 ofputil_encode_packet_out (po=0x30580, protocol=<...>) at lib/ofp-packet.c:1226
7 process_packet_in (sw=0x891d70, oh=<...>) at lib/learning-switch.c:619
8 lswitch_process_packet (msg=0x892210, sw=0x891d70) at lib/learning-switch.c:374
9 lswitch_run (sw=0x891d70) at lib/learning-switch.c:324
10 main (argc=<...>, argv=<...>) at utilities/ovs-testcontroller.c:180
Fix that by initing the flow metadata.
Fixes: 35eb6326d5d0 ("ofp-util: Add flow metadata to ofputil_packet_out")
Signed-off-by: Faicker Mo <faicker.mo@ucloud.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists
with rculists.") the sweep interval changed as well as the constraints
related to the sweeper.
Being able to change the default reschedule time may be convenient in
some conditions, like debugging.
This patch introduces new commands allowing to get and set the sweep
interval in ms.
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The ofproto layer currently treats nw_proto field as overloaded to mean
both that a proper nw layer exists, as well as the value contained in
the header for the nw proto. However, this is incorrect behavior as
relevant standards permit that any value, including '0' should be treated
as a valid value.
Because of this overload, when the ofproto layer builds action list for
a packet with nw_proto of 0, it won't build the complete action list that
we expect to be built for the packet. That will cause a bad behavior
where all packets passing the datapath will fall into an incomplete
action set.
The fix here is to unwildcard nw_proto, allowing us to preserve setting
actions for protocols which we know have support for the actions we
program. This means that a traffic which contains nw_proto == 0 cannot
cause connectivity breakage with other traffic on the link.
Reported-by: David Marchand <dmarchand@redhat.com>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2134873
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
| |
Declaration of 'struct conn' will not be visible outside of this function.
Declaration of 'struct conntrack' will not be visible outside of this function.
Declaration of 'struct timeout_policy' will not be visible outside of this function.
Signed-off-by: Lin Huang <linhuang@ruijie.com.cn>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The device may be deleted and added with ifindex changed.
The tc rules on the device will be deleted if the device is deleted.
The func tc_del_filter will fail when flow del. The mapping of
ufid to tc will not be deleted.
The traffic will trigger the same flow(with same ufid) to put to tc
on the new device. Duplicated ufid mapping will be added.
If the hashmap is expanded, the old mapping entry will be the first entry,
and now the dp flow can't be deleted.
Signed-off-by: Faicker Mo <faicker.mo@ucloud.cn>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The commit b8bf410a5 [0] broke the `ovs-vsctl add` command
which now overwrites the value if it existed already.
This patch reverts the code around the `cmd_add` function
to restore the previous behavior. It also adds testing coverage
for this functionality.
[0] https://github.com/openvswitch/ovs/commit/b8bf410a5c94173da02279b369d75875c4035959
Fixes: b8bf410a5c94 ("db-ctl-base: Use partial map/set updates for last add/set commands.")
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2182767
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Daniel Alvarez Sanchez <dalvarez@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
UB Sanitizer report:
lib/netdev-offload-tc.c:1276:19: runtime error: load of misaligned
address 0x7f74e801976c for type 'union ovs_u128', which requires 8 byte
alignment
0 in netdev_tc_flow_dump_next lib/netdev-offload-tc.c:1276
1 in netdev_flow_dump_next lib/netdev-offload.c:303
2 in dpif_netlink_flow_dump_next lib/dpif-netlink.c:1921
[...]
Fixes: 8f7620e6a406 ("netdev-tc-offloads: Implement netdev flow dump api using tc interface")
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
| |
This patch adds ODP actions for SRv6 and its tests.
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SRv6 (Segment Routing IPv6) tunnel vport is responsible
for encapsulation and decapsulation the inner packets with
IPv6 header and an extended header called SRH
(Segment Routing Header). See spec in:
https://datatracker.ietf.org/doc/html/rfc8754
This patch implements SRv6 tunneling in userspace datapath.
It uses `remote_ip` and `local_ip` options as with existing
tunnel protocols. It also adds a dedicated `srv6_segs` option
to define a sequence of routers called segment list.
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Checks whether IPPROTO_ROUTING exists in the IPv6 extension headers.
If it exists, the first address is retrieved.
If NULL is specified for "frag_hdr" and/or "rt_hdr", those addresses in
the header are not reported to the caller. Of course, "frag_hdr" and
"rt_hdr" are properly parsed inside this function.
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
| |
In some tunnels, inner packet needs to support both IPv4
and IPv6. Therefore, this patch improves to allow two
protocols to be tied together in one tunneling.
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the current implementation the available CPUs will not be read
until 10s have passed since the system's boot. For systems that boot
faster, this can make ovs-vswitchd create fewer handlers than necessary
for some time.
Fixes: 0d23948a598a ("ovs-thread: Detect changes in number of CPUs.")
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2180460
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Michael Santana <msantana@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Ensure at least 1 handler is created even if something goes wrong during
cpu detection or prime numer calculation.
Fixes: a5cacea5f988 ("handlers: Create additional handler threads when using CPU isolation.")
Suggested-by: Aaron Conole <aconole@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Michael Santana <msantana@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
| |
parse_tc_flower_to_actions() was not reporting errors, which would
cause parse_tc_flower_to_match() to ignore them.
Fixes: dd03672f7bbb ("netdev-offload-tc: Move flower_to_match action handling to isolated function.")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Open vSwitch generally tries to let the underlying operating system
managed the low level details of hardware, for example DMA mapping,
bus arbitration, etc. However, when using DPDK, the underlying
operating system yields control of many of these details to userspace
for management.
In the case of some DPDK port drivers, configuring rte_flow or even
allocating resources may require access to iopl/ioperm calls, which
are guarded by the CAP_SYS_RAWIO privilege on linux systems. These
calls are dangerous, and can allow a process to completely compromise
a system. However, they are needed in the case of some userspace
driver code which manages the hardware (for example, the mlx
implementation of backend support for rte_flow).
Here, we create an opt-in flag passed to the command line to allow
this access. We need to do this before ever accessing the database,
because we want to drop all privileges asap, and cannot wait for
a connection to the database to be established and functional before
dropping. There may be distribution specific ways to do capability
management as well (using for example, systemd), but they are not
as universal to the vswitchd as a flag.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Gaetan Rivet <gaetanr@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Specifying datapath with "dpctl/flush-conntrack" didn't
work as expected and caused error:
ovs-dpctl: field system@ovs-system missing value (Invalid argument)
To prevent that, check if we have datapath as first argument
and use it accordingly.
Also add couple of test cases to ensure that everything works as
expected.
Fixes: a9ae73b916ba ("ofp, dpif: Allow CT flush based on partial match.")
Signed-off-by: Ales Musil <amusil@redhat.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can use the "ip route add ... src ..." command to set the preferred
source address for each entry in the kernel FIB. OVS has a mechanism to
cache the FIB, but the preferred source address is ignored and
calculated with its own logic. This patch resolves the difference
between kernel FIB and OVS route table cache by retrieving the
RTA_PREFSRC attribute of Netlink messages.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When adding a route with ovs/route/add command, the source address
in "ovs_router_entry" structure is always the FIRST address that the
interface has. See "ovs_router_get_netdev_source_address"
function for more information.
If an interface has multiple ipv4 and/or ipv6 addresses, there are use
cases where the user wants to control the source address. This patch
therefore addresses this issue by adding a src parameter.
Note that same constraints also exist when caching routes from
Kernel FIB with Netlink, but are not dealt with in this patch.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Fixed the manual page to indicate that both IPv4/IPv6
are supported. Also added missing pkt_mark on one side
and fixed the "gw" and "bridge" notation quirks.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch cleans up the parser to accept pkt_mark and gw in any order.
pkt_mark and gw are normally expected to be specified exactly once.
However, as with other tools, if specified multiple times, the last
specification is used. Also, pkt_mark and gw have separate prefix
strings so they can be parsed in any order.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
| |
This is useful in test cases where multiple IPv4/IPv6 addresses
are assigned together.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
| |
ATOMIC_VAR_INIT has a trivial definition
`#define ATOMIC_VAR_INIT(value) (value)`,
is deprecated in C17/C++20, and will be removed in newer standards in
newer GCC/Clang (e.g. https://reviews.llvm.org/D144196).
Signed-off-by: Fangrui Song <maskray@google.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the recent Antrea project testing, some port could not be created
on Windows.
When doing debug, our team found there is one case happening when multiple
ports are waiting for be created with correct port number.
Some system type port will be created netdev successfully and it will cause
conflict as in the dpif side it will be internal type. So finally the port
will be created failed and it could not be easily recovered.
With the patch, on Windows the netdev creating will be blocked for system
type when the ovs_tyep got on dpif is internal. More detailed case description
is in the reported issue No.262 with link below.
Reported-at:https://github.com/openvswitch/ovs-issues/issues/262
Signed-off-by: Wilson Peng <pweisong@vmware.com>
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the ukey's action set changes, it could cause the flow to use a
different datapath, for example, when it moves from tc to kernel.
This will cause the the cached previous datapath statistics to be used.
This change will reset the cached statistics when a change in
datapath is discovered.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Flow lookup doesn't include masks of the final stage in a resulting
flow wildcards in case that stage had L4 ports match. Only the result
of ports trie lookup is added to the mask. It might be sufficient in
many cases, but it's not correct, because ports trie is not how we
decided that the packet didn't match in this subtable. In fact, we
used a full subtable mask in order to determine that, so all the
subtable mask bits has to be added.
Ports trie can still be used to adjust ports' mask, but it is not
sufficient to determine that the packet didn't match.
Assuming we have following 2 OpenFlow rules on the bridge:
table=0, priority=10,tcp,tp_dst=80,tcp_flags=+psh actions=drop
table=0, priority=0 actions=output(1)
The first high priority rule supposed to drop all the TCP data traffic
sent on port 80. The handshake, however, is allowed for forwarding.
Both 'tcp_flags' and 'tp_dst' are on the final stage in the flow.
Since the stage mask from that stage is not incorporated into the flow
wildcards and only ports mask is getting updated, we have the following
megaflow for the SYN packet that has no match on 'tcp_flags':
$ ovs-appctl ofproto/trace br0 "in_port=br0,tcp,tp_dst=80,tcp_flags=syn"
Megaflow: recirc_id=0,eth,tcp,in_port=LOCAL,nw_frag=no,tp_dst=80
Datapath actions: 1
If this flow is getting installed into datapath flow table, all the
packets for port 80, regardless of TCP flags, will be forwarded.
Incorporating all the looked at bits from the final stage into the
stages map in order to get all the necessary wildcards. Ports mask
has to be updated as a last step, because it doesn't cover the full
64-bit slot in the flowmap.
With this change, in the example above, OVS is producing correct
flow wildcards including match on TCP flags:
Megaflow: recirc_id=0,eth,tcp,in_port=LOCAL,nw_frag=no,tp_dst=80,tcp_flags=-psh
Datapath actions: 1
This way only -psh packets will be forwarded, as expected.
This issue affects all other fields on stage 4, not only TCP flags.
Tests included to cover tcp_flags, nd_target and ct_tp_src/dst.
First two are frequently used, ct ones are sharing the same flowmap
slot with L4 ports, so important to test.
Before the pre-computation of stage masks, flow wildcards were updated
during lookup, so there was no issue. The bits of the final stage was
lost with introduction of 'stages_map'.
Recent adjustment of segment boundaries exposed 'tcp_flags' to the issue.
Reported-at: https://github.com/openvswitch/ovs-issues/issues/272
Fixes: ca44218515f0 ("classifier: Adjust segment boundary to execute prerequisite processing.")
Fixes: fa2fdbf8d0c1 ("classifier: Pre-compute stage masks.")
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
when a exp_list contains more than the clean_end's number of nodes,
and these nodes will not expire immediately. Then, every times we
call conntrack_clean, it use the same next_sweep to get exp_list.
Actually, we should add i every times after we call ct_sweep.
Fixes: 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists with rculists.")
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Liang Mancang <liangmc1@chinatelecom.cn>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
| |
Adds jitter option to enable emulating latency fluctuation with netem.
Submitted-at: https://github.com/openvswitch/ovs/pull/407
Signed-off-by: Miika Petäjäniemi <miika.petajaniemi@solita.fi>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
| |
The inner header was not handled properly.
Simplify the code which allows proper handling
of the inner headers.
Reported-at: https://bugzilla.redhat.com/2137754
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ales Musil <amusil@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 08e9e5337383 fixed proper initialization of the dns-resolve
module, and made DNS resolution asynchronous.
A side effect of that change revealed a long standing logic bug
which broke ovsdb-server listener configuration using DNS names.
Previously this worked because the DNS resolution would block,
now that DNS resolution is asynchronous the code before this
change would assume the error from jsonrpc_pstream_open meant
the remote was a specification for an active outgoing
connection, even when that was not the case.
To fix this a couple of changes was made to socket-util:
1) Pass optional result of dns resolution from inet_parse_passive.
When (re-)configuring listeners that use DNS names, we may need
to know whether the provided connection string is invalid or if
the provided DNS name has finished resolving.
2) Check dns resolution status in inet_open_passive.
If the connection string is valid, and contains a DNS name,
inet_open_passive will now return -EAGAIN if dns resolution
failed. DNS resolution failure may either mean the asynchronous
resolver has not completed yet, or that the name does not resolve.
Reported-at: https://bugs.launchpad.net/bugs/1998781
Fixes: 08e9e5337383 ("ovsdb: raft: Fix inability to read the database with DNS host names.")
Fixes: 771680d96fb6 ("DNS: Add basic support for asynchronous DNS resolving")
Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a tc flow was installed but has not yet been used, report it as such.
In addition, add a delay to the "IGMP - flood under normal action" test
case to make it work with many repetitions. This delay is also present
in other ICMP/IGMP tests.
Fixes: f98e418fbdb6 ("tc: Add tc flower functions")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make the order of the Netlink attributes for odp_flow_key_from_flow__()
the same as the kernel will return them.
This will make sure the attributes displayed in the dpctl/dump-flows
output appear in the same order for all datapath.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
tc does not support conntrack ALGs. Even worse, with tc enabled, they
should not be used/configured at all. This is because even though TC
will ignore the rules with ALG configured, i.e., they will flow through
the kernel module, return traffic might flow through a tc conntrack
rule, and it will not invoke the ALG helper.
Fixes: 576126a931cd ("netdev-offload-tc: Add conntrack support")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
tc was not setting the OVS_CT_ATTR_FORCE_COMMIT flag when a forced
commit was requested. This patch will fix this.
Fixes: 576126a931cd ("netdev-offload-tc: Add conntrack support")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Include and run the system-traffic.at tests as part of the system offload
testsuite. Exclude all the tests that will not run without any special
modifications.
Lowered log level for "recirc_id sharing not supported" message, so tests
will not fail with older kernels. This is not an error level message, but
should be debug, like all other, EOPNOTSUPP, related log messages.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
UB Sanitizer report:
lib/hash.h:219:17: runtime error: load of misaligned address
0x7ffc164a88b4 for type 'const uint64_t', which requires 8 byte
alignment
#0 in hash_words_inline lib/hash.h:219
#1 in hash_words lib/hash.h:297
[...]
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
UB Sanitizer report:
lib/dp-packet.h:587:22: runtime error: member access within misaligned
address 0x000001ecde10 for type 'struct dp_packet', which requires 64
byte alignment
#0 in dp_packet_set_base lib/dp-packet.h:587
#1 in dp_packet_use__ lib/dp-packet.c:46
#2 in dp_packet_use lib/dp-packet.c:60
#3 in dp_packet_init lib/dp-packet.c:126
#4 in dp_packet_new lib/dp-packet.c:150
[...]
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
UB Sanitizer report:
lib/netdev-dummy.c:197:15: runtime error: member access within
misaligned address 0x00000217a7f0 for type 'struct
dummy_packet_stream', which requires 64 byte alignment
^
#0 dummy_packet_stream_init lib/netdev-dummy.c:197
#1 dummy_packet_stream_create lib/netdev-dummy.c:208
#2 dummy_packet_conn_set_config lib/netdev-dummy.c:436
[...]
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a flow gets modified, i.e. the actions are changes, the tc layer will
remove, and re-add the flow. This is causing all the counters to be reset.
This patch will remember the previous tc counters and adjust any requests
for statistics. This is done in a similar way as the rte_flow implementation.
It also updates the check_pkt_len tc test to purge the flows, so we do
not use existing updated tc flow counters, but start with fresh installed
set of datapath flows.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
| |
Add support to count upcall packets per port, both succeed and failed,
which is a better way to see how many packets upcalled on each interface.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: wangchuanlei <wangchuanlei@inspur.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A long long time ago, an effort was made to make tc flower
rtnl_lock() free. However, on the OVS part we forgot to add
the TCA_KIND "flower" attribute, which tell the kernel to skip
the lock. This patch corrects this by adding the attribute for
the delete and get operations.
The kernel code calls tcf_proto_is_unlocked() to determine the
rtnl_lock() is needed for the specific tc protocol. It does this
in the tc_new_tfilter(), tc_del_tfilter(), and in tc_get_tfilter().
If the name is not set, tcf_proto_is_unlocked() will always return
false. If set, the specific protocol is queried for unlocked support.
Fixes: f98e418fbdb6 ("tc: Add tc flower functions")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
UndefinedBehaviorSanitizer:
lib/netdev-offload-tc.c:1356:50: runtime error:
member access within misaligned address 0x60700001a89c for type
'const struct (unnamed struct at lib/netdev-offload-tc.c:1350:27)',
which requires 8 byte alignment 0x60700001a89c: note: pointer points here
24 00 04 00 01 00 00 05 00 00 0d 00 0a 00 00 00 00 00 00 00 ...
^
0 0xd5d183 in parse_put_flow_ct_action lib/netdev-offload-tc.c:1356:50
1 0xd5783f in netdev_tc_parse_nl_actions lib/netdev-offload-tc.c:2015:19
2 0xd4027c in netdev_tc_flow_put lib/netdev-offload-tc.c:2355:11
3 0x9666d7 in netdev_flow_put lib/netdev-offload.c:318:14
4 0xcd4c0a in parse_flow_put lib/dpif-netlink.c:2297:11
5 0xcd4c0a in try_send_to_netdev lib/dpif-netlink.c:2384:15
6 0xcd4c0a in dpif_netlink_operate lib/dpif-netlink.c:2455:23
7 0x87d40e in dpif_operate lib/dpif.c:1372:13
8 0x6d43e9 in handle_upcalls ofproto/ofproto-dpif-upcall.c:1674:5
9 0x6d43e9 in recv_upcalls ofproto/ofproto-dpif-upcall.c:905:9
10 0x6cf6ea in udpif_upcall_handler ofproto/ofproto-dpif-upcall.c:801:13
11 0xb6d7ea in ovsthread_wrapper lib/ovs-thread.c:423:12
12 0x7f5ccf017801 in start_thread
13 0x7f5ccefb744f in __GI___clone3
Fixes: 9221c721bec0 ("netdev-offload-tc: Add conntrack label and mark support")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
pmd-perf-show with pmd-perf-metrics=true displays a histogram
with averages. However, averages were not displayed when there
is no iterations.
They will be all zero so it is not hiding useful information
but the stats look incomplete without them, especially when
they are displayed for some PMD thread cores and not others.
The histogram print is large and this is just an extra couple
of lines, so might as well print them all the time to ensure
that the user does not think there is something missing from
the display.
Before patch:
Histograms
cycles/it
499 0
716 0
1025 0
1469 0
<snip>
After patch:
Histograms
cycles/it
499 0
716 0
1025 0
1469 0
<snip>
---------------
cycles/it
0
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some stats in pmd-perf-show don't check for divide by zero
which results in not a number (-nan).
This is a normal case for some of the stats when there are
no Rx queues assigned to the PMD thread core.
It is not obvious what -nan is to a user so add a check for
divide by zero and set stat to 0 if present.
Before patch:
pmd thread numa_id 1 core_id 9:
Iterations: 0 (-nan us/it)
- Used TSC cycles: 0 ( 0.0 % of total cycles)
- idle iterations: 0 ( -nan % of used cycles)
- busy iterations: 0 ( -nan % of used cycles)
After patch:
pmd thread numa_id 1 core_id 9:
Iterations: 0 (0.00 us/it)
- Used TSC cycles: 0 ( 0.0 % of total cycles)
- idle iterations: 0 ( 0.0 % of used cycles)
- busy iterations: 0 ( 0.0 % of used cycles)
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
rte_pktmbuf_free_bulk() function was introduced in 19.11 and became
stable in 21.11. Use it to free arrays of mbufs instead of freeing
packets one by one.
In simple V2V testing with 64B packets, 2 PMD threads and bidirectional
traffic this change improves performance by 3.5 - 4.5 %.
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|