summaryrefslogtreecommitdiff
path: root/lib
Commit message (Collapse)AuthorAgeFilesLines
* dpctl: Fix dereferencing null pointer in parse_ct_limit_zones().Zhiqi Chen2023-05-111-2/+3
| | | | | | | | | Command with empty string following "dpctl/ct-get-limits zone=" such as "ovs-appctl dpctl/ct-get-limits zone=" will cause parse_ct_limit_zones() dereferencing null. Signed-off-by: Zhiqi Chen <chenzhiqi.123@bytedance.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev-offload: Fix deadlock/recursive use of the netdev_hmap_rwlock rwlock.Eelco Chaudron2023-05-101-32/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When doing performance testing with OVS v3.1 we ran into a deadlock situation with the netdev_hmap_rwlock read/write lock. After some debugging, it was discovered that the netdev_hmap_rwlock read lock was taken recursively. And well in the following sequence of events: netdev_ports_flow_get() It takes the read lock, while it walks all the ports in the port_to_netdev hmap and calls: - netdev_flow_get() which will call: - netdev_tc_flow_get() which will call: - netdev_ifindex_to_odp_port() This function also takes the same read lock to walk the ifindex_to_port hmap. In OVS a read/write lock does not support recursive readers. For details see the comments in ovs-thread.h. If you do this, it will lock up, mainly due to OVS setting the PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP attribute to the lock. The solution with this patch is to use two separate read/write locks, with an order guarantee to avoid another potential deadlock. Fixes: 9fe21a4fc12a ("netdev-offload: replace netdev_hmap_mutex to netdev_hmap_rwlock") Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2182541 Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* tc: Fix cleaning chains.Roi Dayan2023-04-282-4/+11
| | | | | | | | | | | | | Sometimes there is a need to clean empty chains as done in delete_chains_from_netdev(). The cited commit doesn't remove the chain completely which cause adding ingress_block later to fail. This can be reproduced with adding bond as ovs port which makes ovs use ingress_block for it. While at it add the netdev name that fails to the log. Fixes: e1e5eac5b016 ("tc: Add TCA_KIND flower to delete and get operation to avoid rtnl_lock().") Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpif-netlink: Fix memory leak dpif_netlink_open().Yunjian Wang2023-04-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the specific call to dpif_netlink_dp_transact() (line 398) in dpif_netlink_open(), the 'dp' content is not being used in the branch when no error is returned (starting line 430). Furthermore, the 'dp' and 'buf' variables are overwritten later in this same branch when a new netlink request is sent (line 437), which results in a memory leak. Reported by Address Sanitizer. Indirect leak of 1024 byte(s) in 1 object(s) allocated from: 0 0x7fe09d3bfe70 in __interceptor_malloc (/usr/lib64/libasan.so.4+0xe0e70) 1 0x8759be in xmalloc__ lib/util.c:140 2 0x875a9a in xmalloc lib/util.c:175 3 0x7ba0d2 in ofpbuf_init lib/ofpbuf.c:141 4 0x7ba1d6 in ofpbuf_new lib/ofpbuf.c:169 5 0x9057f9 in nl_sock_transact lib/netlink-socket.c:1113 6 0x907a7e in nl_transact lib/netlink-socket.c:1817 7 0x8b5abe in dpif_netlink_dp_transact lib/dpif-netlink.c:5007 8 0x89a6b5 in dpif_netlink_open lib/dpif-netlink.c:398 9 0x5de16f in do_open lib/dpif.c:348 10 0x5de69a in dpif_open lib/dpif.c:393 11 0x5de71f in dpif_create_and_open lib/dpif.c:419 12 0x47b918 in open_dpif_backer ofproto/ofproto-dpif.c:764 13 0x483e4a in construct ofproto/ofproto-dpif.c:1658 14 0x441644 in ofproto_create ofproto/ofproto.c:556 15 0x40ba5a in bridge_reconfigure vswitchd/bridge.c:885 16 0x41f1a9 in bridge_run vswitchd/bridge.c:3313 17 0x42d4fb in main vswitchd/ovs-vswitchd.c:132 18 0x7fe09cc03c86 in __libc_start_main (/usr/lib64/libc.so.6+0x25c86) Fixes: b841e3cd4a28 ("dpif-netlink: Fix feature negotiation for older kernels.") Reviewed-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofp-parse: Check ranges on string to uint32_t conversion.Yunjian Wang2023-04-251-5/+2
| | | | | | | | | | | An unnecessarily overflow would occurs when the 'value' is longer than 4294967295. So it's required to check ranges to avoid uint32_t overflow. Reported-by: Nan Zhou <zhounan14@huawei.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* util: Fix an issue that thread name cannot be set.Songtao Zhan2023-04-251-0/+6
| | | | | | | | | | | The name of the current thread consists of a name with a maximum length of 16 bytes and a thread ID. The final name may be longer than 16 bytes. If the name is longer than 16 bytes, the thread name will fail to be set Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Songtao Zhan <zhanst1@chinatelecom.cn> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* learning-switch: Fix coredump of OpenFlow15 learning-switch.Faicker Mo2023-04-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The OpenFlow15 Packet-Out message contains the match instead of the in_port. The flow.tunnel.metadata.tab is not inited but used in the loop of tun_metadata_to_nx_match. The coredump gdb backtrace is: 0 memcpy_from_metadata (dst=0x2f060, src=0x30880, loc=0x10) at lib/tun-metadata.c:467 1 metadata_loc_from_match_read (match=0x30598, is_masked=<..>, mask=0x30838, idx=0, map=0x0) at lib/tun-metadata.c:865 2 metadata_loc_from_match_read (is_masked=<...>, mask=0x30838, idx=0, match=0x30598, map=0x0) at lib/tun-metadata.c:854 3 tun_metadata_to_nx_match (b=0x892260, oxm=OFP15_VERSION, match=0x30598) at lib/tun-metadata.c:888 4 nx_put_raw (b=0x892260, oxm=OFP15_VERSION, match=0x30598, cookie=<...>, cookie=0, cookie_mask=<...>, cookie_mask=0) at lib/nx-match.c:1186 5 oxm_put_match (b=0x892260, match=0x30598, version=OFP15_VERSION) at lib/nx-match.c:1343 6 ofputil_encode_packet_out (po=0x30580, protocol=<...>) at lib/ofp-packet.c:1226 7 process_packet_in (sw=0x891d70, oh=<...>) at lib/learning-switch.c:619 8 lswitch_process_packet (msg=0x892210, sw=0x891d70) at lib/learning-switch.c:374 9 lswitch_run (sw=0x891d70) at lib/learning-switch.c:324 10 main (argc=<...>, argv=<...>) at utilities/ovs-testcontroller.c:180 Fix that by initing the flow metadata. Fixes: 35eb6326d5d0 ("ofp-util: Add flow metadata to ofputil_packet_out") Signed-off-by: Faicker Mo <faicker.mo@ucloud.cn> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ovs-dpctl: Add new command dpctl/ct-[sg]et-sweep-interval.Paolo Valerio2023-04-0610-1/+128
| | | | | | | | | | | | | | Since 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists with rculists.") the sweep interval changed as well as the constraints related to the sweeper. Being able to change the default reschedule time may be convenient in some conditions, like debugging. This patch introduces new commands allowing to get and set the sweep interval in ms. Signed-off-by: Paolo Valerio <pvalerio@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif-xlate: Always mask ip proto field.Aaron Conole2023-04-061-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | | The ofproto layer currently treats nw_proto field as overloaded to mean both that a proper nw layer exists, as well as the value contained in the header for the nw proto. However, this is incorrect behavior as relevant standards permit that any value, including '0' should be treated as a valid value. Because of this overload, when the ofproto layer builds action list for a packet with nw_proto of 0, it won't build the complete action list that we expect to be built for the packet. That will cause a bad behavior where all packets passing the datapath will fall into an incomplete action set. The fix here is to unwildcard nw_proto, allowing us to preserve setting actions for protocols which we know have support for the actions we program. This means that a traffic which contains nw_proto == 0 cannot cause connectivity breakage with other traffic on the link. Reported-by: David Marchand <dmarchand@redhat.com> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2134873 Acked-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* conntrack-tp: Fix clang warning.Lin Huang2023-04-041-0/+7
| | | | | | | | | Declaration of 'struct conn' will not be visible outside of this function. Declaration of 'struct conntrack' will not be visible outside of this function. Declaration of 'struct timeout_policy' will not be visible outside of this function. Signed-off-by: Lin Huang <linhuang@ruijie.com.cn> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev-offload-tc: Del ufid mapping if device not exist.Faicker Mo2023-04-031-1/+2
| | | | | | | | | | | | | | | | | The device may be deleted and added with ifindex changed. The tc rules on the device will be deleted if the device is deleted. The func tc_del_filter will fail when flow del. The mapping of ufid to tc will not be deleted. The traffic will trigger the same flow(with same ufid) to put to tc on the new device. Duplicated ufid mapping will be added. If the hashmap is expanded, the old mapping entry will be the first entry, and now the dp flow can't be deleted. Signed-off-by: Faicker Mo <faicker.mo@ucloud.cn> Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* db-ctl-base: Partially revert b8bf410a5.Daniel Alvarez Sanchez2023-03-301-33/+9
| | | | | | | | | | | | | | | | | The commit b8bf410a5 [0] broke the `ovs-vsctl add` command which now overwrites the value if it existed already. This patch reverts the code around the `cmd_add` function to restore the previous behavior. It also adds testing coverage for this functionality. [0] https://github.com/openvswitch/ovs/commit/b8bf410a5c94173da02279b369d75875c4035959 Fixes: b8bf410a5c94 ("db-ctl-base: Use partial map/set updates for last add/set commands.") Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2182767 Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Daniel Alvarez Sanchez <dalvarez@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev-tc-offloads: Fix misaligned 8 byte read.Mike Pattrick2023-03-291-2/+2
| | | | | | | | | | | | | | | | | | UB Sanitizer report: lib/netdev-offload-tc.c:1276:19: runtime error: load of misaligned address 0x7f74e801976c for type 'union ovs_u128', which requires 8 byte alignment 0 in netdev_tc_flow_dump_next lib/netdev-offload-tc.c:1276 1 in netdev_flow_dump_next lib/netdev-offload.c:303 2 in dpif_netlink_flow_dump_next lib/dpif-netlink.c:1921 [...] Fixes: 8f7620e6a406 ("netdev-tc-offloads: Implement netdev flow dump api using tc interface") Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* odp: Add SRv6 tunnel actions.Nobuhiro MIKI2023-03-291-0/+70
| | | | | | | This patch adds ODP actions for SRv6 and its tests. Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* userspace: Add SRv6 tunnel support.Nobuhiro MIKI2023-03-298-1/+226
| | | | | | | | | | | | | | | | | SRv6 (Segment Routing IPv6) tunnel vport is responsible for encapsulation and decapsulation the inner packets with IPv6 header and an extended header called SRH (Segment Routing Header). See spec in: https://datatracker.ietf.org/doc/html/rfc8754 This patch implements SRv6 tunneling in userspace datapath. It uses `remote_ip` and `local_ip` options as with existing tunnel protocols. It also adds a dedicated `srv6_segs` option to define a sequence of routers called segment list. Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* flow: Support rt_hdr in parse_ipv6_ext_hdrs().Nobuhiro MIKI2023-03-295-21/+58
| | | | | | | | | | | | Checks whether IPPROTO_ROUTING exists in the IPv6 extension headers. If it exists, the first address is retrieved. If NULL is specified for "frag_hdr" and/or "rt_hdr", those addresses in the header are not reported to the caller. Of course, "frag_hdr" and "rt_hdr" are properly parsed inside this function. Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* tnl-ports: Support multiple nw_protos.Nobuhiro MIKI2023-03-291-32/+48
| | | | | | | | | In some tunnels, inner packet needs to support both IPv4 and IPv6. Therefore, this patch improves to allow two protocols to be tied together in one tunneling. Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ovs-thread: Fix cpus not read for the first 10s.Adrian Moreno2023-03-271-1/+1
| | | | | | | | | | | | | | | With the current implementation the available CPUs will not be read until 10s have passed since the system's boot. For systems that boot faster, this can make ovs-vswitchd create fewer handlers than necessary for some time. Fixes: 0d23948a598a ("ovs-thread: Detect changes in number of CPUs.") Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2180460 Suggested-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Mike Pattrick <mkp@redhat.com> Acked-by: Michael Santana <msantana@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpif-netlink: Always create at least 1 handler.Adrian Moreno2023-03-271-1/+1
| | | | | | | | | | | | Ensure at least 1 handler is created even if something goes wrong during cpu detection or prime numer calculation. Fixes: a5cacea5f988 ("handlers: Create additional handler threads when using CPU isolation.") Suggested-by: Aaron Conole <aconole@redhat.com> Acked-by: Mike Pattrick <mkp@redhat.com> Acked-by: Michael Santana <msantana@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev-offload-tc: Fix parse_tc_flower_to_actions() reporting errors.Eelco Chaudron2023-03-221-9/+27
| | | | | | | | | parse_tc_flower_to_actions() was not reporting errors, which would cause parse_tc_flower_to_match() to ignore them. Fixes: dd03672f7bbb ("netdev-offload-tc: Move flower_to_match action handling to isolated function.") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpdk: Allow retaining CAP_SYS_RAWIO privileges.Aaron Conole2023-03-224-13/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Open vSwitch generally tries to let the underlying operating system managed the low level details of hardware, for example DMA mapping, bus arbitration, etc. However, when using DPDK, the underlying operating system yields control of many of these details to userspace for management. In the case of some DPDK port drivers, configuring rte_flow or even allocating resources may require access to iopl/ioperm calls, which are guarded by the CAP_SYS_RAWIO privilege on linux systems. These calls are dangerous, and can allow a process to completely compromise a system. However, they are needed in the case of some userspace driver code which manages the hardware (for example, the mlx implementation of backend support for rte_flow). Here, we create an opt-in flag passed to the command line to allow this access. We need to do this before ever accessing the database, because we want to drop all privileges asap, and cannot wait for a connection to the database to be established and functional before dropping. There may be distribution specific ways to do capability management as well (using for example, systemd), but they are not as universal to the vswitchd as a flag. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Aaron Conole <aconole@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Gaetan Rivet <gaetanr@nvidia.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpctl: Fix flush-conntrack with datapath as argument.Ales Musil2023-03-151-3/+9
| | | | | | | | | | | | | | | | | | Specifying datapath with "dpctl/flush-conntrack" didn't work as expected and caused error: ovs-dpctl: field system@ovs-system missing value (Invalid argument) To prevent that, check if we have datapath as first argument and use it accordingly. Also add couple of test cases to ensure that everything works as expected. Fixes: a9ae73b916ba ("ofp, dpif: Allow CT flush based on partial match.") Signed-off-by: Ales Musil <amusil@redhat.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* route-table: Retrieving the preferred source address from Netlink.Nobuhiro MIKI2023-03-073-5/+20
| | | | | | | | | | | | | | We can use the "ip route add ... src ..." command to set the preferred source address for each entry in the kernel FIB. OVS has a mechanism to cache the FIB, but the preferred source address is ignored and calculated with its own logic. This patch resolves the difference between kernel FIB and OVS route table cache by retrieving the RTA_PREFSRC attribute of Netlink messages. Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ovs-router: Introduce src option in ovs/route/add command.Nobuhiro MIKI2023-03-071-10/+76
| | | | | | | | | | | | | | | | | | | When adding a route with ovs/route/add command, the source address in "ovs_router_entry" structure is always the FIRST address that the interface has. See "ovs_router_get_netdev_source_address" function for more information. If an interface has multiple ipv4 and/or ipv6 addresses, there are use cases where the user wants to control the source address. This patch therefore addresses this issue by adding a src parameter. Note that same constraints also exist when caching routes from Kernel FIB with Netlink, but are not dealt with in this patch. Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto: Fix man page for tunnel related commands.Nobuhiro MIKI2023-03-071-3/+3
| | | | | | | | | | | Fixed the manual page to indicate that both IPv4/IPv6 are supported. Also added missing pkt_mark on one side and fixed the "gw" and "bridge" notation quirks. Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ovs-router: Cleanup parser for ovs/route/add command.Nobuhiro MIKI2023-03-071-24/+29
| | | | | | | | | | | | | | This patch cleans up the parser to accept pkt_mark and gw in any order. pkt_mark and gw are normally expected to be specified exactly once. However, as with other tools, if specified multiple times, the last specification is used. Also, pkt_mark and gw have separate prefix strings so they can be parsed in any order. Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev-dummy: Support multiple IP addresses.Nobuhiro MIKI2023-03-071-26/+44
| | | | | | | | | | This is useful in test cases where multiple IPv4/IPv6 addresses are assigned together. Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* treewide: Remove uses of ATOMIC_VAR_INIT.Fangrui Song2023-03-0614-20/+11
| | | | | | | | | | ATOMIC_VAR_INIT has a trivial definition `#define ATOMIC_VAR_INIT(value) (value)`, is deprecated in C17/C++20, and will be removed in newer standards in newer GCC/Clang (e.g. https://reviews.llvm.org/D144196). Signed-off-by: Fangrui Song <maskray@google.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev-windows: Add checking when creating netdev with system type on WindowsWilson Peng2023-03-061-0/+11
| | | | | | | | | | | | | | | | | | | | In the recent Antrea project testing, some port could not be created on Windows. When doing debug, our team found there is one case happening when multiple ports are waiting for be created with correct port number. Some system type port will be created netdev successfully and it will cause conflict as in the dpif side it will be internal type. So finally the port will be created failed and it could not be easily recovered. With the patch, on Windows the netdev creating will be blocked for system type when the ovs_tyep got on dpif is internal. More detailed case description is in the reported issue No.262 with link below. Reported-at:https://github.com/openvswitch/ovs-issues/issues/262 Signed-off-by: Wilson Peng <pweisong@vmware.com> Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
* ofproto-dpif-upcall: Reset ukey's last stats value if the datapath changed.Eelco Chaudron2023-03-035-0/+17
| | | | | | | | | | | | | When the ukey's action set changes, it could cause the flow to use a different datapath, for example, when it moves from tc to kernel. This will cause the the cached previous datapath statistics to be used. This change will reset the cached statistics when a change in datapath is discovered. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* classifier: Fix missing masks on a final stage with ports trie.Ilya Maximets2023-02-281-5/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Flow lookup doesn't include masks of the final stage in a resulting flow wildcards in case that stage had L4 ports match. Only the result of ports trie lookup is added to the mask. It might be sufficient in many cases, but it's not correct, because ports trie is not how we decided that the packet didn't match in this subtable. In fact, we used a full subtable mask in order to determine that, so all the subtable mask bits has to be added. Ports trie can still be used to adjust ports' mask, but it is not sufficient to determine that the packet didn't match. Assuming we have following 2 OpenFlow rules on the bridge: table=0, priority=10,tcp,tp_dst=80,tcp_flags=+psh actions=drop table=0, priority=0 actions=output(1) The first high priority rule supposed to drop all the TCP data traffic sent on port 80. The handshake, however, is allowed for forwarding. Both 'tcp_flags' and 'tp_dst' are on the final stage in the flow. Since the stage mask from that stage is not incorporated into the flow wildcards and only ports mask is getting updated, we have the following megaflow for the SYN packet that has no match on 'tcp_flags': $ ovs-appctl ofproto/trace br0 "in_port=br0,tcp,tp_dst=80,tcp_flags=syn" Megaflow: recirc_id=0,eth,tcp,in_port=LOCAL,nw_frag=no,tp_dst=80 Datapath actions: 1 If this flow is getting installed into datapath flow table, all the packets for port 80, regardless of TCP flags, will be forwarded. Incorporating all the looked at bits from the final stage into the stages map in order to get all the necessary wildcards. Ports mask has to be updated as a last step, because it doesn't cover the full 64-bit slot in the flowmap. With this change, in the example above, OVS is producing correct flow wildcards including match on TCP flags: Megaflow: recirc_id=0,eth,tcp,in_port=LOCAL,nw_frag=no,tp_dst=80,tcp_flags=-psh Datapath actions: 1 This way only -psh packets will be forwarded, as expected. This issue affects all other fields on stage 4, not only TCP flags. Tests included to cover tcp_flags, nd_target and ct_tp_src/dst. First two are frequently used, ct ones are sharing the same flowmap slot with L4 ports, so important to test. Before the pre-computation of stage masks, flow wildcards were updated during lookup, so there was no issue. The bits of the final stage was lost with introduction of 'stages_map'. Recent adjustment of segment boundaries exposed 'tcp_flags' to the issue. Reported-at: https://github.com/openvswitch/ovs-issues/issues/272 Fixes: ca44218515f0 ("classifier: Adjust segment boundary to execute prerequisite processing.") Fixes: fa2fdbf8d0c1 ("classifier: Pre-compute stage masks.") Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* conntrack: Fix conntrack_clean may access the same exp_list each time.Liang Mancang2023-02-211-2/+2
| | | | | | | | | | | | | when a exp_list contains more than the clean_end's number of nodes, and these nodes will not expire immediately. Then, every times we call conntrack_clean, it use the same next_sweep to get exp_list. Actually, we should add i every times after we call ct_sweep. Fixes: 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists with rculists.") Acked-by: Paolo Valerio <pvalerio@redhat.com> Signed-off-by: Liang Mancang <liangmc1@chinatelecom.cn> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev-linux: Add jitter parameter to the netem qos options.Miika Petäjäniemi2023-02-211-8/+18
| | | | | | | | Adds jitter option to enable emulating latency fluctuation with netem. Submitted-at: https://github.com/openvswitch/ovs/pull/407 Signed-off-by: Miika Petäjäniemi <miika.petajaniemi@solita.fi> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* conntrack: Properly unNAT inner header of related traffic.Ales Musil2023-02-131-163/+89
| | | | | | | | | | | The inner header was not handled properly. Simplify the code which allows proper handling of the inner headers. Reported-at: https://bugzilla.redhat.com/2137754 Acked-by: Paolo Valerio <pvalerio@redhat.com> Signed-off-by: Ales Musil <amusil@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ovsdb-server: Fix handling of DNS name for listener configuration.Frode Nordahl2023-02-102-4/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 08e9e5337383 fixed proper initialization of the dns-resolve module, and made DNS resolution asynchronous. A side effect of that change revealed a long standing logic bug which broke ovsdb-server listener configuration using DNS names. Previously this worked because the DNS resolution would block, now that DNS resolution is asynchronous the code before this change would assume the error from jsonrpc_pstream_open meant the remote was a specification for an active outgoing connection, even when that was not the case. To fix this a couple of changes was made to socket-util: 1) Pass optional result of dns resolution from inet_parse_passive. When (re-)configuring listeners that use DNS names, we may need to know whether the provided connection string is invalid or if the provided DNS name has finished resolving. 2) Check dns resolution status in inet_open_passive. If the connection string is valid, and contains a DNS name, inet_open_passive will now return -EAGAIN if dns resolution failed. DNS resolution failure may either mean the asynchronous resolver has not completed yet, or that the name does not resolve. Reported-at: https://bugs.launchpad.net/bugs/1998781 Fixes: 08e9e5337383 ("ovsdb: raft: Fix inability to read the database with DNS host names.") Fixes: 771680d96fb6 ("DNS: Add basic support for asynchronous DNS resolving") Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev-offload-tc: If the flow has not been used, report it as such.Eelco Chaudron2023-02-091-1/+13
| | | | | | | | | | | | | | | If a tc flow was installed but has not yet been used, report it as such. In addition, add a delay to the "IGMP - flood under normal action" test case to make it work with many repetitions. This delay is also present in other ICMP/IGMP tests. Fixes: f98e418fbdb6 ("tc: Add tc flower functions") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* odp-util: Make odp_flow_key_from_flow__ nlattr order the same as the kernel.Eelco Chaudron2023-02-091-10/+11
| | | | | | | | | | | | | | Make the order of the Netlink attributes for odp_flow_key_from_flow__() the same as the kernel will return them. This will make sure the attributes displayed in the dpctl/dump-flows output appear in the same order for all datapath. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev-offload-tc: Conntrack ALGs are not supported with tc.Eelco Chaudron2023-02-091-0/+4
| | | | | | | | | | | | | | | tc does not support conntrack ALGs. Even worse, with tc enabled, they should not be used/configured at all. This is because even though TC will ignore the rules with ALG configured, i.e., they will flow through the kernel module, return traffic might flow through a tc conntrack rule, and it will not invoke the ALG helper. Fixes: 576126a931cd ("netdev-offload-tc: Add conntrack support") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev-offload-tc: Fix tc conntrack force commit support.Eelco Chaudron2023-02-091-2/+11
| | | | | | | | | | | | tc was not setting the OVS_CT_ATTR_FORCE_COMMIT flag when a forced commit was requested. This patch will fix this. Fixes: 576126a931cd ("netdev-offload-tc: Add conntrack support") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* tests: Include working system-traffic tests into the system-offloads-testsuite.Eelco Chaudron2023-02-081-1/+1
| | | | | | | | | | | | | | | | Include and run the system-traffic.at tests as part of the system offload testsuite. Exclude all the tests that will not run without any special modifications. Lowered log level for "recirc_id sharing not supported" message, so tests will not fail with older kernels. This is not an error level message, but should be debug, like all other, EOPNOTSUPP, related log messages. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* hash: Avoid 64bit crc intrinsics on 32bit aligned data.Mike Pattrick2023-02-031-1/+64
| | | | | | | | | | | | | | | UB Sanitizer report: lib/hash.h:219:17: runtime error: load of misaligned address 0x7ffc164a88b4 for type 'const uint64_t', which requires 8 byte alignment #0 in hash_words_inline lib/hash.h:219 #1 in hash_words lib/hash.h:297 [...] Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dp-packet: Allocate on cacheline boundary with DPDK.Mike Pattrick2023-02-032-0/+8
| | | | | | | | | | | | | | | | | UB Sanitizer report: lib/dp-packet.h:587:22: runtime error: member access within misaligned address 0x000001ecde10 for type 'struct dp_packet', which requires 64 byte alignment #0 in dp_packet_set_base lib/dp-packet.h:587 #1 in dp_packet_use__ lib/dp-packet.c:46 #2 in dp_packet_use lib/dp-packet.c:60 #3 in dp_packet_init lib/dp-packet.c:126 #4 in dp_packet_new lib/dp-packet.c:150 [...] Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev-dummy: Allocate dummy_packet_stream on cacheline boundary.Mike Pattrick2023-02-031-5/+5
| | | | | | | | | | | | | | | | UB Sanitizer report: lib/netdev-dummy.c:197:15: runtime error: member access within misaligned address 0x00000217a7f0 for type 'struct dummy_packet_stream', which requires 64 byte alignment ^ #0 dummy_packet_stream_init lib/netdev-dummy.c:197 #1 dummy_packet_stream_create lib/netdev-dummy.c:208 #2 dummy_packet_conn_set_config lib/netdev-dummy.c:436 [...] Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev-offload-tc: Preserve tc statistics when flow gets modified.Eelco Chaudron2023-02-032-16/+84
| | | | | | | | | | | | | | | When a flow gets modified, i.e. the actions are changes, the tc layer will remove, and re-add the flow. This is causing all the counters to be reset. This patch will remember the previous tc counters and adjust any requests for statistics. This is done in a similar way as the rte_flow implementation. It also updates the check_pkt_len tc test to purge the flows, so we do not use existing updated tc flow counters, but start with fresh installed set of datapath flows. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpctl: Add support to count upcall packets.wangchuanlei2023-01-314-11/+36
| | | | | | | | | Add support to count upcall packets per port, both succeed and failed, which is a better way to see how many packets upcalled on each interface. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: wangchuanlei <wangchuanlei@inspur.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* tc: Add TCA_KIND flower to delete and get operation to avoid rtnl_lock().Eelco Chaudron2023-01-304-13/+22
| | | | | | | | | | | | | | | | | | | | A long long time ago, an effort was made to make tc flower rtnl_lock() free. However, on the OVS part we forgot to add the TCA_KIND "flower" attribute, which tell the kernel to skip the lock. This patch corrects this by adding the attribute for the delete and get operations. The kernel code calls tcf_proto_is_unlocked() to determine the rtnl_lock() is needed for the specific tc protocol. It does this in the tc_new_tfilter(), tc_del_tfilter(), and in tc_get_tfilter(). If the name is not set, tcf_proto_is_unlocked() will always return false. If set, the specific protocol is queried for unlocked support. Fixes: f98e418fbdb6 ("tc: Add tc flower functions") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev-offload-tc: Fix misaligned access to ct label.Ilya Maximets2023-01-271-10/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | UndefinedBehaviorSanitizer: lib/netdev-offload-tc.c:1356:50: runtime error: member access within misaligned address 0x60700001a89c for type 'const struct (unnamed struct at lib/netdev-offload-tc.c:1350:27)', which requires 8 byte alignment 0x60700001a89c: note: pointer points here 24 00 04 00 01 00 00 05 00 00 0d 00 0a 00 00 00 00 00 00 00 ... ^ 0 0xd5d183 in parse_put_flow_ct_action lib/netdev-offload-tc.c:1356:50 1 0xd5783f in netdev_tc_parse_nl_actions lib/netdev-offload-tc.c:2015:19 2 0xd4027c in netdev_tc_flow_put lib/netdev-offload-tc.c:2355:11 3 0x9666d7 in netdev_flow_put lib/netdev-offload.c:318:14 4 0xcd4c0a in parse_flow_put lib/dpif-netlink.c:2297:11 5 0xcd4c0a in try_send_to_netdev lib/dpif-netlink.c:2384:15 6 0xcd4c0a in dpif_netlink_operate lib/dpif-netlink.c:2455:23 7 0x87d40e in dpif_operate lib/dpif.c:1372:13 8 0x6d43e9 in handle_upcalls ofproto/ofproto-dpif-upcall.c:1674:5 9 0x6d43e9 in recv_upcalls ofproto/ofproto-dpif-upcall.c:905:9 10 0x6cf6ea in udpif_upcall_handler ofproto/ofproto-dpif-upcall.c:801:13 11 0xb6d7ea in ovsthread_wrapper lib/ovs-thread.c:423:12 12 0x7f5ccf017801 in start_thread 13 0x7f5ccefb744f in __GI___clone3 Fixes: 9221c721bec0 ("netdev-offload-tc: Add conntrack label and mark support") Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpif-netdev-perf: Add metric averages when no iterations.Kevin Traynor2023-01-271-23/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pmd-perf-show with pmd-perf-metrics=true displays a histogram with averages. However, averages were not displayed when there is no iterations. They will be all zero so it is not hiding useful information but the stats look incomplete without them, especially when they are displayed for some PMD thread cores and not others. The histogram print is large and this is just an extra couple of lines, so might as well print them all the time to ensure that the user does not think there is something missing from the display. Before patch: Histograms cycles/it 499 0 716 0 1025 0 1469 0 <snip> After patch: Histograms cycles/it 499 0 716 0 1025 0 1469 0 <snip> --------------- cycles/it 0 Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpif-netdev-perf: Remove not a number stat value.Kevin Traynor2023-01-271-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some stats in pmd-perf-show don't check for divide by zero which results in not a number (-nan). This is a normal case for some of the stats when there are no Rx queues assigned to the PMD thread core. It is not obvious what -nan is to a user so add a check for divide by zero and set stat to 0 if present. Before patch: pmd thread numa_id 1 core_id 9: Iterations: 0 (-nan us/it) - Used TSC cycles: 0 ( 0.0 % of total cycles) - idle iterations: 0 ( -nan % of used cycles) - busy iterations: 0 ( -nan % of used cycles) After patch: pmd thread numa_id 1 core_id 9: Iterations: 0 (0.00 us/it) - Used TSC cycles: 0 ( 0.0 % of total cycles) - idle iterations: 0 ( 0.0 % of used cycles) - busy iterations: 0 ( 0.0 % of used cycles) Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev-dpdk: Free mbufs in bulk.Ilya Maximets2023-01-271-10/+3
| | | | | | | | | | | | | rte_pktmbuf_free_bulk() function was introduced in 19.11 and became stable in 21.11. Use it to free arrays of mbufs instead of freeing packets one by one. In simple V2V testing with 64B packets, 2 PMD threads and bidirectional traffic this change improves performance by 3.5 - 4.5 %. Reviewed-by: David Marchand <david.marchand@redhat.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>