delta/openvswitch.git - github.com: openvswitch/ovs.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	Set release date for 2.17.5.v2.17.5	Ilya Maximets	2022-12-20	2	-2/+7
\| \| \| \| \|	Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	lldp: Fix bugs when parsing malformed AutoAttach.	Qian Chen	2022-12-20	2	-0/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The OVS LLDP implementation includes support for AutoAttach standard, which the 'upstream' lldpd project does not include. As part of adding this support, the message parsing for these TLVs did not include proper length checks for the LLDP_TLV_AA_ELEMENT_SUBTYPE and the LLDP_TLV_AA_ISID_VLAN_ASGNS_SUBTYPE elements. The result is that a message without a proper boundary will cause an overread of memory, and lead to undefined results, including crashes or other unidentified behavior. The fix is to introduce proper bounds checking for these elements. Introduce a unit test to ensure that we have some proper rejection in this code base in the future. Fixes: be53a5c447c3 ("auto-attach: Initial support for Auto-Attach standard") Signed-off-by: Qian Chen <cq674350529@163.com> Co-authored-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	dpif-netdev: Use unmasked key when adding datapath flows.	Eelco Chaudron	2022-12-20	2	-4/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The datapath supports installing wider flows, and OVS relies on this behavior. For example if ipv4(src=1.1.1.1/192.0.0.0, dst=1.1.1.2/192.0.0.0) exists, a wider flow (smaller mask) of ipv4(src=192.1.1.1/128.0.0.0,dst=192.1.1.2/128.0.0.0) is allowed to be added. However, if we try to add a wildcard rule, the installation fails: # ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \ ipv4(src=1.1.1.1/192.0.0.0,dst=1.1.1.2/192.0.0.0,frag=no)" 2 # ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \ ipv4(src=192.1.1.1/0.0.0.0,dst=49.1.1.2/0.0.0.0,frag=no)" 2 ovs-vswitchd: updating flow table (File exists) The reason is that the key used to determine if the flow is already present in the system uses the original key ANDed with the mask. This results in the IP address not being part of the (miniflow) key, i.e., being substituted with an all-zero value. When doing the actual lookup, this results in the key wrongfully matching the first flow, and therefore the flow does not get installed. The solution is to use the unmasked key for the existence check, the same way this is handled in the "slow" dpif_flow_put() case. OVS relies on the fact that overlapping flows can exist if one is a superset of the other. Note that this is only true when the same set of actions is applied. This is due to how the revalidator process works. During revalidation, OVS removes too generic flows from the datapath to avoid incorrect matches but allows too narrow flows to stay in the datapath to avoid the data plane disruption and also to avoid constant flow deletions if the datapath ignores wildcards on certain fields/bits. See flow_wildcards_has_extra() check in the revalidate_ukey__() function. The problem here is that we have a too narrow flow installed, and now OpenFlow rules got changed, so the actual flow should be more generic. Revalidators will not remove the narrow flow, and we will eventually get an upcall on the packet that doesn't match the narrow flow, but we will not be able to install a more generic flow because after masking with the new wider mask, the key matches on the narrow flow, so we get EEXIST. Fixes: beb75a40fdc2 ("userspace: Switching of L3 packets in L2 pipeline") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	ovsdb-cs: Consider default conditions implicitly acked.	Dumitru Ceara	2022-12-13	5	-53/+133
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When initializing a monitor table the default monitor condition is [True] which matches the behavior of the server (to send all rows of that table). There's no need to include this default condition in the initial monitor request so we can consider it implicitly acked by the server. This fixes the incorrect (one too large) expected condition sequence number reported by ovsdb_idl_set_condition() when application is trying to set a [True] condition for a new table. Reported-by: Numan Siddique <numans@ovn.org> Suggested-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	rculist: Use rculist_back_protected to access prev.	Adrian Moreno	2022-12-06	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The .prev member of a rculist should not be used directly by users because it's not rcu-safe. A convenient fake mutex (rculist_fake_mutex) helps ensuring that in conjunction with clang's thread safety extensions. Only writers with exclusive access to the rculist should access .prev via some of the provided *_protected() accessors. Use rculist_back_protected() in REVERSE_PROTECTED iterators to avoid clang's compilation warning. Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	Prepare for 2.17.5.	Ilya Maximets	2022-12-01	3	-1/+10
\| \| \| \| \| \|	Acked-by: Dumitru Ceara <dceara@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	Set release date for 2.17.4.v2.17.4	Ilya Maximets	2022-12-01	2	-2/+5
\| \| \| \| \| \|	Acked-by: Dumitru Ceara <dceara@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	odp-util: Fix reporting unknown keys as keys with bad length.	Ilya Maximets	2022-11-30	1	-2/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	check_attr_len() currently reports all unknown keys as keys with bad length. For example, IPv6 extension headers are printed out like this in flow dumps: eth_type(0x86dd),ipv6(...) (bad key length 2, expected -1)(00 00/(bad mask length 2, expected -1)(00 00), icmpv6(type=0/0,code=0/0) However, since the key is unknown, the length check on it makes no sense and should be ignored. This will allow the unknown key to be caught later by the format_unknown_key() function and printed in a more user-friendly way: eth_type(0x86dd),ipv6(...),key32(00 00/00 00),icmpv6(type=0/0,code=0/0) '32' here is the actual index of the key attribute, so we know that it is unknown attribute #32 with the value/mask pair printed out inside the parenthesis. Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	ovs-dpctl-top: Fix ovs-dpctl-top via pipe.	Timothy Redaelli	2022-11-30	1	-5/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Currently it's not possible to use ovs-dpctl-top via pipe (eg: ovs-dpctl dump-flows \| ovs-dpctl-top --script --verbose) since Python3 doesn't allow to open a file (stdin in our case) in binary mode without buffering enabled. This commit changes the behaviour in order to directly pass stdin to flows_read instead of re-opening it without buffering. Signed-off-by: Timothy Redaelli <tredaelli@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	rculist: Fix iteration macros.	Ilya Maximets	2022-11-24	1	-9/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some macros for rculist have no users and there are no unit tests specific to that library as well, so broken code wasn't spotted while updating to multi-variable iterators. Fixing multiple problems like missing commas, parenthesis, incorrect variable and macro names. Fixes: d293965d7b06 ("rculist: use multi-variable helpers for loop macros.") Reported-by: Subrata Nath <subrata.nath@nokia.com> Co-authored-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Dumitru Ceara <dceara@redhat.com> Acked-by: Alin-Gabriel Serdean <aserdean@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	vswitchd: Publish per iface received multicast packets.	David Marchand	2022-11-24	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	The count of received multicast packets has been computed internally, but not exposed to ovsdb. Fix this. Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Mike Pattrick <mkp@redhat.com> Acked-by: Michael Santana <msantana@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	learn: Fix parsing immediate value for a field match.	Ilya Maximets	2022-11-24	2	-13/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The value is right-justified after the string parsing with parse_int_string(), i.e. it is in BE byte order and aligned to the right side of the array. For example, the 0x10011 value in a 4-byte field will look like 0x00 0x01 0x00 0x11. However, value copy to the resulted ofpact is performed from the start of the memory. So, in case the destination size is smaller than the original field size, incorrect part of the value will be copied. In the 0x00 0x01 0x00 0x11 example above, if the copy is performed to a 3-byte field, the first 3 bytes will be copied, which are 0x00 0x01 0x00 instead of 0x01 0x00 0x11. This leads to a problem where NXM_NX_REG3[0..16]=0x10011 turns into NXM_NX_REG3[0..16]=0x100 after the parsing. Fix that by offsetting the starting position to the size difference in bytes similarly to how it is done in learn_parse_load_immediate(). While at it, changing &imm to imm.b in function calls that expect byte arrays as an argument. The old way is technically correct, but more error prone. The mf_write_subfield_value() call was also incorrect. However, the 'match' variable is actually not used for anything since checking removal in commit: dd43a558597b ("Do not perform validation in learn_parse();") So, just removing the call and the 'match' variable entirely instead of fixing it. Fixes: 21b2fa617126 ("ofp-parse: Allow match field names in actions and brackets in matches.") Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-November/052100.html Reported-by: Thomas Lee <newsforthomas@engineer.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	datapath-windows: Check the condition to reset pseudo header checksum on Rx side	Wilson Peng	2022-11-24	1	-4/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If ovs node running on Windows is processing NAT action on the RX side, it will reset pseudo header checksum only if the L4 checksum is same as the calculated pseudo header checksum before NAT action. Without the fix, if the L4 header checksum is filled with a pseudo header checksum (sourceip, dstip, protocol, tcppayloadlen+tcpheaderlen) OVS will still do the checksum update(replace some IP and port and recalculate the checksum). It will lead to incorrect L4 header checksum. Reported-at:https://github.com/openvswitch/ovs-issues/issues/265 Signed-off-by: Wilson Peng <pweisong@vmware.com> Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
*	netdev-offload-dpdk: Enhance the support of tunnel pop action	Chaoyong He	2022-11-10	1	-4/+10
\| \| \| \| \| \| \| \| \| \| \|	Populate the 'is_ipv6' field of 'struct rte_flow_tunnel', which can be used in the implementation of tunnel pop action for DPDK PMD. Fixes: be56e063d028 ("netdev-offload-dpdk: Support tunnel pop action.") Signed-off-by: Chaoyong He <chaoyong.he@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Acked-by: Eli Britstein <elibr@nvidia.com> Signed-off-by: Simon Horman <simon.horman@corigine.com>
*	ci: Update meson requirement for DPDK.	Ian Stokes	2022-11-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current version of meson used for building DPDK is 0.49.2. This has the restriction of holding the required python version to 3.9. A recent change [1] in DPDK bumped requirements on meson to 0.53.2. Update the version of meson used to build DPDK to 0.53.2 to remove the restriction. [1] https://git.dpdk.org/dpdk/commit/?id=909ad7b80e5e Signed-off-by: Ian Stokes <ian.stokes@intel.com> Reviewed-by: David Marchand <david.marchand@redhat.com>
*	ovsdb: transaction: Fix weak reference leak.	Han Zhou	2022-11-04	1	-2/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a row is deleted, if the row has weak references to other rows, the weak reference nodes attached to the destination rows (through weak->dst_node hmap) are not destroyed. Deleting weak references is properly handled when a row is modified. The removed references are taken care by: 1. assess_weak_refs() figures out the deleted references from the row and add them to txn_row->deleted_refs. 2. before commit, in ovsdb_txn_update_weak_refs() it finds the destination row for each item in txn_row->deleted_refs (from step 1), and destroy the corresponding weak references of the destination row. However, when the row is deleted, the step 1 in assess_weak_refs() is missing. It directly returns without adding the deleted references to txn_row->deleted_refs. So, the destination nodes will keep those weak references although the source side of the references are already deleted. When such rows that originating weak references are created and deleted, more and more such useless weak reference structures accumulate in the memory, and can stay there until the destination rows are deleted. It is possible that the destination row is never deleted, and in such case the ovsdb-server memory keeps growing (although it is not strictly memory leak, because the structures are still referenced). This problem has an impact to applications like OVN SB DB - the memory grows very fast in long-running deployments and finally causes OOM. This patch fixes it by generating deleted_refs for deleted rows in assess_weak_refs(). Fixes: 4dbff9f0a685 ("ovsdb: transaction: Incremental reassessment of weak refs.") Signed-off-by: Han Zhou <hzhou@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	ovsdb: transaction: Refactor assess_weak_refs.	Han Zhou	2022-11-04	1	-42/+36
\| \| \| \| \| \| \| \| \|	The loops for adding weak refs are quite similar. Abstract to a function, which will be used by one more cases later. The patch also changes the txn_row arg to the source row. Signed-off-by: Han Zhou <hzhou@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	ovs-tcpdump: Cleanup mirror port on SIGHUP/SIGTERM.	Daniel Ding	2022-11-02	1	-18/+22
\| \| \| \| \| \| \| \| \| \| \|	If ovs-tcpdump received HUP or TERM signal, mirror and mirror interface should be destroyed. This often happens, when controlling terminal is closed, like ssh session closed, and other users use kill to terminate it. Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Daniel Ding <zhihui.ding@easystack.cn> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	netdev-linux: Fix inability to apply QoS on ports with custom qdiscs.	Ilya Maximets	2022-11-02	2	-4/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	tc_del_qdisc() function only removes qdiscs with handle '1:0'. If for some reason the interface has a qdisc with non-zero handle attached, tc_del_qdisc() will not delete it and subsequent tc_install() will fail to install a new qdisc. The problem is that Libvirt by default is setting noqueue qdisc for all tap interfaces it creates. This is done for performance reasons to ensure lockless xmit. The issue is causing non-working QoS in OpenStack setups since new versions of Libvirt started to use OVS to configure it. In the past, Libvirt configured TC on its own, bypassing OVS. Removing the handle value from the deletion request, so any qdisc can be removed. Changing the error checking to also pass ENOENT, since that is the error reported if only default qdisc is present. Alternative solution might be to use NLM_F_REPLACE, but that will be a larger change with a potential need of refactoring. Potential side effect of the change is that OVS may start removing qdiscs that it didn't remove before. Though it's not a new issue and 'linux-noop' QoS type should be used for ports that OVS should not touch. Otherwise, OVS owns qdiscs on all interfaces attached to it. While at it, adding more logs as errors are not logged in any way at the moment making the issue hard to debug. Reported-at: https://bugzilla.redhat.com/2138339 Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-October/052088.html Reported-at: https://github.com/openvswitch/ovs-issues/issues/268 Suggested-by: Slawek Kaplonski <skaplons@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	tc: Fix misaligned writes while parsing pedit.	Ilya Maximets	2022-11-02	1	-3/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Offsets within 'rewrite' action are not 4-byte aligned, so has to be accessed carefully. SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior lib/tc.c:1132:17 in lib/tc.c:1132:17: runtime error: store to misaligned address 0x7fba215b2025 for type 'ovs_be32' (aka 'unsigned int'), which requires 4 byte alignment 0 0xd78857 in nl_parse_act_pedit lib/tc.c:1132:24 1 0xd68103 in nl_parse_single_action lib/tc.c:1936:15 2 0xd624ee in nl_parse_flower_actions lib/tc.c:2024:19 3 0xd624ee in nl_parse_flower_options lib/tc.c:2139:12 4 0xd5f082 in parse_netlink_to_tc_flower lib/tc.c:2187:12 5 0xd6a2a1 in tc_replace_flower lib/tc.c:3776:19 6 0xd2ae8f in netdev_tc_flow_put lib/netdev-offload-tc.c:2350:11 7 0x951d07 in netdev_flow_put lib/netdev-offload.c:318:14 8 0xcbb81a in parse_flow_put lib/dpif-netlink.c:2297:11 9 0xcbb81a in try_send_to_netdev lib/dpif-netlink.c:2384:15 10 0xcbb81a in dpif_netlink_operate lib/dpif-netlink.c:2455:23 11 0x8678ae in dpif_operate lib/dpif.c:1372:13 12 0x6bcc89 in handle_upcalls ofproto/ofproto-dpif-upcall.c:1674:5 13 0x6bcc89 in recv_upcalls ofproto/ofproto-dpif-upcall.c:905:9 14 0x6b7f9a in udpif_upcall_handler ofproto/ofproto-dpif-upcall.c:801:13 15 0xb54c5a in ovsthread_wrapper lib/ovs-thread.c:422:12 16 0x7fba2f2081ce in start_thread (/lib64/libpthread.so.0+0x81ce) 17 0x7fba2de39dd2 in clone (/lib64/libc.so.6+0x39dd2) Fixes: 8ada482bbe19 ("tc: Add header rewrite using tc pedit action") Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	odp-util: Add missing separator in format_odp_conntrack_action().	Paolo Valerio	2022-11-02	2	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \|	If OVS_CT_ATTR_TIMEOUT is included, the resulting output is the following: actions:ct(commit,timeout=1nat(src=10.1.1.240)) Fix it by trivially adding a trailing ',' to timeout as well. Signed-off-by: Paolo Valerio <pvalerio@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	vswitch.xml: Fix the name of rstp-path-cost option.	Ilya Maximets	2022-11-02	1	-1/+1
\| \| \| \| \| \| \| \| \|	For some reason it is documented as 'rstp-port-path-cost', while the code and some other bits of documentation use 'rstp-path-cost'. Fixes: 9efd308e957c ("Rapid Spanning Tree Protocol (IEEE 802.1D).") Reviewed-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	mac-learning: Fix learned fdb entries not age out issue.	Lin Huang	2022-11-02	2	-23/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After user add a static fdb entry, the get_lru() function will always return the static fdb entry. That's normal fdb entries will not age out through mac_learning_run(). Fix the issue by modify the get_lru() function to check the entry->expires field and not return the entry which entry->expires is MAC_ENTRY_AGE_STATIC_ENTRY. Adding a unit test for this. Fixes: ccc24fc88d59 ("ofproto-dpif: APIs and CLI option to add/delete static fdb entry.") Acked-by: Eelco Chaudron <echaudro@redhat.com> Tested-by: Zhang Yuhuang <zhangyuhuang@ruijie.com.cn> Signed-off-by: Lin Huang <linhuang@ruijie.com.cn> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	ofproto-dpif-xlate: Update tunnel neighbor when receive gratuitous ARP.	Han Ding	2022-11-02	2	-3/+31
\| \| \| \| \| \| \| \| \| \| \| \| \|	OVS now just allow the ARP Reply which the destination address is matched against the known xbridge addresses to update tunnel neighbor. So when OVS receive the gratuitous ARP from underlay gateway which the source address and destination address are all gateway IP, tunnel neighbor will not be updated. Fixes: ba07cf222a0c ("Handle gratuitous ARP requests and replies in tnl_arp_snoop()") Fixes: 83c2757bd16e ("xlate: Move tnl_neigh_snoop() to terminate_native_tunnel()") Acked-by: Paolo Valerio <pvalerio@redhat.com> Signed-off-by: Han Ding <handing@chinatelecom.cn> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	bond: Fix crash while logging not yet enabled member.	yangchang	2022-11-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The log should be printed with the member name, not the active member name, and the active member does not judge whether it is NULL. If null, OVS will crash with the following backtrace: (gdb) bt 0 bond_check_admissibility (ofproto/bond.c:877) 1 is_admissible (ofproto/ofproto-dpif-xlate.c:2574) 2 xlate_normal (ofproto/ofproto-dpif-xlate.c:3027) 3 xlate_output_action (ofproto/ofproto-dpif-xlate.c:5284) 4 do_xlate_actions (ofproto/ofproto-dpif-xlate.c:6960) 5 xlate_actions (ofproto/ofproto-dpif-xlate.c:7924) 6 upcall_xlate (ofproto/ofproto-dpif-upcall.c:1237) 7 process_upcall (ofproto/ofproto-dpif-upcall.c:1456) 8 upcall_cb (ofproto/ofproto-dpif-upcall.c:1358) 9 dp_netdev_upcall (lib/dpif-netdev.c:7793) 10 handle_packet_upcall (lib/dpif-netdev.c:8255) 11 fast_path_processing (lib/dpif-netdev.c:8374) 12 dp_netdev_input__ (lib/dpif-netdev.c:8463) 13 dp_netdev_input (lib/dpif-netdev.c:8501) 14 dp_netdev_process_rxq_port (lib/dpif-netdev.c:5337) 15 pmd_thread_main (lib/dpif-netdev.c:6944) 16 ovsthread_wrapper (lib/ovs-thread.c:422) 17 ?? (/lib64/libpthread.so.0) 18 clone (/lib64/libc.so.6) Fixes: 423416f58749 ("lacp: report desync in ovs threads enabling slave") Signed-off-by: yangchang <yangchang@chinatelecom.cn> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	netdev-dpdk: Fix tx_dropped counters value.	Robin Jarry	2022-11-02	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Packets that could not be transmitted because the TXQ are full should be taken into account in the global ovs_tx_failure_drops as it was the case before commit 29b94e12d57d ("netdev-dpdk: Refactor the DPDK transmit path."). netdev_dpdk_eth_tx_burst() returns the number of packets that were not transmitted. Add that number to stats.tx_failure_drops and only include the packets that were dropped in previous steps afterwards. Fixes: 29b94e12d57d ("netdev-dpdk: Refactor the DPDK transmit path.") Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Robin Jarry <rjarry@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	unaligned: Correct the stats of packet_count and byte_count on Windows.	Wilson Peng	2022-10-25	1	-6/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The stats(byte_count) is got via function call ofputil_decode_flow_stats_reply() and for OpenFlow15 it will also call oxs_pull_entry__(). Currently we found on Windows the byte_count counter is incorrect. It will get the byte_count on OpenFlow15 handling via ntohll(get_unaligned_be64(payload)) Quote the comments below from Ilya Maximets (thanks for the given soluton and explanation): static inline uint64_t get_unaligned_u64__(const uint64_t p_) ... return ntohll(((uint64_t) p[0] << 56) \| ((uint64_t) p[1] << 48) \| ((uint64_t) p[2] << 40) \| ((uint64_t) p[3] << 32) \| (p[4] << 24) \| (p[5] << 16) \| (p[6] << 8) \| p[7]); And indeed the expression above has an issue with data types. The problem is the (p[4] << 24) part. The p[4] itself has a type 'uint8_t' which is unsigned 8bit value. It is not enough to hold the result of a left shift, so compiler automatically promotes it to the 'int' by default. But it is signed* 32bit value. In your original report p[4] was equal to 0x81. After the left shift it became 0x81000000. Looks correct, but the type is 'int'. The next operation that we do is '\|' with the previous shifted bytes that were explicitly converted to uint64_t before the left shift. So we have uint64_t \| int. In this case compiler needs to extend the 'int' to 'unit64_t' before performing the operation. And since the 'int' is signed and the sign bit happens to be set in the 0x81000000, the sign extension is performed in order to preserve the value. The result is 0xffffffff81000000. And that is breaking everything else. From the new test below, it is incorrect for the n_bytes counter via OpenFlow15 on CMD: ovs-ofctl dump-flows. With the patch, get_unaligned_u64__() will return correct value to caller on Windows. In the output (Got via original CMD without fix) below n_bytes 2177130813 will be incorrectly changed to 18446744071591715133 when processing OpenFlow15 which is equal to 0xFFFFFFFF81C4613D and here the p[4] on Windows is 0x81. With the fix, new compiled ovs-ofctl1025.exe could dump the correct n_bytes counter Via OpenFlow15. ovs-ofctl.exe -O OpenFlow15 dump-flows nsx-managed \| findstr 1516011 cookie=<>, duration=<>s, table=4, n_packets=1516011, n_bytes=18446744071591715133, cookie=<>, duration=<>s, table=4, n_packets=1516011, n_bytes=18446744071591715133, ovs-ofctl.exe -O OpenFlow10 dump-flows nsx-managed \| findstr 1516011 cookie=<>, duration=<>s, table=4, n_packets=1516011, n_bytes=2177130813, cookie=<>, duration=<>s, table=4, n_packets=1516011, n_bytes=2177130813, ovs-ofctl.exe dump-flows nsx-managed \| findstr 1516011 cookie=<>, duration=<>s, table=4, n_packets=1516011, n_bytes=2177130813, cookie=<>, duration=<>s, table=4, n_packets=1516011, n_bytes=2177130813, With the fix, new compiled ovs-ofctl1025.exe could dump the correct n_bytes counter Via OpenFlow15. ovs-ofctl1025.exe -O OpenFlow15 dump-flows nsx-managed \| findstr 1516011 cookie=<>, duration=<>s, table=4, n_packets=1516011, n_bytes=2177130813, cookie=<>, duration=<>s, table=4, n_packets=1516011, n_bytes=2177130813, Fixes: afa3a93165f1 ("Add header for access to potentially unaligned data.") Signed-off-by: Wilson Peng <pweisong@vmware.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	tests: Fix filtering of whole-second durations.	Ilya Maximets	2022-10-25	2	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Current macros are unable to filter whole seconds, e.g. 'duration:6s'. This is causing random test failures, most frequently in CirrusCI: ./dpif-netdev.at:370: ovs-ofctl -O OpenFlow13 meter-stats br0 \| strip_timers --- - +++ /tmp/cirrus-ci-build/tests/testsuite.dir/at-groups/990/stdout @@ -1,5 +1,5 @@ OFPST_METER reply (OF1.3) (xid=0x2): -meter:1 flow_count:1 packet_in_count:10 byte_in_count:600 duration:0.0s bands: +meter:1 flow_count:1 packet_in_count:10 byte_in_count:600 duration:6s bands: Fix sed matches to correctly handle that scenario. Repeating the [0-9\.] twice because it is hard to write a shorter portable version with sed. Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	netdev-offload: Set 'miss_api_supported' to be under netdev.	Eli Britstein	2022-10-25	4	-16/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Cited commit introduced a flag in dpif-netdev level, to optimize performance and avoid hw_miss_packet_recover() for devices with no such support. However, there is a race condition between traffic processing and assigning a 'flow_api' object to the netdev. In such case, EOPNOTSUPP is returned by netdev_hw_miss_packet_recover() in netdev-offload.c layer because 'flow_api' is not yet initialized. As a result, the flag is falsely disabled, and subsequent packets won't be recovered, though they should. In order to fix it, move the flag to be in netdev-offload layer, to avoid that race. Fixes: 6e50c1651869 ("dpif-netdev: Avoid hw_miss_packet_recover() for devices with no support.") Signed-off-by: Eli Britstein <elibr@nvidia.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	cmap: Add thread fence for slot update.	Yanqin Wei	2022-10-18	1	-1/+3
\| \| \| \| \| \| \| \| \| \|	Bucket update in the cmap lib is protected by a counter. But hash setting is possible to be moved before counter update. This patch fix this issue. Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com> Reviewed-by: Gavin Hu <Gavin.Hu@arm.com> Signed-off-by: Yanqin Wei <Yanqin.Wei@arm.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	ofproto-dpif-xlate: Do not use zero-weight buckets in select groups.	Ben Pfaff	2022-10-18	1	-5/+11
\| \| \| \| \| \| \| \| \| \| \|	The OpenFlow specification says that buckets in select groups with a weight of zero should not be selected, but the ofproto-dpif implementation could select them in corner cases. This fixes the problem. Reported-by: ychen <ychen103103@163.com> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/359349.html Signed-off-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	github: Update versions of action dependencies.	Ilya Maximets	2022-10-12	1	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	checkout@v2, cache@v2 and setup-python@v2 are using outdated Node.js 12 which is now deprecated in GHA [1], so these actions will stop working soon. Updating to most recent major versions with Node.js 16. This stops GHA from throwing warnings in every build. While at it, also updating upload-artifacts to more recent version. [1] https://github.blog/changelog/2022-09-22-github-actions-all-actions-will-begin-running-on-node16-instead-of-node12/ Acked-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	ovs-tcpdump: Fix bond port unable to capture jumbo frames.	Lin Huang	2022-10-11	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently the ovs-tcpdump utility creates a tap port to capture the frames of a bond port. If a user want to capture the packets from the bond port which member interface's mtu is more than 1500. By default the utility creates a tap port which mtu is 1500, regardless the member interface's mtu config. So that user can't get the bond port frames which mtu is lager than 1500. This patch fix this issue by checking the member interface's mtu and set maximal mtu value to the tap port. Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Lin Huang <linhuang@ruijie.com.cn> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	json: Fix deep copy of objects and arrays.	Ilya Maximets	2022-10-11	2	-12/+128
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When reference counting for json objects was introduced the old json_clone() function became json_deep_clone(), but it still calls shallow json_clone() while cloning objects and arrays not really producing a deep copy. Fixing that by making other functions to perform a deep copy as well. There are no users for this functionality inside OVS right now, but OVS exports this functionality externally. 'ovstest test-json' extended to test both versions of a clone on provided inputs. Fixes: 9854d473adea ("json: Use reference counting in JSON objects") Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	Prepare for 2.17.4.	Ilya Maximets	2022-10-07	3	-1/+10
\| \| \| \| \|	Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	Set release date for 2.17.3.v2.17.3	Ilya Maximets	2022-10-07	2	-2/+3
\| \| \| \| \|	Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	Add support for OpenSSL 3.0 functions.	Timothy Redaelli	2022-10-07	3	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \|	In OpenSSL 3.0 some functions were deprecated and replaced. This commit adds some #ifdef to build without warning on both OpenSSL 1.x and OpenSSL 3.x. For OpenSSL 3.x, the default built-in DH parameters are used (as suggested by SSL_CTX_set_dh_auto manpage). Signed-off-by: Timothy Redaelli <tredaelli@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	dhparams: Fix .c file generation with OpenSSL >= 3.0.	Timothy Redaelli	2022-10-07	1	-8/+71
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since OpenSSL upstream commit 1696b8909bbe ("Remove -C from dhparam,dsaparam,ecparam") "openssl dhparam" doesn't support -C anymore. This commit changes generate-dhparams-c to generate dhparams.c by parsing "openssl dhparam -in "$1" -text -noout" output directly. The generated file won't be used on OpenSSL >= 3.0, but it's still needed to be generated if OVS is built on OpenSSL < 3.0. Signed-off-by: Timothy Redaelli <tredaelli@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	daemon-unix: Fix file descriptor leak when monitor restarts child.	Fengqi Li	2022-10-06	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When segmentation fault occurred in ovn-northd, monitor will try to restart the ovn-northd daemon process every 10s. Assume the following scenarios: There is a segmentation fault and the ovn-northd daemon process does not restart properly every time. New fds are created each time the ovn-northd daemon process is restarted by the monitor process, but old fds(fd[0]) owned by the monitor process was not closed properly. One pipe leak for each restart of the ovn-northd daemon process. After a long time file descriptors were exhausted. Fixes: e2ed6fbeb18c ("fatal-signal: Catch SIGSEGV and print backtrace.") Signed-off-by: Fengqi Li <lifengqi@inspur.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	vconn: Allow ECONNREFUSED in refuse connection test.	Mike Pattrick	2022-10-06	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The "tcp vconn - refuse connection" test may fail due to a Connection Refused error. The network stack returns ECONNREFUSED on a reset connection in SYN_SENT state and EPIPE or ECONNRESET in all other cases. 2022-09-19T17:45:48Z\|00001\|socket_util\|INFO\|0:127.0.0.1: listening on port 34189 2022-09-19T17:45:48Z\|00002\|poll_loop\|DBG\|wakeup due to [POLLOUT][ POLLERR][POLLHUP] on fd 4 (127.0.0.1:47140<->) at ../lib/stream-fd. c:153 test-vconn: unexpected vconn_connect() return value 111 (Connection refused) ../../tests/vconn.at:21: exit code was 1, expected 0 530. vconn.at:21: 530. tcp vconn - refuse connection (vconn.at:21): FAILED (vconn.at:21) This was observed from a CI system, and isn't a common case. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	dpdk: Use DPDK 21.11.2 release.	Michael Phelan	2022-10-04	4	-10/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Update OVS CLI and relevant documentation to use DPDK 21.11.2. DPDK 21.11.2 contains fixes for the CVEs listed below: CVE-2022-28199 [1] CVE-2022-2132 [2] A bug was introduced in DPDK 21.11.1 by the commit 01e3dee29c02 ("vhost: fix unsafe vring addresses modifications"). This bug can cause a deadlock when vIOMMU is enabled and NUMA reallocation of the virtqueues happen. A fix [3] has been posted and pushed to the DPDK 21.11 branch. If a user wishes to avoid the issue then it is recommended to use DPDK 21.11.0 until the release of DPDK 21.11.3. It should be noted that DPDK 21.11.0 does not benefit from the numerous bug and CVE fixes addressed since its release. If a user wishes to benefit from these fixes it is recommended to use DPDK 21.11.2. [1] https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-28199 [2] https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-2132 [3] https://patches.dpdk.org/project/dpdk/patch/20220725203206.427083-2-david.marchand@redhat.com/ Signed-off-by: Michael Phelan <michael.phelan@intel.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
*	m4: Test avx512 for x86 only.	Cheng Li	2022-09-27	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	'as' command of arm version may don't support option '--64', this patch is to move the avx512 test into x86 branch to avoid this. Fixes: 352b6c7116cd ("dpif-lookup: add avx512 gather implementation.") Tested-by: Harry van Haaren <harry.van.haaren@intel.com> Signed-off-by: Cheng Li <lic121@chinatelecom.cn> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	ovsdb-idl: Preserve references for rows deleted in same IDL run as their ↵	Xavier Simonart	2022-09-27	3	-5/+101
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	insertion. Considering two DB rows, 'a' from table A and 'b' from table B (with column 'ref_a' a reference to table A): a = {A._uuid=<U1>} b = {B._uuid=<U2>, B.ref_a=<U1>} Assuming both records are inserted in the IDL client's in-memory view of the database, if row 'b' is also deleted in the same run, it should generate the following tracked changes: - for table A: - inserted records: a = {A._uuid=<U1>} - for table B: - inserted records: b = {B._uuid=<U2>, B.ref_a=<U1>} - deleted records: b = {B._uuid=<U2>, B.ref_a=<U1>} Before this patch, inserted and deleted records in table B would (in some cases [0]) be b = {B._uuid=<U2>, B.ref_a=[]}. Having B.ref_a=[] would violate the integrity of the database from client perspective. test-ovsdb has also been updated to show that one row can be both inserted and deleted within one IDL run. [0] In ovn-controller the fact that the reference is NULL caused a crash in the following case, when both commands were handled by ovn-controller within the same loop: $ ovn-nbctl ls-add sw0 -- lsp-add sw0 sw0-port1 -- \ lsp-set-addresses sw0-port1 "50:54:00:00:00:01 192.168.0.2" $ ovn-nbctl lsp-del sw0-port1 Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2126450 Fixes: 91e1ff5dde39 ("ovsdb-idl: Don't reparse orphaned rows.") Signed-off-by: Xavier Simonart <xsimonar@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	python: idl: Fix idl.Row.__str__ method.	Christopher Aubut	2022-09-19	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes idl.Row's __str__ method to only print if the column exists on the object. The Row object passed to the 'updates' argument of Idl.notify only contains a subset of columns. Printing that argument causes an AttributeError. Fixes: 6a1c98461b46 ("Add a __str__ method to idl.Row") Submitted-at: https://github.com/openvswitch/ovs/pull/392 Acked-by: Terry Wilson <twilson@redhat.com> Signed-off-by: Christopher Aubut <christopher@aubut.me> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	bond: Avoid deadlock while updating post recirculation rules.	Ilya Maximets	2022-09-16	4	-4/+111
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the PACKET_OUT from controller ends up with sending packet to a bond interface, the main thread will take locks in the following order: handle_openflow --> take ofproto_mutex handle_packet_out packet_xlate output_normal bond_update_post_recirc_rules --> take rwlock in bond.c If at the same time revalidator thread is processing other packet with the output to the same bond: xlate_actions output_normal bond_update_post_recirc_rules --> take rwlock in bond.c update_recirc_rules ofproto_dpif_add_internal_flow ofproto_flow_mod --> take ofproto_mutex So, it is possible for these 2 threads to lock each other by taking one lock and waiting for another thread to release the second lock. It is also possible for the main thread to lock itself up by trying to acquire ofproto_mutex for the second time, if it will actually proceed with update_recirc_rules() after taking the bond rwlock. The problem appears to be that bond_update_post_recirc_rules() is called during the flow translation even if side effects are prohibited, which is the case for openflow PACKET_OUT handling. Skipping actual flow updates during the flow translation if side effects are disabled to avoid the deadlock. Since flows are not installed now when actions translated for very first packet, installing initial flows in bond_reconfigure(). This will cover the case of allocating a new recirc_id. Also checking if we need to update flows in bond_run() to cover link state changes. Regression test is added to catch the double lock case. Reported-at: https://github.com/openvswitch/ovs-issues/issues/259 Reported-by: Daniel Ding <zhihui.ding@easystack.cn> Fixes: adcf00ba35a0 ("ofproto/bond: Implement bond megaflow using recirculation") Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	ofproto-dpif-upcall: Add debug commands to pause/resume revalidators.	Ilya Maximets	2022-09-16	1	-0/+33
\| \| \| \| \| \| \| \| \| \| \|	New commands 'revalidator/pause' and 'revalidator/resume'. Not documented, since these should not be used in production environments. Will be used for unit tests in the next commit. Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	test-list: Fix false-positive build failure with GCC 12.	Ilya Maximets	2022-09-16	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	GCC 12.2.1 on Fedora 36 generates the following false-positive warning that is treated as error with -Werror: tests/test-list.c: In function 'test_list_construction': tests/test-list.c:110:9: error: 'values' may be used uninitialized 110 \| check_list(&list, values, n); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ For some reason it fails to recognize that array will not be used if 'n' equals zero. Fix that by just initializing arrays in full before using, since it's just a test code. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	tests: Fix tests with GNU grep 3.8.	Andreas Stieger	2022-09-14	7	-97/+97
\| \| \| \| \| \| \| \| \| \| \|	GNU grep 3.8 started to emit warnings when invoking egrep/fgrep. In some cases this breaks tests that check stderr. Replace the commands with their grep -E and grep -F counterparts throughout. Reported-at: https://bugzilla.opensuse.org/show_bug.cgi?id=1203239 Submitted-at: https://github.com/openvswitch/ovs/pull/395 Signed-off-by: Andreas Stieger <Andreas.Stieger@gmx.de> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	cirrus: Upgrade to FreeBSD 13.1 image.	Ilya Maximets	2022-09-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	13.1 got released in May and now we have problems updating some packages in 13.0 and CI is failing. Update to 13.1 to unblock the CI. Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	netdev-linux: Skip some internal kernel stats gathering.	Jon Kohler	2022-09-09	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For netdev_linux_update_via_netlink(), hint to the kernel that we do not need it to gather netlink internal stats when we want to update the netlink flags, as those stats are not rendered within OVS. Background: ovs-vswitchd can spend quite a bit of time blocked by the kernel during netlink calls, especially systems with many cores. This time is dominated by the kernel-side internal stats gathering mechanism in netlink, specifically: inet6_fill_link_af inet6_fill_ifla6_attrs __snmp6_fill_stats64 In Linux 4.4+, there exists a hint for netlink requests to not trigger the ipv6 stats gathering mechanism, which greatly reduces the amount of time that ovs-vswitchd is on CPU. Testing and Results: Tested booting 320 VM's and measuring OVS utilization with perf record, then visualized into a flamegraph using a patched version of ovs 2.14.2. Calls under bridge_run() seem to get hit the worst by this issue. Before bridge_run() == 11.3% of samples After bridge_run() == 3.4% of samples Note that there are at least two observed netlink calls under bridge_run that are still kernel stats heavy after this patch: Call 1: bridge_run -> netdev_run -> route_table_run -> route_table_reset -> ovs_router_insert -> ovs_router_insert__ -> get_src_addr -> netdev_ger_addr_list -> netdev_linux_get_addr_list -> getifaddrs Since the actual netlink call is coming from getifaddrs() in glibc, fixing would likely involve either duplicating glibc code in ovs source or patch glibc. Call 2: bridge_run -> iface_refresh_stats -> netdev_get_stats -> netdev_linux_get_stats -> get_stats_via_netlink This does use netlink based stats; however, it isn't immediately clear if just dropping the stats from inet6_fill_link_af would impact anything or not. Given this call is more intermittent, its of lesser concern. Acked-by: Greg Smith <gasmith@nutanix.com> Signed-off-by: Jon Kohler <jon@nutanix.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>