delta/openvswitch.git - github.com: openvswitch/ovs.git

	Commit message	Author	Age	Files	Lines
*	Set release date for 3.0.0. (tag: v3.0.0)	Ilya Maximets	2022-08-15	2	-2/+2
\| \| \| \| \| \|	Acked-by: Ian Stokes <ian.stokes@intel.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	releases: Mark 2.17 as a new LTS release.	Ilya Maximets	2022-08-15	1	-1/+1
\| \| \| \| \| \| \| \| \|	With release of OVS v3.0.0, according to our release process, 2.17.x becomes a new LTS series. Acked-by: Ian Stokes <ian.stokes@intel.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	docs: Remove remaining references to OVS kmod and XenServer.	Ilya Maximets	2022-08-15	7	-88/+23
\| \| \| \| \| \| \| \| \| \| \|	README file still mentions a kernel module and some parts of the documentation still have XenServer references, e.g. 'xs-*' database configuration options. Removing them. Fixes: 422e90437854 ("make: Remove the Linux datapath.") Fixes: 83c9518e7c67 ("xenserver: Remove xenserver.") Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	handlers: Fix handlers mapping.	Michael Santana	2022-08-15	3	-7/+57
	The handler and CPU mapping in upcalls is incorrect, and this is especially noticeable on systems with CPU isolation enabled.
	Say we have a 12-core system where only every even-numbered CPU is enabled: C0, C2, C4, C6, C8, C10.
	This means we will create an array of size 6, populated with sockets [S0, S1, S2, S3, S4, S5], that will be sent to the kernel.
	The problem is that when the kernel does an upcall, it indexes the socket array by CPU number, effectively adding load on some CPUs while leaving no work for others. E.g.:
	C0 indexes to S0
	C2 indexes to S2 (should be S1)
	C4 indexes to S4 (should be S2)
	Modulo of 6 (size of socket array) is applied, so we wrap back to S0:
	C6 indexes to S0 (should be S3)
	C8 indexes to S2 (should be S4)
	C10 indexes to S4 (should be S5)
	Effectively, sockets S0, S2, S4 get overloaded while sockets S1, S3, S5 get no work assigned to them.
	This causes the kernel to log the following message: "openvswitch: cpu_id mismatch with handler threads"
	Instead, we will send the kernel a corrected array of sockets the size of all CPUs in the system, or the largest core_id on the system, whichever is greater. This takes care of systems with non-contiguous CPU core IDs.
	In the above example we would create a corrected array in a round-robin (assuming prime bias) fashion as follows: [S0, S1, S2, S3, S4, S5, S6, S0, S1, S2, S3, S4]
	Fixes: b1e517bd2f81 ("dpif-netlink: Introduce per-cpu upcall dispatch.")
	Co-authored-by: Aaron Conole <aconole@redhat.com>
	Signed-off-by: Aaron Conole <aconole@redhat.com>
	Signed-off-by: Michael Santana <msantana@redhat.com>
	Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
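	The mapping problem in this commit message can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not the dpif-netlink code: the function names are hypothetical, sockets are represented by their index only, and the handler count of 7 is taken from the example array in the message.

```python
def naive_socket_index(cpu_id, n_sockets):
    """Index the kernel ends up using when the array only has one
    slot per *active* CPU: the CPU number modulo the array size."""
    return cpu_id % n_sockets


def corrected_socket_array(n_cpus, n_handlers):
    """One slot per CPU in the system, filled round-robin with
    handler socket indexes (hypothetical helper)."""
    return [cpu % n_handlers for cpu in range(n_cpus)]


active_cpus = [0, 2, 4, 6, 8, 10]

# Old behaviour: 6 sockets for 6 active CPUs on a 12-CPU system.
naive = {cpu: naive_socket_index(cpu, len(active_cpus)) for cpu in active_cpus}
naive_sockets_used = sorted(set(naive.values()))

# Corrected behaviour: 12 slots, 7 handler sockets (S0..S6).
corrected = corrected_socket_array(12, 7)
corrected_sockets_used = sorted({corrected[cpu] for cpu in active_cpus})

print(naive)                   # every active CPU lands on S0, S2 or S4
print(corrected)               # the array sent to the kernel
print(corrected_sockets_used)  # sockets actually hit by the active CPUs
```

	With the naive indexing only three sockets ever receive upcalls, while with the corrected array each of the six active CPUs indexes a different socket.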
*	handlers: Create additional handler threads when using CPU isolation.	Michael Santana	2022-08-15	3	-2/+91
	Additional threads are required to service upcalls when we have CPU isolation (in per-cpu dispatch mode), because they create a fairer distribution. With more threads we decrease the load of each thread, as more threads decrease the number of cores each thread is assigned. Adding threads also increases the chance that OVS utilizes all cores available to it.
	Some RPS schemes might make some handler threads get all the workload while others get none. This tends to happen when the handler thread count is low.
	An example would be an RPS that sends traffic on all even cores on a system with only the lower half of the cores available for OVS to use. In this example we have as many handler threads as there are available cores. In this case 50% of the handler threads get all the workload while the other 50% get no workload. Not only that, but OVS is only utilizing half of the cores that it can use. This is the worst-case scenario.
	The ideal scenario is to have as many threads as there are cores. In this case we guarantee that all cores OVS can use are utilized.
	But adding as many threads as there are cores could have a performance hit when the number of active cores (which all threads have to share) is very low. For this reason we avoid creating as many threads as there are cores and instead meet somewhere in the middle.
	The formula used to calculate the number of handler threads to create is as follows:
	handlers_n = min(next_prime(active_cores+1), total_cores)
	Assume default behavior when total_cores <= 2, that is, do not create additional threads when we have 2 or fewer total cores on the system.
	Fixes: b1e517bd2f81 ("dpif-netlink: Introduce per-cpu upcall dispatch.")
	Signed-off-by: Michael Santana <msantana@redhat.com>
	Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
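	The formula in this message can be checked with a short sketch. Two points are assumptions rather than facts from the commit: next_prime(n) is taken to return the smallest prime greater than or equal to n (which matches the 7 handlers for 6 active cores in the neighbouring mapping example), and the "default behavior" for total_cores <= 2 is taken to mean one handler per active core.

```python
def is_prime(n):
    """Trial-division primality test, sufficient for core counts."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def next_prime(n):
    """Smallest prime >= n (assumed semantics)."""
    while not is_prime(n):
        n += 1
    return n


def handler_count(active_cores, total_cores):
    """handlers_n = min(next_prime(active_cores + 1), total_cores)."""
    if total_cores <= 2:
        # Assumed default: no additional threads on tiny systems.
        return active_cores
    return min(next_prime(active_cores + 1), total_cores)


for active, total in [(6, 12), (12, 12), (4, 8), (1, 2)]:
    print(active, total, handler_count(active, total))
```

	The min() cap is what keeps the thread count from exceeding the number of cores when all cores are active.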
*	xenserver: Remove xenserver.	Greg Rose	2022-08-15	47	-7794/+55
	Remove the current xenserver implementation. It is obsolete, and since 3.0 we do not support kernel module builds [1].
	[1] https://mail.openvswitch.org/pipermail/ovs-dev/2022-July/395789.html
	[i.maximets] It can be added back if people willing to maintain it are found.
	Signed-off-by: Greg Rose <gvrose8192@gmail.com>
	Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	acinclude: Improve vpopcntdq build check.	Cian Ferriter	2022-08-12	2	-1/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Support for vpopcntdq instruction generation by the compiler was already checked in the OVS_CHECK_AVX512 AC function by checking if the compiler accepted the -mavx512vpopcntdq option. However, there can be situations where the compiler supports vpopcntdq generation but the assembler doesn't support the instruction. The below OVS_CHECK_AVX512VPOPCNTDQ AC function will check for both compiler and assembler support for the vpopcntdq instruction. Fixes: cb1c64007734 ("acinclude: Add seperate checks for AVX512 ISA.") Reported-by: Ian Stokes <ian.stokes@intel.com> Signed-off-by: Cian Ferriter <cian.ferriter@intel.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
*	packets: Fix misaligned access to ip6_hdr.	Ales Musil	2022-08-12	1	-2/+2
\| \| \| \| \| \| \| \| \|	The ip6_hdr is aligned to 4 bytes, but the pointer from dp_packet_l3 is aligned to 2 bytes. Use ovs_16aligned_ip6_hdr instead to get 2 bytes alignment. Signed-off-by: Ales Musil <amusil@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	python: Do not send non-zero flag for a SSL socket.	Miro Tomaska	2022-08-12	1	-1/+11
	pyOpenSSL was recently replaced with the Python standard library ssl module in the cited commit. Python's SSLSocket.send() does not allow a non-zero optional flags argument and explicitly raises an exception for it. pyOpenSSL did nothing with this flag but kept it to be compatible with the socket API: https://github.com/pyca/pyopenssl/blob/main/src/OpenSSL/SSL.py#L1844
	Fixes: 68543dd523bd ("python: Replace pyOpenSSL with ssl.")
	Reported-at: https://bugzilla.redhat.com/2115035
	Acked-By: Timothy Redaelli <tredaelli@redhat.com>
	Signed-off-by: Miro Tomaska <mtomaska@redhat.com>
	Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
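	One way to express the rule from this message is a small wrapper that drops the flags for SSL sockets only. This is a sketch, not the code from the commit; flags_for_send() and send_compat() are hypothetical names, and the demo uses a plain socket pair because building a real TLS connection would need certificates.

```python
import socket
import ssl


def flags_for_send(sock, flags):
    """ssl.SSLSocket.send() rejects non-zero flags, so force them
    to 0 for SSL sockets and pass them through otherwise."""
    return 0 if isinstance(sock, ssl.SSLSocket) else flags


def send_compat(sock, data, flags=0):
    """Send data on either a plain or an SSL socket."""
    return sock.send(data, flags_for_send(sock, flags))


# Demo on a plain socket pair: the flag is passed through unchanged.
left, right = socket.socketpair()
requested = getattr(socket, "MSG_DONTWAIT", 0)  # not defined everywhere
sent = send_compat(left, b"ping", requested)
received = right.recv(4)
left.close()
right.close()
print(sent, received)
```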
*	ovsdb: Fix copying weak references into transaction history.	Ilya Maximets	2022-08-12	1	-4/+4
	Transaction history is used only to construct row data updates for clients. It is not used for checking data integrity, hence it doesn't need a copy of weak references. Not copying this data saves a lot of CPU cycles and memory in some cases. For example, in the 250-node density-heavy scenario in ovn-heater these references can take up to 70% of RSS, which is about 8 GB of essentially wasted memory, as reported by valgrind massif:
	-------------------------------------------------------------------------------
	n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
	-------------------------------------------------------------------------------
	20 1,011,495,832,314 11,610,557,104 10,217,785,620 1,392,771,484 0
	88.00% (10,217,785,620B) (heap allocation functions) malloc/new/new[]
	->70.47% (8,181,819,064B) 0x455372: xcalloc__ (util.c:121)
	->70.07% (8,135,785,424B) 0x41609D: ovsdb_weak_ref_clone (row.c:66)
	->70.07% (8,135,785,424B) 0x41609D: ovsdb_row_clone (row.c:151)
	->34.74% (4,034,041,440B) 0x41B7C9: ovsdb_txn_clone (transaction.c:1124)
	| ->34.74% (4,034,041,440B) 0x41B7C9: ovsdb_txn_add_to_history (transaction.c:1163)
	| ->34.74% (4,034,041,440B) 0x41B7C9: ovsdb_txn_replay_commit (transaction.c:1198)
	| ->34.74% (4,034,041,440B) 0x408C35: parse_txn (ovsdb-server.c:633)
	| ->34.74% (4,034,041,440B) 0x408C35: read_db (ovsdb-server.c:663)
	| ->34.74% (4,034,041,440B) 0x406C9D: main_loop (ovsdb-server.c:238)
	| ->34.74% (4,034,041,440B) 0x406C9D: main (ovsdb-server.c:500)
	|
	->34.74% (4,034,041,440B) 0x41B7DE: ovsdb_txn_clone (transaction.c:1125)
	->34.74% (4,034,041,440B) 0x41B7DE: ovsdb_txn_add_to_history (transaction.c:1163)
	->34.74% (4,034,041,440B) 0x41B7DE: ovsdb_txn_replay_commit (transaction.c:1198)
	->34.74% (4,034,041,440B) 0x408C35: parse_txn (ovsdb-server.c:633)
	->34.74% (4,034,041,440B) 0x408C35: read_db (ovsdb-server.c:663)
	->34.74% (4,034,041,440B) 0x406C9D: main_loop (ovsdb-server.c:238)
	->34.74% (4,034,041,440B) 0x406C9D: main (ovsdb-server.c:500)
	Replacing ovsdb_row_clone() with ovsdb_row_datum_clone() to avoid cloning unnecessary metadata. The ovsdb_txn_clone() function is renamed to avoid issues if it is reused in the future for some other use case.
	Acked-by: Dumitru Ceara <dceara@redhat.com>
	Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	dpif-netdev: Simplify AVX512 build time checks to enhance readability.	Sunil Pai G	2022-08-10	4	-16/+16
	The preprocessor comparison strings used to check AVX512 capabilities are lengthy and hurt readability. Simplify this by aliasing the checks.
	Suggested-by: Eelco Chaudron <echaudro@redhat.com>
	Signed-off-by: Sunil Pai G <sunil.pai.g@intel.com>
	Acked-by: Cian Ferriter <cian.ferriter@intel.com>
	Signed-off-by: Ian Stokes <ian.stokes@intel.com>
*	github: Move CI to ubuntu 20.04 base image.	Ilya Maximets	2022-08-09	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	18.04 image is deprecated and will disappear soon. Also some slowdowns and brownouts are planned to push users away from this deprecated version: https://github.com/actions/virtual-environments/issues/6002 Moving to 20.04. Can't move to 22.04 at the moment because of deprecation warnings from openssl 3.0. Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	netdev-offload-tc: Disable offload of IPv6 fragments.	Ilya Maximets	2022-08-08	1	-1/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	OVS kernel datapath and TC are parsing IPv6 fragments differently. For IPv6 later fragments, according to the original design [1], OVS always sets nw_proto=44 (IPPROTO_FRAGMENT), regardless of the type of the L4 protocol. This leads to situation where flow for nw_proto=44 gets installed to TC, but packets can not match on it, causing all IPv6 later fragments to go to userspace significantly degrading performance. Disabling offload for such packets, so the flow can be installed to the OVS kernel datapath instead. Disabling for all IPv6 fragments including the first one, because it doesn't make a lot of sense to handle them separately. It may also cause potential problems with conntrack trying to re-assemble a packet from fragments handled by different datapaths (first in HW, later in OVS kernel). Checking both 'nw_proto' and 'nw_frag' as classifier might decide to match only on one of them and also nw_proto will not be 44 for the first fragment. The issue was hidden for some time due to incorrect behavior of the OVS kernel datapath that was recently fixed in kernel commit: 12378a5a75e3 ("net: openvswitch: fix parsing of nw_proto for IPv6 fragments") To allow offloading in the future either flow dissector in TC should be changed to parse packets in the same way as OVS does, or parsing in OVS kernel and userspace should be made configurable, so users can opt-in to the behavior change. Silent change of the behavior (change by default) is not an option, because existing OpenFlow pipelines may depend on a certain behavior. [1] https://docs.openvswitch.org/en/latest/topics/design/#fragments Fixes: 83e866067ea6 ("netdev-tc-offloads: Add support for IP fragmentation") Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
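	The "check both 'nw_proto' and 'nw_frag'" rule from this message can be sketched as a predicate. Everything below is illustrative: the function name, the argument layout and the FRAG_ANY bit value are assumptions made for the example, not the netdev-offload-tc code.

```python
ETH_TYPE_IPV6 = 0x86DD
IPPROTO_FRAGMENT = 44
FRAG_ANY = 0x1  # assumed bit meaning "packet is a fragment"


def skip_tc_offload(dl_type, nw_proto, nw_proto_mask, nw_frag, nw_frag_mask):
    """Return True if the flow matches IPv6 fragments and therefore
    should stay in the OVS kernel datapath instead of TC."""
    if dl_type != ETH_TYPE_IPV6:
        return False
    # The classifier may match on only one of the two fields, and the
    # first fragment carries the real L4 protocol, not 44.
    matches_frag = bool(nw_frag & nw_frag_mask & FRAG_ANY)
    matches_proto_44 = (nw_proto_mask != 0
                        and (nw_proto & nw_proto_mask) == IPPROTO_FRAGMENT)
    return matches_frag or matches_proto_44


# First fragment of a UDP packet: nw_proto is 17, but nw_frag is set.
print(skip_tc_offload(ETH_TYPE_IPV6, 17, 0xFF, FRAG_ANY, FRAG_ANY))
# Later fragment matched only through nw_proto=44.
print(skip_tc_offload(ETH_TYPE_IPV6, 44, 0xFF, 0, 0))
```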
*	ovs-save: Use right OpenFlow version for add-tlv-map.	Han Ding	2022-08-08	1	-1/+1
	When the bridge protocols do not include OpenFlow10, the error message "version negotiation failed" is printed while doing "Restoring saved flows".
	Signed-off-by: Han Ding <handing@chinatelecom.cn>
	Acked-by: Mike Pattrick <mkp@redhat.com>
	Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	system-traffic: Fix IPv4 fragmentation test sequence for check-kernel.	Paolo Valerio	2022-08-08	1	-0/+5
	The following test sequence:
	conntrack - IPv4 fragmentation incomplete reassembled packet
	conntrack - IPv4 fragmentation with fragments specified
	leads to a systematic failure of the latter test on the kernel datapath (linux). Multiple executions of the former may also lead to multiple failures. This is due to the fact that fragments not yet reassembled are kept in a queue for /proc/sys/net/ipv4/ipfrag_time seconds, and if the kernel receives a fragment already present in the queue, it returns -EINVAL.
	Below is the related log message:
	|00058|dpif|WARN|system@ovs-system: execute ct(commit) failed (Invalid argument) on packet udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=0,nw_frag=first,tp_src=1,tp_dst=2 udp_csum:0
	Fix the sequence by sending the second fragment in "conntrack - IPv4 fragmentation incomplete reassembled packet", once the checks are done.
	IPv6 tests are not affected, as the defrag kernel code path pretends to add the duplicate fragment to the queue, returning -EINPROGRESS, when a duplicate is detected.
	Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
	Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	system-traffic: Fix incorrect neigh entry in ipv6 header modification test.	Ilya Maximets	2022-08-08	1	-5/+9
	The permanent neighbor entry for fc00::1 is added into a wrong namespace, so in order to reply to a ping from at_ns1, the address of fc00::1 has to be discovered. Interfaces are attached to OVS and we're removing flows that can forward ND requests after initial setup. In case the ND request wasn't sent and replied to before that, at_ns1 will not be able to discover fc00::1 and won't reply to pings.
	It's hard to catch this condition while running tests locally, but for some reason our CI is failing consistently.
	Fix the issue by removing all the unnecessary permanent entries and just allowing all the normal traffic to flow through the low-priority OVS flow, so all addresses can be discovered. Also adding one more wait to avoid occasional drops of the very first packet.
	Fixes: 2ff43c78c685 ("packets: Re-calculate IPv6 checksum only for first frag upon modify.")
	Acked-by: Salem Sol <salems@nvidia.com>
	Acked-by: Michael Phelan <michael.phelan@intel.com>
	Acked-by: Aaron Conole <aconole@redhat.com>
	Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	system-traffic: Don't run IPv6 header modification test on kernels < 5.19.	Ilya Maximets	2022-08-08	3	-1/+14
	The OVS kernel module incorrectly updates checksums while changing IPv6 fields of later fragments, which don't really have L4 headers. This makes the 'ping6 between two ports with header modify' test fail on most of the distribution kernels.
	The issue got indirectly fixed in the latest 5.19 with commit: 12378a5a75e3 ("net: openvswitch: fix parsing of nw_proto for IPv6 fragments")
	The reason is that the set_ipv6() function in net/openvswitch/actions.c uses the protocol number from the parsed flow key and not from the packet itself, and nw_proto=44 is not a protocol where we can update the checksum.
	The fix was backported to all supported upstream stable trees, but hasn't found its way to most of the distributions yet. Restricting the test to 5.19+ kernels to avoid failures on distro kernels.
	Additionally allowing the previous test for later fragments to be executed in the userspace testsuite.
	Fixes: 2ff43c78c685 ("packets: Re-calculate IPv6 checksum only for first frag upon modify.")
	Acked-by: Aaron Conole <aconole@redhat.com>
	Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	python: Fix E275 missing whitespace after keyword.	Ilya Maximets	2022-08-04	4	-6/+6
	With the just-released flake8 5.0 we're getting a bunch of E275 errors:
	utilities/bugtool/ovs-bugtool.in:959:23: E275 missing whitespace after keyword
	tests/test-ovsdb.py:623:11: E275 missing whitespace after keyword
	python/setup.py:105:8: E275 missing whitespace after keyword
	python/setup.py:106:8: E275 missing whitespace after keyword
	python/ovs/db/idl.py:145:15: E275 missing whitespace after keyword
	python/ovs/db/idl.py:167:15: E275 missing whitespace after keyword
	make[2]: *** [flake8-check] Error 1
	This breaks CI on branches below 2.16. We don't see a problem right now on newer branches because we're installing extra dependencies that backtrack flake8 down to 4.1 or even 3.9.
	Acked-by: Mike Pattrick <mkp@redhat.com>
	Acked-by: Dumitru Ceara <dceara@redhat.com>
	Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	tc: Use sparse hex dump while printing inconsistencies.	Ilya Maximets	2022-08-04	1	-6/+6
	Instead of a very long hex string, something like this will be printed:
	|DBG|tc flower compare failed mask compare:
	Expected Mask:
	00000000  ff ff 00 00 ff ff ff ff-ff ff ff ff ff ff ff ff
	00000020  00 00 00 00 00 00 00 00-00 00 00 00 00 00 03 00
	00000090  00 00 00 00 00 00 00 00-ff ff ff ff ff ff ff ff
	000000c0  ff 00 00 00 ff ff 00 00-ff ff ff ff ff ff ff ff
	Received Mask:
	00000000  ff ff 00 00 ff ff ff ff-ff ff ff ff ff ff ff ff
	00000020  00 00 00 00 00 00 00 00-00 00 00 00 00 00 03 00
	00000090  00 00 00 00 00 00 00 00-ff ff ff ff ff ff ff ff
	000000c0  ff 00 00 00 00 00 00 00-ff ff ff ff ff ff ff ff
	It's easier to spot the difference this way and count which bytes are to blame, since offsets are printed as well. Using a sparse dump to avoid printing a huge number of all-zero lines.
	Reviewed-by: Roi Dayan <roid@nvidia.com>
	Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	netdev-offload-tc: Print unused mask bits on failure.	Ilya Maximets	2022-08-04	1	-1/+11
	This change extends the debug logging with the sparse dump of the flow mask structure to make the debug process easier. Sample output:
	|netdev_offload_tc|DBG|offloading isn't supported, unknown attribute
	Unused mask bits:
	00000270  00 00 00 00 00 00 00 00-00 00 00 ff 00 00 00 00
	In this example, 0x270 + 11 = 635, which is the offset of nsh.mdtype in struct flow.
	Suggested-by: Roi Dayan <roid@nvidia.com>
	Reviewed-by: Roi Dayan <roid@nvidia.com>
	Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
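	The offset arithmetic in the sample (0x270 + 11 = 635) can be automated when reading such dumps. A minimal sketch, assuming the line layout shown in the sample output (a hex offset followed by 16 byte values, with a dash between the two groups of 8); nonzero_offsets() is a hypothetical helper, not part of OVS.

```python
def nonzero_offsets(dump_line):
    """Return the absolute byte offsets of all non-zero bytes
    found on one line of a sparse hex dump."""
    offset_text, bytes_text = dump_line.split(None, 1)
    base = int(offset_text, 16)
    byte_values = bytes_text.replace("-", " ").split()
    return [base + i for i, b in enumerate(byte_values) if int(b, 16) != 0]


line = "00000270  00 00 00 00 00 00 00 00-00 00 00 ff 00 00 00 00"
offsets = nonzero_offsets(line)
print(offsets)
```

	The resulting offset can then be compared against the field offsets of the dumped structure to find the field that was not handled.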
*	dynamic-string: Add function for a sparse hex dump.	Ilya Maximets	2022-08-04	2	-8/+30
\| \| \| \| \| \| \| \|	New function to dump large and sparsely populated data structures like struct flow. Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
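	The idea of a sparse dump can be sketched in a few lines: format the data 16 bytes per line and skip every line that is all zeroes. This is an illustration only, with a line layout modeled on the sample outputs in the neighbouring commits; it is not the C function added to dynamic-string.

```python
def sparse_hex_dump(data, per_line=16):
    """Hex-dump `data`, omitting lines that contain only zero bytes."""
    lines = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        if not any(chunk):
            continue  # all-zero line: skip it, the offsets keep context
        hexes = ["%02x" % b for b in chunk]
        left = " ".join(hexes[:8])
        right = " ".join(hexes[8:])
        body = left + ("-" + right if right else "")
        lines.append("%08x  %s" % (start, body))
    return "\n".join(lines)


# A mostly empty 640-byte structure with a single byte set at offset 635.
blob = bytearray(0x280)
blob[0x27B] = 0xFF
dump = sparse_hex_dump(bytes(blob))
print(dump)
```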
*	system-offloads-traffic: Fix waiting for netcat indefinitely.	Ilya Maximets	2022-08-04	1	-4/+4
	$NC_EOF_OPT should be used to prevent some netcat implementations from waiting indefinitely. This fixes the check-offloads testsuite hanging on Ubuntu 22.04.
	Fixes: 5660b89a309d ("dpif-netlink: Offloading meter to tc police action")
	Acked-by: Mike Pattrick <mkp@redhat.com>
	Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	dpif-netlink: Fix incorrect bit shift in compat mode.	Ilya Maximets	2022-08-04	1	-1/+1
	SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior in lib/dpif-netlink.c:1077:40: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
	#0 0x73fc31 in dpif_netlink_port_add_compat lib/dpif-netlink.c:1077:40
	#1 0x73fc31 in dpif_netlink_port_add lib/dpif-netlink.c:1132:17
	#2 0x2c1745 in dpif_port_add lib/dpif.c:597:13
	#3 0x07b279 in port_add ofproto/ofproto-dpif.c:3957:17
	#4 0x01b209 in ofproto_port_add ofproto/ofproto.c:2124:13
	#5 0xfdbfce in iface_do_create vswitchd/bridge.c:2066:13
	#6 0xfdbfce in iface_create vswitchd/bridge.c:2109:13
	#7 0xfdbfce in bridge_add_ports__ vswitchd/bridge.c:1173:21
	#8 0xfb5319 in bridge_add_ports vswitchd/bridge.c:1189:5
	#9 0xfb5319 in bridge_reconfigure vswitchd/bridge.c:901:9
	#10 0xfae0f9 in bridge_run vswitchd/bridge.c:3334:9
	#11 0xfe67dd in main vswitchd/ovs-vswitchd.c:129:9
	#12 0x4b6d8f (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f)
	#13 0x4b6e3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f)
	#14 0x562594eed024 in _start (vswitchd/ovs-vswitchd+0x787024)
	Fixes: 526df7d8543f ("tunnel: Provide framework for tunnel extensions for VXLAN-GBP and others")
	Acked-by: Mike Pattrick <mkp@redhat.com>
	Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	python: Use setuptools instead of distutils.	Timothy Redaelli	2022-08-04	1	-5/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On Python 3.12, distutils will be removed and it's currently (3.10+) deprecated (see PEP 632). Since the suggested and simplest replacement is setuptools, this commit replaces distutils to use setuptools instead. setuptools < 59.0 doesn't have setuptools.errors and so, in this case, distutils.errors is still used. Signed-off-by: Timothy Redaelli <tredaelli@redhat.com> Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
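	The fallback described in this message (use setuptools.errors when available, distutils.errors otherwise) follows a common import pattern. The sketch below is an illustration rather than the contents of python/setup.py; the load_build_errors() wrapper and the empty-tuple result when neither package is importable are additions made so the example runs anywhere.

```python
def load_build_errors():
    """Return the build-failure exception classes, preferring
    setuptools and falling back to distutils."""
    try:
        # Available in setuptools >= 59.0.
        from setuptools.errors import CCompilerError, ExecError, PlatformError
    except ImportError:
        try:
            # Older setuptools: distutils still provides equivalents.
            from distutils.errors import CCompilerError
            from distutils.errors import DistutilsExecError as ExecError
            from distutils.errors import DistutilsPlatformError as PlatformError
        except ImportError:
            return ()  # neither package is installed
    return (CCompilerError, ExecError, PlatformError)


BUILD_ERRORS = load_build_errors()
print([cls.__name__ for cls in BUILD_ERRORS])
```

	A build step can then wrap the extension build in `except BUILD_ERRORS:`; an empty tuple there simply catches nothing.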
*	packets: Re-calculate IPv6 checksum only for first frag upon modify.	Salem Sol	2022-08-04	2	-4/+43
\| \| \| \| \| \| \| \| \| \| \| \| \|	In case of modifying an IPv6 packet src/dst address the L4 checksum should be recalculated only for the first frag. Currently it's done for all frags, leading to incorrect reassembled packet checksum. Fix it by adding a new flag to recalculate the checksum only for the first frag. Fixes: bc7a5acdff08 ("datapath: add ipv6 'set' action") Signed-off-by: Salem Sol <salems@nvidia.com> Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	netdev-linux: set correct action for packets that passed policer	Vlad Buslov	2022-08-04	1	-7/+9
	The referenced commit changed the policer action type from TC_ACT_UNSPEC (continue) to TC_ACT_PIPE. However, since neither the TC hardware offload layer nor the mlx5 driver at the time validated the action type and always assumed 'continue', the breakage wasn't caught until later validation code was added. The change also broke a valid configuration when sending from an offload-capable device to a non-offload-capable one. For example, when sending from an mlx5 VF to an OvS bridge netdevice, the traffic that passed the matchall classifier with policer could no longer match the following flower rule in software:
	filter protocol all pref 1 matchall chain 0
	filter protocol all pref 1 matchall chain 0 handle 0x1
	  in_hw (rule hit 7863)
	  action order 1: police 0x1 rate 32Mbit burst 1000Kb mtu 64Kb action drop/pipe overhead 0b
	  ref 1 bind 1 installed 17 sec firstused 17 sec
	  Action statistics:
	  Sent 152199634 bytes 102550 pkt (dropped 1315, overlimits 1315 requeues 0)
	  Sent software 74612172 bytes 51275 pkt
	  Sent hardware 77587462 bytes 51275 pkt
	  backlog 0b 0p requeues 0
	  used_hw_stats delayed
	filter protocol ip pref 3 flower chain 0
	filter protocol ip pref 3 flower chain 0 handle 0x1
	  dst_mac aa:94:1f:f2:f8:44
	  src_mac e4:00:01:08:00:02
	  eth_type ipv4
	  ip_flags nofrag
	  not_in_hw
	  action order 1: skbedit ptype host pipe
	  index 1 ref 1 bind 1 installed 6 sec used 6 sec
	  Action statistics:
	  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	  backlog 0b 0p requeues 0
	  action order 2: mirred (Ingress Redirect to device br-ovs) stolen
	  index 1 ref 1 bind 1 installed 6 sec used 6 sec
	  Action statistics:
	  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	  backlog 0b 0p requeues 0
	  cookie 401a9c8b3d403c62240d3eb5e21c1604
	  no_percpu
	Fix the issue by restoring the matchall and basic policers' action type to 'continue'.
	Fixes: c2567e533f8a ("add port-based ingress policing based packet-per-second rate-limiting")
	Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
	Signed-off-by: Simon Horman <simon.horman@corigine.com>
*	test-ovsdb: Fix false-positive leaks from LeakSanitizer.	Ilya Maximets	2022-07-29	2	-4/+63
	LeakSanitizer for some reason reports these json objects as leaked, even though we do have references to them at the moment ovs_fatal() is called from check_ovsdb_error(). Previously it complained only with -O2, but with newer versions of clang/llvm it started complaining even with -O1. For example, negative ovsdb parsing tests are failing on Ubuntu 22.04 with clang 14 if built with ASan and detect_leaks=1.
	Fix that by destroying the json object before aborting the process. With that change we may also build with the default -O2 in CI.
	An alternative implementation might be to just pass the json to destroy to every check_ovsdb_error() call, but indirect registering of the pointer seems a bit less invasive.
	Acked-by: Ales Musil <amusil@redhat.com>
	Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	m4: Update ax_func_posix_memalign to the latest version.	Ilya Maximets	2022-07-29	1	-4/+4
	This fixes the obsolescence warning for AC_TRY_RUN with autoconf 2.70+:
	$ ./boot.sh
	configure.ac:141: warning: The macro `AC_TRY_RUN' is obsolete.
	configure.ac:141: You should run autoupdate.
	./lib/autoconf/general.m4:2997: AC_TRY_RUN is expanded from...
	lib/m4sugar/m4sh.m4:692: _AS_IF_ELSE is expanded from...
	lib/m4sugar/m4sh.m4:699: AS_IF is expanded from...
	./lib/autoconf/general.m4:2249: AC_CACHE_VAL is expanded from...
	./lib/autoconf/general.m4:2270: AC_CACHE_CHECK is expanded from...
	m4/ax_func_posix_memalign.m4:27: AX_FUNC_POSIX_MEMALIGN is expanded from...
	configure.ac:141: the top level
	Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
	Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	m4: Replace obsolete AC_HELP_STRING with AS_HELP_STRING.	Ilya Maximets	2022-07-29	2	-17/+17
	AS_HELP_STRING is a direct replacement for AC_HELP_STRING. It has been available since autoconf 2.57a. OVS requires 2.63, so AS_HELP_STRING can be freely used.
	This fixes the following warning on systems with 2.70+:
	$ ./boot.sh
	...
	configure.ac:92: warning: The macro `AC_HELP_STRING' is obsolete.
	configure.ac:92: You should run autoupdate.
	...
	Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
	Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	debian: Fix incorrect linkage of the python C extension.	Ilya Maximets	2022-07-29	2	-3/+15
	The current version of debian/rules simply passes libopenvswitch.a as a command line argument via LDFLAGS, but that doesn't actually lead to this library being statically linked into the python extension, which is a shared library. Instead, the build "succeeds", but the resulting extension is not usable, because most of the symbols are missing:
	from ovs import _json
	ImportError: /usr/lib/python3/dist-packages/ovs/_json.cpython-310-x86_64-linux-gnu.so: undefined symbol: json_parser_finish
	'-lopenvswitch' with a path to a static library should be passed instead to make it actually statically linked. But even that is not enough, as all the libraries that libopenvswitch.a was built with also have to be passed. Otherwise, we'll have unresolved symbols like ssl, cap-ng, etc.
	The most convenient way to get all the required libraries and cflags seems to be by using pkg-config. Setting several environment variables for pkg-config, so it can find libopenvswitch.pc in a non-standard directory, not skip default locations, and also report them with the right base directory.
	Extra '-Wl,-Bstatic -lopenvswitch -Wl,-Bdynamic' is added before all the libs to ensure static linking of libopenvswitch even if the dynamic library is available in the system.
	One more problem here is that it is not possible to link a static library into a dynamic library if the static one is not position independent. So, we have to build everything with -fPIC, otherwise it's not possible to build C extensions.
	Also added a simple CI script to check that we're able to use the python C extension after installing a package.
	Fixes: 6ad3be9749ab ("debian: Fix build of python json C extension.")
	Acked-by: Frode Nordahl <frode.nordahl@canonical.com>
	Reviewed-by: David Marchand <david.marchand@redhat.com>
	Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
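	The ordering of linker arguments described here can be sketched as follows. The helper takes the text that a command such as 'pkg-config --libs --static libopenvswitch' would print and prepends the static-linking group. The function name and the sample pkg-config output are hypothetical; the real flags depend on how libopenvswitch was configured.

```python
import shlex


def static_link_args(pkg_config_libs):
    """Force static linking of libopenvswitch, then append the
    libraries it depends on, as reported by pkg-config."""
    forced_static = ["-Wl,-Bstatic", "-lopenvswitch", "-Wl,-Bdynamic"]
    return forced_static + shlex.split(pkg_config_libs)


# Hypothetical pkg-config output, for illustration only.
sample_output = "-L/usr/lib -lopenvswitch -lssl -lcrypto -lcap-ng"
link_args = static_link_args(sample_output)
print(link_args)
```

	Switching back to '-Wl,-Bdynamic' right after '-lopenvswitch' keeps the remaining dependencies dynamically linked.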
*	python: Add ability to pass extra libs and cflags for C extension.	Ilya Maximets	2022-07-29	1	-6/+13
\| \| \| \| \| \| \| \| \| \| \| \|	In order to correctly link with static libopenvswitch.a library, users should also provide required cflags and all the libraries libopenvswitch.a was actually built with and depends on. Otherwise, it's not possible to link correctly. Fixes: 671f93fe42d3 ("python: Allow building json C extension with static OVS library.") Acked-by: Frode Nordahl <frode.nordahl@canonical.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	libopenvswitch.pc: Add missing libs for a static build.	Ilya Maximets	2022-07-29	1	-1/+1
	SSL, BPF, cap-ng and other libraries are in use by the static library, so they have to be linked while building applications with that static library, i.e. 'pkg-config --libs --static libopenvswitch' must return -lssl, -lcap-ng, etc. in the output for a successful build.
	For the dynamic library (non-private Libs) all these libraries will be dynamically linked to libopenvswitch.so, so the application will pick them up without having a direct dependency.
	Acked-by: Frode Nordahl <frode.nordahl@canonical.com>
	Reviewed-by: David Marchand <david.marchand@redhat.com>
	Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	rhel: Stop installing internal headers.	Ilya Maximets	2022-07-29	6	-36/+4
	Currently, openvswitch-devel installs the following header tree:
	/usr/include
	    /openflow/*.h
	    /openvswitch
	        /*.h
	        /openflow/*.h
	        /openvswitch/*.h
	        /sparse/*.h
	        /lib/*.h
	There are a few issues with that:
	1. openflow and openvswitch headers are installed twice: once in the main /usr/include and a second time in /usr/include/openvswitch/.
	2. For some reason, internal headers such as lib/*.h and fairly useless headers such as sparse/*.h are installed as well.
	One more issue is that the current pkg-config files don't work with builds installed with 'make install', because 'make install' doesn't create this weird header tree.
	While a double install of the same headers is not a huge problem, it doesn't seem right. Installation of the internal headers is a bigger issue. They are not part of the API/ABI and we do not provide any stability guarantees for them. We are making incompatible changes constantly in minor updates, so users should not rely on these headers.
	If it's necessary for some external application to use them, this external application should not link with libopenvswitch dynamically, and it also can't expect the static library to not break this API/ABI, hence there is no real point in installing them. The application should use OVS as a submodule like OVN does, or otherwise compile itself by obtaining the required version of OVS sources. Another option is to properly export and install the required headers.
	pkg-config configuration files are updated as necessary.
	Fixes: 4886d4d2495b ("debian, rhel: Ship ovs shared libraries and header files")
	Reviewed-by: David Marchand <david.marchand@redhat.com>
	Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	python-c-ext: Handle initialization failures.	Ilya Maximets	2022-07-29	1	-1/+9
\| \| \| \| \| \| \| \| \| \| \| \| \|	PyModule_AddObject() may fail and it doesn't steal references in this case. The error condition should be handled to avoid possible memory leaks. And while it's not strictly specified whether PyModule_Create() may fail, most of the examples in the Python documentation include handling of a NULL case. Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	netdev-linux: Do not touch LAG members if master is not attached to OVS.	Tao Liu	2022-07-26	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Bond master netdev may be created without a classification type, due to routing or tunneling code. If the bond master is not attached to OVS, the ingress block on LAG members should not be updated. Simple reproducer: tc q ls dev net3 ingress; ip a add 10.1.1.1/30 dev bond0; ip l set net3 master bond0; tc q ls dev net3 ingress Fixes: d22f8927c3c9 ("netdev-linux: monitor and offload LAG slaves to TC") Signed-off-by: Tao Liu <thomas.liu@ucloud.cn> Acked-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	netdev: Clear auto_classified if netdev reopened with the type specified.	Tao Liu	2022-07-26	1	-18/+23
\| \| \| \| \| \| \| \| \| \| \| \| \|	When a netdev is first opened by netdev_open(..., NULL, ...), netdev_class is set to system by default and auto_classified is set to true. If the netdev is reopened by netdev_open(..., "system", ...), auto_classified should be cleared. This will be used in the next patch to fix a LAG issue. Fixes: 8c2c225e481d ("netdev: Fix netdev_open() to track and recreate classless interfaces") Signed-off-by: Tao Liu <thomas.liu@ucloud.cn> Acked-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	system-offloads-traffic: Avoid check_pkt_len action test random failures.	David Marchand	2022-07-25	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On my Fedora 36, the test with enabled offloads often fails with one of the pings failing. By chance (?), the previous tcpdumps are not stopped and I can see for example: 10:04:02.534492 IP 10.1.1.1 > 10.1.1.2: ICMP echo request, id 62835, seq 2, length 72 10:04:02.639443 IP 10.1.1.1 > 10.1.1.2: ICMP echo request, id 62835, seq 3, length 72 10:04:02.743447 IP 10.1.1.1 > 10.1.1.2: ICMP echo request, id 62835, seq 4, length 72 10:04:02.846447 IP 10.1.1.1 > 10.1.1.2: ICMP echo request, id 62835, seq 5, length 72 10:04:02.950519 IP 10.1.1.1 > 10.1.1.2: ICMP echo request, id 62835, seq 6, length 72 10:04:03.054697 IP 10.1.1.1 > 10.1.1.2: ICMP echo request, id 62835, seq 7, length 72 10:04:03.158448 IP 10.1.1.1 > 10.1.1.2: ICMP echo request, id 62835, seq 8, length 72 10:04:03.262541 IP 10.1.1.1 > 10.1.1.2: ICMP echo request, id 62835, seq 9, length 72 10:04:03.366444 IP 10.1.1.1 > 10.1.1.2: ICMP echo request, id 62835, seq 10, length 72 10:04:03.466501 IP 10.1.1.1 > 10.1.1.2: ICMP echo request, id 62835, seq 11, length 72 The first ping request has not been handled correctly. Adding a 'sleep 1' (like other offloads unit tests) seems to be enough to avoid this situation. Fixes: 02dabb21f243 ("tests: Add check_pkt_len action test to system-offload-traffic.") Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	system-traffic: Properly stop dangling ping after geneve test.	Ilya Maximets	2022-07-25	1	-1/+1
\| \| \| \| \| \| \| \| \|	Ping process remains in the system after the test. Using a proper macro that will correctly register it for stopping at cleanup stage. Fixes: 134e6831acca ("system-traffic: Check frozen state handling with TLV map change") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Reviewed-by: David Marchand <david.marchand@redhat.com>
*	conntrack: Fix conntrack multiple new state.	Eli Britstein	2022-07-25	2	-3/+13
\| \| \| \| \| \| \| \| \| \| \| \|	A connection is established if we see packets from both directions. The cited commit fixed the issue of sending twice in one direction, but the issue remains if more packets than that are sent in one direction. Fix it. Fixes: a867c010ee91 ("conntrack: Fix conntrack new state") Signed-off-by: Eli Britstein <elibr@nvidia.com> Acked-by: Paolo Valerio <pvalerio@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
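The rule stated in this message (a connection stays new until traffic has been seen in both directions, however many packets the originator sends) can be illustrated with a small Python sketch. This is not the OVS conntrack code, which also tracks protocol-specific state; the `Connection` class and its `process()` method are hypothetical names used only for illustration.

```python
class Connection:
    """Toy connection entry that only records whether a reply was seen."""

    def __init__(self) -> None:
        # The entry is created by the first packet in the original direction.
        self.seen_reply = False

    def process(self, is_reply: bool) -> str:
        """Return the state reported for a packet in the given direction."""
        if is_reply:
            self.seen_reply = True
        # Any number of packets in the original direction alone must not
        # promote the connection; only a reply does.
        return "established" if self.seen_reply else "new"


if __name__ == "__main__":
    conn = Connection()
    for _ in range(3):
        print(conn.process(is_reply=False))  # originator keeps sending
    print(conn.process(is_reply=True))       # first reply arrives
    print(conn.process(is_reply=False))      # later originator packet
```

Keying the state on "was a reply seen" rather than on a packet count is what makes the result independent of how many packets go in one direction.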
*	python-c-ext: Fix a couple of build warnings.	Timothy Redaelli	2022-07-22	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	ovs/_json.c:67:20: warning: assignment discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers] ovs/_json.c:132:27: warning: comparison of integer expressions of different signedness: ‘int’ and ‘size_t’ {aka ‘long unsigned int’} [-Wsign-compare] Signed-off-by: Timothy Redaelli <tredaelli@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	python-c-ext: Remove Python 2 support.	Timothy Redaelli	2022-07-22	1	-33/+1
\| \| \| \| \| \| \| \| \|	Since Python 2 is not supported anymore, remove Python 2 support from the C extension too. Fixes: 1ca0323e7c29 ("Require Python 3 and remove support for Python 2.") Signed-off-by: Timothy Redaelli <tredaelli@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	odp-execute: Avoid unnecessary logging for action implementations.	Ilya Maximets	2022-07-22	1	-5/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is no need to log if the implementation didn't change. The scalar one is the default, and any change will be logged. And availability is not really important to log at INFO level. Moving these logs to DBG level to avoid littering the log file and confusing users. We do the same for miniflow_extract and datapath interface implementations. Additionally, the text of the log message is made more readable and uniform with the one used for miniflow_extract. Fixes: 95e4a35b0a1d ("odp-execute: Add function pointers to odp-execute for different action implementations.") Acked-by: Emma Finn <emma.finn@intel.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	system-dpdk: Add testpmd clean up in MTU unit tests.	Michael Phelan	2022-07-22	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \|	The MTU vport unit tests do not clean up testpmd after use which causes them to fail randomly. This commit amends the MTU vport unit tests to clean up testpmd after running. Fixes: bf47829116a8 ("tests: Add OVS-DPDK MTU unit tests.") Reported-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Kumar Amber <kumar.amber@intel.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Signed-off-by: Michael Phelan <michael.phelan@intel.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	netdev-offload-dpdk: Setting RSS hash types in RSS action.	Harold Huang	2022-07-19	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we send parallel flows such as VXLAN to a PF[1] port in OVS with multiple PMDs, OVS will create an RTE flow with Mark and RSS actions to send flows to the software data path. But the RSS action does not work well and all the flows are forwarded to a single PMD. This is because RSS hash types should be set in the RSS action. [1]: In our testbed, a Mellanox ConnectX-6 is used as a PF port. [i.maximets] DPDK PMD drivers are supposed to provide a "best-effort" RSS configuration if the type is set to zero. However, they are very inconsistent in practice and barely put any effort into providing a good configuration. For example, the mlx5 driver seems to use just RTE_ETH_RSS_IP, which is not enough for most deployments. Setting the types the same way we configure them for a normal RSS in netdev-dpdk to work around the scalability issue. Signed-off-by: Harold Huang <baymaxhuang@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	lib: Print nw_frag in flow key.	Rosemarie O'Riorden	2022-07-19	11	-261/+261
\| \| \| \| \| \| \| \| \| \| \|	nw_frag was not being printed in the flow key because it was improperly masked and printed. Since this field is only two bits, it needs to use a different macro to be masked. During printing, the switch statement switched on the whole 8 bits rather than just the two that are relevant. This caused nw_frag to often not be printed at all. Signed-off-by: Rosemarie O'Riorden <roriorden@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
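The masking problem described in this message can be sketched in Python. This is an illustration under stated assumptions rather than the OVS formatting code: the constant names, the output strings and the `format_nw_frag()` helper are all made up here. The sketch assumes the two relevant bits are the lowest two of an 8-bit field and that the mask may arrive as a full byte (0xff).

```python
FRAG_ANY = 0x1                      # assumed: bit 0, packet is a fragment
FRAG_LATER = 0x2                    # assumed: bit 1, not the first fragment
FRAG_MASK = FRAG_ANY | FRAG_LATER   # the only two meaningful bits


def format_nw_frag(value: int, mask: int, *, fixed: bool) -> str:
    """Format the fragment field; fixed=False mimics the described bug."""
    # The buggy variant dispatches on the whole mask byte, so an exact-match
    # mask of 0xff equals none of the expected cases and nothing is printed.
    key = (mask & FRAG_MASK) if fixed else mask
    if key == FRAG_MASK:
        names = {0: "no", FRAG_ANY: "first", FRAG_MASK: "later"}
        name = names.get(value & FRAG_MASK)
        return f"nw_frag={name}" if name else ""
    if key == FRAG_ANY:
        return "nw_frag=" + ("yes" if value & FRAG_ANY else "no")
    if key == FRAG_LATER:
        return "nw_frag=" + ("later" if value & FRAG_LATER else "not_later")
    return ""


if __name__ == "__main__":
    print(repr(format_nw_frag(FRAG_ANY, 0xFF, fixed=False)))  # field is lost
    print(repr(format_nw_frag(FRAG_ANY, 0xFF, fixed=True)))   # field printed
```

Reducing the mask to the two relevant bits before dispatching is the whole point of the sketch; the rest of the formatting is incidental.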
*	ovsdb: Remove extra make target dependency for local-config.5.	Ilya Maximets	2022-07-19	1	-1/+1
\| \| \| \| \| \| \| \| \|	ovsdb/ directory should not be a dependency, otherwise the man page is getting re-built every time unrelated files are changed. Fixes: 6f24c2bc769a ("ovsdb: Add Local_Config schema.") Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	ci: Prefer pip3 to install unit test dependencies.	David Marchand	2022-07-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	While it looks like the right python3 versions of those dependencies are installed in the CI, prefer installing them via pip3 like the rest of the script. Fixes: 445dceb88461 ("python: Introduce unit tests.") Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	Prepare for 3.0.0.	Ilya Maximets	2022-07-15	7	-18/+19
\| \| \| \| \| \|	Acked-by: Simon Horman <simon.horman@corigine.com> Acked-by: Ian Stokes <ian.stokes@intel.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	ofproto/bond: Add knob 'all-members-active'.	Christophe Fontaine	2022-07-15	6	-1/+162
\| \| \| \| \| \| \| \| \| \|	This config param allows the delivery of broadcast and multicast packets to the secondary interface of non-lacp bonds, equivalent to the option 'all_slaves_active' for Linux kernel bonds. Reported-at: https://bugzilla.redhat.com/1720935 Signed-off-by: Christophe Fontaine <cfontain@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	python: Add unit tests for filtering engine.	Adrian Moreno	2022-07-15	2	-0/+222
\| \| \| \| \| \| \| \|	Add unit test for OFFilter class. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>