summaryrefslogtreecommitdiff
Commit message (Collapse)AuthorAgeFilesLines
* test-stream: Add ssl tests for stream open block.HEADmasterStefan Hoffmann2023-05-113-5/+56
| | | | | | | | | This tests stream.c and stream.py with ssl connection at CHECK_STREAM_OPEN_BLOCK. For the tests, ovsdb needs to be build with libssl. Signed-off-by: Stefan Hoffmann <stefan.hoffmann@cloudandheat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* tests-ovsdb: Switch OVSDB_START_IDLTEST to macro.Stefan Hoffmann2023-05-111-38/+45
| | | | | | | | Define bash function as macro now. Later we can extend this macro for other usecases. Signed-off-by: Stefan Hoffmann <stefan.hoffmann@cloudandheat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* AUTHORS: Add Zhiqi Chen.Ilya Maximets2023-05-111-5/+6
| | | | | | | Additionally re-sorted part of the list that was particularly not ordered. Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpctl: Fix dereferencing null pointer in parse_ct_limit_zones().Zhiqi Chen2023-05-112-2/+19
| | | | | | | | | Command with empty string following "dpctl/ct-get-limits zone=" such as "ovs-appctl dpctl/ct-get-limits zone=" will cause parse_ct_limit_zones() dereferencing null. Signed-off-by: Zhiqi Chen <chenzhiqi.123@bytedance.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev-offload: Fix deadlock/recursive use of the netdev_hmap_rwlock rwlock.Eelco Chaudron2023-05-101-32/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When doing performance testing with OVS v3.1 we ran into a deadlock situation with the netdev_hmap_rwlock read/write lock. After some debugging, it was discovered that the netdev_hmap_rwlock read lock was taken recursively. And well in the following sequence of events: netdev_ports_flow_get() It takes the read lock, while it walks all the ports in the port_to_netdev hmap and calls: - netdev_flow_get() which will call: - netdev_tc_flow_get() which will call: - netdev_ifindex_to_odp_port() This function also takes the same read lock to walk the ifindex_to_port hmap. In OVS a read/write lock does not support recursive readers. For details see the comments in ovs-thread.h. If you do this, it will lock up, mainly due to OVS setting the PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP attribute to the lock. The solution with this patch is to use two separate read/write locks, with an order guarantee to avoid another potential deadlock. Fixes: 9fe21a4fc12a ("netdev-offload: replace netdev_hmap_mutex to netdev_hmap_rwlock") Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2182541 Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif-xlate: Fix use-after-free when xlate_actions().Yunjian Wang2023-05-101-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, bundle->cvlans and xbundle->cvlans are pointing to the same memory location. This can cause issues if the main thread modifies bundle->cvlans and frees it while the revalidator thread is still accessing xbundle->cvlans. This leads to use-after-free error. AddressSanitizer: heap-use-after-free on address 0x615000007b08 at pc 0x0000004ede1e bp 0x7f3120ee0310 sp 0x7f3120ee0300 READ of size 8 at 0x615000007b08 thread T25 (revalidator25) 0 0x4ede1d in bitmap_is_set lib/bitmap.h:91 1 0x4fcb26 in xbundle_allows_cvlan ofproto/ofproto-dpif-xlate.c:2028 2 0x4fe279 in input_vid_is_valid ofproto/ofproto-dpif-xlate.c:2294 3 0x502abf in xlate_normal ofproto/ofproto-dpif-xlate.c:3051 4 0x5164dc in xlate_output_action ofproto/ofproto-dpif-xlate.c:5361 5 0x522576 in do_xlate_actions ofproto/ofproto-dpif-xlate.c:7047 6 0x52a751 in xlate_actions ofproto/ofproto-dpif-xlate.c:8061 7 0x4e2b66 in xlate_key ofproto/ofproto-dpif-upcall.c:2212 8 0x4e2e13 in xlate_ukey ofproto/ofproto-dpif-upcall.c:2227 9 0x4e345d in revalidate_ukey__ ofproto/ofproto-dpif-upcall.c:2276 10 0x4e3f85 in revalidate_ukey ofproto/ofproto-dpif-upcall.c:2395 11 0x4e7ac5 in revalidate ofproto/ofproto-dpif-upcall.c:2858 12 0x4d9ed3 in udpif_revalidator ofproto/ofproto-dpif-upcall.c:1010 13 0x7cd92e in ovsthread_wrapper lib/ovs-thread.c:423 14 0x7f312ff01f3a (/usr/lib64/libpthread.so.0+0x8f3a) 15 0x7f312fc8f51f in clone (/usr/lib64/libc.so.6+0xf851f) 0x615000007b08 is located 8 bytes inside of 512-byte region [0x615000007b00,0x615000007d00) freed by thread T0 here: 0 0x7f3130378ad8 in free (/usr/lib64/libasan.so.4+0xe0ad8) 1 0x49044e in bundle_set ofproto/ofproto-dpif.c:3431 2 0x444f92 in ofproto_bundle_register ofproto/ofproto.c:1455 3 0x40e6c9 in port_configure vswitchd/bridge.c:1300 4 0x40bcfd in bridge_reconfigure vswitchd/bridge.c:921 5 0x41f1a9 in bridge_run vswitchd/bridge.c:3313 6 0x42d4fb in main vswitchd/ovs-vswitchd.c:132 7 0x7f312fbbcc86 in __libc_start_main (/usr/lib64/libc.so.6+0x25c86) previously allocated by thread T0 here: 0 0x7f3130378e70 in __interceptor_malloc 1 0x8757fe in xmalloc__ lib/util.c:140 2 0x8758da in xmalloc lib/util.c:175 3 0x875927 in xmemdup lib/util.c:188 4 0x475f63 in bitmap_clone lib/bitmap.h:79 5 0x47797c in vlan_bitmap_clone lib/vlan-bitmap.h:40 6 0x49048d in bundle_set ofproto/ofproto-dpif.c:3433 7 0x444f92 in ofproto_bundle_register ofproto/ofproto.c:1455 8 0x40e6c9 in port_configure vswitchd/bridge.c:1300 9 0x40bcfd in bridge_reconfigure vswitchd/bridge.c:921 10 0x41f1a9 in bridge_run vswitchd/bridge.c:3313 11 0x42d4fb in main vswitchd/ovs-vswitchd.c:132 12 0x7f312fbbcc86 in __libc_start_main (/usr/lib64/libc.so.6+0x25c86) Fixes: fed8962aff57 ("Add new port VLAN mode "dot1q-tunnel"") Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ci: Separate DPDK from OVS build.David Marchand2023-05-056-78/+145
| | | | | | | | | | | | | | Let's separate DPDK compilation from the rest of OVS build: - this avoids multiple jobs building DPDK in parallel, which especially affects builds in the dpdk-latest branch, - we separate concerns about DPDK build requirements from OVS build requirements, like python dependencies, - building DPDK does not depend on how we will link OVS against it, so we can use a single cache entry regardless of DPDK_SHARED option, Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ovsdb-idl.at: Fix write-changed-only tests without change tracking.Ilya Maximets2023-05-041-2/+2
| | | | | | | | | | | | | | The '-w' command line argument is not passed to test-ovsdb in the OVSDB_CHECK_IDL_WRITE_CHANGED_ONLY_C, so it juts repeats normal tests without testing the feature. Adding the flag. And using the long version of the flag to make things more obvious and harder to overlook. Swapping the argument in the other working test as well, just for consistency. Fixes: d94cd0d3eec3 ("ovsdb-idl: Support write-only-changed IDL monitor mode.") Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* tc: Fix cleaning chains.Roi Dayan2023-04-282-4/+11
| | | | | | | | | | | | | Sometimes there is a need to clean empty chains as done in delete_chains_from_netdev(). The cited commit doesn't remove the chain completely which cause adding ingress_block later to fail. This can be reproduced with adding bond as ovs port which makes ovs use ingress_block for it. While at it add the netdev name that fails to the log. Fixes: e1e5eac5b016 ("tc: Add TCA_KIND flower to delete and get operation to avoid rtnl_lock().") Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* AUTHORS: Add Stefan, Luca and Max.Ilya Maximets2023-04-261-2/+5
| | | | | | Also, slightly re-sort the list to fix the order. Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* python-stream: Handle SSL error in do_handshake.Stefan Hoffmann2023-04-261-1/+2
| | | | | | | | | | | | | | | | | | | In some cases ovsdb server or relay gets restarted, ovsdb python clients may keep the local socket open. Instead of reconnecting a lot of failures will be logged. This can be reproduced with ssl connections to the server/relay and restarting it, so it has the same IP after restart. This patch catches the Exceptions at do_handshake to recreate the connection on the client side. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Stefan Hoffmann <stefan.hoffmann@cloudandheat.com> Signed-off-by: Luca Czesla <luca.czesla@mail.schwarz> Signed-off-by: Max Lamprecht <max.lamprecht@mail.schwarz> Co-authored-by: Luca Czesla <luca.czesla@mail.schwarz> Co-authored-by: Max Lamprecht <max.lamprecht@mail.schwarz> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpif-netlink: Fix memory leak dpif_netlink_open().Yunjian Wang2023-04-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the specific call to dpif_netlink_dp_transact() (line 398) in dpif_netlink_open(), the 'dp' content is not being used in the branch when no error is returned (starting line 430). Furthermore, the 'dp' and 'buf' variables are overwritten later in this same branch when a new netlink request is sent (line 437), which results in a memory leak. Reported by Address Sanitizer. Indirect leak of 1024 byte(s) in 1 object(s) allocated from: 0 0x7fe09d3bfe70 in __interceptor_malloc (/usr/lib64/libasan.so.4+0xe0e70) 1 0x8759be in xmalloc__ lib/util.c:140 2 0x875a9a in xmalloc lib/util.c:175 3 0x7ba0d2 in ofpbuf_init lib/ofpbuf.c:141 4 0x7ba1d6 in ofpbuf_new lib/ofpbuf.c:169 5 0x9057f9 in nl_sock_transact lib/netlink-socket.c:1113 6 0x907a7e in nl_transact lib/netlink-socket.c:1817 7 0x8b5abe in dpif_netlink_dp_transact lib/dpif-netlink.c:5007 8 0x89a6b5 in dpif_netlink_open lib/dpif-netlink.c:398 9 0x5de16f in do_open lib/dpif.c:348 10 0x5de69a in dpif_open lib/dpif.c:393 11 0x5de71f in dpif_create_and_open lib/dpif.c:419 12 0x47b918 in open_dpif_backer ofproto/ofproto-dpif.c:764 13 0x483e4a in construct ofproto/ofproto-dpif.c:1658 14 0x441644 in ofproto_create ofproto/ofproto.c:556 15 0x40ba5a in bridge_reconfigure vswitchd/bridge.c:885 16 0x41f1a9 in bridge_run vswitchd/bridge.c:3313 17 0x42d4fb in main vswitchd/ovs-vswitchd.c:132 18 0x7fe09cc03c86 in __libc_start_main (/usr/lib64/libc.so.6+0x25c86) Fixes: b841e3cd4a28 ("dpif-netlink: Fix feature negotiation for older kernels.") Reviewed-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofp-parse: Check ranges on string to uint32_t conversion.Yunjian Wang2023-04-251-5/+2
| | | | | | | | | | | An unnecessarily overflow would occurs when the 'value' is longer than 4294967295. So it's required to check ranges to avoid uint32_t overflow. Reported-by: Nan Zhou <zhounan14@huawei.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* util: Fix an issue that thread name cannot be set.Songtao Zhan2023-04-251-0/+6
| | | | | | | | | | | The name of the current thread consists of a name with a maximum length of 16 bytes and a thread ID. The final name may be longer than 16 bytes. If the name is longer than 16 bytes, the thread name will fail to be set Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Songtao Zhan <zhanst1@chinatelecom.cn> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* doc: Fix the list of supported tunnels in README.Nobuhiro MIKI2023-04-251-1/+1
| | | | | | | | Without distinguishing between IPv4 and IPv6, such as GRE and GRE-IPv6, nine types of tunneling are currently supported. Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* learning-switch: Fix coredump of OpenFlow15 learning-switch.Faicker Mo2023-04-254-1/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The OpenFlow15 Packet-Out message contains the match instead of the in_port. The flow.tunnel.metadata.tab is not inited but used in the loop of tun_metadata_to_nx_match. The coredump gdb backtrace is: 0 memcpy_from_metadata (dst=0x2f060, src=0x30880, loc=0x10) at lib/tun-metadata.c:467 1 metadata_loc_from_match_read (match=0x30598, is_masked=<..>, mask=0x30838, idx=0, map=0x0) at lib/tun-metadata.c:865 2 metadata_loc_from_match_read (is_masked=<...>, mask=0x30838, idx=0, match=0x30598, map=0x0) at lib/tun-metadata.c:854 3 tun_metadata_to_nx_match (b=0x892260, oxm=OFP15_VERSION, match=0x30598) at lib/tun-metadata.c:888 4 nx_put_raw (b=0x892260, oxm=OFP15_VERSION, match=0x30598, cookie=<...>, cookie=0, cookie_mask=<...>, cookie_mask=0) at lib/nx-match.c:1186 5 oxm_put_match (b=0x892260, match=0x30598, version=OFP15_VERSION) at lib/nx-match.c:1343 6 ofputil_encode_packet_out (po=0x30580, protocol=<...>) at lib/ofp-packet.c:1226 7 process_packet_in (sw=0x891d70, oh=<...>) at lib/learning-switch.c:619 8 lswitch_process_packet (msg=0x892210, sw=0x891d70) at lib/learning-switch.c:374 9 lswitch_run (sw=0x891d70) at lib/learning-switch.c:324 10 main (argc=<...>, argv=<...>) at utilities/ovs-testcontroller.c:180 Fix that by initing the flow metadata. Fixes: 35eb6326d5d0 ("ofp-util: Add flow metadata to ofputil_packet_out") Signed-off-by: Faicker Mo <faicker.mo@ucloud.cn> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ovsdb: Monitor: Keep and maintain the initial change set.Ilya Maximets2023-04-241-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change sets in OVSDB monitor are storing all the changes that happened between a particular transaction ID and now. Initial change set basically contains all the data. On each monitor request a new initial change set is created by creating an empty change set and adding all the database rows. Then it is converted into JSON reply and immediately untracked and destroyed. This is causing significant performance issues if many clients are requesting new monitors at the same time. For example, that is happening after database schema conversion, because conversion triggers cancellation of all monitors. After cancellation, every client sends a new monitor request. The server then creates a new initial change set, sends a reply, destroys initial change set and repeats that for each client. On a system with 200 MB database and 500 clients, cluster of 3 servers spends 20 minutes replying to all the clients (200 MB x 500 = 100 GB): timeval|WARN|Unreasonably long 1201525ms poll interval Of course, all the clients are already disconnected due to inactivity at this point. When they are re-connecting back, server accepts new connections one at a time, so inactivity probes will not be triggered anymore, but it still takes another 20 minutes to handle all the incoming connections. Let's keep the initial change set around for as long as the monitor itself exists. This will allow us to not construct a new change set on each new monitor request and even utilize the JSON cache in some cases. All that at a relatively small maintenance cost, since we'll need to commit changes to one extra change set on every transaction. Measured memory usage increase due to keeping around a shallow copy of a database is about 10%. Measured CPU usage difference during normal operation is negligible. With this change it takes only 30 seconds to send out all the monitor replies in the example above. So, it's a 40x performance improvement. On a more reasonable setup with 250 nodes, the process takes up to 8-10 seconds instead of 4-5 minutes. Conditional monitoring will benefit from this change as well, however results might be less impressive due to lack of JSON cache. Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ovsdb: Avoid converting database twice on an initiator.Ilya Maximets2023-04-247-20/+65
| | | | | | | | | | | | | | | Cluster member, that initiates the schema conversion, converts the database twice. First time while verifying the possibility of the conversion, and the second time after reading conversion request back from the storage. Keep the converted database from the first time around and use it after reading the request back from the storage. This cuts in half the conversion CPU cost. Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ovsdb: Perform conversion with no data for clustered databases.Ilya Maximets2023-04-246-3/+125
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, database schema conversion in case of clustered database produces a transaction record with both new schema and converted database data. So, the sequence of events is following: 1. Get the new schema. 2. Convert the database to a new schema. 3. Translate the newly converted database into JSON. 4. Write the schema + data JSON to the storage. 5. Destroy converted version of a database. 6. Read schema + data JSON from the storage and parse. 7. Create a new database from a parsed database data. 8. Replace current database with the new one. Most of these steps are very computationally expensive. Also, conversion to/from JSON is much more expensive than direct database conversion with ovsdb_convert() that can make use of shallow data copies. Instead of doing all that, let's make use of previously introduced ability to not write the converted data into the storage. The process will look like this then: 1. Get the new schema. 2. Convert the database to a new schema (to verify that it is possible). 3. Write the schema to the storage. 4. Destroy converted version of a database. 5. Read the new schema from the storage and parse. 6. Convert the database to a new schema. 7. Replace current database with the new one. Most of the operations here are performed on the small schema object, instead of the actual database data. Two remaining data operations (actual conversion) are noticeably faster than conversion to/from JSON due to reference counting and shallow data copies. Steps 4-6 can be optimized later to not convert twice on the process that initiates the conversion. The change results in following performance improvements in conversion of OVN_Southbound database schema from version 20.23.0 to 20.27.0 (measured on a single-server RAFT cluster with no clients): | Before | After +---------+-------------------+---------+------------------ DB size | Total | Max poll interval | Total | Max poll interval --------+---------+-------------------+---------+------------------ 542 MB | 47 sec. | 26 sec. | 15 sec. | 10 sec. 225 MB | 19 sec. | 10 sec. | 6 sec. | 4.5 sec. 542 MB database had 19.5 M atoms, 225 MB database had 7.5 M atoms. Overall performance improvement is about 3x. Also, note that before this change database conversion basically doubles the database file on disk. Now it only writes a small schema JSON. Since the change requires backward-incompatible database file format changes, documentation is updated on how to perform an upgrade. Handled the same way as we did for the previous incompatible format change in 2.15 (column diffs). Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-December/052140.html Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ovsdb: Allow conversion records with no data in a clustered storage.Ilya Maximets2023-04-244-36/+93
| | | | | | | | | | | | | | | | | | | | | | | | | If the schema with no data was read from the clustered storage, it should mean a database conversion request. In general, we can get: 1. Just data --> Transaction record. 2. Schema + Data --> Database conversion or raft snapshot install. 3. Just schema --> New. Database conversion request. We cannot distinguish between conversion and snapshot installation request in the current implementation, so we will keep handling conversion with data in the same way as before, i.e. if data is provided, we should use it. ovsdb-tool is updated to handle this record type as well while converting cluster to standalone. This change doesn't introduce a way for such records to appear in the database. That will be added in the future commits targeting conversion speed increase. Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ovsdb: Check for ephemeral columns before writing a new schema.Ilya Maximets2023-04-245-12/+24
| | | | | | | | | | | | | | | | | | Clustered databases do not support ephemeral columns, but ovsdb-server checks for them after the conversion result is read from the storage. It's much easier to recover if this constraint is checked before writing to the storage instead. It's not a big problem, because the check is always performed by the native ovsdb clients before sending a conversion request. But the server, in general, should not trust clients to do the right thing. Check in the update_schema() remains, because we shouldn't blindly trust the storage. Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ovsdb-tool: Fix cluster-to-standalone for DB conversion records.Ilya Maximets2023-04-244-0/+107
| | | | | | | | | | | | | | | | | | | If database conversion happens, both schema and the new data are present in the database record. However, the schema is just silently ignored by ovsdb-tool cluster-to-standalone. This creates data inconsistency if the new data contains new columns, for example, so the resulting database file will not be readable, or data will be lost. Fix that by re-setting the database whenever a conversion record is found and actually writing a new schema that will match the actual data. The database file will not be that similar to the original, but there is no way to represent conversion in a standalone database file format otherwise. Fixes: 00de46f9ee42 ("ovsdb-tool: Convert clustered db to standalone db.") Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* system-offloads-traffic: Fix tc ingress pps check for meter offload.David Marchand2023-04-191-1/+1
| | | | | | | | | Caught during some code review. SUPPORT_TC_INGRESS_PPS has been replaced with CHECK_TC_INGRESS_PPS(). Fixes: 5f0fdf5e2c2e ("test: Move check for tc ingress pps support to test script.") Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Simon Horman <simon.horman@corigine.com>
* ovs-dpctl: Add new command dpctl/ct-[sg]et-sweep-interval.Paolo Valerio2023-04-0612-1/+153
| | | | | | | | | | | | | | Since 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists with rculists.") the sweep interval changed as well as the constraints related to the sweeper. Being able to change the default reschedule time may be convenient in some conditions, like debugging. This patch introduces new commands allowing to get and set the sweep interval in ms. Signed-off-by: Paolo Valerio <pvalerio@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* github: Test building Fedora RPMs.Ilya Maximets2023-04-061-0/+37
| | | | | | | | | | | | | | Testing that RPMs can be built to catch possible spec file issues like missing dependencies. GitHub seems to have an agreement with Docker Hub about rate limiting of image downloads, so it should not affect us. We may switch to quay.io if that will ever become a problem in the future. Reviewed-by: David Marchand <david.marchand@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* AUTHORS: Add Songtao Zhan.Ilya Maximets2023-04-061-0/+1
| | | | Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ovs-tcpdump: Stdout is shutdown before ovs-tcpdump exit.Songtao Zhan2023-04-061-0/+11
| | | | | | | | | | | | If there is a pipe behind ovs-tcpdump (such as ovs-tcpdump -i eth0 | grep "192.168.1.1"), the child process (grep "192.168.1.1") may exit first and close the pipe when received SIGTERM. When farther process (ovs-tcpdump) exit, stdout is flushed into broken pipe, and then received a exception IOError. To avoid such problems, ovs-tcpdump first close stdout before exit. Signed-off-by: Songtao Zhan <zhanst1@chinatelecom.cn> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif-xlate: Always mask ip proto field.Aaron Conole2023-04-066-10/+229
| | | | | | | | | | | | | | | | | | | | | | | | | The ofproto layer currently treats nw_proto field as overloaded to mean both that a proper nw layer exists, as well as the value contained in the header for the nw proto. However, this is incorrect behavior as relevant standards permit that any value, including '0' should be treated as a valid value. Because of this overload, when the ofproto layer builds action list for a packet with nw_proto of 0, it won't build the complete action list that we expect to be built for the packet. That will cause a bad behavior where all packets passing the datapath will fall into an incomplete action set. The fix here is to unwildcard nw_proto, allowing us to preserve setting actions for protocols which we know have support for the actions we program. This means that a traffic which contains nw_proto == 0 cannot cause connectivity breakage with other traffic on the link. Reported-by: David Marchand <dmarchand@redhat.com> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2134873 Acked-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* conntrack-tp: Fix clang warning.Lin Huang2023-04-041-0/+7
| | | | | | | | | Declaration of 'struct conn' will not be visible outside of this function. Declaration of 'struct conntrack' will not be visible outside of this function. Declaration of 'struct timeout_policy' will not be visible outside of this function. Signed-off-by: Lin Huang <linhuang@ruijie.com.cn> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* AUTHORS: Add Faicker Mo.Ilya Maximets2023-04-031-0/+1
| | | | Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev-offload-tc: Del ufid mapping if device not exist.Faicker Mo2023-04-032-1/+57
| | | | | | | | | | | | | | | | | The device may be deleted and added with ifindex changed. The tc rules on the device will be deleted if the device is deleted. The func tc_del_filter will fail when flow del. The mapping of ufid to tc will not be deleted. The traffic will trigger the same flow(with same ufid) to put to tc on the new device. Duplicated ufid mapping will be added. If the hashmap is expanded, the old mapping entry will be the first entry, and now the dp flow can't be deleted. Signed-off-by: Faicker Mo <faicker.mo@ucloud.cn> Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* db-ctl-base: Partially revert b8bf410a5.Daniel Alvarez Sanchez2023-03-302-38/+12
| | | | | | | | | | | | | | | | | The commit b8bf410a5 [0] broke the `ovs-vsctl add` command which now overwrites the value if it existed already. This patch reverts the code around the `cmd_add` function to restore the previous behavior. It also adds testing coverage for this functionality. [0] https://github.com/openvswitch/ovs/commit/b8bf410a5c94173da02279b369d75875c4035959 Fixes: b8bf410a5c94 ("db-ctl-base: Use partial map/set updates for last add/set commands.") Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2182767 Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Daniel Alvarez Sanchez <dalvarez@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* vswitch.xml: Add description of SRv6 tunnel and related options.Nobuhiro MIKI2023-03-301-7/+32
| | | | | | | | | The description of SRv6 was missing in vswitch.xml, which is used to generate the man page, so this patch adds it. Fixes: 03fc1ad78521 ("userspace: Add SRv6 tunnel support.") Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev-tc-offloads: Fix misaligned 8 byte read.Mike Pattrick2023-03-291-2/+2
| | | | | | | | | | | | | | | | | | UB Sanitizer report: lib/netdev-offload-tc.c:1276:19: runtime error: load of misaligned address 0x7f74e801976c for type 'union ovs_u128', which requires 8 byte alignment 0 in netdev_tc_flow_dump_next lib/netdev-offload-tc.c:1276 1 in netdev_flow_dump_next lib/netdev-offload.c:303 2 in dpif_netlink_flow_dump_next lib/dpif-netlink.c:1921 [...] Fixes: 8f7620e6a406 ("netdev-tc-offloads: Implement netdev flow dump api using tc interface") Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* odp: Add SRv6 tunnel actions.Nobuhiro MIKI2023-03-295-2/+127
| | | | | | | This patch adds ODP actions for SRv6 and its tests. Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* userspace: Add SRv6 tunnel support.Nobuhiro MIKI2023-03-2918-1/+450
| | | | | | | | | | | | | | | | | SRv6 (Segment Routing IPv6) tunnel vport is responsible for encapsulation and decapsulation the inner packets with IPv6 header and an extended header called SRH (Segment Routing Header). See spec in: https://datatracker.ietf.org/doc/html/rfc8754 This patch implements SRv6 tunneling in userspace datapath. It uses `remote_ip` and `local_ip` options as with existing tunnel protocols. It also adds a dedicated `srv6_segs` option to define a sequence of routers called segment list. Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* flow: Support rt_hdr in parse_ipv6_ext_hdrs().Nobuhiro MIKI2023-03-295-21/+58
| | | | | | | | | | | | Checks whether IPPROTO_ROUTING exists in the IPv6 extension headers. If it exists, the first address is retrieved. If NULL is specified for "frag_hdr" and/or "rt_hdr", those addresses in the header are not reported to the caller. Of course, "frag_hdr" and "rt_hdr" are properly parsed inside this function. Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* tnl-ports: Support multiple nw_protos.Nobuhiro MIKI2023-03-291-32/+48
| | | | | | | | | In some tunnels, inner packet needs to support both IPv4 and IPv6. Therefore, this patch improves to allow two protocols to be tied together in one tunneling. Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* tests: Define new ADD_VETH_NS macro.Nobuhiro MIKI2023-03-291-0/+16
| | | | | | | | | | | The new ADD_VETH_NS macro creates two netns and connects them with a veth pair. We can use it for testing in a generic purpose. e.g. ADD_VETH_NS([ns1], [p1], [1.1.1.1/24], [ns2], [p2], [1.1.1.2/24]) Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ovs-thread: Fix cpus not read for the first 10s.Adrian Moreno2023-03-271-1/+1
| | | | | | | | | | | | | | | With the current implementation the available CPUs will not be read until 10s have passed since the system's boot. For systems that boot faster, this can make ovs-vswitchd create fewer handlers than necessary for some time. Fixes: 0d23948a598a ("ovs-thread: Detect changes in number of CPUs.") Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2180460 Suggested-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Mike Pattrick <mkp@redhat.com> Acked-by: Michael Santana <msantana@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpif-netlink: Always create at least 1 handler.Adrian Moreno2023-03-271-1/+1
| | | | | | | | | | | | Ensure at least 1 handler is created even if something goes wrong during cpu detection or prime numer calculation. Fixes: a5cacea5f988 ("handlers: Create additional handler threads when using CPU isolation.") Suggested-by: Aaron Conole <aconole@redhat.com> Acked-by: Mike Pattrick <mkp@redhat.com> Acked-by: Michael Santana <msantana@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev-offload-tc: Fix parse_tc_flower_to_actions() reporting errors.Eelco Chaudron2023-03-221-9/+27
| | | | | | | | | parse_tc_flower_to_actions() was not reporting errors, which would cause parse_tc_flower_to_match() to ignore them. Fixes: dd03672f7bbb ("netdev-offload-tc: Move flower_to_match action handling to isolated function.") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* tests/mfex: Retain support for cryptography pre-v37.Mike Pattrick2023-03-221-2/+3
| | | | | | | | | | | | | | | | | | | Prior to v37.0.0, CryptographyDeprecationWarning could not be imported from __init__.py resulting in: Traceback (most recent call last): File "mfex_fuzzy.py", line 9, in <module> category=cryptography.CryptographyDeprecationWarning, AttributeError: module 'cryptography' has no attribute 'CryptographyDeprecationWarning' This import was only added to __init__ to deprecate python3.6. Importing the exception from cryptography.utils is the compatible option. Fixes: c3ed0bf34b8a ("tests/mfex: Silence Blowfish/CAST5 deprecation warnings.") Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpdk: Allow retaining CAP_SYS_RAWIO privileges.Aaron Conole2023-03-2214-26/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Open vSwitch generally tries to let the underlying operating system managed the low level details of hardware, for example DMA mapping, bus arbitration, etc. However, when using DPDK, the underlying operating system yields control of many of these details to userspace for management. In the case of some DPDK port drivers, configuring rte_flow or even allocating resources may require access to iopl/ioperm calls, which are guarded by the CAP_SYS_RAWIO privilege on linux systems. These calls are dangerous, and can allow a process to completely compromise a system. However, they are needed in the case of some userspace driver code which manages the hardware (for example, the mlx implementation of backend support for rte_flow). Here, we create an opt-in flag passed to the command line to allow this access. We need to do this before ever accessing the database, because we want to drop all privileges asap, and cannot wait for a connection to the database to be established and functional before dropping. There may be distribution specific ways to do capability management as well (using for example, systemd), but they are not as universal to the vswitchd as a flag. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Aaron Conole <aconole@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Gaetan Rivet <gaetanr@nvidia.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* vswitch: Add missing documentation for "ct_flush" capability.Ales Musil2023-03-151-0/+6
| | | | | | | Fixes: 08146bf7d9b4 ("openflow: Add extension to flush CT by generic match.") Signed-off-by: Ales Musil <amusil@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpctl: Fix flush-conntrack with datapath as argument.Ales Musil2023-03-153-3/+55
| | | | | | | | | | | | | | | | | | Specifying datapath with "dpctl/flush-conntrack" didn't work as expected and caused error: ovs-dpctl: field system@ovs-system missing value (Invalid argument) To prevent that, check if we have datapath as first argument and use it accordingly. Also add couple of test cases to ensure that everything works as expected. Fixes: a9ae73b916ba ("ofp, dpif: Allow CT flush based on partial match.") Signed-off-by: Ales Musil <amusil@redhat.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif-upcall: Remove redundant time_msec() in revalidate().Eelco Chaudron2023-03-151-3/+1
| | | | | | | | | | | Remove one of two consecutive time_msec() calls in the revalidate() function. We take the time stamp after udpif_get_n_flows(), to avoid any potential delays in getting the number of offloaded flows. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif-upcall: Wait for valid hw flow stats before applying ↵Eelco Chaudron2023-03-156-10/+48
| | | | | | | | | | | | | | | | | min-revalidate-pps. Depending on the driver implementation, it can take from 0.2 seconds up to 2 seconds before offloaded flow statistics are updated. This is true for both TC and rte_flow-based offloading. This is causing a problem with min-revalidate-pps, as old statistic values are used during this period. This fix will wait for at least 2 seconds, by default, before assuming no packets where received during this period. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* system-traffic: Fix conntrack test cases which are failing with af_xdp.Eelco Chaudron2023-03-131-4/+4
| | | | | | | | | | | | | | | | | | | The recently added test cases below are not passing on the af_xdp datapath due to tcpdump not working on the OVS ports with this datapath. conntrack - ICMP related NAT with single port conntrack - ICMPv6 related NAT with single port conntrack - ICMP from different source related with NAT The tests are changed to attach tcpdump on the associated veth port in the netns. Tests are now passing with all datapaths (afxdp, kernel, userspace, and offloads). Fixes: 8bd688063078 ("system-traffic.at: Add icmp error tests while dnatting address and port.") Fixes: 0a7587034dc9 ("conntrack: Properly unNAT inner header of related traffic.") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Ales Musil <amusil@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* route-table: Retrieving the preferred source address from Netlink.Nobuhiro MIKI2023-03-075-5/+61
| | | | | | | | | | | | | | We can use the "ip route add ... src ..." command to set the preferred source address for each entry in the kernel FIB. OVS has a mechanism to cache the FIB, but the preferred source address is ignored and calculated with its own logic. This patch resolves the difference between kernel FIB and OVS route table cache by retrieving the RTA_PREFSRC attribute of Netlink messages. Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>