summaryrefslogtreecommitdiff
path: root/NEWS
Commit message (Collapse)AuthorAgeFilesLines
* ovsdb: Perform conversion with no data for clustered databases.Ilya Maximets2023-04-241-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, database schema conversion in case of clustered database produces a transaction record with both new schema and converted database data. So, the sequence of events is following: 1. Get the new schema. 2. Convert the database to a new schema. 3. Translate the newly converted database into JSON. 4. Write the schema + data JSON to the storage. 5. Destroy converted version of a database. 6. Read schema + data JSON from the storage and parse. 7. Create a new database from a parsed database data. 8. Replace current database with the new one. Most of these steps are very computationally expensive. Also, conversion to/from JSON is much more expensive than direct database conversion with ovsdb_convert() that can make use of shallow data copies. Instead of doing all that, let's make use of previously introduced ability to not write the converted data into the storage. The process will look like this then: 1. Get the new schema. 2. Convert the database to a new schema (to verify that it is possible). 3. Write the schema to the storage. 4. Destroy converted version of a database. 5. Read the new schema from the storage and parse. 6. Convert the database to a new schema. 7. Replace current database with the new one. Most of the operations here are performed on the small schema object, instead of the actual database data. Two remaining data operations (actual conversion) are noticeably faster than conversion to/from JSON due to reference counting and shallow data copies. Steps 4-6 can be optimized later to not convert twice on the process that initiates the conversion. The change results in following performance improvements in conversion of OVN_Southbound database schema from version 20.23.0 to 20.27.0 (measured on a single-server RAFT cluster with no clients): | Before | After +---------+-------------------+---------+------------------ DB size | Total | Max poll interval | Total | Max poll interval --------+---------+-------------------+---------+------------------ 542 MB | 47 sec. | 26 sec. | 15 sec. | 10 sec. 225 MB | 19 sec. | 10 sec. | 6 sec. | 4.5 sec. 542 MB database had 19.5 M atoms, 225 MB database had 7.5 M atoms. Overall performance improvement is about 3x. Also, note that before this change database conversion basically doubles the database file on disk. Now it only writes a small schema JSON. Since the change requires backward-incompatible database file format changes, documentation is updated on how to perform an upgrade. Handled the same way as we did for the previous incompatible format change in 2.15 (column diffs). Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-December/052140.html Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ovs-dpctl: Add new command dpctl/ct-[sg]et-sweep-interval.Paolo Valerio2023-04-061-0/+3
| | | | | | | | | | | | | | Since 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists with rculists.") the sweep interval changed as well as the constraints related to the sweeper. Being able to change the default reschedule time may be convenient in some conditions, like debugging. This patch introduces new commands allowing to get and set the sweep interval in ms. Signed-off-by: Paolo Valerio <pvalerio@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* userspace: Add SRv6 tunnel support.Nobuhiro MIKI2023-03-291-0/+2
| | | | | | | | | | | | | | | | | SRv6 (Segment Routing IPv6) tunnel vport is responsible for encapsulation and decapsulation the inner packets with IPv6 header and an extended header called SRH (Segment Routing Header). See spec in: https://datatracker.ietf.org/doc/html/rfc8754 This patch implements SRv6 tunneling in userspace datapath. It uses `remote_ip` and `local_ip` options as with existing tunnel protocols. It also adds a dedicated `srv6_segs` option to define a sequence of routers called segment list. Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpdk: Allow retaining CAP_SYS_RAWIO privileges.Aaron Conole2023-03-221-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Open vSwitch generally tries to let the underlying operating system managed the low level details of hardware, for example DMA mapping, bus arbitration, etc. However, when using DPDK, the underlying operating system yields control of many of these details to userspace for management. In the case of some DPDK port drivers, configuring rte_flow or even allocating resources may require access to iopl/ioperm calls, which are guarded by the CAP_SYS_RAWIO privilege on linux systems. These calls are dangerous, and can allow a process to completely compromise a system. However, they are needed in the case of some userspace driver code which manages the hardware (for example, the mlx implementation of backend support for rte_flow). Here, we create an opt-in flag passed to the command line to allow this access. We need to do this before ever accessing the database, because we want to drop all privileges asap, and cannot wait for a connection to the database to be established and functional before dropping. There may be distribution specific ways to do capability management as well (using for example, systemd), but they are not as universal to the vswitchd as a flag. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Aaron Conole <aconole@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Gaetan Rivet <gaetanr@nvidia.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* route-table: Retrieving the preferred source address from Netlink.Nobuhiro MIKI2023-03-071-0/+2
| | | | | | | | | | | | | | We can use the "ip route add ... src ..." command to set the preferred source address for each entry in the kernel FIB. OVS has a mechanism to cache the FIB, but the preferred source address is ignored and calculated with its own logic. This patch resolves the difference between kernel FIB and OVS route table cache by retrieving the RTA_PREFSRC attribute of Netlink messages. Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ovs-router: Introduce src option in ovs/route/add command.Nobuhiro MIKI2023-03-071-0/+3
| | | | | | | | | | | | | | | | | | | When adding a route with ovs/route/add command, the source address in "ovs_router_entry" structure is always the FIRST address that the interface has. See "ovs_router_get_netdev_source_address" function for more information. If an interface has multiple ipv4 and/or ipv6 addresses, there are use cases where the user wants to control the source address. This patch therefore addresses this issue by adding a src parameter. Note that same constraints also exist when caching routes from Kernel FIB with Netlink, but are not dealt with in this patch. Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ipfix: Make template and stats interval configurable.Adrian Moreno2023-02-271-0/+2
| | | | | | | | | Add options to the IPFIX table configure the interval to send statistics and template information. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev-linux: Add jitter parameter to the netem qos options.Miika Petäjäniemi2023-02-211-0/+2
| | | | | | | | Adds jitter option to enable emulating latency fluctuation with netem. Submitted-at: https://github.com/openvswitch/ovs/pull/407 Signed-off-by: Miika Petäjäniemi <miika.petajaniemi@solita.fi> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* utilities: Add support to set umask in ovs-ctl.Vladislav Odintsov2023-02-201-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds new ovs-ctl options to pass umask configuration to allow OVS daemons set requested socket permissions on group. Previous behaviour (if using with systemd service unit) created sockets with 0750 permissions mask (group has no write permission). Write permission for group is reasonable in usecase, where ovs-vswitchd or ovsdb-server runs as a non-privileged user:group (say, openvswitch:openvswitch) and it is needed to access unix socket from process running as another non-privileged user. In this case administrator has to add that user to openvswitch group and can connect to OVS sockets from a process running under that user. Two new ovs-ctl options --ovsdb-server-umask and --ovs-vswitchd-umask were added to manage umask values for appropriate daemons. This is useful for systemd users: both ovs-vswitchd and ovsdb-server systemd units read options from single /etc/sysconfig/openvswitch configuration file. So, with separate options it is possible to set umask only for specific daemon. OPTIONS="--ovsdb-server-umask=0002" in /etc/openvswitch/sysconfig file will set umask to 0002 value before starting only ovsdb-server, while OPTIONS="--ovs-vswitchd-umask=0002" will set umask to ovs-vswitchd daemon. Previous behaviour (not setting umask) is left as default. Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2023-January/401501.html Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* Set release date for 3.1.0.Ilya Maximets2023-02-161-1/+1
| | | | | Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpctl: Add support to count upcall packets.wangchuanlei2023-01-311-0/+4
| | | | | | | | | Add support to count upcall packets per port, both succeed and failed, which is a better way to see how many packets upcalled on each interface. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: wangchuanlei <wangchuanlei@inspur.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* Prepare for post-3.1.0 (3.1.90).Ilya Maximets2023-01-161-0/+4
| | | | | Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* Prepare for 3.1.0.Ilya Maximets2023-01-161-1/+1
| | | | | Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* openflow: Add extension to flush CT by generic match.Ales Musil2023-01-161-0/+6
| | | | | | | | | | | Add extension that allows to flush connections from CT by specifying fields that the connections should be matched against. This allows to match only some fields of the connection e.g. source address for orig direction. Reported-at: https://bugzilla.redhat.com/2120546 Signed-off-by: Ales Musil <amusil@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofp, dpif: Allow CT flush based on partial match.Ales Musil2023-01-161-0/+3
| | | | | | | | | | | | | | | | Currently, the CT can be flushed by dpctl only by specifying the whole 5-tuple. This is not very convenient when there are only some fields known to the user of CT flush. Add new struct ofp_ct_match which represents the generic filtering that can be done for CT flush. The match is done only on fields that are non-zero with exception to the icmp fields. This allows the filtering just within dpctl, however it is a preparation for OpenFlow extension. Reported-at: https://bugzilla.redhat.com/2120546 Signed-off-by: Ales Musil <amusil@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpif-netdev: Add PMD load based sleeping.Kevin Traynor2023-01-121-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sleep for an incremental amount of time if none of the Rx queues assigned to a PMD have at least half a batch of packets (i.e. 16 pkts) on an polling iteration of the PMD. Upon detecting the threshold of >= 16 pkts on an Rxq, reset the sleep time to zero (i.e. no sleep). Sleep time will be increased on each iteration where the low load conditions remain up to a total of the max sleep time which is set by the user e.g: ovs-vsctl set Open_vSwitch . other_config:pmd-maxsleep=500 The default pmd-maxsleep value is 0, which means that no sleeps will occur and the default behaviour is unchanged from previously. Also add new stats to pmd-perf-show to get visibility of operation e.g. ... - sleep iterations: 153994 ( 76.8 % of iterations) Sleep time (us): 9159399 ( 59 us/iteration avg.) ... Reviewed-by: Robin Jarry <rjarry@redhat.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* acinclude.m4: Build with AF_XDP support by default if possible.Ilya Maximets2023-01-031-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With this change we will try to detect all the netdev-afxdp dependencies and enable AF_XDP support by default if they are present at the build time. Configuration script behaves in a following way: - ./configure --enable-afxdp Will check for AF_XDP dependencies and fail if they are not available. - ./configure --disable-afxdp Disables checking for AF_XDP. Build will not support AF_XDP even if all dependencies are installed. - Just ./configure or ./configure --enable-afxdp=auto Will check for AF_XDP dependencies. Will print a warning if they are not available, but will continue without AF_XDP support. If dependencies are available in a system, this option is equal to --enable-afxdp. '--disable-afxdp' added to the debian and fedora package builds to keep predictable behavior. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev-afxdp: Allow building with libxdp and newer libbpf.Ilya Maximets2023-01-031-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | AF_XDP functions was deprecated in libbpf 0.7 and moved to libxdp. Functions bpf_get/set_link_xdp_id() was deprecated in libbpf 0.8 and replaced with bpf_xdp_query_id() and bpf_xdp_attach/detach(). Updating configuration and source code to accommodate above changes and allow building OVS with AF_XDP support on newer systems: - Checking the version of libbpf by detecting availability of bpf_xdp_detach. - Checking availability of the libxdp in a system by looking for a library providing libxdp_strerror(), if libbpf is newer than 0.6. And checking for xsk.h header provided by libxdp-dev[el]. - Use xsk.h from libbpf if it is older than 0.7 and not linking with libxdp in this case as there are known incompatible versions of libxdp in distributions. - Check for the NEED_WAKEUP feature replaced with direct checking in the source code if XDP_USE_NEED_WAKEUP is defined. - Checking availability of bpf_xdp_query_id and bpf_xdp_detach and using them instead of deprecated APIs. Fall back to old functions if not found. - Dropped LIBBPF_LDADD variable as it makes library and function detection much harder without providing any actual benefits. AC_SEARCH_LIBS is used instead and it allows use of AC_CHECK_FUNCS. - Header includes moved around to files where they are actually used. - Removed libelf dependency as it is not really used. With these changes it should be possible to build OVS with either: - libbpf built from the kernel sources (5.19 or older). - libbpf < 0.7 provided in distributions. - libxdp and libbpf >= 0.7 provided in newer distributions. While it is technically possible to build with libbpf 0.7+ without libxdp at the moment we're not allowing that for a few reasons. First, required functions in libbpf are deprecated and can be removed in future releases. Second, support for all these combinations makes the detection code fairly complex. AFAIK, most of the distributions packaging libbpf 0.7+ do package libxdp as well. libxdp added as a build dependency for Fedora build since all supported versions of Fedora are packaging this library. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* docs: Add documentation for pmd-rxq-show secs parameter.Kevin Traynor2022-12-211-0/+3
| | | | | | | | | Add description of new '-secs' parameter in docs. Also, add to NEWS as it is a user facing change. Reviewed-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* travis: Drop support.David Marchand2022-12-211-0/+2
| | | | | | | | | | | | | | Following a change in the terms of use, free Travis credits are really too low for a realistic usage by OVS contributors. As a consequence, testing OVS with Travis has been abandoned by most (if not all) contributors to the project. Drop the Travis configuration from our repository, clean references in the documentation and move GHA specifics to the association yml. Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ovs-thread: Detect changes in number of CPUs.Adrian Moreno2022-12-201-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, things like the number of handler and revalidator threads are calculated based on the number of available CPUs. However, this number is considered static and only calculated once, hence ignoring events such as cpus being hotplugged, switched on/off or affinity mask changing. On the other hand, checking the number of available CPUs multiple times per second seems like an overkill. Affinity should not change that often and, even if it does, the impact of destroying and recreating all the threads so often is probably a price too expensive to pay. I tested the impact of updating the threads every 5 seconds and saw an impact in the main loop duration of <1% and a worst-case scenario impact in throughput of < 5% [1]. This patch sets the default period to 10 seconds just to be safer. [1] Tested in the worst-case scenario of disabling the kernel cache (other_config:flow-size=0), modifying ovs-vswithd's affinity so the number of handlers go up and down every 5 seconds and calculated the difference in netperf's ops/sec. Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ovs-ctl: Allow inclusion of hugepages in coredumps.Mike Pattrick2022-12-201-0/+4
| | | | | | | | | Add new option --dump-hugepages option in ovs-ctl to enable the addition of hugepages in the core dump filter. Reviewed-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpdk: Update to use v22.11.1.Ian Stokes2022-12-061-17/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit add support to for DPDK v22.11.1, it includes the following changes. 1. ci: Reduce DPDK compilation time. 2. system-dpdk: Update vhost tests to be compatible with DPDK 22.07. http://patchwork.ozlabs.org/project/openvswitch/list/?series=316528 3. system-dpdk: Update vhost tests to be compatible with DPDK 22.07. http://patchwork.ozlabs.org/project/openvswitch/list/?series=311332 4. netdev-dpdk: Report device bus specific information. 5. netdev-dpdk: Drop reference to Rx header split. http://patchwork.ozlabs.org/project/openvswitch/list/?series=321808 In addition documentation was also updated in this commit for use with DPDK v22.11.1. The Debian shared DPDK compilation test is removed as part of this patch due to a packaging requirement. Once DPDK v22.11.1 is available in Debian repositories it should be re-enabled in OVS. For credit all authors of the original commits to 'dpdk-latest' with the above changes have been added as co-authors for this commit Signed-off-by: David Marchand <david.marchand@redhat.com> Co-authored-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Sunil Pai G <sunil.pai.g@intel.com> Co-authored-by: Sunil Pai G <sunil.pai.g@intel.com> Tested-by: Michael Phelan <michael.phelan@intel.com> Tested-by: Emma Finn <emma.finn@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* ovsdb-idl: Add the support to specify the uuid for row insert.Numan Siddique2022-11-301-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | ovsdb-server allows the OVSDB clients to specify the uuid for the row inserts [1]. Both the C IDL client library and Python IDL are missing this feature. This patch adds this support. In C IDL, for each schema table, a new function is generated - <schema_table>insert_persistent_uuid(txn, uuid) which can be used the clients to persist the uuid. ovs-vsctl and other derivatives of ctl now supports the same in the generic 'create' command with the option "--id=<UUID>". In Python IDL, the uuid to persist can be specified in the Transaction.insert() function. [1] - a529e3cd1f("ovsdb-server: Allow OVSDB clients to specify the UUID for inserted rows.:) Acked-by: Adrian Moreno <amorenoz@redhat.com> Acked-by: Han Zhou <hzhou@ovn.org> Acked-by: Terry Wilson <twilson@redhat.com> Signed-off-by: Numan Siddique <numans@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev: Assume default link speed to be 10 Gbps instead of 100 Mbps.Ilya Maximets2022-11-301-0/+4
| | | | | | | | | | | | | | | | | | | | | 100 Mbps was a fair assumption 13 years ago. Modern days 10 Gbps seems like a good value in case no information is available otherwise. The change mainly affects QoS which is currently limited to 100 Mbps if the user didn't specify 'max-rate' and the card doesn't report the speed or OVS doesn't have a predefined enumeration for the speed reported by the NIC. Calculation of the path cost for STP/RSTP is also affected if OVS is unable to determine the link speed. Lower link speed adapters are typically good at reporting their speed, so chances for overshoot should be low. But newer high-speed adapters, for which there is no speed enumeration or if there are some other issues, will not suffer that much. Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpdk: Use DPDK 21.11.2 release.Michael Phelan2022-10-041-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update OVS CLI and relevant documentation to use DPDK 21.11.2. DPDK 21.11.2 contains fixes for the CVEs listed below: CVE-2022-28199 [1] CVE-2022-2132 [2] A bug was introduced in DPDK 21.11.1 by the commit 01e3dee29c02 ("vhost: fix unsafe vring addresses modifications"). This bug can cause a deadlock when vIOMMU is enabled and NUMA reallocation of the virtqueues happen. A fix [3] has been posted and pushed to the DPDK 21.11 branch. If a user wishes to avoid the issue then it is recommended to use DPDK 21.11.0 until the release of DPDK 21.11.3. It should be noted that DPDK 21.11.0 does not benefit from the numerous bug and CVE fixes addressed since its release. If a user wishes to benefit from these fixes it is recommended to use DPDK 21.11.2. [1] https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-28199 [2] https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-2132 [3] https://patches.dpdk.org/project/dpdk/patch/20220725203206.427083-2-david.marchand@redhat.com/ Signed-off-by: Michael Phelan <michael.phelan@intel.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* datapath-windows: Add IPv6 conntrack ip fragment support on windowsldejing2022-09-201-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implementation on Windows: IPv6 conntrack ip fragment feature use a link list to store ip fragment. When ipv6 fragment module receives a fragment packet, it will store length of the fragment, until to the received length equal to the packet length before fragmented, it will reassemble fragment packet to a complete packet and send the complete packet to conntrack module. After conntrack processed the packet, fragment module will divide the complete packet into small fragment and send it to destination. Currently, ipv6 was implemented in a indenpent module, for the reason it can reduce the risk of introduce bug to ipv4 fragmenb module. Testing Topology: On the Windows VM runs on the ESXi host, two hyper-v ports attached to the ovs bridge; one hyper-v port worked as client and the other port worked as server. Testing Case: 1.UdpV6 a) UdpV6 fragment with multiple ipv6 extension fields. b) UdpV6 fragment in normal scenario. c) UdpV6 fragment in nat scenario. 2.IcmpV6 a) IcmpV6 fragment in normal scenario. b) IcmpV6 fragment in nat scenario. Signed-off-by: ldejing <ldejing@vmware.com> Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
* ofproto-dpif-trace: add --name option for ofproto/trace.Nobuhiro MIKI2022-09-151-0/+3
| | | | | | | | | | | | | Most of commands in ovs-ofctl and ovs-appctl can display port names instead of port numbers by using --names option. This change adds similar functionality to ofproto/trace. For backward compatibility, the default behavior is the same as before. Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* Set release date for 3.0.0.Ilya Maximets2022-08-151-1/+1
| | | | | | Acked-by: Ian Stokes <ian.stokes@intel.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* xenserver: Remove xenserver.Greg Rose2022-08-151-0/+2
| | | | | | | | | | | | | Remove the current xenserver implementation - it is obsolete and since 3.0 we do not support kernel module builds [1]. 1. https://mail.openvswitch.org/pipermail/ovs-dev/2022-July/395789.html [i.maximets] Can be added back if people willing to maintain it will be found. Signed-off-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* Prepare for post-3.0.0 (3.0.90).Ilya Maximets2022-07-151-0/+4
| | | | | | Acked-by: Simon Horman <simon.horman@corigine.com> Acked-by: Ian Stokes <ian.stokes@intel.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* Prepare for 3.0.0.Ilya Maximets2022-07-151-4/+4
| | | | | | Acked-by: Simon Horman <simon.horman@corigine.com> Acked-by: Ian Stokes <ian.stokes@intel.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto/bond: Add knob 'all-members-active'.Christophe Fontaine2022-07-151-0/+2
| | | | | | | | | | This config param allows the delivery of broadcast and multicast packets to the secondary interface of non-lacp bonds, equivalent to the option 'all_slaves_active' for Linux kernel bonds. Reported-at: https://bugzilla.redhat.com/1720935 Signed-off-by: Christophe Fontaine <cfontain@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* python: Add ovs datapath flow parsing.Adrian Moreno2022-07-151-0/+3
| | | | | | | | | | | | | | | | | A ODPFlow is a Flow with the following sections: ufid info (e.g: bytes, packets, dp, etc) match actions Only three datapath actions require special handling: gre: because it has double parenthesis geneve: because it supports many concatenated lists of options nat: we reuse the decoder used for openflow actions Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* make: Remove the Linux datapath.Greg Rose2022-07-151-0/+3
| | | | | | | | | | | | | | | | Update the necessary make and configure files to remove the Linux datapath and then remove the datapath. Move datapath/linux/compat/include/linux/openvswitch.h to include/linux/openvswitch.h because it is needed to generate header files used by the userspace switch. Also remove references to the Linux datapath from auxiliary files and utilities since it is no longer supported. Signed-off-by: Greg Rose <gvrose8192@gmail.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* debian: Update packaging source from Debian/Ubuntu.Frode Nordahl2022-07-151-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | * Update upstream OVS debian packaging to be on par with package source in Debian/Ubuntu: - Provide a openvswitch-switch-dpdk package that integrates with the dpdk package in the distributions so that end users can opt into a DPDK-enabled Open vSwitch binary. - Provide systemd service files. - Provide openvswitch-source package for reproducible integrated build of for example OVN. - Stop building shared library and subsequently remove libopenvswitch and libopenvswitch-dev binary packages. Co-authored-by: Luca Boccassi <bluca@debian.org> Signed-off-by: Luca Boccassi <bluca@debian.org> Co-authored-by: Christian Ehrhardt <christian.ehrhardt@canonical.com> Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com> Co-authored-by: James Page <james.page@ubuntu.com> Signed-off-by: James Page <james.page@ubuntu.com> Co-authored-by: Corey Bryant <corey.bryant@canonical.com> Signed-off-by: Corey Bryant <corey.bryant@canonical.com> Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* odp-execute: Add ISA implementation of actions.Emma Finn2022-07-151-0/+1
| | | | | | | | | | | | | | This commit adds the AVX512 implementation of the action functionality. Usage: $ ovs-appctl odp-execute/action-impl-set avx512 Signed-off-by: Emma Finn <emma.finn@intel.com> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* acinclude: Add configure option to enable actions autovalidator at build time.Kumar Amber2022-07-151-0/+2
| | | | | | | | | | | | | | This commit adds a new command to allow the user to enable the actions autovalidator by default at build time thus allowing for running unit test by default. $ ./configure --enable-actions-default-autovalidator Signed-off-by: Kumar Amber <kumar.amber@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* odp-execute: Add command to switch action implementation.Emma Finn2022-07-151-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | This commit adds a new command to allow the user to switch the active action implementation at runtime. Usage: $ ovs-appctl odp-execute/action-impl-set scalar This commit also adds a new command to retrieve the list of available action implementations. This can be used by to check what implementations of actions are available and what implementation is active during runtime. Usage: $ ovs-appctl odp-execute/action-impl-show Added separate test-case for ovs-actions show/set commands: odp-execute - actions implementation Signed-off-by: Emma Finn <emma.finn@intel.com> Signed-off-by: Kumar Amber <kumar.amber@intel.com> Signed-off-by: Sunil Pai G <sunil.pai.g@intel.com> Co-authored-by: Kumar Amber <kumar.amber@intel.com> Co-authored-by: Sunil Pai G <sunil.pai.g@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* odp-execute: Add auto validation function for actions.Emma Finn2022-07-151-0/+2
| | | | | | | | | | | | | | | | | | This commit introduced the auto-validation function which allows users to compare the batch of packets obtained from different action implementations against the linear action implementation. The autovalidator function can be triggered at runtime using the following command: $ ovs-appctl odp-execute/action-impl-set autovalidator Signed-off-by: Emma Finn <emma.finn@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* netdev-dpdk: Add shared mempool config.Kevin Traynor2022-07-141-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mempools may currently be shared between DPDK ports based on port MTU and NUMA. With some hint from the user we can increase the sharing on MTU and hence reduce memory consumption in many cases. For example, a port with MTU 9000, uses a mempool with an mbuf size based on 9000 MTU. A port with MTU 1500, uses a different mempool with an mbuf size based on 1500 MTU. In this case, assuming same NUMA, both these ports could share the 9000 MTU mempool. The user must give a hint as order of creation of ports and setting of MTUs may vary and we need to ensure that upgrades from older OVS versions do not require more memory. This scheme can also prevent multiple mempools being created for cases where a port is added picking up a default MTU and an appropriate mempool, but later has it's MTU changed to a different value requiring a different mempool. Example usage: $ ovs-vsctl --no-wait set Open_vSwitch . \ other_config:shared-mempool-config=9000,1500:1,6000:1 Port added on NUMA 0: * MTU 1500, use mempool based on 9000 MTU * MTU 5000, use mempool based on 9000 MTU * MTU 9000, use mempool based on 9000 MTU * MTU 9300, use mempool based on 9300 MTU (existing behaviour) Port added on NUMA 1: * MTU 1500, use mempool based on 1500 MTU * MTU 5000, use mempool based on 6000 MTU * MTU 9000, use mempool based on 9000 MTU * MTU 9300, use mempool based on 9300 MTU (existing behaviour) Default behaviour is unchanged and mempools are still only created when needed. Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* ovsdb: Prepare snapshot JSON in a separate thread.Ilya Maximets2022-07-131-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conversion of the database data into JSON object, serialization and destruction of that object are the most heavy operations during the database compaction. If these operations are moved to a separate thread, the main thread can continue processing database requests in the meantime. With this change, the compaction is split in 3 phases: 1. Initialization: - Create a copy of the database. - Remember current database index. - Start a separate thread to convert a copy of the database into serialized JSON object. 2. Wait: - Continue normal operation until compaction thread is done. - Meanwhile, compaction thread: * Convert database copy to JSON. * Serialize resulted JSON. * Destroy original JSON object. 3. Finish: - Destroy the database copy. - Take the snapshot created by the thread. - Write on disk. The key for this schema to be fast is the ability to create a shallow copy of the database. This doesn't take too much time allowing the thread to do most of work. Database copy is created and destroyed only by the main thread, so there is no need for synchronization. Such solution allows to reduce the time main thread is blocked by compaction by 80-90%. For example, in ovn-heater tests with 120 node density-heavy scenario, where compaction normally takes 5-6 seconds at the end of a test, measured compaction times was all below 1 second with the change applied. Also, note that these measured times are the sum of phases 1 and 3, so actual poll intervals are about half a second in this case. Only implemented for raft storage for now. The implementation for standalone databases can be added later by using a file offset as a database index and copying newly added changes from the old file to a new one during ovsdb_log_replace(). Reported-at: https://bugzilla.redhat.com/2069108 Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* tests: Add check_pkt_len action test to system-offload-traffic.Eelco Chaudron2022-07-131-0/+1
| | | | | | | Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Mike Pattrick <mkp@redhat.com> Acked-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* conntrack: Replace timeout based expiration lists with rculists.Gaetan Rivet2022-07-131-0/+1
| | | | | | | | | | | | | | | | | | | | | This patch aims to replace the expiration lists as, due to the way they are used, besides being a source of contention, they have a known issue when used with non-default policies for different zones that could lead to retaining expired connections potentially for a long time. This patch replaces them with an array of rculist used to distribute all the newly created connections in order to, during the sweeping phase, scan them without locking, and evict the expired connections only locking during the actual removal. This allows to reduce the contention introduced by the pushback performed at every packet update, also solving the issue related to zones and timeout policies. Signed-off-by: Gaetan Rivet <grive@u256.net> Co-authored-by: Paolo Valerio <pvalerio@redhat.com> Signed-off-by: Paolo Valerio <pvalerio@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ovsdb: Enable memory trimming after compaction by default.Ilya Maximets2022-07-121-0/+3
| | | | | | | | | | Memory trimming was introduced in OVS 2.15 and didn't cause any issues in production environments since then, while allowing ovsdb-sever to consume a lot less memory in high scale OVN deployments. Enabling by default to make it easier to use. Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpif-netlink: Offloading meter to tc police actionJianbo Liu2022-07-111-0/+2
| | | | | | | | | | | OVS meters are created in advance and openflow rules refer to them by their unique ID. New tc_police API is used to offload them. By calling the API, police actions are created and meters are mapped to them. These actions then can be used in tc filter rules by the index. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Simon Horman <simon.horman@corigine.com>
* netdev-dpdk: Delay vhost mempool creation.Kevin Traynor2022-07-071-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently mempools for vhost are being assigned before the vhost device is added. In some cases this may be just reusing an existing mempool but in others it can require creation of a mempool. For multi-NUMA, the NUMA info of the vhost port is not known until a device is added to the port, so on multi-NUMA systems the initial NUMA node for the mempool is a best guess based on vswitchd affinity. When a device is added to the vhost port, the NUMA info can be checked and if the guess was incorrect a mempool on the correct NUMA node created. For multi-NUMA, the current scheme can have the effect of creating a mempool on a NUMA node that will not be needed and at least for a certain time period requires more memory on a NUMA node. It is also difficult for a user trying to provision memory on different NUMA nodes, if they are not sure which NUMA node the initial mempool for a vhost port will be on. For single NUMA, even though the mempool will be on the correct NUMA, it is assigned ahead of time and if a vhost device was not added, it could also be using uneeded memory. This patch delays the creation of the mempool for a vhost port until the vhost device is added. Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* ovsdb: Add Local_Config schema.Terry Wilson2022-06-301-0/+4
| | | | | | | | | | | | | | | | | | | | | | | The only way to configure settings on a remote (e.g. inactivity_probe) is via --remote=db:DB,table,row. There is no way to do this via the existing CLI options. For a clustered DB with multiple servers listening on unique addresses there is no way to store these entries in the DB as the DB is shared. For example, three servers listening on 1.1.1.1, 1.1.1.2, and 1.1.1.3 respectively would require a Manager/Connection row each, but then all three servers would try to listen on all three addresses. It is possible for ovsdb-server to serve multiple databases. This means that we can have a local "config" database in addition to the main database we are serving (Open_vSwitch, OVN_Southbound, etc.) and this patch adds a Local_Config schema that currently just mirrors the Connection table and a Config table with a 'connections' row that stores each Connection. Signed-off-by: Terry Wilson <twilson@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ovsdb-server: Log database transactions for user requested tables.Dumitru Ceara2022-06-281-0/+3
| | | | | | | | | | | | | | | | | | | Add a new command, 'ovsdb-server/tlog-set DB:TABLE on|off', which allows the user to enable/disable transaction logging for specific databases and tables. By default, logging is disabled. Once enabled, logs are generated with level INFO and are also rate limited. If used with care, this command can be useful in analyzing production deployment performance issues, allowing the user to pin point bottlenecks without the need to enable wider debug logs, e.g., jsonrpc. A command to inspect the logging state is also added: 'ovsdb-server/tlog-list'. Signed-off-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpcls: Add unlisted alias for subtable lookup command.Harry van Haaren2022-06-281-0/+4
| | | | | | | | | | | This patch adds the old name "subtable-lookup-prio-get" as an unlisted command, to restore a consistency between OVS releases for testing scripts. Fixes: 738c76a503f4 ("dpcls: Change info-get function to fetch dpcls usage stats.") Suggested-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>