delta/openvswitch.git - github.com: openvswitch/ovs.git

Commit message	Author	Age	Files	Lines
...
*	dpdk: Use DPDK 21.11.1 release.	Michael Phelan	2022-05-30	1	-0/+3
Modify the CI Linux build script to use the latest DPDK stable release, 21.11.1.
Modify the documentation to use the latest DPDK stable release, 21.11.1.
Update the NEWS file to reflect the latest DPDK stable release, 21.11.1.
The FAQ is updated to reflect the latest DPDK for each OVS branch.

Signed-off-by: Michael Phelan <michael.phelan@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	ovs-monitor-ipsec: Allow custom options per tunnel.	Andreas Karis	2022-05-04	1	-0/+3
Tunnels in LibreSwan and OpenSwan allow many options to be set on a per-tunnel basis. Pass through any options starting with ipsec_ to the connection in the configuration file. Administrators are responsible for picking valid key/value pairs.

Signed-off-by: Andreas Karis <ak.karis@gmail.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
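A minimal sketch of the pass-through rule, assuming the `ipsec_` prefix is stripped before the option is written into the connection section and that values are copied unvalidated. The function name is hypothetical and is not taken from ovs-monitor-ipsec; `encapsulation` and `ikelifetime` are used only as sample LibreSwan connection keys.

```python
def ipsec_custom_options(options):
    """Render ipsec_* tunnel options as per-connection 'key=value' lines.

    Hypothetical helper: assumes the 'ipsec_' prefix is stripped and values
    are passed through unvalidated (validity is the administrator's job).
    """
    prefix = "ipsec_"
    lines = []
    for key in sorted(options):  # sorted only for deterministic output
        if key.startswith(prefix):
            lines.append(f"    {key[len(prefix):]}={options[key]}")
    return "\n".join(lines)


tunnel_options = {
    "remote_ip": "192.0.2.2",            # regular tunnel option: ignored
    "psk": "swordfish",                  # regular tunnel option: ignored
    "ipsec_encapsulation": "yes",        # passed through as encapsulation=yes
    "ipsec_ikelifetime": "8h",           # passed through as ikelifetime=8h
}
print(ipsec_custom_options(tunnel_options))
```

Options without the prefix keep their existing meaning, so only keys the administrator explicitly marked with `ipsec_` reach the connection section.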
*	windows: Fix NEWS and add OVS version in FAQ.	Alin-Gabriel Serdean	2022-04-29	1	-5/+3
This patch removes the newly added NEWS entry and adds it as a leaf under post-2.17. In the FAQ, add the OVS version instead of only stating that the feature is supported for IPv6 connection tracking and Geneve IPv6 tunnels.

Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	ofp-monitor: Support flow monitoring for OpenFlow 1.3, 1.4+.	Vasu Dasari	2022-04-28	1	-2/+4
Extended OpenFlow monitoring support:
  * OpenFlow 1.3 with ONF extensions
  * OpenFlow 1.4+ as defined in the OpenFlow 1.4+ specification.

ONF extensions are similar to Nicira extensions, except for onf_flow_monitor_request{}, where out_port is defined as a 32-bit OF(1.1) port number, and OXM match formats are used in update and request messages.

Flow monitoring support in 1.4+ is slightly different from the Nicira and ONF extensions:
  * More flow monitoring flags are defined.
  * A monitor add/modify/delete command is introduced in the flow_monitor request message.
  * out_group is added to the flow_monitor request message.

Description of changes:
1. Generate ofp-msgs.inc to support 1.3 and 1.4+ flow monitoring messages.
   include/openvswitch/ofp-msgs.h
2. Modify OpenFlow header files with protocol-specific headers.
   include/openflow/openflow-1.3.h
   include/openflow/openflow-1.4.h
3. Modify the OVS abstraction of OpenFlow headers. ofp-monitor.h leverages enums from Nicira extensions to create protocol abstraction headers. OF(1.4+) enums are a superset of the Nicira extensions.
   include/openvswitch/ofp-monitor.h
4. Changes to these files reflect encoding and decoding of the new protocol messages.
   lib/ofp-monitor.c
5. Changes to modules using ofp-monitor APIs. Most of the changes here migrate enums from Nicira to OF 1.4+ versions.
   ofproto/connmgr.c
   ofproto/connmgr.h
   ofproto/ofproto-provider.h
   ofproto/ofproto.c
6. Extended protocol decoding tests to verify all protocol versions:
   FLOW_MONITOR_CANCEL
   FLOW_MONITOR_PAUSED
   FLOW_MONITOR_RESUMED
   FLOW_MONITOR request
   FLOW_MONITOR reply
   tests/ofp-print.at
7. Modify flow monitoring tests so that they can be executed by all protocol versions.
   tests/ofproto.at
8. Modified documentation highlighting the change.
   utilities/ovs-ofctl.8.in
   NEWS

Signed-off-by: Vasu Dasari <vdasari@gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/383915.html
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	ofp-monitor: Extend Flow Monitoring support for OF 1.0-1.2 with Nicira Extensions.	Vasu Dasari	2022-04-28	1	-0/+3
Currently OVS supports flow monitoring only for OpenFlow 1.0 with Nicira extensions; messages of any other OpenFlow version are not accepted. This change allows OpenFlow 1.0-1.2 flow monitoring with Nicira extensions to be accepted. It also makes sure that flow monitoring updates, pause messages, and resume messages are sent in the same OpenFlow version as the flow monitor request.

Description of changes:
1. Generate ofp-msgs.inc to support 1.0-1.2 flow monitoring messages.
   include/openvswitch/ofp-msgs.h
2. Make vconn accept a user-specified version and use it for the vconn flow monitoring session.
   ofproto/ofproto.c
3. Modify APIs to take the protocol as an argument to encode and decode messages.
   include/openvswitch/ofp-monitor.h
   lib/ofp-monitor.c
   ofproto/connmgr.c
   ofproto/connmgr.h
   ofproto/ofproto.c
4. Modified the following test cases to be verified across supported OF versions:
   ofproto - flow monitoring
   ofproto - flow monitoring with !own
   ofproto - flow monitoring with out_port
   ofproto - flow monitoring pause and resume
   ofproto - flow monitoring usable protocols
   tests/ofproto.at
5. Updated NEWS with the support added by this commit.

Signed-off-by: Vasu Dasari <vdasari@gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-December/050820.html
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	ovsdb-idl: Support write-only-changed IDL monitor mode.	Dumitru Ceara	2022-04-28	1	-0/+4
At first glance, change tracking should never be allowed for write-only columns. However, some clients (e.g., ovn-northd) that are mostly exclusive writers of a database use change tracking to avoid duplicating the IDL row records into a local cache when implementing incremental processing.

The default behavior of the IDL is to automatically turn a write-only column into a read-write column whenever the client enables change tracking for that column. For the aforementioned clients, this becomes a performance issue.

Commit 1cc618c32524 ("ovsdb-idl: Fix atomicity of writes that don't change a column's value.") explains why writes that don't change a column's value cannot be optimized out early if the column is read/write. Furthermore, if at least one record in any table changed during a transaction, then all records that have been written are added to the transaction, even if their values didn't change. If there are many such rows (as in ovn-northd's case), this incurs a significant overhead because:
  a. the client has to build this large transaction;
  b. the transaction has to be sent over the network;
  c. the server needs to parse this (mostly) no-op update.

We now introduce new IDL APIs that allow users to set a new monitoring mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, to indicate to the IDL that the atomicity constraints may be relaxed and that written columns whose value doesn't change can be skipped from the current transaction.

We benchmarked ovn-northd performance with this new mode against NB and SB databases taken from ovn-kubernetes scale tests. When a minor change is made to the Northbound database (e.g., NB_Global.nb_cfg is incremented), the time it takes to build the Southbound transaction becomes negligible (vs ~1.5 seconds before this change).

End-to-end ovn-kubernetes scale tests on 120-node clusters also show a significant reduction in the latency to bring up pods; both average and P99 latency decreased by ~30%.

Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
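The cost described above (a full rewrite turning into a transaction of mostly no-op writes) can be illustrated with a toy model. `SketchTxn`, its `cache` dictionary and the tuple-shaped operations are hypothetical and are not the ovsdb-idl C or Python API; the only idea taken from the commit is that, with the relaxed mode enabled, a write that keeps a column's last known value is dropped.

```python
from dataclasses import dataclass, field


@dataclass
class SketchTxn:
    """Toy transaction builder; NOT the ovsdb-idl API."""
    cache: dict                       # (table, row) -> {column: last known value}
    write_changed_only: bool = False  # stands in for OVSDB_IDL_WRITE_CHANGED_ONLY
    ops: list = field(default_factory=list)

    def write(self, table, row, column, value):
        current = self.cache.get((table, row), {}).get(column)
        if self.write_changed_only and current == value:
            # Relaxed atomicity: a write that keeps the same value is skipped.
            return
        self.ops.append((table, row, column, value))


# A client that recomputes and rewrites every row, but only row 0 changes.
cache = {("T", i): {"c": "x"} for i in range(1000)}
default_txn = SketchTxn(cache)
relaxed_txn = SketchTxn(cache, write_changed_only=True)
for txn in (default_txn, relaxed_txn):
    for i in range(1000):
        txn.write("T", i, "c", "y" if i == 0 else "x")
print(len(default_txn.ops), len(relaxed_txn.ops))
```

In the default mode all 1000 writes end up in the transaction, while the relaxed mode sends a single operation, which is the effect behind the ovn-northd numbers quoted in the commit.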
*	datapath-windows: Add IPv6 conntrack support on Windows.	ldejing	2022-04-22	1	-0/+5
Implementation on Windows: previously, only IPv4 conntrack was supported on the Windows platform. In this patch we implement IPv6 conntrack functions following the current logic of the IPv4 conntrack. This implementation includes TcpV6 (NAT and normal scenarios), UdpV6 (NAT and normal scenarios), IcmpV6 conntrack of echo request/reply packets, and FtpV6 (NAT and normal scenarios).

Testing topology: a Windows VM runs on an ESXi host, with two Hyper-V ports attached to the OVS bridge; one Hyper-V port works as the client and the other as the server.

Testing cases:
1. TcpV6
   a) TCP request/reply conntrack for the normal scenario. In this scenario, 20::1 is the client and 20::2 is the server; the following conntrack entry is generated:
      (Origin(src=20::1, src_port=1555, dst=20::2, dst_port=1556), reply(src=20::2, src_port=1556, dst=20::1, dst_port=1555), protocol=tcp)
   b) TCP request/reply conntrack for the NAT scenario. In this scenario, 20::1 is the client, 20::10 is the floating IP, and 21::3 is the server; the following conntrack entry is generated:
      (Origin(src=20::1, src_port=1555, dst=20::10, dst_port=1556), reply(src=21::3, src_port=1556, dst=20::1, dst_port=1555), protocol=tcp)
2. UdpV6
   a) UDP request/reply conntrack for the normal scenario:
      (Origin(src=20::1, src_port=1555, dst=20::2, dst_port=1556), reply(src=20::2, src_port=1556, dst=20::1, dst_port=1555), protocol=udp)
   b) UDP request/reply conntrack for the NAT scenario:
      (Origin(src=20::1, src_port=1555, dst=20::10, dst_port=1556), reply(src=21::3, src_port=1556, dst=20::1, dst_port=1555), protocol=udp)
3. IcmpV6
   a) ICMPv6 request/reply conntrack for the normal scenario. Currently ICMPv6 only supports constructing conntrack entries for echo request/reply packets. Taking (20::1 -> 20::2) as an example, the following conntrack entry is generated:
      (origin(src=20::1, dst=20::2), reply(src=20::2, dst=20::1), protocol=icmp)
   b) ICMP request/reply conntrack for the DNAT scenario, for example (20::1 -> 20::10 -> 21::3), where 20::1 is the client, 20::10 is the floating IP, and 21::3 is the server IP. The following flow is generated:
      (origin(src=20::1, dst=20::10), reply(src=21::3, dst=20::1), protocol=icmp)
4. FtpV6
   a) FTP request/reply conntrack for the normal scenario. In this scenario, taking 20::1 as the client and 20::2 as the server, two conntrack entries are generated.
      FTP active mode:
      (Origin(src=20::1, src_port=1555, dst=20::2, dst_port=21), reply(src=20::2, src_port=21, dst=20::1, dst_port=1555), protocol=tcp)
      (Origin(src=20::2, src_port=20, dst=20::1, dst_port=1556), reply(src=20::1, src_port=1556, dst=20::2, dst_port=20), protocol=tcp)
      FTP passive mode:
      (Origin(src=20::1, src_port=1555, dst=20::2, dst_port=21), reply(src=20::2, src_port=21, dst=20::1, dst_port=1555), protocol=tcp)
      (Origin(src=20::1, src_port=1556, dst=20::2, dst_port=1557), reply(src=20::2, src_port=1557, dst=20::1, dst_port=1556), protocol=tcp)
   b) FTP request/reply conntrack for the NAT scenario, FTP passive mode. In this scenario, 20::1 is the client, 20::10 is the floating IP, and 21::3 is the server IP. The following flows are generated:
      (Origin(src=20::1, src_port=1555, dst=20::10, dst_port=21), reply(src=21::3, src_port=21, dst=20::1, dst_port=1555), protocol=tcp)
      (Origin(src=20::1, src_port=1556, dst=20::10, dst_port=1557), reply(src=21::3, src_port=1557, dst=20::1, dst_port=1556), protocol=tcp)
5. Regression test for IPv4 in the Antrea project (about 60 test cases).

Future work:
1) IcmpV6 redirect packet conntrack.
2) IPv6 fragment support on UDP.
3) Support NAPT for IPv6.
4) FtpV6 active mode for NAT.

Signed-off-by: ldejing <ldejing@vmware.com>
Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
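The reply tuples listed in the test cases follow one rule: swap both ends of the origin tuple and, under DNAT, replace the floating IP with the translated server address. The sketch below reproduces that rule only. It assumes address-only DNAT with ports left untouched (NAPT for IPv6 is listed as future work), and `ConnTuple` and `expected_reply` are hypothetical names, not Windows datapath code.

```python
from typing import NamedTuple, Optional


class ConnTuple(NamedTuple):
    src: str
    src_port: Optional[int]  # None for ICMPv6 echo, which has no ports
    dst: str
    dst_port: Optional[int]
    protocol: str


def expected_reply(origin: ConnTuple, dnat_addr: Optional[str] = None) -> ConnTuple:
    """Reply-direction tuple for an origin tuple.

    Without NAT the reply swaps both ends. With DNAT to a floating IP, the
    reply comes from the translated server address; ports are not rewritten.
    """
    server = dnat_addr if dnat_addr is not None else origin.dst
    return ConnTuple(server, origin.dst_port, origin.src, origin.src_port, origin.protocol)


# TcpV6 NAT case: client 20::1, floating IP 20::10, server 21::3.
print(expected_reply(ConnTuple("20::1", 1555, "20::10", 1556, "tcp"), "21::3"))
```

The same function covers the UDP and ICMPv6 echo cases above; only the protocol field and the presence of ports differ.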
*	NEWS: Highlight libopenvswitch API change caused by UB fixes.	Ilya Maximets	2022-04-08	1	-0/+11
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	Set release date for 2.17.0.	Ilya Maximets	2022-03-24	1	-1/+3
Added a NEWS entry for OVSDB performance because it is user-visible. It was not previously mentioned since it's an aggregated result of various commits.

Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	ovsdb: relay: Add transaction history support.	Ilya Maximets	2022-03-03	1	-0/+3
Even though relays can be scaled to a large number of servers to handle many more clients, the lack of transaction history may cause significant load if clients are reconnecting. E.g., during an upgrade of a large-scale OVN deployment, relays can be taken down one by one, forcing all the clients of one relay to jump to other ones, and all these clients will download the database from scratch from a new relay.

Since the relay itself uses a monitor_cond_since connection to the main cluster, it receives the last transaction id along with each update. Since these transaction ids are the 'eid's of actual transactions, the relay can use them for a transaction history.

A relay may not receive all the transaction ids, because the main cluster may combine several changes into a single monitor update. However, all relays will likely receive the same updates with the same transaction ids, so the case where a transaction id cannot be found after reconnecting between relays should not be very common. If some id is missing on the relay (i.e., this update was merged with another update and a newer id was used), the client will just re-download the database as if there was a normal transaction history miss.

The OVSDB client synchronization module is updated to provide the last transaction id along with the update. The relay module is updated to use these ids as transaction ids. If the ids are zero, the relay decides that the main server doesn't support transaction ids and disables the transaction history accordingly.

ovsdb_txn_replay_commit() is used instead of ovsdb_txn_propose_commit_block(), so that transactions are added to the history. This can be done because relays have no file storage, so there is no need to write anything.

Relay tests are modified to test both standalone and clustered databases as the main server. Checks are added to ensure that all servers receive the same transaction ids in monitor updates.

Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
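The relay-side bookkeeping can be sketched as a bounded, ordered map from transaction id to change. `RelayHistory`, `max_entries` and the string ids are assumptions made for illustration; the rules taken from the commit are that an unknown id (for example one merged away by the main cluster) is treated like an ordinary history miss, and that all-zero ids disable the history.

```python
from collections import OrderedDict

ZERO_ID = "00000000-0000-0000-0000-000000000000"


class RelayHistory:
    """Toy transaction history kept by a relay (hypothetical, not OVSDB code)."""

    def __init__(self, max_entries=100):
        self.entries = OrderedDict()  # txn id -> change, oldest first
        self.max_entries = max_entries
        self.enabled = True

    def on_update(self, txn_id, change):
        if txn_id == ZERO_ID:
            # The main server does not report transaction ids: no history.
            self.enabled = False
            self.entries.clear()
            return
        self.entries[txn_id] = change
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # drop the oldest entry

    def changes_since(self, last_id):
        """Changes after last_id, or None if the client must re-download."""
        if not self.enabled or last_id not in self.entries:
            return None  # history miss
        ids = list(self.entries)
        return [self.entries[i] for i in ids[ids.index(last_id) + 1:]]
```

A reconnecting client that presents an id still in the window receives only the newer changes; any other id falls back to a full database download.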
*	NEWS: Fix some typo.	David Marchand	2022-01-21	1	-3/+3
The "experimantal" typo got copy-pasted a few times.

Fixes: be56e063d028 ("netdev-offload-dpdk: Support tunnel pop action.")
Fixes: e098c2f966cb ("netdev-dpdk-offload: Add vxlan pattern matching function.")
Fixes: 7617d0583c73 ("netdev-offload-dpdk: Add support for matching on gre fields.")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	Prepare for post-2.17.0 (2.17.90).	Ilya Maximets	2022-01-19	1	-0/+4
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
*	Prepare for 2.17.0.	Ilya Maximets	2022-01-19	1	-1/+1
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
*	netdev-offload: Add multi-thread API.	Gaetan Rivet	2022-01-19	1	-0/+2
Expose functions reporting the user configuration of offloading threads, as well as utility functions for multithreading.

This only exposes the configuration knob to the user; no datapath implements the multiple-thread request yet. This allows implementations to use this API for offload thread management in the relevant layers before the actual dataplane implementation is enabled.

The offload thread ID is lazily allocated and can therefore be in a different order than the offload thread start sequence.

The RCU thread will sometimes access hardware-offload objects from a provider for reclamation purposes. In such a case, it gets a default offload thread ID of 0. Care must be taken that using this thread ID is safe concurrently with the offload threads.

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
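Lazy allocation means an offload thread receives its ID the first time it asks for one, not when it starts, which is why IDs need not follow the start order. This sketch uses Python thread-local storage and a locked counter; `offload_thread_id` and its `is_rcu` flag are hypothetical names. The RCU branch returns 0 as the commit describes, so it shares that ID with the first offload thread.

```python
import itertools
import threading

_next_id = itertools.count()
_id_lock = threading.Lock()
_local = threading.local()


def offload_thread_id(is_rcu=False):
    """Return this thread's offload ID, allocating it on first use."""
    tid = getattr(_local, "offload_id", None)
    if tid is None:
        if is_rcu:
            # RCU reclamation uses the default ID 0, shared with the first
            # offload thread, so code using it must be safe concurrently.
            tid = 0
        else:
            with _id_lock:
                tid = next(_next_id)
        _local.offload_id = tid
    return tid
```

Because the ID is cached per thread, repeated calls from the same thread return the same value.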
*	Documentation: Remove experimental tag for PMD ALB.	Kevin Traynor	2022-01-18	1	-0/+1
PMD Auto Load Balance was introduced as an experimental feature in OVS 2.11. It is used to detect that the Rx queue to PMD assignments are no longer balanced and that it would be better to reassign them.

It is disabled by default and can be enabled with:
$ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	Documentation: Add USDT documentation and bpftrace example.	Eelco Chaudron	2022-01-18	1	-0/+1
Add the USDT documentation and a bpftrace example using the bridge run USDT probes.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	dpif-netdev: Introduce hash-based Tx packet steering mode.	Maxime Coquelin	2022-01-17	1	-0/+3
This patch adds a new hash Tx steering mode that distributes the traffic over all the Tx queues, whatever the number of PMD threads. It is useful for guests expecting traffic to be distributed over all their vCPUs.

The idea is to reuse the 5-tuple hash of the packets, already computed to build the flow batches (so it does not offer flexibility in which fields are part of the hash).

There is also no user-configurable indirection table, given that the feature is transparent to the guest. The queue selection is just a modulo operation between the packet hash and the number of Tx queues.

There are no (at least intentional) functional changes for the existing XPS and static modes. There should not be noticeable performance changes for these modes (only one more branch in the hot path).

For the hash mode, performance could be impacted by locking when multiple PMD threads are in use (same as XPS mode) and also by the second level of batching.

Regarding the batching, the existing Tx port output_pkts is not modified. This means that at most NETDEV_MAX_BURST packets can be batched for all the Tx queues. A second level of batching is done in dp_netdev_pmd_flush_output_on_port(), only for this hash mode.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
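The queue choice and the second level of batching can be sketched as follows. OVS reuses the hash it already has for each packet; here `zlib.crc32` over a string key stands in for it, which is an assumption of this sketch. `regroup_by_txq` is a hypothetical helper that only loosely mirrors what dp_netdev_pmd_flush_output_on_port() does in hash mode: it splits one port batch into per-queue batches.

```python
import zlib
from collections import defaultdict


def five_tuple_hash(pkt):
    """Stand-in for the per-packet 5-tuple hash (crc32 is an assumption)."""
    key = "|".join(str(pkt[f]) for f in ("src", "dst", "sport", "dport", "proto"))
    return zlib.crc32(key.encode())


def select_txq(pkt_hash, n_txq):
    # Plain modulo between the packet hash and the number of Tx queues;
    # there is no indirection table.
    return pkt_hash % n_txq


def regroup_by_txq(batch, n_txq):
    """Second-level batching: one output batch per selected Tx queue."""
    per_queue = defaultdict(list)
    for pkt in batch:
        per_queue[select_txq(five_tuple_hash(pkt), n_txq)].append(pkt)
    return dict(per_queue)


batch = [{"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 1000 + i % 8,
          "dport": 80, "proto": 6} for i in range(32)]
print({q: len(pkts) for q, pkts in sorted(regroup_by_txq(batch, 4).items())})
```

Since the queue depends only on the flow's hash, all packets of one flow stay on one queue (preserving order per flow) while different flows spread over all queues.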
*	Encap & Decap actions for MPLS packet type.	Martin Varghese	2022-01-17	1	-0/+1
The encap & decap actions are extended to support the MPLS packet type. The encap & decap actions add and remove an MPLS header at the start of the packet.

The existing PUSH MPLS & POP MPLS actions insert & remove the MPLS header between the Ethernet header and the IP header. Though this behaviour is fine for L3 VPN, where an IP packet is encapsulated inside an MPLS tunnel, it does not satisfy the L2 VPN requirements. In L2 VPN, Ethernet packets must be encapsulated inside an MPLS tunnel.

In this change, the encap & decap actions are extended to support the MPLS packet type. The encap & decap actions add and remove the MPLS header at the start of the packet, as depicted below.

Encapsulation:
  Actions - encap(mpls),encap(ethernet)
  Incoming packet -> | ETH | IP | Payload |
  1 Actions - encap(mpls) [Datapath action - ADD_MPLS:0x8847]
      Outgoing packet -> | MPLS | ETH | IP | Payload |
  2 Actions - encap(ethernet) [Datapath action - push_eth]
      Outgoing packet -> | ETH | MPLS | ETH | IP | Payload |

Decapsulation:
  Incoming packet -> | ETH | MPLS | ETH | IP | Payload |
  Actions - decap(),decap(packet_type(ns=0,type=0))
  1 Actions - decap() [Datapath action - pop_eth]
      Outgoing packet -> | MPLS | ETH | IP | Payload |
  2 Actions - decap(packet_type(ns=0,type=0)) [Datapath action - POP_MPLS:0x6558]
      Outgoing packet -> | ETH | IP | Payload |

Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
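The difference between the two action families is where the MPLS header lands. In this toy model a packet is just a list of header names, outermost first; the function names are hypothetical, and `push_mpls` assumes no VLAN tags follow the Ethernet header.

```python
def encap(pkt, header):
    """encap(): add a header at the very start of the packet."""
    return [header] + pkt


def decap(pkt):
    """decap(): remove the outermost header."""
    return pkt[1:]


def push_mpls(pkt):
    """push_mpls: insert MPLS between the Ethernet and the L3 header.

    Simplification: assumes no VLAN tags after the Ethernet header.
    """
    return pkt[:1] + ["MPLS"] + pkt[1:]


frame = ["ETH", "IP", "Payload"]
print(push_mpls(frame))                     # L3 VPN style
print(encap(encap(frame, "MPLS"), "ETH"))   # L2 VPN style
```

push_mpls yields `['ETH', 'MPLS', 'IP', 'Payload']`, which carries only the IP packet, whereas the two encap steps yield `['ETH', 'MPLS', 'ETH', 'IP', 'Payload']`, keeping the whole inner Ethernet frame as L2 VPN requires; two decap steps undo it.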
*	netdev-linux: Use matchall classifier for ingress policing.	Mike Pattrick	2022-01-12	1	-0/+2
Currently, ingress policing uses the basic classifier to apply traffic control filters when hardware offload is not enabled, and matchall when it is. This change makes it always use matchall, falling back to basic if the kernel is built without matchall support.

The system tests are modified to allow either basic or matchall classification on the ingress filter, and to allow either 10000 or 10240 packets for the packet burst filter. 10000 is accurate for kernel 5.14 and the most recent iproute2; however, 10240 is kept for compatibility with older kernels.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	dpdk: Support running PMD threads on any core.	David Marchand	2022-01-11	1	-0/+2
Previously in OVS, a PMD thread running on CPU X used lcore X. This assumption limited OVS to running PMD threads on physical CPUs < RTE_MAX_LCORE.

DPDK 20.08 introduced a new API that associates a non-EAL thread with a free lcore. This new API does not change the thread characteristics (like CPU affinity) and lets OVS run its PMD threads on any CPU regardless of RTE_MAX_LCORE.

The DPDK multiprocess feature is not compatible with this new API and is disabled.

DPDK still limits the number of lcores to RTE_MAX_LCORE (128 on x86_64), which should be enough for OVS PMD threads (hopefully).

The DPDK lcore/OVS PMD thread mapping is logged when an OVS PMD thread is attached and when it is detached.

A new command is added to show DPDK's view of the DPDK lcores at any time:

$ ovs-appctl dpdk/lcore-list
lcore 0, socket 0, role RTE, cpuset 0
lcore 1, socket 0, role NON_EAL, cpuset 1
lcore 2, socket 0, role NON_EAL, cpuset 15

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	dpif-netdev: Forwarding optimization for flows with a simple match.	Ilya Maximets	2022-01-07	1	-0/+3
There are cases where users might want simple forwarding or drop rules for all packets received from a specific port, e.g.:

  "in_port=1,actions=2"
  "in_port=2,actions=IN_PORT"
  "in_port=3,vlan_tci=0x1234/0x1fff,actions=drop"
  "in_port=4,actions=push_vlan:0x8100,set_field:4196->vlan_vid,output:3"

There are also cases where complex OpenFlow rules can be simplified down to datapath flows with very simple match criteria.

In theory, for very simple forwarding, OVS doesn't need to parse packets at all in order to follow these rules. The "simple match" lookup optimization is intended to speed up packet forwarding in these cases.

Design:

Due to various implementation constraints, the userspace datapath always has the following flow fields in exact match (i.e., it's required to match at least these fields of a packet even if the OF rule doesn't need that):

  - recirc_id
  - in_port
  - packet_type
  - dl_type
  - vlan_tci (CFI + VID) - in most cases
  - nw_frag - for IP packets

Not all of these fields are related to the packet itself. We already know the current 'recirc_id' and 'in_port' before starting the packet processing. It also seems safe to assume that we're working with Ethernet packets. So, for a simple OF rule we need to match only on 'dl_type', 'vlan_tci' and 'nw_frag'.

'in_port', 'dl_type', 'nw_frag' and 13 bits of 'vlan_tci' can be combined into a single 64-bit integer (mark) that can be used as a hash in a hash map. We use only VID and CFI from 'vlan_tci'; flows that need to match on PCP will not qualify for the optimization. The workaround for matching on the non-existence of a VLAN is updated to match on CFI and VID only, in order to qualify for the optimization. CFI is always set by OVS if a VLAN is present in a packet, so there is no need to match on PCP in this case. 'nw_frag' takes 2 bits of PCP inside the simple match mark.

A new per-PMD flow table, 'simple_match_table', is introduced to store simple match flows only. 'dp_netdev_flow_add' adds a flow to the usual 'flow_table' and to the 'simple_match_table' if the flow meets the following constraints:

  - 'recirc_id' in the flow match is 0.
  - 'packet_type' in the flow match is Ethernet.
  - Flow wildcards contain only the minimal set of non-wildcarded fields (listed above).

If the number of flows for the current 'in_port' in the regular 'flow_table' equals the number of flows for the current 'in_port' in the 'simple_match_table', we may use the simple match optimization, because all the flows we have are simple match flows. This means that we only need to parse 'dl_type', 'vlan_tci' and 'nw_frag' to perform packet matching. We then build the unique flow mark from 'in_port', 'dl_type', 'nw_frag' and 'vlan_tci' and look for it in the 'simple_match_table'. On a successful lookup we don't need to run the full 'miniflow_extract()'.

An unsuccessful lookup technically means that we have no suitable flow in the datapath and an upcall will be required. So, in this case EMC and SMC lookups are disabled. We may optimize this path in the future by bypassing the dpcls lookup too.

The performance improvement of this solution on 'simple match' flows should be comparable with partial HW offloading, because it parses the same packet fields and uses a similar flow lookup scheme. However, unlike partial HW offloading, it works for all port types, including virtual ones.

Performance results when compared to EMC:

Test setup:

             virtio-user   OVS    virtio-user
  Testpmd1  ------------>  pmd1  ------------>  Testpmd2
  (txonly)       x<------  pmd2  <------------  (mac swap)

Single stream of 64-byte packets. Actions:
  in_port=vhost0,actions=vhost1
  in_port=vhost1,actions=vhost0

Stats are collected from pmd1 and pmd2, so there are 2 scenarios:
  Virt-to-Virt   :  Testpmd1 ------> pmd1 ------> Testpmd2.
  Virt-to-NoCopy :  Testpmd2 ------> pmd2 --->x Testpmd1.

Here the packet sent from pmd2 to Testpmd1 is always dropped, because the virtqueue is full: Testpmd1 is in txonly mode and doesn't receive any packets. This should be closer to the performance of a VM-to-Phy scenario.

The test was performed on a machine with an Intel Xeon CPU E5-2690 v4 @ 2.60GHz. The table below shows the improvement in throughput compared to EMC.

+----------------+------------------------+------------------------+
|                |    Default (-g -O2)    | "-Ofast -march=native" |
|   Scenario     +------------+-----------+------------+-----------+
|                |    GCC     |   Clang   |    GCC     |   Clang   |
+----------------+------------+-----------+------------+-----------+
| Virt-to-Virt   |   +18.9%   |  +25.5%   |   +10.8%   |  +16.7%   |
| Virt-to-NoCopy |   +24.3%   |  +33.7%   |   +14.9%   |  +22.0%   |
+----------------+------------+-----------+------------+-----------+

For the Phy-to-Phy case the performance improvement should be even higher, but it's not the main use case for this functionality. The performance difference for non-simple flows is within the margin of error.

Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
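The mark construction and the "all flows on this port are simple" gate can be sketched as follows. The VID mask, CFI bit and PCP shift follow the 802.1Q TCI layout; the packing is a host-order simplification and is not claimed to match the datapath's exact bit layout. `simple_match_mark`, `SimpleMatchPmd` and its per-port counters are hypothetical names, and the requirement of at least one flow is an extra assumption of this sketch.

```python
VLAN_VID_MASK = 0x0FFF
VLAN_CFI = 0x1000
VLAN_PCP_SHIFT = 13
NW_FRAG_MASK = 0x3


def simple_match_mark(in_port, dl_type, nw_frag, vlan_tci):
    """Pack the exact-match fields into one 64-bit key (host-order sketch)."""
    tci_bits = vlan_tci & (VLAN_VID_MASK | VLAN_CFI)        # PCP is dropped
    frag_bits = (nw_frag & NW_FRAG_MASK) << VLAN_PCP_SHIFT  # reuses 2 PCP bits
    return (in_port << 32) | (dl_type << 16) | frag_bits | tci_bits


class SimpleMatchPmd:
    """Toy per-PMD state: flow counters plus the simple match table."""

    def __init__(self):
        self.n_flows = {}        # in_port -> flows in the regular flow table
        self.n_simple = {}       # in_port -> flows in the simple match table
        self.simple_table = {}   # mark -> actions

    def add_flow(self, in_port, actions, simple_key=None):
        """simple_key is (dl_type, nw_frag, vlan_tci) if the flow qualifies."""
        self.n_flows[in_port] = self.n_flows.get(in_port, 0) + 1
        if simple_key is not None:
            self.n_simple[in_port] = self.n_simple.get(in_port, 0) + 1
            self.simple_table[simple_match_mark(in_port, *simple_key)] = actions

    def simple_match_enabled(self, in_port):
        n = self.n_flows.get(in_port, 0)
        return n > 0 and n == self.n_simple.get(in_port, 0)

    def lookup(self, in_port, dl_type, nw_frag, vlan_tci):
        """Actions on a hit; None means the port is not eligible or a miss."""
        if not self.simple_match_enabled(in_port):
            return None
        mark = simple_match_mark(in_port, dl_type, nw_frag, vlan_tci)
        return self.simple_table.get(mark)
```

A single hash-map probe with the packed mark replaces full packet parsing whenever every flow on the ingress port qualifies; adding one non-simple flow on that port disables the shortcut for it.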
*	python: idl: Add monitor_cond_since support.	Terry Wilson	2022-01-06	1	-0/+4
Add support for monitor_cond_since / update3 to python-ovs to allow more efficient reconnections when connecting to clustered OVSDB servers.

Signed-off-by: Terry Wilson <twilson@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	tnl-neigh-cache: Add tnl/neigh/aging command.	Paolo Valerio	2021-12-17	1	-0/+2
With this command it is now possible to change the aging time of the cache entries. For existing entries, the aging time is updated only if the current expiration is greater than the new one. In any case, the next refresh will set it to the new value. This is intended mostly for debugging purposes.

Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Gaetan Rivet <grive@u256.net>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
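The expiration rule for existing entries amounts to taking the earlier of the two deadlines. The function names below are hypothetical and times are plain seconds; this is a sketch of the stated rule, not the tnl-neigh-cache code.

```python
def aging_command_expiration(now, current_expires, new_aging):
    """Expiration of an existing entry after the aging time is changed.

    The deadline only moves if the new one is earlier; it is never extended.
    """
    return min(current_expires, now + new_aging)


def refresh_expiration(now, aging):
    """On the next refresh the entry simply gets the new aging time."""
    return now + aging


now = 100
print(aging_command_expiration(now, 400, 60))   # shortened to 160
print(aging_command_expiration(now, 120, 60))   # kept at 120
print(refresh_expiration(130, 60))              # 190 on the next refresh
```

Lowering the aging time therefore takes effect immediately for existing entries, while raising it only applies once an entry is refreshed.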
*	netdev-offload-dpdk: Add support for matching on gre fields.	Nir Anteby	2021-12-16	1	-0/+2
Add parsing of GRE match fields.

Signed-off-by: Nir Anteby <nanteby@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	dpdk: Use --in-memory by default.	Rosemarie O'Riorden	2021-12-15	1	-0/+1
If anonymous memory mapping is supported by the kernel, it's better to run OVS entirely in memory rather than creating shared data structures. OVS doesn't work in multi-process mode, so there is no need to litter the filesystem.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1949849
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Rosemarie O'Riorden <roriorde@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	dpdk: Update to use DPDK v21.11.	Ian Stokes	2021-12-09	1	-0/+1
This commit adds support for DPDK v21.11. It includes the following changes:
1. ci: Install python elftools for DPDK 21.02.
2. ci: Update meson requirement for DPDK 21.05.
3. netdev-dpdk: Fix build with 21.05.
4. ci: Compile DPDK in non developer mode.
   http://patchwork.ozlabs.org/project/openvswitch/list/?series=242480&state=*
5. netdev-dpdk: Remove access to DPDK internals.
6. netdev-dpdk: Remove unused attribute from rte_flow rule.
7. netdev-dpdk: Fix mbuf macros namespace with 21.11-rc1.
8. netdev-dpdk: Fix vhost namespace with 21.11-rc2.
   http://patchwork.ozlabs.org/project/openvswitch/list/?series=271159&state=*

In addition, the documentation and DPDK unit tests were updated in this commit for use with DPDK v21.11.

For credit, all authors of the original commits to 'dpdk-latest' with the above changes have been added as co-authors of this commit.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Co-authored-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Tested-by: Seamus Ryan <seamus.ryan@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
*	ofproto-dpif: Increase dp_hash default max buckets.	Mike Pattrick	2021-12-03	1	-0/+4
Currently, when a user creates an OpenFlow group with multiple buckets without specifying a selection method, the efficient dp_hash method is only selected if the user creates fewer than 64 buckets. But when dp_hash is explicitly selected, up to 256 buckets are supported.

While 64 buckets seems like a lot, certain OVN/OpenStack workloads could result in the user creating more than 64 buckets, for example when using OVN to load balance.

This patch increases the default maximum from 64 to 256.

This change to the default limit doesn't affect how many buckets are actually created (that is specified by the user when the group is created), only how traffic is distributed across buckets.

Signed-off-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Gaetan Rivet <grive@u256.net>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	match: Do not print "igmp" match keyword.	Adrian Moreno	2021-11-29	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The match keyword "igmp" is not supported in ofp-parse, which means that flow dumps cannot be restored. Previously a workaround was added to ovs-save to avoid changing output in stable branches. This patch changes the output to print igmp match in the accepted ofp-parse format (ip,nw_proto=2) and print igmp_type/code as generic tp_src/dst. Tests are added, and NEWS is updated to reflect this change. The workaround in ovs-save is still included to ensure that flows can be restored when upgrading an older ovs-vswitchd. This workaround should be removed in later versions. Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Salvatore Daniele <sdaniele@redhat.com> Co-authored-by: Salvatore Daniele <sdaniele@redhat.com> Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
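The new output format can be sketched as a small formatter, assuming a hypothetical helper that receives the IGMP type and code. It shows the mapping stated above: "igmp" becomes "ip,nw_proto=2" (IGMP is IP protocol 2), igmp_type is printed as tp_src, and igmp_code as tp_dst.

```python
def format_igmp_match(igmp_type=None, igmp_code=None):
    """Hypothetical helper: print an IGMP match in ofp-parse-compatible form."""
    # IGMP is IP protocol 2, so the "igmp" keyword becomes "ip,nw_proto=2".
    fields = ["ip", "nw_proto=2"]
    # igmp_type/igmp_code are printed as the generic transport ports.
    if igmp_type is not None:
        fields.append(f"tp_src={igmp_type}")
    if igmp_code is not None:
        fields.append(f"tp_dst={igmp_code}")
    return ",".join(fields)
```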
*	dpctl: dpif: Allow viewing and configuring dp cache sizes.	Eelco Chaudron	2021-11-08	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds a general way of viewing and configuring datapath cache sizes, with an implementation for the netlink interface. The ovs-dpctl/ovs-appctl show commands display the currently configured cache sizes: $ ovs-dpctl show system@ovs-system: lookups: hit:25 missed:63 lost:0 flows: 0 masks: hit:282 total:0 hit/pkt:3.20 cache: hit:4 hit-rate:4.54% caches: masks-cache: size:256 port 0: ovs-system (internal) port 1: br-int (internal) port 2: genev_sys_6081 (geneve: packet_type=ptap) port 3: br-ex (internal) port 4: eth2 port 5: sw0p1 (internal) port 6: sw0p3 (internal) A specific cache can be configured as follows: $ ovs-appctl dpctl/cache-set-size DP CACHE SIZE $ ovs-dpctl cache-set-size DP CACHE SIZE For example, to disable the cache: $ ovs-dpctl cache-set-size system@ovs-system masks-cache 0 Setting cache size successful, new size 0. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Paolo Valerio <pvalerio@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
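A sketch of how a script might read the cache sizes from the show output and build a cache-set-size invocation. It assumes the show output prints each cache as "<name>: size:<n>" on its own indented line (the commit message shows it flattened onto one line), and the helper names are hypothetical. The argv is only built, not executed.

```python
import re

# Assumed multi-line layout of the relevant part of "ovs-dpctl show".
SAMPLE = """system@ovs-system:
  lookups: hit:25 missed:63 lost:0
  caches:
    masks-cache: size:256
  port 0: ovs-system (internal)
"""


def parse_cache_sizes(show_output):
    """Return {cache_name: size} from the 'caches:' section."""
    pattern = re.compile(r"^\s*([\w-]+): size:(\d+)\s*$", re.MULTILINE)
    return {name: int(size) for name, size in pattern.findall(show_output)}


def cache_set_size_argv(dp, cache, size):
    """Build argv for 'ovs-dpctl cache-set-size DP CACHE SIZE'."""
    if not isinstance(size, int) or size < 0:
        raise ValueError("cache size must be a non-negative integer")
    return ["ovs-dpctl", "cache-set-size", dp, cache, str(size)]
```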
*	python: Replace pyOpenSSL with ssl.	Timothy Redaelli	2021-11-03	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, pyOpenSSL is half-deprecated upstream and has therefore been removed from some distributions (for example CentOS Stream 9, https://issues.redhat.com/browse/CS-336). Since OVS only supports Python 3, pyOpenSSL can be replaced with the ssl module included in base Python 3 ("import ssl"). Stream recv and send had to be split into _recv and _send, since SSLError is a subclass of socket.error, so it was not possible to catch SSLWantReadError and SSLWantWriteError in the recv and send of SSLStream. TCPStream._open cannot be used in SSLStream, since the Python ssl module requires the SSL socket to be created before connecting it, so SSLStream._open needs to create the socket, create the SSL socket and then connect the SSL socket. Reported-by: Timothy Redaelli <tredaelli@redhat.com> Reported-at: https://bugzilla.redhat.com/1988429 Signed-off-by: Timothy Redaelli <tredaelli@redhat.com> Acked-by: Terry Wilson <twilson@redhat.com> Tested-by: Terry Wilson <twilson@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
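A minimal sketch of why recv was split, using simplified stand-in classes rather than the real code in ovs/stream.py. Because ssl.SSLWantReadError is an OSError subclass, a generic "except OSError" in the base recv would swallow it and report its SSL error code (SSL_ERROR_WANT_READ) as if it were an errno. Overriding a separate _recv lets the SSL subclass translate it to EAGAIN first.

```python
import errno
import ssl


class Stream:
    """Simplified stand-in: recv() maps socket errors to errno values."""

    def __init__(self, sock):
        self.socket = sock

    def _recv(self, n):
        return self.socket.recv(n)

    def recv(self, n):
        try:
            return (0, self._recv(n))
        except OSError as e:  # socket.error is an alias of OSError
            return (e.errno or errno.EIO, b"")


class SSLStream(Stream):
    def _recv(self, n):
        try:
            return self.socket.recv(n)
        except (ssl.SSLWantReadError, ssl.SSLWantWriteError):
            # Re-raise as a plain "try again" error before the generic
            # OSError handler in Stream.recv() sees the SSL error code.
            raise BlockingIOError(errno.EAGAIN, "SSL operation would block")
```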
*	netdev-offload-dpdk: Don't ignore frags as they are handled.	Eli Britstein	2021-09-16	1	-0/+2
\| \| \| \| \| \| \|	Signed-off-by: Eli Britstein <elibr@nvidia.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Tested-by: Emma Finn <emma.finn@intel.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	Set release date for 2.16.0.	Ilya Maximets	2021-08-17	1	-1/+1
\| \| \| \| \| \|	Acked-by: Numan Siddique <numans@ovn.org> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	dpdk: Stop configuring socket-limit with the value of socket-mem.	Rosemarie O'Riorden	2021-07-26	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change removes the automatic memory limit on start-up of OVS with DPDK. As DPDK supports dynamic memory allocation, there is no need to limit the amount of memory available, if not requested. Currently, if socket-limit is not configured, it is set to the value of socket-mem. With this change, the user can decide to set it or have no memory limit. Removed logs that announce this change and fixed documentation. Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=1949850 Signed-off-by: Rosemarie O'Riorden <roriorde@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	dpdk: Remove default values for socket-mem and limit.	Rosemarie O'Riorden	2021-07-26	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change removes the default values for EAL args socket-mem and socket-limit. As DPDK supports dynamic memory allocation, there is no need to allocate a certain amount of memory on start-up, nor limit the amount of memory available, if not requested. Currently, socket-mem has a default value of 1024 when it is not configured by the user, and socket-limit takes on the value of socket-mem, 1024, by default. With this change, socket-mem is not configured by default, meaning that socket-limit is not either. Neither, either or both options can be set. Removed extra logs that announce this change and fixed documentation. Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=1949850 Signed-off-by: Rosemarie O'Riorden <roriorde@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
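The before/after behaviour of this change and the related socket-limit change can be sketched as a function that builds DPDK EAL memory arguments. It assumes a single NUMA node (so the legacy default is the plain "1024" quoted above) and uses a hypothetical function name; --socket-mem and --socket-limit are the DPDK EAL options the OVS settings map to.

```python
def eal_memory_args(socket_mem=None, socket_limit=None, legacy_defaults=False):
    """Hypothetical builder for DPDK EAL memory arguments."""
    if legacy_defaults:
        # Old behaviour: socket-mem defaulted to 1024 and socket-limit
        # took the value of socket-mem when it was not set.
        if socket_mem is None:
            socket_mem = "1024"
        if socket_limit is None:
            socket_limit = socket_mem
    # New behaviour: pass only what the user configured and let DPDK
    # allocate memory dynamically otherwise.
    args = []
    if socket_mem is not None:
        args += ["--socket-mem", socket_mem]
    if socket_limit is not None:
        args += ["--socket-limit", socket_limit]
    return args
```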
*	Prepare for post-2.16.0 (2.16.90).	Ilya Maximets	2021-07-16	1	-0/+4
\| \| \| \| \|	Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	Prepare for 2.16.0.	Ilya Maximets	2021-07-16	1	-1/+1
\| \| \| \| \|	Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	dpif-netlink: Introduce per-cpu upcall dispatch.	Mark Gray	2021-07-16	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The Open vSwitch kernel module uses the upcall mechanism to send packets from kernel space to user space when it misses in the kernel space flow table. The upcall sends packets via a Netlink socket. Currently, a Netlink socket is created for every vport. In this way, there is a 1:1 mapping between a vport and a Netlink socket. When a packet is received by a vport, if it needs to be sent to user space, it is sent via the corresponding Netlink socket. This mechanism, with various iterations of the corresponding user space code, has seen some limitations and issues: * On systems with a large number of vports, there is correspondingly a large number of Netlink sockets which can limit scaling. (https://bugzilla.redhat.com/show_bug.cgi?id=1526306) * Packet reordering on upcalls. (https://bugzilla.redhat.com/show_bug.cgi?id=1844576) * A thundering herd issue. (https://bugzilla.redhat.com/show_bug.cgi?id=1834444) This patch introduces an alternative, feature-negotiated, upcall mode using a per-cpu dispatch rather than a per-vport dispatch. In this mode, the Netlink socket to be used for the upcall is selected based on the CPU of the thread that is executing the upcall. In this way, it resolves the issues above as: a) The number of Netlink sockets scales with the number of CPUs rather than the number of vports. b) Ordering per-flow is maintained as packets are distributed to CPUs based on mechanisms such as RSS and flows are distributed to a single user space thread. c) Packets from a flow can only wake up one user space thread. Reported-at: https://bugzilla.redhat.com/1844576 Signed-off-by: Mark Gray <mark.d.gray@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
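A toy model of the two dispatch modes, assuming made-up Netlink port ids and hypothetical function names. It shows the socket count scaling with vports in the old mode and with CPUs in the new one, and that an RSS-style hash keeps every packet of a flow on one CPU, and therefore on one socket. Wrapping around with a modulo when there are more CPUs than sockets is an assumption.

```python
def per_vport_sockets(n_vports):
    """Old mode: one Netlink socket (identified by a port id) per vport."""
    return {vport: 1000 + vport for vport in range(n_vports)}


def per_cpu_sockets(n_cpus):
    """New mode: one Netlink socket per CPU, independent of vport count."""
    return [2000 + cpu for cpu in range(n_cpus)]


def upcall_socket(cpu_sockets, cpu):
    """Pick the socket for an upcall executed on `cpu`."""
    # Assumption: wrap around if there are more CPUs than sockets.
    return cpu_sockets[cpu % len(cpu_sockets)]


def rss_cpu(flow_hash, n_cpus):
    """Toy RSS: every packet of a flow lands on the same CPU."""
    return flow_hash % n_cpus
```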
*	dpif-netdev: Allow pin rxq and non-isolate PMD.	Kevin Traynor	2021-07-16	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pinning an rxq to a PMD with pmd-rxq-affinity may be done for various reasons, such as reserving a full PMD for an rxq, or ensuring that multiple rxqs from a port are handled on different PMDs. Previously, pmd-rxq-affinity always isolated the PMD, so no other rxqs could be assigned to it by OVS. There may be cases where there are unused cycles on those PMDs and the user would like OVS to be able to assign other rxqs to them as well. Add an option to pin the rxq without isolating the PMD. The default behaviour is unchanged: pin and isolate the PMD. In order to pin without isolating: ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-isolate=false Note this is available only with the group assignment type, as pinning conflicts with the operation of the other rxq assignment algorithms. Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Acked-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
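A sketch of the effect of pmd-rxq-isolate on which PMDs remain candidates for non-pinned rxqs, using a hypothetical helper and plain core ids. It models only the eligibility rule described above, not the assignment itself.

```python
def eligible_pmds(pmds, pinned, isolate=True):
    """Return the PMD cores that may receive non-pinned rxqs.

    pmds: ordered list of PMD core ids.
    pinned: set of cores that have at least one rxq pinned to them.
    """
    if isolate:
        # Default (pmd-rxq-isolate=true): a PMD with a pinned rxq is
        # reserved for its pinned rxqs only.
        return [p for p in pmds if p not in pinned]
    # pmd-rxq-isolate=false: pinned PMDs stay in the assignment pool.
    return list(pmds)
```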
*	dpif-netdev: Add group rxq scheduling assignment type.	Kevin Traynor	2021-07-16	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add an rxq scheduling option that allows rxqs to be grouped on a PMD based purely on their load. The current default 'cycles' assignment sorts rxqs by measured processing load and then assigns them to a list of round-robin PMDs. This helps to keep the rxqs that require the most processing on different cores, but as it selects the PMDs in round-robin order, it distributes rxqs equally across PMDs. 'cycles' assignment has the advantage that it keeps the most loaded rxqs off the same core while still spreading rxqs across a broad range of PMDs, which mitigates against changes in traffic patterns. 'cycles' assignment has the disadvantage that, in order to trade off optimising for the current traffic load against mitigating future changes, it tries to assign an equal number of rxqs per PMD in a round-robin manner, and this can lead to a less than optimal balance of the processing load. Now that PMD auto load balance can help mitigate future changes in traffic patterns, a 'group' assignment can be used to assign rxqs based on their measured cycles and the estimated running total of the PMDs. In this case, there is no restriction about keeping an equal number of rxqs per PMD, as it is purely load based. This means that one PMD may have a group of low-load rxqs assigned to it while another PMD has one high-load rxq assigned to it, as that is the best balance of their measured loads across the PMDs. Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Acked-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
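The difference between the two assignment types can be sketched with simplified versions of each, assuming loads are plain numbers per rxq and ignoring NUMA, pinning and the exact PMD ordering OVS uses. 'cycles' deals sorted rxqs out round robin; 'group' sends each rxq to the PMD with the lowest running total.

```python
def assign_cycles(rxq_loads, n_pmds):
    """Simplified 'cycles': sort by load, then deal out round robin."""
    pmds = [[] for _ in range(n_pmds)]
    order = sorted(rxq_loads, key=rxq_loads.get, reverse=True)
    for i, rxq in enumerate(order):
        pmds[i % n_pmds].append(rxq)
    return pmds


def assign_group(rxq_loads, n_pmds):
    """Simplified 'group': each rxq goes to the least-loaded PMD so far."""
    pmds = [[] for _ in range(n_pmds)]
    totals = [0] * n_pmds
    for rxq in sorted(rxq_loads, key=rxq_loads.get, reverse=True):
        target = totals.index(min(totals))
        pmds[target].append(rxq)
        totals[target] += rxq_loads[rxq]
    return pmds


def pmd_loads(assignment, rxq_loads):
    """Estimated total load per PMD for a given assignment."""
    return [sum(rxq_loads[q] for q in pmd) for pmd in assignment]
```

With one busy rxq and three light ones on two PMDs, for example loads {"q0": 90, "q1": 10, "q2": 10, "q3": 10}, 'cycles' gives PMD loads of 100 and 20, while 'group' gives 90 and 30 by placing the three light rxqs together.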
*	ofproto-dpif: APIs and CLI option to add/delete static fdb entry.	Vasu Dasari	2021-07-16	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently there is an option to add/flush/show ARP/ND neighbors. This covers the L3 side. For the L2 side, there is only the fdb show command. This commit adds an option to add/del an fdb entry via ovs-appctl. The CLI commands look like: To add: ovs-appctl fdb/add <bridge> <port> <vlan> <Mac> ovs-appctl fdb/add br0 p1 0 50:54:00:00:00:05 To del: ovs-appctl fdb/del <bridge> <vlan> <Mac> ovs-appctl fdb/del br0 0 50:54:00:00:00:05 Added two new APIs to provide a convenient interface to add and delete static MACs: bool xlate_add_static_mac_entry(const struct ofproto_dpif *, ofp_port_t in_port, struct eth_addr dl_src, int vlan); bool xlate_delete_static_mac_entry(const struct ofproto_dpif *, struct eth_addr dl_src, int vlan); 1. Static entries should not age. To indicate that the entry being programmed is a static entry, the 'expires' field in 'struct mac_entry' is set to MAC_ENTRY_AGE_STATIC_ENTRY. A check for this value is made while deleting MAC entries as part of the regular aging process. 2. Another change to the MAC-update logic: when a packet with the same dl_src as that of a static MAC entry arrives on any port, the logic will not modify the expires field. 3. While flushing fdb entries, made sure static ones are not evicted. 4. Updated "ovs-appctl fdb/stats-show br0" to display the number of static entries in the switch. Added the following tests: ofproto-dpif - static-mac add/del/flush ofproto-dpif - static-mac mac moves Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048894.html Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1597752 Signed-off-by: Vasu Dasari <vdasari@gmail.com> Tested-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
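The four rules above can be sketched as a toy MAC table. STATIC_ENTRY plays the role of MAC_ENTRY_AGE_STATIC_ENTRY, and the class and method names are hypothetical. Leaving a static entry's port unchanged when learning sees the same MAC elsewhere is an assumption; the commit only states that the expires field is not modified.

```python
STATIC_ENTRY = None  # stand-in for MAC_ENTRY_AGE_STATIC_ENTRY


class MacTable:
    """Toy fdb keyed by (vlan, mac); values are [port, expires]."""

    def __init__(self, idle_time):
        self.idle_time = idle_time
        self.entries = {}

    def learn(self, vlan, mac, port, now):
        entry = self.entries.get((vlan, mac))
        if entry is not None and entry[1] is STATIC_ENTRY:
            return  # rule 2: learning never touches a static entry
        self.entries[(vlan, mac)] = [port, now + self.idle_time]

    def add_static(self, vlan, mac, port):
        self.entries[(vlan, mac)] = [port, STATIC_ENTRY]

    def delete(self, vlan, mac):
        return self.entries.pop((vlan, mac), None) is not None

    def age(self, now):
        # Rule 1: static entries are skipped by the aging pass.
        expired = [key for key, (_, expires) in self.entries.items()
                   if expires is not STATIC_ENTRY and expires <= now]
        for key in expired:
            del self.entries[key]

    def flush(self):
        # Rule 3: flushing keeps static entries.
        self.entries = {k: v for k, v in self.entries.items()
                        if v[1] is STATIC_ENTRY}

    def n_static(self):
        # Rule 4: count shown by fdb/stats-show.
        return sum(1 for _, exp in self.entries.values() if exp is STATIC_ENTRY)
```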
*	dpdk: Logs to announce removal of defaults for socket-mem and limit.	Rosemarie O'Riorden	2021-07-16	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \|	Deprecate current OVS provided defaults for DPDK socket-mem and socket-limit that are planned to be removed in OVS 2.17. At that point DPDK defaults will be used instead. Warnings have been added to alert users in advance. Signed-off-by: Rosemarie O'Riorden <roriorde@redhat.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
*	python: Add cooperative_yield() API method to Idl.	Terry Wilson	2021-07-16	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \|	When using eventlet monkey_patch()'d code, greenthreads can be blocked on connection for several seconds while the database contents are parsed. Eventlet recommends adding a sleep(0) call to cooperatively yield in cpu-bound code. asyncio code has asyncio.sleep(0). This patch adds an API method that defaults to doing nothing, but can be overridden to yield as needed. Signed-off-by: Terry Wilson <twilson@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
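A sketch of the hook pattern, using a hypothetical stand-in class rather than ovs.db.idl.Idl. The base hook does nothing; a subclass overrides it. Where the real Idl calls the hook is not shown in the commit message, so yielding once every 100 parsed rows is an assumption made for illustration.

```python
class Idl:
    """Hypothetical stand-in: only the cooperative_yield hook is modelled."""

    def cooperative_yield(self):
        # Default: do nothing, so code without green threads pays no cost.
        pass

    def parse_rows(self, rows):
        parsed = []
        for i, row in enumerate(rows):
            parsed.append(dict(row))  # stand-in for CPU-bound parsing
            if i % 100 == 99:
                self.cooperative_yield()
        return parsed


class CountingIdl(Idl):
    """Override that records yields; an eventlet app could call
    eventlet.sleep(0) here, or an asyncio-based one schedule
    asyncio.sleep(0)."""

    def __init__(self):
        self.yields = 0

    def cooperative_yield(self):
        self.yields += 1
```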
*	dpif-netdev/mfex: Add more AVX512 traffic profiles	Harry van Haaren	2021-07-16	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit adds 3 new traffic profile implementations to the existing avx512 miniflow extract infrastructure. The profiles added are: - Ether()/IP()/TCP() - Ether()/Dot1Q()/IP()/UDP() - Ether()/Dot1Q()/IP()/TCP() The design of the avx512 code here is for scalability to add more traffic profiles, as well as to enable new CPU ISAs. Note that an implementation is primarily adding static const data, which the compiler then specializes away when the profile specific function is declared below. As a result, the code is relatively maintainable, and scalable for new traffic profiles as well as new ISA, and does not lower performance compared with manually written code for each profile/ISA. Note that confidence in the correctness of each implementation is achieved through autovalidation, unit tests with known packets, and fuzz tested packets. Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
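The "profile as static data" idea can be sketched as mask/value pairs over the first bytes of a packet, checked by one generic matcher. This is an illustration in plain Python, not the AVX512 code. It assumes IPv4 without options, so the IP protocol byte sits at offset 23 (untagged) or 27 (one 802.1Q tag). Including Ether()/IP()/UDP() as the pre-existing profile is also an assumption.

```python
PROFILE_LEN = 32


def make_profile(fields):
    """Build (mask, value) bytes from {offset: expected_byte}."""
    mask, value = bytearray(PROFILE_LEN), bytearray(PROFILE_LEN)
    for off, byte in fields.items():
        mask[off], value[off] = 0xFF, byte
    return bytes(mask), bytes(value)


# Ethertype at 12-13 (0x0800 IPv4, 0x8100 VLAN); IP proto 17=UDP, 6=TCP.
PROFILES = {
    "ip_udp": make_profile({12: 0x08, 13: 0x00, 23: 17}),
    "ip_tcp": make_profile({12: 0x08, 13: 0x00, 23: 6}),
    "dot1q_ip_udp": make_profile({12: 0x81, 13: 0x00, 16: 0x08, 17: 0x00, 27: 17}),
    "dot1q_ip_tcp": make_profile({12: 0x81, 13: 0x00, 16: 0x08, 17: 0x00, 27: 6}),
}


def match_profile(pkt):
    """Return the first profile whose masked bytes match, else None."""
    head = pkt[:PROFILE_LEN].ljust(PROFILE_LEN, b"\0")
    for name, (mask, value) in PROFILES.items():
        if all((b & m) == v for b, m, v in zip(head, mask, value)):
            return name
    return None  # caller would fall back to the scalar implementation
```

Adding a profile means adding one table entry; the matcher itself does not change, which mirrors the maintainability argument in the commit.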
*	dpdk: Add additional CPU ISA detection strings	Harry van Haaren	2021-07-16	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit enables OVS to check at runtime for more detailed AVX512 capabilities, specifically the Byte and Word (BW) extensions and the Vector Bit Manipulation Instructions (VBMI). These instructions will be used in the CPU ISA optimized implementations of traffic-profile-aware miniflow extract. Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
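A sketch of the same kind of runtime check from a script. OVS performs the check through DPDK's CPU flag detection; this swapped-in approach reads the Linux /proc/cpuinfo "flags" line instead. Requiring avx512f alongside avx512bw and avx512vbmi, and the helper names, are assumptions.

```python
# Assumed requirement set for the optimized miniflow extract paths.
REQUIRED_FOR_MFEX = ("avx512f", "avx512bw", "avx512vbmi")


def cpu_flags(cpuinfo_text):
    """Return the flag set from the first 'flags' line of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def mfex_avx512_usable(flags):
    return all(flag in flags for flag in REQUIRED_FOR_MFEX)


# Usage on Linux:
#   with open("/proc/cpuinfo") as f:
#       print(mfex_avx512_usable(cpu_flags(f.read())))
```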
*	dpif-netdev: Add configure to enable autovalidator at build time.	Kumar Amber	2021-07-16	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit adds a new configure option that allows the user to enable the autovalidator by default at build time, so that unit tests run with it by default. $ ./configure --enable-mfex-default-autovalidator Signed-off-by: Kumar Amber <kumar.amber@intel.com> Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
*	dpif-netdev: Add study function to select the best mfex function	Kumar Amber	2021-07-16	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The study function runs all the available implementations of miniflow_extract, chooses the one whose hitmask has the most hits, and sets the mfex to that function. Study can be run at runtime using the following command: $ ovs-appctl dpif-netdev/miniflow-parser-set study Signed-off-by: Kumar Amber <kumar.amber@intel.com> Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
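The selection step can be sketched as "run every candidate on the same batch and keep the one whose hitmask has the most set bits". Each implementation is assumed to be a callable returning an integer hitmask with one bit per parsed packet. Falling back to a scalar implementation when nothing hits is an assumption, and the real study function's sample-size handling is not modelled.

```python
def study(implementations, packets, fallback="scalar"):
    """Return the name of the implementation with the most hits."""
    best_name, best_hits = fallback, 0
    for name, impl in implementations.items():
        hitmask = impl(packets)            # one bit per parsed packet
        hits = bin(hitmask).count("1")
        if hits > best_hits:               # first one wins on ties
            best_name, best_hits = name, hits
    return best_name
```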
*	dpif-netdev: Add auto validation function for miniflow extract	Kumar Amber	2021-07-16	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch introduces the auto-validation function, which allows users to compare the batch of packets obtained from different miniflow implementations against the linear miniflow extract and return a hitmask. The autovalidator function can be triggered at runtime using the following command: $ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator Signed-off-by: Kumar Amber <kumar.amber@intel.com> Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
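A sketch of the comparison the autovalidator performs, with hypothetical signatures. The scalar reference maps one packet to its miniflow, and each candidate returns a hitmask plus the miniflows it produced. Only packets a candidate claims to have parsed are checked. The returned hitmask mentioned in the commit is not modelled here.

```python
def autovalidate(packets, reference, candidates):
    """Compare candidate miniflows against the scalar reference.

    reference(pkt) -> miniflow (any comparable value)
    candidate(pkts) -> (hitmask, {packet_index: miniflow})
    Returns {candidate_name: [indexes of mismatching packets]}.
    """
    expected = [reference(p) for p in packets]
    failures = {}
    for name, impl in candidates.items():
        hitmask, flows = impl(packets)
        failures[name] = [i for i in range(len(packets))
                          if (hitmask >> i) & 1 and flows.get(i) != expected[i]]
    return failures
```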
*	dpif-netdev: Add command line and function pointer for miniflow extract	Kumar Amber	2021-07-16	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch introduces the MFEX function pointers, which allow the user to switch between different miniflow extract implementations provided by OVS and optimized for specific CPU ISAs. The user can query the miniflow extract variants available on that CPU with the following command: $ ovs-appctl dpif-netdev/miniflow-parser-get Similarly, a user can set the miniflow implementation with the following command: $ ovs-appctl dpif-netdev/miniflow-parser-set name This gives the user more performance and the flexibility to choose the miniflow implementation according to their needs. Signed-off-by: Kumar Amber <kumar.amber@intel.com> Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
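The get/set mechanism can be modelled as a registry that swaps one active function reference, a Python stand-in for the C function pointer. The class, method and implementation names are hypothetical, and the "available" flag stands in for the CPU ISA check.

```python
class MfexRegistry:
    """Hypothetical model of switching miniflow-extract implementations."""

    def __init__(self):
        self._impls = {}
        self.active = None

    def register(self, name, func, available=True):
        self._impls[name] = (func, available)

    def get(self):
        # Like 'miniflow-parser-get': list implementations usable here.
        return [name for name, (_, ok) in self._impls.items() if ok]

    def set(self, name):
        # Like 'miniflow-parser-set NAME': swap the active function.
        func, ok = self._impls.get(name, (None, False))
        if not ok:
            raise ValueError(f"implementation {name!r} is not available")
        self.active = func
```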
*	docs: Add documentation for ovsdb relay mode.	Ilya Maximets	2021-07-15	1	-0/+3
\| \| \| \| \| \| \| \| \|	Main documentation for the service model and tutorial with the use case and configuration examples. Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	dpcls-avx512: Enable avx512 vector popcount instruction.	Harry van Haaren	2021-07-09	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit enables the AVX512-VPOPCNTDQ Vector Popcount instruction. This instruction is not available on every CPU that supports the AVX512-F Foundation ISA, hence it is enabled only when the additional VPOPCNTDQ ISA check is passed. The vector popcount instruction is used instead of the AVX512 popcount emulation code present in the avx512 optimized DPCLS today. It provides higher performance in the SIMD miniflow processing as that requires the popcount to calculate the miniflow block indexes. Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
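The popcount-based index computation can be sketched in scalar form. A miniflow stores only the present 64-bit blocks, packed in order, so the position of a present block is the number of set map bits below it. The helper names are hypothetical; the vector instruction (VPOPCNTQ) computes many such counts at once instead of one by one.

```python
def popcount(x):
    return bin(x).count("1")


def miniflow_block_index(map_bits, bit):
    """Index of `bit`'s block in the packed miniflow value array."""
    if not (map_bits >> bit) & 1:
        raise KeyError(f"bit {bit} is not present in the miniflow map")
    # Count the present blocks that come before this one.
    return popcount(map_bits & ((1 << bit) - 1))


def block_indexes(map_bits, bits):
    # The AVX512 path does this per lane in one vector popcount.
    return [miniflow_block_index(map_bits, b) for b in bits]
```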