delta/openvswitch.git - github.com: openvswitch/ovs.git

	Commit message	Author	Age	Files	Lines
*	Eliminate "whitelist" and "blacklist" terms.	Ben Pfaff	2020-10-16	4	-9/+10
\| \| \| \| \| \| \| \|	There is one remaining use under datapath. That change should happen upstream in Linux first according to our usual policy. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
*	Use primary/secondary, not master/slave, as names for OpenFlow roles.	Ben Pfaff	2020-10-16	3	-97/+98
\| \| \| \| \|	Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
*	ovsdb-idl.at: Queue for termination all OVSDB IDL pids.	Alin Gabriel Serdean	2020-10-08	1	-1/+1
\| \| \| \| \| \| \| \| \|	When running OVSDB cluster tests on Windows not all the ovsdb processes are terminated. Queue up the pids of the started processes for termination when the test stops. Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	system-userspace-packet-type-aware.at: Wait for ip address updates.	Ilya Maximets	2020-10-08	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ovs-router module checks for the source ip address of the interface while adding a new route. netdev module doesn't request ip addresses from the system every time, but instead it caches currently assigned ip addresses and updates the cache on netlink notifications if needed. So, there is a slight delay between setting an ip address on an interface in the system and the moment OVS updates the list of ip addresses of this interface. If route addition happens within this time frame, it fails with the following error: # ovs-appctl ovs/route/add 10.0.0.0/24 br-p1 Error while inserting route. ovs-appctl: ovs-vswitchd: server returned an error This makes system tests fail frequently. Let's wait until the local route is successfully added. This will mean that OVS finished processing of a netlink event and will use an up-to-date list of ip addresses on the desired interface. Fixes: 526cf4e1d6a8 ("tests: Added unit tests in packet-type-aware.at") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Aaron Conole <aconole@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org>
*	windows, tests: Strip EOL characters when passing them to tasklist	Alin Gabriel Serdean	2020-09-24	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	When running OVSDB cluster tests on Windows not all the ovsdb processes are terminated. Strip carriage return and newline of the arguments passed to the kill command because they will cause problems when passing them to tasklist and taskkill. Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org> Acked-by: Ilya Maximets <i.maximets@ovn.org>
*	ovsdb-idl.at: Wait all servers to join the cluster.	Flavio Leitner	2020-09-09	1	-21/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The test 'Check Python IDL reconnects to leader - Python3 (leader only)' fails sometimes when the first ovsdb-server gets killed before the others had joined the cluster. Fix the function ovsdb_cluster_start_idltest to wait for them to join the cluster. Fixes: c39751e44539 ("python: Monitor Database table to manage lifecycle of IDL client.") Co-authored-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	python: Fixup python shebangs to python3.	Greg Rose	2020-08-26	4	-6/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Builds on RHEL 8.2 systems are failing due to this issue. See [1] as to why this is necessary. I used the following command to identify files that need this fix: find . -type f -executable \| /usr/lib/rpm/redhat/brp-mangle-shebangs I also updated the copyright notices as needed. 1. https://fedoraproject.org/wiki/Changes/Make_ambiguous_python_shebangs_error Fixes: 1ca0323e7c29 ("Require Python 3 and remove support for Python 2.") Signed-off-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	test-conntrack: Fix conntrack benchmark by clearing conntrack metadata.	Ilya Maximets	2020-08-26	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Packets in the benchmark must be treated as new packets, i.e. they should not have conntrack metadata set. Current code will set up 'pkt->md.conn' after the first run and all subsequent calls will hit the 'fast' processing that is intended for recirculated packets making a false impression that current conntrack implementation is lightning fast. Before the change: $ ./ovstest test-conntrack benchmark 4 33554432 32 1 conntrack: 1059 ms After (correct): $ ./ovstest test-conntrack benchmark 4 33554432 32 1 conntrack: 92785 ms Fixes: 594570ea1cde ("conntrack: Optimize recirculations.") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Aaron Conole <aconole@redhat.com>
*	connmgr: Support changing openflow versions without restarting.	Aaron Conole	2020-08-17	1	-0/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When commit a0baa7dfa4fe ("connmgr: Make treatment of active and passive connections more uniform") was applied, it did not take into account that a reconfiguration of the allowed_versions setting would require a reload of the ofservice object (only accomplished via a restart of OvS). For now, during the reconfigure cycle, we delete the ofservice object and then recreate it immediately. A new test is added to ensure we do not break this behavior again. Fixes: a0baa7dfa4fe ("connmgr: Make treatment of active and passive connections more uniform") Suggested-by: Ben Pfaff <blp@ovn.org> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1782834 Signed-off-by: Aaron Conole <aconole@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Numan Siddique <numans@ovn.org> Tested-by: Numan Siddique <numans@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	ovsdb-server: Replace in-memory DB contents at raft install_snapshot.	Dumitru Ceara	2020-08-06	3	-3/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Every time a follower has to install a snapshot received from the leader, it should also replace the data in memory. Right now this only happens when snapshots are installed that also change the schema. This can lead to inconsistent DB data on follower nodes and the snapshot may fail to get applied. Fixes: bda1f6b60588 ("ovsdb-server: Don't disconnect clients after raft install_snapshot.") Acked-by: Han Zhou <hzhou@ovn.org> Signed-off-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	odp-util: Fix clearing match mask if set action is partially unnecessary.	Ilya Maximets	2020-07-29	1	-0/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While committing set() actions, commit() could wildcard all the fields that are same in match key and in the set action. This leads to situation where mask after commit could actually contain less bits than it was before. And if set action was partially committed, all the fields that were the same will be cleared out from the matching key resulting in the incorrect (too wide) flow. For example, for the flow that matches on both src and dst mac addresses, if the dst mac is the same and only src should be changed by the set() action, destination address will be wildcarded in the match key and will never be matched, i.e. flows with any destination mac will match, which is not correct. Setting OF rule: in_port=1,dl_src=50:54:00:00:00:09 actions=mod_dl_dst(50:54:00:00:00:0a),output(2) Sending following packets on port 1: 1. eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),eth_type(0x0800) 2. eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0c),eth_type(0x0800) 3. eth(src=50:54:00:00:00:0b,dst=50:54:00:00:00:0c),eth_type(0x0800) Resulted datapath flows: eth(dst=50:54:00:00:00:0c),<...>, actions:set(eth(dst=50:54:00:00:00:0a)),2 eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),<...>, actions:2 The first flow doesn't have any match on source MAC address and the third packet successfully matched on it while it must be dropped. Fix that by updating the match mask with only the new bits set by commit(), but keeping those that were cleared (OR operation). 
With fix applied, resulted correct flows are: eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),<...>, actions:2 eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0c),<...>, actions:set(eth(dst=50:54:00:00:00:0a)),2 eth(src=50:54:00:00:00:0b),<...>, actions:drop The code before commit dbf4a92800d0 was not able to reduce the mask, it was only possible to expand it to exact match, so it was OK to update original matching mask with the new value in all cases. Fixes: dbf4a92800d0 ("odp-util: Do not rewrite fields with the same values as matched") Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1854376 Acked-by: Eli Britstein <elibr@mellanox.com> Tested-by: Adrián Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
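The mask update described in this commit can be modelled in a few lines. The following is only an illustrative Python sketch, not the odp-util C code: masks are plain integers, and `SRC_MAC`, `DST_MAC` and `merge_mask` are hypothetical names introduced here.

```python
# Illustrative model only: a match mask as an integer, one bit group per field.
SRC_MAC = 0xFFFFFFFFFFFF << 48   # bits covering the Ethernet source address
DST_MAC = 0xFFFFFFFFFFFF         # bits covering the Ethernet destination address


def merge_mask(old_mask: int, committed_mask: int) -> int:
    """Keep every bit that was already matched and add the bits set by commit()."""
    return old_mask | committed_mask


# The flow matched on both MAC addresses before the set() action was committed.
old_mask = SRC_MAC | DST_MAC
# commit() wildcarded the destination because its value was unchanged.
committed_mask = SRC_MAC

overwritten = committed_mask                   # old behaviour: dst match is lost
merged = merge_mask(old_mask, committed_mask)  # fixed behaviour: dst match is kept
```

With plain overwriting the destination bits disappear from the mask, which corresponds to the too-wide datapath flow shown in the commit message; the OR keeps them.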
*	bfd: Support overlay BFD	Yifeng Sun	2020-07-27	1	-0/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Current OVS intercepts and processes all BFD packets, thus VM-2-VM BFD packets get lost and the recipient VM never sees them. This patch fixes it by only intercepting and processing BFD packets destined to a configured BFD instance, and other BFD packets are made available to the OVS flow table for forwarding. This patch keeps BFD's backward compatibility. VMware-BZ: #2579326 Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com>
*	tests: Refactor the iptables accept rule.	William Tu	2020-07-27	2	-10/+9
\| \| \| \| \| \| \| \| \|	Certain Linux distributions, like CentOS, have default iptable rules to reject input traffic from br-underlay. Refactor by creating a macro 'IPTABLES_ACCEPT([bridge])' for adding the accept rule to the iptable input chain. Signed-off-by: William Tu <u9012063@gmail.com>
*	dpif-netdev.at: Wait for miss upcall log.	Ilya Maximets	2020-07-23	1	-4/+8
\| \| \| \| \| \| \| \| \| \| \| \| \|	Some tests check for the 'miss upcall' log in a log file immediately after sending the packet; this causes test failures when running them under valgrind or on an overloaded system. Fix that by waiting for the actual string to appear in the log file. Some other tests use 'sleep 1' to fix that, but it's better to wait for an event than to sleep for a specific amount of time. Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: William Tu <u9012063@gmail.com>
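The "wait for the event instead of sleeping" idea can be sketched as a small polling helper. This is a generic Python illustration, not the Autotest macro used by the OVS testsuite; `wait_until` and its parameters are names assumed here.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool],
               timeout: float = 10.0, interval: float = 0.1) -> bool:
    """Poll predicate() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example: the log "line" shows up only on the third poll.
polls = {"count": 0}


def log_contains_miss_upcall() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3


found = wait_until(log_contains_miss_upcall, timeout=2.0, interval=0.01)
```

A fast system returns after the first successful poll, while a slow one (for example under valgrind) gets the whole timeout, which a fixed 'sleep 1' cannot offer.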
*	ovs-router: Fix flushing of local routes.	Ilya Maximets	2020-07-21	3	-1/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since commit 8e4e45887ec3, priority of 'local' route entries no longer matches with 'plen'. This should be taken into account while flushing cached routes, otherwise they will remain in OVS even after removing them from the system: # ifconfig eth0 11.0.0.1 # ovs-appctl ovs/route/show --- A new route synchronized from kernel route table --- Cached: 11.0.0.1/32 dev eth0 SRC 11.0.0.1 local # ifconfig eth0 0 # ovs-appctl ovs/route/show -- the new route entry is still in ovs route table --- Cached: 11.0.0.1/32 dev eth0 SRC 11.0.0.1 local CC: wenxu <wenxu@ucloud.cn> Fixes: 8e4e45887ec3 ("ofproto-dpif-xlate: makes OVS native tunneling honor tunnel-specified source addresses") Reported-by: Zheng Jingzhou <glovejmm@163.com> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-July/373093.html Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: William Tu <u9012063@gmail.com>
*	bond: Add 'primary' interface concept for active-backup mode.	Jeff Squyres	2020-07-17	2	-14/+246
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In AB bonding, if the current active slave becomes disabled, a replacement slave is arbitrarily picked from the remaining set of enabled slaves. This commit adds the concept of a "primary" slave: an interface that will always be (or become) the current active slave if it is enabled. The rationale for this functionality is to allow the designation of a preferred interface for a given bond. For example: 1. Bond is created with interfaces p1 (primary) and p2, both enabled. 2. p1 becomes the current active slave (because it was designated as the primary). 3. Later, p1 fails/becomes disabled. 4. p2 is chosen to become the current active slave. 5. Later, p1 becomes re-enabled. 6. p1 is chosen to become the current active slave (because it was designated as the primary) Note that p1 becomes the active slave once it becomes re-enabled, even if nothing has happened to p2. This "primary" concept exists in Linux kernel network interface bonding, but did not previously exist in OVS bonding. Only one primary slave interface is supported per bond, and is only supported for active/backup bonding. The primary slave interface is designated via "other_config:bond-primary" when creating a bond. Also, while adding tests for the "primary" concept, make a few small improvements to the non-primary AB bonding test. Signed-off-by: Jeff Squyres <jsquyres@cisco.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
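The six-step example above can be traced with a toy selection function. This is a minimal Python sketch of the described behaviour, not the code in the OVS bond module; `pick_active` is a hypothetical helper, and picking the first enabled interface stands in for the "arbitrary" replacement choice.

```python
from typing import Dict, Optional


def pick_active(enabled: Dict[str, bool], current: Optional[str],
                primary: Optional[str]) -> Optional[str]:
    """Return the interface that should be active for an active-backup bond."""
    # The primary always wins whenever it is enabled.
    if primary is not None and enabled.get(primary, False):
        return primary
    # Otherwise keep the current active interface while it stays enabled.
    if current is not None and enabled.get(current, False):
        return current
    # Otherwise pick any enabled interface (first one found, as an assumption).
    for name, up in enabled.items():
        if up:
            return name
    return None


state = {"p1": True, "p2": True}
step2 = pick_active(state, None, "p1")     # p1 is primary and enabled
state["p1"] = False
step4 = pick_active(state, step2, "p1")    # p1 failed, p2 takes over
state["p1"] = True
step6 = pick_active(state, step4, "p1")    # p1 is back and preempts p2
```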
*	dpif-netdev: Don't use zero flow mark.	Eli Britstein	2020-07-08	1	-9/+9
\| \| \| \| \| \| \| \| \| \| \| \|	Zero flow mark is used to indicate the HW to remove the mark. A packet marked with zero mark is received in SW without a mark at all, so it cannot be used as a valid mark. Change the pool range to fix it. Fixes: 241bad15d99a ("dpif-netdev: associate flow with a mark id") Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roni Bar Yanai <roniba@mellanox.com> Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
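The effect of changing the pool range can be shown with a toy allocator. This Python sketch is an assumption-laden illustration, not the id-pool code in OVS; `MarkPool` and `max_mark` are hypothetical names.

```python
from typing import List, Optional


class MarkPool:
    """Toy flow-mark allocator whose range starts at 1, so 0 is never handed out."""

    def __init__(self, max_mark: int) -> None:
        # Stored in descending order so that pop() returns the lowest free mark.
        self._free: List[int] = list(range(max_mark, 0, -1))

    def alloc(self) -> Optional[int]:
        return self._free.pop() if self._free else None

    def free(self, mark: int) -> None:
        self._free.append(mark)


pool = MarkPool(3)
marks = [pool.alloc() for _ in range(4)]   # the fourth request exhausts the pool
```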
*	dpif-netdev: Add mega ufid in flow add/del log.	Eli Britstein	2020-07-08	2	-1/+4
\| \| \| \| \| \| \| \| \|	As offload is done using the mega ufid of a flow, for better debuggability, add it in the log message. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roni Bar Yanai <roniba@mellanox.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	ofproto: Delete buckets when lb_output is false.	Adrian Moreno	2020-07-07	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \|	When lb-output-action is toggled back to "false" buckets are not being deleted. Delete them as they will no longer be used. Add unit test to verify buckets are correctly deleted. Cc: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	bridge: Fix null dereference on ct_timeout_policy record	Yi-Hung Wei	2020-06-27	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	According to vswitch.ovsschema, each CT_Zone record may have zero or one associated CT_Timeout_policy. Thus, this patch checks whether ovsrec_ct_timeout_policy exists before accessing the record. VMWare-BZ: 2585825 Fixes: 45339539f69d ("ovs-vsctl: Add conntrack zone commands.") Fixes: 993cae678bca ("ofproto-dpif: Consume CT_Zone, and CT_Timeout_Policy tables") Reported-by: Yang Song <yangsong@vmware.com> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com>
*	ofproto-dpif.at: Add unit test for lb_output action.	Matteo Croce	2020-06-22	1	-3/+28
\| \| \| \| \| \| \| \| \|	Extend the balance-tcp one so it tests lb-output action too. The test checks that the option is shown in bond/show, and that the lb_output action is programmed in the datapath. Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	userspace: Avoid dp_hash recirculation for balance-tcp bond mode.	Vishal Deep Ajmera	2020-06-22	2	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: In OVS, flows with output over a bond interface of type “balance-tcp” gets translated by the ofproto layer into "HASH" and "RECIRC" datapath actions. After recirculation, the packet is forwarded to the bond member port based on 8-bits of the datapath hash value computed through dp_hash. This causes performance degradation in the following ways: 1. The recirculation of the packet implies another lookup of the packet’s flow key in the exact match cache (EMC) and potentially Megaflow classifier (DPCLS). This is the biggest cost factor. 2. The recirculated packets have a new “RSS” hash and compete with the original packets for the scarce number of EMC slots. This implies more EMC misses and potentially EMC thrashing causing costly DPCLS lookups. 3. The 256 extra megaflow entries per bond for dp_hash bond selection put additional load on the revalidation threads. Owing to this performance degradation, deployments stick to “balance-slb” bond mode even though it does not do active-active load balancing for VXLAN- and GRE-tunnelled traffic because all tunnel packet have the same source MAC address. Proposed optimization: This proposal introduces a new load-balancing output action instead of recirculation. Maintain one table per-bond (could just be an array of uint16's) and program it the same way internal flows are created today for each possible hash value (256 entries) from ofproto layer. Use this table to load-balance flows as part of output action processing. Currently xlate_normal() -> output_normal() -> bond_update_post_recirc_rules() -> bond_may_recirc() and compose_output_action__() generate 'dp_hash(hash_l4(0))' and 'recirc(<RecircID>)' actions. 
In this case the RecircID identifies the bond. For the recirculated packets the ofproto layer installs megaflow entries that match on RecircID and masked dp_hash and send them to the corresponding output port. Instead, we will now generate action as 'lb_output(<bond id>)' This combines hash computation (only if needed, else re-use RSS hash) and inline load-balancing over the bond. This action is used only for balance-tcp bonds in userspace datapath (the OVS kernel datapath remains unchanged). Example: Current scheme: With 8 UDP flows (with random UDP src port): flow-dump from pmd on cpu core: 2 recirc_id(0),in_port(7),<...> actions:hash(hash_l4(0)),recirc(0x1) recirc_id(0x1),dp_hash(0xf8e02b7e/0xff),<...> actions:2 recirc_id(0x1),dp_hash(0xb236c260/0xff),<...> actions:1 recirc_id(0x1),dp_hash(0x7d89eb18/0xff),<...> actions:1 recirc_id(0x1),dp_hash(0xa78d75df/0xff),<...> actions:2 recirc_id(0x1),dp_hash(0xb58d846f/0xff),<...> actions:2 recirc_id(0x1),dp_hash(0x24534406/0xff),<...> actions:1 recirc_id(0x1),dp_hash(0x3cf32550/0xff),<...> actions:1 New scheme: We can do with a single flow entry (for any number of new flows): in_port(7),<...> actions:lb_output(1) A new CLI has been added to dump datapath bond cache as given below. # ovs-appctl dpif-netdev/bond-show [dp] Bond cache: bond-id 1 : bucket 0 - slave 2 bucket 1 - slave 1 bucket 2 - slave 2 bucket 3 - slave 1 Co-authored-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com> Signed-off-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com> Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com> Tested-by: Matteo Croce <mcroce@redhat.com> Tested-by: Adrian Moreno <amorenoz@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
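The per-bond table lookup that replaces recirculation can be sketched as follows. This is an illustrative Python model, not the dpif-netdev implementation: `build_bond_table` and `lb_output` are hypothetical helpers, and the round-robin bucket assignment is an assumption (the commit only says the table is programmed the way the internal flows are created today).

```python
from typing import List

N_BUCKETS = 256  # one bucket per possible value of the low 8 hash bits


def build_bond_table(members: List[int]) -> List[int]:
    """Assign each bucket to a member port (round-robin, as an assumption)."""
    return [members[i % len(members)] for i in range(N_BUCKETS)]


def lb_output(table: List[int], pkt_hash: int) -> int:
    """Select the output port from the low 8 bits of the packet hash."""
    return table[pkt_hash & 0xFF]


table = build_bond_table([1, 2])
# Hash values taken from the flow dump in the commit message.
ports = [lb_output(table, h) for h in (0xF8E02B7E, 0xB236C260, 0x7D89EB18)]
```

A single datapath flow with this action covers every hash value, which is why the dump in the commit message shrinks from one megaflow per hash to one entry.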
*	netdev-offload-tc: Revert tunnel src/dst port masks handling	Roi Dayan	2020-06-19	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The cited commit intended to add tc support for masking tunnel src/dst ips and ports. It's not possible to do tunnel ports masking with openflow rules and the default mask for tunnel ports set to 0 in tnl_wc_init(), unlike tunnel ports default mask which is full mask. So instead of never passing tunnel ports to tc, revert the changes to tunnel ports to always pass the tunnel port. In sw classification is done by the kernel, but for hw we must match the tunnel dst port. Fixes: 5f568d049130 ("netdev-offload-tc: Allow to match the IP and port mask of tunnel") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
*	ofproto-dpif-trace: Improve NAT tracing.	Dumitru Ceara	2020-06-16	1	-0/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When ofproto/trace detects a recirc action it resumes execution at the specified next table. However, if the ct action performs SNAT/DNAT, e.g., ct(commit,nat(src=1.1.1.1:4000),table=42), the src/dst IPs and ports in the oftrace_recirc_node->flow field are not updated. This leads to misleading outputs from ofproto/trace as real packets would actually first get NATed and might match different flows when recirculated. Assume the first IP/port from the NAT src/dst action will be used by conntrack for the translation and update the oftrace_recirc_node->flow accordingly. This is not entirely correct as conntrack might choose a different IP/port but the result is more realistic than before. This fix covers new connections. However, for reply traffic that executes actions of the form ct(nat, table=42) we still don't update the flow as we don't have any information about conntrack state when tracing. Also move the oftrace_recirc_node processing out of ofproto_trace() and to its own function, ofproto_trace_recirc_node(), for better readability. Signed-off-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
*	ovsdb-idl: Avoid inconsistent IDL state with OVSDB_MONITOR_V3.	Dumitru Ceara	2020-06-15	1	-0/+56
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Assuming an ovsdb client connected to a database using OVSDB_MONITOR_V3 (i.e., "monitor_cond_since" method) with the initial monitor condition MC1. Assuming the following two transactions are executed on the ovsdb-server: TXN1: "insert record R1 in table T1" TXN2: "insert record R2 in table T2" If the client's monitor condition MC1 for table T2 matches R2 then the client will receive the following update3 message: method="update3", "insert record R2 in table T2", last-txn-id=TXN2 At this point, if the presence of the new record R2 in the IDL triggers the client to update its monitor condition to MC2 and add a clause for table T1 which matches R1, a monitor_cond_change message is sent to the server: method="monitor_cond_change", "clauses from MC2" In normal operation the ovsdb-server will reply with a new update3 message of the form: method="update3", "insert record R1 in table T1", last-txn-id=TXN2 However, if the connection drops in the meantime, this last update might get lost. It might happen that during the reconnect a new transaction happens that modifies the original record R1: TXN3: "modify record R1 in table T1" When the client reconnects, it will try to perform a fast resync by sending: method="monitor_cond_since", "clauses from MC2", last-txn-id=TXN2 Because TXN2 is still in the ovsdb-server transaction history, the server replies with the changes from the most recent transactions only, i.e., TXN3: result="true", last-txn-id=TXN3, "modify record R1 in table T1" This causes the IDL on the client to end up in an inconsistent state because it has never seen the update that created R1. 
Such a scenario is described in: https://bugzilla.redhat.com/show_bug.cgi?id=1808580#c22 To avoid this issue, the IDL will now maintain (up to) 3 different types of conditions for each DB table: - new_cond: condition that has been set by the IDL client but has not yet been sent to the server through monitor_cond_change. - req_cond: condition that has been sent to the server but the reply acknowledging the change hasn't been received yet. - ack_cond: condition that has been acknowledged by the server. Whenever the IDL FSM is restarted (e.g., voluntary or involuntary disconnect): - if there is a known last_id txn-id the code ensures that new_cond will contain the most recent condition set by the IDL client (either req_cond if there was a request in flight, or new_cond if the IDL client set a condition while the IDL was disconnected) - if there is no known last_id txn-id the code ensures that ack_cond will contain the most recent conditions set by the IDL client regardless whether they were acked by the server or not. When monitor_cond_since/monitor_cond requests are sent they will always include ack_cond and if new_cond is not NULL a follow up monitor_cond_change will be generated afterwards. On the other hand ovsdb_idl_db_set_condition() will always modify new_cond. This ensures that updates of type "insert" that happened before the last transaction known by the IDL but didn't match old monitor conditions are sent upon reconnect if the monitor condition has changed to include them in the meantime. Fixes: 403a6a0cb003 ("ovsdb-idl: Fast resync from server when connection reset.") Signed-off-by: Dumitru Ceara <dceara@redhat.com> Acked-by: Han Zhou <hzhou@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
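The bookkeeping of the three condition types can be traced with a small state model. This Python sketch follows only the rules stated in the commit message; it is not the ovsdb-idl C code, conditions are reduced to strings, and the class and method names (`TableConditions`, `set_condition`, `send_change`, `ack`, `restart`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableConditions:
    new_cond: Optional[str] = None   # set by the client, not yet sent
    req_cond: Optional[str] = None   # sent, not yet acknowledged
    ack_cond: Optional[str] = None   # acknowledged by the server

    def set_condition(self, cond: str) -> None:
        # Setting a condition always modifies new_cond.
        self.new_cond = cond

    def send_change(self) -> None:
        # monitor_cond_change: move new_cond to req_cond if nothing is in flight.
        if self.new_cond is not None and self.req_cond is None:
            self.req_cond, self.new_cond = self.new_cond, None

    def ack(self) -> None:
        # The server acknowledged the requested condition.
        if self.req_cond is not None:
            self.ack_cond, self.req_cond = self.req_cond, None

    def restart(self, have_last_id: bool) -> None:
        # Called when the FSM restarts, e.g. after a disconnect.
        if have_last_id:
            # Resync with ack_cond; the newest condition is re-sent afterwards.
            if self.new_cond is None:
                self.new_cond = self.req_cond
        else:
            # Full resync: start directly from the newest condition.
            for cond in (self.new_cond, self.req_cond):
                if cond is not None:
                    self.ack_cond = cond
                    break
            self.new_cond = None
        self.req_cond = None


conds = TableConditions(ack_cond="MC1")
conds.set_condition("MC2")
conds.send_change()              # MC2 is in flight when the connection drops
conds.restart(have_last_id=True)
```

After the restart the model would send monitor_cond_since with MC1 and the known last txn-id, followed by a monitor_cond_change carrying MC2, which is the ordering that lets the server send the missed insert of R1.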
*	netdev-offload-tc: Allow to match the IP and port mask of tunnel	Tonghao Zhang	2020-06-03	1	-0/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch allows users to offload the TC flower rules with tunnel mask. This patch allows masked match of the following, where previously only an exact match was supported: * Remote (dst) tunnel endpoint address * Local (src) tunnel endpoint address * Remote (dst) tunnel endpoint UDP port And also allows masked match of the following, where previously no match was supported: * Local (src) tunnel endpoint UDP port In some cases, masks are useful as wildcards. For example, during a DDoS attack we may not want to allow specified host IPs or source ports to access the targeted host. For example: $ ovs-appctl dpctl/add-flow "tunnel(dst=2.2.2.100,src=2.2.2.0/255.255.255.0,tp_dst=4789),\ recirc_id(0),in_port(3),eth(),eth_type(0x0800),ipv4()" "" $ tc filter show dev vxlan_sys_4789 ingress ... eth_type ipv4 enc_dst_ip 2.2.2.100 enc_src_ip 2.2.2.0/24 enc_dst_port 4789 enc_ttl 64 in_hw in_hw_count 2 action order 1: gact action drop ... Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>
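What a masked match on the tunnel source address means can be checked with Python's standard `ipaddress` module. This is only a sketch of the matching semantics, not the tc flower or netdev-offload-tc code; `masked_match` is a hypothetical helper.

```python
import ipaddress


def masked_match(addr: str, value: str, mask: str) -> bool:
    """Return True if addr equals value on every bit that is set in mask."""
    a = int(ipaddress.IPv4Address(addr))
    v = int(ipaddress.IPv4Address(value))
    m = int(ipaddress.IPv4Address(mask))
    return (a & m) == (v & m)


# src=2.2.2.0/255.255.255.0 from the dpctl/add-flow example above.
inside = masked_match("2.2.2.77", "2.2.2.0", "255.255.255.0")
outside = masked_match("2.2.3.1", "2.2.2.0", "255.255.255.0")

# The same mask written as a prefix length, as tc prints it (enc_src_ip 2.2.2.0/24).
prefix = ipaddress.IPv4Network("2.2.2.0/255.255.255.0").with_prefixlen
```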
*	classifier: Prevent tries vs n_tries race leading to NULL dereference.	Eiichi Tsukata	2020-05-28	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently classifier tries and n_tries are not updated atomically, so there is a race condition which can lead to a NULL dereference. The race can happen when the main thread updates the classifier tries and n_tries in classifier_set_prefix_fields() and at the same time a revalidator or handler thread tries to look them up in classifier_lookup__(). Such a race can be triggered when the user changes prefixes of flow_table. Race(user changes flow_table prefixes: ip_dst,ip_src => none): [main thread] [revalidator/handler thread] =========================================================== /* cls->n_tries == 2 */ for (int i = 0; i < cls->n_tries; i++) { trie_init(cls, i, NULL); /* n_tries == 0 */ cls->n_tries = n_tries; /* cls->tries[i]->field is NULL */ trie_ctx_init(&trie_ctx[i],&cls->tries[i]); /* trie->field is NULL */ ctx->be32ofs = trie->field->flow_be32ofs; To prevent the race, instead of re-introducing the internal mutex removed in the commit fccd7c092e09 ("classifier: Remove internal mutex."), this patch makes the trie field RCU protected and checks it after read. Fixes: fccd7c092e09 ("classifier: Remove internal mutex.") Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	oss-fuzz: Fix miniflow_target.c.	William Tu	2020-05-14	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Clang reports: tests/oss-fuzz/miniflow_target.c:209:26: error: suggest braces around \ initialization of subobject [-Werror,-Wmissing-braces] struct flow flow2 = {0}; Fix it by using memset. Cc: Bhargava Shastry <bshastry@sect.tu-berlin.de> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com>
*	metaflow: Fix maskable conntrack orig tuple fields	Yi-Hung Wei	2020-05-14	1	-0/+56
\| \| \| \| \| \| \| \| \| \| \| \|	From man ovs-fields(7), the conntrack origin tuple fields ct_nw_src/dst, ct_ipv6_src/dst, and ct_tp_src/dst are supposed to be bitwise maskable, but they are not. This patch enables those fields to be maskable, and adds a regression test. Fixes: daf4d3c18da4 ("odp: Support conntrack orig tuple key.") Reported-by: Wenying Dong <wenyingd@vmware.com> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com>
*	tests: Add tests using tap device.	William Tu	2020-05-14	3	-0/+36
\| \| \| \| \| \| \| \| \|	Similar to using veth across namespaces, this patch creates tap devices, assigns to namespaces, and allows traffic to go through different test cases. Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: William Tu <u9012063@gmail.com>
*	userspace: Enable TSO support for non-DPDK.	William Tu	2020-05-14	4	-0/+81
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch enables TSO support for non-DPDK use cases, and also adds the check-system-tso testsuite. Before TSO, we have to disable checksum offload, allowing the kernel to calculate the TCP/UDP packet checksum. With TSO, we can skip the checksum validation by enabling checksum offload, and with large packet size, we see better performance. Consider container to container use cases: iperf3 -c (ns0) -> veth peer -> OVS -> veth peer -> iperf3 -s (ns1) And I got around 6Gbps, similar to TSO with DPDK-enabled. Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: William Tu <u9012063@gmail.com>
*	userspace: Add conntrack timeout policy support.	William Tu	2020-05-01	3	-11/+30
\| \| \| \| \| \| \| \| \| \| \|	Commit 1f1613183733 ("ct-dpif, dpif-netlink: Add conntrack timeout policy support") adds conntrack timeout policy for kernel datapath. This patch enables support for the userspace datapath. I tested using the 'make check-system-userspace' which checks the timeout policies for ICMP and UDP cases. Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
*	ofp-actions: Add delete field action	Yi-Hung Wei	2020-04-29	2	-0/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds a new OpenFlow action, delete field, to delete a field in packets. Currently, only the tun_metadata fields are supported. One use case to add this action is to support multiple versions of geneve tunnel metadatas to be exchanged among different versions of networks. For example, we may introduce tun_metadata2 to replace old tun_metadata1, but still want to provide backward compatibility to the older release. In this case, in the new OpenFlow pipeline, we would like to support the case to receive a packet with tun_metadata1, do some processing. And if the packet is going to a switch in the newer release, we would like to delete the value in tun_metadata1 and set a value into tun_metadata2. Currently, ovs does not provide an action to remove a value in tun_metadata if the value is present. This patch fulfills the gap by adding the delete_field action. For example, the OpenFlow syntax to delete tun_metadata1 is: actions=delete_field:tun_metadata1 Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: William Tu <u9012063@gmail.com>
*	netdev-afxdp: Add interrupt mode netdev class.	William Tu	2020-04-28	1	-0/+23
\| \| \| \| \| \| \| \| \| \| \|	The patch adds a new netdev class 'afxdp-nonpmd' to enable afxdp interrupt mode. This is similar to 'type=afxdp', except that the is_pmd field is set to false. As a result, the packet processing is handled by main thread, not pmd thread. This avoids burning the CPU to always 100% when there is no traffic. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	system-traffic: Check frozen state handling with TLV map change	Yifeng Sun	2020-04-10	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch enhances a system traffic test to prevent regression on the tunnel metadata table (tun_table) handling with frozen state. Without a proper fix this test can crash ovs-vswitchd due to a use-after-free bug on tun_table. This is the sequence of events that triggers the bug: - Add an OpenFlow rule in OVS that matches Geneve tunnel metadata and contains a controller action. - When the first packet matches the aforementioned OpenFlow rule, during the miss upcall, OVS stores a pointer to the tun_table (that decodes the Geneve tunnel metadata) in a frozen state and pushes down a datapath flow into the kernel datapath. - Issue an add-tlv-map command to reprogram the tun_table on OVS. OVS frees the old tun_table and creates a new tun_table. - A subsequent packet hits the kernel datapath flow again. Since there is a controller action associated with that flow, it triggers a slow path controller upcall. - In the slow path controller upcall, OVS derives the tun_table from the frozen state, which points to the old tun_table that has already been freed at this point. - In order to access the tunnel metadata, OVS uses the invalid pointer that points to the old tun_table and triggers the core dump. Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Co-authored-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com>
*	tests/testsuite: Skip failing UT cases on aarch64	Malvika Gupta	2020-04-09	2	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The following test cases are failing inconsistently on aarch64 platforms and have been skipped until further investigation can be made on how to fix them: 20: bfd.at:268 bfd - bfd decay 2104: ovsdb-idl.at:1815 Check Python IDL connects to leader - Python3 (leader only) 2105: ovsdb-idl.at:1816 Check Python IDL reconnects to leader - Python3 (leader only) Suggested-by: Yanqin Wei <Yanqin.Wei@arm.com> Suggested-by: Lance Yang <Lance.Yang@arm.com> Signed-off-by: Malvika Gupta <malvika.gupta@arm.com> Signed-off-by: William Tu <u9012063@gmail.com>
*	tests/atlocal.in: Add check for aarch64 Architecture	Malvika Gupta	2020-04-09	1	-0/+10
\| \| \| \| \| \| \| \| \| \|	This patch adds a condition to check if the CPU architecture is aarch64. If the condition evaluates to true, $IS_ARM64 variable is set to 'yes'. For all other architectures, this variable is set to 'no'. Reviewed-by: Yanqin Wei <Yanqin.wei@arm.com> Signed-off-by: Malvika Gupta <malvika.gupta@arm.com> Signed-off-by: William Tu <u9012063@gmail.com>
*	userspace: Add GTP-U support.	William Tu	2020-03-25	3	-1/+99
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	GTP, GPRS Tunneling Protocol, is a group of IP-based communications protocols used to carry general packet radio service (GPRS) within GSM, UMTS and LTE networks. GTP protocol has two parts: Signalling (GTP-Control, GTP-C) and User data (GTP-User, GTP-U). GTP-C is used for setting up GTP-U protocol, which is an IP-in-UDP tunneling protocol. Usually GTP is used in connecting between base station for radio, Serving Gateway (S-GW), and PDN Gateway (P-GW). This patch implements GTP-U protocol for userspace datapath, supporting only required header fields and G-PDU message type. See spec in: https://tools.ietf.org/html/draft-hmm-dmm-5g-uplane-analysis-00 Tested-at: https://travis-ci.org/github/williamtu/ovs-travis/builds/666518784 Signed-off-by: Feng Yang <yangfengee04@gmail.com> Co-authored-by: Feng Yang <yangfengee04@gmail.com> Signed-off-by: Yi Yang <yangyi01@inspur.com> Co-authored-by: Yi Yang <yangyi01@inspur.com> Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Ben Pfaff <blp@ovn.org>
*	Handle refTable values with setkey()	Terry Wilson	2020-03-20	3	-1/+50
\| \| \| \| \| \| \| \| \| \| \| \|	For columns like QoS.queues where we have a map containing refTable values, assigning with __setattr__, e.g. qos.queues={1: $queue_row}, works, but using qos.setkey('queues', 1, $queue_row) results in an Exception. The opdat argument can essentially just be the JSON representation of the map column instead of trying to build it. Signed-off-by: Terry Wilson <twilson@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
*	ofproto-dpif-xlate: Fix recirculation when in_port is OFPP_CONTROLLER.	Ben Pfaff	2020-03-20	1	-0/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Recirculation usually requires finding the pre-recirculation input port. Packets sent by the controller, with in_port of OFPP_CONTROLLER or OFPP_NONE, do not have a real input port data structure, only a port number. The code in xlate_lookup_ofproto_() mishandled this case, failing to return the ofproto data structure. This commit fixes the problem and adds a test to guard against regression. Reported-by: Numan Siddique <numans@ovn.org> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-March/368642.html Tested-by: Numan Siddique <numans@ovn.org> Acked-by: Numan Siddique <numans@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org>
*	raft: Fix the problem of stuck in candidate role forever.	Han Zhou	2020-03-06	1	-0/+55
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Sometimes a server can stay in the candidate role forever, even if the server already sees the new leader and handles append-requests normally. However, because of the wrong role, it appears as disconnected from the cluster and so the clients are disconnected. This problem happens when 2 servers become candidates in the same term, and one of them is elected as leader in that term. It can be reproduced by the test cases added in this patch. The root cause is that the current implementation only changes the role to follower when a bigger term is observed (in raft_receive_term__()). According to the RAFT paper, if another candidate becomes leader with the same term, the candidate should change to follower. This patch fixes it by changing the role to follower when the leader is being updated in raft_update_leader(). Signed-off-by: Han Zhou <hzhou@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org>
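The role transition described above can be sketched as a small model. This is only an illustration in Python, not the C code in ovsdb/raft.c: the `Server` class, its fields, and `on_append_request` are hypothetical names; only the two step-down conditions (a larger term, or a leader learned in the same term) come from the commit message.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


@dataclass
class Server:
    """Hypothetical, simplified Raft server; not the OVS data structure."""
    sid: str
    term: int = 0
    role: Role = Role.FOLLOWER
    leader: Optional[str] = None

    def start_election(self) -> None:
        self.term += 1
        self.role = Role.CANDIDATE
        self.leader = None

    def receive_term(self, term: int) -> None:
        # Condition that existed before the fix: step down on a larger term.
        if term > self.term:
            self.term = term
            self.role = Role.FOLLOWER
            self.leader = None

    def update_leader(self, leader: str) -> None:
        # Condition added by the fix: a candidate that learns of another
        # leader in its own term also becomes a follower.
        self.leader = leader
        if leader != self.sid and self.role is Role.CANDIDATE:
            self.role = Role.FOLLOWER

    def on_append_request(self, term: int, leader: str) -> None:
        if term < self.term:
            return  # stale request from an older term, ignore
        self.receive_term(term)
        self.update_leader(leader)


# Two servers become candidates in the same term; "b" wins the election.
a, b = Server("a"), Server("b")
a.start_election()
b.start_election()
b.role, b.leader = Role.LEADER, "b"

# "a" receives an append request from "b" carrying the same term.
a.on_append_request(b.term, b.sid)
```

Without the check in `update_leader`, server "a" in this model would remain a candidate, because the term in the append request is equal to its own rather than larger.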
*	raft: Fix raft_is_connected() when there is no leader yet.	Han Zhou	2020-03-06	1	-0/+35
\| \| \| \| \| \| \| \| \| \|	If there has never been a leader known by the current server, its status should be "disconnected" from the cluster. Without this patch, when a server in a cluster is restarted, before it successfully connects back to the cluster it will appear as connected, which is wrong. Signed-off-by: Han Zhou <hzhou@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org>
*	ovsdb-server: Don't disconnect clients after raft install_snapshot.	Han Zhou	2020-03-06	1	-0/+56
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a "schema" field is found in read_db(), there can be two cases: 1. There is a schema change in the clustered DB and the "schema" is the new one. 2. An install_snapshot RPC happened, which caused log compaction on the server, and the next log is just the snapshot, which always contains the "schema" field, even though the schema hasn't been changed. The current implementation doesn't handle case 2), and always assumes the schema has changed, hence disconnecting all clients of the server. It can cause stability problems when a large number of clients are connected when this happens in a large scale environment. Signed-off-by: Han Zhou <hzhou@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org>
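One way to tell the two cases apart is to compare the schema carried by the record with the schema currently in use, and to disconnect clients only when they differ. The sketch below assumes that approach; it is not taken from the commit, and `should_disconnect_clients` and the record layout are hypothetical.

```python
from typing import Any, Dict, Optional


def should_disconnect_clients(current_schema: Dict[str, Any],
                              record: Dict[str, Any]) -> bool:
    """Return True only if the record carries a schema that really changed.

    Hypothetical helper: 'record' stands in for a log entry that may or may
    not include a "schema" field.
    """
    new_schema: Optional[Dict[str, Any]] = record.get("schema")
    if new_schema is None:
        return False  # ordinary data record, no schema at all
    # Case 2 (snapshot after install_snapshot) carries an identical schema,
    # so only a real difference (case 1) should disconnect clients.
    return new_schema != current_schema


schema_v1 = {"name": "demo", "version": "1.0.0"}
schema_v2 = {"name": "demo", "version": "1.1.0"}

snapshot_record = {"schema": dict(schema_v1), "data": {}}
upgrade_record = {"schema": schema_v2, "data": {}}
data_record = {"data": {}}

results = [
    should_disconnect_clients(schema_v1, snapshot_record),
    should_disconnect_clients(schema_v1, upgrade_record),
    should_disconnect_clients(schema_v1, data_record),
]
```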
*	netdev-dpdk: Remove deprecated ring port type.	Ilya Maximets	2020-03-06	3	-208/+0
\| \| \| \| \| \| \| \| \| \| \| \| \|	'dpdkr' ring ports were deprecated in the 2.13 release and had not actually been used for a long time. Remove support now. More details in commit b4c5f00c339b ("netdev-dpdk: Deprecate ring ports.") Acked-by: Aaron Conole <aconole@redhat.com> Acked-by: David Marchand <david.marchand@redhat.com> Acked-by: Ian Stokes <ian.stokes@intel.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	dpif-netdev.at: VLAN id modification test for ARP partial HW offloading.	Eli Britstein	2020-02-28	1	-0/+77
\| \| \| \| \| \| \| \| \|	Follow up to commit eb540c0f5fc8 ("flow: Fix parsing l3_ofs with partial offloading") that fixed the issue, add a unit-test for it. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	dpif-netdev.at: Fix partial offloading test cases failure.	Yanqin Wei	2020-02-28	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some partial offloading test cases are failing inconsistently. The root cause is that the dummy netdev is assigned the "linux_tc" offloading API. The affected cases are: dpif-netdev - partial hw offload - dummy dpif-netdev - partial hw offload - dummy-pmd dpif-netdev - partial hw offload with packet modifications - dummy dpif-netdev - partial hw offload with packet modifications - dummy-pmd This patch fixes the issue by changing 'options:ifindex=1' to some big value. It is a workaround that makes the "linux_tc" flow API initialization fail. All of the above cases pass consistently after applying this patch. Suggested-by: Ilya Maximets <i.maximets@ovn.org> Reviewed-by: Gavin Hu <Gavin.Hu@arm.com> Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com> Signed-off-by: Yanqin Wei <Yanqin.Wei@arm.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
*	conntrack: Fix conntrack new state	Yi-Hung Wei	2020-01-29	1	-0/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	In a connection tracking system, a connection is established if we see packets from both directions. However, in the userspace datapath's conntrack, if we send a connection setup packet in one direction twice, it puts the connection in the established state. This patch fixes the aforementioned issue, and adds a system traffic test for UDP and TCP traffic to avoid regression. Fixes: a489b16854b59 ("conntrack: New userspace connection tracker.") Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com>
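The rule stated above (established only after packets are seen in both directions) can be illustrated with a toy model. This is a sketch in Python, not the userspace conntrack code; the `Conn` class and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Conn:
    """Hypothetical per-connection record tracking which directions were seen."""
    seen_original: bool = False
    seen_reply: bool = False

    def state(self) -> str:
        # Established requires traffic in both directions; otherwise new.
        if self.seen_original and self.seen_reply:
            return "established"
        return "new"

    def update(self, is_reply: bool) -> str:
        # Record the direction of this packet, then report the state.
        if is_reply:
            self.seen_reply = True
        else:
            self.seen_original = True
        return self.state()


conn = Conn()
states = [
    conn.update(is_reply=False),  # first setup packet
    conn.update(is_reply=False),  # same direction again: must stay "new"
    conn.update(is_reply=True),   # reply direction seen
]
```

In this model, repeating the setup packet in the original direction leaves the connection in the new state, which is the behavior the added system traffic test is meant to guard.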
*	Typo fix: vswtch -> vswitch.	Ben Pfaff	2020-01-17	1	-1/+1
\| \| \| \| \|	Acked-by: Numan Siddique <numans@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org>
*	ovsdb-server: Allow OVSDB clients to specify the UUID for inserted rows.	Ben Pfaff	2020-01-16	2	-2/+40
\| \| \| \| \| \|	Acked-by: Han Zhou <hzhou@ovn.org> Requested-by: Leonid Ryzhyk <lryzhyk@vmware.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
*	tests: introduced tests for adding/deleting logical routers in VTEP database	Damijan Skvarc	2020-01-07	1	-0/+87
\| \| \| \| \| \| \| \|	New tests were introduced based on an lcov report, which revealed code that is not covered by the OVS test suites. Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>