| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
There is one remaining use under datapath. That change should happen
upstream in Linux first according to our usual policy.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
|
|
|
|
|
| |
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
|
|
|
|
|
|
|
|
|
| |
When running OVSDB cluster tests on Windows not all the ovsdb processes
are terminated. Queue up the pids of the started processes for
termination when the test stops.
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ovs-router module checks for the source ip address of the interface
while adding a new route. netdev module doesn't request ip addresses
from the system every time, but instead it caches currently assigned
ip addresses and updates the cache on netlink notifications if needed.
So, there is a slight delay between setting ip address on interface
in a system and a moment OVS updates list of ip addresses of this
interface. If route addition happens within this time frame, it
fails with the following error:
# ovs-appctl ovs/route/add 10.0.0.0/24 br-p1
Error while inserting route.
ovs-appctl: ovs-vswitchd: server returned an error
This makes system tests to fail frequently.
Let's wait until local route successfully added. This will mean
that OVS finished processing of a netlink event and will use up to
date list of ip addresses on desired interface.
Fixes: 526cf4e1d6a8 ("tests: Added unit tests in packet-type-aware.at")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
|
|
|
|
|
|
|
|
|
|
|
| |
When running OVSDB cluster tests on Windows not all the ovsdb
processes are terminated.
Strip carriage return and newline of the arguments passed to the kill
command because they will cause problems when passing them to tasklist
and taskkill.
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The test 'Check Python IDL reconnects to leader - Python3
(leader only)' fails sometimes when the first ovsdb-server
gets killed before the others had joined the cluster.
Fix the function ovsdb_cluster_start_idltest to wait them
to join the cluster.
Fixes: c39751e44539 ("python: Monitor Database table to manage lifecycle of IDL client.")
Co-authored-by:: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Builds on RHEL 8.2 systems are failing due to this issue.
See [1] as to why this is necessary.
I used the following command to identify files that need this fix:
find . -type f -executable | /usr/lib/rpm/redhat/brp-mangle-shebangs
I also updated the copyright notices as needed.
1. https://fedoraproject.org/wiki/Changes/Make_ambiguous_python_shebangs_error
Fixes: 1ca0323e7c29 ("Require Python 3 and remove support for Python 2.")
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Packets in the benchmark must be treated as new packets, i.e. they
should not have conntrack metadata set. Current code will set up
'pkt->md.conn' after the first run and all subsequent calls will hit
the 'fast' processing that is intended for recirculated packets making
a false impression that current conntrack implementation is lightning
fast.
Before the change:
$ ./ovstest test-conntrack benchmark 4 33554432 32 1
conntrack: 1059 ms
After (correct):
$ ./ovstest test-conntrack benchmark 4 33554432 32 1
conntrack: 92785 ms
Fixes: 594570ea1cde ("conntrack: Optimize recirculations.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When commit a0baa7dfa4fe ("connmgr: Make treatment of active and passive
connections more uniform") was applied, it did not take into account
that a reconfiguration of the allowed_versions setting would require a
reload of the ofservice object (only accomplished via a restart of OvS).
For now, during the reconfigure cycle, we delete the ofservice object and
then recreate it immediately. A new test is added to ensure we do not
break this behavior again.
Fixes: a0baa7dfa4fe ("connmgr: Make treatment of active and passive connections more uniform")
Suggested-by: Ben Pfaff <blp@ovn.org>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1782834
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Numan Siddique <numans@ovn.org>
Tested-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Every time a follower has to install a snapshot received from the
leader, it should also replace the data in memory. Right now this only
happens when snapshots are installed that also change the schema.
This can lead to inconsistent DB data on follower nodes and the snapshot
may fail to get applied.
Fixes: bda1f6b60588 ("ovsdb-server: Don't disconnect clients after raft install_snapshot.")
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While committing set() actions, commit() could wildcard all the fields
that are same in match key and in the set action. This leads to
situation where mask after commit could actually contain less bits
than it was before. And if set action was partially committed, all
the fields that were the same will be cleared out from the matching key
resulting in the incorrect (too wide) flow.
For example, for the flow that matches on both src and dst mac
addresses, if the dst mac is the same and only src should be changed
by the set() action, destination address will be wildcarded in the
match key and will never be matched, i.e. flows with any destination
mac will match, which is not correct.
Setting OF rule:
in_port=1,dl_src=50:54:00:00:00:09 actions=mod_dl_dst(50:54:00:00:00:0a),output(2)
Sending following packets on port 1:
1. eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),eth_type(0x0800)
2. eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0c),eth_type(0x0800)
3. eth(src=50:54:00:00:00:0b,dst=50:54:00:00:00:0c),eth_type(0x0800)
Resulted datapath flows:
eth(dst=50:54:00:00:00:0c),<...>, actions:set(eth(dst=50:54:00:00:00:0a)),2
eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),<...>, actions:2
The first flow doesn't have any match on source MAC address and the
third packet successfully matched on it while it must be dropped.
Fix that by updating the match mask with only the new bits set by
commit(), but keeping those that were cleared (OR operation).
With fix applied, resulted correct flows are:
eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),<...>, actions:2
eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0c),<...>,
actions:set(eth(dst=50:54:00:00:00:0a)),2
eth(src=50:54:00:00:00:0b),<...>, actions:drop
The code before commit dbf4a92800d0 was not able to reduce the mask,
it was only possible to expand it to exact match, so it was OK to
update original matching mask with the new value in all cases.
Fixes: dbf4a92800d0 ("odp-util: Do not rewrite fields with the same values as matched")
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1854376
Acked-by: Eli Britstein <elibr@mellanox.com>
Tested-by: Adrián Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Current OVS intercepts and processes all BFD packets, thus VM-2-VM
BFD packets get lost and the recipient VM never sees them.
This patch fixes it by only intercepting and processing BFD packets
destined to a configured BFD instance, and other BFD packets are made
available to the OVS flow table for forwarding.
This patch keeps BFD's backward compatibility.
VMware-BZ: #2579326
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
|
|
|
|
|
|
|
|
|
| |
Certain Linux distributions, like CentOS, have default iptable
rules to reject input traffic from br-underlay. Refactor by
creating a macro 'IPTABLES_ACCEPT([bridge])' for adding the
accept rule to the iptable input chain.
Signed-off-by: William Tu <u9012063@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some tests checks for 'miss upcall' log in a log file immediately
after sending the packet, this causes test failures while running
them under valgrind or on the overloaded system.
Fix that by waiting for appearance of the actual string in the log
file. Some other tests uses 'sleep 1' to fix that, but it's better
to wait for event than sleep for a specific amount of time.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since commit 8e4e45887ec3, priority of 'local' route entries no
longer matches with 'plen'. This should be taken into account
while flushing cached routes, otherwise they will remain in OVS
even after removing them from the system:
# ifconfig eth0 11.0.0.1
# ovs-appctl ovs/route/show
--- A new route synchronized from kernel route table ---
Cached: 11.0.0.1/32 dev eth0 SRC 11.0.0.1 local
# ifconfig eth0 0
# ovs-appctl ovs/route/show
-- the new route entry is still in ovs route table ---
Cached: 11.0.0.1/32 dev eth0 SRC 11.0.0.1 local
CC: wenxu <wenxu@ucloud.cn>
Fixes: 8e4e45887ec3 ("ofproto-dpif-xlate: makes OVS native tunneling honor tunnel-specified source addresses")
Reported-by: Zheng Jingzhou <glovejmm@163.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-July/373093.html
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In AB bonding, if the current active slave becomes disabled, a
replacement slave is arbitrarily picked from the remaining set of
enabled slaves. This commit adds the concept of a "primary" slave: an
interface that will always be (or become) the current active slave if
it is enabled.
The rationale for this functionality is to allow the designation of a
preferred interface for a given bond. For example:
1. Bond is created with interfaces p1 (primary) and p2, both enabled.
2. p1 becomes the current active slave (because it was designated as
the primary).
3. Later, p1 fails/becomes disabled.
4. p2 is chosen to become the current active slave.
5. Later, p1 becomes re-enabled.
6. p1 is chosen to become the current active slave (because it was
designated as the primary)
Note that p1 becomes the active slave once it becomes re-enabled, even
if nothing has happened to p2.
This "primary" concept exists in Linux kernel network interface
bonding, but did not previously exist in OVS bonding.
Only one primary slave interface is supported per bond, and is only
supported for active/backup bonding.
The primary slave interface is designated via
"other_config:bond-primary" when creating a bond.
Also, while adding tests for the "primary" concept, make a few small
improvements to the non-primary AB bonding test.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Zero flow mark is used to indicate the HW to remove the mark. A packet
marked with zero mark is received in SW without a mark at all, so it
cannot be used as a valid mark. Change the pool range to fix it.
Fixes: 241bad15d99a ("dpif-netdev: associate flow with a mark id")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roni Bar Yanai <roniba@mellanox.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
| |
As offload is done using the mega ufid of a flow, for better
debugability, add it in the log message.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roni Bar Yanai <roniba@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
| |
When lb-output-action is toggled back to "false" buckets are not being
deleted. Delete them as they will no longer be used.
Add unit test to verify buckets are correctly deleted.
Cc: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Accoridng to vswitch.ovsschema, each CT_Zone record may have
zero or one associcated CT_Timeout_policy. Thus, this patch
checks if ovsrec_ct_timeout_policy exist before accesses the
record.
VMWare-BZ: 2585825
Fixes: 45339539f69d ("ovs-vsctl: Add conntrack zone commands.")
Fixes: 993cae678bca ("ofproto-dpif: Consume CT_Zone, and CT_Timeout_Policy tables")
Reported-by: Yang Song <yangsong@vmware.com>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
|
|
|
|
|
|
|
|
|
| |
Extend the balance-tcp one so it tests lb-output action too.
The test checks that that the option is shown in bond/show,
and that the lb_output action is programmed in the datapath.
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
In OVS, flows with output over a bond interface of type “balance-tcp”
gets translated by the ofproto layer into "HASH" and "RECIRC" datapath
actions. After recirculation, the packet is forwarded to the bond
member port based on 8-bits of the datapath hash value computed through
dp_hash. This causes performance degradation in the following ways:
1. The recirculation of the packet implies another lookup of the
packet’s flow key in the exact match cache (EMC) and potentially
Megaflow classifier (DPCLS). This is the biggest cost factor.
2. The recirculated packets have a new “RSS” hash and compete with the
original packets for the scarce number of EMC slots. This implies more
EMC misses and potentially EMC thrashing causing costly DPCLS lookups.
3. The 256 extra megaflow entries per bond for dp_hash bond selection
put additional load on the revalidation threads.
Owing to this performance degradation, deployments stick to “balance-slb”
bond mode even though it does not do active-active load balancing for
VXLAN- and GRE-tunnelled traffic because all tunnel packet have the
same source MAC address.
Proposed optimization:
This proposal introduces a new load-balancing output action instead of
recirculation.
Maintain one table per-bond (could just be an array of uint16's) and
program it the same way internal flows are created today for each
possible hash value (256 entries) from ofproto layer. Use this table to
load-balance flows as part of output action processing.
Currently xlate_normal() -> output_normal() ->
bond_update_post_recirc_rules() -> bond_may_recirc() and
compose_output_action__() generate 'dp_hash(hash_l4(0))' and
'recirc(<RecircID>)' actions. In this case the RecircID identifies the
bond. For the recirculated packets the ofproto layer installs megaflow
entries that match on RecircID and masked dp_hash and send them to the
corresponding output port.
Instead, we will now generate action as
'lb_output(<bond id>)'
This combines hash computation (only if needed, else re-use RSS hash)
and inline load-balancing over the bond. This action is used *only* for
balance-tcp bonds in userspace datapath (the OVS kernel datapath
remains unchanged).
Example:
Current scheme:
With 8 UDP flows (with random UDP src port):
flow-dump from pmd on cpu core: 2
recirc_id(0),in_port(7),<...> actions:hash(hash_l4(0)),recirc(0x1)
recirc_id(0x1),dp_hash(0xf8e02b7e/0xff),<...> actions:2
recirc_id(0x1),dp_hash(0xb236c260/0xff),<...> actions:1
recirc_id(0x1),dp_hash(0x7d89eb18/0xff),<...> actions:1
recirc_id(0x1),dp_hash(0xa78d75df/0xff),<...> actions:2
recirc_id(0x1),dp_hash(0xb58d846f/0xff),<...> actions:2
recirc_id(0x1),dp_hash(0x24534406/0xff),<...> actions:1
recirc_id(0x1),dp_hash(0x3cf32550/0xff),<...> actions:1
New scheme:
We can do with a single flow entry (for any number of new flows):
in_port(7),<...> actions:lb_output(1)
A new CLI has been added to dump datapath bond cache as given below.
# ovs-appctl dpif-netdev/bond-show [dp]
Bond cache:
bond-id 1 :
bucket 0 - slave 2
bucket 1 - slave 1
bucket 2 - slave 2
bucket 3 - slave 1
Co-authored-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com>
Signed-off-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com>
Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>
Tested-by: Matteo Croce <mcroce@redhat.com>
Tested-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The cited commit intended to add tc support for masking tunnel src/dst
ips and ports. It's not possible to do tunnel ports masking with
openflow rules and the default mask for tunnel ports set to 0 in
tnl_wc_init(), unlike tunnel ports default mask which is full mask.
So instead of never passing tunnel ports to tc, revert the changes
to tunnel ports to always pass the tunnel port.
In sw classification is done by the kernel, but for hw we must match
the tunnel dst port.
Fixes: 5f568d049130 ("netdev-offload-tc: Allow to match the IP and port mask of tunnel")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When ofproto/trace detects a recirc action it resumes execution at the
specified next table. However, if the ct action performs SNAT/DNAT,
e.g., ct(commit,nat(src=1.1.1.1:4000),table=42), the src/dst IPs and
ports in the oftrace_recirc_node->flow field are not updated. This leads
to misleading outputs from ofproto/trace as real packets would actually
first get NATed and might match different flows when recirculated.
Assume the first IP/port from the NAT src/dst action will be used by
conntrack for the translation and update the oftrace_recirc_node->flow
accordingly. This is not entirely correct as conntrack might choose a
different IP/port but the result is more realistic than before.
This fix covers new connections. However, for reply traffic that executes
actions of the form ct(nat, table=42) we still don't update the flow as
we don't have any information about conntrack state when tracing.
Also move the oftrace_recirc_node processing out of ofproto_trace()
and to its own function, ofproto_trace_recirc_node() for better
readability/
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Assuming an ovsdb client connected to a database using OVSDB_MONITOR_V3
(i.e., "monitor_cond_since" method) with the initial monitor condition
MC1.
Assuming the following two transactions are executed on the
ovsdb-server:
TXN1: "insert record R1 in table T1"
TXN2: "insert record R2 in table T2"
If the client's monitor condition MC1 for table T2 matches R2 then the
client will receive the following update3 message:
method="update3", "insert record R2 in table T2", last-txn-id=TXN2
At this point, if the presence of the new record R2 in the IDL triggers
the client to update its monitor condition to MC2 and add a clause for
table T1 which matches R1, a monitor_cond_change message is sent to the
server:
method="monitor_cond_change", "clauses from MC2"
In normal operation the ovsdb-server will reply with a new update3
message of the form:
method="update3", "insert record R1 in table T1", last-txn-id=TXN2
However, if the connection drops in the meantime, this last update might
get lost.
It might happen that during the reconnect a new transaction happens
that modifies the original record R1:
TXN3: "modify record R1 in table T1"
When the client reconnects, it will try to perform a fast resync by
sending:
method="monitor_cond_since", "clauses from MC2", last-txn-id=TXN2
Because TXN2 is still in the ovsdb-server transaction history, the
server replies with the changes from the most recent transactions only,
i.e., TXN3:
result="true", last-txbb-id=TXN3, "modify record R1 in table T1"
This causes the IDL on the client in to end up in an inconsistent
state because it has never seen the update that created R1.
Such a scenario is described in:
https://bugzilla.redhat.com/show_bug.cgi?id=1808580#c22
To avoid this issue, the IDL will now maintain (up to) 3 different
types of conditions for each DB table:
- new_cond: condition that has been set by the IDL client but has
not yet been sent to the server through monitor_cond_change.
- req_cond: condition that has been sent to the server but the reply
acknowledging the change hasn't been received yet.
- ack_cond: condition that has been acknowledged by the server.
Whenever the IDL FSM is restarted (e.g., voluntary or involuntary
disconnect):
- if there is a known last_id txn-id the code ensures that new_cond
will contain the most recent condition set by the IDL client
(either req_cond if there was a request in flight, or new_cond
if the IDL client set a condition while the IDL was disconnected)
- if there is no known last_id txn-id the code ensures that ack_cond will
contain the most recent conditions set by the IDL client regardless
whether they were acked by the server or not.
When monitor_cond_since/monitor_cond requests are sent they will
always include ack_cond and if new_cond is not NULL a follow up
monitor_cond_change will be generated afterwards.
On the other hand ovsdb_idl_db_set_condition() will always modify new_cond.
This ensures that updates of type "insert" that happened before the last
transaction known by the IDL but didn't match old monitor conditions are
sent upon reconnect if the monitor condition has changed to include them
in the meantime.
Fixes: 403a6a0cb003 ("ovsdb-idl: Fast resync from server when connection reset.")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch allows users to offload the TC flower rules with
tunnel mask. This patch allows masked match of the following,
where previously supported an exact match was supported:
* Remote (dst) tunnel endpoint address
* Local (src) tunnel endpoint address
* Remote (dst) tunnel endpoint UDP port
And also allows masked match of the following, where previously
no match was supported:
* Local (src) tunnel endpoint UDP port
In some case, mask is useful as wildcards. For example, DDOS,
in that case, we don’t want to allow specified hosts IPs or
only source Ports to access the targeted host. For example:
$ ovs-appctl dpctl/add-flow "tunnel(dst=2.2.2.100,src=2.2.2.0/255.255.255.0,tp_dst=4789),\
recirc_id(0),in_port(3),eth(),eth_type(0x0800),ipv4()" ""
$ tc filter show dev vxlan_sys_4789 ingress
...
eth_type ipv4
enc_dst_ip 2.2.2.100
enc_src_ip 2.2.2.0/24
enc_dst_port 4789
enc_ttl 64
in_hw in_hw_count 2
action order 1: gact action drop
...
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently classifier tries and n_tries can be updated not atomically,
there is a race condition which can lead to NULL dereference.
The race can happen when main thread updates a classifier tries and
n_tries in classifier_set_prefix_fields() and at the same time revalidator
or handler thread try to lookup them in classifier_lookup__(). Such race
can be triggered when user changes prefixes of flow_table.
Race(user changes flow_table prefixes: ip_dst,ip_src => none):
[main thread] [revalidator/handler thread]
===========================================================
/* cls->n_tries == 2 */
for (int i = 0; i < cls->n_tries; i++) {
trie_init(cls, i, NULL);
/* n_tries == 0 */
cls->n_tries = n_tries;
/* cls->tries[i]->feild is NULL */
trie_ctx_init(&trie_ctx[i],&cls->tries[i]);
/* trie->field is NULL */
ctx->be32ofs = trie->field->flow_be32ofs;
To prevent the race, instead of re-introducing internal mutex
implemented in the commit fccd7c092e09 ("classifier: Remove internal
mutex."), this patch makes trie field RCU protected and checks it after
read.
Fixes: fccd7c092e09 ("classifier: Remove internal mutex.")
Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Clang reports:
tests/oss-fuzz/miniflow_target.c:209:26: error: suggest braces around \
initialization of subobject
[-Werror,-Wmissing-braces]
struct flow flow2 = {0};
Fix it by using memset.
Cc: Bhargava Shastry <bshastry@sect.tu-berlin.de>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
From man ovs-fields(7), the conntrack origin tuple fields
ct_nw_src/dst, ct_ipv6_src/dst, and ct_tp_src/dst are supposed
to be bitwise maskable, but they are not. This patch enables
those fields to be maskable, and adds a regression test.
Fixes: daf4d3c18da4 ("odp: Support conntrack orig tuple key.")
Reported-by: Wenying Dong <wenyingd@vmware.com>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
|
|
|
|
|
|
|
|
|
| |
Similar to using veth across namespaces, this patch creates
tap devices, assigns to namespaces, and allows traffic to
go through different test cases.
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: William Tu <u9012063@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch enables TSO support for non-DPDK use cases, and
also add check-system-tso testsuite. Before TSO, we have to
disable checksum offload, allowing the kernel to calculate the
TCP/UDP packet checsum. With TSO, we can skip the checksum
validation by enabling checksum offload, and with large packet
size, we see better performance.
Consider container to container use cases:
iperf3 -c (ns0) -> veth peer -> OVS -> veth peer -> iperf3 -s (ns1)
And I got around 6Gbps, similar to TSO with DPDK-enabled.
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 1f1613183733 ("ct-dpif, dpif-netlink: Add conntrack timeout
policy support") adds conntrack timeout policy for kernel datapath.
This patch enables support for the userspace datapath. I tested
using the 'make check-system-userspace' which checks the timeout
policies for ICMP and UDP cases.
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a new OpenFlow action, delete field, to delete a
field in packets. Currently, only the tun_metadata fields are
supported.
One use case to add this action is to support multiple versions
of geneve tunnel metadatas to be exchanged among different versions
of networks. For example, we may introduce tun_metadata2 to
replace old tun_metadata1, but still want to provide backward
compatibility to the older release. In this case, in the new
OpenFlow pipeline, we would like to support the case to receive a
packet with tun_metadata1, do some processing. And if the packet
is going to a switch in the newer release, we would like to delete
the value in tun_metadata1 and set a value into tun_metadata2.
Currently, ovs does not provide an action to remove a value in
tun_metadata if the value is present. This patch fulfills the gap
by adding the delete_field action. For example, the OpenFlow
syntax to delete tun_metadata1 is:
actions=delete_field:tun_metadata1
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The patch adds a new netdev class 'afxdp-nonpmd' to enable afxdp
interrupt mode. This is similar to 'type=afxdp', except that the
is_pmd field is set to false. As a result, the packet processing
is handled by main thread, not pmd thread. This avoids burning
the CPU to always 100% when there is no traffic.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch enhances a system traffic test to prevent regression on
the tunnel metadata table (tun_table) handling with frozen state.
Without a proper fix this test can crash ovs-vswitchd due to a
use-after-free bug on tun_table.
These are the timed sequence of how this bug is triggered:
- Adds an OpenFlow rule in OVS that matches Geneve tunnel metadata that
contains a controller action.
- When the first packet matches the aforementioned OpenFlow rule,
during the miss upcall, OVS stores a pointer to the tun_table (that
decodes the Geneve tunnel metadata) in a frozen state and pushes down
a datapath flow into kernel datapath.
- Issues a add-tlv-map command to reprogram the tun_table on OVS.
OVS frees the old tun_table and create a new tun_table.
- A subsequent packet hits the kernel datapath flow again. Since
there is a controller action associated with that flow, it triggers
slow path controller upcall.
- In the slow path controller upcall, OVS derives the tun_table
from the frozen state, which points to the old tun_table that is
already being freed at this time point.
- In order to access the tunnel metadata, OVS uses the invalid
pointer that points to the old tun_table and triggers the core dump.
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Co-authored-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The following test cases are failing inconsistently on aarch64 platforms and
have been skipped until further investigation can be made on how to fix them:
20: bfd.at:268 bfd - bfd decay
2104: ovsdb-idl.at:1815 Check Python IDL connects to leader - Python3 (leader only)
2105: ovsdb-idl.at:1816 Check Python IDL reconnects to leader - Python3 (leader only)
Suggested-by: Yanqin Wei <Yanqin.Wei@arm.com>
Suggested-by: Lance Yang <Lance.Yang@arm.com>
Signed-off-by: Malvika Gupta <malvika.gupta@arm.com>
Signed-off-by: William Tu <u9012063@gmail.com>
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a condition to check if the CPU architecture is aarch64. If the
condition evaluates to true, $IS_ARM64 variable is set to 'yes'. For all other
architectures, this variable is set to 'no'.
Reviewed-by: Yanqin Wei <Yanqin.wei@arm.com>
Signed-off-by: Malvika Gupta <malvika.gupta@arm.com>
Signed-off-by: William Tu <u9012063@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GTP, GPRS Tunneling Protocol, is a group of IP-based communications
protocols used to carry general packet radio service (GPRS) within
GSM, UMTS and LTE networks. GTP protocol has two parts: Signalling
(GTP-Control, GTP-C) and User data (GTP-User, GTP-U). GTP-C is used
for setting up GTP-U protocol, which is an IP-in-UDP tunneling
protocol. Usually GTP is used in connecting between base station for
radio, Serving Gateway (S-GW), and PDN Gateway (P-GW).
This patch implements GTP-U protocol for userspace datapath,
supporting only required header fields and G-PDU message type.
See spec in:
https://tools.ietf.org/html/draft-hmm-dmm-5g-uplane-analysis-00
Tested-at: https://travis-ci.org/github/williamtu/ovs-travis/builds/666518784
Signed-off-by: Feng Yang <yangfengee04@gmail.com>
Co-authored-by: Feng Yang <yangfengee04@gmail.com>
Signed-off-by: Yi Yang <yangyi01@inspur.com>
Co-authored-by: Yi Yang <yangyi01@inspur.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
For columns like QoS.queues where we have a map containing refTable
values, assigning w/ __setattr__ e.g. qos.queues={1: $queue_row}
works, but using using qos.setkey('queues', 1, $queue_row) results
in an Exception. The opdat argument can essentially just be the
JSON representation of the map column instead of trying to build
it.
Signed-off-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Recirculation usually requires finding the pre-recirculation input port.
Packets sent by the controller, with in_port of OFPP_CONTROLLER or
OFPP_NONE, do not have a real input port data structure, only a port
number. The code in xlate_lookup_ofproto_() mishandled this case,
failing to return the ofproto data structure. This commit fixes the
problem and adds a test to guard against regression.
Reported-by: Numan Siddique <numans@ovn.org>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-March/368642.html
Tested-by: Numan Siddique <numans@ovn.org>
Acked-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Sometimes a server can stay in candidate role forever, even if the server
already see the new leader and handles append-requests normally. However,
because of the wrong role, it appears as disconnected from cluster and
so the clients are disconnected.
This problem happens when 2 servers become candidates in the same
term, and one of them is elected as leader in that term. It can be
reproduced by the test cases added in this patch.
The root cause is that the current implementation only changes role to
follower when a bigger term is observed (in raft_receive_term__()).
According to the RAFT paper, if another candidate becomes leader with
the same term, the candidate should change to follower.
This patch fixes it by changing the role to follower when leader
is being updated in raft_update_leader().
Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
| |
If there is never a leader known by the current server, it's status
should be "disconnected" to the cluster. Without this patch, when
a server in cluster is restarted, before it successfully connecting
back to the cluster it will appear as connected, which is wrong.
Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When "schema" field is found in read_db(), there can be two cases:
1. There is a schema change in clustered DB and the "schema" is the new one.
2. There is a install_snapshot RPC happened, which caused log compaction on the
server and the next log is just the snapshot, which always constains "schema"
field, even though the schema hasn't been changed.
The current implementation doesn't handle case 2), and always assume the schema
is changed hence disconnect all clients of the server. It can cause stability
problem when there are big number of clients connected when this happens in
a large scale environment.
Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
'dpdkr' ring ports was deprecated in 2.13 release and was not
actually used for a long time. Remove support now.
More details in
commit b4c5f00c339b ("netdev-dpdk: Deprecate ring ports.")
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
| |
Follow up to commit eb540c0f5fc8 ("flow: Fix parsing l3_ofs with partial
offloading") that fixed the issue, add a unit-test for it.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some partial offloading test cases are failing inconsistently. The root
cause is that dummy netdev is assigned with "linux_tc" offloading API.
dpif-netdev - partial hw offload - dummy
dpif-netdev - partial hw offload - dummy-pmd
dpif-netdev - partial hw offload with packet modifications - dummy
dpif-netdev - partial hw offload with packet modifications - dummy-pmd
This patch fixes this issue by changing 'options:ifindex=1' to some big
value. It is a workaround to make "linux_tc" init flow api failure. All
above cases can pass consistently after applying this patch.
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Gavin Hu <Gavin.Hu@arm.com>
Reviewed-by: Lijian Zhang <Lijian.Zhang@arm.com>
Signed-off-by: Yanqin Wei <Yanqin.Wei@arm.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In connection tracking system, a connection is established if we
see packets from both directions. However, in userspace datapath's
conntrack, if we send a connection setup packet in one direction
twice, it will make the connection to be in established state.
This patch fixes the aforementioned issue, and adds a system traffic
test for UDP and TCP traffic to avoid regression.
Fixes: a489b16854b59 ("conntrack: New userspace connection tracker.")
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
|
|
|
|
|
| |
Acked-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
| |
Acked-by: Han Zhou <hzhou@ovn.org>
Requested-by: Leonid Ryzhyk <lryzhyk@vmware.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
| |
New tests were introduced based on lcov report, which reveals apparent code
is not covered by ovs test suites.
Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|