| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, bundle->cvlans and xbundle->cvlans are pointing to the
same memory location. This can cause issues if the main thread
modifies bundle->cvlans and frees it while the revalidator thread
is still accessing xbundle->cvlans. This leads to use-after-free
error.
AddressSanitizer: heap-use-after-free on address 0x615000007b08 at
pc 0x0000004ede1e bp 0x7f3120ee0310 sp 0x7f3120ee0300
READ of size 8 at 0x615000007b08 thread T25 (revalidator25)
0 0x4ede1d in bitmap_is_set lib/bitmap.h:91
1 0x4fcb26 in xbundle_allows_cvlan ofproto/ofproto-dpif-xlate.c:2028
2 0x4fe279 in input_vid_is_valid ofproto/ofproto-dpif-xlate.c:2294
3 0x502abf in xlate_normal ofproto/ofproto-dpif-xlate.c:3051
4 0x5164dc in xlate_output_action ofproto/ofproto-dpif-xlate.c:5361
5 0x522576 in do_xlate_actions ofproto/ofproto-dpif-xlate.c:7047
6 0x52a751 in xlate_actions ofproto/ofproto-dpif-xlate.c:8061
7 0x4e2b66 in xlate_key ofproto/ofproto-dpif-upcall.c:2212
8 0x4e2e13 in xlate_ukey ofproto/ofproto-dpif-upcall.c:2227
9 0x4e345d in revalidate_ukey__ ofproto/ofproto-dpif-upcall.c:2276
10 0x4e3f85 in revalidate_ukey ofproto/ofproto-dpif-upcall.c:2395
11 0x4e7ac5 in revalidate ofproto/ofproto-dpif-upcall.c:2858
12 0x4d9ed3 in udpif_revalidator ofproto/ofproto-dpif-upcall.c:1010
13 0x7cd92e in ovsthread_wrapper lib/ovs-thread.c:423
14 0x7f312ff01f3a (/usr/lib64/libpthread.so.0+0x8f3a)
15 0x7f312fc8f51f in clone (/usr/lib64/libc.so.6+0xf851f)
0x615000007b08 is located 8 bytes inside of 512-byte region
[0x615000007b00,0x615000007d00)
freed by thread T0 here:
0 0x7f3130378ad8 in free (/usr/lib64/libasan.so.4+0xe0ad8)
1 0x49044e in bundle_set ofproto/ofproto-dpif.c:3431
2 0x444f92 in ofproto_bundle_register ofproto/ofproto.c:1455
3 0x40e6c9 in port_configure vswitchd/bridge.c:1300
4 0x40bcfd in bridge_reconfigure vswitchd/bridge.c:921
5 0x41f1a9 in bridge_run vswitchd/bridge.c:3313
6 0x42d4fb in main vswitchd/ovs-vswitchd.c:132
7 0x7f312fbbcc86 in __libc_start_main (/usr/lib64/libc.so.6+0x25c86)
previously allocated by thread T0 here:
0 0x7f3130378e70 in __interceptor_malloc
1 0x8757fe in xmalloc__ lib/util.c:140
2 0x8758da in xmalloc lib/util.c:175
3 0x875927 in xmemdup lib/util.c:188
4 0x475f63 in bitmap_clone lib/bitmap.h:79
5 0x47797c in vlan_bitmap_clone lib/vlan-bitmap.h:40
6 0x49048d in bundle_set ofproto/ofproto-dpif.c:3433
7 0x444f92 in ofproto_bundle_register ofproto/ofproto.c:1455
8 0x40e6c9 in port_configure vswitchd/bridge.c:1300
9 0x40bcfd in bridge_reconfigure vswitchd/bridge.c:921
10 0x41f1a9 in bridge_run vswitchd/bridge.c:3313
11 0x42d4fb in main vswitchd/ovs-vswitchd.c:132
12 0x7f312fbbcc86 in __libc_start_main (/usr/lib64/libc.so.6+0x25c86)
Fixes: fed8962aff57 ("Add new port VLAN mode "dot1q-tunnel"")
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The ofproto layer currently treats nw_proto field as overloaded to mean
both that a proper nw layer exists, as well as the value contained in
the header for the nw proto. However, this is incorrect behavior as
relevant standards permit that any value, including '0' should be treated
as a valid value.
Because of this overload, when the ofproto layer builds action list for
a packet with nw_proto of 0, it won't build the complete action list that
we expect to be built for the packet. That will cause a bad behavior
where all packets passing the datapath will fall into an incomplete
action set.
The fix here is to unwildcard nw_proto, allowing us to preserve setting
actions for protocols which we know have support for the actions we
program. This means that a traffic which contains nw_proto == 0 cannot
cause connectivity breakage with other traffic on the link.
Reported-by: David Marchand <dmarchand@redhat.com>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2134873
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SRv6 (Segment Routing IPv6) tunnel vport is responsible
for encapsulation and decapsulation the inner packets with
IPv6 header and an extended header called SRH
(Segment Routing Header). See spec in:
https://datatracker.ietf.org/doc/html/rfc8754
This patch implements SRv6 tunneling in userspace datapath.
It uses `remote_ip` and `local_ip` options as with existing
tunnel protocols. It also adds a dedicated `srv6_segs` option
to define a sequence of routers called segment list.
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
OVS now just allow the ARP Reply which the destination address is matched
against the known xbridge addresses to update tunnel neighbor. So when OVS
receive the gratuitous ARP from underlay gateway which the source address
and destination address are all gateway IP, tunnel neighbor will not be updated.
Fixes: ba07cf222a0c ("Handle gratuitous ARP requests and replies in tnl_arp_snoop()")
Fixes: 83c2757bd16e ("xlate: Move tnl_neigh_snoop() to terminate_native_tunnel()")
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Han Ding <handing@chinatelecom.cn>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
OVN can (and indeed does) set in_port to OFPP_NONE during
the pipeline evaluation. If a sample action follows, it
will be incorrectly skipped.
Per-flow sampling version of:
f0a9000ca ofproto: Fix ipfix not always sampling on egress.
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
| |
The OpenFlow specification says that buckets in select groups with a weight
of zero should not be selected, but the ofproto-dpif implementation could
select them in corner cases. This fixes the problem.
Reported-by: ychen <ychen103103@163.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/359349.html
Signed-off-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When OVS fails to find an OpenFlow port for a packet received
from the upcall it just prints the warning like this:
|INFO|received packet on unassociated datapath port N
However, during the flow translation more information is available
as if the recirculation id wasn't found or it was a packet from
unknown tunnel port. Printing that information might be useful
to understand the origin of the problem.
Port translation functions already support extended error strings,
we just need to pass a variable where to store them.
With the change the output may be:
|INFO|received packet on unassociated datapath port N
(no OpenFlow port for datapath port N)
or
|INFO|received packet on unassociated datapath port N
(no OpenFlow tunnel port for this packet)
or
|INFO|received packet on unassociated datapath port N
(no recirculation data for recirc_id M)
Unfortunately, there is no good way to trigger this code from
current unit tests.
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the PACKET_OUT from controller ends up with sending packet to
a bond interface, the main thread will take locks in the following
order:
handle_openflow
--> take ofproto_mutex
handle_packet_out
packet_xlate
output_normal
bond_update_post_recirc_rules
--> take rwlock in bond.c
If at the same time revalidator thread is processing other packet
with the output to the same bond:
xlate_actions
output_normal
bond_update_post_recirc_rules
--> take rwlock in bond.c
update_recirc_rules
ofproto_dpif_add_internal_flow
ofproto_flow_mod
--> take ofproto_mutex
So, it is possible for these 2 threads to lock each other by
taking one lock and waiting for another thread to release the
second lock.
It is also possible for the main thread to lock itself up by trying
to acquire ofproto_mutex for the second time, if it will actually
proceed with update_recirc_rules() after taking the bond rwlock.
The problem appears to be that bond_update_post_recirc_rules()
is called during the flow translation even if side effects are
prohibited, which is the case for openflow PACKET_OUT handling.
Skipping actual flow updates during the flow translation if
side effects are disabled to avoid the deadlock.
Since flows are not installed now when actions translated for
very first packet, installing initial flows in bond_reconfigure().
This will cover the case of allocating a new recirc_id.
Also checking if we need to update flows in bond_run() to cover
link state changes.
Regression test is added to catch the double lock case.
Reported-at: https://github.com/openvswitch/ovs-issues/issues/259
Reported-by: Daniel Ding <zhihui.ding@easystack.cn>
Fixes: adcf00ba35a0 ("ofproto/bond: Implement bond megaflow using recirculation")
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Most of commands in ovs-ofctl and ovs-appctl can display port names
instead of port numbers by using --names option. This change adds
similar functionality to ofproto/trace.
For backward compatibility, the default behavior is the same as
before.
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
| |
If tnl_port_should_receive() is false, we're looking for a normal
port, not a tunnel one. And it's better to print recirculation IDs
in hex since they are typically printed this way in flow dumps.
Fixes: d40533fc820c ("odp-util: Improve log messages and error reporting for Netlink parsing.")
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When OFPROTO non-reversible actions are translated to data plane
actions, the only thing looked at is if there are more actions
pending. If this is the case, the action is encapsulated in a
clone().
This could lead to unnecessary clones if no meaningful data
plane actions are added. For example, the register pop in the
included test case.
The best solution would probably be to build the full action
path and determine if the clone is needed. However, this would
be a huge change in the existing design, so for now, we just try
to optimize the generated datapath flow. We can revisit this
later, as some of the pending CT issues might need this rework.
Fixes: feee58b9587f ("ofproto-dpif-xlate: Keep track of the last action")
Fixes: dadd8357f224 ("ofproto-dpif: Fix issue with non-reversible actions on a patch ports.")
Acked-by: Ales Musil <amusil@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
| |
A packet go through the encap openflow(set_field tun_id/src/dst)
The tunnel wc bits will be set. But it should be clear if the
original packet is non-tunnel. It is not necessary for datapath
wc the tunnel info for match(like the similar logic for vlan).
Signed-off-by: wenxu <wenxu@chinatelecom.cn>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
| |
When OVS sees a tunnel push with a nested list next, it will not
clone the packet, as a clone is not needed. However, a clone action will
still be created with the tunnel push encapsulated inside. There is no
need to create the clone action in this case, as extra parsing will need
to be performed, which is less efficient.
Signed-off-by: Rosemarie O'Riorden <roriorden@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In some circumstances a flow may get its ct_state set without
conscious intervention by the OVS user space code.
Commit 355fef6f2ccbc optimizes out unnecessary ct_clear actions
based on an internal struct xlate_ctx->conntracked state flag.
Before this commit the xlate_ctx->conntracked state flag would
be initialized to 'false' and only set during thawing for
recirculation.
This patch checks the flow ct_state for the non-recirc case and
sets the internal conntracked state appropriately. A system
traffic test is also added to avoid regression.
Fixes: 355fef6f2ccbc ("ofproto-dpif-xlate: Avoid successive ct_clear datapath actions.")
Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit c645550bb249 ("odp-util: Always report
ODP_FIT_TOO_LITTLE for IGMP.")
Always forcing a slow path action can result in some over-broad
flows which swallow all traffic and force them to userspace, as reported
in the thread at
https://mail.openvswitch.org/pipermail/ovs-dev/2021-September/387706.html
and at
https://mail.openvswitch.org/pipermail/ovs-dev/2021-September/387689.html
Revert the ODP_FIT_TOO_LITTLE return for IGMP specifically.
Additionally, remove the userspace wc mask to prevent revalidator from
cycling flows.
Fixes: c645550bb249 ("odp-util: Always report ODP_FIT_TOO_LITTLE for IGMP.")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During native tunnel encapsulation process, on tunnel neighbor cache
miss OVS sends an arp/nd request. Currently, tunnel source is used
as arp spa.
Find the spa which has the same subnet with the nexthop of tunnel dst
on egress port, if false, use the tunnel src as spa.
For example:
tunnel src is a vip with 10.0.0.7/32, tunnel dst is 10.0.1.7
the br-phy with address 192.168.0.7/24 and the default gateway is 192.168.0.1
So the spa of arp request for 192.168.0.1 should be 192.168.0.7 but not 10.0.0.7
Signed-off-by: wenxu <wenxu@chinatelecom.cn>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
UB Sanitizer reports:
tests/test-hash.c:59:40:
runtime error: shift exponent 64 is too large for 64-bit type
'long unsigned int'
0 0x44c3c9 in get_range128 tests/test-hash.c:59
1 0x44cb2e in check_hash_bytes128 tests/test-hash.c:178
2 0x44d14d in test_hash_main tests/test-hash.c:282
[...]
ofproto/ofproto-dpif-xlate.c:5607:45:
runtime error: left shift of 65535 by 16 places cannot be represented
in type 'int'
0 0x53fe9f in xlate_sample_action ofproto/ofproto-dpif-xlate.c:5607
1 0x54d625 in do_xlate_actions ofproto/ofproto-dpif-xlate.c:7160
2 0x553b76 in xlate_actions ofproto/ofproto-dpif-xlate.c:7806
3 0x4fcb49 in upcall_xlate ofproto/ofproto-dpif-upcall.c:1237
4 0x4fe02f in process_upcall ofproto/ofproto-dpif-upcall.c:1456
5 0x4fda99 in upcall_cb ofproto/ofproto-dpif-upcall.c:1358
[...]
tests/test-util.c:89:23:
runtime error: left shift of 1 by 31 places cannot be represented in
type 'int'
0 0x476415 in test_ctz tests/test-util.c:89
[...]
lib/dpif-netlink.c:396:33:
runtime error: left shift of 1 by 31 places cannot be represented in
type 'int'
0 0x571b9f in dpif_netlink_open lib/dpif-netlink.c:396
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
During the revalidation/upcall, mirror could be removed. Instead of crash
the process, we can simply skip the deleted mirror.
The issue had been triggered multiple times by ovs-tcpdump in my test.
Fixes: ec7ceaed4f3e ("ofproto-dpif: Modularize mirror code.")
Signed-off-by: lic121 <lic121@chinatelecom.cn>
Tested-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a packet is received over an access port that needs to be sent
over a vxlan tunnel,the access port VLAN id is used in the lookup
leading to a wrong packet being crafted and sent over the tunnel.
Clear out the flow 's VLAN field as it should not be used while
performing mac lookup for the outer tunnel and also at this point
the VLAN action related to inner flow is already committed.
Fixes: 7c12dfc527a5 ("tunneling: Avoid datapath-recirc by combining recirc actions at xlate.")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2022-April/393566.html
Reported-at: https://bugzilla.redhat.com/2060552
Signed-off-by: Thilak Raj Surendra Babu <thilakraj.sb@nutanix.com>
Signed-off-by: Rosemarie O'Riorden <roriorden@redhat.com>
Co-authored-by: Rosemarie O'Riorden <roriorden@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A packet received from a tunnel port with legacy_l3 packet-type (e.g.
lisp, L3 gre, gtpu) is conceptually wrapped in a dummy Ethernet header
for processing in an OF pipeline that is not packet-type-aware. Before
transmission of the packet to another legacy_l3 tunnel port, the dummy
Ethernet header is stripped again.
In ofproto-xlate, wrapping in the dummy Ethernet header is done by
simply changing the packet_type to PT_ETH. The generation of the
push_eth datapath action is deferred until the packet's flow changes
need to be committed, for example at output to a normal port. The
deferred Ethernet encapsulation is marked in the pending_encap flag.
This patch fixes a bug in the translation of the output action to a
legacy_l3 tunnel port, where the packet_type of the flow is reverted
from PT_ETH to PT_IPV4 or PT_IPV6 (depending on the dl_type) to remove
its Ethernet header without clearing the pending_encap flag if it was
set. At the subsequent commit of the flow changes, the unexpected
combination of pending_encap == true with an PT_IPV4 or PT_IPV6
packet_type hit the OVS_NOT_REACHED() abortion clause.
The pending_encap is now cleared in this situation.
Reported-by: Dincer Beken <dbeken@blackned.de>
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Co-authored-by: Dincer Beken <dbeken@blackned.de>
Signed-off-by: Dincer Beken <dbeken@blackned.de>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Considering the following flows:
ovs-ofctl dump-flows br0
cookie=0x0, table=0, priority=0 actions=NORMAL
and assuming a packet originated from packet-out in this way:
ovs-ofctl packet-out br0 \
"in_port=controller,packet=<UDP packet>,action=ct(table=0)"
If in_port is OFPP_NONE or OFPP_CONTROLLER, this leads to a
NULL pointer (xport) dereference in xlate_normal().
Fix it by checking the xport pointer validity while deciding whether
it is a candidate for mac learning or not.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using SHORT version of the *_SAFE loops makes the code cleaner and less
error prone. So, use the SHORT version and remove the extra variable
when possible for hmap and all its derived types.
In order to be able to use both long and short versions without changing
the name of the macro for all the clients, overload the existing name
and select the appropriate version depending on the number of arguments.
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using the SHORT version of the *_SAFE loops makes the code cleaner
and less error-prone. So, use the SHORT version and remove the extra
variable when possible.
In order to be able to use both long and short versions without changing
the name of the macro for all the clients, overload the existing name
and select the appropriate version depending on the number of arguments.
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The ofproto map 'all_ofproto_dpifs_by_uuid' does not support
concurrent accesses. It is however read by upcall handler threads
and written by the main thread at the same time.
Additionally, handler threads will change the ams_seq while
an ofproto is being destroyed, triggering crashes with the
following backtrace:
(gdb) bt
hmap_next (hmap.h:398)
seq_wake_waiters (seq.c:326)
seq_change_protected (seq.c:134)
seq_change (seq.c:144)
ofproto_dpif_send_async_msg (ofproto_dpif.c:263)
process_upcall (ofproto_dpif_upcall.c:1782)
recv_upcalls (ofproto_dpif_upcall.c:1026)
udpif_upcall_handler (ofproto/ofproto_dpif_upcall.c:945)
ovsthread_wrapper (ovs_thread.c:734)
To solve both issues, remove the 'all_ofproto_dpifs_by_uuid'.
Instead, another map already storing ofprotos in xlate can be used.
During an ofproto destruction, its reference is removed from the current
xlate xcfg. Such change is committed only after all threads have quiesced
at least once during xlate_txn_commit(). This wait ensures that the
removal is seen by all threads, rendering impossible for a thread to
still hold a reference while the destruction proceeds.
Furthermore, the xlate maps are copied during updates instead of
being written in place. It is thus correct to read xcfg->xbridges while
inserting or removing from new_xcfg->xbridges.
Finally, now that ofproto_dpifs lookups are done through xcfg->xbridges,
it is important to use a high level of entropy. As it used the ofproto pointer
hashed, fewer bits were random compared to the uuid key used in
'all_ofproto_dpifs_by_uuid'. To solve this, use the ofproto uuid as the key
in xbridges as well, improving entropy.
Fixes: fcb9579be3c7 ("ofproto: Add 'ofproto_uuid' and 'ofp_in_port' to user action cookie.")
Suggested-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Alin-Gabriel Serdean <aserdean@ovn.org>
Tested-by: Alin-Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Gaetan Rivet <grive@u256.net>
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Co-authored-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From hepeng:
https://patchwork.ozlabs.org/project/openvswitch/patch/20200717015041.82746-1-hepeng.0320@bytedance.com/#2487473
also from guohongzhi <guohongzhi1@huawei.com>:
http://patchwork.ozlabs.org/project/openvswitch/patch/20200306130555.19884-1-guohongzhi1@huawei.com/
also from a discussion about the mixing use of RCU and refcount in the mail
list with Ilya Maximets, William Tu, Ben Pfaf, and Gaëtan Rivet.
A summary, as quoted from Ilya:
"
RCU for ofproto was introduced for one
and only one reason - to avoid freeing ofproto while rules are still
alive. This was done in commit f416c8d61601 ("ofproto: RCU postpone
rule destruction."). The goal was to allow using rules without
refcounting them within a single grace period. And that forced us
to postpone destruction of the ofproto for a single grace period.
Later commit 39c9459355b6 ("Use classifier versioning.") made it
possible for rules to be alive for more than one grace period, so
the commit made ofproto wait for 2 grace periods by double postponing.
As we can see now, that wasn't enough and we have to wait for more
than 2 grace periods in certain cases.
"
In a short, the ofproto should have a longer life time than rule, if
the rule lasts for more than 2 grace periods, the ofproto should live
longer to ensure rule->ofproto is valid. It's hard to predict how long
a ofproto should live, thus we need to use refcount on ofproto to make
things easy. The controversial part is that we have already used RCU postpone
to delay ofproto destrution, if we have to add refcount, is it simpler to
use just refcount without RCU postpone?
IMO, I think going back to the pure refcount solution is more
complicated than mixing using both.
Gaëtan Rive asks some questions on guohongzhi's v2 patch:
during ofproto_rule_create, should we use ofproto_ref
or ofproto_try_ref? how can we make sure the ofproto is alive?
By using RCU, ofproto has three states:
state 1: alive, with refcount >= 1
state 2: dying, with refcount == 0, however pointer is valid
state 3: died, memory freed, pointer might be dangling.
Without using RCU, there is no state 2, thus, we have to be very careful
every time we see a ofproto pointer. In contrast, with RCU, we can be sure
that it's alive at least in this grace peroid, so we can just check if
it is dying by ofproto_try_ref.
This shows that by mixing use of RCU and refcount we can save a lot of work
worrying if ofproto is dangling.
In short, the RCU part makes sure the ofproto is alive when we use it,
and the refcount part makes sure it lives longer enough.
In this patch, I have merged guohongzhi's patch and mine, and fixes
accoring to the previous comments.
Acked-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Gaetan Rivet <grive@u256.net>
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Alin-Gabriel Serdean <aserdean@ovn.org>
Tested-by: Alin-Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Peng He <hepeng.0320@bytedance.com>
Co-authored-by: Hongzhi Guo <guohongzhi1@huawei.com>
Signed-off-by: Hongzhi Guo <guohongzhi1@huawei.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We are currently requiring in_port to be a valid port number for ipfix
sampling even if the sampling is done on the output port (egress).
This restriction is not really needed and can affect pipelines that
intentionally set the in_port to OFPP_NONE during flow processing. For
instance, OVN does this, see:
cfa547821 Fix ovn-controller generated packets from getting dropped for
reject ACL action.
This patch skips ipfix sampling only if both (ingress and egress) ports
are invalid.
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2016346
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
| |
Added PT_MPLS_MC support in function xlate_generic_decap_action to fix
packet drops when decap action is applied on packets with packet_type
(ns=1,type=0x8848).
Fixes: 1917ace89364 ("Encap & Decap actions for MPLS packet type.")
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For patch ports, the is_last_action value is not propagated and is
always set to true. This causes non-reversible actions to modify the
packet, and the original content is not preserved when processing
the remaining actions.
This patch propagates the is_last_action flag for patch port related
actions. In addition, it also fixes a general last action propagation
to the individual actions.
Fixed check_pkt_larger as last action, as it is a valid case for the
drop action, so it should not be skipped.
Fixes: feee58b95 ("ofproto-dpif-xlate: Keep track of the last action")
Fixes: 5b34f8fc3 ("Add a new OVS action check_pkt_larger")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The encap & decap actions are extended to support MPLS packet type.
Encap & decap actions adds and removes MPLS header at start of the
packet.
The existing PUSH MPLS & POP MPLS actions inserts & removes MPLS
header between ethernet header and the IP header. Though this behaviour
is fine for L3 VPN where an IP packet is encapsulated inside a MPLS
tunnel, it does not suffice the L2 VPN requirements. In L2 VPN the
ethernet packets must be encapsulated inside MPLS tunnel.
In this change the encap & decap actions are extended to support MPLS
packet type. The encap & decap adds and removes MPLS header at the
start of packet as depicted below.
Encapsulation:
Actions - encap(mpls),encap(ethernet)
Incoming packet -> | ETH | IP | Payload |
1 Actions - encap(mpls) [Datapath action - ADD_MPLS:0x8847]
Outgoing packet -> | MPLS | ETH | Payload|
2 Actions - encap(ethernet) [ Datapath action - push_eth ]
Outgoing packet -> | ETH | MPLS | ETH | Payload|
Decapsulation:
Incoming packet -> | ETH | MPLS | ETH | IP | Payload |
Actions - decap(),decap(packet_type(ns=0,type=0))
1 Actions - decap() [Datapath action - pop_eth)
Outgoing packet -> | MPLS | ETH | IP | Payload|
2 Actions - decap(packet_type(ns=0,type=0)) [Datapath action - POP_MPLS:0x6558]
Outgoing packet -> | ETH | IP | Payload|
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In case of native tunnel with bfd enabled, if the MAC address of the
remote end's interface changes (e.g. because it got rebooted, and the
MAC address is allocated dynamically), the BFD session will never be
re-established.
This happens because the local tunnel neigh entry doesn't get updated,
and the local end keeps sending BFD packets with the old destination
MAC address. This was not an issue until
b23ddcc57d41 ("tnl-neigh-cache: tighten arp and nd snooping.")
because ARP requests were snooped as well avoiding the problem.
Fix this by snooping the incoming packets in the slow path, and
updating the neigh cache accordingly.
Fixes: b23ddcc57d41 ("tnl-neigh-cache: tighten arp and nd snooping.")
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2002430
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Gaetan Rivet <grive@u256.net>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a minor issue but visible e.g. when you try to flush the neigh
cache while the ARP flow is still present in the datapath, triggering
the revalidation of the datapath flows which subsequently
refreshes/adds the entry in the cache.
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Gaetan Rivet <grive@u256.net>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
| |
It is useful to also log when tnl_port_build_header() failed.
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit dc0bd12f5b04 removed restriction that tunnel endpoint must be a
bridge port. So, currently OVS has to check if the native tunnel needs
to be terminated regardless of the output port. Unfortunately, there
is a side effect: tnl_port_map_lookup() always adds at least 'dl_dst'
match to the megaflow that ends up in the corresponding datapath flow.
And since tunneling works on L3 level and not restricted by any
particular bridge, this extra match criteria is added to every
datapath flow on every bridge even if that bridge cannot be part of
a tunnel processing.
For example, if OVS has at least one tunnel configured and we're
adding a completely separate bridge with 2 ports and simple rules
to forward packets between two ports, there still will be a match on
a destination mac address:
1. <create a tunnel configuration in OVS>
2. ovs-vsctl add-br br-non-tunnel -- set bridge datapath_type=netdev
3. ovs-vsctl add-port br-non-tunnel port0
-- add-port br-non-tunnel port1
4. ovs-ofctl del-flows br-non-tunnel
5. ovs-ofctl add-flow br-non-tunnel in_port=port0,actions=port1
6. ovs-ofctl add-flow br-non-tunnel in_port=port1,actions=port0
# ovs-appctl ofproto/trace br-non-tunnel in_port=port0
Flow: in_port=1,vlan_tci=0x0000,
dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,dl_type=0x0000
bridge("br-non-tunnel")
-----------------------
0. in_port=1, priority 32768
output:2
Final flow: unchanged
Megaflow: recirc_id=0,eth,in_port=1,dl_dst=00:00:00:00:00:00,dl_type=0x0000
Datapath actions: 5 ^^^^^^^^^^^^^^^^^^^^^^^^
This increases the number of upcalls and installed datapath flows,
since separate flow needs to be installed per destination MAC, reducing
the switching performance. This also blocks datapath performance
optimizations that are based on the datapath flow simplicity.
In general, in order to be a tunnel endpoint, port has to have an IP
address. Hence native tunnel termination should be attempted only
for such ports. This allows to avoid extra matches in most cases.
Fixes: dc0bd12f5b04 ("userspace: Enable non-bridge port as tunnel endpoint.")
Reported-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-October/388904.html
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Mike Pattrick <mkp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
xlate_check_pkt_larger() sets ctx->exit to 'true' at the end
causing the translation to stop. This results in incomplete
datapath rules.
For example, for the below OF rules configured on a bridge,
table=0,in_port=1 actions=load:0x1->NXM_NX_REG1[[]],resubmit(,1),
load:0x2->NXM_NX_REG1[[]],resubmit(,1),
load:0x3->NXM_NX_REG1[[]],resubmit(,1)
table=1,in_port=1,reg1=0x1 actions=check_pkt_larger(200)->NXM_NX_REG0[[0]],
resubmit(,4)
table=1,in_port=1,reg1=0x2 actions=output:2
table=1,in_port=1,reg1=0x3 actions=output:4
table=4,in_port=1 actions=output:3
The datapath flow should be:
check_pkt_len(size=200,gt(3),le(3)),2,4
But right now it is:
check_pkt_len(size=200,gt(3),le(3))
Actions after the first resubmit(,1) in the first flow in table 0
are never applied. This patch fixes this issue.
Fixes: 5b34f8fc3b38 ("Add a new OVS action check_pkt_larger")
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2018365
Reported-by: Ihar Hrachyshka <ihrachys@redhat.com>
Signed-off-by: Numan Siddique <numans@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CT zone could be set from a field that is not included in frozen
metadata. Consider the example rules which are typically seen in
OpenStack security group rules:
priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
The zone is set from the first rule's ct action. These two rules will
generate two megaflows: the first one uses zone=5 to query the CT module,
the second one sets the zone-id from the first megaflow and commit to CT.
The current implementation will generate a megaflow that does not use
ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone is
set by an Imm not a field.
Consider a situation that one changes the zone id (for example to 15)
in the first rule, however, still keep the second rule unchanged. During
this change, there is traffic hitting the two generated megaflows, the
revaldiator would revalidate all megaflows, however, the revalidator will
not change the second megaflow, because zone=5 is recorded in the
megaflow, so the xlate will still translate the commit action into zone=5,
and the new traffic will still commit to CT as zone=5, not zone=15,
resulting in taffic drops and other issues.
Just like OVS set-field convention, if a field X is set by Y
(Y is a variable not an Imm), we should also mask Y as a match
in the generated megaflow. An exception is that if the zone-id is
set by the field that is included in the frozen state (i.e. regs) and this
upcall is a resume of a thawed xlate, the un-wildcarding can be skipped,
as the recirc_id is a hash of the values in these fields, and it will change
following the changes of these fields. When the recirc_id changes,
all megaflows with the old recirc id will be invalid later.
Fixes: 07659514c3 ("Add support for connection tracking.")
Reported-by: Sai Su <susai.ss@bytedance.com>
Signed-off-by: Peng He <hepeng.0320@bytedance.com>
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Open vSwitch supports OpenFlow "instructions", which were introduced in
OpenFlow 1.1 and act like restricted kinds of actions that can only
appear in a particular order and particular circumstances. OVS did
not support two of these instructions, "write_metadata" and
"goto_table", properly in the case where they appeared in a flow that
needed to be frozen for continuations.
Both of these instructions had the problem that they couldn't be
properly serialized into the stream of actions, because they're not
actions. This commit fixes that problem in freeze_unroll_actions()
by converting them into equivalent actions for serialization.
goto_table had the additional problem that it was being serialized to
the frozen stream even after it had been executed. This was already
properly handled in do_xlate_actions() for resubmit, which is almost
equivalent to goto_table, so this commit applies the same fix to
goto_table. (The commit removes an assertion from the goto_table
implementation, but there wasn't any real value in that assertion and
I thought the code looked cleaner without it.)
This commit adds tests that would have found these bugs. This includes
adding a variant of each continuation test that uses OF1.3 for
monitor/resume (which is necessary to trigger these bugs) plus specific
tests for continuations with goto_table and write_metadata. It also
improves the continuation test infrastructure to add more detail on
the problem if a test fails.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reported-by: Grayson Wu <wgrayson@vmware.com>
Reported-at: https://github.com/openvswitch/ovs-issues/issues/213
Discussed-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/386166.html
Acked-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently there is an option to add/flush/show ARP/ND neighbor. This
covers L3 side. For L2 side, there is only fdb show command. This
commit gives an option to add/del an fdb entry via ovs-appctl.
CLI command looks like:
To add:
ovs-appctl fdb/add <bridge> <port> <vlan> <Mac>
ovs-appctl fdb/add br0 p1 0 50:54:00:00:00:05
To del:
ovs-appctl fdb/del <bridge> <vlan> <Mac>
ovs-appctl fdb/del br0 0 50:54:00:00:00:05
Added two new APIs to provide convenient interface to add and delete
static-macs.
bool xlate_add_static_mac_entry(const struct ofproto_dpif *,
ofp_port_t in_port,
struct eth_addr dl_src, int vlan);
bool xlate_delete_static_mac_entry(const struct ofproto_dpif *,
struct eth_addr dl_src, int vlan);
1. Static entry should not age. To indicate that entry being
programmed is a static entry, 'expires' field in 'struct mac_entry'
will be set to a MAC_ENTRY_AGE_STATIC_ENTRY. A check for this value
is made while deleting mac entry as part of regular aging process.
2. Another change to the mac-update logic, when a packet with same
dl_src as that of a static-mac entry arrives on any port, the logic
will not modify the expires field.
3. While flushing fdb entries, made sure static ones are not evicted.
4. Updated "ovs-appctl fdb/stats-show br0" to display number of static
entries in switch
Added following tests:
ofproto-dpif - static-mac add/del/flush
ofproto-dpif - static-mac mac moves
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048894.html
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1597752
Signed-off-by: Vasu Dasari <vdasari@gmail.com>
Tested-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Due to flow lookup optimizations, especially in the resubmit/clone cases,
we might end up with multiple ct_clear actions, which are not necessary.
This patch only adds the ct_clear action to the datapath if any ct state
is tracked.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a decap action is applied on NSH header encapsulating a
ethernet packet a redundant set mac address action is programmed
to the datapath.
Fixes: f839892a206a ("OF support and translation of generic encap and decap")
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are various L3 encapsulation standards using UDP being discussed to
leverage the UDP based load balancing capability of different networks.
MPLSoUDP (__ https://tools.ietf.org/html/rfc7510) is one among them.
The Bareudp tunnel provides a generic L3 encapsulation support for
tunnelling different L3 protocols like MPLS, IP, NSH etc. inside a UDP
tunnel.
An example to create bareudp device to tunnel MPLS traffic is
given
$ ovs-vsctl add-port br_mpls udp_port -- set interface udp_port \
type=bareudp options:remote_ip=2.1.1.3
options:local_ip=2.1.1.2 \
options:payload_type=0x8847 options:dst_port=6635
The bareudp device supports special handling for MPLS & IP as
they can have multiple ethertypes. MPLS procotcol can have ethertypes
ETH_P_MPLS_UC (unicast) & ETH_P_MPLS_MC (multicast). IP protocol can have
ethertypes ETH_P_IP (v4) & ETH_P_IPV6 (v6).
The bareudp device to tunnel L3 traffic with multiple ethertypes
(MPLS & IP) can be created by passing the L3 protocol name as string in
the field payload_type. An example to create bareudp device to tunnel
MPLS unicast & multicast traffic is given below.::
$ ovs-vsctl add-port br_mpls udp_port -- set interface
udp_port \
type=bareudp options:remote_ip=2.1.1.3
options:local_ip=2.1.1.2 \
options:payload_type=mpls options:dst_port=6635
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-By: Greg Rose <gvrose8192@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Too big userdata could overflow netlink message leading to out-of-bound
memory accesses or assertion while formatting nested actions.
Fix that by checking the size and returning correct error code.
Credit to OSS-Fuzz.
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=27640
Fixes: e995e3df57ea ("Allow OVS_USERSPACE_ATTR_USERDATA to be variable length.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Flavio Leitner <fbl@sysclose.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According with rfc4541 section 2.1.1, a snooping switch
should forward membership reports only to ports with
routers attached.The current code violates the RFC
forwarding membership reports to group ports as well.
The same issue doesn't exist with IPv4.
Fixes: 06994f879c ("mcast-snooping: Add Multicast Listener Discovery support")
Signed-off-by: XiaoXiong Ding <dingxiaoxiong@huawei.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new term is "member".
Most of these changes should not change user-visible behavior. One
place where they do is in "ovs-ofctl dump-flows", which will now output
"members:..." inside "bundle" actions instead of "slaves:...". I don't
expect this to cause real problems in most systems. The old syntax
is still supported on input for backward compatibility.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
In OVS, flows with output over a bond interface of type “balance-tcp”
gets translated by the ofproto layer into "HASH" and "RECIRC" datapath
actions. After recirculation, the packet is forwarded to the bond
member port based on 8-bits of the datapath hash value computed through
dp_hash. This causes performance degradation in the following ways:
1. The recirculation of the packet implies another lookup of the
packet’s flow key in the exact match cache (EMC) and potentially
Megaflow classifier (DPCLS). This is the biggest cost factor.
2. The recirculated packets have a new “RSS” hash and compete with the
original packets for the scarce number of EMC slots. This implies more
EMC misses and potentially EMC thrashing causing costly DPCLS lookups.
3. The 256 extra megaflow entries per bond for dp_hash bond selection
put additional load on the revalidation threads.
Owing to this performance degradation, deployments stick to “balance-slb”
bond mode even though it does not do active-active load balancing for
VXLAN- and GRE-tunnelled traffic because all tunnel packet have the
same source MAC address.
Proposed optimization:
This proposal introduces a new load-balancing output action instead of
recirculation.
Maintain one table per-bond (could just be an array of uint16's) and
program it the same way internal flows are created today for each
possible hash value (256 entries) from ofproto layer. Use this table to
load-balance flows as part of output action processing.
Currently xlate_normal() -> output_normal() ->
bond_update_post_recirc_rules() -> bond_may_recirc() and
compose_output_action__() generate 'dp_hash(hash_l4(0))' and
'recirc(<RecircID>)' actions. In this case the RecircID identifies the
bond. For the recirculated packets the ofproto layer installs megaflow
entries that match on RecircID and masked dp_hash and send them to the
corresponding output port.
Instead, we will now generate action as
'lb_output(<bond id>)'
This combines hash computation (only if needed, else re-use RSS hash)
and inline load-balancing over the bond. This action is used *only* for
balance-tcp bonds in userspace datapath (the OVS kernel datapath
remains unchanged).
Example:
Current scheme:
With 8 UDP flows (with random UDP src port):
flow-dump from pmd on cpu core: 2
recirc_id(0),in_port(7),<...> actions:hash(hash_l4(0)),recirc(0x1)
recirc_id(0x1),dp_hash(0xf8e02b7e/0xff),<...> actions:2
recirc_id(0x1),dp_hash(0xb236c260/0xff),<...> actions:1
recirc_id(0x1),dp_hash(0x7d89eb18/0xff),<...> actions:1
recirc_id(0x1),dp_hash(0xa78d75df/0xff),<...> actions:2
recirc_id(0x1),dp_hash(0xb58d846f/0xff),<...> actions:2
recirc_id(0x1),dp_hash(0x24534406/0xff),<...> actions:1
recirc_id(0x1),dp_hash(0x3cf32550/0xff),<...> actions:1
New scheme:
We can do with a single flow entry (for any number of new flows):
in_port(7),<...> actions:lb_output(1)
A new CLI has been added to dump datapath bond cache as given below.
# ovs-appctl dpif-netdev/bond-show [dp]
Bond cache:
bond-id 1 :
bucket 0 - slave 2
bucket 1 - slave 1
bucket 2 - slave 2
bucket 3 - slave 1
Co-authored-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com>
Signed-off-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com>
Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>
Tested-by: Matteo Croce <mcroce@redhat.com>
Tested-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When ofproto/trace detects a recirc action it resumes execution at the
specified next table. However, if the ct action performs SNAT/DNAT,
e.g., ct(commit,nat(src=1.1.1.1:4000),table=42), the src/dst IPs and
ports in the oftrace_recirc_node->flow field are not updated. This leads
to misleading outputs from ofproto/trace as real packets would actually
first get NATed and might match different flows when recirculated.
Assume the first IP/port from the NAT src/dst action will be used by
conntrack for the translation and update the oftrace_recirc_node->flow
accordingly. This is not entirely correct as conntrack might choose a
different IP/port but the result is more realistic than before.
This fix covers new connections. However, for reply traffic that executes
actions of the form ct(nat, table=42) we still don't update the flow as
we don't have any information about conntrack state when tracing.
Also move the oftrace_recirc_node processing out of ofproto_trace()
and to its own function, ofproto_trace_recirc_node() for better
readability/
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a new OpenFlow action, delete field, to delete a
field in packets. Currently, only the tun_metadata fields are
supported.
One use case to add this action is to support multiple versions
of geneve tunnel metadatas to be exchanged among different versions
of networks. For example, we may introduce tun_metadata2 to
replace old tun_metadata1, but still want to provide backward
compatibility to the older release. In this case, in the new
OpenFlow pipeline, we would like to support the case to receive a
packet with tun_metadata1, do some processing. And if the packet
is going to a switch in the newer release, we would like to delete
the value in tun_metadata1 and set a value into tun_metadata2.
Currently, ovs does not provide an action to remove a value in
tun_metadata if the value is present. This patch fulfills the gap
by adding the delete_field action. For example, the OpenFlow
syntax to delete tun_metadata1 is:
actions=delete_field:tun_metadata1
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Tun_metadata can be referened by flow and frozen_state at the same
time. When ovs-vswitchd handles TLV table mod message, the involved
tun_metadata gets freed. The call trace to free tun_metadata is
shown as below:
ofproto_run
- handle_openflow
- handle_single_part_openflow
- handle_tlv_table_mod
- tun_metadata_table_mod
- tun_metadata_postpone_free
Unfortunately, this tun_metadata can be still used by some frozen_state,
and later on when frozen_state tries to access its tun_metadata table,
ovs-vswitchd crashes. The call trace to access tun_metadata from
frozen_state is shown as below:
udpif_upcall_handler
- recv_upcalls
- process_upcall
- frozen_metadata_to_flow
It is unsafe for frozen_state to reference tun_table because tun_table
is protected by RCU while the lifecycle of frozen_state can span several
RCU quiesce states. Current code violates OVS's RCU protection mechanism.
This patch fixes it by simply stopping frozen_state from referencing
tun_table. If frozen_state needs tun_table, the latest valid tun_table
can be found through ofproto_get_tun_tab() efficiently.
A previous commit seems fixing the samiliar issue:
254878c18874f6 (ofproto-dpif-xlate: Fix segmentation fault caused by tun_table)
VMware-BZ: #2526222
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GTP, GPRS Tunneling Protocol, is a group of IP-based communications
protocols used to carry general packet radio service (GPRS) within
GSM, UMTS and LTE networks. GTP protocol has two parts: Signalling
(GTP-Control, GTP-C) and User data (GTP-User, GTP-U). GTP-C is used
for setting up GTP-U protocol, which is an IP-in-UDP tunneling
protocol. Usually GTP is used in connecting between base station for
radio, Serving Gateway (S-GW), and PDN Gateway (P-GW).
This patch implements GTP-U protocol for userspace datapath,
supporting only required header fields and G-PDU message type.
See spec in:
https://tools.ietf.org/html/draft-hmm-dmm-5g-uplane-analysis-00
Tested-at: https://travis-ci.org/github/williamtu/ovs-travis/builds/666518784
Signed-off-by: Feng Yang <yangfengee04@gmail.com>
Co-authored-by: Feng Yang <yangfengee04@gmail.com>
Signed-off-by: Yi Yang <yangyi01@inspur.com>
Co-authored-by: Yi Yang <yangyi01@inspur.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Recirculation usually requires finding the pre-recirculation input port.
Packets sent by the controller, with in_port of OFPP_CONTROLLER or
OFPP_NONE, do not have a real input port data structure, only a port
number. The code in xlate_lookup_ofproto_() mishandled this case,
failing to return the ofproto data structure. This commit fixes the
problem and adds a test to guard against regression.
Reported-by: Numan Siddique <numans@ovn.org>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-March/368642.html
Tested-by: Numan Siddique <numans@ovn.org>
Acked-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently fast-failover group does not support checking liveness of controller
port (OFPP_CONTROLLER). However this feature can be useful for selecting
alternate pipeline when controller connection itself is down for e.g.
by using local DHCP server to reply for any DHCP request originating from VMs.
This patch adds the support for watching controller port liveness in fast-
failover group. Controller port is considered live when atleast one
of-connection is alive.
Example usage:
ovs-ofctl add-group br-int 'group_id=1234,type=ff,
bucket=watch_port:CONTROLLER,actions:<A>,
bucket=watch_port:1,actions:<B>
Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|