summaryrefslogtreecommitdiff
path: root/ofproto/ofproto-dpif-xlate.c
Commit message (Collapse)AuthorAgeFilesLines
* ofproto-dpif-xlate: Fix use-after-free when xlate_actions().Yunjian Wang2023-05-101-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, bundle->cvlans and xbundle->cvlans are pointing to the same memory location. This can cause issues if the main thread modifies bundle->cvlans and frees it while the revalidator thread is still accessing xbundle->cvlans. This leads to use-after-free error. AddressSanitizer: heap-use-after-free on address 0x615000007b08 at pc 0x0000004ede1e bp 0x7f3120ee0310 sp 0x7f3120ee0300 READ of size 8 at 0x615000007b08 thread T25 (revalidator25) 0 0x4ede1d in bitmap_is_set lib/bitmap.h:91 1 0x4fcb26 in xbundle_allows_cvlan ofproto/ofproto-dpif-xlate.c:2028 2 0x4fe279 in input_vid_is_valid ofproto/ofproto-dpif-xlate.c:2294 3 0x502abf in xlate_normal ofproto/ofproto-dpif-xlate.c:3051 4 0x5164dc in xlate_output_action ofproto/ofproto-dpif-xlate.c:5361 5 0x522576 in do_xlate_actions ofproto/ofproto-dpif-xlate.c:7047 6 0x52a751 in xlate_actions ofproto/ofproto-dpif-xlate.c:8061 7 0x4e2b66 in xlate_key ofproto/ofproto-dpif-upcall.c:2212 8 0x4e2e13 in xlate_ukey ofproto/ofproto-dpif-upcall.c:2227 9 0x4e345d in revalidate_ukey__ ofproto/ofproto-dpif-upcall.c:2276 10 0x4e3f85 in revalidate_ukey ofproto/ofproto-dpif-upcall.c:2395 11 0x4e7ac5 in revalidate ofproto/ofproto-dpif-upcall.c:2858 12 0x4d9ed3 in udpif_revalidator ofproto/ofproto-dpif-upcall.c:1010 13 0x7cd92e in ovsthread_wrapper lib/ovs-thread.c:423 14 0x7f312ff01f3a (/usr/lib64/libpthread.so.0+0x8f3a) 15 0x7f312fc8f51f in clone (/usr/lib64/libc.so.6+0xf851f) 0x615000007b08 is located 8 bytes inside of 512-byte region [0x615000007b00,0x615000007d00) freed by thread T0 here: 0 0x7f3130378ad8 in free (/usr/lib64/libasan.so.4+0xe0ad8) 1 0x49044e in bundle_set ofproto/ofproto-dpif.c:3431 2 0x444f92 in ofproto_bundle_register ofproto/ofproto.c:1455 3 0x40e6c9 in port_configure vswitchd/bridge.c:1300 4 0x40bcfd in bridge_reconfigure vswitchd/bridge.c:921 5 0x41f1a9 in bridge_run vswitchd/bridge.c:3313 6 0x42d4fb in main vswitchd/ovs-vswitchd.c:132 7 0x7f312fbbcc86 in __libc_start_main (/usr/lib64/libc.so.6+0x25c86) previously allocated by thread T0 here: 0 0x7f3130378e70 in __interceptor_malloc 1 0x8757fe in xmalloc__ lib/util.c:140 2 0x8758da in xmalloc lib/util.c:175 3 0x875927 in xmemdup lib/util.c:188 4 0x475f63 in bitmap_clone lib/bitmap.h:79 5 0x47797c in vlan_bitmap_clone lib/vlan-bitmap.h:40 6 0x49048d in bundle_set ofproto/ofproto-dpif.c:3433 7 0x444f92 in ofproto_bundle_register ofproto/ofproto.c:1455 8 0x40e6c9 in port_configure vswitchd/bridge.c:1300 9 0x40bcfd in bridge_reconfigure vswitchd/bridge.c:921 10 0x41f1a9 in bridge_run vswitchd/bridge.c:3313 11 0x42d4fb in main vswitchd/ovs-vswitchd.c:132 12 0x7f312fbbcc86 in __libc_start_main (/usr/lib64/libc.so.6+0x25c86) Fixes: fed8962aff57 ("Add new port VLAN mode "dot1q-tunnel"") Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif-xlate: Always mask ip proto field.Aaron Conole2023-04-061-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | The ofproto layer currently treats nw_proto field as overloaded to mean both that a proper nw layer exists, as well as the value contained in the header for the nw proto. However, this is incorrect behavior as relevant standards permit that any value, including '0' should be treated as a valid value. Because of this overload, when the ofproto layer builds action list for a packet with nw_proto of 0, it won't build the complete action list that we expect to be built for the packet. That will cause a bad behavior where all packets passing the datapath will fall into an incomplete action set. The fix here is to unwildcard nw_proto, allowing us to preserve setting actions for protocols which we know have support for the actions we program. This means that a traffic which contains nw_proto == 0 cannot cause connectivity breakage with other traffic on the link. Reported-by: David Marchand <dmarchand@redhat.com> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2134873 Acked-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* userspace: Add SRv6 tunnel support.Nobuhiro MIKI2023-03-291-0/+4
| | | | | | | | | | | | | | | | | SRv6 (Segment Routing IPv6) tunnel vport is responsible for encapsulation and decapsulation the inner packets with IPv6 header and an extended header called SRH (Segment Routing Header). See spec in: https://datatracker.ietf.org/doc/html/rfc8754 This patch implements SRv6 tunneling in userspace datapath. It uses `remote_ip` and `local_ip` options as with existing tunnel protocols. It also adds a dedicated `srv6_segs` option to define a sequence of routers called segment list. Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif-xlate: Update tunnel neighbor when receive gratuitous ARP.Han Ding2022-11-021-3/+11
| | | | | | | | | | | | | OVS now just allow the ARP Reply which the destination address is matched against the known xbridge addresses to update tunnel neighbor. So when OVS receive the gratuitous ARP from underlay gateway which the source address and destination address are all gateway IP, tunnel neighbor will not be updated. Fixes: ba07cf222a0c ("Handle gratuitous ARP requests and replies in tnl_arp_snoop()") Fixes: 83c2757bd16e ("xlate: Move tnl_neigh_snoop() to terminate_native_tunnel()") Acked-by: Paolo Valerio <pvalerio@redhat.com> Signed-off-by: Han Ding <handing@chinatelecom.cn> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif-xlate: Allow sample when no in_port.Adrian Moreno2022-10-251-1/+1
| | | | | | | | | | | | | OVN can (and indeed does) set in_port to OFPP_NONE during the pipeline evaluation. If a sample action follows, it will be incorrectly skipped. Per-flow sampling version of: f0a9000ca ofproto: Fix ipfix not always sampling on egress. Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif-xlate: Do not use zero-weight buckets in select groups.Ben Pfaff2022-10-181-5/+11
| | | | | | | | | | | The OpenFlow specification says that buckets in select groups with a weight of zero should not be selected, but the ofproto-dpif implementation could select them in corner cases. This fixes the problem. Reported-by: ychen <ychen103103@163.com> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/359349.html Signed-off-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif-upcall: Print more data on unassociated datapath ports.Ilya Maximets2022-10-111-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When OVS fails to find an OpenFlow port for a packet received from the upcall it just prints the warning like this: |INFO|received packet on unassociated datapath port N However, during the flow translation more information is available as if the recirculation id wasn't found or it was a packet from unknown tunnel port. Printing that information might be useful to understand the origin of the problem. Port translation functions already support extended error strings, we just need to pass a variable where to store them. With the change the output may be: |INFO|received packet on unassociated datapath port N (no OpenFlow port for datapath port N) or |INFO|received packet on unassociated datapath port N (no OpenFlow tunnel port for this packet) or |INFO|received packet on unassociated datapath port N (no recirculation data for recirc_id M) Unfortunately, there is no good way to trigger this code from current unit tests. Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* bond: Avoid deadlock while updating post recirculation rules.Ilya Maximets2022-09-161-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the PACKET_OUT from controller ends up with sending packet to a bond interface, the main thread will take locks in the following order: handle_openflow --> take ofproto_mutex handle_packet_out packet_xlate output_normal bond_update_post_recirc_rules --> take rwlock in bond.c If at the same time revalidator thread is processing other packet with the output to the same bond: xlate_actions output_normal bond_update_post_recirc_rules --> take rwlock in bond.c update_recirc_rules ofproto_dpif_add_internal_flow ofproto_flow_mod --> take ofproto_mutex So, it is possible for these 2 threads to lock each other by taking one lock and waiting for another thread to release the second lock. It is also possible for the main thread to lock itself up by trying to acquire ofproto_mutex for the second time, if it will actually proceed with update_recirc_rules() after taking the bond rwlock. The problem appears to be that bond_update_post_recirc_rules() is called during the flow translation even if side effects are prohibited, which is the case for openflow PACKET_OUT handling. Skipping actual flow updates during the flow translation if side effects are disabled to avoid the deadlock. Since flows are not installed now when actions translated for very first packet, installing initial flows in bond_reconfigure(). This will cover the case of allocating a new recirc_id. Also checking if we need to update flows in bond_run() to cover link state changes. Regression test is added to catch the double lock case. Reported-at: https://github.com/openvswitch/ovs-issues/issues/259 Reported-by: Daniel Ding <zhihui.ding@easystack.cn> Fixes: adcf00ba35a0 ("ofproto/bond: Implement bond megaflow using recirculation") Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif-trace: add --name option for ofproto/trace.Nobuhiro MIKI2022-09-151-2/+22
| | | | | | | | | | | | | Most of commands in ovs-ofctl and ovs-appctl can display port names instead of port numbers by using --names option. This change adds similar functionality to ofproto/trace. For backward compatibility, the default behavior is the same as before. Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif-xlate: Fix error messages for nonexistent ports/recirc_ids.Ilya Maximets2022-09-091-3/+3
| | | | | | | | | | If tnl_port_should_receive() is false, we're looking for a normal port, not a tunnel one. And it's better to print recirculation IDs in hex since they are typically printed this way in flow dumps. Fixes: d40533fc820c ("odp-util: Improve log messages and error reporting for Netlink parsing.") Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif-xlate: Optimize datapath action set by removing last clone action.Eelco Chaudron2022-09-091-0/+37
| | | | | | | | | | | | | | | | | | | | | | | When OFPROTO non-reversible actions are translated to data plane actions, the only thing looked at is if there are more actions pending. If this is the case, the action is encapsulated in a clone(). This could lead to unnecessary clones if no meaningful data plane actions are added. For example, the register pop in the included test case. The best solution would probably be to build the full action path and determine if the clone is needed. However, this would be a huge change in the existing design, so for now, we just try to optimize the generated datapath flow. We can revisit this later, as some of the pending CT issues might need this rework. Fixes: feee58b9587f ("ofproto-dpif-xlate: Keep track of the last action") Fixes: dadd8357f224 ("ofproto-dpif: Fix issue with non-reversible actions on a patch ports.") Acked-by: Ales Musil <amusil@redhat.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif-xlate: Clear tunnel wc bits if original packet is non-tunnel.wenxu2022-09-091-0/+4
| | | | | | | | | | A packet go through the encap openflow(set_field tun_id/src/dst) The tunnel wc bits will be set. But it should be clear if the original packet is non-tunnel. It is not necessary for datapath wc the tunnel info for match(like the similar logic for vlan). Signed-off-by: wenxu <wenxu@chinatelecom.cn> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif-xlate: No clone when tunnel push is last action.Rosemarie O'Riorden2022-06-291-6/+18
| | | | | | | | | | | When OVS sees a tunnel push with a nested list next, it will not clone the packet, as a clone is not needed. However, a clone action will still be created with the tunnel push encapsulated inside. There is no need to create the clone action in this case, as extra parsing will need to be performed, which is less efficient. Signed-off-by: Rosemarie O'Riorden <roriorden@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif-xlate: Fix internal CT state for non-recirc traffic.Frode Nordahl2022-06-071-0/+6
| | | | | | | | | | | | | | | | | | | | In some circumstances a flow may get its ct_state set without conscious intervention by the OVS user space code. Commit 355fef6f2ccbc optimizes out unnecessary ct_clear actions based on an internal struct xlate_ctx->conntracked state flag. Before this commit the xlate_ctx->conntracked state flag would be initialized to 'false' and only set during thawing for recirculation. This patch checks the flow ct_state for the non-recirc case and sets the internal conntracked state appropriately. A system traffic test is also added to avoid regression. Fixes: 355fef6f2ccbc ("ofproto-dpif-xlate: Avoid successive ct_clear datapath actions.") Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* Revert "odp-util: Always report ODP_FIT_TOO_LITTLE for IGMP."Aaron Conole2022-05-261-1/+0
| | | | | | | | | | | | | | | | | | | | | This reverts commit c645550bb249 ("odp-util: Always report ODP_FIT_TOO_LITTLE for IGMP.") Always forcing a slow path action can result in some over-broad flows which swallow all traffic and force them to userspace, as reported in the thread at https://mail.openvswitch.org/pipermail/ovs-dev/2021-September/387706.html and at https://mail.openvswitch.org/pipermail/ovs-dev/2021-September/387689.html Revert the ODP_FIT_TOO_LITTLE return for IGMP specifically. Additionally, remove the userspace wc mask to prevent revalidator from cycling flows. Fixes: c645550bb249 ("odp-util: Always report ODP_FIT_TOO_LITTLE for IGMP.") Signed-off-by: Aaron Conole <aconole@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif-xlate: Fix netdev native tunnel neigh discovery spa.wenxu2022-05-251-2/+15
| | | | | | | | | | | | | | | | | | During native tunnel encapsulation process, on tunnel neighbor cache miss OVS sends an arp/nd request. Currently, tunnel source is used as arp spa. Find the spa which has the same subnet with the nexthop of tunnel dst on egress port, if false, use the tunnel src as spa. For example: tunnel src is a vip with 10.0.0.7/32, tunnel dst is 10.0.1.7 the br-phy with address 192.168.0.7/24 and the default gateway is 192.168.0.1 So the spa of arp request for 192.168.0.1 should be 192.168.0.7 but not 10.0.0.7 Signed-off-by: wenxu <wenxu@chinatelecom.cn> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* treewide: Fix invalid bit shift operations.Dumitru Ceara2022-05-171-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | UB Sanitizer reports: tests/test-hash.c:59:40: runtime error: shift exponent 64 is too large for 64-bit type 'long unsigned int' 0 0x44c3c9 in get_range128 tests/test-hash.c:59 1 0x44cb2e in check_hash_bytes128 tests/test-hash.c:178 2 0x44d14d in test_hash_main tests/test-hash.c:282 [...] ofproto/ofproto-dpif-xlate.c:5607:45: runtime error: left shift of 65535 by 16 places cannot be represented in type 'int' 0 0x53fe9f in xlate_sample_action ofproto/ofproto-dpif-xlate.c:5607 1 0x54d625 in do_xlate_actions ofproto/ofproto-dpif-xlate.c:7160 2 0x553b76 in xlate_actions ofproto/ofproto-dpif-xlate.c:7806 3 0x4fcb49 in upcall_xlate ofproto/ofproto-dpif-upcall.c:1237 4 0x4fe02f in process_upcall ofproto/ofproto-dpif-upcall.c:1456 5 0x4fda99 in upcall_cb ofproto/ofproto-dpif-upcall.c:1358 [...] tests/test-util.c:89:23: runtime error: left shift of 1 by 31 places cannot be represented in type 'int' 0 0x476415 in test_ctz tests/test-util.c:89 [...] lib/dpif-netlink.c:396:33: runtime error: left shift of 1 by 31 places cannot be represented in type 'int' 0 0x571b9f in dpif_netlink_open lib/dpif-netlink.c:396 Acked-by: Aaron Conole <aconole@redhat.com> Acked-by: Paolo Valerio <pvalerio@redhat.com> Signed-off-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif-xlate: Remove mirror assert.lic1212022-05-041-3/+8
| | | | | | | | | | | | During the revalidation/upcall, mirror could be removed. Instead of crash the process, we can simply skip the deleted mirror. The issue had been triggered multiple times by ovs-tcpdump in my test. Fixes: ec7ceaed4f3e ("ofproto-dpif: Modularize mirror code.") Signed-off-by: lic121 <lic121@chinatelecom.cn> Tested-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif-xlate: Clear out vlan flow fields while processing native tunnel.Thilak Raj Surendra Babu2022-04-271-0/+3
| | | | | | | | | | | | | | | | | When a packet is received over an access port that needs to be sent over a vxlan tunnel,the access port VLAN id is used in the lookup leading to a wrong packet being crafted and sent over the tunnel. Clear out the flow 's VLAN field as it should not be used while performing mac lookup for the outer tunnel and also at this point the VLAN action related to inner flow is already committed. Fixes: 7c12dfc527a5 ("tunneling: Avoid datapath-recirc by combining recirc actions at xlate.") Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2022-April/393566.html Reported-at: https://bugzilla.redhat.com/2060552 Signed-off-by: Thilak Raj Surendra Babu <thilakraj.sb@nutanix.com> Signed-off-by: Rosemarie O'Riorden <roriorden@redhat.com> Co-authored-by: Rosemarie O'Riorden <roriorden@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-xlate: Fix crash when forwarding packet between legacy_l3 tunnels.Jan Scheurich2022-04-271-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A packet received from a tunnel port with legacy_l3 packet-type (e.g. lisp, L3 gre, gtpu) is conceptually wrapped in a dummy Ethernet header for processing in an OF pipeline that is not packet-type-aware. Before transmission of the packet to another legacy_l3 tunnel port, the dummy Ethernet header is stripped again. In ofproto-xlate, wrapping in the dummy Ethernet header is done by simply changing the packet_type to PT_ETH. The generation of the push_eth datapath action is deferred until the packet's flow changes need to be committed, for example at output to a normal port. The deferred Ethernet encapsulation is marked in the pending_encap flag. This patch fixes a bug in the translation of the output action to a legacy_l3 tunnel port, where the packet_type of the flow is reverted from PT_ETH to PT_IPV4 or PT_IPV6 (depending on the dl_type) to remove its Ethernet header without clearing the pending_encap flag if it was set. At the subsequent commit of the flow changes, the unexpected combination of pending_encap == true with an PT_IPV4 or PT_IPV6 packet_type hit the OVS_NOT_REACHED() abortion clause. The pending_encap is now cleared in this situation. Reported-by: Dincer Beken <dbeken@blackned.de> Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Co-authored-by: Dincer Beken <dbeken@blackned.de> Signed-off-by: Dincer Beken <dbeken@blackned.de> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif-xlate: Fix NULL pointer dereference in xlate_normal().Paolo Valerio2022-04-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | Considering the following flows: ovs-ofctl dump-flows br0 cookie=0x0, table=0, priority=0 actions=NORMAL and assuming a packet originated from packet-out in this way: ovs-ofctl packet-out br0 \ "in_port=controller,packet=<UDP packet>,action=ct(table=0)" If in_port is OFPP_NONE or OFPP_CONTROLLER, this leads to a NULL pointer (xport) dereference in xlate_normal(). Fix it by checking the xport pointer validity while deciding whether it is a candidate for mac learning or not. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Paolo Valerio <pvalerio@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* hmap: use short version of safe loops if possible.Adrian Moreno2022-03-301-4/+4
| | | | | | | | | | | | | | | Using SHORT version of the *_SAFE loops makes the code cleaner and less error prone. So, use the SHORT version and remove the extra variable when possible for hmap and all its derived types. In order to be able to use both long and short versions without changing the name of the macro for all the clients, overload the existing name and select the appropriate version depending on the number of arguments. Acked-by: Dumitru Ceara <dceara@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* list: use short version of safe loops if possible.Adrian Moreno2022-03-301-2/+2
| | | | | | | | | | | | | | | Using the SHORT version of the *_SAFE loops makes the code cleaner and less error-prone. So, use the SHORT version and remove the extra variable when possible. In order to be able to use both long and short versions without changing the name of the macro for all the clients, overload the existing name and select the appropriate version depending on the number of arguments. Acked-by: Dumitru Ceara <dceara@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto: Use xlate map for uuid lookups.Gaetan Rivet2022-03-071-2/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ofproto map 'all_ofproto_dpifs_by_uuid' does not support concurrent accesses. It is however read by upcall handler threads and written by the main thread at the same time. Additionally, handler threads will change the ams_seq while an ofproto is being destroyed, triggering crashes with the following backtrace: (gdb) bt hmap_next (hmap.h:398) seq_wake_waiters (seq.c:326) seq_change_protected (seq.c:134) seq_change (seq.c:144) ofproto_dpif_send_async_msg (ofproto_dpif.c:263) process_upcall (ofproto_dpif_upcall.c:1782) recv_upcalls (ofproto_dpif_upcall.c:1026) udpif_upcall_handler (ofproto/ofproto_dpif_upcall.c:945) ovsthread_wrapper (ovs_thread.c:734) To solve both issues, remove the 'all_ofproto_dpifs_by_uuid'. Instead, another map already storing ofprotos in xlate can be used. During an ofproto destruction, its reference is removed from the current xlate xcfg. Such change is committed only after all threads have quiesced at least once during xlate_txn_commit(). This wait ensures that the removal is seen by all threads, rendering impossible for a thread to still hold a reference while the destruction proceeds. Furthermore, the xlate maps are copied during updates instead of being written in place. It is thus correct to read xcfg->xbridges while inserting or removing from new_xcfg->xbridges. Finally, now that ofproto_dpifs lookups are done through xcfg->xbridges, it is important to use a high level of entropy. As it used the ofproto pointer hashed, fewer bits were random compared to the uuid key used in 'all_ofproto_dpifs_by_uuid'. To solve this, use the ofproto uuid as the key in xbridges as well, improving entropy. Fixes: fcb9579be3c7 ("ofproto: Add 'ofproto_uuid' and 'ofp_in_port' to user action cookie.") Suggested-by: Adrian Moreno <amorenoz@redhat.com> Acked-by: Adrian Moreno <amorenoz@redhat.com> Acked-by: Alin-Gabriel Serdean <aserdean@ovn.org> Tested-by: Alin-Gabriel Serdean <aserdean@ovn.org> Signed-off-by: Gaetan Rivet <grive@u256.net> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Co-authored-by: Yunjian Wang <wangyunjian@huawei.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto: Add refcount to ofproto to fix ofproto use-after-free.Peng He2022-03-071-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | From hepeng: https://patchwork.ozlabs.org/project/openvswitch/patch/20200717015041.82746-1-hepeng.0320@bytedance.com/#2487473 also from guohongzhi <guohongzhi1@huawei.com>: http://patchwork.ozlabs.org/project/openvswitch/patch/20200306130555.19884-1-guohongzhi1@huawei.com/ also from a discussion about the mixing use of RCU and refcount in the mail list with Ilya Maximets, William Tu, Ben Pfaf, and Gaëtan Rivet. A summary, as quoted from Ilya: " RCU for ofproto was introduced for one and only one reason - to avoid freeing ofproto while rules are still alive. This was done in commit f416c8d61601 ("ofproto: RCU postpone rule destruction."). The goal was to allow using rules without refcounting them within a single grace period. And that forced us to postpone destruction of the ofproto for a single grace period. Later commit 39c9459355b6 ("Use classifier versioning.") made it possible for rules to be alive for more than one grace period, so the commit made ofproto wait for 2 grace periods by double postponing. As we can see now, that wasn't enough and we have to wait for more than 2 grace periods in certain cases. " In a short, the ofproto should have a longer life time than rule, if the rule lasts for more than 2 grace periods, the ofproto should live longer to ensure rule->ofproto is valid. It's hard to predict how long a ofproto should live, thus we need to use refcount on ofproto to make things easy. The controversial part is that we have already used RCU postpone to delay ofproto destrution, if we have to add refcount, is it simpler to use just refcount without RCU postpone? IMO, I think going back to the pure refcount solution is more complicated than mixing using both. Gaëtan Rive asks some questions on guohongzhi's v2 patch: during ofproto_rule_create, should we use ofproto_ref or ofproto_try_ref? how can we make sure the ofproto is alive? By using RCU, ofproto has three states: state 1: alive, with refcount >= 1 state 2: dying, with refcount == 0, however pointer is valid state 3: died, memory freed, pointer might be dangling. Without using RCU, there is no state 2, thus, we have to be very careful every time we see a ofproto pointer. In contrast, with RCU, we can be sure that it's alive at least in this grace peroid, so we can just check if it is dying by ofproto_try_ref. This shows that by mixing use of RCU and refcount we can save a lot of work worrying if ofproto is dangling. In short, the RCU part makes sure the ofproto is alive when we use it, and the refcount part makes sure it lives longer enough. In this patch, I have merged guohongzhi's patch and mine, and fixes accoring to the previous comments. Acked-by: Adrian Moreno <amorenoz@redhat.com> Acked-by: Gaetan Rivet <grive@u256.net> Acked-by: Mike Pattrick <mkp@redhat.com> Acked-by: Alin-Gabriel Serdean <aserdean@ovn.org> Tested-by: Alin-Gabriel Serdean <aserdean@ovn.org> Signed-off-by: Peng He <hepeng.0320@bytedance.com> Co-authored-by: Hongzhi Guo <guohongzhi1@huawei.com> Signed-off-by: Hongzhi Guo <guohongzhi1@huawei.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto: Fix ipfix not always sampling on egress.Adrian Moreno2022-02-091-1/+3
| | | | | | | | | | | | | | | | | | | | We are currently requiring in_port to be a valid port number for ipfix sampling even if the sampling is done on the output port (egress). This restriction is not really needed and can affect pipelines that intentionally set the in_port to OFPP_NONE during flow processing. For instance, OVN does this, see: cfa547821 Fix ovn-controller generated packets from getting dropped for reject ACL action. This patch skips ipfix sampling only if both (ingress and egress) ports are invalid. Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2016346 Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif-xlate: Fix packet drops with decap action on MPLS Multicast.Martin Varghese2022-01-311-1/+2
| | | | | | | | | | Added PT_MPLS_MC support in function xlate_generic_decap_action to fix packet drops when decap action is applied on packets with packet_type (ns=1,type=0x8848). Fixes: 1917ace89364 ("Encap & Decap actions for MPLS packet type.") Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif: Fix issue with non-reversible actions on a patch ports.Eelco Chaudron2022-01-251-15/+12
| | | | | | | | | | | | | | | | | | | For patch ports, the is_last_action value is not propagated and is always set to true. This causes non-reversible actions to modify the packet, and the original content is not preserved when processing the remaining actions. This patch propagates the is_last_action flag for patch port related actions. In addition, it also fixes a general last action propagation to the individual actions. Fixed check_pkt_larger as last action, as it is a valid case for the drop action, so it should not be skipped. Fixes: feee58b95 ("ofproto-dpif-xlate: Keep track of the last action") Fixes: 5b34f8fc3 ("Add a new OVS action check_pkt_larger") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* Encap & Decap actions for MPLS packet type.Martin Varghese2022-01-171-0/+85
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The encap & decap actions are extended to support MPLS packet type. Encap & decap actions adds and removes MPLS header at start of the packet. The existing PUSH MPLS & POP MPLS actions inserts & removes MPLS header between ethernet header and the IP header. Though this behaviour is fine for L3 VPN where an IP packet is encapsulated inside a MPLS tunnel, it does not suffice the L2 VPN requirements. In L2 VPN the ethernet packets must be encapsulated inside MPLS tunnel. In this change the encap & decap actions are extended to support MPLS packet type. The encap & decap adds and removes MPLS header at the start of packet as depicted below. Encapsulation: Actions - encap(mpls),encap(ethernet) Incoming packet -> | ETH | IP | Payload | 1 Actions - encap(mpls) [Datapath action - ADD_MPLS:0x8847] Outgoing packet -> | MPLS | ETH | Payload| 2 Actions - encap(ethernet) [ Datapath action - push_eth ] Outgoing packet -> | ETH | MPLS | ETH | Payload| Decapsulation: Incoming packet -> | ETH | MPLS | ETH | IP | Payload | Actions - decap(),decap(packet_type(ns=0,type=0)) 1 Actions - decap() [Datapath action - pop_eth) Outgoing packet -> | MPLS | ETH | IP | Payload| 2 Actions - decap(packet_type(ns=0,type=0)) [Datapath action - POP_MPLS:0x6558] Outgoing packet -> | ETH | IP | Payload| Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif-xlate: Snoop ingress packets and update neigh cache if needed.Paolo Valerio2021-12-171-0/+13
| | | | | | | | | | | | | | | | | | | | | | | In case of native tunnel with bfd enabled, if the MAC address of the remote end's interface changes (e.g. because it got rebooted, and the MAC address is allocated dynamically), the BFD session will never be re-established. This happens because the local tunnel neigh entry doesn't get updated, and the local end keeps sending BFD packets with the old destination MAC address. This was not an issue until b23ddcc57d41 ("tnl-neigh-cache: tighten arp and nd snooping.") because ARP requests were snooped as well avoiding the problem. Fix this by snooping the incoming packets in the slow path, and updating the neigh cache accordingly. Fixes: b23ddcc57d41 ("tnl-neigh-cache: tighten arp and nd snooping.") Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2002430 Signed-off-by: Paolo Valerio <pvalerio@redhat.com> Acked-by: Gaetan Rivet <grive@u256.net> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* tnl-neigh-cache: Do not refresh the entry while revalidating.Paolo Valerio2021-12-171-1/+2
| | | | | | | | | | | | This is a minor issue but visible e.g. when you try to flush the neigh cache while the ARP flow is still present in the datapath, triggering the revalidation of the datapath flows which subsequently refreshes/adds the entry in the cache. Signed-off-by: Paolo Valerio <pvalerio@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Gaetan Rivet <grive@u256.net> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif-xlate: Add a trace log for tnl_port_build_header() error.Yunjian Wang2021-11-291-0/+1
| | | | | | | It is useful to also log when tnl_port_build_header() failed. Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif-xlate: Terminate native tunnels only on ports with IP addresses.Ilya Maximets2021-11-191-5/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit dc0bd12f5b04 removed restriction that tunnel endpoint must be a bridge port. So, currently OVS has to check if the native tunnel needs to be terminated regardless of the output port. Unfortunately, there is a side effect: tnl_port_map_lookup() always adds at least 'dl_dst' match to the megaflow that ends up in the corresponding datapath flow. And since tunneling works on L3 level and not restricted by any particular bridge, this extra match criteria is added to every datapath flow on every bridge even if that bridge cannot be part of a tunnel processing. For example, if OVS has at least one tunnel configured and we're adding a completely separate bridge with 2 ports and simple rules to forward packets between two ports, there still will be a match on a destination mac address: 1. <create a tunnel configuration in OVS> 2. ovs-vsctl add-br br-non-tunnel -- set bridge datapath_type=netdev 3. ovs-vsctl add-port br-non-tunnel port0 -- add-port br-non-tunnel port1 4. ovs-ofctl del-flows br-non-tunnel 5. ovs-ofctl add-flow br-non-tunnel in_port=port0,actions=port1 6. ovs-ofctl add-flow br-non-tunnel in_port=port1,actions=port0 # ovs-appctl ofproto/trace br-non-tunnel in_port=port0 Flow: in_port=1,vlan_tci=0x0000, dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,dl_type=0x0000 bridge("br-non-tunnel") ----------------------- 0. in_port=1, priority 32768 output:2 Final flow: unchanged Megaflow: recirc_id=0,eth,in_port=1,dl_dst=00:00:00:00:00:00,dl_type=0x0000 Datapath actions: 5 ^^^^^^^^^^^^^^^^^^^^^^^^ This increases the number of upcalls and installed datapath flows, since separate flow needs to be installed per destination MAC, reducing the switching performance. This also blocks datapath performance optimizations that are based on the datapath flow simplicity. In general, in order to be a tunnel endpoint, port has to have an IP address. Hence native tunnel termination should be attempted only for such ports. This allows to avoid extra matches in most cases. Fixes: dc0bd12f5b04 ("userspace: Enable non-bridge port as tunnel endpoint.") Reported-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-October/388904.html Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Mike Pattrick <mkp@redhat.com>
* ofproto-dpif-xlate: Fix check_pkt_larger incomplete translation.Numan Siddique2021-11-171-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xlate_check_pkt_larger() sets ctx->exit to 'true' at the end causing the translation to stop. This results in incomplete datapath rules. For example, for the below OF rules configured on a bridge, table=0,in_port=1 actions=load:0x1->NXM_NX_REG1[[]],resubmit(,1), load:0x2->NXM_NX_REG1[[]],resubmit(,1), load:0x3->NXM_NX_REG1[[]],resubmit(,1) table=1,in_port=1,reg1=0x1 actions=check_pkt_larger(200)->NXM_NX_REG0[[0]], resubmit(,4) table=1,in_port=1,reg1=0x2 actions=output:2 table=1,in_port=1,reg1=0x3 actions=output:4 table=4,in_port=1 actions=output:3 The datapath flow should be: check_pkt_len(size=200,gt(3),le(3)),2,4 But right now it is: check_pkt_len(size=200,gt(3),le(3)) Actions after the first resubmit(,1) in the first flow in table 0 are never applied. This patch fixes this issue. Fixes: 5b34f8fc3b38 ("Add a new OVS action check_pkt_larger") Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2018365 Reported-by: Ihar Hrachyshka <ihrachys@redhat.com> Signed-off-by: Numan Siddique <numans@ovn.org> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif-xlate: Fix zone set from non-frozen-metadata fields.Peng He2021-10-131-8/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CT zone could be set from a field that is not included in frozen metadata. Consider the example rules which are typically seen in OpenStack security group rules: priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0) priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2 The zone is set from the first rule's ct action. These two rules will generate two megaflows: the first one uses zone=5 to query the CT module, the second one sets the zone-id from the first megaflow and commit to CT. The current implementation will generate a megaflow that does not use ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone is set by an Imm not a field. Consider a situation that one changes the zone id (for example to 15) in the first rule, however, still keep the second rule unchanged. During this change, there is traffic hitting the two generated megaflows, the revaldiator would revalidate all megaflows, however, the revalidator will not change the second megaflow, because zone=5 is recorded in the megaflow, so the xlate will still translate the commit action into zone=5, and the new traffic will still commit to CT as zone=5, not zone=15, resulting in taffic drops and other issues. Just like OVS set-field convention, if a field X is set by Y (Y is a variable not an Imm), we should also mask Y as a match in the generated megaflow. An exception is that if the zone-id is set by the field that is included in the frozen state (i.e. regs) and this upcall is a resume of a thawed xlate, the un-wildcarding can be skipped, as the recirc_id is a hash of the values in these fields, and it will change following the changes of these fields. When the recirc_id changes, all megaflows with the old recirc id will be invalid later. Fixes: 07659514c3 ("Add support for connection tracking.") Reported-by: Sai Su <susai.ss@bytedance.com> Signed-off-by: Peng He <hepeng.0320@bytedance.com> Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif-xlate: Fix continuations with OF instructions in OF1.1+.Ben Pfaff2021-07-221-20/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Open vSwitch supports OpenFlow "instructions", which were introduced in OpenFlow 1.1 and act like restricted kinds of actions that can only appear in a particular order and particular circumstances. OVS did not support two of these instructions, "write_metadata" and "goto_table", properly in the case where they appeared in a flow that needed to be frozen for continuations. Both of these instructions had the problem that they couldn't be properly serialized into the stream of actions, because they're not actions. This commit fixes that problem in freeze_unroll_actions() by converting them into equivalent actions for serialization. goto_table had the additional problem that it was being serialized to the frozen stream even after it had been executed. This was already properly handled in do_xlate_actions() for resubmit, which is almost equivalent to goto_table, so this commit applies the same fix to goto_table. (The commit removes an assertion from the goto_table implementation, but there wasn't any real value in that assertion and I thought the code looked cleaner without it.) This commit adds tests that would have found these bugs. This includes adding a variant of each continuation test that uses OF1.3 for monitor/resume (which is necessary to trigger these bugs) plus specific tests for continuations with goto_table and write_metadata. It also improves the continuation test infrastructure to add more detail on the problem if a test fails. Signed-off-by: Ben Pfaff <blp@ovn.org> Reported-by: Grayson Wu <wgrayson@vmware.com> Reported-at: https://github.com/openvswitch/ovs-issues/issues/213 Discussed-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/386166.html Acked-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif: APIs and CLI option to add/delete static fdb entry.Vasu Dasari2021-07-161-8/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently there is an option to add/flush/show ARP/ND neighbor. This covers L3 side. For L2 side, there is only fdb show command. This commit gives an option to add/del an fdb entry via ovs-appctl. CLI command looks like: To add: ovs-appctl fdb/add <bridge> <port> <vlan> <Mac> ovs-appctl fdb/add br0 p1 0 50:54:00:00:00:05 To del: ovs-appctl fdb/del <bridge> <vlan> <Mac> ovs-appctl fdb/del br0 0 50:54:00:00:00:05 Added two new APIs to provide convenient interface to add and delete static-macs. bool xlate_add_static_mac_entry(const struct ofproto_dpif *, ofp_port_t in_port, struct eth_addr dl_src, int vlan); bool xlate_delete_static_mac_entry(const struct ofproto_dpif *, struct eth_addr dl_src, int vlan); 1. Static entry should not age. To indicate that entry being programmed is a static entry, 'expires' field in 'struct mac_entry' will be set to a MAC_ENTRY_AGE_STATIC_ENTRY. A check for this value is made while deleting mac entry as part of regular aging process. 2. Another change to the mac-update logic, when a packet with same dl_src as that of a static-mac entry arrives on any port, the logic will not modify the expires field. 3. While flushing fdb entries, made sure static ones are not evicted. 4. Updated "ovs-appctl fdb/stats-show br0" to display number of static entries in switch Added following tests: ofproto-dpif - static-mac add/del/flush ofproto-dpif - static-mac mac moves Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048894.html Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1597752 Signed-off-by: Vasu Dasari <vdasari@gmail.com> Tested-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif-xlate: Avoid successive ct_clear datapath actions.Eelco Chaudron2021-07-081-1/+3
| | | | | | | | | | | | Due to flow lookup optimizations, especially in the resubmit/clone cases, we might end up with multiple ct_clear actions, which are not necessary. This patch only adds the ct_clear action to the datapath if any ct state is tracked. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Timothy Redaelli <tredaelli@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* Fix redundant datapath set ethernet action with NSH Decap.Martin Varghese2021-06-161-0/+2
| | | | | | | | | | | | When a decap action is applied on NSH header encapsulating a ethernet packet a redundant set mac address action is programmed to the datapath. Fixes: f839892a206a ("OF support and translation of generic encap and decap") Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Acked-by: Jan Scheurich <jan.scheurich@ericsson.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* tunnel: Bareudp Tunnel Support.Martin Varghese2020-12-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are various L3 encapsulation standards using UDP being discussed to leverage the UDP based load balancing capability of different networks. MPLSoUDP (__ https://tools.ietf.org/html/rfc7510) is one among them. The Bareudp tunnel provides a generic L3 encapsulation support for tunnelling different L3 protocols like MPLS, IP, NSH etc. inside a UDP tunnel. An example to create bareudp device to tunnel MPLS traffic is given $ ovs-vsctl add-port br_mpls udp_port -- set interface udp_port \ type=bareudp options:remote_ip=2.1.1.3 options:local_ip=2.1.1.2 \ options:payload_type=0x8847 options:dst_port=6635 The bareudp device supports special handling for MPLS & IP as they can have multiple ethertypes. MPLS procotcol can have ethertypes ETH_P_MPLS_UC (unicast) & ETH_P_MPLS_MC (multicast). IP protocol can have ethertypes ETH_P_IP (v4) & ETH_P_IPV6 (v6). The bareudp device to tunnel L3 traffic with multiple ethertypes (MPLS & IP) can be created by passing the L3 protocol name as string in the field payload_type. An example to create bareudp device to tunnel MPLS unicast & multicast traffic is given below.:: $ ovs-vsctl add-port br_mpls udp_port -- set interface udp_port \ type=bareudp options:remote_ip=2.1.1.3 options:local_ip=2.1.1.2 \ options:payload_type=mpls options:dst_port=6635 Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Acked-By: Greg Rose <gvrose8192@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* odp-util: Fix netlink message overflow with userdata.Ilya Maximets2020-12-221-7/+6
| | | | | | | | | | | | | | Too big userdata could overflow netlink message leading to out-of-bound memory accesses or assertion while formatting nested actions. Fix that by checking the size and returning correct error code. Credit to OSS-Fuzz. Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=27640 Fixes: e995e3df57ea ("Allow OVS_USERSPACE_ATTR_USERDATA to be variable length.") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Flavio Leitner <fbl@sysclose.org>
* ofproto-dpif-xlate: Stop forwarding MLD reports to group ports.XiaoXiong Ding2020-12-211-0/+1
| | | | | | | | | | | | | According with rfc4541 section 2.1.1, a snooping switch should forward membership reports only to ports with routers attached.The current code violates the RFC forwarding membership reports to group ports as well. The same issue doesn't exist with IPv4. Fixes: 06994f879c ("mcast-snooping: Add Multicast Listener Discovery support") Signed-off-by: XiaoXiong Ding <dingxiaoxiong@huawei.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* Eliminate use of term "slave" in bond, LACP, and bundle contexts.Ben Pfaff2020-10-211-11/+11
| | | | | | | | | | | | | The new term is "member". Most of these changes should not change user-visible behavior. One place where they do is in "ovs-ofctl dump-flows", which will now output "members:..." inside "bundle" actions instead of "slaves:...". I don't expect this to cause real problems in most systems. The old syntax is still supported on input for backward compatibility. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
* userspace: Avoid dp_hash recirculation for balance-tcp bond mode.Vishal Deep Ajmera2020-06-221-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: In OVS, flows with output over a bond interface of type “balance-tcp” gets translated by the ofproto layer into "HASH" and "RECIRC" datapath actions. After recirculation, the packet is forwarded to the bond member port based on 8-bits of the datapath hash value computed through dp_hash. This causes performance degradation in the following ways: 1. The recirculation of the packet implies another lookup of the packet’s flow key in the exact match cache (EMC) and potentially Megaflow classifier (DPCLS). This is the biggest cost factor. 2. The recirculated packets have a new “RSS” hash and compete with the original packets for the scarce number of EMC slots. This implies more EMC misses and potentially EMC thrashing causing costly DPCLS lookups. 3. The 256 extra megaflow entries per bond for dp_hash bond selection put additional load on the revalidation threads. Owing to this performance degradation, deployments stick to “balance-slb” bond mode even though it does not do active-active load balancing for VXLAN- and GRE-tunnelled traffic because all tunnel packet have the same source MAC address. Proposed optimization: This proposal introduces a new load-balancing output action instead of recirculation. Maintain one table per-bond (could just be an array of uint16's) and program it the same way internal flows are created today for each possible hash value (256 entries) from ofproto layer. Use this table to load-balance flows as part of output action processing. Currently xlate_normal() -> output_normal() -> bond_update_post_recirc_rules() -> bond_may_recirc() and compose_output_action__() generate 'dp_hash(hash_l4(0))' and 'recirc(<RecircID>)' actions. In this case the RecircID identifies the bond. For the recirculated packets the ofproto layer installs megaflow entries that match on RecircID and masked dp_hash and send them to the corresponding output port. Instead, we will now generate action as 'lb_output(<bond id>)' This combines hash computation (only if needed, else re-use RSS hash) and inline load-balancing over the bond. This action is used *only* for balance-tcp bonds in userspace datapath (the OVS kernel datapath remains unchanged). Example: Current scheme: With 8 UDP flows (with random UDP src port): flow-dump from pmd on cpu core: 2 recirc_id(0),in_port(7),<...> actions:hash(hash_l4(0)),recirc(0x1) recirc_id(0x1),dp_hash(0xf8e02b7e/0xff),<...> actions:2 recirc_id(0x1),dp_hash(0xb236c260/0xff),<...> actions:1 recirc_id(0x1),dp_hash(0x7d89eb18/0xff),<...> actions:1 recirc_id(0x1),dp_hash(0xa78d75df/0xff),<...> actions:2 recirc_id(0x1),dp_hash(0xb58d846f/0xff),<...> actions:2 recirc_id(0x1),dp_hash(0x24534406/0xff),<...> actions:1 recirc_id(0x1),dp_hash(0x3cf32550/0xff),<...> actions:1 New scheme: We can do with a single flow entry (for any number of new flows): in_port(7),<...> actions:lb_output(1) A new CLI has been added to dump datapath bond cache as given below. # ovs-appctl dpif-netdev/bond-show [dp] Bond cache: bond-id 1 : bucket 0 - slave 2 bucket 1 - slave 1 bucket 2 - slave 2 bucket 3 - slave 1 Co-authored-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com> Signed-off-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com> Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com> Tested-by: Matteo Croce <mcroce@redhat.com> Tested-by: Adrian Moreno <amorenoz@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto-dpif-trace: Improve NAT tracing.Dumitru Ceara2020-06-161-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | When ofproto/trace detects a recirc action it resumes execution at the specified next table. However, if the ct action performs SNAT/DNAT, e.g., ct(commit,nat(src=1.1.1.1:4000),table=42), the src/dst IPs and ports in the oftrace_recirc_node->flow field are not updated. This leads to misleading outputs from ofproto/trace as real packets would actually first get NATed and might match different flows when recirculated. Assume the first IP/port from the NAT src/dst action will be used by conntrack for the translation and update the oftrace_recirc_node->flow accordingly. This is not entirely correct as conntrack might choose a different IP/port but the result is more realistic than before. This fix covers new connections. However, for reply traffic that executes actions of the form ct(nat, table=42) we still don't update the flow as we don't have any information about conntrack state when tracing. Also move the oftrace_recirc_node processing out of ofproto_trace() and to its own function, ofproto_trace_recirc_node() for better readability/ Signed-off-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* ofp-actions: Add delete field actionYi-Hung Wei2020-04-291-1/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a new OpenFlow action, delete field, to delete a field in packets. Currently, only the tun_metadata fields are supported. One use case to add this action is to support multiple versions of geneve tunnel metadatas to be exchanged among different versions of networks. For example, we may introduce tun_metadata2 to replace old tun_metadata1, but still want to provide backward compatibility to the older release. In this case, in the new OpenFlow pipeline, we would like to support the case to receive a packet with tun_metadata1, do some processing. And if the packet is going to a switch in the newer release, we would like to delete the value in tun_metadata1 and set a value into tun_metadata2. Currently, ovs does not provide an action to remove a value in tun_metadata if the value is present. This patch fulfills the gap by adding the delete_field action. For example, the OpenFlow syntax to delete tun_metadata1 is: actions=delete_field:tun_metadata1 Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: William Tu <u9012063@gmail.com>
* tun_metadata: Fix coredump caused by use-after-free bugYifeng Sun2020-04-101-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tun_metadata can be referened by flow and frozen_state at the same time. When ovs-vswitchd handles TLV table mod message, the involved tun_metadata gets freed. The call trace to free tun_metadata is shown as below: ofproto_run - handle_openflow - handle_single_part_openflow - handle_tlv_table_mod - tun_metadata_table_mod - tun_metadata_postpone_free Unfortunately, this tun_metadata can be still used by some frozen_state, and later on when frozen_state tries to access its tun_metadata table, ovs-vswitchd crashes. The call trace to access tun_metadata from frozen_state is shown as below: udpif_upcall_handler - recv_upcalls - process_upcall - frozen_metadata_to_flow It is unsafe for frozen_state to reference tun_table because tun_table is protected by RCU while the lifecycle of frozen_state can span several RCU quiesce states. Current code violates OVS's RCU protection mechanism. This patch fixes it by simply stopping frozen_state from referencing tun_table. If frozen_state needs tun_table, the latest valid tun_table can be found through ofproto_get_tun_tab() efficiently. A previous commit seems fixing the samiliar issue: 254878c18874f6 (ofproto-dpif-xlate: Fix segmentation fault caused by tun_table) VMware-BZ: #2526222 Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com>
* userspace: Add GTP-U support.William Tu2020-03-251-1/+2
| | | | | | | | | | | | | | | | | | | | | | | GTP, GPRS Tunneling Protocol, is a group of IP-based communications protocols used to carry general packet radio service (GPRS) within GSM, UMTS and LTE networks. GTP protocol has two parts: Signalling (GTP-Control, GTP-C) and User data (GTP-User, GTP-U). GTP-C is used for setting up GTP-U protocol, which is an IP-in-UDP tunneling protocol. Usually GTP is used in connecting between base station for radio, Serving Gateway (S-GW), and PDN Gateway (P-GW). This patch implements GTP-U protocol for userspace datapath, supporting only required header fields and G-PDU message type. See spec in: https://tools.ietf.org/html/draft-hmm-dmm-5g-uplane-analysis-00 Tested-at: https://travis-ci.org/github/williamtu/ovs-travis/builds/666518784 Signed-off-by: Feng Yang <yangfengee04@gmail.com> Co-authored-by: Feng Yang <yangfengee04@gmail.com> Signed-off-by: Yi Yang <yangyi01@inspur.com> Co-authored-by: Yi Yang <yangyi01@inspur.com> Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Ben Pfaff <blp@ovn.org>
* ofproto-dpif-xlate: Fix recirculation when in_port is OFPP_CONTROLLER.Ben Pfaff2020-03-201-4/+21
| | | | | | | | | | | | | | | Recirculation usually requires finding the pre-recirculation input port. Packets sent by the controller, with in_port of OFPP_CONTROLLER or OFPP_NONE, do not have a real input port data structure, only a port number. The code in xlate_lookup_ofproto_() mishandled this case, failing to return the ofproto data structure. This commit fixes the problem and adds a test to guard against regression. Reported-by: Numan Siddique <numans@ovn.org> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-March/368642.html Tested-by: Numan Siddique <numans@ovn.org> Acked-by: Numan Siddique <numans@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org>
* ofproto: Add support to watch controller port liveness in fast-failover groupVishal Deep Ajmera2020-03-061-1/+4
| | | | | | | | | | | | | | | | | | | | Currently fast-failover group does not support checking liveness of controller port (OFPP_CONTROLLER). However this feature can be useful for selecting alternate pipeline when controller connection itself is down for e.g. by using local DHCP server to reply for any DHCP request originating from VMs. This patch adds the support for watching controller port liveness in fast- failover group. Controller port is considered live when atleast one of-connection is alive. Example usage: ovs-ofctl add-group br-int 'group_id=1234,type=ff, bucket=watch_port:CONTROLLER,actions:<A>, bucket=watch_port:1,actions:<B> Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>