path: root/lib
Commit message | Author | Age | Files | Lines
* netdev-dpdk: include dpdk PCI header directly | Aaron Conole | 2017-08-10 | 1 | -0/+1
  As part of a devargs rework in DPDK, the PCI header file is no longer included indirectly, so it must be included directly. This is not required to build with DPDK 17.05 or earlier, but it will be required if OVS moves to a newer DPDK release.
  Signed-off-by: Aaron Conole <aconole@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Tested-By: Timothy Redaelli <tredaelli@redhat.com>
  Acked-by: Ciara Loftus <ciara.loftus@intel.com>
* dp-packet: Reset DPDK hwol flags on init. | Darrell Ball | 2017-08-10 | 3 | -4/+23
  Reset the DPDK hwol flags in dp_packet_init_(). The new hwol bad-checksum flag is uninitialized for non-DPDK ports. This shows up as test failures with netdev-dummy ports when OVS is built with the --with-dpdk flag, because packets may be falsely marked as having a bad checksum.
  The existing APIs are simplified at the same time by making each one specific to either DPDK or non-DPDK builds; they also now manage a single field.
  Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2017-August/045081.html
  Fixes: 7451af618e0d ("dp-packet : Update DPDK rx checksum validation functions.")
  CC: Sugesh Chandran <sugesh.chandran@intel.com>
  Signed-off-by: Darrell Ball <dlu998@gmail.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* netdev-dummy: Fix minor style variation. | Joe Stringer | 2017-08-09 | 1 | -1/+1
  Signed-off-by: Joe Stringer <joe@ovn.org>
  Acked-by: Ben Pfaff <blp@ovn.org>
* dpif: Clean up netdev_ports map on dpif_close(). | Joe Stringer | 2017-08-09 | 1 | -0/+8
  Commit 32b77c316d9982 ("dpif: Save added ports in a port map.") introduced tracking of all dpif ports by taking a reference on each available netdev when the dpif is opened, but it did not clear out and release those references when the dpif is closed.
  One resulting problem was that, on a clean exit of ovs-vswitchd via "ovs-appctl exit --cleanup", the "ovs-netdev" device was not deleted, which could cause problems on subsequent startup. Commit 5119e258da92 ("dpif: Fix cleanup of userspace datapath.") fixed that particular problem by not adding such devices to the netdev_ports map, but the referencing and unreferencing in dpif_open() and dpif_close() are still not balanced.
  Balance the referencing of netdevs by releasing these references during dpif_close().
  Fixes: 32b77c316d9982 ("dpif: Save added ports in a port map.")
  Signed-off-by: Joe Stringer <joe@ovn.org>
  Acked-by: Andy Zhou <azhou@ovn.org>
* Remove duplicate description about Experimenter classes | Yi Yang | 2017-08-09 | 1 | -21/+2
  Commit 3d2fbd70bda514f7327970b859663f34f994290c introduced a duplicate description of the Experimenter classes ONFOXM_ET and NXOXM_NSH in lib/meta-flow.xml; branch-2.8 has the same issue.
  Signed-off-by: Yi Yang <yi.y.yang@intel.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* netdev-vport: Always implement get_ifindex for netdev-vport | Paul Blakey | 2017-08-09 | 1 | -10/+1
  Always implement get_ifindex without checking whether offload is enabled, since the two are unrelated. From ovs-dpctl we cannot tell whether offload is enabled, because other_config is not read.
  Signed-off-by: Paul Blakey <paulb@mellanox.com>
  Reviewed-by: Roi Dayan <roid@mellanox.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* netdev-linux: Reduce log level for ENODEV errors getting ifindex | Roi Dayan | 2017-08-09 | 1 | -2/+6
  These errors are normal and unavoidable, because the vifs disappear from the kernel before they are removed from the OVS database.
  Signed-off-by: Roi Dayan <roid@mellanox.com>
  Reviewed-by: Paul Blakey <paulb@mellanox.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* dp-packet: Use OVS_UNUSED to mark possibly unused parameters. | Ben Pfaff | 2017-08-09 | 1 | -8/+8
  This is the convention usually followed in OVS.
  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Darrell Ball <dlu998@gmail.com>
* netdev-dummy: Close pcap files when dummy device is closed. | Ben Pfaff | 2017-08-08 | 1 | -0/+6
  Fixes a file descriptor leak.
  Reported-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
* netdev: check for NULL fields in netdev_get_addrs | Daniel Alvarez | 2017-08-08 | 1 | -1/+1
  When the interface list is retrieved through getifaddrs(), some elements may have ifa_name set to NULL. This patch checks that ifa_name is not NULL before comparing it with the device name in the loop that counts how many interfaces have that name.
  This patch also checks that ifa_netmask is not NULL, for consistency with the existing code, so that it does not allocate more memory than needed when this field is NULL.
  Note that these checks are already done later in the function, so they should be done in both places.
  Signed-off-by: Daniel Alvarez <dalvarez@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Lance Richardson <lrichard@redhat.com>
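A minimal C sketch of the NULL checks described above, assuming a hand-written counting helper (count_dev_addrs() is hypothetical, not the OVS function). It walks a struct ifaddrs list of the kind returned by getifaddrs() and skips entries whose ifa_name or ifa_netmask is NULL before comparing names.

```c
#include <assert.h>
#include <ifaddrs.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: counts the entries in a getifaddrs() list that
 * belong to 'dev_name' and carry a netmask.  ifa_name and ifa_netmask may
 * be NULL, so both are checked before use. */
static size_t
count_dev_addrs(const struct ifaddrs *list, const char *dev_name)
{
    size_t n = 0;

    assert(dev_name != NULL);
    for (const struct ifaddrs *ifa = list; ifa; ifa = ifa->ifa_next) {
        if (ifa->ifa_name && ifa->ifa_netmask
            && !strcmp(ifa->ifa_name, dev_name)) {
            n++;
        }
    }
    return n;
}
```

In real use the list comes from getifaddrs() and must be released with freeifaddrs().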
* ofp-print: #include its own header first. | Ben Pfaff | 2017-08-08 | 1 | -1/+2
  The OVS coding style document says that a .c file should include its corresponding .h file first, to ensure that the .h file includes all of its dependencies, but this file did not do that.
  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Joe Stringer <joe@ovn.org>
* nsh: Avoid zero-length array. | Ben Pfaff | 2017-08-08 | 1 | -1/+1
  MSVC allows [] but not [0] for arrays in struct definitions, and does not allow nested [] inside a union.
  Reported-by: Alin Serdean <aserdean@cloudbasesolutions.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
* Generic encap and decap support for NSH | Jan Scheurich | 2017-08-07 | 9 | -21/+552
  This commit adds translation and netdev datapath support for generic encap and decap actions for the NSH MD1 header. The generic encap and decap actions are mapped to specific encap_nsh and decap_nsh actions in the datapath.
  The translation follows the general scheme that decap() of an NSH packet triggers recirculation after decapsulation, while encap(nsh) just modifies struct flow and sets the ctx->pending_encap flag, so that the encap_nsh action is generated at the next commit and can include subsequent set_field actions for NSH headers.
  Support for the flexible MD2 format using TLV properties is planned in encap(nsh), but not yet fully implemented.
  The CLI syntax for encap of NSH is:
      encap(nsh(md_type=1))
      encap(nsh(md_type=2[,tlv(<tlv_class>,<tlv_type>,<hex_string>),...]))
  Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
  Signed-off-by: Yi Yang <yi.y.yang@intel.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* userspace: add NSH support to vxlan-gpe tunnels | Jan Scheurich | 2017-08-07 | 1 | -0/+6
  Signed-off-by: Yi Yang <yi.y.yang@intel.com>
  Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* Adding nsh.at for NSH unit tests | Jan Scheurich | 2017-08-07 | 1 | -2/+4
  The first basic NSH test case is implemented and working.
  Unconditionally show the matched packet_type in megaflows, even when matching on eth.
  Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* userspace: Add support for NSH MD1 match fields | Jan Scheurich | 2017-08-07 | 11 | -19/+689
  This patch adds support for NSH packet header fields to the OVS control plane and the userspace datapath. Initially we support the fields of the NSH base header as defined in https://www.ietf.org/id/draft-ietf-sfc-nsh-13.txt and the fixed context headers specified for metadata format MD1. The variable-length MD2 format is parsed, but its TLV context headers are not yet available for matching.
  The NSH fields are modelled as experimenter fields with the dedicated experimenter class 0x005ad650 proposed for NSH in ONF. The following fields are defined:
      NXOXM code                           ofctl name   Size (bits)  Comment
      ===================================================================================
      NXOXM_NSH_FLAGS     (0x005ad650,1)   nsh_flags    8            Bits 2-9 of 1st NSH word
      NXOXM_NSH_MDTYPE    (0x005ad650,2)   nsh_mdtype   8            Bits 16-23
      NXOXM_NSH_NEXTPROTO (0x005ad650,3)   nsh_np       8            Bits 24-31
      NXOXM_NSH_SPI       (0x005ad650,4)   nsh_spi      24           Bits 0-23 of 2nd NSH word
      NXOXM_NSH_SI        (0x005ad650,5)   nsh_si       8            Bits 24-31
      NXOXM_NSH_C1        (0x005ad650,6)   nsh_c1       32           Maskable, nsh_mdtype==1
      NXOXM_NSH_C2        (0x005ad650,7)   nsh_c2       32           Maskable, nsh_mdtype==1
      NXOXM_NSH_C3        (0x005ad650,8)   nsh_c3       32           Maskable, nsh_mdtype==1
      NXOXM_NSH_C4        (0x005ad650,9)   nsh_c4       32           Maskable, nsh_mdtype==1
  Co-authored-by: Johnson Li <johnson.li@intel.com>
  Signed-off-by: Yi Yang <yi.y.yang@intel.com>
  Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
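To make the bit positions in the table concrete, here is a hypothetical C sketch (nsh_parse_base() and struct nsh_base_fields are illustrative names, not OVS code) that extracts the base-header fields from the first two NSH words. It assumes bit 0 is the most significant bit of each 32-bit word, as in the IETF draft, and that the words are already in host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative container for the NSH base-header fields listed above. */
struct nsh_base_fields {
    uint8_t flags;      /* Bits 2-9 of word 1. */
    uint8_t mdtype;     /* Bits 16-23 of word 1. */
    uint8_t np;         /* Bits 24-31 of word 1 (next protocol). */
    uint32_t spi;       /* Bits 0-23 of word 2 (24 bits). */
    uint8_t si;         /* Bits 24-31 of word 2. */
};

/* MSB-numbered bit k sits at position 31 - k from the LSB, so a field that
 * ends at MSB bit e is shifted right by 31 - e before masking. */
static struct nsh_base_fields
nsh_parse_base(uint32_t word1, uint32_t word2)
{
    struct nsh_base_fields f;

    f.flags = (word1 >> 22) & 0xff;     /* Ends at bit 9: shift by 22. */
    f.mdtype = (word1 >> 8) & 0xff;     /* Ends at bit 23: shift by 8. */
    f.np = word1 & 0xff;                /* Ends at bit 31: no shift. */
    f.spi = word2 >> 8;                 /* Top 24 bits. */
    f.si = word2 & 0xff;
    return f;
}
```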
* Userspace Datapath: Add TFTP support. | Darrell Ball | 2017-08-07 | 1 | -1/+38
  Both IPv4 and IPv6 are supported. NAT support is also included.
  Signed-off-by: Darrell Ball <dlu998@gmail.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* Userspace Datapath: Add ALG infra and FTP. | Darrell Ball | 2017-08-07 | 3 | -76/+1022
  ALG infrastructure and FTP support (both IPv4 and IPv6) are added to the userspace datapath. NAT support is also included.
  Signed-off-by: Darrell Ball <dlu998@gmail.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* Userspace Datapath: Introduce conn_key_cmp(). | Darrell Ball | 2017-08-07 | 1 | -10/+30
  A new function, conn_key_cmp(), is introduced and used to replace memcmp() of conn_keys. Given that OVS is built with many compilers and runs on many architectures, it seems prudent to avoid memcmp() in case existing or future holes (padding) in conn_key are not handled by a given compiler on a given architecture.
  Signed-off-by: Darrell Ball <dlu998@gmail.com>
  Suggested-by: Ben Pfaff <blp@ovn.org>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
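The padding concern can be illustrated with a simplified, hypothetical key structure (conn_key_sketch and conn_key_equal() are not the OVS definitions). Comparing member by member means the unspecified contents of padding bytes cannot make two logically equal keys compare unequal, which memcmp() cannot guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified key.  On common ABIs the compiler inserts
 * padding after 'proto' so that 'zone' is 4-byte aligned. */
struct conn_key_sketch {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t proto;
    /* Implicit padding bytes likely follow here. */
    uint32_t zone;
};

/* Member-wise equality: padding bytes never influence the result. */
static bool
conn_key_equal(const struct conn_key_sketch *a,
               const struct conn_key_sketch *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip
           && a->src_port == b->src_port && a->dst_port == b->dst_port
           && a->proto == b->proto && a->zone == b->zone;
}
```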
* string: Implement strcasestr for Windows. | Darrell Ball | 2017-08-07 | 2 | -3/+22
  strcasestr() is not defined on Windows, so implement a version that can be used there. This is needed for an upcoming patch.
  Signed-off-by: Darrell Ball <dlu998@gmail.com>
  Co-authored-by: Ben Pfaff <blp@ovn.org>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
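A minimal sketch of a portable case-insensitive substring search in the spirit of strcasestr(); portable_strcasestr() is an illustrative name and this is not the code added to OVS. Characters are compared through tolower() after a cast to unsigned char, which avoids undefined behavior for negative char values.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns a pointer to the first case-insensitive occurrence of 'needle'
 * in 'haystack', or NULL if there is none.  An empty needle matches at the
 * start of the haystack, as with strstr(). */
static char *
portable_strcasestr(const char *haystack, const char *needle)
{
    assert(haystack && needle);
    if (!*needle) {
        return (char *) haystack;
    }
    for (; *haystack; haystack++) {
        const char *h = haystack;
        const char *n = needle;

        while (*h && *n
               && tolower((unsigned char) *h) == tolower((unsigned char) *n)) {
            h++;
            n++;
        }
        if (!*n) {
            return (char *) haystack;   /* Whole needle matched. */
        }
    }
    return NULL;
}
```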
* Revert "netdev-vport: Always implement get_ifindex for netdev-vport" | Ben Pfaff | 2017-08-07 | 1 | -1/+10
  This reverts commit 327d98eb197bf04da90e23c03d88093a6eeeb6f3, which caused several unit tests to fail due to new warning messages in the logs.
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* netdev-vport: Always implement get_ifindex for netdev-vport | Paul Blakey | 2017-08-07 | 1 | -10/+1
  Always implement get_ifindex without checking whether offload is enabled, since the two are unrelated. From ovs-dpctl we cannot tell whether offload is enabled, because other_config is not read.
  Signed-off-by: Paul Blakey <paulb@mellanox.com>
  Reviewed-by: Roi Dayan <roid@mellanox.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* netdev-tc-offloads: Fix parsing SCTP in dump flows | Roi Dayan | 2017-08-07 | 1 | -0/+3
  After the tcp/udp unions were split, SCTP was overlooked when parsing flower back into a match.
  Fixes: 2b1d9fa90909 ("tc: Split IPs and transport layer ports unions in flower struct")
  Signed-off-by: Roi Dayan <roid@mellanox.com>
  Reviewed-by: Paul Blakey <paulb@mellanox.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Simon Horman <simon.horman@netronome.com>
* ovsdb-idl: idl compound indexes implementation | Lance Richardson | 2017-08-03 | 3 | -0/+481
  This patch adds support for creating multicolumn indexes in the C IDL, to enable efficient search and retrieval of database rows by key.
  Signed-off-by: Esteban Rodriguez Betancourt <estebarb@hpe.com>
  Co-authored-by: Lance Richardson <lrichard@redhat.com>
  Signed-off-by: Lance Richardson <lrichard@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* lib: skiplist implementation | Lance Richardson | 2017-08-03 | 3 | -0/+312
  Skiplist implementation intended for use in the IDL compound indexes feature.
  Signed-off-by: Esteban Rodriguez Betancourt <estebarb@hpe.com>
  Co-authored-by: Lance Richardson <lrichard@redhat.com>
  Signed-off-by: Lance Richardson <lrichard@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
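For readers unfamiliar with the data structure, a minimal integer skiplist in C is sketched below. It is an independent illustration, not the lib/skiplist API: all names are hypothetical, and node levels are drawn with rand() so that each extra level is kept with probability 1/2. Search and insertion descend from the highest level, giving expected O(log n) cost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define SL_MAX_LEVEL 16

/* Illustrative integer skiplist; not the OVS lib/skiplist API. */
struct sl_node {
    int key;
    struct sl_node *next[];      /* One forward link per level. */
};

struct skiplist {
    int level;                   /* Number of levels currently in use. */
    struct sl_node *head;        /* Sentinel with SL_MAX_LEVEL links. */
};

static struct sl_node *
sl_node_new(int key, int levels)
{
    struct sl_node *n = calloc(1, sizeof *n + levels * sizeof n->next[0]);

    assert(n != NULL);
    n->key = key;
    return n;
}

/* Each additional level is kept with probability 1/2. */
static int
sl_random_level(void)
{
    int level = 1;

    while (level < SL_MAX_LEVEL && (rand() & 1)) {
        level++;
    }
    return level;
}

static void
skiplist_init(struct skiplist *sl)
{
    sl->level = 1;
    sl->head = sl_node_new(0, SL_MAX_LEVEL);
}

/* Inserts 'key' unless it is already present. */
static void
skiplist_insert(struct skiplist *sl, int key)
{
    struct sl_node *update[SL_MAX_LEVEL];
    struct sl_node *x = sl->head;

    /* Descend from the top level, remembering on each level the last node
     * whose key is smaller than 'key'; the new node is linked after it. */
    for (int i = sl->level - 1; i >= 0; i--) {
        while (x->next[i] && x->next[i]->key < key) {
            x = x->next[i];
        }
        update[i] = x;
    }
    if (x->next[0] && x->next[0]->key == key) {
        return;
    }

    int level = sl_random_level();
    for (; sl->level < level; sl->level++) {
        update[sl->level] = sl->head;   /* New levels start at the head. */
    }

    struct sl_node *n = sl_node_new(key, level);
    for (int i = 0; i < level; i++) {
        n->next[i] = update[i]->next[i];
        update[i]->next[i] = n;
    }
}

static bool
skiplist_contains(const struct skiplist *sl, int key)
{
    const struct sl_node *x = sl->head;

    for (int i = sl->level - 1; i >= 0; i--) {
        while (x->next[i] && x->next[i]->key < key) {
            x = x->next[i];
        }
    }
    x = x->next[0];
    return x && x->key == key;
}

/* Frees every node; level 0 links all of them, starting at the head. */
static void
skiplist_destroy(struct skiplist *sl)
{
    for (struct sl_node *x = sl->head, *next; x; x = next) {
        next = x->next[0];
        free(x);
    }
}
```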
* ovs-ofctl: Avoid unnecessary flow replacement in "replace-flows" command. | Ben Pfaff | 2017-08-03 | 1 | -0/+24
  The ovs-ofctl "diff-flows" and "replace-flows" commands compare the flows in two flow tables. Until now, the "replace-flows" command has treated as significant certain almost meaningless differences related to the OpenFlow version used to add a flow, which caused it to replace a flow with a version that is identical in practice. For example, in the following, the "replace-flows" command prints a FLOW_MOD that adds the flow that was already added previously:
      $ cat > flows
      actions=resubmit(,1)
      $ ovs-vsctl add-br br0
      $ ovs-ofctl del-flows br0
      $ ovs-ofctl add-flows br0 flows
      $ ovs-ofctl -vvconn replace-flows br0 flows 2>&1 | grep FLOW_MOD
  Re-adding an existing flow has side effects; for example, it resets the flow's duration, so it is better to avoid it.
  This commit fixes the problem using the same technique previously used for a similar problem with the "diff-flows" command, which was fixed in commit 98f7f427bf8b ("ovs-ofctl: Avoid printing false differences on "ovs-ofctl diff-flows".").
  Reported-by: Kevin Lin <kevin@quilt.io>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Andy Zhou <azhou@ovn.org>
* ovs-ovctl: Fix "OpenFlow versions" in ovs-ofctl -V | Timothy Redaelli | 2017-08-03 | 1 | -1/+1
  Fix the output of "ovs-ofctl -V" to show OpenFlow 1.4 as the maximum supported version, since OpenFlow 1.4 was enabled by default in commit 8d3485791188 ("OpenFlow: Enable OpenFlow 1.4 by default.").
  CC: Ben Pfaff <blp@ovn.org>
  Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* tnl-ports: Open tunnel type if device name has special prefix | Paul Blakey | 2017-08-03 | 1 | -1/+1
  There is a race between listening for route changes from route-table netlink, which calls ovs_router_insert() and thereby adds the involved netdev to the tnl-ports map (tnl_port_map_insert_ipdev()), and netdev_open() from the normal opening of the port. When the netdev does not exist yet, tnl-ports opens it as type system (type == NULL) before it is opened normally, e.g. when dumping the ports in dpctl.
  This solves the 'ovs-dpctl show' EEXIST error on vxlan ports, as both dpctl and tnl-ports will open the ports as vxlan type.
  Signed-off-by: Paul Blakey <paulb@mellanox.com>
  Reviewed-by: Roi Dayan <roid@mellanox.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* tc: Correct convert ticks to msecs on parsing tc TM | Paul Blakey | 2017-08-03 | 1 | -2/+15
  Use sysconf(_SC_CLK_TCK) to read the run-time number of clock ticks per second, and use that to convert ticks to msecs. This is how iproute does the conversion when parsing tc filters. The call is made only once.
  Signed-off-by: Paul Blakey <paulb@mellanox.com>
  Reviewed-by: Roi Dayan <roid@mellanox.com>
  Acked-by: Joe Stringer <joe@ovn.org>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
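A sketch of the conversion, with hypothetical helper names: the tick rate is read once with sysconf(_SC_CLK_TCK) and cached in a static variable, then ticks are scaled to milliseconds. The fallback value of 100 used when sysconf() fails is an assumption of this sketch, and the unsynchronized static cache is only safe if the first call happens before other threads use it; a multithreaded program would guard it with a once-only initializer.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Returns the clock-tick rate, querying sysconf() only on the first call. */
static long
clock_ticks_per_sec(void)
{
    static long hz;             /* 0 means "not queried yet". */

    if (!hz) {
        hz = sysconf(_SC_CLK_TCK);
        if (hz <= 0) {
            hz = 100;           /* Assumed fallback if sysconf() fails. */
        }
    }
    return hz;
}

/* Converts a tick count to milliseconds for a given tick rate. */
static uint64_t
ticks_to_msecs(uint64_t ticks, long hz)
{
    assert(hz > 0);
    return ticks * 1000 / (uint64_t) hz;
}
```

A caller would typically write ticks_to_msecs(t, clock_ticks_per_sec()).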
* odp-util: Support zero mask on ipv4 frag | Paul Blakey | 2017-08-03 | 1 | -10/+7
  Don't print a frag parsing error if the mask is zero; instead, just don't print the field.
  Signed-off-by: Paul Blakey <paulb@mellanox.com>
  Reviewed-by: Roi Dayan <roid@mellanox.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* netdev-tc-offloads: Parse ip related fields only if eth type is ip | Paul Blakey | 2017-08-03 | 1 | -10/+10
  There is no need to parse IP-related fields if the Ethernet type is not IP.
  Signed-off-by: Paul Blakey <paulb@mellanox.com>
  Reviewed-by: Roi Dayan <roid@mellanox.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* tc: Split IPs and transport layer ports unions in flower struct | Paul Blakey | 2017-08-03 | 3 | -39/+62
  Split the dst/src_port and ipv4/ipv6 unions so that later features can easily distinguish them.
  Signed-off-by: Paul Blakey <paulb@mellanox.com>
  Reviewed-by: Roi Dayan <roid@mellanox.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* tc: Refactor nl_msg_put_flower_options | Paul Blakey | 2017-08-03 | 1 | -65/+16
  Refactor nl_msg_put_flower_options to be more readable. This commit does not change functionality.
  Signed-off-by: Paul Blakey <paulb@mellanox.com>
  Reviewed-by: Roi Dayan <roid@mellanox.com>
  Acked-by: Simon Horman <simon.horman@netronome.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* packets: Reorganize the pkt_metadata structure. | Bhanuprakash Bodireddy | 2017-08-03 | 1 | -2/+21
  pkt_metadata_init() is called for every packet in the userspace datapath and initializes a few members of pkt_metadata. Before that, the members that need to be initialized are prefetched using pkt_metadata_prefetch_init().
  These functions are critical to userspace datapath performance and must stay in sync. Any change to pkt_metadata should also include changes to pkt_metadata_init() and pkt_metadata_prefetch_init() if necessary.
  This commit slightly refactors the pkt_metadata structure and introduces cache line markers to catch any violations of the structure layout. It also prefetches only the cache lines containing the members that need to be zeroed out.
  Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* util: Add PADDED_MEMBERS_CACHELINE_MARKER macro to mark cachelines. | Bhanuprakash Bodireddy | 2017-08-03 | 1 | -0/+6
  The PADDED_MEMBERS_CACHELINE_MARKER macro provides a way to mark cache lines. It expands to an anonymous union containing a cache line marker, the members in a nested anonymous structure, and an array of bytes whose size is a multiple of UNIT bytes.
  Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
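The idea behind the macro can be shown with a small re-creation. PADDED_MEMBERS_CACHELINE_MARKER_SKETCH, CACHE_LINE_SIZE (assumed to be 64 here), ROUND_UP and struct pkt_metadata_sketch are illustrative, not the OVS definitions. Each group of members shares an anonymous union with a one-byte marker and a pad array, so every group occupies a multiple of UNIT bytes and the marker names the group's starting offset. The code relies on C11 anonymous structs and unions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64   /* Assumed cache line size for this sketch. */
#define ROUND_UP(X, Y) (((X) + (Y) - 1) / (Y) * (Y))

/* Overlays a marker, the members, and a pad array rounding the group up to
 * a multiple of UNIT bytes.  MEMBERS is a single macro argument, so it must
 * not contain top-level commas. */
#define PADDED_MEMBERS_CACHELINE_MARKER_SKETCH(UNIT, CACHELINE, MEMBERS) \
    union {                                                             \
        uint8_t CACHELINE[1];                                           \
        struct { MEMBERS };                                             \
        uint8_t pad_##CACHELINE[ROUND_UP(sizeof(struct { MEMBERS }), UNIT)]; \
    }

/* Hypothetical layout: each group starts on its own cache line. */
struct pkt_metadata_sketch {
    PADDED_MEMBERS_CACHELINE_MARKER_SKETCH(CACHE_LINE_SIZE, cacheline0,
        uint32_t recirc_id;
        uint32_t dp_hash;
        uint16_t ct_state;
    );
    PADDED_MEMBERS_CACHELINE_MARKER_SKETCH(CACHE_LINE_SIZE, cacheline1,
        uint32_t tunnel_words[8];
    );
};
```

Because the markers are ordinary members, a build-time check such as offsetof(struct pkt_metadata_sketch, cacheline1) == CACHE_LINE_SIZE can catch a group that grows past its cache line.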
* ovs-router: Remove redundant headers. | Tonghao Zhang | 2017-08-03 | 1 | -2/+0
  Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* netdev-linux: Replace sendmsg with sendmmsg in netdev_linux_send | Zhenyu Gao | 2017-08-02 | 1 | -74/+89
  sendmmsg() can reduce the CPU cycles spent sending packets to the kernel. Replace sendmsg() with sendmmsg() in netdev_linux_send() to send packets in batches when sendmmsg() is available. If the kernel does not support sendmmsg(), fall back to sendmsg().
  [Test topology diagram not recoverable from the extracted text; its labels were: netserver, container, veth, dpdk-ovs, netperf, dpdk, bare-metal, pnic-----------pnic.]
  Netperf was used to measure performance:
      1) cmd: netperf -H remote-container -t UDP_STREAM -l 60 -- -m 1400
         result: netserver received 2383.21Mb (sendmsg) / 2551.64Mb (sendmmsg)
      2) cmd: netperf -H remote-container -t UDP_STREAM -l 60 -- -m 60
         result: netserver received 109.72Mb (sendmsg) / 115.18Mb (sendmmsg)
  sendmmsg() shows about a 6% improvement in netperf UDP testing.
  Signed-off-by: Zhenyu Gao <sysugaozhenyu@gmail.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
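A hedged sketch of batching with fallback on Linux: send_batch() and BATCH_MAX are hypothetical, and this is not netdev_linux_send(). All buffers go out in one sendmmsg() call; if the kernel reports ENOSYS, each message is sent with sendmsg() instead. Note that sendmmsg() may send fewer messages than requested, so a caller should check the return value.

```c
#define _GNU_SOURCE             /* For sendmmsg() and struct mmsghdr. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BATCH_MAX 32            /* Hypothetical batch limit. */

/* Sends up to 'n' buffers on 'fd'.  Returns the number of buffers sent,
 * or -1 with errno set if none could be sent. */
static int
send_batch(int fd, const void *const bufs[], const size_t lens[], size_t n)
{
    struct mmsghdr msgs[BATCH_MAX];
    struct iovec iovs[BATCH_MAX];

    assert(n <= BATCH_MAX);
    memset(msgs, 0, sizeof msgs);
    for (size_t i = 0; i < n; i++) {
        iovs[i].iov_base = (void *) bufs[i];
        iovs[i].iov_len = lens[i];
        msgs[i].msg_hdr.msg_iov = &iovs[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* One system call for the whole batch. */
    int sent = sendmmsg(fd, msgs, n, 0);
    if (sent >= 0 || errno != ENOSYS) {
        return sent;
    }

    /* Kernel without sendmmsg(): fall back to one sendmsg() per packet. */
    for (size_t i = 0; i < n; i++) {
        if (sendmsg(fd, &msgs[i].msg_hdr, 0) < 0) {
            return i ? (int) i : -1;
        }
    }
    return (int) n;
}
```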
* dp-packet: New function dp_packet_get_send_len(). | Ben Pfaff | 2017-08-02 | 4 | -15/+11
  This function is useful in a few places for representing the packet's length minus the cutlen.
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* Eliminate most shadowing for local variable names. | Ben Pfaff | 2017-08-02 | 18 | -47/+37
  Shadowing is when a variable with a given name in an inner scope hides a different variable with the same name in a surrounding scope. This is generally undesirable because it can confuse programmers. This commit eliminates most of it.
  Found with -Wshadow=local in GCC 7. The repo is not really ready to enable this option by default because of a few cases that are harder to fix, and harmless, such as nested use of CMAP_FOR_EACH.
  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Andy Zhou <azhou@ovn.org>
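A small, hypothetical example of why shadowing is worth eliminating: in sum_shadowed() the inner declaration hides the accumulator, so the function always returns 0, and GCC 7's -Wshadow=local warns about that inner declaration. sum_fixed() simply drops the redundant declaration.

```c
#include <assert.h>

/* Buggy: the loop adds into an inner 'sum' that is discarded at the end of
 * every iteration, so the outer 'sum' stays 0. */
static int
sum_shadowed(const int *v, int n)
{
    int sum = 0;

    for (int i = 0; i < n; i++) {
        int sum = 0;            /* Shadows the outer 'sum'. */
        sum += v[i];
    }
    return sum;
}

/* Fixed: a single accumulator, no shadowing. */
static int
sum_fixed(const int *v, int n)
{
    int sum = 0;

    for (int i = 0; i < n; i++) {
        sum += v[i];
    }
    return sum;
}
```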
* hash: Add "fall through" annotations for 32-bit builds as well. | Ben Pfaff | 2017-08-02 | 1 | -0/+14
  Commit 73c7216a5329 ("Fix some -Wimplicit-fallthrough warnings building with GCC 7") missed a few fall-through annotations that only appear in 32-bit builds. This commit adds them.
  CC: Timothy Redaelli <tredaelli@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Andy Zhou <azhou@ovn.org>
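As an illustration of the annotations involved (not the OVS hash code), the hypothetical tail_word() below packs the 1 to 3 leftover bytes of a buffer into a 32-bit word using deliberate switch fall through. GCC 7's -Wimplicit-fallthrough accepts a /* fall through */ comment placed immediately before the next case label as a marker that the fall through is intended.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packs the trailing (n % 4) bytes of 'p' into a word, little-endian. */
static uint32_t
tail_word(const uint8_t *p, size_t n)
{
    uint32_t k = 0;

    switch (n & 3) {
    case 3:
        k ^= (uint32_t) p[2] << 16;
        /* fall through */
    case 2:
        k ^= (uint32_t) p[1] << 8;
        /* fall through */
    case 1:
        k ^= p[0];
        break;
    default:
        break;                  /* No trailing bytes. */
    }
    return k;
}
```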
* netdev: Fix netdev_open() to track and recreate classless interfaces | Eelco Chaudron | 2017-08-02 | 2 | -5/+29
  Commit 67ac844 surfaced an existing issue with OVS persistent ports. If that commit is reverted, the error no longer appears and basic traffic flows; however, the wrong netdev class is used, so the wrong callbacks are called.
  The main issue is that netdev_open() is called with type = NULL before the interface is actually configured in the system. This patch tracks these "auto"-generated interfaces and, once netdev_open() is called with a valid type, reconfigures (re-creates) them.
  Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* lacp: enable bond slave immediately after lacp attach | Huanle Han | 2017-08-02 | 1 | -0/+1
  There is a long interval (5 to 20 seconds) between LACP slave attach and bond slave enable. During this interval, OVS drops all packets received from that slave because the bond_check_admissibility() check fails.
  The root cause is that connectivity_seq is not changed after an LACP update, so the LACP status is not immediately propagated into port->may_enable by port_run().
  Signed-off-by: Huanle Han <hanxueluo@gmail.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* vlog: reopen log file in monitor process | Huanle Han | 2017-08-02 | 1 | -3/+6
  An OVS daemon process reopens its log file after every log rotation. However, the monitor process does not, so its log file descriptor always points to the oldest file on disk, which is deleted after rotation. When the daemon process restarts after a crash, it inherits its parent's file descriptors, including the one for the deleted log file.
  This commit makes the monitor process reopen the log file every time it wakes up from waitpid().
  Signed-off-by: Huanle Han <hanxueluo@gmail.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
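A minimal POSIX sketch of the reopen step, with hypothetical names (this is not the vlog code): reopen_log_fd() reopens the log path and moves the new descriptor onto the existing log descriptor with dup2(), so the descriptor number that a restarted daemon inherits refers to the current file. Under the approach described above, a monitor loop would call it each time waitpid() returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Reopens 'path' (creating it if rotation removed it) and installs the new
 * open file on 'log_fd'.  Returns 0 on success, -1 on error. */
static int
reopen_log_fd(int log_fd, const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0660);

    if (fd < 0) {
        return -1;
    }
    if (fd != log_fd) {
        if (dup2(fd, log_fd) < 0) {
            close(fd);
            return -1;
        }
        close(fd);              /* 'log_fd' now refers to the new file. */
    }
    return 0;
}
```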
* OF support and translation of generic encap and decap | Jan Scheurich | 2017-08-02 | 6 | -40/+572
  This commit adds support for the OpenFlow actions generic encap and decap (as specified in ONF EXT-382) to the OVS control plane.
  CLI syntax for encap action with properties:
      encap(<header>)
      encap(<header>(<prop>=<value>,<tlv>(<class>,<type>,<value>),...))
  For example:
      encap(ethernet)
      encap(nsh(md_type=1))
      encap(nsh(md_type=2,tlv(0x1000,10,0x12345678),tlv(0x2000,20,0xfedcba9876543210)))
  CLI syntax for decap action:
      decap()
      decap(packet_type(ns=<pt_ns>,type=<pt_type>))
  For example:
      decap()
      decap(packet_type(ns=0,type=0xfffe))
      decap(packet_type(ns=1,type=0x894f))
  The first header supported for encap and decap is "ethernet", to convert packets between packet_type (1,Ethertype) and (0,0).
  This commit also implements a skeleton for the translation of generic encap and decap actions in ofproto-dpif and adds support to encap and decap an Ethernet header.
  In general, translation of encap commits pending actions and then rewrites struct flow in accordance with the new packet type and header. In the case of encap(ethernet), it suffices to change the packet type from (1,Ethertype) to (0,0) and set the dl_type accordingly. A new pending_encap flag in the xlate ctx is set to mark that a corresponding datapath encap action must be triggered at the next commit. In the case of encap(ethernet), ofproto generates a push_eth action.
  The general case for translation of decap() is to emit a datapath action to decap the current outermost header and then recirculate the packet to reparse the inner headers. In the special case of an Ethernet packet, decap() just changes the packet type from (0,0) to (1,dl_type) without a need to recirculate. The emission of the pop_eth action for the datapath is postponed to the next commit.
  Hence encap(ethernet) and decap() on an Ethernet packet are OF actions that only incur a cost in the dataplane when a modified packet is actually committed, e.g. because it is sent out. They can freely be used for normalizing the packet type in the OF pipeline without degrading performance.
  Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
  Signed-off-by: Yi Yang <yi.y.yang@intel.com>
  Signed-off-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
  Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
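The packet-type bookkeeping for encap(ethernet) and decap() can be sketched as follows. The names, and the packing of (ns, type) into ns << 16 | type, are assumptions of this sketch, as is the pending_decap flag, which only stands in for the deferred pop_eth emission; none of this is the ofproto-dpif code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packing of a (namespace, type) packet type. */
#define PT(NS, TYPE) ((uint32_t) (NS) << 16 | (uint32_t) (TYPE))
#define PT_ETH PT(0, 0)

struct flow_sketch {
    uint32_t packet_type;
    uint16_t dl_type;           /* Ethertype of the payload. */
    int pending_encap;          /* Emit push_eth at the next commit. */
    int pending_decap;          /* Emit pop_eth at the next commit. */
};

/* encap(ethernet): (1, Ethertype) -> (0, 0).  Only the flow changes here;
 * the datapath action is deferred until the next commit. */
static void
encap_ethernet(struct flow_sketch *f)
{
    assert(f->packet_type >> 16 == 1);
    f->dl_type = f->packet_type & 0xffff;
    f->packet_type = PT_ETH;
    f->pending_encap = 1;
}

/* decap() on an Ethernet packet: (0, 0) -> (1, dl_type), with no
 * recirculation; pop_eth is likewise deferred. */
static void
decap_ethernet(struct flow_sketch *f)
{
    assert(f->packet_type == PT_ETH);
    f->packet_type = PT(1, f->dl_type);
    f->pending_decap = 1;
}
```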
* dpif-netdev: Reorder elements in dp_netdev_port structure. | Bhanuprakash Bodireddy | 2017-08-02 | 1 | -2/+2
  Reordering the elements in the dp_netdev_port structure reduces the number of pad bytes, thereby saving a cache line. A marginal performance improvement is also observed with this change.
      Before: structure size: 136, holes: 7, sum padbytes: 7, cachelines: 3
      After:  structure size: 128, holes: 6, sum padbytes: 0, cachelines: 2
  Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
  Reviewed-by: Greg Rose <gvrose8192@gmail.com>
  Tested-by: Greg Rose <gvrose8192@gmail.com>
  Signed-off-by: Darrell Ball <dlu998@gmail.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
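The effect of member ordering can be demonstrated with two hypothetical structures holding the same members (they are not dp_netdev_port). Interleaving 1-byte and 8-byte members makes the compiler insert a hole before each 8-byte member, while ordering members by decreasing alignment leaves only tail padding. Exact sizes depend on the ABI; on a typical x86-64 build they are 32 and 24 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Interleaved: a hole is likely inserted before each 8-byte member. */
struct port_unordered {
    uint8_t  flag_a;
    uint64_t counter_a;
    uint8_t  flag_b;
    uint64_t counter_b;
};

/* Same members, largest alignment first: only tail padding remains. */
struct port_reordered {
    uint64_t counter_a;
    uint64_t counter_b;
    uint8_t  flag_a;
    uint8_t  flag_b;
};
```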
* dpctl: Add new 'ct-bkts' command. | Antonio Fischetti | 2017-08-02 | 11 | -16/+131
  The command
      ovs-appctl dpctl/ct-bkts
  shows the number of connections per bucket. With a threshold,
      ovs-appctl dpctl/ct-bkts gt=N
  it shows the number of connections only for buckets holding more than N connections.
  Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
  Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
  Co-authored-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
  Signed-off-by: Darrell Ball <dlu998@gmail.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
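The gt=N filter amounts to a simple threshold over per-bucket counts. The sketch below assumes the counts have already been collected into an array; buckets_above() is a hypothetical helper, not the dpctl implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Writes to 'out' the indices of buckets holding more than 'threshold'
 * connections, up to 'out_max' entries, and returns how many it wrote. */
static size_t
buckets_above(const unsigned int *counts, size_t n_buckets,
              unsigned int threshold, size_t *out, size_t out_max)
{
    size_t n = 0;

    for (size_t i = 0; i < n_buckets && n < out_max; i++) {
        if (counts[i] > threshold) {
            out[n++] = i;
        }
    }
    return n;
}
```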
* conntrack : Use Rx checksum offload feature on DPDK ports for conntrack. | Sugesh Chandran | 2017-08-02 | 1 | -23/+40
  Avoid checksum validation in the conntrack module if the checksum has already been verified by a DPDK physical NIC port.
  Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>
  Co-authored-by: Darrell Ball <dball@vmware.com>
  Signed-off-by: Darrell Ball <dball@vmware.com>
  Acked-by: Antonio Fishetti <antonio.fischetti@intel.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* dp-packet : Update DPDK rx checksum validation functions. | Sugesh Chandran | 2017-08-02 | 1 | -2/+26
  DPDK ports use masks when reporting rx checksum flags. OVS should use these masks along with the reported checksum flag when validating a good checksum.
  Two new functions are added to validate a bad checksum reported by a DPDK NIC port. These two functions will be used in the following patch to enable rx checksum offload in the conntrack module.
  Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>
  Co-authored-by: Darrell Ball <dball@vmware.com>
  Signed-off-by: Darrell Ball <dball@vmware.com>
  Acked-by: Antonio Fishetti <antonio.fischetti@intel.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
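A sketch of why the mask matters, using local constants that are assumed to mirror the DPDK 17.05 PKT_RX_L4_CKSUM_* layout (check rte_mbuf.h for the DPDK version in use). The L4 checksum status is encoded in two bits, so testing a single bit would misclassify the state in which both bits are set; comparing the masked value distinguishes all four states. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for DPDK's L4 rx checksum flags (assumed layout). */
#define RX_L4_CKSUM_BAD     (UINT64_C(1) << 3)
#define RX_L4_CKSUM_GOOD    (UINT64_C(1) << 8)
#define RX_L4_CKSUM_MASK    (RX_L4_CKSUM_BAD | RX_L4_CKSUM_GOOD)
#define RX_L4_CKSUM_NONE    RX_L4_CKSUM_MASK     /* Both bits set. */
#define RX_L4_CKSUM_UNKNOWN UINT64_C(0)          /* Neither bit set. */

/* True only when the two-bit field equals the "good" encoding. */
static bool
l4_cksum_good(uint64_t ol_flags)
{
    return (ol_flags & RX_L4_CKSUM_MASK) == RX_L4_CKSUM_GOOD;
}

/* True only when the two-bit field equals the "bad" encoding. */
static bool
l4_cksum_bad(uint64_t ol_flags)
{
    return (ol_flags & RX_L4_CKSUM_MASK) == RX_L4_CKSUM_BAD;
}
```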
* packets: Do not initialize ct_orig_tuple. | Daniele Di Proietto | 2017-08-02 | 1 | -1/+9
  Commit "odp: Support conntrack orig tuple key." introduced new fields in struct 'pkt_metadata'. pkt_metadata_init() is called for every packet in the userspace datapath. When testing a simple single-flow case with DPDK, we observe lower throughput after that commit (14.88 Mpps before, 13 Mpps after).
  This patch skips initializing ct_orig_tuple in pkt_metadata_init(). Initializing ct_state should be enough, because nobody should look at ct_orig_tuple unless ct_state is != 0.
  It is discussed at: https://mail.openvswitch.org/pipermail/ovs-dev/2017-May/332419.html
  Fixes: daf4d3c18da4 ("odp: Support conntrack orig tuple key.")
  Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
  Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
  Co-authored-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
  Signed-off-by: Darrell Ball <dlu998@gmail.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
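The lazy-initialization argument can be sketched with a trimmed-down, hypothetical metadata structure (not struct pkt_metadata): initialization clears only ct_state, and the accessor refuses to return ct_orig_tuple until ct_state is nonzero, so the uninitialized tuple is never read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down conntrack tuple and metadata. */
struct ct_tuple_sketch {
    uint32_t src;
    uint32_t dst;
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t proto;
};

struct pkt_md_sketch {
    uint8_t ct_state;                     /* 0: conntrack has not run. */
    struct ct_tuple_sketch ct_orig_tuple; /* Valid only if ct_state != 0. */
};

/* Per-packet init: cheap, because ct_orig_tuple is not cleared. */
static void
pkt_md_init(struct pkt_md_sketch *md)
{
    md->ct_state = 0;
}

/* Copies the original tuple into '*out' only when it is meaningful. */
static bool
pkt_md_get_orig_tuple(const struct pkt_md_sketch *md,
                      struct ct_tuple_sketch *out)
{
    if (!md->ct_state) {
        return false;
    }
    *out = md->ct_orig_tuple;
    return true;
}
```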
* dpdk: Fix device cleanup. | Darrell Ball | 2017-08-02 | 1 | -1/+52
  Commit 5dcde09c80a8 was introduced to make detaching more automatic, without using an additional command beyond ovs-vsctl del-port <br> <port>.
  Since commit 5dcde09c80a8, DPDK devices are sometimes not detached when del-port is issued, for example:
      sudo ovs-vsctl del-port br0 dpdk1
  This can happen when vswitchd is (re)started with an existing database and devices are already bound to DPDK.
  A minimal recipe to reproduce the issue is:
  1/ Start with:
      darrell@prmh-nsx-perf-server125:~$ sudo ovs-vsctl show
      1c50d8ee-b17f-4fac-a595-03b0da8c8275
          Bridge "br0"
              Port "br0"
                  Interface "br0"
                      type: internal
              Port "dpdk1"
                  Interface "dpdk1"
                      type: dpdk
                      options: {dpdk-devargs="0000:04:00.1"}
              Port "dpdk0"
                  Interface "dpdk0"
                      type: dpdk
                      options: {dpdk-devargs="0000:04:00.0"}
      darrell@prmh-nsx-perf-server125:~$ /usr/src/dpdk-16.11/tools/dpdk-devbind.py --status
      Network devices using DPDK-compatible driver
      ============================================
      0000:04:00.0 'Ethernet Controller 10-Gigabit X540-AT2' drv=uio_pci_generic unused=ixgbe,vfio-pci
      0000:04:00.1 'Ethernet Controller 10-Gigabit X540-AT2' drv=uio_pci_generic unused=ixgbe,vfio-pci
  2/ Restart vswitchd.
  3/ Run sudo ovs-vsctl del-port br0 dpdk1 and find that the interface is NOT detached; there is no info log ‘Device '0000:04:00.1' detached’.
  A more verbose discussion, along with another possible solution, is here: https://mail.openvswitch.org/pipermail/ovs-dev/2017-June/333462.html
  Since we are nearing the end of a release, a safe approach is needed at this time. One approach would be to revert 5dcde09c80a8. This patch does not do that; instead, it reinstates the command ovs-appctl netdev-dpdk/detach to handle cases where del-port does not work.
  To detach the device, run the reinstated command:
      ovs-appctl netdev-dpdk/detach 0000:04:00.1
  and observe the console output ‘Device '0000:04:00.1' has been detached’.
  Fixes: 5dcde09c80a8 ("netdev-dpdk: Fix device leak on port deletion.")
  CC: Ilya Maximets <i.maximets@samsung.com>
  Acked-by: Aaron Conole <aconole@redhat.com>
  Acked-by: Fischetti, Antonio <antonio.fischetti@intel.com>
  Signed-off-by: Darrell Ball <dlu998@gmail.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>