path: root/lib
Commit message | Author | Age | Files | Lines
* ofproto: Honor mtu_request even for internal ports. | Daniele Di Proietto | 2016-09-02 | 3 | -0/+27
  By default Open vSwitch tries to configure the MTU of internal interfaces to match the bridge minimum, overriding any attempt by the user to configure it through standard system tools or the database. While this works in many simple cases (there are probably many users that rely on this), it may create problems for more advanced use cases (like any overlay networks).
  This commit allows the user to override the default behavior by providing an explicit MTU in the mtu_request column in the Interface table.
  This means that Open vSwitch will now treat database MTU requests differently from MTU requests made through standard system tools (`ip link` or `ifconfig`), but this seems the best way to remain compatible with old users while providing a more powerful interface.
  Suggested-by: Darrell Ball <dlu998@gmail.com>
  Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
  Acked-by: Ben Pfaff <blp@ovn.org>
  Tested-by: Joe Stringer <joe@ovn.org>
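The precedence described above can be sketched as follows. This is a minimal Python illustration, not the OVS C code; the function name `effective_mtu` and its parameters are hypothetical, and the handling of non-internal ports is an assumption.

```python
def effective_mtu(is_internal, current_mtu, bridge_min_mtu, mtu_request=None):
    """Pick an interface MTU following the policy in the commit message.

    All names here are hypothetical; this only models the decision order.
    """
    if mtu_request is not None:
        # An explicit database request wins, even for internal ports.
        return mtu_request
    if is_internal:
        # Default behavior: internal ports follow the bridge minimum.
        return bridge_min_mtu
    # Assumption: other ports keep the MTU set by system tools.
    return current_mtu
```

For example, an internal port on a bridge whose minimum MTU is 1500 stays at 1500 unless `mtu_request` is set, in which case the requested value is used.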
* learn: Fix iteration over learning specs. | Ben Pfaff | 2016-09-02 | 2 | -12/+5
  struct ofpact_learn_spec is variable-length. The 'n_specs' member of struct ofpact_learn counted the number of specs, but the iteration loops over struct ofpact_learn_spec only iterated as far as the *minimum* length of 'n_specs' specs. This fixes the problem, which exhibited as consistent failures for test 431 (learning action - TCPv6 port learning), seemingly only on i386 since it shows up for my personal development machine but appears to not happen for anyone else.
  Fixes: dfe191d5faa6 ("ofp-actions: Waste less memory in learn actions.")
  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Jarno Rajahalme <jarno@ovn.org>
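The class of bug fixed here can be illustrated with a small Python sketch. The record layout below (a two-byte header followed by a variable payload) is a made-up stand-in, not the real struct ofpact_learn_spec; it only shows why bounding a loop by `count * minimum_size` stops early when records are variable-length.

```python
import struct

# Hypothetical record header: one byte of kind, one byte of payload length.
HEADER = struct.Struct("!BB")

def pack_specs(specs):
    """Serialize (kind, payload) pairs back to back with no padding."""
    return b"".join(HEADER.pack(kind, len(payload)) + payload
                    for kind, payload in specs)

def iter_specs(buf):
    """Walk the buffer, advancing by each record's actual size."""
    offset = 0
    while offset < len(buf):
        kind, length = HEADER.unpack_from(buf, offset)
        start = offset + HEADER.size
        yield kind, buf[start:start + length]
        offset = start + length

specs = [(1, b""), (2, b"\x00\x50"), (3, b"\xde\xad\xbe\xef")]
buf = pack_specs(specs)
decoded = list(iter_specs(buf))

# A loop bounded by "number of specs * minimum spec size" would stop here,
# well before the real end of the buffer.
naive_end = len(specs) * HEADER.size
```

In this sketch `naive_end` is 6 while the buffer is 12 bytes long, so a loop bounded by `naive_end` would never reach the last record.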
* sflow-agent: Flush freshly-polled sFlow counters promptly. | Neil McKee | 2016-09-02 | 1 | -2/+5
  This patch changes the order of the steps that are followed every second in the sFlow agent. By moving the receiver_tick() step to the end, we ensure that any counters that were polled during the poller_tick() step are flushed immediately to the sFlow collector. This eliminates what was a variable time-delay between counters being polled and being flushed.
  The variable time-delay that this eliminates could be up to a second because counters lingering in the output buffer could be flushed at any time by the arrival of random packet-samples. Since the sFlow standard does not require that a poll-timestamp be sent along with the counters, the collector must use its receive time as the timestamp, so that extra second of variable delay was "stretching or shrinking" the time between successive counter readings. This affected any counter-rate calculation that was based only on the delta between successive samples.
  The effect was small with a polling interval of 60 seconds: just +/- 2%. But the effect grew larger when faster polling was configured. For example, if the counters were pushed every 5 seconds then the instantaneous rate calculations could wander by +/- 20%. For a thorough analysis of this problem, see Rick Jones' paper: "High Frequency sFlow v5 Counter Sampling" ftp://ftp.netperf.org/papers/high_freq_sflow/hf_sflow_counters.pdf
  So this patch makes it possible to obtain usable results even when high-frequency polling is configured.
  Signed-off-by: Neil McKee <neil.mckee@inmon.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
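The percentages quoted above follow from a first-order estimate: up to one second of timing jitter divided by the polling interval. A minimal Python sketch, where the function name is hypothetical and the model is an approximation rather than anything taken from the sFlow agent:

```python
def worst_case_rate_error(polling_interval_s, jitter_s=1.0):
    """Fractional error in a rate computed from two successive readings
    when the measured interval is off by up to jitter_s seconds.
    First-order approximation: error ~= jitter / interval.
    """
    return jitter_s / polling_interval_s

error_60s = worst_case_rate_error(60)  # about 0.017, i.e. roughly +/- 2%
error_5s = worst_case_rate_error(5)    # 0.2, i.e. +/- 20%
```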
* learn: Avoid nested zero-sized arrays to fix build with MSVC. | Jarno Rajahalme | 2016-09-01 | 2 | -8/+13
  Avoid using nested zero-sized arrays to allow compilation with MSVC. Also, make sure the immediate data is accessed only if it exists, and that the size is always calculated from struct learn_spec field 'n_bits'.
  Fixes: dfe191d5faa6 ("ofp-actions: Waste less memory in learn actions.")
  Reported-by: Alin Serdean <aserdean@cloudbasesolutions.com>
  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Acked-by: Ben Pfaff <blp@ovn.org>
* ofp-actions: Waste less memory in set field and load actions. | Jarno Rajahalme | 2016-08-31 | 2 | -83/+132
  Change the value and mask to be added to the end of the set field action without any extra bytes, except for the usual ofp-actions padding to 8 bytes. Together with some structure member packing this saves on average about 256 bytes for each set field and load action (as the set field internal representation is also used for load actions).
  On a specific production data set each flow entry uses on average about 4.2 load or set field actions. This means that with this patch an average of more than 1 kB can be saved for each flow with such a flow table.
  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Acked-by: Ben Pfaff <blp@ovn.org>
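The padding rule and the savings estimate above can be checked with a few lines of Python. This is only an arithmetic sketch; `padded_len` is a hypothetical helper, and the alignment of 8 comes from the "padding to 8 bytes" mentioned in the message.

```python
ALIGNMENT = 8  # ofp-actions are padded to a multiple of 8 bytes (per the message)

def padded_len(n_bytes, align=ALIGNMENT):
    """Round n_bytes up to the next multiple of align."""
    return (n_bytes + align - 1) // align * align

# Figures quoted in the commit message.
saved_per_action = 256   # bytes saved per set field or load action
actions_per_flow = 4.2   # average on the production data set mentioned
saved_per_flow = saved_per_action * actions_per_flow  # a bit over 1 kB
```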
* ofp-actions: Waste less memory in learn actions. | Jarno Rajahalme | 2016-08-31 | 3 | -40/+72
  Make the immediate data member 'src_imm' of a learn spec allocated at the end of the action for just the right size. This, together with some structure packing, saves an average of ~128 bytes for each learn spec in each learn action. Typical learn actions have about 4 specs each, so this amounts to saving about 0.5 kB for each learn action.
  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Acked-by: Ben Pfaff <blp@ovn.org>
* object-collection: Remove access to stub. | Jarno Rajahalme | 2016-08-30 | 1 | -5/+0
  Better not use access to the *_collection_stub(), as it is an internal implementation detail.
  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Acked-by: Ben Pfaff <blp@ovn.org>
* lib: Retire packet buffering feature. | Jarno Rajahalme | 2016-08-30 | 4 | -318/+24
  The OVS implementation of buffering packets that are sent to the controller does not comply with the OpenFlow specifications after OpenFlow 1.0 (and it possibly complies with OpenFlow 1.0 only because that version does not really specify the packet buffering behavior). The OVS implementation executes the buffered packet against the actions of the modified or added rule, whereas OpenFlow (since 1.1) specifies that the packet should be matched against flow table 0 and processed accordingly.
  Rather than fix this behavior, and potentially break OVS users, the packet buffering feature is removed altogether. After all, such packet buffering is an optional OpenFlow feature, and as such any possible users should continue to work without this feature.
  This patch also makes OVS check the received 'buffer_id' values more rigorously, and fixes some internal users accordingly.
  Found by inspection.
  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Acked-by: Ben Pfaff <blp@ovn.org>
* daemon: Minor tweaking of man page fragment. | Justin Pettit | 2016-08-12 | 2 | -6/+7
  Signed-off-by: Justin Pettit <jpettit@ovn.org>
  Acked-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Ryan Moats <rmoats@us.ibm.com>
* stream-windows: Disconnect faulty named pipes | Alin Serdean | 2016-08-23 | 1 | -0/+2
  Disconnect named pipes whose connection failed.
  Found by testing.
  Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
  Signed-off-by: Gurucharan Shetty <guru@ovn.org>
* match: Only print external tunnel flags. | Jesse Gross | 2016-08-19 | 1 | -2/+2
  Some tunnel flags are purely internal implementation details (primarily FLOW_TNL_F_UDPIF). These shouldn't be output when we format tunnel flows, so this masks them out.
  Signed-off-by: Jesse Gross <jesse@kernel.org>
  Acked-by: Ben Pfaff <blp@ovn.org>
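Masking out internal flags before formatting can be sketched as below. Only the name FLOW_TNL_F_UDPIF comes from the commit message; the bit values, the other flag names, and the helper `format_tunnel_flags` are hypothetical and chosen purely for illustration.

```python
# Hypothetical bit assignments; the real values live in the OVS headers.
FLAG_OAM = 1 << 0           # hypothetical externally visible flag
FLAG_CSUM = 1 << 1          # hypothetical externally visible flag
FLOW_TNL_F_UDPIF = 1 << 4   # internal-only flag named in the commit

EXTERNAL_FLAGS = {FLAG_OAM: "oam", FLAG_CSUM: "csum"}
EXTERNAL_MASK = FLAG_OAM | FLAG_CSUM

def format_tunnel_flags(flags):
    """Format only the externally meaningful flags, hiding internal bits."""
    visible = flags & EXTERNAL_MASK
    return "|".join(name for bit, name in EXTERNAL_FLAGS.items()
                    if visible & bit)

formatted = format_tunnel_flags(FLAG_OAM | FLOW_TNL_F_UDPIF)
```

Here `formatted` contains only `oam`; the internal bit is dropped before any name is printed.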
* netdev-dpdk: Fix occurrence of error log | Ciara Loftus | 2016-08-18 | 1 | -0/+2
  If NUMA information can't be derived from a vHost User device, only print an error if the VHOST_NUMA option is enabled in DPDK. Otherwise 'fail' silently.
  Fixes: 0a0f39df1d5a ("netdev-dpdk: Add support for DPDK 16.07")
  Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
  Reported-by: Ian Stokes <ian.stokes@intel.com>
  Tested-by: Ian Stokes <ian.stokes@intel.com>
  Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
* netdev-dpdk: Simplify send function for ETH devices. | Ilya Maximets | 2016-08-18 | 1 | -40/+6
  The 'netdev_dpdk_send__()' function can be greatly simplified by using the recently introduced 'netdev_dpdk_filter_packet_len()'.
  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Acked-by: Ian Stokes <ian.stokes@intel.com>
  Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
* netdev-dpdk: Fix vHost stats. | Ilya Maximets | 2016-08-18 | 1 | -23/+33
  This patch introduces the function 'netdev_dpdk_filter_packet_len()', which is intended to find and remove all packets with 'pkt_len > max_packet_len' from the Tx batch.
  It fixes inaccurate counting of 'tx_bytes' in the vHost case when packets were dropped, and allows the send function to be simplified.
  Fixes: 0072e931b207 ("netdev-dpdk: add support for jumbo frames")
  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Acked-by: Ian Stokes <ian.stokes@intel.com>
  Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
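The filtering step described above can be modeled in a few lines of Python. This is a sketch of the idea only: packets are plain byte strings, and `filter_packet_len` is a hypothetical stand-in for the C function named in the message.

```python
def filter_packet_len(batch, max_packet_len):
    """Drop packets longer than max_packet_len; return (kept, dropped_count)."""
    kept = [pkt for pkt in batch if len(pkt) <= max_packet_len]
    return kept, len(batch) - len(kept)

batch = [b"a" * 64, b"b" * 1600, b"c" * 128]
kept, dropped = filter_packet_len(batch, max_packet_len=1518)

# Counting bytes only over the packets that survive the filter keeps the
# tx_bytes statistic from including packets that were never sent.
tx_bytes = sum(len(pkt) for pkt in kept)
```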
* Revert "netdev: do not allow devices to be opened with conflicting types" | Thadeu Lima de Souza Cascardo | 2016-08-16 | 1 | -7/+1
  This reverts commit d2fa6c676a13e86acc7f17261b2d87484f625d45.
  When doing a restart, the routing table will open ports as system, which prevents internal ports from being opened with the right type. That causes failures in creating the ports.
  We should revisit this patch after finding a proper fix on the routing table layer.
  Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
  Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
* ovn-trace: New utility. | Ben Pfaff | 2016-08-15 | 2 | -0/+122
  This new utility is intended to fulfill for OVN the purpose that "ofproto/trace" has for Open vSwitch. First, it's meant to be a useful tool for troubleshooting and diagnosis and in general for improving one's understanding of the emergent properties of a flow table. Second, it simplifies and increases the practical scope of testing, as well as making testing more reliable and repeatable and failures easier to interpret.
  This commit adds only a single test that uses the new utility, based on the oldest OVN end-to-end test "ovn -- 3 HVs, 1 LS, 3 lports/HV". The differences between the old and the new test illustrate properties of tracing. First, the new test does not start any ovn-controller processes or simulate any hypervisors in a nontrivial way. This is because ovn-trace does not actually forward packets or rely on the physical structure of the system. Second, because the old test tested not just the logical but also the physical structure of the system, it needed to have several logical ports, a total of 9 (3 on each of 3 HVs), whereas this test only tests the logical network implementation and so can use a smaller number. This property also means that the new test runs significantly faster than the old one (less than a second on my laptop).
  In my opinion this approach points the way toward the future of OVN testing. Certainly, we need end-to-end tests. However, I believe that the bulk of our tests can be broken into ones that test the logical network implementation (using tracing) and ones that test physical/logical translation.
  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Ryan Moats <rmoats@us.ibm.com>
* meta-flow: New functions mf_subfield_copy() and mf_subfield_swap(). | Ben Pfaff | 2016-08-15 | 3 | -33/+74
  The function nxm_execute_reg_move() was almost a general-purpose function for manipulating subfields, except for its awkward interface that took a struct ofpact_reg_move instead of a plain source and destination. This commit introduces a general-purpose function in meta-flow that corrects this flaw, and updates the callers. An upcoming commit will introduce a new user of the function.
  This commit also introduces a related function mf_subfield_swap() to swap the contents of subfields. An upcoming commit will introduce the first user.
  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Ryan Moats <rmoats@us.ibm.com>
  Acked-by: Justin Pettit <jpettit@ovn.org>
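Copying and swapping subfields amounts to moving a run of bits between two values given a plain source and destination. The Python sketch below models fields as integers; the function names and signatures are hypothetical and do not mirror the C prototypes of mf_subfield_copy() or mf_subfield_swap().

```python
def read_bits(value, ofs, n_bits):
    """Extract n_bits starting at bit offset ofs (bit 0 is least significant)."""
    return (value >> ofs) & ((1 << n_bits) - 1)

def write_bits(value, ofs, n_bits, bits):
    """Return value with n_bits at offset ofs replaced by bits."""
    mask = ((1 << n_bits) - 1) << ofs
    return (value & ~mask) | ((bits << ofs) & mask)

def subfield_copy(src, src_ofs, dst, dst_ofs, n_bits):
    """Copy a subfield of src into dst and return the new dst value."""
    return write_bits(dst, dst_ofs, n_bits, read_bits(src, src_ofs, n_bits))

def subfield_swap(a, a_ofs, b, b_ofs, n_bits):
    """Exchange two equally sized subfields and return the new (a, b)."""
    a_bits = read_bits(a, a_ofs, n_bits)
    b_bits = read_bits(b, b_ofs, n_bits)
    return (write_bits(a, a_ofs, n_bits, b_bits),
            write_bits(b, b_ofs, n_bits, a_bits))
```

For instance, copying bits 4..11 of 0xABCD into bits 8..15 of 0xFFFF yields 0xBCFF.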
* netdev-dpdk: vHost client mode and reconnect | Ciara Loftus | 2016-08-15 | 1 | -26/+110
  Until now, vHost ports in OVS have only been able to operate in 'server' mode whereby OVS creates and manages the vHost socket and essentially acts as the vHost 'server'. With this commit a new mode, 'client' mode, is available. In this mode, OVS acts as the vHost 'client' and connects to the socket created and managed by QEMU which now acts as the vHost 'server'. This mode allows for reconnect capability, which allows a vHost port to resume normal connectivity in the event of a switch reset.
  By default dpdkvhostuser ports still operate in 'server' mode. That is unless a valid 'vhost-server-path' is specified for a device like so:
    ovs-vsctl set Interface dpdkvhostuser0 options:vhost-server-path=/path/to/socket
  'vhost-server-path' represents the full path of the vhost user socket that has been or will be created by QEMU. Once specified, the port stays in 'client' mode for the remainder of its lifetime.
  QEMU v2.7.0+ is required when using OVS in vHost client mode and QEMU in vHost server mode.
  Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
  Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
* netdev-dpdk: Consistent naming for vhost | Ciara Loftus | 2016-08-15 | 1 | -17/+8
  A mix of vhost_user_ and vhost_ is used when naming vhost functions. The 'user_' has been dropped for consistency. Also remove empty init functions for netdev dpdk classes.
  Suggested-by: Daniele Di Proietto <diproiettod@vmware.com>
  Acked-by: Daniele Di Proietto <diproiettod at vmware.com>
  Acked-by: Flavio Leitner <fbl@sysclose.org>
  Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
* netdev-dpdk: Remove dpdkvhostcuse ports | Ciara Loftus | 2016-08-15 | 1 | -105/+5
  This commit removes the 'dpdkvhostcuse' port type from the userspace datapath. vhost-cuse ports are quickly becoming obsolete as the vhost-user port type begins to support a greater feature-set thanks to the addition of things like vhost-user multiqueue and potential upcoming features like vhost-user client-mode and vhost-user reconnect. The feature is also expected to be removed from DPDK soon.
  One potential drawback of the removal of this support is that a userspace vHost port type is not available in OVS for use with older versions of QEMU (pre v2.2). Considering v2.2 is nearly two years old this should however be a low impact change.
  Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
  Acked-by: Flavio Leitner <fbl@sysclose.org>
  Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
  Acked-by: Ilya Maximets <i.maximets@samsung.com>
* Add read-only option to ovs-dpctl and ovs-ofctl commands. | Ryan Moats | 2016-08-15 | 5 | -36/+70
  ovs-dpctl and ovs-ofctl lack a read-only option to prevent running of commands that perform read-write operations. Add it and the necessary scaffolding to each.
  Signed-off-by: Ryan Moats <rmoats@us.ibm.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* ovsdb-idl: Style and comment improvements for conditional replication. | Ben Pfaff | 2016-08-15 | 3 | -51/+94
  The conditional replication code had hardly any comments. This adds some.
  This commit also fixes a number of style problems, factors out some code into a helper function, and moves some struct declarations from a public header, that were not used by client code, into more private locations.
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* ovsdb-idl: Fix memory leak in ovsdb_idl_condition_add_clause(). | Ben Pfaff | 2016-08-15 | 1 | -1/+1
  The function always allocated a clause but didn't use it if it was going to be a duplicate.
  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Flavio Fernandes <flavio@flaviof.com>
* ovsdb-idl: Fix double-remove in ovsdb_idl_condition_reset(). | Ben Pfaff | 2016-08-15 | 1 | -1/+0
  Both ovsdb_idl_condition_reset() and ovsdb_idl_clause_free() call ovs_list_remove() on the clause's 'node' member, but it should only be called once.
  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Andy Zhou <azhou@ovn.org>
* netdev-dpdk: Do not attempt to initialise flow control for 'dpdkr' ports | Ciara Loftus | 2016-08-15 | 1 | -21/+40
  Only 'dpdk' ports support flow control. This patch stops 'dpdkr' ports from attempting to initialise this feature as this port type does not support it.
  Fixes: 9fd39370c12c ("netdev-dpdk: Add Flow Control support.")
  Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
  Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
* netdev-dpdk: Use rte_eth_is_valid_port instead of manual check | Ciara Loftus | 2016-08-15 | 1 | -2/+3
  Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
  Acked-by: Mauricio Vásquez B <mauricio.vasquez@polito.it>
  Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
* tests: Add a new MTU test. | Daniele Di Proietto | 2016-08-15 | 1 | -1/+4
  Also, netdev-dummy needs to call netdev_change_seq_changed() in set_mtu().
  Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
  Acked-by: Ben Pfaff <blp@ovn.org>
* netdev-dummy: Add dummy-internal class. | Daniele Di Proietto | 2016-08-15 | 2 | -3/+13
  "internal" netdevs are treated specially in OVS (e.g. for MTU), but the dummy datapath remaps both "system" and "internal" devices to the same "dummy" netdev class, so there's no way to discern those in tests.
  This commit adds a new "dummy-internal" netdev type, which will be used by the dummy datapath for internal ports, so that other parts of the code can understand which ports are internal just by looking at the netdev object.
  The alternative solution, using the original interface type ("internal") instead of the translated netdev type ("dummy"), is harder to implement, because in so many places only the netdev object is available.
  Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
  Acked-by: Ben Pfaff <blp@ovn.org>
* netdev: Pass 'netdev_class' to ->run() and ->wait(). | Daniele Di Proietto | 2016-08-15 | 6 | -16/+22
  This will allow run() and wait() methods to be shared between different classes and still perform class-specific work.
  Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
  Acked-by: Ben Pfaff <blp@ovn.org>
* Fix copyright statements from commit f1ab6e06 | Ryan Moats | 2016-08-14 | 2 | -2/+4
  Commit f1ab6e06 ("Add/use partial set updates.") incorrectly did not include HPE attribution for derived files lib/ovsdb-set-op.[ch]. Add the attribution to correct this.
  Signed-off-by: Ryan Moats <rmoats@us.ibm.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* ovsdb: Add/use partial set updates. | Ryan Moats | 2016-08-14 | 6 | -99/+516
  This patchset mimics the changes introduced in
    f199df26 (ovsdb-idl: Add partial map updates functionality.)
    010fe7ae (ovsdb-idlc.in: Autogenerate partial map updates functions.)
    7251075c (tests: Add test for partial map updates.)
    b1048e6a (ovsdb-idl: Fix issues detected in Partial Map Update feature)
  but for columns that store sets of values rather than key-value pairs. These columns will now be able to use the OVSDB mutate operation to transmit deltas on the wire rather than use verify/update and transmit wait/update operations on the wire.
  As a side effect, this modifies the comments in the partial map update tests.
  Signed-off-by: Ryan Moats <rmoats@us.ibm.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
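The idea of sending deltas instead of whole sets can be sketched in Python as follows. The helper names, the table name, the column name, and the UUID are hypothetical; the shape of the operation follows the "mutate" operation and the "insert"/"delete" mutators of the OVSDB protocol (RFC 7047). This is not the IDL code the commit adds.

```python
import json

def set_mutations(column, old, new):
    """Build OVSDB mutations that turn set `old` into set `new`."""
    added = sorted(new - old)
    removed = sorted(old - new)
    mutations = []
    if added:
        mutations.append([column, "insert", ["set", added]])
    if removed:
        mutations.append([column, "delete", ["set", removed]])
    return mutations

def mutate_op(table, row_uuid, mutations):
    """Wrap mutations in a 'mutate' operation addressed to one row."""
    return {
        "op": "mutate",
        "table": table,
        "where": [["_uuid", "==", ["uuid", row_uuid]]],
        "mutations": mutations,
    }

# Hypothetical table, column, and row UUID, for illustration only.
delta = set_mutations("tags", {1, 2, 3}, {2, 3, 4})
op = mutate_op("Example_Table", "00000000-0000-0000-0000-000000000001", delta)
wire = json.dumps(op)
```

Only the elements that changed (4 added, 1 removed) appear in the request, however large the set is.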
* ovn-northd: Add logical flows to support DHCPv6 | Numan Siddique | 2016-08-14 | 1 | -0/+1
  OVN implements native DHCPv6. DHCPv6 options are stored in the 'DHCP_Options' NB table and logical ports refer to this table to configure the DHCPv6 options.
  For each logical port configured with DHCPv6 options, the following flows are added:
   - A logical flow which copies the DHCPv6 options to the DHCPv6 request packets using the 'put_dhcpv6_opts' action and advances the packet to the next stage.
   - A logical flow which implements the DHCPv6 responder by sending the DHCPv6 reply back to the inport once the 'put_dhcpv6_opts' action is applied.
  Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* ovsdb-idl: Fix bug on ovsdb_idl_condition_remove_clause(). | Liran Schour | 2016-08-13 | 1 | -0/+1
  Call poll_immediate_wake() when the condition is changed.
  Signed-off-by: Liran Schour <lirans@il.ibm.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* netdev-dpdk: add support for jumbo frames | Mark Kavanagh | 2016-08-12 | 1 | -25/+120
  Add support for Jumbo Frames to DPDK-enabled port types, using single-segment-mbufs.
  Using this approach, the amount of memory allocated to each mbuf to store frame data is increased to a value greater than 1518B (typical Ethernet maximum frame length). The increased space available in the mbuf means that an entire Jumbo Frame of a specific size can be carried in a single mbuf, as opposed to partitioning it across multiple mbuf segments.
  The amount of space allocated to each mbuf to hold frame data is defined dynamically by the user with ovs-vsctl, via the 'mtu_request' parameter.
  Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  [diproiettod@vmware.com rebased]
  Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
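The 1518B figure is the standard 1500-byte MTU plus the 14-byte Ethernet header and 4-byte CRC. A small Python sketch of that relationship follows; the function names are hypothetical and the single-mbuf check is a simplification that ignores any headroom the real buffers reserve.

```python
ETHER_HDR_LEN = 14  # destination MAC + source MAC + EtherType
ETHER_CRC_LEN = 4   # frame check sequence

def mtu_to_frame_len(mtu):
    """Maximum Ethernet frame length for a given MTU (no VLAN tag)."""
    return mtu + ETHER_HDR_LEN + ETHER_CRC_LEN

def fits_in_single_segment(mtu, data_room):
    """True if a full-sized frame fits in one buffer of data_room bytes."""
    return mtu_to_frame_len(mtu) <= data_room

standard = mtu_to_frame_len(1500)  # 1518
jumbo = mtu_to_frame_len(9000)     # 9018
```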
* netdev: Make netdev_set_mtu() netdev parameter non-const. | Daniele Di Proietto | 2016-08-12 | 5 | -5/+5
  Every provider silently drops the const attribute when converting the parameter to the appropriate subclass. Might as well drop the const attribute from the parameter, since this is a "set" function.
  Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
  Acked-by: Ilya Maximets <i.maximets@samsung.com>
* vswitchd: Introduce 'mtu_request' column in Interface. | Daniele Di Proietto | 2016-08-12 | 1 | -52/+1
  The 'mtu_request' column can be used to set the MTU of a specific interface.
  This column is useful because it will allow changing the MTU of DPDK devices (implemented in a future commit), which are not accessible outside the ovs-vswitchd process, but it can be used for kernel interfaces as well.
  The current implementation of set_mtu() in netdev-dpdk is removed because it's broken. It will be reintroduced by a subsequent commit in this series.
  Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
  Acked-by: Ilya Maximets <i.maximets@samsung.com>
* dpif-netdev: Fix -Wformat warning on 32-bit build. | Daniele Di Proietto | 2016-08-12 | 1 | -1/+1
  Use the appropriate format specifier for size_t, otherwise the 32-bit build fails.
  Reported-at: https://travis-ci.org/openvswitch/ovs/jobs/151938383
  Fixes: 3453b4d62a98 ("dpif-netdev: dpcls per in_port with sorted subtables")
  Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
  Acked-by: Joe Stringer <joe@ovn.org>
* netdev-dpdk: add DPDK pdump capability | Ciara Loftus | 2016-08-12 | 1 | -0/+19
  This commit provides the ability to 'listen' on DPDK ports and save packets to a pcap file with a DPDK app that uses the librte_pdump library. One such app is the 'pdump' app that can be found in the DPDK 'app' directory. Instructions on how to use this can be found in INSTALL.DPDK-ADVANCED.md
  Pdump capability in OVS with DPDK will only be initialised if the CONFIG_RTE_LIBRTE_PMD_PCAP=y and CONFIG_RTE_LIBRTE_PDUMP=y options are set in DPDK. libpcap is required if the above configuration is used.
  Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
  Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
* dpif-netdev: dpcls per in_port with sorted subtables | Jan Scheurich | 2016-08-12 | 1 | -25/+177
  The user-space datapath (dpif-netdev) consists of a first level "exact match cache" (EMC) matching on 5-tuples and the normal megaflow classifier. With many parallel packet flows (e.g. TCP connections) the EMC becomes inefficient and the OVS forwarding performance is determined by the megaflow classifier.
  The megaflow classifier (dpcls) consists of a variable number of hash tables (aka subtables), each containing megaflow entries with the same mask of packet header and metadata fields to match upon. A dpcls lookup matches a given packet against all subtables in sequence until it hits a match. As megaflow cache entries are by construction non-overlapping, the first match is the only match.
  Today the order of the subtables in the dpcls is essentially random so that on average a dpcls lookup has to visit N/2 subtables for a hit, when N is the total number of subtables. Even though every single hash-table lookup is fast, the performance of the current dpcls degrades when there are many subtables.
  How does the patch address this issue:
  In reality there is often a strong correlation between the ingress port and a small subset of subtables that have hits. The entire megaflow cache typically decomposes nicely into partitions that are hit only by packets entering from a range of similar ports (e.g. traffic from Phy -> VM vs. traffic from VM -> Phy).
  Therefore, maintaining a separate dpcls instance per ingress port with its subtable vector sorted by frequency of hits reduces the average number of subtable lookups in the dpcls to a minimum, even if the total number of subtables gets large. This is possible because megaflows always have an exact match on in_port, so every megaflow belongs to a unique dpcls instance.
  For thread safety, the PMD thread needs to block out revalidators during the periodic optimization. We use ovs_mutex_trylock() to avoid blocking the PMD.
  To monitor the effectiveness of the patch we have enhanced the ovs-appctl dpif-netdev/pmd-stats-show command with an extra line "avg. subtable lookups per hit" to report the average number of subtable lookups needed for a megaflow match. Ideally, this should be close to 1 and in almost all cases much smaller than N/2.
  The PMD tests have been adjusted to the additional line in pmd-stats-show.
  We have benchmarked a L3-VPN pipeline on top of a VXLAN overlay mesh. With pure L3 tenant traffic between VMs on different nodes the resulting netdev dpcls contains N=4 subtables. Each packet traversing the OVS datapath is subject to dpcls lookup twice due to the tunnel termination.
  Disabling the EMC, we have measured a baseline performance (in+out) of ~1.45 Mpps (64 bytes, 10K L4 packet flows). The average number of subtable lookups per dpcls match is 2.5. With the patch the average number of subtable lookups per dpcls match is reduced to 1 and the forwarding performance grows by ~50% to 2.13 Mpps.
  Even with EMC enabled, the patch improves the performance by 9% (for 1000 L4 flows) and 34% (for 50K+ L4 flows).
  As the actual number of subtables will often be higher in reality, we can assume that this is at the lower end of the speed-up one can expect from this optimization. Just running a parallel ping between the VXLAN tunnel endpoints increases the number of subtables and hence the average number of subtable lookups from 2.5 to 3.5 on master with a corresponding decrease of throughput to 1.2 Mpps. With the patch the parallel ping has no impact on average number of subtable lookups and performance. The performance gain is then ~75%.
  Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
  Acked-by: Antonio Fischetti <antonio.fischetti@intel.com>
  Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
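The approach described above, one classifier per ingress port with subtables periodically re-sorted by hit count, can be modeled with a short Python sketch. All class and method names are hypothetical, packets and flows are plain dictionaries, and nothing here reflects the concurrency handling of the real dpcls.

```python
from collections import defaultdict

class Subtable:
    """A hash table of flows that all match on the same set of fields."""
    def __init__(self, mask_fields):
        self.mask_fields = tuple(mask_fields)
        self.flows = {}
        self.hits = 0

    def key(self, pkt):
        return tuple(pkt.get(field) for field in self.mask_fields)

    def insert(self, match, actions):
        self.flows[self.key(match)] = actions

    def lookup(self, pkt):
        return self.flows.get(self.key(pkt))

class PerPortClassifier:
    """Keeps a separate subtable list for every ingress port."""
    def __init__(self):
        self.subtables = defaultdict(list)  # in_port -> [Subtable, ...]

    def lookup(self, pkt):
        """Return (actions, number of subtables visited)."""
        visited = 0
        for subtable in self.subtables[pkt["in_port"]]:
            visited += 1
            actions = subtable.lookup(pkt)
            if actions is not None:
                subtable.hits += 1
                return actions, visited
        return None, visited

    def optimize(self):
        """Periodic step: most frequently hit subtables go first."""
        for tables in self.subtables.values():
            tables.sort(key=lambda st: st.hits, reverse=True)
            for subtable in tables:
                subtable.hits = 0

cls = PerPortClassifier()
by_ip = Subtable(["ip_dst"])
by_ip.insert({"ip_dst": "10.0.0.1"}, "drop")
by_port = Subtable(["tcp_dst"])
by_port.insert({"tcp_dst": 80}, "output:2")
cls.subtables[1] += [by_ip, by_port]

pkt = {"in_port": 1, "ip_dst": "10.0.0.9", "tcp_dst": 80}
before = cls.lookup(pkt)[1]  # visits both subtables
cls.optimize()
after = cls.lookup(pkt)[1]   # the hot subtable is now first
```

In this toy run the packet needs two subtable visits before the optimization pass and one afterwards, which is the effect the "avg. subtable lookups per hit" statistic is meant to expose.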
* Windows: Report absolute file name. | Alin Serdean | 2016-08-12 | 1 | -0/+7
  On Windows if a file path contains ":" we can safely say it is an absolute file name.
  This patch allows file_name checks to report correctly when using "abs_file_name".
  Found by testing.
  Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
  Acked-by: Sairam Venugopal <vsairam@vmware.com>
  Signed-off-by: Gurucharan Shetty <guru@ovn.org>
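The rule stated above is simple enough to write down directly. The following Python sketch uses a hypothetical function name and an explicit `windows` argument so that it behaves the same on any platform; the POSIX branch (a leading slash) is an assumption added for contrast.

```python
def is_absolute_file_name(file_name, windows):
    """Apply the commit's heuristic: on Windows, a ':' marks an absolute name."""
    if windows:
        return ":" in file_name
    # Assumption for contrast: POSIX absolute names start with '/'.
    return file_name.startswith("/")
```

With this rule, `C:\ovs\db.sock` counts as absolute on Windows while a bare `db.sock` does not.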
* netdev-dpdk: vhost: Fix double free and use after free with QoS. | Ilya Maximets | 2016-08-10 | 1 | -15/+9
  While using QoS with vHost interfaces 'netdev_dpdk_qos_run__()' will free mbufs while executing 'netdev_dpdk_policer_run()'. After that the same mbufs will be freed at the end of '__netdev_dpdk_vhost_send()' if 'may_steal == true'. This behaviour will break the mempool.
  Also 'netdev_dpdk_qos_run__()' will free packets even if we shouldn't do this ('may_steal == false'). This will lead to the upper layers using already freed packets.
  Fix that by copying all packets that we can't steal, as is done for DPDK_DEV_ETH devices, and freeing only packets not freed by QoS.
  Fixes: 0bf765f753fd ("netdev_dpdk.c: Add QoS functionality.")
  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
  Tested-by: Ian Stokes <ian.stokes@intel.com>
  Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
* Revert "pvector: Expose non-concurrent priority vector." | Jarno Rajahalme | 2016-08-10 | 5 | -251/+176
  This reverts commit 8bdfe1313894047d44349fa4cf4402970865950f.
  I failed to see that lib/dpif-netdev.c actually needs the concurrency provided by pvector prior to this change. More specifically, when a subtable is removed, concurrent lookups may skip over another subtable swapped in to the place of the removed subtable in the vector.
  Since this was the only use of the non-concurrent pvector, it is cleaner to revert the whole patch.
  Reported-by: Jan Scheurich <jan.scheurich@ericsson.com>
  Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
  Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
* ovn-controller: Persist desired conntrack groups. | Ryan Moats | 2016-08-10 | 1 | -0/+9
  With incremental processing of logical flows desired conntrack groups are not being persisted. This patch adds this capability, with the side effect of adding a ds_clone method that this capability leverages.
  Signed-off-by: Ryan Moats <rmoats@us.ibm.com>
  Reported-by: Guru Shetty <guru@ovn.org>
  Reported-at: http://openvswitch.org/pipermail/dev/2016-July/076320.html
  Fixes: 70c7cfe ("ovn-controller: Add incremental processing to lflow_run and physical_run")
  Acked-by: Flavio Fernandes <flavio@flaviof.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* netdev-vport: remove unused function | Binbin Xu | 2016-08-10 | 2 | -10/+0
  The function netdev_vport_get_dpif_port_strdup is not used anymore. So we can remove it now.
  Signed-off-by: Binbin Xu <xu.binbin1@zte.com.cn>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* netdev-dpdk: Avoid reconfiguration on reconnection of same vhost device. | Ilya Maximets | 2016-08-09 | 1 | -8/+11
  Binding/unbinding of the virtio driver inside a VM leads to reconfiguration of PMD threads. This behaviour may be abused by executing bind/unbind in an infinite loop to break normal networking on all ports attached to the same instance of Open vSwitch.
  Fix that by avoiding reconfiguration if it's not necessary. The number of queues will not be decreased to 1 on device disconnection, but that is not very important in comparison with a possible DoS attack from inside the guest OS.
  Fixes: 81acebdaaf27 ("netdev-dpdk: Obtain number of queues for vhost ports from attached virtio.")
  Reported-by: Ciara Loftus <ciara.loftus@intel.com>
  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
* netdev-dpdk: Fix egress policer error detection bug. | Ian Stokes | 2016-08-09 | 1 | -2/+27
  When egress policer is set as a QoS type for a port, an error may occur during setup if incorrect parameters are used for the rte_meter.
  If this occurs the egress policer construct and set functions should free any allocated memory relevant to the policer and set the QoS configuration pointer to null. The netdev_dpdk_set_qos function should check the error value returned for any QoS construct/set calls with an assertion to avoid segfault.
  Also this commit modifies egress_policer_qos_set() to correctly lock the QoS spinlock while the egress policer configuration is updated to avoid segfault.
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
  Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
* netdev-dpdk: Fix dead initialization reported by clang. | Bhanuprakash Bodireddy | 2016-08-09 | 1 | -1/+1
  Clang reports that value stored to 'tok' during initialization is never read.
  Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
  Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
* netdev-dpdk: Fix deadlock in destroy_device(). | Daniele Di Proietto | 2016-08-09 | 1 | -10/+24
  netdev_dpdk_vhost_destruct() calls rte_vhost_driver_unregister(), which can trigger the destroy_device() callback. destroy_device() will try to take two mutexes already held by netdev_dpdk_vhost_destruct(), causing a deadlock.
  This problem can be solved by dropping the mutexes before calling rte_vhost_driver_unregister(). The netdev_dpdk_vhost_destruct() and construct() call are already serialized by netdev_mutex.
  This commit also makes clear that dev->vhost_id is constant and can be accessed without taking any mutexes in the lifetime of the devices.
  Fixes: 8d38823bdf8b ("netdev-dpdk: fix memory leak")
  Reported-by: Ilya Maximets <i.maximets@samsung.com>
  Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
  Acked-by: Ilya Maximets <i.maximets@samsung.com>
  Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
* smap: New function smap_get_ullong(). | Ben Pfaff | 2016-08-08 | 4 | -76/+42
  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Ryan Moats <rmoats@us.ibm.com>
* smap: New function smap_get_def(). | Ben Pfaff | 2016-08-08 | 3 | -54/+35
  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Ryan Moats <rmoats@us.ibm.com>