path: root/lib/dpdk.c
Commit message | Author | Age | Files | Lines
* dpdk: Use ovs-numa provided functions to manage thread affinity. | Ilya Maximets | 2019-09-06 | 1 | -15/+12

  This decreases code duplication and avoids using Linux-specific functions (which might be useful in the future if we try to allow running OvS+DPDK on FreeBSD).

  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Reviewed-by: David Marchand <david.marchand@redhat.com>
  Acked-by: William Tu <u9012063@gmail.com>
* dpif-netdev-perf: Fix TSC frequency for non-DPDK case. | Ilya Maximets | 2019-09-06 | 1 | -0/+6

  Unlike 'rte_get_tsc_cycles()', which doesn't need any specific initialization, 'rte_get_tsc_hz()' can be used only after a successful call to 'rte_eal_init()', because 'rte_eal_init()' estimates the TSC frequency for later use by 'rte_get_tsc_hz()'. To be fair, we're not allowed to use 'rte_get_tsc_cycles()' before initializing DPDK either, but it works this way for now and provides correct results.

  This patch provides TSC frequency estimation code that will be used in two cases:

  * DPDK is not compiled in, i.e. DPDK_NETDEV is not defined.
  * DPDK is compiled in but not initialized, i.e. other_config:dpdk-init=false.

  This change is mostly useful for AF_XDP netdev support, i.e. it allows using the dpif-netdev/pmd-perf-show command and various PMD perf metrics.

  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Reviewed-by: David Marchand <david.marchand@redhat.com>
  Acked-by: William Tu <u9012063@gmail.com>
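The estimation can be illustrated with a generic calibration loop. Sample a cycle counter and a monotonic clock before and after a short sleep, then divide the counter delta by the elapsed time. This is a minimal sketch, not the code from the patch. The counter is passed in as a hypothetical `cycle_reader_fn` (on x86 it would read the TSC). The demo `fake_ns_counter()` ticks once per nanosecond, so the estimate should land just below 1 GHz.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical hook returning the current value of a cycle counter
 * (e.g. the TSC on x86). */
typedef uint64_t (*cycle_reader_fn)(void);

static double
elapsed_seconds(const struct timespec *start, const struct timespec *end)
{
    return (double) (end->tv_sec - start->tv_sec)
           + (double) (end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Estimates the counter frequency in Hz by calibrating it against
 * CLOCK_MONOTONIC over roughly 'sleep_ms' milliseconds. */
static uint64_t
estimate_counter_hz(cycle_reader_fn read_cycles, long sleep_ms)
{
    struct timespec t0, t1;
    struct timespec nap = { sleep_ms / 1000, (sleep_ms % 1000) * 1000000L };

    assert(read_cycles != NULL && sleep_ms > 0);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    uint64_t c0 = read_cycles();
    nanosleep(&nap, NULL);
    uint64_t c1 = read_cycles();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = elapsed_seconds(&t0, &t1);
    return secs > 0 ? (uint64_t) ((double) (c1 - c0) / secs) : 0;
}

/* Demo counter: one tick per nanosecond of CLOCK_MONOTONIC. */
static uint64_t
fake_ns_counter(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t) ts.tv_sec * 1000000000ULL + (uint64_t) ts.tv_nsec;
}
```

A longer sleep delays startup but gives a more precise estimate, because the fixed cost of the clock reads is spread over more cycles.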
* netdev-offload: Rename offload providers. | Ilya Maximets | 2019-06-11 | 1 | -1/+1

  Flow API providers are renamed to be consistent with the parent module 'netdev-offload' and to look more like each other. '_rte_' is replaced with the more convenient '_dpdk_'.

  The resulting structure is:

  Common code:
    lib/netdev-offload-provider.h
    lib/netdev-offload.c
    lib/netdev-offload.h

  Providers:
    lib/netdev-offload-tc.c
    lib/netdev-offload-dpdk.c

  'netdev-offload-dummy' still resides inside netdev-dummy, but there is little sense in moving it out of there.

  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Acked-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Roi Dayan <roid@mellanox.com>
* netdev: Split up netdev offloading to separate module. | Ilya Maximets | 2019-06-11 | 1 | -2/+2

  A new module, 'netdev-offload', is created to manage different flow API implementations. All the generic, provider-independent code is moved there from the 'netdev' module.

  Flow API providers are further encapsulated.

  The only function that changed is 'netdev_any_oor'. It now uses an offloading-related hmap instead of the common 'netdev_shash'.

  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Acked-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Roi Dayan <roid@mellanox.com>
* netdev: Dynamic per-port Flow API. | Ilya Maximets | 2019-06-11 | 1 | -0/+2

  Current issues with Flow API:

  * OVS calls offloading functions regardless of whether Flow API initialization succeeded (e.g. on init_flow_api failure).
  * Static initialization of Flow API for a netdev_class forbids having different offloading types for different instances of netdev with the same netdev_class (e.g. different vports in 'system' and 'netdev' datapaths at the same time).

  Solution:

  * Move Flow API from the netdev_class to the netdev instance.
  * Make Flow API dynamic, i.e. probe the APIs and choose the suitable one.

  Side effects:

  * Flow API providers are localized in their modules as much as possible.
  * We now have the ability to make runtime checks. For example, we could check whether a particular device supports the features we need, such as whether a dpdk device supports the RSS+MARK action.

  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Acked-by: Roi Dayan <roid@mellanox.com>
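Here is a minimal sketch of the "probe and choose per instance" idea. It assumes a provider is just a name plus an init hook. All types, the capability fields, and `netdev_pick_flow_api()` are hypothetical simplifications, not the OVS `netdev-offload` API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct netdev;

/* Hypothetical provider: a name plus an init/probe hook returning 0 if it
 * can offload flows for the given device. */
struct flow_api {
    const char *type;
    int (*init_flow_api)(struct netdev *);
};

/* Hypothetical netdev instance; the capability bits stand in for whatever
 * a real probe would query from the device. */
struct netdev {
    const char *name;
    int has_tc;
    int has_rte_flow;
    const struct flow_api *flow_api;   /* Chosen per instance; NULL if none. */
};

static int
tc_probe(struct netdev *dev)
{
    return dev->has_tc ? 0 : EOPNOTSUPP;
}

static int
dpdk_probe(struct netdev *dev)
{
    return dev->has_rte_flow ? 0 : EOPNOTSUPP;
}

static const struct flow_api tc_api = { "tc", tc_probe };
static const struct flow_api dpdk_api = { "dpdk", dpdk_probe };
static const struct flow_api *const flow_apis[] = { &dpdk_api, &tc_api };

/* Probes the registered providers in order and binds the first one whose
 * initialization succeeds to this particular netdev instance. */
static int
netdev_pick_flow_api(struct netdev *dev)
{
    assert(dev != NULL);
    for (size_t i = 0; i < sizeof flow_apis / sizeof flow_apis[0]; i++) {
        if (flow_apis[i]->init_flow_api(dev) == 0) {
            dev->flow_api = flow_apis[i];
            return 0;
        }
    }
    dev->flow_api = NULL;   /* Offload calls must check for this. */
    return EOPNOTSUPP;
}
```

The chosen provider lives in the instance, so two ports of the same class can end up with different providers, or with none. A static per-class assignment cannot do that.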
* netdev-dpdk: Post-copy Live Migration support for vhost-user-client. | Liliia Butorina | 2019-05-24 | 1 | -0/+18

  Post-copy Live Migration for vHost is supported since DPDK 18.11 and QEMU 2.12. A new global config option, 'vhost-postcopy-support', is added to control this feature. Example:

    ovs-vsctl set Open_vSwitch . other_config:vhost-postcopy-support=true

  Changing this value requires restarting the daemon. It's safe to enable this knob even if QEMU doesn't support post-copy LM.

  The feature is marked as experimental and disabled by default. It may cause a PMD thread on the destination host to hang on a page fault while the page is downloaded from the source.

  The feature is not compatible with 'mlockall' and 'dequeue zero-copy'. Support is added only for vhost-user-client.

  Signed-off-by: Liliia Butorina <l.butorina@partner.samsung.com>
  Co-authored-by: Ilya Maximets <i.maximets@samsung.com>
  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
* dpdk: Stop dumping memzones to stdout. | Ilya Maximets | 2019-03-19 | 1 | -1/+17

  Information about memzones reserved on init is not very useful. In any case, it should be logged in a more civilized manner, i.e. through the OVS logging subsystem.

  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Acked-by: Aaron Conole <aconole@redhat.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
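One way to route a library's "dump to a `FILE *`" output into a logger instead of stdout is to dump into an in-memory stream with POSIX `open_memstream()`. The captured text is then logged line by line. The sketch below assumes that approach. `dump_fn`, `log_line_fn`, and the demo callbacks are hypothetical. A function such as DPDK's `rte_memzone_dump()`, which takes a `FILE *`, would play the role of `dump`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef void (*dump_fn)(FILE *stream);          /* Writes a report to 'stream'. */
typedef void (*log_line_fn)(const char *line);  /* Hypothetical logger hook. */

/* Runs 'dump' against an in-memory stream and forwards the captured text to
 * 'log_line' one line at a time.  Returns the number of lines logged, or -1
 * on error. */
static int
dump_to_log(dump_fn dump, log_line_fn log_line)
{
    char *buf = NULL;
    size_t size = 0;
    char *save = NULL;
    int n_lines = 0;
    FILE *stream;

    assert(dump != NULL && log_line != NULL);
    stream = open_memstream(&buf, &size);
    if (stream == NULL) {
        return -1;
    }
    dump(stream);
    if (fclose(stream) != 0) {   /* fclose() also publishes 'buf'. */
        free(buf);
        return -1;
    }

    for (char *line = strtok_r(buf, "\n", &save); line != NULL;
         line = strtok_r(NULL, "\n", &save)) {
        log_line(line);
        n_lines++;
    }
    free(buf);
    return n_lines;
}

static int demo_n_logged;

static void
demo_dump(FILE *stream)
{
    fprintf(stream, "zone A\nzone B\n");
}

static void
demo_log_line(const char *line)
{
    demo_n_logged++;
    fprintf(stderr, "log: %s\n", line);
}
```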
* dpdk: Fix case-sensitivity of dpdk-init knob. | Ilya Maximets | 2019-03-04 | 1 | -2/+2

  Before the DPDK initialization status was supported in the DB, 'dpdk-init' was just a boolean, and 'smap_get_bool', which is case-insensitive, was used to get the value. The current code uses a simple 'strcmp' that fails to recognize values like "True".

  As a result, this breaks various OVS configuration tools. For example, kolla-ansible uses 'other_config:dpdk-init=True', but OVS is not able to recognize it, leading to broken installations.

  'strcasecmp' should be used instead to fix the issue.

  CC: Aaron Conole <aconole@redhat.com>
  Fixes: 3e52fa5644cd ("dpdk: reflect status and version in the database")
  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
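The fix comes down to comparing with `strcasecmp()` rather than `strcmp()`. Below is a minimal sketch of parsing the knob that way. It covers the 'true' and 'try' values from the status-reporting commit further down. The enum and `parse_dpdk_init()` are hypothetical, and treating unknown or missing values as "false" is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>   /* strcasecmp() (POSIX) */

enum dpdk_init_mode { DPDK_INIT_FALSE, DPDK_INIT_TRUE, DPDK_INIT_TRY };

/* Maps other_config:dpdk-init to a mode.  strcasecmp() makes "True",
 * "TRUE" and "true" equivalent, as smap_get_bool() did for the old
 * boolean knob. */
static enum dpdk_init_mode
parse_dpdk_init(const char *value)
{
    if (value == NULL) {
        return DPDK_INIT_FALSE;
    }
    if (!strcasecmp(value, "true")) {
        return DPDK_INIT_TRUE;
    }
    if (!strcasecmp(value, "try")) {
        return DPDK_INIT_TRY;
    }
    return DPDK_INIT_FALSE;
}
```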
* dpdk: Limit DPDK memory usage. | Ilya Maximets | 2019-02-01 | 1 | -2/+19

  Since the 18.05 release, DPDK has used a dynamic memory model in which hugepages can be allocated on demand. At the same time, the '--socket-mem' option was redefined as the size of pre-allocated memory, i.e. memory that is allocated at startup and cannot be freed.

  So DPDK with the new memory model could allocate more hugepage memory than specified in the '--socket-mem' or '-m' options.

  This change adds a new configurable 'other_config:dpdk-socket-limit' that can be used to limit the amount of memory DPDK may use. It uses the new DPDK option '--socket-limit'. Example:

    ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-limit="1024,1024"

  Also, in order to preserve the old behaviour, if '--socket-limit' is not specified, it defaults to the amount of memory specified by the '--socket-mem' option, i.e. OVS will not be able to allocate more. This is needed, for example, to prevent OVS from allocating more memory than Nova reserved for it in OpenStack installations.

  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
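The defaulting rule in the last paragraph can be sketched as follows. If no explicit limit is configured, the '--socket-mem' value is reused for '--socket-limit'. The function name and the fixed-size argument array are hypothetical; OVS builds its EAL arguments with its own helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Appends "--socket-limit <value>" to 'args' (which must have room for two
 * more entries) and returns the new argument count.  Without an explicit
 * limit, the '--socket-mem' value is reused so that DPDK cannot grow past
 * its pre-allocated memory. */
static size_t
add_socket_limit(const char **args, size_t n, const char *socket_mem,
                 const char *socket_limit)
{
    const char *limit = socket_limit != NULL ? socket_limit : socket_mem;

    assert(args != NULL);
    if (limit != NULL) {
        args[n++] = "--socket-limit";
        args[n++] = limit;
    }
    return n;
}
```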
* dpdk: Use svec instead of re-inventing. | Ilya Maximets | 2019-01-30 | 1 | -153/+68

  There is no need to implement a dynamic vector to store arguments. 'svec' covers all the needed functionality.

  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Acked-by: Aaron Conole <aconole@redhat.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* dpdk: Use dynamic string for socket-mem construction. | Ilya Maximets | 2019-01-28 | 1 | -8/+5

  There is no need to allocate memory and use 'strcat' directly. 'dynamic-string' can do this for us.

  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Acked-by: Aaron Conole <aconole@redhat.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* dpdk: Support both shared and per port mempools. | Ian Stokes | 2018-07-06 | 1 | -0/+12

  This commit re-introduces the concept of shared mempools as the default memory model for DPDK devices. Per port mempools are still available but must be enabled explicitly by a user.

  OVS previously used a shared mempool model for ports with the same MTU and socket configuration. This was replaced by a per port mempool model to address issues flagged by users such as:

  https://mail.openvswitch.org/pipermail/ovs-discuss/2016-September/042560.html

  However, the per port model potentially requires more memory to support the same number of ports and configuration as the shared model. This is considered a blocking factor when upgrading current OVS deployments to future releases, because a user may have to re-dimension memory for the same deployment configuration. This may not be possible for some users.

  This commit resolves the issue by re-introducing shared mempools as the default memory behaviour in OVS DPDK. It also refactors the memory configuration code to allow for per port mempools.

  This patch adds a new global config option, per-port-memory, that controls the enablement of per port mempools for DPDK devices.

    ovs-vsctl set Open_vSwitch . other_config:per-port-memory=true

  This value defaults to false. To enable per port memory support, set this field to true when setting other global parameters on init (such as "dpdk-socket-mem", for example). Changing the value at runtime is not supported and requires restarting the vswitch daemon.

  The mempool sweep functionality is also replaced with the sweep functionality from OVS 2.9 found in commits:

    c77f692 (netdev-dpdk: Free mempool only when no in-use mbufs.)
    a7fb0a4 (netdev-dpdk: Add mempool reuse/free debug.)

  A new document discussing the specifics of the memory models, with example memory requirement calculations, is also added.

  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
  Acked-by: Kevin Traynor <ktraynor@redhat.com>
  Acked-by: Tiago Lam <tiago.lam@intel.com>
  Tested-by: Tiago Lam <tiago.lam@intel.com>
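Here is a minimal sketch of the shared model's reuse rule. It assumes a pool is identified only by (NUMA socket, MTU) and tracked with a reference count. With per-port memory enabled, every request gets a private pool. `struct pool`, `pool_get()`, and `pool_put()` are hypothetical stand-ins, not OVS or DPDK APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a DPDK mempool. */
struct pool {
    int socket_id;
    int mtu;
    int refcount;
    bool shared;
    struct pool *next;
};

static struct pool *shared_pools;   /* Pools that ports may share. */

/* Returns a pool for a port on NUMA 'socket_id' with 'mtu', or NULL on
 * allocation failure.  In the shared model, a pool with the same
 * (socket, MTU) is reused; with per-port memory, every port gets its own. */
static struct pool *
pool_get(int socket_id, int mtu, bool per_port_memory)
{
    struct pool *p;

    if (!per_port_memory) {
        for (p = shared_pools; p != NULL; p = p->next) {
            if (p->socket_id == socket_id && p->mtu == mtu) {
                p->refcount++;
                return p;
            }
        }
    }

    p = calloc(1, sizeof *p);
    if (p == NULL) {
        return NULL;
    }
    p->socket_id = socket_id;
    p->mtu = mtu;
    p->refcount = 1;
    p->shared = !per_port_memory;
    if (p->shared) {
        p->next = shared_pools;
        shared_pools = p;
    }
    return p;
}

/* Drops one reference and frees the pool when the last user goes away.
 * (OVS also waits until no mbufs are in use; that is not modeled here.) */
static void
pool_put(struct pool *pool)
{
    assert(pool != NULL && pool->refcount > 0);
    if (--pool->refcount > 0) {
        return;
    }
    if (pool->shared) {
        for (struct pool **pp = &shared_pools; *pp != NULL; pp = &(*pp)->next) {
            if (*pp == pool) {
                *pp = pool->next;
                break;
            }
        }
    }
    free(pool);
}
```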
* OVS-DPDK: Change "dpdk-socket-mem" default value. | Marcin Rybka | 2018-06-08 | 1 | -1/+27

  When "dpdk-socket-mem" and "dpdk-alloc-mem" are not specified, "dpdk-socket-mem" will be set to allocate 1024MB on each NUMA node. This change prevents OVS from failing when a NIC is attached to NUMA node 1 or higher.

  The patch contains a documentation update.

  Signed-off-by: Marcin Rybka <marcinx.rybka@intel.com>
  Co-authored-by: Billy O'Mahony <billy.o.mahony@intel.com>
  Signed-off-by: Billy O'Mahony <billy.o.mahony@intel.com>
  Tested-by: Hariprasad Govindharajan <hariprasad.govindharajan@intel.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
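This sketch builds the "1024MB on each NUMA node" default as a '--socket-mem' string such as "1024,1024". The node count is passed in; OVS would obtain it from its NUMA module. `build_socket_mem()` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Writes a '--socket-mem' value giving 'mb_per_node' MB to each of
 * 'n_nodes' NUMA nodes (e.g. "1024,1024") into 'buf'.  Returns the string
 * length, or -1 if 'buf' is too small. */
static int
build_socket_mem(char *buf, size_t size, int n_nodes, int mb_per_node)
{
    size_t len = 0;

    assert(buf != NULL && size > 0 && n_nodes > 0);
    buf[0] = '\0';
    for (int i = 0; i < n_nodes; i++) {
        int n = snprintf(buf + len, size - len, "%s%d",
                         i ? "," : "", mb_per_node);
        if (n < 0 || (size_t) n >= size - len) {
            return -1;   /* Output would be truncated. */
        }
        len += (size_t) n;
    }
    return (int) len;
}
```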
* dpdk: reflect status and version in the database | Aaron Conole | 2018-05-25 | 1 | -2/+19

  The normal way of retrieving the running DPDK status involves parsing log files and issuing various incantations of ovs-vsctl and ovs-appctl commands to determine whether rte_eal_init started successfully.

  This commit adds two new records to reflect the DPDK version and the DPDK initialization status. To support this, the other_config:dpdk-init configuration block now accepts the 'try' keyword in addition to 'true'.

  Signed-off-by: Aaron Conole <aconole@redhat.com>
  Acked-by: Kevin Traynor <ktraynor@redhat.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* dpdk: allow init to fail | Aaron Conole | 2018-05-25 | 1 | -7/+16

  DPDK initialization can fail either due to an internal error or an invalid configuration. When that happens, it's rather impolite to abort immediately without any details.

  With this change, a failed DPDK initialization attempt still triggers a SIGABRT. However, the failure details are logged, so a user or administrator has more information to correct the issue. A restart of OvS is still required to re-attempt initialization.

  The refactor to propagate the init error will be used in an upcoming commit.

  Signed-off-by: Aaron Conole <aconole@redhat.com>
  Acked-by: Kevin Traynor <ktraynor@redhat.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* netdev-dpdk: Limit rate of DPDK logs. | Ilya Maximets | 2018-03-23 | 1 | -4/+6

  DPDK can produce a huge amount of logs. For example, when a mempool in a vhost-user port is exhausted, the following message is printed on each call to 'rte_vhost_dequeue_burst()':

    |ERR|VHOST_DATA: Failed to allocate memory for mbuf.

  These messages grow ovs-vswitchd.log extremely fast. The log becomes unreadable and unparsable by common Linux utilities like grep, less, etc. Moreover, a continuously growing log could exhaust the disk space within a few hours and break normal operation of the whole system.

  To avoid such issues, the DPDK log is rate limited to 600 messages per minute. This value is high because we still want to see large logs such as the vhost-user configuration sequence. Debug messages are rate limited separately to avoid losing errors/warnings when intensive debug logging is enabled in DPDK.

  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Acked-by: Aaron Conole <aconole@redhat.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* vswitchd: show DPDK version | Matteo Croce | 2018-01-26 | 1 | -0/+8

  Show the DPDK version if Open vSwitch is compiled with DPDK support. The version can be retrieved with `ovs-vswitchd --version` or from the OVS logs.

  A small change in ovs-ctl avoids breakage caused by the output change.

  Signed-off-by: Matteo Croce <mcroce@redhat.com>
  Acked-by: Kevin Traynor <ktraynor@redhat.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* netdev-dpdk: vHost IOMMU support | Mark Kavanagh | 2017-12-08 | 1 | -0/+12

  DPDK v17.11 introduces support for the vHost IOMMU feature. This is a security feature that restricts the vhost memory a virtio device may access.

  This feature also enables the vhost REPLY_ACK protocol. Its implementation is known to work in newer versions of QEMU (e.g. v2.10.0) but is buggy in older versions (v2.7.0 - v2.9.0, inclusive). As such, the feature is disabled by default (and should remain so for the aforementioned older QEMU versions). Starting with QEMU v2.9.1, vhost-iommu-support can safely be enabled, even without an IOMMU device, with no performance penalty.

  This patch adds a new global config option, vhost-iommu-support, that controls enablement of the vhost IOMMU feature:

    ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true

  This value defaults to false. To enable IOMMU support, set this field to true when setting other global parameters on init (such as "dpdk-socket-mem", for example). Changing the value at runtime is not supported and requires restarting the vswitch daemon.

  Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
  Acked-by: Kevin Traynor <ktraynor@redhat.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
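The QEMU guidance above reduces to a version comparison. OVS leaves the decision to the administrator through the knob. The helper below is a hypothetical illustration of the rule "safe from v2.9.1 onward". It treats anything it cannot parse as "major.minor.micro" as unsafe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: true if QEMU 'version' is v2.9.1 or later, i.e.
 * past the REPLY_ACK bugs described above. */
static bool
qemu_iommu_safe(const char *version)
{
    int major, minor, micro;

    assert(version != NULL);
    if (sscanf(version, "%d.%d.%d", &major, &minor, &micro) != 3) {
        return false;
    }
    if (major != 2) {
        return major > 2;
    }
    if (minor != 9) {
        return minor > 9;
    }
    return micro >= 1;
}
```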
* dpdk: Redirect DPDK log to OVS logging subsystem. | Ilya Maximets | 2017-03-09 | 1 | -0/+48

  This helps keep all the logs in one place. 'ovs-appctl vlog' commands for the 'dpdk' module can be used to configure the log level. A lower bound for DPDK logging (--log-level) can still be passed through the 'dpdk-extra' field.

  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Acked-by: Aaron Conole <aconole@redhat.com>
  Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
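One way to capture everything a library writes to a `FILE *` is glibc's `fopencookie()`, which builds a stream around custom callbacks. DPDK accepts such a stream via `rte_openlog_stream()`. The sketch below only shows the stream side. The capture buffer is a hypothetical stand-in for the OVS logging subsystem. A real write callback would hand each chunk to the logger instead.

```c
#define _GNU_SOURCE          /* fopencookie() is a glibc extension. */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical sink: accumulates everything written to the stream. */
static char captured[512];
static size_t captured_len;

static ssize_t
log_stream_write(void *cookie, const char *buf, size_t size)
{
    size_t room = sizeof captured - 1 - captured_len;
    size_t n = size < room ? size : room;

    (void) cookie;
    memcpy(captured + captured_len, buf, n);
    captured_len += n;
    captured[captured_len] = '\0';
    return (ssize_t) size;   /* Report all bytes consumed; overflow is dropped. */
}

/* Returns a FILE * whose output goes to log_stream_write(), or NULL. */
static FILE *
open_log_stream(void)
{
    cookie_io_functions_t io = { .write = log_stream_write };
    FILE *stream = fopencookie(NULL, "w", io);

    if (stream != NULL) {
        setvbuf(stream, NULL, _IOLBF, 0);   /* Hand over complete lines. */
    }
    return stream;
}
```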
* dpdk: Use VLOG_INFO_ONCE instead of open-coding it. | Ben Pfaff | 2017-03-08 | 1 | -6/+2

  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Andy Zhou <azhou@ovn.org>
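`VLOG_INFO_ONCE` is the OVS helper this commit switches to. The macro below is a generic C11 sketch of the same "log once per call site" pattern, not the OVS implementation. `log_info()` and the counter are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

static int n_info_logged;   /* Counts emitted messages, for demonstration. */

/* Hypothetical logging backend. */
static void
log_info(const char *msg)
{
    n_info_logged++;
    fprintf(stderr, "INFO: %s\n", msg);
}

/* Emits 'msg' the first time this call site runs and never again.  Each
 * expansion has its own static flag, and atomic_flag_test_and_set() makes
 * the "first caller wins" decision thread-safe. */
#define LOG_INFO_ONCE(msg)                                  \
    do {                                                    \
        static atomic_flag logged_ = ATOMIC_FLAG_INIT;      \
        if (!atomic_flag_test_and_set(&logged_)) {          \
            log_info(msg);                                  \
        }                                                   \
    } while (0)

static void
warn_dpdk_disabled(void)
{
    LOG_INFO_ONCE("DPDK support is disabled");
}
```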
* dpdk: Fixes memory leak in dpdk_init__(). | nickcooper-zhangtonghao | 2017-02-10 | 1 | -3/+3

  If users configure the 'vhost-sock-dir' for dpdk, the memory allocated by xstrdup(ovs_rundir()) is not freed. This patch makes process_vhost_flags() xstrdup() either val or default_val, depending on the configuration. The caller must free new_val when it is no longer needed.

  Fixes: 01961bbdd34a ("dpdk: New module with some code from netdev-dpdk.")
  CC: Daniele Di Proietto <diproiettod@vmware.com>
  Signed-off-by: nickcooper-zhangtonghao <nic@opencloud.tech>
  Reviewed-by: Aaron Conole <aconole@redhat.com>
  Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
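The ownership rule the fix adopts can be sketched like this: the helper always returns a fresh allocation, and the caller always frees it. `get_flag_value()` and `dup_str()` are hypothetical; OVS uses its own `xstrdup()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Minimal xstrdup()-like helper: aborts instead of returning NULL. */
static char *
dup_str(const char *s)
{
    char *copy = malloc(strlen(s) + 1);

    if (copy == NULL) {
        abort();
    }
    return strcpy(copy, s);
}

/* Returns a newly allocated copy of 'val' if it is set, otherwise of
 * 'default_val'.  Both paths allocate, so the caller frees the result
 * exactly once and neither path leaks. */
static char *
get_flag_value(const char *val, const char *default_val)
{
    assert(default_val != NULL);
    return dup_str(val != NULL ? val : default_val);
}
```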
* dpdk: Late initialization. | Daniele Di Proietto | 2017-01-10 | 1 | -10/+21

  With this commit, we allow the user to set other_config:dpdk-init=true after the process is started. This makes it easier to start Open vSwitch with DPDK using standard init scripts without restarting the service.

  This is still far from ideal, because initializing DPDK might still abort the process (e.g. if there is not enough memory). The user must therefore check the status of the process after setting dpdk-init to true.

  Nonetheless, I think this is an improvement, because it doesn't require restarting the whole unit.

  CC: Aaron Conole <aconole@redhat.com>
  Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
  Acked-by: Aaron Conole <aconole@redhat.com>
* lib/dpdk: No more deferred release | Aaron Conole | 2016-12-21 | 1 | -12/+5

  The DPDK documentation was recently updated to reflect that DPDK does not hold any references to, nor take ownership of, the argv/argc elements. With that understanding, let's just release the memory as soon as possible.

  Signed-off-by: Aaron Conole <aconole@redhat.com>
  Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
* lib/dpdk: fix double free on exit | Aaron Conole | 2016-12-12 | 1 | -2/+8

  The DPDK EAL library intends all argc/argv arguments passed on the command line to be in the form:

    progname dpdk arguments program arguments

  This means the argv array will look something like:

    argv[0] = progname
    argv[1..x] = dpdk arguments
    argv[x..y] = program arguments

  When the EAL initialization routine completes, it modifies the argv array to set argv[ret] = progname. The remaining arguments can then be passed to something like getopt for further processing.

  The assignment mentioned above was not considered when the dpdk arguments rework was initially added. This introduced two errors:

    1. Leak of the element at argv[ret]
    2. Double-free of the element at argv[0]

  Reported-by: Ilya Maximets <i.maximets@samsung.com>
  Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2016-November/325442.html
  Fixes: bab694097133 ("netdev-dpdk: Convert initialization from cmdline to db")
  Signed-off-by: Aaron Conole <aconole@redhat.com>
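Here is a minimal sketch of one way to avoid both errors. Snapshot the pointer array before EAL init can rewrite it, then free from the snapshot so every original string is released exactly once. `fake_eal_init()` only mimics the `argv[ret] = progname` behavior described above. All names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for rte_eal_init(): pretends to parse 'parsed' arguments after
 * argv[0] and stores the program name at argv[ret]. */
static int
fake_eal_init(int argc, char **argv, int parsed)
{
    assert(parsed >= 0 && parsed < argc);
    argv[parsed] = argv[0];
    return parsed;
}

/* Builds a heap-allocated argv with strdup()'ed strings, or NULL. */
static char **
dup_argv(int argc, const char *const *src)
{
    char **argv = calloc((size_t) argc, sizeof *argv);

    if (argv == NULL) {
        return NULL;
    }
    for (int i = 0; i < argc; i++) {
        argv[i] = strdup(src[i]);
        if (argv[i] == NULL) {
            while (i-- > 0) {
                free(argv[i]);
            }
            free(argv);
            return NULL;
        }
    }
    return argv;
}

/* Runs init, then frees the strings via a snapshot taken beforehand: after
 * init, argv[0] and argv[ret] alias one string and the original argv[ret]
 * is unreachable through argv.  argv must not be used afterwards. */
static int
eal_init_and_release(int argc, char **argv, int parsed)
{
    char **owned = malloc((size_t) argc * sizeof *owned);
    int ret;

    if (owned == NULL) {
        return -1;
    }
    memcpy(owned, argv, (size_t) argc * sizeof *owned);

    ret = fake_eal_init(argc, argv, parsed);

    for (int i = 0; i < argc; i++) {
        free(owned[i]);   /* Each original string exactly once. */
    }
    free(owned);
    return ret;
}
```

Freeing right after init is consistent with the later "No more deferred release" commit, which notes that DPDK keeps no references to argv.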
* dpdk: Fix DPDK pdump compilation | Ciara Loftus | 2016-10-13 | 1 | -0/+5

  The rte_pdump header file was not included in the file that requires it. Fix this.

  Fixes: 01961bbdd34a ("dpdk: New module with some code from netdev-dpdk.")
  Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
  Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
* dpdk: New module with some code from netdev-dpdk. | Daniele Di Proietto | 2016-10-12 | 1 | -0/+432

  There's a lot of code in netdev-dpdk that is not at all related to the netdev interface, mostly the library initialization code.

  This commit moves it to a new 'dpdk' module, to simplify 'netdev-dpdk'.

  Also, a new module 'dpdk-stub' is introduced to implement some functions when DPDK is not available. This replaces the old 'netdev-nodpdk' module.

  Some redundant includes are removed or reorganized as a consequence.

  No functional change.

  CC: Aaron Conole <aconole@redhat.com>
  Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
  Acked-by: Aaron Conole <aconole@redhat.com>
  Tested-by: Aaron Conole <aconole@redhat.com>
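Here is a minimal sketch of the stub-module pattern. It assumes a single interface with a real and a stub implementation. In OVS the choice is made by building `dpdk.c` or `dpdk-stub.c`. Here an `#ifdef DPDK_NETDEV` folds both into one file for illustration, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* One interface for callers, regardless of how the build was configured. */
bool dpdk_available(void);
const char *dpdk_get_version_string(void);

#ifdef DPDK_NETDEV
/* A DPDK-enabled build would link the real implementation, which calls
 * into the DPDK libraries; omitted from this sketch. */
#else
/* Stub implementation: keeps callers compiling and linking when DPDK is
 * not compiled in, and reports that it is absent. */
bool
dpdk_available(void)
{
    return false;
}

const char *
dpdk_get_version_string(void)
{
    return "";
}
#endif

/* Caller code is identical in both builds. */
static const char *
describe_dpdk(void)
{
    return dpdk_available() ? dpdk_get_version_string() : "DPDK not available";
}
```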