path: root/lib/dpdk.c
Commit message | Author | Age | Files | Lines
* treewide: Remove uses of ATOMIC_VAR_INIT. | Fangrui Song | 2023-03-06 | 1 | -1/+1
  ATOMIC_VAR_INIT has a trivial definition `#define ATOMIC_VAR_INIT(value) (value)`, is deprecated in C17/C++20, and will be removed in newer standards in newer GCC/Clang (e.g. https://reviews.llvm.org/D144196).

  Signed-off-by: Fangrui Song <maskray@google.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
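Since the macro expands to its argument, dropping it amounts to using a plain initializer. The following is a minimal sketch of that substitution using C11 <stdatomic.h>; the flag and function names are hypothetical and are not taken from lib/dpdk.c, and OVS itself may go through its own atomic wrappers instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag, named for illustration only.
 * Before: static atomic_bool example_initialized = ATOMIC_VAR_INIT(false); */
static atomic_bool example_initialized = false;

/* Sets the flag and returns its previous value. */
static bool
example_mark_initialized(void)
{
    return atomic_exchange(&example_initialized, true);
}

/* Trips an assertion if the flag was never set. */
static void
example_require_initialized(void)
{
    assert(atomic_load(&example_initialized));
}
```

The first call to example_mark_initialized() reports that the flag was previously unset; later calls report that it was already set.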
* netdev-dpdk: Move DPDK netdev related configuration. | David Marchand | 2022-11-30 | 1 | -101/+0
  vhost related configuration and per port memory are netdev-dpdk configuration items. dpdk-stub.c and netdev-dpdk.c are never linked together, so we can move those bits out of the generic dpdk code.

  The dpdk_* accessors for those configuration items are then not needed anymore and we can simply reference local variables.

  Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
  Signed-off-by: David Marchand <david.marchand@redhat.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev-dpdk: Add shared mempool config. | Kevin Traynor | 2022-07-14 | 1 | -1/+1
  Mempools may currently be shared between DPDK ports based on port MTU and NUMA. With some hint from the user we can increase the sharing on MTU and hence reduce memory consumption in many cases.

  For example, a port with MTU 9000 uses a mempool with an mbuf size based on 9000 MTU. A port with MTU 1500 uses a different mempool with an mbuf size based on 1500 MTU. In this case, assuming same NUMA, both these ports could share the 9000 MTU mempool.

  The user must give a hint, as the order of creation of ports and setting of MTUs may vary, and we need to ensure that upgrades from older OVS versions do not require more memory.

  This scheme can also prevent multiple mempools being created for cases where a port is added picking up a default MTU and an appropriate mempool, but later has its MTU changed to a different value requiring a different mempool.

  Example usage:

    $ ovs-vsctl --no-wait set Open_vSwitch . \
        other_config:shared-mempool-config=9000,1500:1,6000:1

  Port added on NUMA 0:
    * MTU 1500, use mempool based on 9000 MTU
    * MTU 5000, use mempool based on 9000 MTU
    * MTU 9000, use mempool based on 9000 MTU
    * MTU 9300, use mempool based on 9300 MTU (existing behaviour)

  Port added on NUMA 1:
    * MTU 1500, use mempool based on 1500 MTU
    * MTU 5000, use mempool based on 6000 MTU
    * MTU 9000, use mempool based on 9000 MTU
    * MTU 9300, use mempool based on 9300 MTU (existing behaviour)

  Default behaviour is unchanged and mempools are still only created when needed.

  Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
  Reviewed-by: David Marchand <david.marchand@redhat.com>
  Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
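The NUMA 0 and NUMA 1 outcomes listed in this commit message are consistent with a "smallest configured MTU that still fits" rule. The sketch below encodes that rule only as an illustration; the struct, the function name, and the use of -1 for "all NUMA nodes" are assumptions, and the actual selection logic in netdev-dpdk.c may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parsed form of one shared-mempool-config entry. */
struct shared_mtu_hint {
    int mtu;    /* MTU the shared mempool is sized for. */
    int numa;   /* NUMA node the hint applies to, or -1 for all nodes. */
};

/* Returns the MTU whose mempool a port would use: the smallest hinted MTU
 * that applies to 'numa' and is >= 'port_mtu', else 'port_mtu' itself. */
static int
pick_mempool_mtu(const struct shared_mtu_hint *hints, size_t n_hints,
                 int port_mtu, int numa)
{
    int best = 0;

    assert(hints != NULL || n_hints == 0);
    for (size_t i = 0; i < n_hints; i++) {
        if (hints[i].numa != -1 && hints[i].numa != numa) {
            continue;   /* Hint is for a different NUMA node. */
        }
        if (hints[i].mtu >= port_mtu && (best == 0 || hints[i].mtu < best)) {
            best = hints[i].mtu;
        }
    }
    return best ? best : port_mtu;   /* No fitting hint: existing behaviour. */
}
```

With the hints 9000, 1500:1 and 6000:1, a port with MTU 5000 maps to 9000 on NUMA 0 and to 6000 on NUMA 1, and MTU 9300 falls through to its own 9300 mempool on both.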
* dpdk: Support running PMD threads on any core. | David Marchand | 2022-01-11 | 1 | -8/+43
  Previously in OVS, a PMD thread running on cpu X used lcore X. This assumption limited OVS to run PMD threads on physical cpu < RTE_MAX_LCORE.

  DPDK 20.08 introduced a new API that associates a non-EAL thread to a free lcore. This new API does not change the thread characteristics (like CPU affinity) and lets OVS run its PMD threads on any cpu regardless of RTE_MAX_LCORE.

  The DPDK multiprocess feature is not compatible with this new API and is disabled.

  DPDK still limits the number of lcores to RTE_MAX_LCORE (128 on x86_64) which should be enough for OVS pmd threads (hopefully).

  DPDK lcore/OVS pmd thread mappings are logged when trying to attach an OVS PMD thread, and when detaching.

  A new command is added to help get DPDK point of view of the DPDK lcores at any time:

    $ ovs-appctl dpdk/lcore-list
    lcore 0, socket 0, role RTE, cpuset 0
    lcore 1, socket 0, role NON_EAL, cpuset 1
    lcore 2, socket 0, role NON_EAL, cpuset 15

  Signed-off-by: David Marchand <david.marchand@redhat.com>
  Acked-by: Kevin Traynor <ktraynor@redhat.com>
  Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpif-netdev: Call cpuid for x86 isa availability. | David Marchand | 2022-01-03 | 1 | -52/+0
  DPIF AVX512 optimizations currently rely on DPDK availability while they can be used without DPDK. Besides, checking for availability of some isa only has to be done once and won't change while an OVS process runs.

  Resolve isa availability in constructors by using a simplified query based on cpuid API that comes from the compiler.

  Note: this also fixes the check on BMI2 availability: DPDK had a bug for this isa, see https://git.dpdk.org/dpdk/commit/?id=aae3037ab1e0.

  Suggested-by: Ilya Maximets <i.maximets@ovn.org>
  Signed-off-by: David Marchand <david.marchand@redhat.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpdk: Use --in-memory by default. | Rosemarie O'Riorden | 2021-12-15 | 1 | -0/+7
  If anonymous memory mapping is supported by the kernel, it's better to run OVS entirely in memory rather than creating shared data structures. OVS doesn't work in multi-process mode, so there is no need to litter a filesystem.

  Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1949849
  Acked-by: David Marchand <david.marchand@redhat.com>
  Acked-by: Ian Stokes <ian.stokes@intel.com>
  Signed-off-by: Rosemarie O'Riorden <roriorde@redhat.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpdk: Stop configuring socket-limit with the value of socket-mem. | Rosemarie O'Riorden | 2021-07-26 | 1 | -20/+0
  This change removes the automatic memory limit on start-up of OVS with DPDK. As DPDK supports dynamic memory allocation, there is no need to limit the amount of memory available, if not requested.

  Currently, if socket-limit is not configured, it is set to the value of socket-mem. With this change, the user can decide to set it or have no memory limit.

  Removed logs that announce this change and fixed documentation.

  Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=1949850
  Signed-off-by: Rosemarie O'Riorden <roriorde@redhat.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpdk: Remove default values for socket-mem and limit. | Rosemarie O'Riorden | 2021-07-26 | 1 | -73/+1
  This change removes the default values for EAL args socket-mem and socket-limit. As DPDK supports dynamic memory allocation, there is no need to allocate a certain amount of memory on start-up, nor limit the amount of memory available, if not requested.

  Currently, socket-mem has a default value of 1024 when it is not configured by the user, and socket-limit takes on the value of socket-mem, 1024, by default. With this change, socket-mem is not configured by default, meaning that socket-limit is not either. Neither, either or both options can be set.

  Removed extra logs that announce this change and fixed documentation.

  Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=1949850
  Signed-off-by: Rosemarie O'Riorden <roriorde@redhat.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpdk: Logs to announce removal of defaults for socket-mem and limit. | Rosemarie O'Riorden | 2021-07-16 | 1 | -0/+12
  Deprecate current OVS provided defaults for DPDK socket-mem and socket-limit that are planned to be removed in OVS 2.17. At that point DPDK defaults will be used instead. Warnings have been added to alert users in advance.

  Signed-off-by: Rosemarie O'Riorden <roriorde@redhat.com>
  Acked-by: Kevin Traynor <ktraynor@redhat.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* dpdk: Add additional CPU ISA detection strings | Harry van Haaren | 2021-07-16 | 1 | -0/+2
  This commit enables OVS to check at runtime for more detailed AVX512 capabilities, specifically Byte and Word (BW) extensions, and Vector Bit Manipulation Instructions (VBMI).

  These instructions will be used in the CPU ISA optimized implementations of traffic profile aware miniflow extract.

  Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
  Acked-by: Eelco Chaudron <echaudro@redhat.com>
  Acked-by: Flavio Leitner <fbl@sysclose.org>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* dpcls-avx512: Enable avx512 vector popcount instruction. | Harry van Haaren | 2021-07-09 | 1 | -0/+1
  This commit enables the AVX512-VPOPCNTDQ Vector Popcount instruction. This instruction is not available on every CPU that supports the AVX512-F Foundation ISA, hence it is enabled only when the additional VPOPCNTDQ ISA check is passed.

  The vector popcount instruction is used instead of the AVX512 popcount emulation code present in the avx512 optimized DPCLS today. It provides higher performance in the SIMD miniflow processing as that requires the popcount to calculate the miniflow block indexes.

  Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
  Acked-by: Flavio Leitner <fbl@sysclose.org>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* dpdk: Cache result of CPU ISA checks. | Harry van Haaren | 2021-07-09 | 1 | -4/+24
  As a small optimization, this patch caches the result of a CPU ISA check from DPDK. Particularly in the case of running the DPCLS autovalidator (which repeatedly probes subtables) this reduces the amount of CPU ISA lookups from the DPDK level.

  By caching them at the OVS/dpdk.c level, the ISA checks remain runtime for the CPU where they are executed, but subsequent checks for the same ISA feature become much cheaper.

  Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
  Co-authored-by: Cian Ferriter <cian.ferriter@intel.com>
  Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
  Acked-by: Flavio Leitner <fbl@sysclose.org>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
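The caching idea described here can be sketched as "probe once, then answer from a stored flag". The code below is a generic illustration under stated assumptions: the struct, the function names, and the stand-in probe are all hypothetical, the real probe would be a DPDK CPU-flag query, and the sketch does no locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical cache slot for one ISA feature name. */
struct isa_cache_entry {
    const char *name;
    bool probed;      /* True once the expensive probe has run. */
    bool available;   /* Cached probe result, valid if 'probed'. */
};

/* Stand-in for the expensive lookup; counts how often it is called. */
static int example_probe_calls;

static bool
example_probe(const char *name)
{
    example_probe_calls++;
    return strcmp(name, "avx512f") == 0;   /* Pretend only this exists. */
}

/* Runs 'probe' on first use and serves later calls from the cache. */
static bool
isa_cached_check(struct isa_cache_entry *entry,
                 bool (*probe)(const char *name))
{
    assert(entry != NULL && probe != NULL);
    if (!entry->probed) {
        entry->available = probe(entry->name);
        entry->probed = true;
    }
    return entry->available;
}
```

Calling isa_cached_check() twice on the same entry runs the probe only once, which is the saving the commit message describes for repeated subtable probing.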
* ovs-numa: Support non-contiguous numa nodes and offline CPU cores. | David Wilder | 2021-07-07 | 1 | -8/+49
  This change removes the assumption that numa nodes and cores are numbered contiguously in linux. This change is required to support some Power systems.

  A check has been added to verify that cores are online; offline cores result in non-contiguously numbered cores.

  DPDK EAL option generation is updated to work with non-contiguous numa nodes. These options can be seen in the ovs-vswitchd.log. For example, a system containing only numa nodes 0 and 8 will generate the following:

    EAL ARGS: ovs-vswitchd --socket-mem 1024,0,0,0,0,0,0,0,1024 \
        --socket-limit 1024,0,0,0,0,0,0,0,1024 -l 0

  Tests for pmd and dpif-netdev have been updated to validate non-contiguous numbered nodes.

  Signed-off-by: David Wilder <dwilder@us.ibm.com>
  Acked-by: Kevin Traynor <ktraynor@redhat.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
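The EAL argument in this example has one comma-separated slot per node id up to the highest present node, with 0 for ids that do not exist. A minimal sketch of building such a list follows; the function name and the boolean presence array are assumptions made for illustration and do not mirror the ovs-numa API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Writes one value per node slot into 'buf': 'mb_per_node' for present
 * nodes, 0 for absent ones.  Returns 0 on success, -1 if 'buf' is too small. */
static int
build_per_node_list(const bool *present, int n_slots, int mb_per_node,
                    char *buf, size_t len)
{
    assert(present != NULL && buf != NULL);
    if (len == 0) {
        return -1;
    }
    buf[0] = '\0';
    for (int i = 0; i < n_slots; i++) {
        size_t used = strlen(buf);
        int n = snprintf(buf + used, len - used, "%s%d",
                         i ? "," : "", present[i] ? mb_per_node : 0);

        if (n < 0 || (size_t) n >= len - used) {
            return -1;   /* Output would have been truncated. */
        }
    }
    return 0;
}
```

For nodes 0 and 8 (nine slots) and 1024 MB per node, this yields the same string as the --socket-mem value shown in the commit message.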
* dpdk: Add debug appctl to get malloc statistics. | Eli Britstein | 2021-06-18 | 1 | -0/+10
  New appctl 'dpdk/get-malloc-stats' implemented to get result of 'rte_malloc_dump_stats()' function.

  Could be used for debugging.

  Signed-off-by: Eli Britstein <elibr@nvidia.com>
  Reviewed-by: Salem Sol <salems@nvidia.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpdk: Update to use DPDK v20.11. | Ian Stokes | 2020-12-16 | 1 | -1/+1
  This commit adds support for DPDK v20.11, it includes the following changes.

   1. travis: Remove explicit DPDK kmods configuration.
   2. sparse: Fix build with 20.05 DPDK tracepoints.
   3. netdev-dpdk: Remove experimental API flag.
      http://patchwork.ozlabs.org/project/openvswitch/list/?series=173216&state=*
   4. sparse: Update to DPDK 20.05 trace point header.
      http://patchwork.ozlabs.org/project/openvswitch/list/?series=179604&state=*
   5. sparse: Fix build with DPDK 20.08.
      http://patchwork.ozlabs.org/project/openvswitch/list/?series=200181&state=*
   6. build: Add support for DPDK meson build.
      http://patchwork.ozlabs.org/project/openvswitch/list/?series=199138&state=*
   7. netdev-dpdk: Remove usage of RTE_ETH_DEV_CLOSE_REMOVE flag.
      http://patchwork.ozlabs.org/project/openvswitch/list/?series=207850&state=*
   8. netdev-dpdk: Fix build with 20.11-rc1.
      http://patchwork.ozlabs.org/project/openvswitch/list/?series=209006&state=*
   9. sparse: Fix __ATOMIC_* redefinition errors
      http://patchwork.ozlabs.org/project/openvswitch/list/?series=209452&state=*
  10. build: Remove DPDK make build references.
      http://patchwork.ozlabs.org/project/openvswitch/list/?series=216682&state=*

  For credit all authors of the original commits to 'dpdk-latest' with the above changes have been added as co-authors for this commit.

  Signed-off-by: David Marchand <david.marchand@redhat.com>
  Co-authored-by: David Marchand <david.marchand@redhat.com>
  Signed-off-by: Sunil Pai G <sunil.pai.g@intel.com>
  Co-authored-by: Sunil Pai G <sunil.pai.g@intel.com>
  Signed-off-by: Eli Britstein <elibr@nvidia.com>
  Co-authored-by: Eli Britstein <elibr@nvidia.com>
  Tested-by: Harry van Haaren <harry.van.haaren@intel.com>
  Tested-by: Govindharajan, Hariprasad <hariprasad.govindharajan@intel.com>
  Tested-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
  Acked-by: Ilya Maximets <i.maximets@ovn.org>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* dpdk: Add commands to configure log levels. | David Marchand | 2020-07-17 | 1 | -5/+105
  When enabling debug logs in dpdk, it can be a challenge to be sure of what is actually enabled; add commands to list and change those log levels.

  However, these commands do not help when tracking issues in dpdk init itself: dump log levels right after init.

  Example:

    $ ovs-appctl dpdk/log-list
    global log level is debug
    id 0: lib.eal, level is info
    id 1: lib.malloc, level is info
    id 2: lib.ring, level is info
    id 3: lib.mempool, level is info
    id 4: lib.timer, level is info
    id 5: pmd, level is info
    [...]
    id 37: pmd.net.bnxt.driver, level is notice
    id 38: pmd.net.e1000.init, level is notice
    id 39: pmd.net.e1000.driver, level is notice
    id 40: pmd.net.enic, level is info
    [...]

    $ ovs-appctl dpdk/log-set debug pmd.*:notice
    $ ovs-appctl dpdk/log-list
    global log level is debug
    id 0: lib.eal, level is debug
    id 1: lib.malloc, level is debug
    id 2: lib.ring, level is debug
    id 3: lib.mempool, level is debug
    id 4: lib.timer, level is debug
    id 5: pmd, level is debug
    [...]
    id 37: pmd.net.bnxt.driver, level is notice
    id 38: pmd.net.e1000.init, level is notice
    id 39: pmd.net.e1000.driver, level is notice
    id 40: pmd.net.enic, level is notice
    [...]

  Signed-off-by: David Marchand <david.marchand@redhat.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpdk: enable CPU feature detection. | Harry van Haaren | 2020-07-13 | 1 | -0/+30
  This commit implements a method to retrieve the CPU ISA capabilities. These ISA capabilities can be used in OVS to select a function implementation at runtime, making the best use of the available ISA on the CPU.

  Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
  Acked-by: William Tu <u9012063@gmail.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* dpdk: Remove deprecated pdump support. | Ilya Maximets | 2020-03-06 | 1 | -12/+0
  DPDK pdump was deprecated in 2.13 release and didn't actually work since 2.11. Removing it.

  More details in commit 4ae8c4617fd3 ("dpdk: Deprecate pdump support.")

  Acked-by: Aaron Conole <aconole@redhat.com>
  Acked-by: David Marchand <david.marchand@redhat.com>
  Acked-by: Ian Stokes <ian.stokes@intel.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpdk: Update to use DPDK 19.11. | Ian Stokes | 2019-12-04 | 1 | -11/+1
  This commit adds support for DPDK v19.11, it includes the following changes.

  1. travis: Enable compilation and linkage with dpdk 19.11.
  2. sparse: Remove dpdk network headers copies.
     https://patchwork.ozlabs.org/patch/1185256/
  3. dpdk: Migrate to new PDUMP API.
     https://patchwork.ozlabs.org/patch/1192971/
  4. netdev-dpdk: Prefix network structures with rte_.
     https://patchwork.ozlabs.org/patch/1109733/
  5. netdev-dpdk: Update by new color definitions.
     https://patchwork.ozlabs.org/patch/1086089/
  6. docs: Update docs to reference 19.11.
  7. docs: Add note regarding hotplug and igb_uio requirements.

  For credit all authors of the original commits to 'dpdk-latest' with the above changes have been added as co-authors for this commit.

  Signed-off-by: David Marchand <david.marchand@redhat.com>
  Co-authored-by: David Marchand <david.marchand@redhat.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
  Co-authored-by: Ilya Maximets <i.maximets@ovn.org>
  Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
  Co-authored-by: Ophir Munk <ophirmu@mellanox.com>
  Reviewed-by: David Marchand <david.marchand@redhat.com>
  Acked-by: Ilya Maximets <i.maximets@ovn.org>
  Acked-by: Kevin Traynor <ktraynor@redhat.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* dpdk: Deprecate pdump support. | Ilya Maximets | 2019-11-19 | 1 | -0/+2
  The conventional way for packet dumping in OVS is to use ovs-tcpdump that works via traffic mirroring. DPDK pdump could probably be used for some lower level debugging, but it is not commonly used for various reasons.

  There are lots of limitations for using this functionality in practice. Most of them are connected with running a secondary pdump process and memory layout issues, such as the requirement to disable ASLR in the kernel. More details are available in DPDK guide:
  https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html#multi-process-limitations

  Besides the functional limitations, it's also hard to use this functionality correctly. The user must be sure that OVS and the pdump utility are running on different CPU cores, which is hard because non-PMD threads could float over available CPU cores. This or any other misconfiguration will likely lead to a crash of the pdump utility and/or OVS.

  Another problem is that the user must actually have this special pdump utility in a system and it might not be available in distributions.

  This change disables pdump support by default, introducing a special configuration option '--enable-dpdk-pdump'. Deprecation warnings will be shown to users on configuration and in runtime.

  Claiming to completely remove this functionality from OVS in one of the next releases.

  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
  Acked-by: Aaron Conole <aconole@redhat.com>
  Acked-by: Flavio Leitner <fbl@sysclose.org>
  Acked-by: David Marchand <david.marchand@redhat.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* dpdk: Remove unneeded log message copy. | David Marchand | 2019-09-26 | 1 | -7/+5
  No need to duplicate and null-terminate the passed buffer. We can directly give it to the vlog subsystem using a dynamic precision in the format string.

  Signed-off-by: David Marchand <david.marchand@redhat.com>
  Acked-by: Eelco Chaudron <echaudro@redhat.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
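"Dynamic precision" refers to the standard printf-family `%.*s` conversion, which reads at most the given number of bytes from a buffer that need not be null-terminated. The sketch below shows the technique with snprintf() standing in for the vlog call; the function name is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Formats exactly 'size' bytes of 'buf' (which need not be terminated)
 * into 'out'.  Returns snprintf()'s result. */
static int
format_log_chunk(char *out, size_t out_len, const char *buf, size_t size)
{
    /* The precision argument of "%.*s" must be an int. */
    assert(size <= INT_MAX);
    return snprintf(out, out_len, "%.*s", (int) size, buf);
}
```

Because the precision bounds the read, no temporary copy with an added terminator is needed before logging.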
* dpdk: Use ovs-numa provided functions to manage thread affinity. | Ilya Maximets | 2019-09-06 | 1 | -15/+12
  This reduces code duplication and avoids using Linux-specific functions (this might be useful in the future if we try to allow running OvS+DPDK on FreeBSD).

  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Reviewed-by: David Marchand <david.marchand@redhat.com>
  Acked-by: William Tu <u9012063@gmail.com>
* dpif-netdev-perf: Fix TSC frequency for non-DPDK case. | Ilya Maximets | 2019-09-06 | 1 | -0/+6
  Unlike 'rte_get_tsc_cycles()', which doesn't need any specific initialization, 'rte_get_tsc_hz()' can be used only after a successful call to 'rte_eal_init()'. 'rte_eal_init()' estimates the TSC frequency for later use by 'rte_get_tsc_hz()'. To be fair, we're not allowed to use 'rte_get_tsc_cycles()' before initializing DPDK either, but it works this way for now and provides correct results.

  This patch provides TSC frequency estimation code that will be used in two cases:
    * DPDK is not compiled in, i.e. DPDK_NETDEV not defined.
    * DPDK compiled in but not initialized, i.e. other_config:dpdk-init=false

  This change is mostly useful for AF_XDP netdev support, i.e. it allows using the dpif-netdev/pmd-perf-show command and various PMD perf metrics.

  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Reviewed-by: David Marchand <david.marchand@redhat.com>
  Acked-by: William Tu <u9012063@gmail.com>
* netdev-offload: Rename offload providers. | Ilya Maximets | 2019-06-11 | 1 | -1/+1
  Flow API providers renamed to be consistent with parent module 'netdev-offload' and look more like each other. '_rte_' replaced with more convenient '_dpdk_'.

  We'll have the following structure:

    Common code:
      lib/netdev-offload-provider.h
      lib/netdev-offload.c
      lib/netdev-offload.h

    Providers:
      lib/netdev-offload-tc.c
      lib/netdev-offload-dpdk.c

  'netdev-offload-dummy' still resides inside netdev-dummy, but it makes not much sense to move it out of there.

  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Acked-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Roi Dayan <roid@mellanox.com>
* netdev: Split up netdev offloading to separate module. | Ilya Maximets | 2019-06-11 | 1 | -2/+2
  New module 'netdev-offload' created to manage different flow API implementations. All the generic and provider independent code moved there from the 'netdev' module.

  Flow API providers further encapsulated.

  The only function that was changed is 'netdev_any_oor'. Now it uses offloading related hmap instead of common 'netdev_shash'.

  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Acked-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Roi Dayan <roid@mellanox.com>
* netdev: Dynamic per-port Flow API. | Ilya Maximets | 2019-06-11 | 1 | -0/+2
  Current issues with Flow API:

  * OVS calls offloading functions regardless of successful flow API initialization. (ex. on init_flow_api failure)
  * Static initialization of Flow API for a netdev_class forbids having different offloading types for different instances of netdev with the same netdev_class. (ex. different vports in 'system' and 'netdev' datapaths at the same time)

  Solution:

  * Move Flow API from the netdev_class to netdev instance.
  * Make Flow API dynamic, i.e. probe the APIs and choose the suitable one.

  Side effects:

  * Flow API providers localized as much as possible in their modules.
  * Now we have an ability to make runtime checks. For example, we could check if particular device supports features we need, like if dpdk device supports RSS+MARK action.

  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Acked-by: Roi Dayan <roid@mellanox.com>
* netdev-dpdk: Post-copy Live Migration support for vhost-user-client. | Liliia Butorina | 2019-05-24 | 1 | -0/+18
  Post-copy Live Migration for vHost has been supported since DPDK 18.11 and QEMU 2.12. New global config option 'vhost-postcopy-support' added to control this feature. Ex.:

    ovs-vsctl set Open_vSwitch . other_config:vhost-postcopy-support=true

  Changing this value requires restarting the daemon. It's safe to enable this knob even if QEMU doesn't support post-copy LM.

  Feature marked as experimental and disabled by default because it may cause PMD thread hang on destination host on page fault for the time of page downloading from the source.

  Feature is not compatible with 'mlockall' and 'dequeue zero-copy'. Support added only for vhost-user-client.

  Signed-off-by: Liliia Butorina <l.butorina@partner.samsung.com>
  Co-authored-by: Ilya Maximets <i.maximets@samsung.com>
  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
* dpdk: Stop dumping memzones to stdout. | Ilya Maximets | 2019-03-19 | 1 | -1/+17
  Information about memzones reserved on init is not very useful. Anyway, we need to log it in a more civilized manner, i.e. through the OVS logging subsystem.

  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Acked-by: Aaron Conole <aconole@redhat.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* dpdk: Fix case-sensitivity of dpdk-init knob. | Ilya Maximets | 2019-03-04 | 1 | -2/+2
  Before supporting the DPDK initialization status in DB 'dpdk-init' was just a boolean and 'smap_get_bool', which is case-insensitive, was used to get the value. Current code uses simple 'strcmp' that fails to recognize values like "True". As a result this breaks different OVS configuration tools. For example, kolla-ansible uses 'other_config:dpdk-init=True' but OVS is not able to recognize it leading to broken installations.

  'strcasecmp' should be used instead to fix the issue.

  CC: Aaron Conole <aconole@redhat.com>
  Fixes: 3e52fa5644cd ("dpdk: reflect status and version in the database")
  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
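The difference between the two comparisons can be sketched as follows. The function name is hypothetical and the code is not the OVS implementation; it only assumes, per the commit referenced in the Fixes tag, that 'true' and 'try' are the values that request initialization.

```c
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>   /* POSIX strcasecmp(). */

/* Returns true if a dpdk-init style value asks for initialization.
 * strcasecmp() accepts "True", "TRUE", "Try", etc., where a plain
 * strcmp() against "true" would reject them. */
static bool
example_init_requested(const char *value)
{
    if (value == NULL) {
        return false;   /* Key not set. */
    }
    return !strcasecmp(value, "true") || !strcasecmp(value, "try");
}
```

With this check, the 'other_config:dpdk-init=True' value mentioned above is treated the same as 'true'.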
* dpdk: Limit DPDK memory usage. | Ilya Maximets | 2019-02-01 | 1 | -2/+19
  Since 18.05 release, DPDK moved to dynamic memory model in which hugepages could be allocated on demand. At the same time '--socket-mem' option was re-defined as a size of pre-allocated memory, i.e. memory that should be allocated at startup and could not be freed. So, DPDK with a new memory model could allocate more hugepage memory than specified in '--socket-mem' or '-m' options.

  This change adds new configurable 'other_config:dpdk-socket-limit' which could be used to limit the amount of memory DPDK could use. It uses new DPDK option '--socket-limit'. Ex.:

    ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-limit="1024,1024"

  Also, in order to preserve old behaviour, if '--socket-limit' is not specified, it will be defaulted to the amount of memory specified by '--socket-mem' option, i.e. OVS will not be able to allocate more. This is needed, for example, to prevent OVS from allocating more memory than reserved for it by Nova in OpenStack installations.

  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* dpdk: Use svec instead of re-inventing. | Ilya Maximets | 2019-01-30 | 1 | -153/+68
  No need to implement dynamic vector to store arguments. 'svec' perfectly covers all the needed functionality.

  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Acked-by: Aaron Conole <aconole@redhat.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* dpdk: Use dynamic string for socket-mem construction. | Ilya Maximets | 2019-01-28 | 1 | -8/+5
  No need to allocate memory and use 'strcat' directly. 'dynamic-string' could do this for us.

  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Acked-by: Aaron Conole <aconole@redhat.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* dpdk: Support both shared and per port mempools. | Ian Stokes | 2018-07-06 | 1 | -0/+12
  This commit re-introduces the concept of shared mempools as the default memory model for DPDK devices. Per port mempools are still available but must be enabled explicitly by a user.

  OVS previously used a shared mempool model for ports with the same MTU and socket configuration. This was replaced by a per port mempool model to address issues flagged by users such as:

  https://mail.openvswitch.org/pipermail/ovs-discuss/2016-September/042560.html

  However the per port model potentially requires an increase in memory resource requirements to support the same number of ports and configuration as the shared port model.

  This is considered a blocking factor for current deployments of OVS when upgrading to future OVS releases as a user may have to redimension memory for the same deployment configuration. This may not be possible for users.

  This commit resolves the issue by re-introducing shared mempools as the default memory behaviour in OVS DPDK but also refactors the memory configuration code to allow for per port mempools.

  This patch adds a new global config option, per-port-memory, that controls the enablement of per port mempools for DPDK devices.

    ovs-vsctl set Open_vSwitch . other_config:per-port-memory=true

  This value defaults to false; to enable per port memory support, this field should be set to true when setting other global parameters on init (such as "dpdk-socket-mem", for example). Changing the value at runtime is not supported, and requires restarting the vswitch daemon.

  The mempool sweep functionality is also replaced with the sweep functionality from OVS 2.9 found in commits

    c77f692 (netdev-dpdk: Free mempool only when no in-use mbufs.)
    a7fb0a4 (netdev-dpdk: Add mempool reuse/free debug.)

  A new document to discuss the specifics of the memory models and example memory requirement calculations is also added.

  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
  Acked-by: Kevin Traynor <ktraynor@redhat.com>
  Acked-by: Tiago Lam <tiago.lam@intel.com>
  Tested-by: Tiago Lam <tiago.lam@intel.com>
* OVS-DPDK: Change "dpdk-socket-mem" default value. | Marcin Rybka | 2018-06-08 | 1 | -1/+27
  When "dpdk-socket-mem" and "dpdk-alloc-mem" are not specified, "dpdk-socket-mem" will be set to allocate 1024MB on each NUMA node. This change will prevent OVS from failing when NIC is attached on NUMA node 1 and higher. Patch contains documentation update.

  Signed-off-by: Marcin Rybka <marcinx.rybka@intel.com>
  Co-authored-by: Billy O'Mahony <billy.o.mahony@intel.com>
  Signed-off-by: Billy O'Mahony <billy.o.mahony@intel.com>
  Tested-by: Hariprasad Govindharajan <hariprasad.govindharajan@intel.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* dpdk: reflect status and version in the database | Aaron Conole | 2018-05-25 | 1 | -2/+19
  The normal way of retrieving the running DPDK status involves parsing log files and issuing various incantations of ovs-vsctl and ovs-appctl commands to determine whether the rte_eal_init successfully started.

  This commit adds two new records to reflect the dpdk version, and the dpdk initialization status.

  To support this, the other_config:dpdk-init configuration block supports the 'true' and 'try' keywords now, instead of just 'true'.

  Signed-off-by: Aaron Conole <aconole@redhat.com>
  Acked-by: Kevin Traynor <ktraynor@redhat.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* dpdk: allow init to fail | Aaron Conole | 2018-05-25 | 1 | -7/+16
  It's possible for dpdk initialization to fail either due to an internal error or an invalid configuration. When that happens, it's rather impolite to immediately abort without any details.

  With this change, a failed dpdk initialization attempt will continue to trigger a SIGABRT. However, the failure details will be logged, and a user or administrator may have more information to correct the issue.

  A restart of OvS would still be required to re-attempt initialization. The refactor to propagate the init error will be used in an upcoming commit.

  Signed-off-by: Aaron Conole <aconole@redhat.com>
  Acked-by: Kevin Traynor <ktraynor@redhat.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* netdev-dpdk: Limit rate of DPDK logs. | Ilya Maximets | 2018-03-23 | 1 | -4/+6
  DPDK could produce a huge amount of logs. For example, in case of exhausting of a mempool in vhost-user port, following message will be printed on each call to 'rte_vhost_dequeue_burst()':

    |ERR|VHOST_DATA: Failed to allocate memory for mbuf.

  These messages are increasing ovs-vswitchd.log size extremely fast making it unreadable and non-parsable by common Linux utilities like grep, less, etc. Moreover, a continuously growing log could exhaust the HDD space in a few hours, breaking normal operation of the whole system.

  To avoid such issues, the DPDK log rate is limited to 600 messages per minute. This value is high, because we still want to see many big logs like vhost-user configuration sequence. The debug messages are treated separately to avoid loss of errors/warnings in case of intensive debug enabled in DPDK.

  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Acked-by: Aaron Conole <aconole@redhat.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* vswitchd: show DPDK versionMatteo Croce2018-01-261-0/+8
| | | | | | | | | | Show DPDK version if Open vSwitch is compiled with DPDK support. Version can be retrieved with `ovs-vswitchd --version` or from OVS logs. Small change in ovs-ctl to avoid breakage on output change. Signed-off-by: Matteo Croce <mcroce@redhat.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* netdev-dpdk: vHost IOMMU supportMark Kavanagh2017-12-081-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | DPDK v17.11 introduces support for the vHost IOMMU feature. This is a security feature, which restricts the vhost memory that a virtio device may access. This feature also enables the vhost REPLY_ACK protocol, the implementation of which is known to work in newer versions of QEMU (i.e. v2.10.0), but is buggy in older versions (v2.7.0 - v2.9.0, inclusive). As such, the feature is disabled by default and should remain so for the aforementioned older QEMU versions. Starting with QEMU v2.9.1, vhost-iommu-support can safely be enabled, even without having an IOMMU device, with no performance penalty. This patch adds a new global config option, vhost-iommu-support, that controls enablement of the vhost IOMMU feature: ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true This value defaults to false; to enable IOMMU support, this field should be set to true when setting other global parameters on init (such as "dpdk-socket-mem", for example). Changing the value at runtime is not supported, and requires restarting the vswitch daemon. Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* dpdk: Redirect DPDK log to OVS logging subsystem.Ilya Maximets2017-03-091-0/+48
| | | | | | | | | | | This should help to have all the logs in one place. 'ovs-appctl vlog' commands for 'dpdk' module can be used to configure the log level. Lower bound for DPDK logging (--log-level) can still be passed through 'dpdk-extra' field. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
* dpdk: Use VLOG_INFO_ONCE instead of open-coding it.Ben Pfaff2017-03-081-6/+2
| | | | | Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Andy Zhou <azhou@ovn.org>
* dpdk: Fixes memory leak in dpdk_init__().nickcooper-zhangtonghao2017-02-101-3/+3
| | | | | | | | | | | | | | If users configure the 'vhost-sock-dir' for dpdk, the memory allocated by xstrdup(ovs_rundir()) is not freed. This patch allows the process_vhost_flags to xstrdup() for val or default_val according to configuration and the caller must free new_val when it is no longer needed. Fixes: 01961bbdd34a ("dpdk: New module with some code from netdev-dpdk.") CC: Daniele Di Proietto <diproiettod@vmware.com> Signed-off-by: nickcooper-zhangtonghao <nic@opencloud.tech> Reviewed-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
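The ownership rule described above (the helper always returns a fresh copy, and the caller always frees it) can be sketched as follows. This is an illustration under assumptions, not the actual `process_vhost_flags()` code: `pick_config_value()` and `dup_string()` are hypothetical names, and plain `malloc()` stands in for `xstrdup()`.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for xstrdup(); returns NULL for NULL input or on
 * allocation failure. */
static char *
dup_string(const char *s)
{
    size_t len;
    char *copy;

    if (!s) {
        return NULL;
    }
    len = strlen(s) + 1;
    copy = malloc(len);
    if (copy) {
        memcpy(copy, s, len);
    }
    return copy;
}

/* Hypothetical helper modeled on the description above.  '*new_val' always
 * receives a heap copy, whether the configured value or the default was
 * chosen, so the caller can free it unconditionally.  Returns true if the
 * configured value 'val' was used. */
static bool
pick_config_value(const char *val, const char *default_val, char **new_val)
{
    bool from_config = val && val[0] != '\0';

    *new_val = dup_string(from_config ? val : default_val);
    return from_config;
}
```

Because both branches allocate, the caller needs no special case: it calls `free()` on the result once it is done, which avoids leaking the copy of the default value.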
* dpdk: Late initialization.Daniele Di Proietto2017-01-101-10/+21
| | | | | | | | | | | | | | | | | | With this commit, we allow the user to set other_config:dpdk-init=true after the process is started. This makes it easier to start Open vSwitch with DPDK using standard init scripts without restarting the service. This is still far from ideal, because initializing DPDK might still abort the process (e.g. if there is not enough memory), so the user must check the status of the process after setting dpdk-init to true. Nonetheless, I think this is an improvement, because it doesn't require restarting the whole unit. CC: Aaron Conole <aconole@redhat.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Aaron Conole <aconole@redhat.com>
* lib/dpdk: No more deferred releaseAaron Conole2016-12-211-12/+5
| | | | | | | | | DPDK documentation was recently updated to reflect that DPDK does not hold any references to, nor take ownership of, the argv/argc elements. With that understanding, let's just release the memory asap. Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
* lib/dpdk: fix double free on exitAaron Conole2016-12-121-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | The DPDK EAL library intends that all argc/argv arguments passed on the command line will be in the form: progname dpdk arguments program arguments This means the argv array will look something like: argv[0] = progname argv[1..x] = dpdk arguments argv[x..y] = program arguments When the eal initialization routine completes, it will modify the argv array to set argv[ret] = progname, such that the arguments can then be passed to something like getopts for further processing. When the dpdk arguments rework was initially added, the assignment mentioned above was not considered. This means two errors were introduced: 1. Leak of the element at argv[ret] 2. Double-free of the element at argv[0] Reported-by: Ilya Maximets <i.maximets@samsung.com> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2016-November/325442.html Fixes: bab694097133 ("netdev-dpdk: Convert initialization from cmdline to db") Signed-off-by: Aaron Conole <aconole@redhat.com>
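The leak and double free described above can be avoided by freeing through a second array that remembers the original pointers. The sketch below is one possible approach, not necessarily the one taken in the commit: `mock_eal_init()` is a hypothetical stand-in that only mimics the `argv[ret] = argv[0]` assignment, and `init_and_release()` is an illustrative name.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the EAL init routine: pretends to consume
 * 'consumed' arguments and, as described above, overwrites argv[ret]
 * with the program name. */
static int
mock_eal_init(int argc, char **argv, int consumed)
{
    if (consumed < 0 || consumed >= argc) {
        return -1;
    }
    argv[consumed] = argv[0];
    return consumed;
}

/* Builds a heap-allocated argv from 'args', runs the mock init, and frees
 * every string exactly once.  Returns the number of strings freed, or -1
 * on failure. */
static int
init_and_release(const char *const *args, int argc, int consumed)
{
    char **argv = calloc((size_t) argc + 1, sizeof *argv);
    char **owned = calloc((size_t) argc + 1, sizeof *owned);
    int ok = argv && owned;
    int freed = 0;

    for (int i = 0; ok && i < argc; i++) {
        size_t len = strlen(args[i]) + 1;

        owned[i] = malloc(len);     /* 'owned' remembers each allocation. */
        ok = owned[i] != NULL;
        if (ok) {
            memcpy(owned[i], args[i], len);
            argv[i] = owned[i];
        }
    }
    if (ok) {
        ok = mock_eal_init(argc, argv, consumed) >= 0;
    }

    /* Free via 'owned', not 'argv': after init, argv[ret] aliases argv[0],
     * so walking argv would free argv[0] twice and leak the old argv[ret]. */
    for (int i = 0; owned && i < argc; i++) {
        if (owned[i]) {
            free(owned[i]);
            freed++;
        }
    }
    free(owned);
    free(argv);
    return ok ? freed : -1;
}
```

For example, calling `init_and_release()` with five arguments and `consumed` set to 3 releases all five strings once each, even though `argv[3]` no longer points at its original allocation after the init call.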
* dpdk: Fix DPDK pdump compilationCiara Loftus2016-10-131-0/+5
| | | | | | | | | The rte_pdump header file was not included in the file that requires it. Fix this. Fixes: 01961bbdd34a ("dpdk: New module with some code from netdev-dpdk.") Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
* dpdk: New module with some code from netdev-dpdk.Daniele Di Proietto2016-10-121-0/+432
There's a lot of code in netdev-dpdk which is not at all related to the netdev interface, mostly the library initialization code. This commit moves it to a new 'dpdk' module, to simplify 'netdev-dpdk'. Also a new module 'dpdk-stub' is introduced to implement some functions when DPDK is not available. This replaces the old 'netdev-nodpdk' module. Some redundant includes are removed or reorganized as a consequence. No functional change. CC: Aaron Conole <aconole@redhat.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Aaron Conole <aconole@redhat.com> Tested-by: Aaron Conole <aconole@redhat.com>