| Commit message (Collapse) | Author | Age | Files | Lines |
| |
ATOMIC_VAR_INIT has a trivial definition
`#define ATOMIC_VAR_INIT(value) (value)`,
is deprecated in C17/C++20, and will be removed in newer standards and in
newer GCC/Clang (e.g. https://reviews.llvm.org/D144196).
Signed-off-by: Fangrui Song <maskray@google.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
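As an illustration of the change described above, here is a minimal C11 sketch of initializing an atomic without the macro. The names `hypothetical_counter` and `bump_counter` are made up for this example and do not come from the OVS tree.

```c
#include <assert.h>
#include <stdatomic.h>

/* Plain initializer. With the trivial definition quoted above,
 * ATOMIC_VAR_INIT(0) would expand to the same expression. */
static atomic_int hypothetical_counter = 0;

/* Atomically increments the counter and returns the new value. */
int
bump_counter(void)
{
    int previous = atomic_fetch_add(&hypothetical_counter, 1);

    assert(previous >= 0);
    return previous + 1;
}
```

For objects that cannot use a static initializer, `atomic_init()` from the same header is the usual alternative.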
| |
vhost related configuration and per port memory are netdev-dpdk
configuration items.
dpdk-stub.c and netdev-dpdk.c are never linked together, so we can move
those bits out of the generic dpdk code.
The dpdk_* accessors for those configuration items are then not needed
anymore and we can simply reference local variables.
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
Mempools may currently be shared between DPDK ports based
on port MTU and NUMA. With some hint from the user we can
increase the sharing on MTU and hence reduce memory
consumption in many cases.
For example, a port with MTU 9000, uses a mempool with an
mbuf size based on 9000 MTU. A port with MTU 1500, uses a
different mempool with an mbuf size based on 1500 MTU.
In this case, assuming same NUMA, both these ports could
share the 9000 MTU mempool.
The user must give a hint because the order of port creation and
MTU setting may vary, and we need to ensure that upgrades
from older OVS versions do not require more memory.
This scheme can also prevent multiple mempools from being created
when a port is added with a default MTU and an appropriate
mempool, but later has its MTU changed to a different value
that requires a different mempool.
Example usage:
$ ovs-vsctl --no-wait set Open_vSwitch . \
other_config:shared-mempool-config=9000,1500:1,6000:1
Port added on NUMA 0:
* MTU 1500, use mempool based on 9000 MTU
* MTU 5000, use mempool based on 9000 MTU
* MTU 9000, use mempool based on 9000 MTU
* MTU 9300, use mempool based on 9300 MTU (existing behaviour)
Port added on NUMA 1:
* MTU 1500, use mempool based on 1500 MTU
* MTU 5000, use mempool based on 6000 MTU
* MTU 9000, use mempool based on 9000 MTU
* MTU 9300, use mempool based on 9300 MTU (existing behaviour)
Default behaviour is unchanged and mempools are still only created
when needed.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
| |
Previously in OVS, a PMD thread running on cpu X used lcore X.
This assumption limited OVS to running PMD threads on physical cpus
numbered below RTE_MAX_LCORE.
DPDK 20.08 introduced a new API that associates a non-EAL thread to a free
lcore. This new API does not change the thread characteristics (like CPU
affinity) and lets OVS run its PMD threads on any cpu regardless of
RTE_MAX_LCORE.
The DPDK multiprocess feature is not compatible with this new API and is
disabled.
DPDK still limits the number of lcores to RTE_MAX_LCORE (128 on x86_64)
which should be enough for OVS pmd threads (hopefully).
The DPDK lcore to OVS pmd thread mapping is logged when attaching an
OVS PMD thread, and when detaching.
A new command is added to show DPDK's view of the DPDK lcores
at any time:
$ ovs-appctl dpdk/lcore-list
lcore 0, socket 0, role RTE, cpuset 0
lcore 1, socket 0, role NON_EAL, cpuset 1
lcore 2, socket 0, role NON_EAL, cpuset 15
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
DPIF AVX512 optimizations currently rely on DPDK availability while
they can be used without DPDK.
Besides, the availability of a given ISA only has to be checked once
and won't change while an OVS process runs.
Resolve isa availability in constructors by using a simplified query
based on cpuid API that comes from the compiler.
Note: this also fixes the check on BMI2 availability: DPDK had a bug
for this isa, see https://git.dpdk.org/dpdk/commit/?id=aae3037ab1e0.
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
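A rough sketch of the approach described above: resolve ISA bits once in a constructor using the compiler-provided `<cpuid.h>`, then serve later queries from a cache. It assumes GCC or Clang; the enum and function names are hypothetical, and the sketch deliberately omits the OS register-state check that a production AVX512 test also needs. On non-x86 builds every feature simply reports false.

```c
#include <assert.h>
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

enum isa_feature {
    ISA_BMI2,
    ISA_AVX512F,
    ISA_AVX512BW,
    ISA_AVX512VBMI,
    ISA_AVX512VPOPCNTDQ,
    ISA_N_FEATURES
};

static bool isa_cache[ISA_N_FEATURES];

/* Runs once before main(); the results never change afterwards. */
static void __attribute__((constructor))
isa_probe_once(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* CPUID leaf 7, subleaf 0 holds the extended feature flags. */
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        isa_cache[ISA_BMI2] = (ebx >> 8) & 1;
        isa_cache[ISA_AVX512F] = (ebx >> 16) & 1;
        isa_cache[ISA_AVX512BW] = (ebx >> 30) & 1;
        isa_cache[ISA_AVX512VBMI] = (ecx >> 1) & 1;
        isa_cache[ISA_AVX512VPOPCNTDQ] = (ecx >> 14) & 1;
    }
#endif
}

/* Cheap lookup: no CPUID instruction is executed here. */
bool
isa_available(enum isa_feature feature)
{
    assert((unsigned int) feature < ISA_N_FEATURES);
    return isa_cache[feature];
}
```

Because the probe runs in a constructor, callers such as subtable lookup selection can query the cache without any locking.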
| |
If anonymous memory mapping is supported by the kernel, it's better
to run OVS entirely in memory rather than creating shared data
structures. OVS doesn't work in multi-process mode, so there is no need
to litter a filesystem.
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1949849
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Rosemarie O'Riorden <roriorde@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
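To make the idea of anonymous memory concrete, the following is a small POSIX sketch of a mapping that has no backing file, so nothing appears in any filesystem. It only illustrates the kernel feature the commit relies on; it is not DPDK or OVS code, and the helper names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Maps 'len' bytes of zero-filled memory that is not backed by a file.
 * Returns NULL on failure. */
void *
alloc_in_memory(size_t len)
{
    void *p;

    assert(len > 0);
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Unmaps a region returned by alloc_in_memory(). Returns 0 on success. */
int
release_in_memory(void *p, size_t len)
{
    return munmap(p, len);
}
```

A private anonymous mapping cannot be attached by another process, which is acceptable here because OVS does not use multi-process mode.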
| |
This change removes the automatic memory limit on start-up of OVS with
DPDK. As DPDK supports dynamic memory allocation, there is no
need to limit the amount of memory available, if not requested.
Currently, if socket-limit is not configured, it is set to the value of
socket-mem. With this change, the user can decide to set it or have no
memory limit.
Removed logs that announce this change and fixed documentation.
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1949850
Signed-off-by: Rosemarie O'Riorden <roriorde@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
This change removes the default values for EAL args socket-mem and
socket-limit. As DPDK supports dynamic memory allocation, there is no
need to allocate a certain amount of memory on start-up, nor limit the
amount of memory available, if not requested.
Currently, socket-mem has a default value of 1024 when it is not
configured by the user, and socket-limit takes on the value of
socket-mem, 1024, by default. With this change, socket-mem is not
configured by default, meaning that socket-limit is not either.
Neither, either or both options can be set.
Removed extra logs that announce this change and fixed documentation.
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1949850
Signed-off-by: Rosemarie O'Riorden <roriorde@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
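The behaviour described above (pass each option only when the user configured it) might look roughly like this. It is a sketch under the assumption that arguments are assembled into one string; the real code builds an argument vector, and `append_if_set` and `build_eal_args` are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Appends " <opt> <val>" to the string in 'buf' only when 'val' is set.
 * Returns the new string length; on truncation the string is unchanged. */
size_t
append_if_set(char *buf, size_t size, size_t len, const char *opt,
              const char *val)
{
    assert(len < size);
    if (val && val[0]) {
        int n = snprintf(buf + len, size - len, " %s %s", opt, val);

        if (n < 0 || (size_t) n >= size - len) {
            buf[len] = '\0';    /* Undo a partial write. */
        } else {
            len += (size_t) n;
        }
    }
    return len;
}

/* Builds the argument string; unset options are simply left out, so
 * DPDK's own defaults apply. */
void
build_eal_args(char *buf, size_t size, const char *socket_mem,
               const char *socket_limit)
{
    size_t len;

    assert(size > strlen("ovs-vswitchd"));
    strcpy(buf, "ovs-vswitchd");
    len = strlen(buf);
    len = append_if_set(buf, size, len, "--socket-mem", socket_mem);
    append_if_set(buf, size, len, "--socket-limit", socket_limit);
}
```

Passing NULL for both values yields just the program name, matching the "neither option set" case.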
| |
Deprecate current OVS provided defaults for DPDK socket-mem and
socket-limit that are planned to be removed in OVS 2.17. At that point
DPDK defaults will be used instead. Warnings have been added to alert
users in advance.
Signed-off-by: Rosemarie O'Riorden <roriorde@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
| |
This commit enables OVS to check at runtime for more detailed
AVX512 capabilities, specifically Byte and Word (BW) extensions,
and Vector Bit Manipulation Instructions (VBMI).
These instructions will be used in the CPU ISA optimized
implementations of traffic profile aware miniflow extract.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
| |
This commit enables the AVX512-VPOPCNTDQ Vector Popcount
instruction. This instruction is not available on every CPU
that supports the AVX512-F Foundation ISA, hence it is enabled
only when the additional VPOPCNTDQ ISA check is passed.
The vector popcount instruction is used instead of the AVX512
popcount emulation code present in the avx512 optimized DPCLS today.
It provides higher performance in the SIMD miniflow processing
as that requires the popcount to calculate the miniflow block indexes.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
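Why popcount matters for block indexes can be shown with a scalar sketch: when values are stored packed, only for the bits set in a map, the position of a given bit's value is the number of set bits below it. This assumes GCC or Clang for `__builtin_popcountll`; `packed_index` is a hypothetical name, and the vectorized code in the commit does the equivalent for several lanes at once.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the position, within a packed value array, of the entry that
 * belongs to 'bit'. 'bit' must be set in 'map'. */
unsigned int
packed_index(uint64_t map, unsigned int bit)
{
    uint64_t below;

    assert(bit < 64);
    assert(map & (UINT64_C(1) << bit));

    /* Keep only the bits strictly below 'bit', then count them. */
    below = map & ((UINT64_C(1) << bit) - 1);
    return (unsigned int) __builtin_popcountll(below);
}
```

For a map of 0x16 (bits 1, 2 and 4 set), bit 4 has two set bits below it, so its value sits at index 2.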
| |
As a small optimization, this patch caches the result of a CPU ISA
check from DPDK. Particularly in the case of running the DPCLS
autovalidator (which repeatedly probes subtables) this reduces
the number of CPU ISA lookups from the DPDK level.
By caching them at the OVS/dpdk.c level, the ISA checks remain
runtime for the CPU where they are executed, but subsequent checks
for the same ISA feature become much cheaper.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Co-authored-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
| |
This change removes the assumption that NUMA nodes and cores are numbered
contiguously in Linux. This change is required to support some Power
systems.
A check has been added to verify that cores are online, because
offline cores result in non-contiguously numbered cores.
DPDK EAL option generation is updated to work with non-contiguous NUMA nodes.
These options can be seen in ovs-vswitchd.log. For example,
a system containing only NUMA nodes 0 and 8 will generate the following:
EAL ARGS: ovs-vswitchd --socket-mem 1024,0,0,0,0,0,0,0,1024 \
--socket-limit 1024,0,0,0,0,0,0,0,1024 -l 0
Tests for pmd and dpif-netdev have been updated to validate
non-contiguously numbered nodes.
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
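The per-node list in the example above can be generated along these lines. This is a sketch that assumes a sorted array of present node ids; `format_socket_mem` is a hypothetical helper, not the function changed by the commit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes one value per node id from 0 up to the highest present node.
 * Present nodes get 'mb_per_node', gaps get 0. 'nodes' must be sorted in
 * ascending order. Returns 0 on success, -1 if 'buf' is too small. */
int
format_socket_mem(char *buf, size_t size, const int *nodes, size_t n_nodes,
                  int mb_per_node)
{
    size_t len = 0;
    size_t next = 0;
    int max_node;

    assert(n_nodes > 0 && size > 0);
    max_node = nodes[n_nodes - 1];

    for (int id = 0; id <= max_node; id++) {
        int present = next < n_nodes && nodes[next] == id;
        int n = snprintf(buf + len, size - len, "%s%d",
                         id ? "," : "", present ? mb_per_node : 0);

        if (n < 0 || (size_t) n >= size - len) {
            return -1;
        }
        len += (size_t) n;
        if (present) {
            next++;
        }
    }
    return 0;
}
```

Called with nodes {0, 8} and 1024, it produces the same string as the EAL ARGS line shown above.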
| |
A new appctl command, 'dpdk/get-malloc-stats', is implemented to get the
output of the 'rte_malloc_dump_stats()' function.
It can be used for debugging.
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Salem Sol <salems@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
This commit adds support for DPDK v20.11. It includes the following
changes:
1. travis: Remove explicit DPDK kmods configuration.
2. sparse: Fix build with 20.05 DPDK tracepoints.
3. netdev-dpdk: Remove experimental API flag.
http://patchwork.ozlabs.org/project/openvswitch/list/?series=173216&state=*
4. sparse: Update to DPDK 20.05 trace point header.
http://patchwork.ozlabs.org/project/openvswitch/list/?series=179604&state=*
5. sparse: Fix build with DPDK 20.08.
http://patchwork.ozlabs.org/project/openvswitch/list/?series=200181&state=*
6. build: Add support for DPDK meson build.
http://patchwork.ozlabs.org/project/openvswitch/list/?series=199138&state=*
7. netdev-dpdk: Remove usage of RTE_ETH_DEV_CLOSE_REMOVE flag.
http://patchwork.ozlabs.org/project/openvswitch/list/?series=207850&state=*
8. netdev-dpdk: Fix build with 20.11-rc1.
http://patchwork.ozlabs.org/project/openvswitch/list/?series=209006&state=*
9. sparse: Fix __ATOMIC_* redefinition errors
http://patchwork.ozlabs.org/project/openvswitch/list/?series=209452&state=*
10. build: Remove DPDK make build references.
http://patchwork.ozlabs.org/project/openvswitch/list/?series=216682&state=*
For credit all authors of the original commits to 'dpdk-latest' with the
above changes have been added as co-authors for this commit.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Co-authored-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Sunil Pai G <sunil.pai.g@intel.com>
Co-authored-by: Sunil Pai G <sunil.pai.g@intel.com>
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Co-authored-by: Eli Britstein <elibr@nvidia.com>
Tested-by: Harry van Haaren <harry.van.haaren@intel.com>
Tested-by: Govindharajan, Hariprasad <hariprasad.govindharajan@intel.com>
Tested-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
| |
When enabling debug logs in DPDK, it can be hard to be sure of what is
actually enabled. Add commands to list and change those log levels.
However, these commands do not help when tracking issues in DPDK init
itself, so also dump log levels right after init.
Example:
$ ovs-appctl dpdk/log-list
global log level is debug
id 0: lib.eal, level is info
id 1: lib.malloc, level is info
id 2: lib.ring, level is info
id 3: lib.mempool, level is info
id 4: lib.timer, level is info
id 5: pmd, level is info
[...]
id 37: pmd.net.bnxt.driver, level is notice
id 38: pmd.net.e1000.init, level is notice
id 39: pmd.net.e1000.driver, level is notice
id 40: pmd.net.enic, level is info
[...]
$ ovs-appctl dpdk/log-set debug pmd.*:notice
$ ovs-appctl dpdk/log-list
global log level is debug
id 0: lib.eal, level is debug
id 1: lib.malloc, level is debug
id 2: lib.ring, level is debug
id 3: lib.mempool, level is debug
id 4: lib.timer, level is debug
id 5: pmd, level is debug
[...]
id 37: pmd.net.bnxt.driver, level is notice
id 38: pmd.net.e1000.init, level is notice
id 39: pmd.net.e1000.driver, level is notice
id 40: pmd.net.enic, level is notice
[...]
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
This commit implements a method to retrieve the CPU ISA capabilities.
These ISA capabilities can be used in OVS to select a function
implementation at runtime that makes the best use of the ISA available on
the CPU.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
| |
DPDK pdump was deprecated in the 2.13 release and has not actually
worked since 2.11. Remove it.
More details in commit 4ae8c4617fd3 ("dpdk: Deprecate pdump support.")
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
| |
This commit adds support for DPDK v19.11. It includes the following
changes:
1. travis: Enable compilation and linkage with dpdk 19.11.
2. sparse: Remove dpdk network headers copies.
https://patchwork.ozlabs.org/patch/1185256/
3. dpdk: Migrate to new PDUMP API.
https://patchwork.ozlabs.org/patch/1192971/
4. netdev-dpdk: Prefix network structures with rte_.
https://patchwork.ozlabs.org/patch/1109733/
5. netdev-dpdk: Update by new color definitions.
https://patchwork.ozlabs.org/patch/1086089/
6. docs: Update docs to reference 19.11.
7. docs: Add note regarding hotplug and igb_uio requirements.
For credit, all authors of the original commits to 'dpdk-latest' with the
above changes have been added as co-authors for this commit.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Co-authored-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Co-authored-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Co-authored-by: Ophir Munk <ophirmu@mellanox.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
| |
The conventional way for packet dumping in OVS is to use ovs-tcpdump
that works via traffic mirroring. DPDK pdump could probably be used
for some lower level debugging, but it is not commonly used for
various reasons.
There are lots of limitations for using this functionality in practice.
Most of them are connected with running a secondary pdump process and
memory layout issues, like the requirement to disable ASLR in the kernel.
More details are available in the DPDK guide:
https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html#multi-process-limitations
Besides the functional limitations, it's also hard to use this
functionality correctly. The user must be sure that OVS and the pdump
utility are running on different CPU cores, which is hard because non-PMD
threads could float over available CPU cores. This or any other
misconfiguration will likely lead to a crash of the pdump utility
and/or OVS.
Another problem is that the user must actually have this special pdump
utility in the system, and it might not be available in distributions.
This change disables pdump support by default, introducing a special
configuration option '--enable-dpdk-pdump'. Deprecation warnings will
be shown to users at configuration time and at runtime.
The plan is to completely remove this functionality from OVS in one
of the next releases.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
| |
No need to duplicate and null-terminate the passed buffer.
We can directly give it to the vlog subsystem using a dynamic precision
in the format string.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
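The "dynamic precision" technique mentioned above is standard C: a `%.*s` conversion takes the maximum number of bytes to print from an int argument, so the buffer does not need a terminating NUL. A minimal sketch, with `format_unterminated` as a hypothetical wrapper standing in for the vlog call:

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Formats exactly 'len' bytes of 'buf' (which need not be
 * NUL-terminated) into 'out'. Returns what snprintf() returns. */
int
format_unterminated(char *out, size_t out_size, const char *buf, size_t len)
{
    assert(len <= INT_MAX);   /* The precision argument must be an int. */
    return snprintf(out, out_size, "%.*s", (int) len, buf);
}
```

This avoids the allocate, copy and terminate sequence that the commit removes.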
| |
This reduces code duplication and avoids using Linux-specific
functions (this might be useful in the future if we try to allow
running OvS+DPDK on FreeBSD).
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: William Tu <u9012063@gmail.com>
| |
Unlike 'rte_get_tsc_cycles()' which doesn't need any specific
initialization, 'rte_get_tsc_hz()' can be used only after a successful
call to 'rte_eal_init()'. 'rte_eal_init()' estimates the TSC frequency
for later use by 'rte_get_tsc_hz()'. Strictly speaking, we're not allowed
to use 'rte_get_tsc_cycles()' before initializing DPDK either, but it
works this way for now and provides correct results.
This patch provides TSC frequency estimation code that will be used
in two cases:
* DPDK is not compiled in, i.e. DPDK_NETDEV not defined.
* DPDK compiled in but not initialized,
i.e. other_config:dpdk-init=false
This change is mostly useful for AF_XDP netdev support, i.e. it allows
using the dpif-netdev/pmd-perf-show command and various PMD perf metrics.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: William Tu <u9012063@gmail.com>
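One common way to estimate a counter frequency, sketched here as an assumption rather than as the code in this patch, is to read the cycle counter before and after a wall-clock interval of known length and divide. To stay portable the sketch takes the readings as parameters instead of reading the TSC itself; `estimate_counter_hz` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Estimates the counter frequency in Hz from two counter readings taken
 * 'elapsed_ns' nanoseconds apart. */
uint64_t
estimate_counter_hz(uint64_t start_cycles, uint64_t end_cycles,
                    uint64_t elapsed_ns)
{
    uint64_t cycles;

    assert(elapsed_ns > 0);
    /* Unsigned subtraction stays correct across one counter wrap. */
    cycles = end_cycles - start_cycles;

    /* Use floating point: cycles * 1e9 could overflow 64-bit integers. */
    return (uint64_t) ((double) cycles * 1e9 / (double) elapsed_ns);
}
```

A caller would typically take the first reading, sleep for a fixed interval measured with a monotonic clock, then take the second reading.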
| |
Flow API providers renamed to be consistent with parent module
'netdev-offload' and look more like each other.
'_rte_' replaced with more convenient '_dpdk_'.
We'll have following structure:
Common code:
lib/netdev-offload-provider.h
lib/netdev-offload.c
lib/netdev-offload.h
Providers:
lib/netdev-offload-tc.c
lib/netdev-offload-dpdk.c
'netdev-offload-dummy' still resides inside netdev-dummy, but it
makes little sense to move it out of there.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Roi Dayan <roid@mellanox.com>
| |
New module 'netdev-offload' created to manage different flow API
implementations. All the generic and provider independent code moved
there from the 'netdev' module.
Flow API providers further encapsulated.
The only function that was changed is 'netdev_any_oor'.
Now it uses offloading related hmap instead of common 'netdev_shash'.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Roi Dayan <roid@mellanox.com>
| |
Current issues with Flow API:
* OVS calls offloading functions regardless of successful
flow API initialization. (ex. on init_flow_api failure)
* Static initialization of Flow API for a netdev_class forbids
having different offloading types for different instances
of netdev with the same netdev_class. (ex. different vports in
'system' and 'netdev' datapaths at the same time)
Solution:
* Move Flow API from the netdev_class to netdev instance.
* Make Flow API dynamic, i.e. probe the APIs and choose the
suitable one.
Side effects:
* Flow API providers are localized as much as possible in their modules.
* Now we have an ability to make runtime checks. For example,
we could check if particular device supports features we
need, like if dpdk device supports RSS+MARK action.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Roi Dayan <roid@mellanox.com>
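The "probe and choose" idea from the solution above can be sketched as a per-instance pointer that is filled in by trying each registered provider in turn. All names here (`struct device`, `struct flow_api`, the two init functions, `assign_flow_api`) are hypothetical and greatly simplified compared to the real OVS structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct device;

struct flow_api {
    const char *name;
    int (*init)(struct device *);    /* Returns 0 if usable on 'dev'. */
};

struct device {
    const char *type;
    const struct flow_api *flow_api; /* Per instance, NULL if none. */
};

static int
init_tc(struct device *dev)
{
    return strcmp(dev->type, "system") ? -1 : 0;
}

static int
init_dpdk(struct device *dev)
{
    return strcmp(dev->type, "dpdk") ? -1 : 0;
}

static const struct flow_api providers[] = {
    { "tc", init_tc },
    { "dpdk", init_dpdk },
};

/* Probes every provider and binds the first one whose init succeeds.
 * Returns 0 on success, -1 if no provider accepts the device. */
int
assign_flow_api(struct device *dev)
{
    assert(dev && dev->type);
    for (size_t i = 0; i < sizeof providers / sizeof providers[0]; i++) {
        if (!providers[i].init(dev)) {
            dev->flow_api = &providers[i];
            return 0;
        }
    }
    dev->flow_api = NULL;
    return -1;
}
```

Because the pointer stays NULL after a failed probe, callers can skip offloading functions entirely for that instance, which addresses the first issue listed above.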
| |
Post-copy Live Migration for vHost has been supported since DPDK 18.11
and QEMU 2.12. A new global config option, 'vhost-postcopy-support', is
added to control this feature. Ex.:
ovs-vsctl set Open_vSwitch . other_config:vhost-postcopy-support=true
Changing this value requires restarting the daemon. It's safe to
enable this knob even if QEMU doesn't support post-copy LM.
Feature marked as experimental and disabled by default because it may
cause PMD thread hang on destination host on page fault for the time
of page downloading from the source.
Feature is not compatible with 'mlockall' and 'dequeue zero-copy'.
Support added only for vhost-user-client.
Signed-off-by: Liliia Butorina <l.butorina@partner.samsung.com>
Co-authored-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
| |
Information about memzones reserved on init is not very useful.
Anyway, we need to log it in a more civilized manner, i.e. through
the OVS logging subsystem.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
| |
Before supporting the DPDK initialization status in DB 'dpdk-init' was
just a boolean and 'smap_get_bool', which is case-insensitive, was used
to get the value.
Current code uses simple 'strcmp' that fails to recognize values like
"True". As a result this breaks different OVS configuration tools.
For example, kolla-ansible uses 'other_config:dpdk-init=True' but OVS
is not able to recognize it leading to broken installations.
'strcasecmp' should be used instead to fix the issue.
CC: Aaron Conole <aconole@redhat.com>
Fixes: 3e52fa5644cd ("dpdk: reflect status and version in the database")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
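A sketch of the fix described above, using POSIX `strcasecmp()` so that "True", "TRUE" and "true" are all accepted. The enum and `parse_dpdk_init` are hypothetical, and treating 'try' case-insensitively as well is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>   /* strcasecmp() */

enum dpdk_init_mode {
    DPDK_INIT_OFF,
    DPDK_INIT_TRUE,
    DPDK_INIT_TRY
};

/* Parses an other_config:dpdk-init style value without regard to case.
 * Missing or unrecognized values leave DPDK disabled. */
enum dpdk_init_mode
parse_dpdk_init(const char *value)
{
    if (!value) {
        return DPDK_INIT_OFF;
    }
    if (!strcasecmp(value, "true")) {
        return DPDK_INIT_TRUE;
    }
    if (!strcasecmp(value, "try")) {
        return DPDK_INIT_TRY;
    }
    return DPDK_INIT_OFF;
}
```

With a plain `strcmp()` the "True" value written by some configuration tools would fall through to the disabled case.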
| |
Since 18.05 release, DPDK moved to dynamic memory model in which
hugepages could be allocated on demand. At the same time '--socket-mem'
option was re-defined as a size of pre-allocated memory, i.e. memory
that should be allocated at startup and could not be freed.
So, DPDK with a new memory model could allocate more hugepage memory
than specified in '--socket-mem' or '-m' options.
This change adds a new configurable, 'other_config:dpdk-socket-limit',
which can be used to limit the amount of memory DPDK can use.
It uses the new DPDK option '--socket-limit'.
Ex.:
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-limit="1024,1024"
Also, in order to preserve old behaviour, if '--socket-limit' is not
specified, it will be defaulted to the amount of memory specified by
'--socket-mem' option, i.e. OVS will not be able to allocate more.
This is needed, for example, to disallow OVS to allocate more memory
than reserved for it by Nova in OpenStack installations.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
| |
No need to implement dynamic vector to store arguments.
'svec' perfectly covers all the needed functionality.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
| |
No need to allocate memory and use 'strcat' directly.
'dynamic-string' can do this for us.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
| |
This commit re-introduces the concept of shared mempools as the default
memory model for DPDK devices. Per port mempools are still available but
must be enabled explicitly by a user.
OVS previously used a shared mempool model for ports with the same MTU
and socket configuration. This was replaced by a per port mempool model
to address issues flagged by users such as:
https://mail.openvswitch.org/pipermail/ovs-discuss/2016-September/042560.html
However the per port model potentially requires an increase in memory
resource requirements to support the same number of ports and configuration
as the shared port model.
This is considered a blocking factor for current deployments of OVS
when upgrading to future OVS releases as a user may have to redimension
memory for the same deployment configuration. This may not be possible for
users.
This commit resolves the issue by re-introducing shared mempools as
the default memory behaviour in OVS DPDK but also refactors the memory
configuration code to allow for per port mempools.
This patch adds a new global config option, per-port-memory, that
controls the enablement of per port mempools for DPDK devices.
ovs-vsctl set Open_vSwitch . other_config:per-port-memory=true
This value defaults to false; to enable per port memory support,
this field should be set to true when setting other global parameters
on init (such as "dpdk-socket-mem", for example). Changing the value at
runtime is not supported, and requires restarting the vswitch
daemon.
The mempool sweep functionality is also replaced with the
sweep functionality from OVS 2.9 found in commits
c77f692 (netdev-dpdk: Free mempool only when no in-use mbufs.)
a7fb0a4 (netdev-dpdk: Add mempool reuse/free debug.)
A new document to discuss the specifics of the memory models and example
memory requirement calculations is also added.
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Tiago Lam <tiago.lam@intel.com>
Tested-by: Tiago Lam <tiago.lam@intel.com>
| |
When "dpdk-socket-mem" and "dpdk-alloc-mem" are not specified,
"dpdk-socket-mem" will be set to allocate 1024MB on each NUMA node.
This change will prevent OVS from failing when NIC is attached on
NUMA node 1 and higher. Patch contains documentation update.
Signed-off-by: Marcin Rybka <marcinx.rybka@intel.com>
Co-authored-by: Billy O'Mahony <billy.o.mahony@intel.com>
Signed-off-by: Billy O'Mahony <billy.o.mahony@intel.com>
Tested-by: Hariprasad Govindharajan <hariprasad.govindharajan@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
| |
The normal way of retrieving the running DPDK status involves parsing
log files and issuing various incantations of ovs-vsctl and ovs-appctl
commands to determine whether the rte_eal_init successfully started.
This commit adds two new records to reflect the dpdk version, and
the dpdk initialization status.
To support this, the other_config:dpdk-init configuration block supports
the 'true' and 'try' keywords now, instead of just 'true'.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
| |
It's possible for dpdk initialization to fail either due to an internal
error or an invalid configuration. When that happens, it's rather
impolite to immediately abort without any details.
With this change, a failed dpdk initialization attempt will continue to
trigger a SIGABRT. However, the failure details will be logged, and a
user or administrator may have more information to correct the issue.
A restart of OvS would still be required to re-attempt initialization.
The refactor to propagate the init error will be used in an upcoming
commit.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
| |
DPDK can produce a huge amount of logs. For example, when a mempool
is exhausted in a vhost-user port, the following message will be
printed on each call to 'rte_vhost_dequeue_burst()':
|ERR|VHOST_DATA: Failed to allocate memory for mbuf.
These messages increase the ovs-vswitchd.log size extremely fast,
making it unreadable and non-parsable by common Linux utilities like
grep, less, etc. Moreover, a continuously growing log could exhaust the
HDD space in a few hours, breaking normal operation of the whole system.
To avoid such issues, DPDK logs are rate limited to 600 messages per minute.
This value is high because we still want to see many big logs, like
the vhost-user configuration sequence. The debug messages are treated
separately to avoid loss of errors/warnings when intensive debug
logging is enabled in DPDK.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
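A limit of 600 messages per minute can be illustrated with a small token bucket. This is a generic sketch, not the OVS vlog rate limiter: the struct and function names are hypothetical, and time is passed in explicitly so the behaviour is easy to check.

```c
#include <assert.h>
#include <stdbool.h>

struct rate_limiter {
    unsigned int per_minute;   /* Refill rate and bucket capacity. */
    double tokens;             /* Messages that may be logged right now. */
    long long last_ms;         /* Time of the previous update. */
};

/* Starts with a full bucket so that an initial burst gets through. */
void
rl_init(struct rate_limiter *rl, unsigned int per_minute, long long now_ms)
{
    rl->per_minute = per_minute;
    rl->tokens = per_minute;
    rl->last_ms = now_ms;
}

/* Returns true if a message may be logged at time 'now_ms'. */
bool
rl_allow(struct rate_limiter *rl, long long now_ms)
{
    assert(now_ms >= rl->last_ms);

    /* Refill in proportion to the elapsed time, capped at capacity. */
    rl->tokens += (double) (now_ms - rl->last_ms) * rl->per_minute / 60000.0;
    if (rl->tokens > rl->per_minute) {
        rl->tokens = rl->per_minute;
    }
    rl->last_ms = now_ms;

    if (rl->tokens >= 1.0) {
        rl->tokens -= 1.0;
        return true;
    }
    return false;
}
```

Keeping one limiter for debug messages and another for everything else would give the separation described above, so a flood of debug output cannot consume the budget for errors and warnings.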
| |
Show DPDK version if Open vSwitch is compiled with DPDK support.
Version can be retrieved with `ovs-vswitchd --version` or from OVS logs.
Small change in ovs-ctl to avoid breakage on output change.
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
| |
DPDK v17.11 introduces support for the vHost IOMMU feature.
This is a security feature, which restricts the vhost memory
that a virtio device may access.
This feature also enables the vhost REPLY_ACK protocol, the
implementation of which is known to work in newer versions of
QEMU (i.e. v2.10.0), but is buggy in older versions (v2.7.0 -
v2.9.0, inclusive). As such, the feature is disabled by default
(and should remain so) for the aforementioned older QEMU
versions. Starting with QEMU v2.9.1, vhost-iommu-support can
safely be enabled, even without having an IOMMU device, with
no performance penalty.
This patch adds a new global config option, vhost-iommu-support,
that controls enablement of the vhost IOMMU feature:
ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true
This value defaults to false; to enable IOMMU support, this field
should be set to true when setting other global parameters on init
(such as "dpdk-socket-mem", for example). Changing the value at
runtime is not supported, and requires restarting the vswitch daemon.
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
| |
This should be helpful for having all the logs in one place.
'ovs-appctl vlog' commands for 'dpdk' module can be used
to configure the log level. Lower bound for DPDK logging
(--log-level) still can be passed through 'dpdk-extra' field.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
| |
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
| |
If users configure the 'vhost-sock-dir' for dpdk, the memory
allocated by xstrdup(ovs_rundir()) is not freed. This patch
allows the process_vhost_flags to xstrdup() for val or
default_val according to configuration and the caller must
free new_val when it is no longer needed.
Fixes: 01961bbdd34a ("dpdk: New module with some code from netdev-dpdk.")
CC: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: nickcooper-zhangtonghao <nic@opencloud.tech>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
| |
With this commit, we allow the user to set other_config:dpdk-init=true
after the process is started. This makes it easier to start Open
vSwitch with DPDK using standard init scripts without restarting the
service.
This is still far from ideal, because initializing DPDK might still
abort the process (e.g. if there is not enough memory), so the user must
check the status of the process after setting dpdk-init to true.
Nonetheless, I think this is an improvement, because it doesn't require
restarting the whole unit.
CC: Aaron Conole <aconole@redhat.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Aaron Conole <aconole@redhat.com>
| |
DPDK documentation was recently updated to reflect that DPDK does not
hold any references to, nor take ownership of, the argv/argc elements.
With that understanding, let's just release the memory asap.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
| |
The DPDK EAL library intends that all argc/argv arguments passed on the
command line will be in the form:
progname dpdk arguments program arguments
This means the argv array will look something like:
argv[0] = progname
argv[1..x] = dpdk arguments
argv[x..y] = program arguments
When the eal initialization routine completes, it will modify the argv array
to set argv[ret] = progname, such that the arguments can then be passed to
something like getopts for further processing.
When the dpdk arguments rework was initially added, the assignment mentioned
above was not considered. This means two errors were introduced:
1. Leak of the element at argv[ret]
2. Double-free of the element at argv[0]
Reported-by: Ilya Maximets <i.maximets@samsung.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2016-November/325442.html
Fixes: bab694097133 ("netdev-dpdk: Convert initialization from cmdline to db")
Signed-off-by: Aaron Conole <aconole@redhat.com>
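The two errors listed above follow from freeing through an array that the init routine has modified. One way to avoid both, sketched here with a stand-in for the EAL init call, is to keep a second array of the originally allocated pointers and free through that. `fake_eal_init`, `dup_string` and `run_init_and_release` are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the init routine: consumes 'consumed' arguments and, as
 * described above, overwrites argv[consumed] with argv[0]. */
static int
fake_eal_init(int argc, char **argv, int consumed)
{
    assert(consumed > 0 && consumed < argc);
    argv[consumed] = argv[0];
    return consumed;
}

static char *
dup_string(const char *s)
{
    size_t size = strlen(s) + 1;
    char *copy = malloc(size);

    assert(copy);
    memcpy(copy, s, size);
    return copy;
}

/* Builds argv, runs init, then frees every string exactly once by going
 * through 'owned' rather than the modified 'argv'. Returns the number of
 * strings freed. */
int
run_init_and_release(const char *const *args, int argc, int consumed)
{
    char **argv, **owned;
    int freed = 0;

    assert(argc > 0);
    argv = malloc((size_t) argc * sizeof *argv);
    owned = malloc((size_t) argc * sizeof *owned);
    assert(argv && owned);

    for (int i = 0; i < argc; i++) {
        argv[i] = owned[i] = dup_string(args[i]);
    }

    fake_eal_init(argc, argv, consumed);

    /* argv[consumed] now aliases argv[0]; freeing via argv would leak
     * one string and free another twice. */
    for (int i = 0; i < argc; i++) {
        free(owned[i]);
        freed++;
    }
    free(argv);
    free(owned);
    return freed;
}
```

The extra array costs one pointer per argument and keeps ownership independent of whatever the init routine does to argv.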
| |
The rte_pdump header file was not included in the file that requires it.
Fix this.
Fixes: 01961bbdd34a ("dpdk: New module with some code from netdev-dpdk.")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
|
There's a lot of code in netdev-dpdk which is not at all related to the
netdev interface, mostly the library initialization code.
This commit moves it to a new 'dpdk' module, to simplify 'netdev-dpdk'.
Also a new module 'dpdk-stub' is introduced to implement some functions
when DPDK is not available. This replaces the old 'netdev-nodpdk'
module.
Some redundant includes are removed or reorganized as a consequence.
No functional change.
CC: Aaron Conole <aconole@redhat.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Tested-by: Aaron Conole <aconole@redhat.com>
|