author | Fangrui Song <i@maskray.me> | 2023-04-12 13:13:38 -0700 |
---|---|---|
committer | Fangrui Song <i@maskray.me> | 2023-04-12 13:13:38 -0700 |
commit | da68d2164efcc1f5e57f090e2ae2219056b120a0 (patch) | |
tree | 3c7f265a0830b7b3e765cc6c0461d1297da4dc13 /lld/ELF | |
parent | 9bc5e8c87e9be8db2d65e71f90ba0ceea4b814a4 (diff) | |
[ELF] Cap parallel::strategy to 16 threads when --threads= is unspecified
When --threads= is unspecified, we set it to
`parallel::strategy.compute_thread_count()`, which uses
sched_getaffinity (Linux)/cpuset_getaffinity (FreeBSD)/std::thread::hardware_concurrency (others).
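
As a rough illustration of the default described above, the following is a minimal sketch, not LLVM's implementation. It asks the Linux affinity mask how many CPUs the process may run on and falls back to std::thread::hardware_concurrency elsewhere. The function name `availableThreadCount` is hypothetical, and the FreeBSD cpuset_getaffinity path is omitted.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE // needed for CPU_ZERO/CPU_COUNT on some C libraries
#endif
#include <cassert>
#include <thread>
#if defined(__linux__)
#include <sched.h>
#endif

// Hypothetical helper: number of hardware threads this process may use.
unsigned availableThreadCount() {
  unsigned count = 0;
#if defined(__linux__)
  // Respect the affinity mask (e.g. set by taskset or a container runtime).
  cpu_set_t set;
  CPU_ZERO(&set);
  if (sched_getaffinity(0, sizeof(set), &set) == 0) {
    int n = CPU_COUNT(&set);
    if (n > 0)
      count = static_cast<unsigned>(n);
  }
#endif
  // Fallback for other platforms, or if the affinity query failed.
  if (count == 0)
    count = std::thread::hardware_concurrency();
  // hardware_concurrency() may return 0 when the value is unknown.
  if (count == 0)
    count = 1;
  assert(count >= 1);
  return count;
}
```

Calling `availableThreadCount()` under `taskset -c 0-3` on Linux would be expected to report the restricted CPU set rather than the machine's full core count, which is why an affinity-based query is preferred there.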
Extensive testing on many machines (configurations drawn from
{aarch64,x86-64} x {Linux,FreeBSD,Windows} x allocators(native,mimalloc,rpmalloc))
with varying workloads showed that when the concurrency is larger than
16, linking is slower than with --threads=16 because parallelism
overhead outweighs the gains. This is particularly harmful on machines with
many cores or when the link job competes with other jobs.
Cap parallel::strategy when --threads= is unspecified.
For some workloads, raising the concurrency from 8 to 16 yields almost no improvement.
--thinlto-jobs= is unchanged since ThinLTO backend compiles are embarrassingly
parallel.
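
The selection rule described in this message can be modeled as a small standalone function. This is a simplified sketch rather than lld's code: `chooseThreadCount` and `kDefaultThreadCap` are hypothetical names, `requested` stands in for an explicit --threads=N, and `available` stands in for the value reported by `parallel::strategy.compute_thread_count()`.

```cpp
#include <cassert>
#include <optional>

// Hypothetical constant for this sketch; the patch hard-codes 16 inline.
constexpr unsigned kDefaultThreadCap = 16;

// Hypothetical model of the rule: an explicit request is honored as given,
// otherwise the detected concurrency is capped at kDefaultThreadCap.
unsigned chooseThreadCount(std::optional<unsigned> requested,
                           unsigned available) {
  assert(available > 0 && "detected concurrency must be positive");
  if (requested) {
    assert(*requested > 0 && "--threads= expects a positive integer");
    return *requested; // the cap applies only when nothing was requested
  }
  return available > kDefaultThreadCap ? kDefaultThreadCap : available;
}
```

Under this model, a 64-core machine with no --threads= option yields 16, an 8-core machine yields 8, and an explicit request for 32 threads is passed through unchanged. --thinlto-jobs= is outside the model, matching the statement above that it is left unchanged.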
Link: https://discourse.llvm.org/t/avoidable-overhead-from-threading-by-default/69160
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D147493
Diffstat (limited to 'lld/ELF')
-rw-r--r-- | lld/ELF/Driver.cpp | 10 |
1 files changed, 8 insertions, 2 deletions
diff --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp
index 8cc2d2005e39..5099f1cbe96a 100644
--- a/lld/ELF/Driver.cpp
+++ b/lld/ELF/Driver.cpp
@@ -1416,8 +1416,12 @@ static void readConfigs(opt::InputArgList &args) {
     config->mllvmOpts.emplace_back(arg->getValue());
   }
 
+  config->threadCount = parallel::strategy.compute_thread_count();
+
   // --threads= takes a positive integer and provides the default value for
-  // --thinlto-jobs=.
+  // --thinlto-jobs=. If unspecified, cap the number of threads since
+  // overhead outweighs optimization for used parallel algorithms for the
+  // non-LTO parts.
   if (auto *arg = args.getLastArg(OPT_threads)) {
     StringRef v(arg->getValue());
     unsigned threads = 0;
@@ -1426,10 +1430,12 @@ static void readConfigs(opt::InputArgList &args) {
             arg->getValue() + "'");
     parallel::strategy = hardware_concurrency(threads);
     config->thinLTOJobs = v;
+  } else if (config->threadCount > 16) {
+    log("set maximum concurrency to 16, specify --threads= to change");
+    parallel::strategy = hardware_concurrency(16);
   }
   if (auto *arg = args.getLastArg(OPT_thinlto_jobs_eq))
     config->thinLTOJobs = arg->getValue();
-  config->threadCount = parallel::strategy.compute_thread_count();
 
   if (config->ltoPartitions == 0)
     error("--lto-partitions: number of threads must be > 0");