author | Fangrui Song <i@maskray.me> | 2023-04-20 12:17:26 -0700
---|---|---
committer | Fangrui Song <i@maskray.me> | 2023-04-20 12:17:26 -0700
commit | a8788de1c3f3c8c3a591bd3aae2acee1b43b229a |
tree | 6692d63414e43163495c85ea5252ddc9a2604178 /lld |
parent | 0c7fe5202ce0739357277cb4457ad6a399f71fde |
[ELF] Cap parallel::strategy to 16 threads when --threads= is unspecified
When --threads= is unspecified, the thread count defaults to
`parallel::strategy.compute_thread_count()`, which uses
`sched_getaffinity` (Linux), `cpuset_getaffinity` (FreeBSD), or `std::thread::hardware_concurrency` (other systems).
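As a rough illustration of affinity-based detection, the sketch below counts the CPUs the current process may run on. `detectedConcurrency` is a hypothetical helper, not LLVM's `compute_thread_count()`, and it covers only the Linux and fallback paths described above (the FreeBSD `cpuset_getaffinity` path is omitted).

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // exposes CPU_COUNT and sched_getaffinity on glibc
#endif
#include <cassert>
#include <thread>
#if defined(__linux__)
#include <sched.h>
#endif

// Hypothetical helper: number of CPUs this process is allowed to use.
// On Linux the affinity mask is honored (e.g. restrictions set by taskset
// or a cpuset); elsewhere fall back to std::thread::hardware_concurrency.
unsigned detectedConcurrency() {
#if defined(__linux__)
  cpu_set_t set;
  CPU_ZERO(&set);
  if (sched_getaffinity(0, sizeof(set), &set) == 0) {
    int n = CPU_COUNT(&set);
    if (n > 0)
      return static_cast<unsigned>(n);
  }
#endif
  // hardware_concurrency() may return 0 when the value is not computable.
  unsigned n = std::thread::hardware_concurrency();
  return n > 0 ? n : 1;
}
```

Running under `taskset -c 0-3` on Linux, for example, should make this report at most 4 even on a larger machine, which is why the affinity mask rather than the raw core count is the relevant input.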
After extensive testing on many machines (many configurations from
{aarch64,x86-64} x {Linux,FreeBSD,Windows} x allocators(native,mimalloc,rpmalloc) combinations)
with varying workloads, we found that when the concurrency is larger than
16, linking is slower than with --threads=16 because the parallelism
overhead outweighs the gains. This is particularly harmful on machines with
many cores or when the link job competes with other jobs.
Cap parallel::strategy at 16 threads when --threads= is unspecified.
For some workloads, increasing the concurrency from 8 to 16 yields almost no improvement.
--thinlto-jobs= is unchanged since ThinLTO backend compiles are embarrassingly
parallel.
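The resulting policy can be modeled as a small pure function. This is a sketch only: `ThreadConfig`, `chooseThreads`, and `kDefaultThreadCap` are hypothetical names, not lld's API, and the sketch assumes --threads= has already been validated as a positive integer.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>

// Hypothetical model of the decision made in readConfigs().
struct ThreadConfig {
  unsigned parallelThreads;                // threads for non-LTO parallel algorithms
  std::optional<std::string> thinLTOJobs;  // seeded only by an explicit --threads=
};

constexpr unsigned kDefaultThreadCap = 16;

// threadsFlag: value of --threads= if given; detected: host concurrency (> 0).
ThreadConfig chooseThreads(std::optional<unsigned> threadsFlag, unsigned detected) {
  assert(detected > 0 && "detected concurrency must be positive");
  if (threadsFlag) {
    // An explicit --threads=N is honored as given and also provides
    // the default value for --thinlto-jobs=.
    return {*threadsFlag, std::to_string(*threadsFlag)};
  }
  // Unspecified: cap the non-LTO parallelism at 16; --thinlto-jobs=
  // keeps its own default because ThinLTO backends scale well.
  return {std::min(detected, kDefaultThreadCap), std::nullopt};
}
```

For example, `chooseThreads(std::nullopt, 64)` yields 16 parallel threads, `chooseThreads(std::nullopt, 8)` leaves 8 unchanged, and `chooseThreads(32u, 64)` keeps the explicit 32. The cap only ever lowers the default; it never raises the thread count on small machines.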
Link: https://discourse.llvm.org/t/avoidable-overhead-from-threading-by-default/69160
Reviewed By: peter.smith, andrewng
Differential Revision: https://reviews.llvm.org/D147493
Diffstat (limited to 'lld')
-rw-r--r-- | lld/ELF/Driver.cpp | 7 |
1 files changed, 6 insertions, 1 deletions
```diff
diff --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp
index c540f573aaef..79f16a281df9 100644
--- a/lld/ELF/Driver.cpp
+++ b/lld/ELF/Driver.cpp
@@ -1421,7 +1421,9 @@ static void readConfigs(opt::InputArgList &args) {
   }
 
   // --threads= takes a positive integer and provides the default value for
-  // --thinlto-jobs=.
+  // --thinlto-jobs=. If unspecified, cap the number of threads since
+  // overhead outweighs optimization for used parallel algorithms for the
+  // non-LTO parts.
   if (auto *arg = args.getLastArg(OPT_threads)) {
     StringRef v(arg->getValue());
     unsigned threads = 0;
@@ -1430,6 +1432,9 @@ static void readConfigs(opt::InputArgList &args) {
             arg->getValue() + "'");
     parallel::strategy = hardware_concurrency(threads);
     config->thinLTOJobs = v;
+  } else if (parallel::strategy.compute_thread_count() > 16) {
+    log("set maximum concurrency to 16, specify --threads= to change");
+    parallel::strategy = hardware_concurrency(16);
   }
   if (auto *arg = args.getLastArg(OPT_thinlto_jobs_eq))
     config->thinLTOJobs = arg->getValue();
```