[ELF] Cap parallel::strategy to 16 threads when --threads= is unspecified

When --threads= is unspecified, we set it to `parallel::strategy.compute_thread_count()`, which uses sched_getaffinity (Linux), cpuset_getaffinity (FreeBSD), or std::thread::hardware_concurrency (others).

Extensive testing with varying workloads on many machines (combinations of {aarch64,x86-64} x {Linux,FreeBSD,Windows} x allocators {native,mimalloc,rpmalloc}) showed that when the concurrency is larger than 16, linking is slower than with --threads=16 because parallelism overhead outweighs the gains. This is particularly harmful on machines with many cores or when the link job competes with other jobs. For some workloads, raising the concurrency from 8 to 16 yields nearly no improvement.

Cap parallel::strategy to 16 threads when --threads= is unspecified. --thinlto-jobs= is unchanged since ThinLTO backend compiles are embarrassingly parallel.

Link: https://discourse.llvm.org/t/avoidable-overhead-from-threading-by-default/69160

Reviewed By: peter.smith, andrewng

Differential Revision: https://reviews.llvm.org/D147493
author: Fangrui Song <i@maskray.me> 2023-04-20 12:17:26 -0700
committer: Fangrui Song <i@maskray.me> 2023-04-20 12:17:26 -0700
commit: a8788de1c3f3c8c3a591bd3aae2acee1b43b229a (patch)
tree: 6692d63414e43163495c85ea5252ddc9a2604178 /lld
parent: 0c7fe5202ce0739357277cb4457ad6a399f71fde (diff)
download: llvm-a8788de1c3f3c8c3a591bd3aae2acee1b43b229a.tar.gz
1 files changed, 6 insertions, 1 deletions
diff --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp
index c540f573aaef..79f16a281df9 100644
--- a/lld/ELF/Driver.cpp
+++ b/lld/ELF/Driver.cpp
@@ -1421,7 +1421,9 @@ static void readConfigs(opt::InputArgList &args) {
   }
 
   // --threads= takes a positive integer and provides the default value for
-  // --thinlto-jobs=.
+  // --thinlto-jobs=. If unspecified, cap the number of threads since
+  // overhead outweighs optimization for used parallel algorithms for the
+  // non-LTO parts.
   if (auto *arg = args.getLastArg(OPT_threads)) {
     StringRef v(arg->getValue());
     unsigned threads = 0;
@@ -1430,6 +1432,9 @@ static void readConfigs(opt::InputArgList &args) {
             arg->getValue() + "'");
     parallel::strategy = hardware_concurrency(threads);
     config->thinLTOJobs = v;
+  } else if (parallel::strategy.compute_thread_count() > 16) {
+    log("set maximum concurrency to 16, specify --threads= to change");
+    parallel::strategy = hardware_concurrency(16);
   }
   if (auto *arg = args.getLastArg(OPT_thinlto_jobs_eq))
     config->thinLTOJobs = arg->getValue();
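
The selection rule in the hunk above can be restated as a small standalone function. This is only an illustrative sketch, not lld code: `effectiveThreadCount` and `kDefaultThreadCap` are hypothetical names, and the detected count is passed in as a plain integer rather than queried through `parallel::strategy`.

```cpp
#include <algorithm>
#include <optional>

// Hypothetical constant mirroring the cap of 16 introduced by this commit.
constexpr unsigned kDefaultThreadCap = 16;

// Hypothetical helper (not part of lld): picks the thread count for the
// non-LTO parallel parts of the link.
//   requested: the value of --threads=, or std::nullopt if the option is absent.
//   detected:  the thread count the platform reports as available.
unsigned effectiveThreadCount(std::optional<unsigned> requested,
                              unsigned detected) {
  // An explicit --threads= is honored as given, even above the cap.
  if (requested)
    return *requested;
  // Otherwise use the detected count, but never more than the cap.
  return std::min(detected, kDefaultThreadCap);
}
```

Under these assumptions, a 64-thread machine with no --threads= yields 16, a 4-thread machine yields 4, and an explicit --threads=32 still yields 32, so the cap only changes the default.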