diff options
author | Matthew Pickering <matthewtpickering@gmail.com> | 2022-03-22 18:02:44 +0000 |
---|---|---|
committer | Marge Bot <ben+marge-bot@smart-cactus.org> | 2022-03-29 13:08:42 -0400 |
commit | 840a6811b074a883b6e01ff10ac65387acd85089 (patch) | |
tree | 6cb941332a93ba4467fa457d11d731efedf07819 /rts | |
parent | ad09a5f7671c5594524c517945b2bdd7c5e2c8fa (diff) | |
download | haskell-840a6811b074a883b6e01ff10ac65387acd85089.tar.gz |
RTS: Zero gc_cpu_start and gc_cpu_end after accounting
When passed a combination of `-N` and `-qn` options the cpu time for
garbage collection was being vastly overcounted because the counters
were not being zeroed appropiately.
When -qn1 is passed, only 1 of the N avaiable GC threads is chosen to
perform work, the rest are idle. At the end of the GC period, stat_endGC
traverses all the GC threads and adds up the elapsed time from each of
them. For threads which didn't participate in this GC, the value of the
cpu time should be zero, but before this patch, the counters were not
zeroed and hence we would count the same elapsed time on many subsequent
iterations (until the thread participated in a GC again).
The most direct way to zero these fields is to do so immediately after
the value is added into the global counter, after which point they are
never used again.
We also tried another approach where we would zero the counter in
yieldCapability but there are some (undiagnosed) siations where a
capbility would not pass through yieldCapability before the GC ended and
the same double counting problem would occur.
Fixes #21082
Diffstat (limited to 'rts')
-rw-r--r-- | rts/Stats.c | 20 |
1 files changed, 11 insertions, 9 deletions
diff --git a/rts/Stats.c b/rts/Stats.c index fb87a6d8e4..16875ce5d2 100644 --- a/rts/Stats.c +++ b/rts/Stats.c @@ -515,15 +515,17 @@ stat_endGC (Capability *cap, gc_thread *initiating_gct, W_ live, W_ copied, W_ s // see Note [n_gc_threads] // par_n_threads is set to n_gc_threads at the single callsite of this // function - if (par_n_threads == 1) { - ASSERT(initiating_gct->gc_end_cpu >= initiating_gct->gc_start_cpu); - stats.gc.cpu_ns += initiating_gct->gc_end_cpu - initiating_gct->gc_start_cpu; - } else { - for (unsigned int i=0; i < par_n_threads; i++) { - gc_thread *gct = gc_threads[i]; - ASSERT(gct->gc_end_cpu >= gct->gc_start_cpu); - stats.gc.cpu_ns += gct->gc_end_cpu - gct->gc_start_cpu; - } + for (unsigned int i=0; i < par_n_threads; i++) { + gc_thread *gct = gc_threads[i]; + ASSERT(gct->gc_end_cpu >= gct->gc_start_cpu); + stats.gc.cpu_ns += gct->gc_end_cpu - gct->gc_start_cpu; + // Reset the start/end times to zero after the difference has been + // added to the global total (see #21082) + // Resetting the values here is the most direct way to avoid double counting + // because the gc_end_cpu and gc_start_cpu is not reset to zero as + // stat_startGC is only called on active (not idle threads). + gct->gc_end_cpu = 0; + gct->gc_start_cpu = 0; } } // ------------------------------------------------- |