diff options
author | Sergei Trofimovich <slyfox@gentoo.org> | 2016-09-01 17:34:58 +0100 |
---|---|---|
committer | Sergei Trofimovich <siarheit@google.com> | 2016-09-01 17:36:37 +0100 |
commit | a48de37dcca98e7d477040b0ed298bcd1b3ab303 (patch) | |
tree | a668a64334152ab6e62028fb0d4cf5aa81ebc33e | |
parent | da920f691145175dc310055ae533757e638caab4 (diff) | |
download | haskell-a48de37dcca98e7d477040b0ed298bcd1b3ab303.tar.gz |
restore -fmax-worker-args handling (Trac #11565)
maxWorkerArgs handling was accidentally lost 3 years ago
in a major update of demand analysis
commit 0831a12ea2fc73c33652eeec1adc79fa19700578
Old regression is noticeable as:
- code bloat (requires stack reshuffling)
- compilation slowdown (more code to optimise/generate)
- and increased heap usage (DynFlags unboxing/reboxing?)
On a simple compile benchmark this change causes heap
allocation drop from 70G don to 67G (ghc perf build).
Signed-off-by: Sergei Trofimovich <siarheit@google.com>
Reviewers: simonpj, ezyang, goldfire, austin, bgamari
Reviewed By: simonpj, ezyang
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2503
GHC Trac Issues: #11565
-rw-r--r-- | compiler/stranal/WwLib.hs | 30 | ||||
-rw-r--r-- | testsuite/tests/perf/compiler/all.T | 6 | ||||
-rw-r--r-- | testsuite/tests/perf/space_leaks/all.T | 6 |
3 files changed, 37 insertions, 5 deletions
diff --git a/compiler/stranal/WwLib.hs b/compiler/stranal/WwLib.hs index 0057f6fb78..1a096052db 100644 --- a/compiler/stranal/WwLib.hs +++ b/compiler/stranal/WwLib.hs @@ -143,7 +143,7 @@ mkWwBodies dflags fam_envs fun_ty demands res_info wrapper_body = wrap_fn_args . wrap_fn_cpr . wrap_fn_str . applyToVars work_call_args . Var worker_body = mkLams work_lam_args. work_fn_str . work_fn_cpr . work_fn_args - ; if useful1 && not only_one_void_argument || useful2 + ; if is_small_enough work_args && (useful1 && not only_one_void_argument || useful2) then return (Just (worker_args_dmds, wrapper_body, worker_body)) else return Nothing } @@ -163,6 +163,10 @@ mkWwBodies dflags fam_envs fun_ty demands res_info = True | otherwise = False + is_small_enough args = count isId args <= maxWorkerArgs dflags + -- See Note [Limit w/w arity] + -- We count only Free variables (isId) to skip Type, Kind + -- variables which have no runtime representation. {- Note [Always do CPR w/w] @@ -177,6 +181,30 @@ a disaster, because then the enclosing function might say it has the CPR property, but now doesn't and there a cascade of disaster. A good example is Trac #5920. +Note [Limit w/w arity] +~~~~~~~~~~~~~~~~~~~~~~~~ +Guard against high worker arity as it generates a lot of stack traffic. +A simplified example is Trac #11565#comment:6 + +Current strategy is very simple: don't perform w/w transformation at all +if the result produces a wrapper with arity higher than -fmax-worker-args=. + +It is a bit all or nothing, consider + + f (x,y) (a,b,c,d,e ... , z) = rhs + +Currently we will remove all w/w ness entirely. But actually we could +w/w on the (x,y) pair... it's the huge product that is the problem. + +Could we instead refrain from w/w on an arg-by-arg basis? Yes, that'd +solve f. But we can get a lot of args from deeply-nested products: + + g (a, (b, (c, (d, ...)))) = rhs + +This is harder to spot on an arg-by-arg basis. Previously mkWwStr was +given some "fuel" saying how many arguments it could add; when we ran +out of fuel it would stop w/wing. +Still not very clever because it had a left-right bias. ************************************************************************ * * diff --git a/testsuite/tests/perf/compiler/all.T b/testsuite/tests/perf/compiler/all.T index 3c8cbdabf9..130ba44691 100644 --- a/testsuite/tests/perf/compiler/all.T +++ b/testsuite/tests/perf/compiler/all.T @@ -428,8 +428,9 @@ test('T5631', test('parsing001', [compiler_stats_num_field('bytes allocated', [(wordsize(32), 274000576, 10), - (wordsize(64), 587079016, 5)]), + (wordsize(64), 581551384, 5)]), # expected value: 587079016 (amd64/Linux) + # 2016-09-01: 581551384 (amd64/Linux) Restore w/w limit (#11565) only_ways(['normal']), ], compile_fail, ['']) @@ -767,7 +768,7 @@ test('T9872d', test('T9961', [ only_ways(['normal']), compiler_stats_num_field('bytes allocated', - [(wordsize(64), 568526784, 5), + [(wordsize(64), 537297968, 5), # 2015-01-12 807117816 Initally created # 2015-spring 772510192 Got better # 2015-05-22 663978160 Fix for #10370 improves it more @@ -775,6 +776,7 @@ test('T9961', # 2015-12-17 745044392 x86_64/Darwin Creep upwards # 2016-03-20 519436672 x64_64/Linux Don't use build desugaring for large lists (#11707) # 2016-03-24 568526784 x64_64/Linux Add eqInt* variants (#11688) + # 2016-09-01 537297968 x64_64/Linux Restore w/w limit (#11565) (wordsize(32), 275264188, 5) # was 375647160 # 2016-04-06 275264188 x86/Linux diff --git a/testsuite/tests/perf/space_leaks/all.T b/testsuite/tests/perf/space_leaks/all.T index c6b1d925a6..301029cf58 100644 --- a/testsuite/tests/perf/space_leaks/all.T +++ b/testsuite/tests/perf/space_leaks/all.T @@ -58,17 +58,19 @@ test('T4018', test('T4029', [stats_num_field('peak_megabytes_allocated', - [(wordsize(64), 82, 10)]), + [(wordsize(64), 71, 10)]), # 2016-02-26: 66 (amd64/Linux) INITIAL # 2016-05-23: 82 (amd64/Linux) Use -G1 # 2016-07-13: 92 (amd64/Linux) Changes to tidyType + # 2016-09-01: 71 (amd64/Linux) Restore w/w limit (#11565) stats_num_field('max_bytes_used', - [(wordsize(64), 22920616, 5)]), + [(wordsize(64), 21648488, 5)]), # 2016-02-26: 24071720 (amd64/Linux) INITIAL # 2016-04-21: 25542832 (amd64/Linux) # 2016-05-23: 25247216 (amd64/Linux) Use -G1 # 2016-07-13: 27575416 (amd64/Linux) Changes to tidyType # 2016-07-20: 22920616 (amd64/Linux) Fix laziness of instance matching + # 2016-09-01: 21648488 (amd64/Linux) Restore w/w limit (#11565) extra_hc_opts('+RTS -G1 -RTS' ), ], ghci_script, |