diff options
author | Simon Peyton Jones <simonpj@microsoft.com> | 2020-06-19 09:05:08 +0100 |
---|---|---|
committer | Ben Gamari <ben@smart-cactus.org> | 2020-07-13 14:52:49 -0400 |
commit | 7ccb760b1a8034b28171d7540712fd195f65d1fd (patch) | |
tree | 2a1a670a396f5dedf25c5f871cf9a27ae313e197 /compiler | |
parent | e78c4efb8735eb97f17e7b4ca35e305b0766f78a (diff) | |
download | haskell-7ccb760b1a8034b28171d7540712fd195f65d1fd.tar.gz |
Reduce result discount in conSize
Ticket #18282 showed that the result discount given by conSize
was massively too large. This patch reduces that discount to
a constant 10, which just balances the cost of the constructor
application itself.
Note [Constructor size and result discount] elaborates, as
does the ticket #18282.
Reducing result discount reduces inlining, which affects perf. I
found that I could increase the unfoldingUseThrehold from 80 to 90 in
compensation; in combination with the result discount change I get
these overall nofib numbers:
Program Size Allocs Runtime Elapsed TotalMem
--------------------------------------------------------------------------------
boyer -0.2% +5.4% -3.2% -3.4% 0.0%
cichelli -0.1% +5.9% -11.2% -11.7% 0.0%
compress2 -0.2% +9.6% -6.0% -6.8% 0.0%
cryptarithm2 -0.1% -3.9% -6.0% -5.7% 0.0%
gamteb -0.2% +2.6% -13.8% -14.4% 0.0%
genfft -0.1% -1.6% -29.5% -29.9% 0.0%
gg -0.0% -2.2% -17.2% -17.8% -20.0%
life -0.1% -2.2% -62.3% -63.4% 0.0%
mate +0.0% +1.4% -5.1% -5.1% -14.3%
parser -0.2% -2.1% +7.4% +6.7% 0.0%
primetest -0.2% -12.8% -14.3% -14.2% 0.0%
puzzle -0.2% +2.1% -10.0% -10.4% 0.0%
rsa -0.2% -11.7% -3.7% -3.8% 0.0%
simple -0.2% +2.8% -36.7% -38.3% -2.2%
wheel-sieve2 -0.1% -19.2% -48.8% -49.2% -42.9%
--------------------------------------------------------------------------------
Min -0.4% -19.2% -62.3% -63.4% -42.9%
Max +0.3% +9.6% +7.4% +11.0% +16.7%
Geometric Mean -0.1% -0.3% -17.6% -18.0% -0.7%
I'm ok with these numbers, remembering that this change removes
an *exponential* increase in code size in some in-the-wild cases.
I investigated compress2. The difference is entirely caused by this
function no longer inlining
WriteRoutines.$woutputCodes
= \ (w :: [CodeEvent]) ->
let result_s1Sr
= case WriteRoutines.outputCodes_$s$woutput w 0# 0# 8# 9# of
(# ww1, ww2 #) -> (ww1, ww2)
in (# case result_s1Sr of (x, _) ->
map @Int @Char WriteRoutines.outputCodes1 x
, case result_s1Sr of { (_, y) -> y } #)
It was right on the cusp before, driven by the excessive result
discount. Too bad!
Happily, the compiler/perf tests show a number of improvements:
T12227 compiler bytes-alloc -6.6%
T12545 compiler bytes-alloc -4.7%
T13056 compiler bytes-alloc -3.3%
T15263 runtime bytes-alloc -13.1%
T17499 runtime bytes-alloc -14.3%
T3294 compiler bytes-alloc -1.1%
T5030 compiler bytes-alloc -11.7%
T9872a compiler bytes-alloc -2.0%
T9872b compiler bytes-alloc -1.2%
T9872c compiler bytes-alloc -1.5%
Metric Decrease:
T12227
T12545
T13056
T15263
T17499
T3294
T5030
T9872a
T9872b
T9872c
Diffstat (limited to 'compiler')
-rw-r--r-- | compiler/GHC/Core/Unfold.hs | 50 | ||||
-rw-r--r-- | compiler/GHC/Driver/Session.hs | 23 |
2 files changed, 48 insertions, 25 deletions
diff --git a/compiler/GHC/Core/Unfold.hs b/compiler/GHC/Core/Unfold.hs index 01c0a99638..5795ca0036 100644 --- a/compiler/GHC/Core/Unfold.hs +++ b/compiler/GHC/Core/Unfold.hs @@ -887,16 +887,13 @@ conSize dc n_val_args | n_val_args == 0 = SizeIs 0 emptyBag 10 -- Like variables -- See Note [Unboxed tuple size and result discount] - | isUnboxedTupleCon dc = SizeIs 0 emptyBag (10 * (1 + n_val_args)) + | isUnboxedTupleCon dc = SizeIs 0 emptyBag 10 -- See Note [Constructor size and result discount] - | otherwise = SizeIs 10 emptyBag (10 * (1 + n_val_args)) + | otherwise = SizeIs 10 emptyBag 10 --- XXX still looks to large to me - -{- -Note [Constructor size and result discount] -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +{- Note [Constructor size and result discount] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Treat a constructors application as size 10, regardless of how many arguments it has; we are keen to expose them (and we charge separately for their args). We can't treat them as size zero, else we find that @@ -907,14 +904,32 @@ The "result discount" is applied if the result of the call is scrutinised (say by a case). For a constructor application that will mean the constructor application will disappear, so we don't need to charge it to the function. So the discount should at least match the -cost of the constructor application, namely 10. But to give a bit -of extra incentive we give a discount of 10*(1 + n_val_args). - -Simon M tried a MUCH bigger discount: (10 * (10 + n_val_args)), -and said it was an "unambiguous win", but its terribly dangerous -because a function with many many case branches, each finishing with -a constructor, can have an arbitrarily large discount. This led to -terrible code bloat: see #6099. +cost of the constructor application, namely 10. + +Historical note 1: Until Jun 2020 we gave it a "bit of extra +incentive" via a discount of 10*(1 + n_val_args), but that was FAR too +much (#18282). In particular, consider a huge case tree like + + let r = case y1 of + Nothing -> B1 a b c + Just v1 -> case y2 of + Nothing -> B1 c b a + Just v2 -> ... + +If conSize gives a cost of 10 (regardless of n_val_args) and a +discount of 10, that'll make each alternative RHS cost zero. We +charge 10 for each case alternative (see size_up_alt). If we give a +bigger discount (say 20) in conSize, we'll make the case expression +cost *nothing*, and that can make a huge case tree cost nothing. This +leads to massive, sometimes exponenial inlinings (#18282). In short, +don't give a discount that give a negative size to a sub-expression! + +Historical note 2: Much longer ago, Simon M tried a MUCH bigger +discount: (10 * (10 + n_val_args)), and said it was an "unambiguous +win", but its terribly dangerous because a function with many many +case branches, each finishing with a constructor, can have an +arbitrarily large discount. This led to terrible code bloat: see +#6099. Note [Unboxed tuple size and result discount] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -924,7 +939,7 @@ and f wasn't getting inlined. I tried giving unboxed tuples a *result discount* of zero (see the commented-out line). Why? When returned as a result they do not -allocate, so maybe we don't want to charge so much for them If you +allocate, so maybe we don't want to charge so much for them. If you have a non-zero discount here, we find that workers often get inlined back into wrappers, because it look like f x = case $wf x of (# a,b #) -> (a,b) @@ -933,6 +948,9 @@ shrank binary sizes by 0.5% it also made spectral/boyer allocate 5% more. All other changes were very small. So it's not a big deal but I didn't adopt the idea. +When fixing #18282 (see Note [Constructor size and result discount]) +I changed the result discount to be just 10, not 10*(1+n_val_args). + Note [Function and non-function discounts] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We want a discount if the function is applied. A good example is diff --git a/compiler/GHC/Driver/Session.hs b/compiler/GHC/Driver/Session.hs index a558ceae96..17e3796c3d 100644 --- a/compiler/GHC/Driver/Session.hs +++ b/compiler/GHC/Driver/Session.hs @@ -1412,16 +1412,21 @@ defaultDynFlags mySettings llvmConfig = extensions = [], extensionFlags = flattenExtensionFlags Nothing [], - -- The ufCreationThreshold threshold must be reasonably high to - -- take account of possible discounts. - -- E.g. 450 is not enough in 'fulsom' for Interval.sqr to inline - -- into Csg.calc (The unfolding for sqr never makes it into the - -- interface file.) ufCreationThreshold = 750, - ufUseThreshold = 80, - ufFunAppDiscount = 60, - -- Be fairly keen to inline a function if that means - -- we'll be able to pick the right method from a dictionary + -- The ufCreationThreshold threshold must be reasonably high + -- to take account of possible discounts. + -- E.g. 450 is not enough in 'fulsom' for Interval.sqr to + -- inline into Csg.calc (The unfolding for sqr never makes it + -- into the interface file.) + + ufUseThreshold = 90, + -- Last adjusted upwards in #18282, when I reduced + -- the result discount for constructors. + + ufFunAppDiscount = 60, + -- Be fairly keen to inline a function if that means + -- we'll be able to pick the right method from a dictionary + ufDictDiscount = 30, ufDearOp = 40, ufVeryAggressive = False, |