Reduce result discount in conSize

Ticket #18282 showed that the result discount given by conSize was massively too large. This patch reduces that discount to a constant 10, which just balances the cost of the constructor application itself. Note [Constructor size and result discount] elaborates, as does the ticket #18282. Reducing result discount reduces inlining, which affects perf. I found that I could increase the unfoldingUseThrehold from 80 to 90 in compensation; in combination with the result discount change I get these overall nofib numbers: Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- boyer -0.2% +5.4% -3.2% -3.4% 0.0% cichelli -0.1% +5.9% -11.2% -11.7% 0.0% compress2 -0.2% +9.6% -6.0% -6.8% 0.0% cryptarithm2 -0.1% -3.9% -6.0% -5.7% 0.0% gamteb -0.2% +2.6% -13.8% -14.4% 0.0% genfft -0.1% -1.6% -29.5% -29.9% 0.0% gg -0.0% -2.2% -17.2% -17.8% -20.0% life -0.1% -2.2% -62.3% -63.4% 0.0% mate +0.0% +1.4% -5.1% -5.1% -14.3% parser -0.2% -2.1% +7.4% +6.7% 0.0% primetest -0.2% -12.8% -14.3% -14.2% 0.0% puzzle -0.2% +2.1% -10.0% -10.4% 0.0% rsa -0.2% -11.7% -3.7% -3.8% 0.0% simple -0.2% +2.8% -36.7% -38.3% -2.2% wheel-sieve2 -0.1% -19.2% -48.8% -49.2% -42.9% -------------------------------------------------------------------------------- Min -0.4% -19.2% -62.3% -63.4% -42.9% Max +0.3% +9.6% +7.4% +11.0% +16.7% Geometric Mean -0.1% -0.3% -17.6% -18.0% -0.7% I'm ok with these numbers, remembering that this change removes an *exponential* increase in code size in some in-the-wild cases. I investigated compress2. The difference is entirely caused by this function no longer inlining WriteRoutines.$woutputCodes = \ (w :: [CodeEvent]) -> let result_s1Sr = case WriteRoutines.outputCodes_$s$woutput w 0# 0# 8# 9# of (# ww1, ww2 #) -> (ww1, ww2) in (# case result_s1Sr of (x, _) -> map @Int @Char WriteRoutines.outputCodes1 x , case result_s1Sr of { (_, y) -> y } #) It was right on the cusp before, driven by the excessive result discount. Too bad! Happily, the compiler/perf tests show a number of improvements: T12227 compiler bytes-alloc -6.6% T12545 compiler bytes-alloc -4.7% T13056 compiler bytes-alloc -3.3% T15263 runtime bytes-alloc -13.1% T17499 runtime bytes-alloc -14.3% T3294 compiler bytes-alloc -1.1% T5030 compiler bytes-alloc -11.7% T9872a compiler bytes-alloc -2.0% T9872b compiler bytes-alloc -1.2% T9872c compiler bytes-alloc -1.5% Metric Decrease: T12227 T12545 T13056 T15263 T17499 T3294 T5030 T9872a T9872b T9872c
author: Simon Peyton Jones <simonpj@microsoft.com> 2020-06-19 09:05:08 +0100
committer: Ben Gamari <ben@smart-cactus.org> 2020-07-13 14:52:49 -0400
commit: 7ccb760b1a8034b28171d7540712fd195f65d1fd (patch)
tree: 2a1a670a396f5dedf25c5f871cf9a27ae313e197 /compiler
parent: e78c4efb8735eb97f17e7b4ca35e305b0766f78a (diff)
download: haskell-7ccb760b1a8034b28171d7540712fd195f65d1fd.tar.gz
2 files changed, 48 insertions, 25 deletions
diff --git a/compiler/GHC/Core/Unfold.hs b/compiler/GHC/Core/Unfold.hs
index 01c0a99638..5795ca0036 100644
--- a/compiler/GHC/Core/Unfold.hs
+++ b/compiler/GHC/Core/Unfold.hs
@@ -887,16 +887,13 @@ conSize dc n_val_args
   | n_val_args == 0 = SizeIs 0 emptyBag 10    -- Like variables
 
 -- See Note [Unboxed tuple size and result discount]
-  | isUnboxedTupleCon dc = SizeIs 0 emptyBag (10 * (1 + n_val_args))
+  | isUnboxedTupleCon dc = SizeIs 0 emptyBag 10
 
 -- See Note [Constructor size and result discount]
-  | otherwise = SizeIs 10 emptyBag (10 * (1 + n_val_args))
+  | otherwise = SizeIs 10 emptyBag 10
 
--- XXX still looks to large to me
-
-{-
-Note [Constructor size and result discount]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+{- Note [Constructor size and result discount]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Treat a constructors application as size 10, regardless of how many
 arguments it has; we are keen to expose them (and we charge separately
 for their args).  We can't treat them as size zero, else we find that
@@ -907,14 +904,32 @@ The "result discount" is applied if the result of the call is
 scrutinised (say by a case).  For a constructor application that will
 mean the constructor application will disappear, so we don't need to
 charge it to the function.  So the discount should at least match the
-cost of the constructor application, namely 10.  But to give a bit
-of extra incentive we give a discount of 10*(1 + n_val_args).
-
-Simon M tried a MUCH bigger discount: (10 * (10 + n_val_args)),
-and said it was an "unambiguous win", but its terribly dangerous
-because a function with many many case branches, each finishing with
-a constructor, can have an arbitrarily large discount.  This led to
-terrible code bloat: see #6099.
+cost of the constructor application, namely 10.
+
+Historical note 1: Until Jun 2020 we gave it a "bit of extra
+incentive" via a discount of 10*(1 + n_val_args), but that was FAR too
+much (#18282).  In particular, consider a huge case tree like
+
+   let r = case y1 of
+          Nothing -> B1 a b c
+          Just v1 -> case y2 of
+                      Nothing -> B1 c b a
+                      Just v2 -> ...
+
+If conSize gives a cost of 10 (regardless of n_val_args) and a
+discount of 10, that'll make each alternative RHS cost zero.  We
+charge 10 for each case alternative (see size_up_alt).  If we give a
+bigger discount (say 20) in conSize, we'll make the case expression
+cost *nothing*, and that can make a huge case tree cost nothing. This
+leads to massive, sometimes exponenial inlinings (#18282).  In short,
+don't give a discount that give a negative size to a sub-expression!
+
+Historical note 2: Much longer ago, Simon M tried a MUCH bigger
+discount: (10 * (10 + n_val_args)), and said it was an "unambiguous
+win", but its terribly dangerous because a function with many many
+case branches, each finishing with a constructor, can have an
+arbitrarily large discount.  This led to terrible code bloat: see
+#6099.
 
 Note [Unboxed tuple size and result discount]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -924,7 +939,7 @@ and f wasn't getting inlined.
 
 I tried giving unboxed tuples a *result discount* of zero (see the
 commented-out line).  Why?  When returned as a result they do not
-allocate, so maybe we don't want to charge so much for them If you
+allocate, so maybe we don't want to charge so much for them. If you
 have a non-zero discount here, we find that workers often get inlined
 back into wrappers, because it look like
     f x = case $wf x of (# a,b #) -> (a,b)
@@ -933,6 +948,9 @@ shrank binary sizes by 0.5% it also made spectral/boyer allocate 5%
 more. All other changes were very small. So it's not a big deal but I
 didn't adopt the idea.
 
+When fixing #18282 (see Note [Constructor size and result discount])
+I changed the result discount to be just 10, not 10*(1+n_val_args).
+
 Note [Function and non-function discounts]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 We want a discount if the function is applied. A good example is
diff --git a/compiler/GHC/Driver/Session.hs b/compiler/GHC/Driver/Session.hs
index a558ceae96..17e3796c3d 100644
--- a/compiler/GHC/Driver/Session.hs
+++ b/compiler/GHC/Driver/Session.hs
@@ -1412,16 +1412,21 @@ defaultDynFlags mySettings llvmConfig =
         extensions = [],
         extensionFlags = flattenExtensionFlags Nothing [],
 
-        -- The ufCreationThreshold threshold must be reasonably high to
-        -- take account of possible discounts.
-        -- E.g. 450 is not enough in 'fulsom' for Interval.sqr to inline
-        -- into Csg.calc (The unfolding for sqr never makes it into the
-        -- interface file.)
         ufCreationThreshold = 750,
-        ufUseThreshold      = 80,
-        ufFunAppDiscount    = 60,
-        -- Be fairly keen to inline a function if that means
-        -- we'll be able to pick the right method from a dictionary
+           -- The ufCreationThreshold threshold must be reasonably high
+           -- to take account of possible discounts.
+           -- E.g. 450 is not enough in 'fulsom' for Interval.sqr to
+           -- inline into Csg.calc (The unfolding for sqr never makes it
+           -- into the interface file.)
+
+        ufUseThreshold = 90,
+           -- Last adjusted upwards in #18282, when I reduced
+           -- the result discount for constructors.
+
+        ufFunAppDiscount = 60,
+           -- Be fairly keen to inline a function if that means
+           -- we'll be able to pick the right method from a dictionary
+
         ufDictDiscount      = 30,
         ufDearOp            = 40,
         ufVeryAggressive    = False,
author	Simon Peyton Jones <simonpj@microsoft.com>	2020-06-19 09:05:08 +0100
committer	Ben Gamari <ben@smart-cactus.org>	2020-07-13 14:52:49 -0400
commit	7ccb760b1a8034b28171d7540712fd195f65d1fd (patch)
tree	2a1a670a396f5dedf25c5f871cf9a27ae313e197 /compiler
parent	e78c4efb8735eb97f17e7b4ca35e305b0766f78a (diff)
download	haskell-7ccb760b1a8034b28171d7540712fd195f65d1fd.tar.gz