summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorBen Gamari <ben@smart-cactus.org>2020-02-03 11:20:11 -0500
committerBen Gamari <ben@smart-cactus.org>2020-04-05 23:22:04 -0400
commitf32925d8fe7dfd3c704987a8c48fcc29c09a345a (patch)
tree412bcebf842473601f6c1192440835d4be7cfd80
parentbd75e5da0f1f05f107325733b570bf28b379d2f2 (diff)
downloadhaskell-wip/rename-test-envs.tar.gz
simplifier: Kill off ufKeenessFactorwip/rename-test-envs
We used to have another factor, ufKeenessFactor, which would scale the discounts before they were subtracted from the size. This was justified with the following comment: -- We multiple the raw discounts (args_discount and result_discount) -- ty opt_UnfoldingKeenessFactor because the former have to do with -- *size* whereas the discounts imply that there's some extra -- *efficiency* to be gained (e.g. beta reductions, case reductions) -- by inlining. However, this is highly suspect since it means that we subtract a *scaled* size from an absolute size, resulting in crazy (e.g. negative) scores in some cases (#15304). We consequently killed off ufKeenessFactor and bumped up the ufUseThreshold to compensate. Adjustment of unfolding use threshold ===================================== Since this removes a discount from our inlining heuristic, I revisited our default choice of -funfolding-use-threshold to minimize the change in overall inlining behavior. Specifically, I measured runtime allocations and executable size of nofib and the testsuite performance tests built using compilers (and core libraries) built with several values of -funfolding-use-threshold. This comes as a result of a quantitative comparison of testsuite performance and code size as a function of ufUseThreshold, comparing GHC trees using values of 50, 60, 70, 80, 90, and 100. The test set consisted of nofib and the testsuite performance tests. A full summary of these measurements are found in the description of !2608 Comparing executable sizes (relative to the base commit) across all nofib tests, we see that sizes are similar to the baseline: gmean min max median thresh 50 -6.36% -7.04% -4.82% -6.46% 60 -5.04% -5.97% -3.83% -5.11% 70 -2.90% -3.84% -2.31% -2.92% 80 -0.75% -2.16% -0.42% -0.73% 90 +0.24% -0.41% +0.55% +0.26% 100 +1.36% +0.80% +1.64% +1.37% baseline +0.00% +0.00% +0.00% +0.00% Likewise, looking at runtime allocations we see that 80 gives slightly better optimisation than the baseline: gmean min max median thresh 50 +0.16% -0.16% +4.43% +0.00% 60 +0.09% -0.00% +3.10% +0.00% 70 +0.04% -0.09% +2.29% +0.00% 80 +0.02% -1.17% +2.29% +0.00% 90 -0.02% -2.59% +1.86% +0.00% 100 +0.00% -2.59% +7.51% -0.00% baseline +0.00% +0.00% +0.00% +0.00% Finally, I had to add a NOINLINE in T4306 to ensure that `upd` is worker-wrappered as the test expects. This makes me wonder whether the inlining heuristic is now too liberal as `upd` is quite a large function. The same measure was taken in T12600. Wall clock time compiling Cabal with -O0 thresh 50 60 70 80 90 100 baseline build-Cabal 93.88 89.58 92.59 90.09 100.26 94.81 89.13 Also, this change happens to avoid the spurious test output in `plugin-recomp-change` and `plugin-recomp-change-prof` (see #17308). Metric Decrease: hie002 T12234 T13035 T13719 T14683 T4801 T5631 T5642 T9020 T9872d T9961 Metric Increase: T12150 T12425 T13701 T14697 T15426 T1969 T3064 T5837 T6048 T9203 T9872a T9872b T9872c T9872d haddock.Cabal haddock.base haddock.compiler
-rw-r--r--compiler/GHC/Core/Unfold.hs35
-rw-r--r--compiler/GHC/Driver/Session.hs9
-rw-r--r--testsuite/tests/dependent/should_compile/dynamic-paper.stderr2
-rw-r--r--testsuite/tests/ghci.debugger/scripts/all.T3
-rw-r--r--testsuite/tests/ghci.debugger/scripts/break021.stdout6
-rw-r--r--testsuite/tests/plugins/all.T2
-rw-r--r--testsuite/tests/simplCore/should_compile/T12600.hs1
-rw-r--r--testsuite/tests/simplCore/should_compile/T15056.stderr4
-rw-r--r--testsuite/tests/simplCore/should_compile/T4306.hs1
9 files changed, 33 insertions, 30 deletions
diff --git a/compiler/GHC/Core/Unfold.hs b/compiler/GHC/Core/Unfold.hs
index 199f4bfca4..a8e4c03b95 100644
--- a/compiler/GHC/Core/Unfold.hs
+++ b/compiler/GHC/Core/Unfold.hs
@@ -1002,10 +1002,6 @@ ufUseThreshold
At a call site, if the unfolding, less discounts, is smaller than
this, then it's small enough inline
-ufKeenessFactor
- Factor by which the discounts are multiplied before
- subtracting from size
-
ufDictDiscount
The discount for each occurrence of a dictionary argument
as an argument of a class method. Should be pretty small
@@ -1024,6 +1020,22 @@ ufVeryAggressive
loop breakers.
+Historical Note: Before April 2020 we had another factor,
+ufKeenessFactor, which would scale the discounts before they were subtracted
+from the size. This was justified with the following comment:
+
+ -- We multiply the raw discounts (args_discount and result_discount)
+ -- ty opt_UnfoldingKeenessFactor because the former have to do with
+ -- *size* whereas the discounts imply that there's some extra
+ -- *efficiency* to be gained (e.g. beta reductions, case reductions)
+ -- by inlining.
+
+However, this is highly suspect since it means that we subtract a *scaled* size
+from an absolute size, resulting in crazy (e.g. negative) scores in some cases
+(#15304). We consequently killed off ufKeenessFactor and bumped up the
+ufUseThreshold to compensate.
+
+
Note [Function applications]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In a function application (f a b)
@@ -1307,8 +1319,7 @@ tryUnfolding dflags id lone_variable
extra_doc = text "discounted size =" <+> int discounted_size
discounted_size = size - discount
small_enough = discounted_size <= ufUseThreshold dflags
- discount = computeDiscount dflags arg_discounts
- res_discount arg_infos cont_info
+ discount = computeDiscount arg_discounts res_discount arg_infos cont_info
where
mk_doc some_benefit extra_doc yes_or_no
@@ -1553,14 +1564,9 @@ which Roman did.
-}
-computeDiscount :: DynFlags -> [Int] -> Int -> [ArgSummary] -> CallCtxt
+computeDiscount :: [Int] -> Int -> [ArgSummary] -> CallCtxt
-> Int
-computeDiscount dflags arg_discounts res_discount arg_infos cont_info
- -- We multiple the raw discounts (args_discount and result_discount)
- -- ty opt_UnfoldingKeenessFactor because the former have to do with
- -- *size* whereas the discounts imply that there's some extra
- -- *efficiency* to be gained (e.g. beta reductions, case reductions)
- -- by inlining.
+computeDiscount arg_discounts res_discount arg_infos cont_info
= 10 -- Discount of 10 because the result replaces the call
-- so we count 10 for the function itself
@@ -1569,8 +1575,7 @@ computeDiscount dflags arg_discounts res_discount arg_infos cont_info
-- Discount of 10 for each arg supplied,
-- because the result replaces the call
- + round (ufKeenessFactor dflags *
- fromIntegral (total_arg_discount + res_discount'))
+ + total_arg_discount + res_discount'
where
actual_arg_discounts = zipWith mk_arg_discount arg_discounts arg_infos
total_arg_discount = sum actual_arg_discounts
diff --git a/compiler/GHC/Driver/Session.hs b/compiler/GHC/Driver/Session.hs
index 93c2a870b5..461a3d17fe 100644
--- a/compiler/GHC/Driver/Session.hs
+++ b/compiler/GHC/Driver/Session.hs
@@ -699,7 +699,6 @@ data DynFlags = DynFlags {
ufUseThreshold :: Int,
ufFunAppDiscount :: Int,
ufDictDiscount :: Int,
- ufKeenessFactor :: Float,
ufDearOp :: Int,
ufVeryAggressive :: Bool,
@@ -1430,12 +1429,11 @@ defaultDynFlags mySettings llvmConfig =
-- into Csg.calc (The unfolding for sqr never makes it into the
-- interface file.)
ufCreationThreshold = 750,
- ufUseThreshold = 60,
+ ufUseThreshold = 80,
ufFunAppDiscount = 60,
-- Be fairly keen to inline a function if that means
-- we'll be able to pick the right method from a dictionary
ufDictDiscount = 30,
- ufKeenessFactor = 1.5,
ufDearOp = 40,
ufVeryAggressive = False,
@@ -3023,8 +3021,9 @@ dynamic_flags_deps = [
(intSuffix (\n d -> d {ufFunAppDiscount = n}))
, make_ord_flag defFlag "funfolding-dict-discount"
(intSuffix (\n d -> d {ufDictDiscount = n}))
- , make_ord_flag defFlag "funfolding-keeness-factor"
- (floatSuffix (\n d -> d {ufKeenessFactor = n}))
+ , make_dep_flag defFlag "funfolding-keeness-factor"
+ (floatSuffix (\_ d -> d))
+ "-funfolding-keeness-factor is no longer respected as of GHC 8.12"
, make_ord_flag defFlag "fmax-worker-args"
(intSuffix (\n d -> d {maxWorkerArgs = n}))
, make_ord_flag defGhciFlag "fghci-hist-size"
diff --git a/testsuite/tests/dependent/should_compile/dynamic-paper.stderr b/testsuite/tests/dependent/should_compile/dynamic-paper.stderr
index e9496e19e6..e9d6540f14 100644
--- a/testsuite/tests/dependent/should_compile/dynamic-paper.stderr
+++ b/testsuite/tests/dependent/should_compile/dynamic-paper.stderr
@@ -12,4 +12,4 @@ Simplifier ticks exhausted
simplifier non-termination has been judged acceptable.
To see detailed counts use -ddump-simpl-stats
- Total ticks: 140082
+ Total ticks: 140084
diff --git a/testsuite/tests/ghci.debugger/scripts/all.T b/testsuite/tests/ghci.debugger/scripts/all.T
index 24939a942e..db597a455f 100644
--- a/testsuite/tests/ghci.debugger/scripts/all.T
+++ b/testsuite/tests/ghci.debugger/scripts/all.T
@@ -75,7 +75,8 @@ test('break015', expect_broken(1532), ghci_script, ['break015.script'])
test('break016', combined_output, ghci_script, ['break016.script'])
test('break017', [extra_files(['../QSort.hs']),
combined_output], ghci_script, ['break017.script'])
-test('break018', extra_files(['../mdo.hs']), ghci_script, ['break018.script'])
+test('break018', [expect_broken(18004), extra_files(['../mdo.hs'])],
+ ghci_script, ['break018.script'])
test('break019', extra_files(['../Test2.hs']), ghci_script, ['break019.script'])
test('break020', extra_files(['Break020b.hs']), ghci_script, ['break020.script'])
test('break021', extra_files(['Break020b.hs', 'break020.hs']), ghci_script, ['break021.script'])
diff --git a/testsuite/tests/ghci.debugger/scripts/break021.stdout b/testsuite/tests/ghci.debugger/scripts/break021.stdout
index 712417852d..bf64680b1a 100644
--- a/testsuite/tests/ghci.debugger/scripts/break021.stdout
+++ b/testsuite/tests/ghci.debugger/scripts/break021.stdout
@@ -41,7 +41,7 @@ _result :: IO () = _
^^^^^^^^^^^^^^^^^
13 in_another_module 0
Stopped in Main.in_another_decl, break020.hs:(6,21)-(7,30)
-_result :: m () = _
+_result :: IO () = _
5
vv
6 in_another_decl _ = do line1 0
@@ -49,7 +49,7 @@ _result :: m () = _
^^
8
Stopped in Main.in_another_decl, break020.hs:6:24-30
-_result :: m () = _
+_result :: IO () = _
5
6 in_another_decl _ = do line1 0
^^^^^^^
@@ -61,7 +61,7 @@ _result :: IO () = _
^^^^^^^^^
4 line2 _ = return ()
Stopped in Main.in_another_decl, break020.hs:7:24-30
-_result :: m () = _
+_result :: IO () = _
6 in_another_decl _ = do line1 0
7 line2 0
^^^^^^^
diff --git a/testsuite/tests/plugins/all.T b/testsuite/tests/plugins/all.T
index 1fec731289..891246b228 100644
--- a/testsuite/tests/plugins/all.T
+++ b/testsuite/tests/plugins/all.T
@@ -167,7 +167,6 @@ test('plugin-recomp-flags',
test('plugin-recomp-change',
[extra_files(['plugin-recomp/', 'plugin-recomp-test.hs']),
only_ways([config.ghc_plugin_way]),
- when(compiler_debugged(), expect_broken_for(17308, ['dyn'])),
pre_cmd('$MAKE -s --no-print-directory -C plugin-recomp package.plugins01 TOP={top}')
],
makefile_test, [])
@@ -175,7 +174,6 @@ test('plugin-recomp-change',
test('plugin-recomp-change-prof',
[extra_files(['plugin-recomp/', 'plugin-recomp-test.hs']),
only_ways([config.ghc_plugin_way]),
- when(compiler_debugged(), expect_broken_for(17308, ['dyn'])),
pre_cmd('$MAKE -s --no-print-directory -C plugin-recomp package.plugins01 TOP={top}'),
when(not config.have_profiling,skip)
],
diff --git a/testsuite/tests/simplCore/should_compile/T12600.hs b/testsuite/tests/simplCore/should_compile/T12600.hs
index d08d923ac6..34e3a91d87 100644
--- a/testsuite/tests/simplCore/should_compile/T12600.hs
+++ b/testsuite/tests/simplCore/should_compile/T12600.hs
@@ -27,3 +27,4 @@ instance (Eq1 f) => Eq1 (G f) where
foo :: G F Int -> G F Int -> Bool
foo a b = eq1 a b
+{-# NOINLINE foo #-}
diff --git a/testsuite/tests/simplCore/should_compile/T15056.stderr b/testsuite/tests/simplCore/should_compile/T15056.stderr
index 1b8c0c8101..116def9073 100644
--- a/testsuite/tests/simplCore/should_compile/T15056.stderr
+++ b/testsuite/tests/simplCore/should_compile/T15056.stderr
@@ -1,9 +1,7 @@
Rule fired: Class op - (BUILTIN)
Rule fired: Class op + (BUILTIN)
Rule fired: Class op + (BUILTIN)
-Rule fired: Class op enumFromTo (BUILTIN)
-Rule fired: Class op foldr (BUILTIN)
-Rule fired: Class op foldr (BUILTIN)
Rule fired: +# (BUILTIN)
Rule fired: Class op foldr (BUILTIN)
+Rule fired: Class op enumFromTo (BUILTIN)
Rule fired: fold/build (GHC.Base)
diff --git a/testsuite/tests/simplCore/should_compile/T4306.hs b/testsuite/tests/simplCore/should_compile/T4306.hs
index 548e132497..23c5af21da 100644
--- a/testsuite/tests/simplCore/should_compile/T4306.hs
+++ b/testsuite/tests/simplCore/should_compile/T4306.hs
@@ -10,3 +10,4 @@ upd (UPD _ (D x _)) = sqrt $! (x*x + x*x + sin x + x*x + x*x + cos x + x*x + x*x
x*x + x*x + sin x + x*x + x*x + cos x + x*x + x*x + tan x +
x*x + x*x + sin x + x*x + x*x + cos x + x*x + x*x + tan x)
-- make the rhs large enough to be worker/wrapperred
+{-# NOINLINE upd #-}