author | Simon Peyton Jones <simonpj@microsoft.com> | 2021-07-23 23:57:01 +0100 |
---|---|---|
committer | Marge Bot <ben+marge-bot@smart-cactus.org> | 2022-05-30 13:44:14 -0400 |
commit | 6656f0165a30fc2a22208532ba384fc8e2f11b46 (patch) | |
tree | ab6d5ec67947168dd86cf0b86b088fd7d91741e4 /testsuite/tests/profiling/should_run | |
parent | 0079171bae7271dc44f81c3bf26505941ee92d7e (diff) | |
A bunch of changes related to eta reduction
This is a large collection of changes all relating to eta
reduction, originally triggered by #18993, but there followed
a long saga.
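For orientation: eta reduction rewrites `\x -> f x` to `f`, and eta expansion goes the other way. The sketch below is illustrative only and is not code from GHC; the names `expanded`, `reduced` and `whnfOk` are made up for the example. It shows, assuming the code is compiled without optimisations that alter strictness, why the rewrite is not unconditionally semantics-preserving: `seq` can distinguish a lambda from a bottoming expression.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Eta-expanded form: a lambda is already in weak head normal form,
-- so forcing it does not touch 'undefined'.
expanded :: Int -> Int
expanded = \x -> (undefined :: Int -> Int) x

-- Eta-reduced form: forcing this to WHNF evaluates 'undefined'.
reduced :: Int -> Int
reduced = undefined

-- Hypothetical helper: True when forcing the value to WHNF
-- completes without throwing an exception.
whnfOk :: a -> IO Bool
whnfOk v = do
  r <- try (evaluate (v `seq` ())) :: IO (Either SomeException ())
  pure (either (const False) (const True) r)
```

In GHCi, `whnfOk expanded` yields `True` and `whnfOk reduced` yields `False`, which is why eta reduction needs side conditions.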
Specifics:
* Move state-hack stuff from GHC.Types.Id (where it never belonged)
to GHC.Core.Opt.Arity (which seems much more appropriate).
* Add a crucial mkCast in the Cast case of
GHC.Core.Opt.Arity.eta_expand; helps with T18223
* Add clarifying notes about eta-reducing to PAPs.
See Note [Do not eta reduce PAPs]
* I moved tryEtaReduce from GHC.Core.Utils to GHC.Core.Opt.Arity,
where it properly belongs. See Note [Eta reduce PAPs]
* In GHC.Core.Opt.Simplify.Utils.tryEtaExpandRhs, pull out the code for
when eta-expansion is wanted, to make wantEtaExpansion, and call that
same function in GHC.Core.Opt.Simplify.simplStableUnfolding. It was
previously inconsistent, but it's doing the same thing.
* I did a substantial refactor of ArityType; see Note [ArityType].
This allowed me to do away with the somewhat mysterious takeOneShots;
more generally it allows arityType to describe the function, leaving
its clients to decide how to use that information.
I made ArityType abstract, so that clients have to use functions
to access it.
* Make GHC.Core.Opt.Simplify.Utils.rebuildLam (was stupidly called
mkLam before) aware of the floats that the simplifier builds up, so
that it can still do eta-reduction even if there are some floats.
(Previously that would not happen.) That means passing the floats
to rebuildLam, and an extra check when eta-reducing (etaFloatOk).
* In GHC.Core.Opt.Simplify.Utils.tryEtaExpandRhs, make use of call-info
in the idDemandInfo of the binder, as well as the CallArity info. The
occurrence analyser did this but we were failing to take advantage here.
In the end I moved the heavy lifting to GHC.Core.Opt.Arity.findRhsArity;
see Note [Combining arityType with demand info], and functions
idDemandOneShots and combineWithDemandOneShots.
(These changes partly drove my refactoring of ArityType.)
* In GHC.Core.Opt.Arity.findRhsArity:
  * I'm now taking account of the demand on the binder to give
    extra one-shot info. E.g. if the fn is always called with two
    args, we can give better one-shot info on the binders
    than if we just look at the RHS.
  * Don't do any fixpointing in the non-recursive
    case -- simple short cut.
  * Trim arity inside the loop. See Note [Trim arity inside the loop]
* Make SimpleOpt respect the eta-reduction flag
(Some associated refactoring here.)
* I made the CallCtxt which the Simplifier uses distinguish between
recursive and non-recursive right-hand sides.
    data CallCtxt = ... | RhsCtxt RecFlag | ...
It affects only one thing:
  - We call an RHS context interesting only if it is non-recursive;
    see Note [RHS of lets] in GHC.Core.Unfold
* Remove eta-reduction in GHC.CoreToStg.Prep, a welcome simplification.
See Note [No eta reduction needed in rhsToBody] in GHC.CoreToStg.Prep.
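The CallCtxt change above can be pictured with a cut-down model. This is a sketch under stated assumptions, not the definitions from GHC: the real CallCtxt has further alternatives (elided as `...` above), `BoringCtxt` stands in for them here, and `interestingRhs` is a hypothetical helper name.

```haskell
-- Simplified stand-ins for illustration; the real GHC types carry more.
data RecFlag = Recursive | NonRecursive
  deriving (Eq, Show)

data CallCtxt
  = BoringCtxt        -- placeholder for the alternatives elided in the message
  | RhsCtxt RecFlag   -- right-hand side of a let, tagged recursive or not
  deriving (Eq, Show)

-- Hypothetical helper: an RHS context counts as interesting
-- only when the binding is non-recursive.
interestingRhs :: CallCtxt -> Bool
interestingRhs (RhsCtxt NonRecursive) = True
interestingRhs _                      = False
```

Carrying the RecFlag in the constructor lets the one place that cares pattern-match on it, without every other client of CallCtxt needing to change behaviour.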
Other incidental changes
* Fix a fairly long-standing outright bug in the ApplyToVal case of
GHC.Core.Opt.Simplify.mkDupableContWithDmds. I was failing to take the
tail of 'dmds' in the recursive call, which meant the demands were All
Wrong. I have no idea why this has not caused problems before now.
* Delete dead function GHC.Core.Opt.Simplify.Utils.contIsRhsOrArg
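The shape of the 'dmds' bug is easy to reproduce in miniature. The following is a toy model, not the code of mkDupableContWithDmds: `Demand`, `Arg`, `pairBuggy` and `pairFixed` are hypothetical names, and demands are plain strings. It pairs each pending argument with its demand, once recursing with the whole list and once with the tail.

```haskell
-- Toy model: one demand per pending argument, in the same order.
type Demand = String
type Arg    = String

-- Buggy shape: the recursive call reuses the full demand list,
-- so every argument is paired with the first demand.
pairBuggy :: [Demand] -> [Arg] -> [(Arg, Demand)]
pairBuggy dmds@(d:_) (a:as) = (a, d) : pairBuggy dmds as
pairBuggy _          _      = []

-- Fixed shape: recurse on the tail so demands stay in step with arguments.
pairFixed :: [Demand] -> [Arg] -> [(Arg, Demand)]
pairFixed (d:ds) (a:as) = (a, d) : pairFixed ds as
pairFixed _      _      = []
```

With demands `["strict", "lazy"]` and arguments `["x", "y"]`, `pairFixed` gives `[("x","strict"),("y","lazy")]` while `pairBuggy` gives `[("x","strict"),("y","strict")]`.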
Metrics: compile_time/bytes allocated
Test                                 Metric          Baseline      New value  Change
---------------------------------------------------------------------------------------
MultiLayerModulesTH_OneShot(normal)  ghc/alloc  2,743,297,692  2,619,762,992   -4.5% GOOD
T18223(normal)                       ghc/alloc  1,103,161,360    972,415,992  -11.9% GOOD
T3064(normal)                        ghc/alloc    201,222,500    184,085,360   -8.5% GOOD
T8095(normal)                        ghc/alloc  3,216,292,528  3,254,416,960   +1.2%
T9630(normal)                        ghc/alloc  1,514,131,032  1,557,719,312   +2.9% BAD
parsing001(normal)                   ghc/alloc    530,409,812    525,077,696   -1.0%
geo. mean                                                                      -0.1%
Nofib:
Program               Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
banner               +0.0%     +0.4%     -8.9%     -8.7%      0.0%
exact-reals          +0.0%     -7.4%    -36.3%    -37.4%      0.0%
fannkuch-redux       +0.0%     -0.1%     -1.0%     -1.0%      0.0%
fft2                 -0.1%     -0.2%    -17.8%    -19.2%      0.0%
fluid                +0.0%     -1.3%     -2.1%     -2.1%      0.0%
gg                   -0.0%     +2.2%     -0.2%     -0.1%      0.0%
spectral-norm        +0.1%     -0.2%      0.0%      0.0%      0.0%
tak                  +0.0%     -0.3%     -9.8%     -9.8%      0.0%
x2n1                 +0.0%     -0.2%     -3.2%     -3.2%      0.0%
--------------------------------------------------------------------------------
Min                  -3.5%     -7.4%    -58.7%    -59.9%      0.0%
Max                  +0.1%     +2.2%    +32.9%    +32.9%      0.0%
Geometric Mean       -0.0%     -0.1%    -14.2%    -14.8%     -0.0%
Metric Decrease:
MultiLayerModulesTH_OneShot
T18223
T3064
T15185
T14766
Metric Increase:
T9630
Diffstat (limited to 'testsuite/tests/profiling/should_run')
-rw-r--r-- | testsuite/tests/profiling/should_run/T2552.prof.sample | 50 |
-rw-r--r-- | testsuite/tests/profiling/should_run/all.T | 4 |
-rw-r--r-- | testsuite/tests/profiling/should_run/ioprof.prof.sample | 80 |
3 files changed, 71 insertions, 63 deletions
diff --git a/testsuite/tests/profiling/should_run/T2552.prof.sample b/testsuite/tests/profiling/should_run/T2552.prof.sample
index 7ed927f6db..c8bfad1ecf 100644
--- a/testsuite/tests/profiling/should_run/T2552.prof.sample
+++ b/testsuite/tests/profiling/should_run/T2552.prof.sample
@@ -1,36 +1,36 @@
- Sat Jun 4 11:59 2016 Time and Allocation Profiling Report (Final)
+ Mon Apr 25 16:27 2022 Time and Allocation Profiling Report (Final)

  T2552 +RTS -hc -p -RTS

- total time = 0.09 secs (90 ticks @ 1000 us, 1 processor)
- total alloc = 123,465,848 bytes (excludes profiling overheads)
+ total time = 0.05 secs (49 ticks @ 1000 us, 1 processor)
+ total alloc = 74,099,440 bytes (excludes profiling overheads)

 COST CENTRE MODULE SRC %time %alloc

-fib1.fib1'.nfib Main T2552.hs:5:9-61 37.8 33.3
-fib2'.nfib Main T2552.hs:10:5-57 31.1 33.3
-fib3'.nfib Main T2552.hs:15:5-57 31.1 33.3
+fib1.fib1'.nfib Main T2552.hs:5:9-61 34.7 33.3
+fib3'.nfib Main T2552.hs:15:5-57 32.7 33.3
+fib2'.nfib Main T2552.hs:10:5-57 32.7 33.3


  individual inherited
 COST CENTRE MODULE SRC no. entries %time %alloc %time %alloc

-MAIN MAIN <built-in> 45 0 0.0 0.0 100.0 100.0
- CAF Main <entire-module> 89 0 0.0 0.0 100.0 100.0
- main Main T2552.hs:(17,1)-(20,17) 90 1 0.0 0.0 100.0 100.0
- fib1 Main T2552.hs:(1,1)-(5,61) 92 1 0.0 0.0 37.8 33.3
- fib1.fib1' Main T2552.hs:(3,5)-(5,61) 93 1 0.0 0.0 37.8 33.3
- nfib' Main T2552.hs:3:35-40 94 1 0.0 0.0 37.8 33.3
- fib1.fib1'.nfib Main T2552.hs:5:9-61 95 1028457 37.8 33.3 37.8 33.3
- fib2 Main T2552.hs:7:1-16 96 1 0.0 0.0 31.1 33.3
- fib2' Main T2552.hs:(8,1)-(10,57) 97 1 0.0 0.0 31.1 33.3
- fib2'.nfib Main T2552.hs:10:5-57 98 1028457 31.1 33.3 31.1 33.3
- fib3 Main T2552.hs:12:1-12 99 1 0.0 0.0 0.0 0.0
- fib3' Main T2552.hs:(13,1)-(15,57) 100 1 0.0 0.0 31.1 33.3
- fib3'.nfib Main T2552.hs:15:5-57 101 1028457 31.1 33.3 31.1 33.3
- CAF GHC.IO.Handle.FD <entire-module> 84 0 0.0 0.0 0.0 0.0
- CAF GHC.IO.Handle.Text <entire-module> 83 0 0.0 0.0 0.0 0.0
- CAF GHC.Conc.Signal <entire-module> 81 0 0.0 0.0 0.0 0.0
- CAF GHC.IO.Encoding <entire-module> 78 0 0.0 0.0 0.0 0.0
- CAF GHC.IO.Encoding.Iconv <entire-module> 64 0 0.0 0.0 0.0 0.0
- main Main T2552.hs:(17,1)-(20,17) 91 0 0.0 0.0 0.0 0.0
+MAIN MAIN <built-in> 128 0 0.0 0.0 100.0 100.0
+ CAF Main <entire-module> 255 0 0.0 0.0 100.0 99.9
+ fib3 Main T2552.hs:12:1-12 265 1 0.0 0.0 0.0 0.0
+ main Main T2552.hs:(17,1)-(20,17) 256 1 0.0 0.0 100.0 99.9
+ fib1 Main T2552.hs:(1,1)-(5,61) 258 1 0.0 0.0 34.7 33.3
+ fib1.fib1' Main T2552.hs:(3,5)-(5,61) 259 1 0.0 0.0 34.7 33.3
+ nfib' Main T2552.hs:3:35-40 260 1 0.0 0.0 34.7 33.3
+ fib1.fib1'.nfib Main T2552.hs:5:9-61 261 1028457 34.7 33.3 34.7 33.3
+ fib2 Main T2552.hs:7:1-16 262 1 0.0 0.0 32.7 33.3
+ fib2' Main T2552.hs:(8,1)-(10,57) 263 1 0.0 0.0 32.7 33.3
+ fib2'.nfib Main T2552.hs:10:5-57 264 1028457 32.7 33.3 32.7 33.3
+ fib3 Main T2552.hs:12:1-12 266 0 0.0 0.0 32.7 33.3
+ fib3' Main T2552.hs:(13,1)-(15,57) 267 1 0.0 0.0 32.7 33.3
+ fib3'.nfib Main T2552.hs:15:5-57 268 1028457 32.7 33.3 32.7 33.3
+ CAF GHC.Conc.Signal <entire-module> 250 0 0.0 0.0 0.0 0.0
+ CAF GHC.IO.Encoding <entire-module> 241 0 0.0 0.0 0.0 0.0
+ CAF GHC.IO.Encoding.Iconv <entire-module> 239 0 0.0 0.0 0.0 0.0
+ CAF GHC.IO.Handle.FD <entire-module> 231 0 0.0 0.0 0.0 0.0
+ main Main T2552.hs:(17,1)-(20,17) 257 0 0.0 0.0 0.0 0.0
diff --git a/testsuite/tests/profiling/should_run/all.T b/testsuite/tests/profiling/should_run/all.T
index 0455d06f17..96a0d30bc6 100644
--- a/testsuite/tests/profiling/should_run/all.T
+++ b/testsuite/tests/profiling/should_run/all.T
@@ -93,7 +93,7 @@ test('T5314', [extra_ways(extra_prof_ways)], compile_and_run, [''])
 test('T680', [], compile_and_run, ['-fno-full-laziness'])

 # Note [consistent stacks]
-test('T2552', [expect_broken_for_10037], compile_and_run, [''])
+test('T2552', [], compile_and_run, [''])

 test('T949', [extra_ways(extra_prof_ways)], compile_and_run, [''])

@@ -101,7 +101,7 @@ test('T949', [extra_ways(extra_prof_ways)], compile_and_run, [''])
 # We care more about getting the optimised results right, so ignoring
 # this for now.
 test('ioprof',
-     [expect_broken_for_10037,
+     [normal,
       exit_code(1),
       omit_ways(['ghci-ext-prof']), # doesn't work with exit_code(1)
       ignore_stderr
diff --git a/testsuite/tests/profiling/should_run/ioprof.prof.sample b/testsuite/tests/profiling/should_run/ioprof.prof.sample
index 52ab8ba4d2..103207d8ca 100644
--- a/testsuite/tests/profiling/should_run/ioprof.prof.sample
+++ b/testsuite/tests/profiling/should_run/ioprof.prof.sample
@@ -1,46 +1,54 @@
- Sat Jun 4 11:59 2016 Time and Allocation Profiling Report (Final)
+ Mon May 23 13:50 2022 Time and Allocation Profiling Report (Final)

  ioprof +RTS -hc -p -RTS

  total time = 0.00 secs (0 ticks @ 1000 us, 1 processor)
- total alloc = 180,024 bytes (excludes profiling overheads)
+ total alloc = 129,248 bytes (excludes profiling overheads)

 COST CENTRE MODULE SRC %time %alloc

-CAF GHC.IO.Encoding <entire-module> 0.0 1.8
-CAF GHC.IO.Handle.FD <entire-module> 0.0 19.2
-CAF GHC.Exception <entire-module> 0.0 2.5
-main Main ioprof.hs:28:1-43 0.0 4.8
-errorM.\ Main ioprof.hs:23:22-28 0.0 68.7
+CAF Main <entire-module> 0.0 1.1
+main Main ioprof.hs:28:1-43 0.0 6.8
+errorM.\ Main ioprof.hs:23:22-28 0.0 56.8
+CAF GHC.IO.Handle.FD <entire-module> 0.0 26.9
+CAF GHC.IO.Exception <entire-module> 0.0 1.0
+CAF GHC.IO.Encoding <entire-module> 0.0 2.3
+CAF GHC.Exception <entire-module> 0.0 3.0


- individual inherited
-COST CENTRE MODULE SRC no. entries %time %alloc %time %alloc
+ individual inherited
+COST CENTRE MODULE SRC no. entries %time %alloc %time %alloc

-MAIN MAIN <built-in> 46 0 0.0 0.4 0.0 100.0
- CAF Main <entire-module> 91 0 0.0 0.9 0.0 69.8
- <*> Main ioprof.hs:20:5-14 96 1 0.0 0.0 0.0 0.0
- fmap Main ioprof.hs:16:5-16 100 1 0.0 0.0 0.0 0.0
- main Main ioprof.hs:28:1-43 92 1 0.0 0.0 0.0 68.9
- runM Main ioprof.hs:26:1-37 94 1 0.0 0.1 0.0 68.9
- bar Main ioprof.hs:31:1-20 95 1 0.0 0.1 0.0 68.8
- foo Main ioprof.hs:34:1-16 104 1 0.0 0.0 0.0 0.0
- errorM Main ioprof.hs:23:1-28 105 1 0.0 0.0 0.0 0.0
- <*> Main ioprof.hs:20:5-14 97 0 0.0 0.0 0.0 68.7
- >>= Main ioprof.hs:(11,3)-(12,50) 98 1 0.0 0.0 0.0 68.7
- >>=.\ Main ioprof.hs:(11,27)-(12,50) 99 2 0.0 0.0 0.0 68.7
- fmap Main ioprof.hs:16:5-16 103 0 0.0 0.0 0.0 0.0
- foo Main ioprof.hs:34:1-16 106 0 0.0 0.0 0.0 68.7
- errorM Main ioprof.hs:23:1-28 107 0 0.0 0.0 0.0 68.7
- errorM.\ Main ioprof.hs:23:22-28 108 1 0.0 68.7 0.0 68.7
- fmap Main ioprof.hs:16:5-16 101 0 0.0 0.0 0.0 0.0
- >>= Main ioprof.hs:(11,3)-(12,50) 102 1 0.0 0.0 0.0 0.0
- CAF GHC.IO.Exception <entire-module> 89 0 0.0 0.7 0.0 0.7
- CAF GHC.Exception <entire-module> 86 0 0.0 2.5 0.0 2.5
- CAF GHC.IO.Handle.FD <entire-module> 85 0 0.0 19.2 0.0 19.2
- CAF GHC.Conc.Signal <entire-module> 82 0 0.0 0.4 0.0 0.4
- CAF GHC.IO.Encoding <entire-module> 80 0 0.0 1.8 0.0 1.8
- CAF GHC.Conc.Sync <entire-module> 75 0 0.0 0.1 0.0 0.1
- CAF GHC.Stack.CCS <entire-module> 71 0 0.0 0.2 0.0 0.2
- CAF GHC.IO.Encoding.Iconv <entire-module> 64 0 0.0 0.1 0.0 0.1
- main Main ioprof.hs:28:1-43 93 0 0.0 4.8 0.0 4.8
+MAIN MAIN <built-in> 129 0 0.0 0.5 0.0 100.0
+ CAF GHC.Conc.Signal <entire-module> 233 0 0.0 0.5 0.0 0.5
+ CAF GHC.Conc.Sync <entire-module> 232 0 0.0 0.5 0.0 0.5
+ CAF GHC.Exception <entire-module> 215 0 0.0 3.0 0.0 3.0
+ CAF GHC.IO.Encoding <entire-module> 199 0 0.0 2.3 0.0 2.3
+ CAF GHC.IO.Encoding.Iconv <entire-module> 197 0 0.0 0.2 0.0 0.2
+ CAF GHC.IO.Exception <entire-module> 191 0 0.0 1.0 0.0 1.0
+ CAF GHC.IO.Handle.FD <entire-module> 188 0 0.0 26.9 0.0 26.9
+ CAF GHC.Stack.CCS <entire-module> 167 0 0.0 0.2 0.0 0.2
+ CAF GHC.Weak.Finalize <entire-module> 158 0 0.0 0.0 0.0 0.0
+ CAF Main <entire-module> 136 0 0.0 1.1 0.0 1.1
+ <*> Main ioprof.hs:20:5-14 261 1 0.0 0.0 0.0 0.0
+ fmap Main ioprof.hs:16:5-16 269 1 0.0 0.0 0.0 0.0
+ main Main ioprof.hs:28:1-43 258 1 0.0 0.0 0.0 0.0
+ main Main ioprof.hs:28:1-43 259 0 0.0 6.8 0.0 63.7
+ bar Main ioprof.hs:31:1-20 260 1 0.0 0.1 0.0 0.2
+ foo Main ioprof.hs:34:1-16 275 1 0.0 0.0 0.0 0.0
+ errorM Main ioprof.hs:23:1-28 276 1 0.0 0.0 0.0 0.0
+ <*> Main ioprof.hs:20:5-14 262 0 0.0 0.0 0.0 0.0
+ >>= Main ioprof.hs:(11,3)-(12,50) 263 1 0.0 0.0 0.0 0.0
+ fmap Main ioprof.hs:16:5-16 270 0 0.0 0.0 0.0 0.0
+ >>= Main ioprof.hs:(11,3)-(12,50) 271 1 0.0 0.0 0.0 0.0
+ runM Main ioprof.hs:26:1-37 264 1 0.0 0.0 0.0 56.8
+ bar Main ioprof.hs:31:1-20 265 0 0.0 0.0 0.0 56.8
+ <*> Main ioprof.hs:20:5-14 266 0 0.0 0.0 0.0 0.0
+ >>= Main ioprof.hs:(11,3)-(12,50) 267 0 0.0 0.0 0.0 0.0
+ >>=.\ Main ioprof.hs:(11,27)-(12,50) 268 1 0.0 0.0 0.0 0.0
+ fmap Main ioprof.hs:16:5-16 272 0 0.0 0.0 0.0 0.0
+ >>= Main ioprof.hs:(11,3)-(12,50) 273 0 0.0 0.0 0.0 0.0
+ >>=.\ Main ioprof.hs:(11,27)-(12,50) 274 1 0.0 0.0 0.0 0.0
+ foo Main ioprof.hs:34:1-16 277 0 0.0 0.0 0.0 56.8
+ errorM Main ioprof.hs:23:1-28 278 0 0.0 0.0 0.0 56.8
+ errorM.\ Main ioprof.hs:23:22-28 279 1 0.0 56.8 0.0 56.8