author Simon Peyton Jones <simonpj@microsoft.com> 2021-07-23 23:57:01 +0100
committer Marge Bot <ben+marge-bot@smart-cactus.org> 2022-05-30 13:44:14 -0400
commit 6656f0165a30fc2a22208532ba384fc8e2f11b46
tree ab6d5ec67947168dd86cf0b86b088fd7d91741e4
parent 0079171bae7271dc44f81c3bf26505941ee92d7e
A bunch of changes related to eta reduction

This is a large collection of changes all relating to eta
reduction, originally triggered by #18993, but there followed
a long saga.

Specifics:

* Move state-hack stuff from GHC.Types.Id (where it never belonged)
  to GHC.Core.Opt.Arity (which seems much more appropriate).

* Add a crucial mkCast in the Cast case of
  GHC.Core.Opt.Arity.eta_expand; helps with T18223

* Add clarifying notes about eta-reducing to PAPs.
  See Note [Do not eta reduce PAPs]

* I moved tryEtaReduce from GHC.Core.Utils to GHC.Core.Opt.Arity,
  where it properly belongs. See Note [Eta reduce PAPs]

* In GHC.Core.Opt.Simplify.Utils.tryEtaExpandRhs, pull out the code
  for when eta-expansion is wanted, to make wantEtaExpansion, and
  call that same function in
  GHC.Core.Opt.Simplify.simplStableUnfolding. It was previously
  inconsistent, but it's doing the same thing.

* I did a substantial refactor of ArityType; see Note [ArityType].
  This allowed me to do away with the somewhat mysterious
  takeOneShots; more generally it allows arityType to describe the
  function, leaving its clients to decide how to use that
  information.

  I made ArityType abstract, so that clients have to use functions
  to access it.

* Make GHC.Core.Opt.Simplify.Utils.rebuildLam (was stupidly called
  mkLam before) aware of the floats that the simplifier builds up,
  so that it can still do eta-reduction even if there are some
  floats. (Previously that would not happen.) That means passing
  the floats to rebuildLam, and an extra check when eta-reducing
  (etaFloatOk).

* In GHC.Core.Opt.Simplify.Utils.tryEtaExpandRhs, make use of
  call-info in the idDemandInfo of the binder, as well as the
  CallArity info. The occurrence analyser did this but we were
  failing to take advantage here.

  In the end I moved the heavy lifting to
  GHC.Core.Opt.Arity.findRhsArity; see
  Note [Combining arityType with demand info], and functions
  idDemandOneShots and combineWithDemandOneShots.

  (These changes partly drove my refactoring of ArityType.)

* In GHC.Core.Opt.Arity.findRhsArity
  * I'm now taking account of the demand on the binder to give
    extra one-shot info. E.g. if the fn is always called with two
    args, we can give better one-shot info on the binders
    than if we just look at the RHS.

  * Don't do any fixpointing in the non-recursive
    case -- simple short cut.

  * Trim arity inside the loop. See Note [Trim arity inside the loop]

* Make SimpleOpt respect the eta-reduction flag
  (Some associated refactoring here.)

* I made the CallCtxt which the Simplifier uses distinguish between
  recursive and non-recursive right-hand sides.
     data CallCtxt = ... | RhsCtxt RecFlag | ...
  It affects only one thing:
     - We call an RHS context interesting only if it is non-recursive
       see Note [RHS of lets] in GHC.Core.Unfold

* Remove eta-reduction in GHC.CoreToStg.Prep, a welcome
  simplification. See Note [No eta reduction needed in rhsToBody]
  in GHC.CoreToStg.Prep.

Other incidental changes

* Fix a fairly long-standing outright bug in the ApplyToVal case of
  GHC.Core.Opt.Simplify.mkDupableContWithDmds. I was failing to take
  the tail of 'dmds' in the recursive call, which meant the demands
  were All Wrong. I have no idea why this has not caused problems
  before now.

* Delete dead function GHC.Core.Opt.Simplify.Utils.contIsRhsOrArg

Metrics: compile_time/bytes allocated

Test                                 Metric     Baseline       New value      Change
---------------------------------------------------------------------------------------
MultiLayerModulesTH_OneShot(normal)  ghc/alloc  2,743,297,692  2,619,762,992   -4.5% GOOD
T18223(normal)                       ghc/alloc  1,103,161,360    972,415,992  -11.9% GOOD
T3064(normal)                        ghc/alloc    201,222,500    184,085,360   -8.5% GOOD
T8095(normal)                        ghc/alloc  3,216,292,528  3,254,416,960   +1.2%
T9630(normal)                        ghc/alloc  1,514,131,032  1,557,719,312   +2.9% BAD
parsing001(normal)                   ghc/alloc    530,409,812    525,077,696   -1.0%
geo. mean                                                                      -0.1%

Nofib:

Program            Size   Allocs  Runtime  Elapsed  TotalMem
--------------------------------------------------------------------------------
banner            +0.0%    +0.4%    -8.9%    -8.7%      0.0%
exact-reals       +0.0%    -7.4%   -36.3%   -37.4%      0.0%
fannkuch-redux    +0.0%    -0.1%    -1.0%    -1.0%      0.0%
fft2              -0.1%    -0.2%   -17.8%   -19.2%      0.0%
fluid             +0.0%    -1.3%    -2.1%    -2.1%      0.0%
gg                -0.0%    +2.2%    -0.2%    -0.1%      0.0%
spectral-norm     +0.1%    -0.2%     0.0%     0.0%      0.0%
tak               +0.0%    -0.3%    -9.8%    -9.8%      0.0%
x2n1              +0.0%    -0.2%    -3.2%    -3.2%      0.0%
--------------------------------------------------------------------------------
Min               -3.5%    -7.4%   -58.7%   -59.9%      0.0%
Max               +0.1%    +2.2%   +32.9%   +32.9%      0.0%
Geometric Mean    -0.0%    -0.1%   -14.2%   -14.8%     -0.0%

Metric Decrease:
    MultiLayerModulesTH_OneShot
    T18223
    T3064
    T15185
    T14766

Metric Increase:
    T9630
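
The mkDupableContWithDmds fix listed under "Other incidental changes" is an instance of a common recursion mistake: walking two lists in step but passing one of them on unchanged. The sketch below illustrates only that shape. The names Demand, pairBuggy and pairFixed are hypothetical stand-ins and are not GHC's actual types or functions; the fallback to Lazy when the demand list runs out is likewise an assumption made for the example.

```haskell
module Main where

-- Hypothetical stand-in for a demand; GHC's real Demand type is far richer.
data Demand = Strict | Lazy
  deriving (Eq, Show)

-- Buggy shape: the recursive call reuses the whole demand list,
-- so every argument ends up paired with the first demand.
pairBuggy :: [Demand] -> [arg] -> [(arg, Demand)]
pairBuggy _          []     = []
pairBuggy []         (a:as) = (a, Lazy) : pairBuggy [] as
pairBuggy dmds@(d:_) (a:as) = (a, d) : pairBuggy dmds as

-- Fixed shape: the recursive call takes the tail of the demand list,
-- so the n-th argument is paired with the n-th demand.
pairFixed :: [Demand] -> [arg] -> [(arg, Demand)]
pairFixed _      []     = []
pairFixed []     (a:as) = (a, Lazy) : pairFixed [] as
pairFixed (d:ds) (a:as) = (a, d) : pairFixed ds as

main :: IO ()
main = do
  let dmds = [Strict, Lazy]
      args = ["x", "y"]
  print (pairBuggy dmds args)  -- [("x",Strict),("y",Strict)]
  print (pairFixed dmds args)  -- [("x",Strict),("y",Lazy)]
```

In the buggy version the second argument is wrongly treated as Strict, which is the kind of misattribution the commit message describes as the demands being "All Wrong".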
Diffstat (limited to 'testsuite/tests/profiling/should_run')
-rw-r--r-- testsuite/tests/profiling/should_run/T2552.prof.sample | 50
-rw-r--r-- testsuite/tests/profiling/should_run/all.T | 4
-rw-r--r-- testsuite/tests/profiling/should_run/ioprof.prof.sample | 80
3 files changed, 71 insertions, 63 deletions
diff --git a/testsuite/tests/profiling/should_run/T2552.prof.sample b/testsuite/tests/profiling/should_run/T2552.prof.sample
index 7ed927f6db..c8bfad1ecf 100644
--- a/testsuite/tests/profiling/should_run/T2552.prof.sample
+++ b/testsuite/tests/profiling/should_run/T2552.prof.sample
@@ -1,36 +1,36 @@
- Sat Jun 4 11:59 2016 Time and Allocation Profiling Report (Final)
+ Mon Apr 25 16:27 2022 Time and Allocation Profiling Report (Final)
T2552 +RTS -hc -p -RTS
- total time = 0.09 secs (90 ticks @ 1000 us, 1 processor)
- total alloc = 123,465,848 bytes (excludes profiling overheads)
+ total time = 0.05 secs (49 ticks @ 1000 us, 1 processor)
+ total alloc = 74,099,440 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc
-fib1.fib1'.nfib Main T2552.hs:5:9-61 37.8 33.3
-fib2'.nfib Main T2552.hs:10:5-57 31.1 33.3
-fib3'.nfib Main T2552.hs:15:5-57 31.1 33.3
+fib1.fib1'.nfib Main T2552.hs:5:9-61 34.7 33.3
+fib3'.nfib Main T2552.hs:15:5-57 32.7 33.3
+fib2'.nfib Main T2552.hs:10:5-57 32.7 33.3
individual inherited
COST CENTRE MODULE SRC no. entries %time %alloc %time %alloc
-MAIN MAIN <built-in> 45 0 0.0 0.0 100.0 100.0
- CAF Main <entire-module> 89 0 0.0 0.0 100.0 100.0
- main Main T2552.hs:(17,1)-(20,17) 90 1 0.0 0.0 100.0 100.0
- fib1 Main T2552.hs:(1,1)-(5,61) 92 1 0.0 0.0 37.8 33.3
- fib1.fib1' Main T2552.hs:(3,5)-(5,61) 93 1 0.0 0.0 37.8 33.3
- nfib' Main T2552.hs:3:35-40 94 1 0.0 0.0 37.8 33.3
- fib1.fib1'.nfib Main T2552.hs:5:9-61 95 1028457 37.8 33.3 37.8 33.3
- fib2 Main T2552.hs:7:1-16 96 1 0.0 0.0 31.1 33.3
- fib2' Main T2552.hs:(8,1)-(10,57) 97 1 0.0 0.0 31.1 33.3
- fib2'.nfib Main T2552.hs:10:5-57 98 1028457 31.1 33.3 31.1 33.3
- fib3 Main T2552.hs:12:1-12 99 1 0.0 0.0 0.0 0.0
- fib3' Main T2552.hs:(13,1)-(15,57) 100 1 0.0 0.0 31.1 33.3
- fib3'.nfib Main T2552.hs:15:5-57 101 1028457 31.1 33.3 31.1 33.3
- CAF GHC.IO.Handle.FD <entire-module> 84 0 0.0 0.0 0.0 0.0
- CAF GHC.IO.Handle.Text <entire-module> 83 0 0.0 0.0 0.0 0.0
- CAF GHC.Conc.Signal <entire-module> 81 0 0.0 0.0 0.0 0.0
- CAF GHC.IO.Encoding <entire-module> 78 0 0.0 0.0 0.0 0.0
- CAF GHC.IO.Encoding.Iconv <entire-module> 64 0 0.0 0.0 0.0 0.0
- main Main T2552.hs:(17,1)-(20,17) 91 0 0.0 0.0 0.0 0.0
+MAIN MAIN <built-in> 128 0 0.0 0.0 100.0 100.0
+ CAF Main <entire-module> 255 0 0.0 0.0 100.0 99.9
+ fib3 Main T2552.hs:12:1-12 265 1 0.0 0.0 0.0 0.0
+ main Main T2552.hs:(17,1)-(20,17) 256 1 0.0 0.0 100.0 99.9
+ fib1 Main T2552.hs:(1,1)-(5,61) 258 1 0.0 0.0 34.7 33.3
+ fib1.fib1' Main T2552.hs:(3,5)-(5,61) 259 1 0.0 0.0 34.7 33.3
+ nfib' Main T2552.hs:3:35-40 260 1 0.0 0.0 34.7 33.3
+ fib1.fib1'.nfib Main T2552.hs:5:9-61 261 1028457 34.7 33.3 34.7 33.3
+ fib2 Main T2552.hs:7:1-16 262 1 0.0 0.0 32.7 33.3
+ fib2' Main T2552.hs:(8,1)-(10,57) 263 1 0.0 0.0 32.7 33.3
+ fib2'.nfib Main T2552.hs:10:5-57 264 1028457 32.7 33.3 32.7 33.3
+ fib3 Main T2552.hs:12:1-12 266 0 0.0 0.0 32.7 33.3
+ fib3' Main T2552.hs:(13,1)-(15,57) 267 1 0.0 0.0 32.7 33.3
+ fib3'.nfib Main T2552.hs:15:5-57 268 1028457 32.7 33.3 32.7 33.3
+ CAF GHC.Conc.Signal <entire-module> 250 0 0.0 0.0 0.0 0.0
+ CAF GHC.IO.Encoding <entire-module> 241 0 0.0 0.0 0.0 0.0
+ CAF GHC.IO.Encoding.Iconv <entire-module> 239 0 0.0 0.0 0.0 0.0
+ CAF GHC.IO.Handle.FD <entire-module> 231 0 0.0 0.0 0.0 0.0
+ main Main T2552.hs:(17,1)-(20,17) 257 0 0.0 0.0 0.0 0.0
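
For readers unfamiliar with the terminology, the following sketch shows an eta-expanded and an eta-reduced definition side by side. It is not the source of T2552.hs, which is not part of this diff; the names nfib, fibExpanded and fibReduced are hypothetical. The SRC spans in the profile above (7:1-16 for fib2, 12:1-12 for fib3) are consistent with fib2 being written with an explicit argument and fib3 without one, but that is an inference, not something the diff states.

```haskell
module Main where

-- Hypothetical worker, standing in for the nfib-style functions
-- named in the profile.
nfib :: Int -> Int
nfib n = if n < 2 then 1 else nfib (n - 1) + nfib (n - 2) + 1

-- Eta-expanded: the argument is written out, so the binding is
-- syntactically a function of one argument.
fibExpanded :: Int -> Int
fibExpanded n = nfib n

-- Eta-reduced: no syntactic argument. As written, this is a top-level
-- binding with no arguments, which is what profiling reports call a CAF.
fibReduced :: Int -> Int
fibReduced = nfib

main :: IO ()
main = print (fibExpanded 10 == fibReduced 10)
```

The two definitions compute the same results; they differ in arity as written, and the optimiser may convert one form into the other. One plausible reading of the new sample is that fib3 now gets its single entry under the module's CAF cost centre rather than under main, but the diff alone does not confirm the cause.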
diff --git a/testsuite/tests/profiling/should_run/all.T b/testsuite/tests/profiling/should_run/all.T
index 0455d06f17..96a0d30bc6 100644
--- a/testsuite/tests/profiling/should_run/all.T
+++ b/testsuite/tests/profiling/should_run/all.T
@@ -93,7 +93,7 @@ test('T5314', [extra_ways(extra_prof_ways)], compile_and_run, [''])
test('T680', [], compile_and_run,
['-fno-full-laziness']) # Note [consistent stacks]
-test('T2552', [expect_broken_for_10037], compile_and_run, [''])
+test('T2552', [], compile_and_run, [''])
test('T949', [extra_ways(extra_prof_ways)], compile_and_run, [''])
@@ -101,7 +101,7 @@ test('T949', [extra_ways(extra_prof_ways)], compile_and_run, [''])
# We care more about getting the optimised results right, so ignoring
# this for now.
test('ioprof',
- [expect_broken_for_10037,
+ [normal,
exit_code(1),
omit_ways(['ghci-ext-prof']), # doesn't work with exit_code(1)
ignore_stderr
diff --git a/testsuite/tests/profiling/should_run/ioprof.prof.sample b/testsuite/tests/profiling/should_run/ioprof.prof.sample
index 52ab8ba4d2..103207d8ca 100644
--- a/testsuite/tests/profiling/should_run/ioprof.prof.sample
+++ b/testsuite/tests/profiling/should_run/ioprof.prof.sample
@@ -1,46 +1,54 @@
- Sat Jun 4 11:59 2016 Time and Allocation Profiling Report (Final)
+ Mon May 23 13:50 2022 Time and Allocation Profiling Report (Final)
ioprof +RTS -hc -p -RTS
total time = 0.00 secs (0 ticks @ 1000 us, 1 processor)
- total alloc = 180,024 bytes (excludes profiling overheads)
+ total alloc = 129,248 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc
-CAF GHC.IO.Encoding <entire-module> 0.0 1.8
-CAF GHC.IO.Handle.FD <entire-module> 0.0 19.2
-CAF GHC.Exception <entire-module> 0.0 2.5
-main Main ioprof.hs:28:1-43 0.0 4.8
-errorM.\ Main ioprof.hs:23:22-28 0.0 68.7
+CAF Main <entire-module> 0.0 1.1
+main Main ioprof.hs:28:1-43 0.0 6.8
+errorM.\ Main ioprof.hs:23:22-28 0.0 56.8
+CAF GHC.IO.Handle.FD <entire-module> 0.0 26.9
+CAF GHC.IO.Exception <entire-module> 0.0 1.0
+CAF GHC.IO.Encoding <entire-module> 0.0 2.3
+CAF GHC.Exception <entire-module> 0.0 3.0
- individual inherited
-COST CENTRE MODULE SRC no. entries %time %alloc %time %alloc
+ individual inherited
+COST CENTRE MODULE SRC no. entries %time %alloc %time %alloc
-MAIN MAIN <built-in> 46 0 0.0 0.4 0.0 100.0
- CAF Main <entire-module> 91 0 0.0 0.9 0.0 69.8
- <*> Main ioprof.hs:20:5-14 96 1 0.0 0.0 0.0 0.0
- fmap Main ioprof.hs:16:5-16 100 1 0.0 0.0 0.0 0.0
- main Main ioprof.hs:28:1-43 92 1 0.0 0.0 0.0 68.9
- runM Main ioprof.hs:26:1-37 94 1 0.0 0.1 0.0 68.9
- bar Main ioprof.hs:31:1-20 95 1 0.0 0.1 0.0 68.8
- foo Main ioprof.hs:34:1-16 104 1 0.0 0.0 0.0 0.0
- errorM Main ioprof.hs:23:1-28 105 1 0.0 0.0 0.0 0.0
- <*> Main ioprof.hs:20:5-14 97 0 0.0 0.0 0.0 68.7
- >>= Main ioprof.hs:(11,3)-(12,50) 98 1 0.0 0.0 0.0 68.7
- >>=.\ Main ioprof.hs:(11,27)-(12,50) 99 2 0.0 0.0 0.0 68.7
- fmap Main ioprof.hs:16:5-16 103 0 0.0 0.0 0.0 0.0
- foo Main ioprof.hs:34:1-16 106 0 0.0 0.0 0.0 68.7
- errorM Main ioprof.hs:23:1-28 107 0 0.0 0.0 0.0 68.7
- errorM.\ Main ioprof.hs:23:22-28 108 1 0.0 68.7 0.0 68.7
- fmap Main ioprof.hs:16:5-16 101 0 0.0 0.0 0.0 0.0
- >>= Main ioprof.hs:(11,3)-(12,50) 102 1 0.0 0.0 0.0 0.0
- CAF GHC.IO.Exception <entire-module> 89 0 0.0 0.7 0.0 0.7
- CAF GHC.Exception <entire-module> 86 0 0.0 2.5 0.0 2.5
- CAF GHC.IO.Handle.FD <entire-module> 85 0 0.0 19.2 0.0 19.2
- CAF GHC.Conc.Signal <entire-module> 82 0 0.0 0.4 0.0 0.4
- CAF GHC.IO.Encoding <entire-module> 80 0 0.0 1.8 0.0 1.8
- CAF GHC.Conc.Sync <entire-module> 75 0 0.0 0.1 0.0 0.1
- CAF GHC.Stack.CCS <entire-module> 71 0 0.0 0.2 0.0 0.2
- CAF GHC.IO.Encoding.Iconv <entire-module> 64 0 0.0 0.1 0.0 0.1
- main Main ioprof.hs:28:1-43 93 0 0.0 4.8 0.0 4.8
+MAIN MAIN <built-in> 129 0 0.0 0.5 0.0 100.0
+ CAF GHC.Conc.Signal <entire-module> 233 0 0.0 0.5 0.0 0.5
+ CAF GHC.Conc.Sync <entire-module> 232 0 0.0 0.5 0.0 0.5
+ CAF GHC.Exception <entire-module> 215 0 0.0 3.0 0.0 3.0
+ CAF GHC.IO.Encoding <entire-module> 199 0 0.0 2.3 0.0 2.3
+ CAF GHC.IO.Encoding.Iconv <entire-module> 197 0 0.0 0.2 0.0 0.2
+ CAF GHC.IO.Exception <entire-module> 191 0 0.0 1.0 0.0 1.0
+ CAF GHC.IO.Handle.FD <entire-module> 188 0 0.0 26.9 0.0 26.9
+ CAF GHC.Stack.CCS <entire-module> 167 0 0.0 0.2 0.0 0.2
+ CAF GHC.Weak.Finalize <entire-module> 158 0 0.0 0.0 0.0 0.0
+ CAF Main <entire-module> 136 0 0.0 1.1 0.0 1.1
+ <*> Main ioprof.hs:20:5-14 261 1 0.0 0.0 0.0 0.0
+ fmap Main ioprof.hs:16:5-16 269 1 0.0 0.0 0.0 0.0
+ main Main ioprof.hs:28:1-43 258 1 0.0 0.0 0.0 0.0
+ main Main ioprof.hs:28:1-43 259 0 0.0 6.8 0.0 63.7
+ bar Main ioprof.hs:31:1-20 260 1 0.0 0.1 0.0 0.2
+ foo Main ioprof.hs:34:1-16 275 1 0.0 0.0 0.0 0.0
+ errorM Main ioprof.hs:23:1-28 276 1 0.0 0.0 0.0 0.0
+ <*> Main ioprof.hs:20:5-14 262 0 0.0 0.0 0.0 0.0
+ >>= Main ioprof.hs:(11,3)-(12,50) 263 1 0.0 0.0 0.0 0.0
+ fmap Main ioprof.hs:16:5-16 270 0 0.0 0.0 0.0 0.0
+ >>= Main ioprof.hs:(11,3)-(12,50) 271 1 0.0 0.0 0.0 0.0
+ runM Main ioprof.hs:26:1-37 264 1 0.0 0.0 0.0 56.8
+ bar Main ioprof.hs:31:1-20 265 0 0.0 0.0 0.0 56.8
+ <*> Main ioprof.hs:20:5-14 266 0 0.0 0.0 0.0 0.0
+ >>= Main ioprof.hs:(11,3)-(12,50) 267 0 0.0 0.0 0.0 0.0
+ >>=.\ Main ioprof.hs:(11,27)-(12,50) 268 1 0.0 0.0 0.0 0.0
+ fmap Main ioprof.hs:16:5-16 272 0 0.0 0.0 0.0 0.0
+ >>= Main ioprof.hs:(11,3)-(12,50) 273 0 0.0 0.0 0.0 0.0
+ >>=.\ Main ioprof.hs:(11,27)-(12,50) 274 1 0.0 0.0 0.0 0.0
+ foo Main ioprof.hs:34:1-16 277 0 0.0 0.0 0.0 56.8
+ errorM Main ioprof.hs:23:1-28 278 0 0.0 0.0 0.0 56.8
+ errorM.\ Main ioprof.hs:23:22-28 279 1 0.0 56.8 0.0 56.8