diff options
| author | Simon Peyton Jones <simonpj@microsoft.com> | 2017-02-14 15:08:24 +0000 |
|---|---|---|
| committer | Simon Peyton Jones <simonpj@microsoft.com> | 2017-02-17 16:26:53 +0000 |
| commit | 912e71eb3b4ec91e805ecf2236d1033e55e2933a (patch) | |
| tree | 674e219d82ecded43241fd1f1f35f690ebc4f297 | |
| parent | 7188cd13f8e54efa764d52ca016b87b3669b29f5 (diff) | |
| download | haskell-912e71eb3b4ec91e805ecf2236d1033e55e2933a.tar.gz | |
The Early Inline Patch
This very small patch switches on sm_inline even in the InitialPhase
(aka "gentle" phase). There is no reason not to... and the results
are astonishing.
I think the peformance of GHC itself improves by about 5%; and some
programs get much smaller, quicker. Result: across the board irmprovements in
compile time performance. Here are the changes in perf/compiler;
the numbers are decreases in compiler bytes-allocated:
3% T5837
7% parsing001
9% T12234
35% T9020
9% T3064
13% T9961
20% T13056
5% T9872d
5% T9872c
5% T9872b
7% T9872a
5% T783
35% T12227
20% T1969
Plus in perf/should_run
5% lazy-bs-alloc
It wasn't as easy as it sounds: I did a raft of preparatory work in
earlier patches. But it's great!
23 files changed, 218 insertions, 152 deletions
diff --git a/compiler/simplCore/SimplCore.hs b/compiler/simplCore/SimplCore.hs index 3c6c3115bc..34f49ad074 100644 --- a/compiler/simplCore/SimplCore.hs +++ b/compiler/simplCore/SimplCore.hs @@ -132,6 +132,7 @@ getCoreToDo dflags rules_on = gopt Opt_EnableRewriteRules dflags eta_expand_on = gopt Opt_DoLambdaEtaExpansion dflags ww_on = gopt Opt_WorkerWrapper dflags + vectorise_on = gopt Opt_Vectorise dflags static_ptrs = xopt LangExt.StaticPointers dflags maybe_rule_check phase = runMaybe rule_check (CoreDoRuleCheck phase) @@ -160,12 +161,12 @@ getCoreToDo dflags -- We need to eliminate these common sub expressions before their definitions -- are inlined in phase 2. The CSE introduces lots of v1 = v2 bindings, -- so we also run simpl_gently to inline them. - ++ (if gopt Opt_Vectorise dflags && phase == 3 + ++ (if vectorise_on && phase == 3 then [CoreCSE, simpl_gently] else []) vectorisation - = runWhen (gopt Opt_Vectorise dflags) $ + = runWhen vectorise_on $ CoreDoPasses [ simpl_gently, CoreDoVectorisation ] -- By default, we have 2 phases before phase 0. @@ -188,7 +189,8 @@ getCoreToDo dflags (base_mode { sm_phase = InitialPhase , sm_names = ["Gentle"] , sm_rules = rules_on -- Note [RULEs enabled in SimplGently] - , sm_inline = False + , sm_inline = not vectorise_on + -- See Note [Inline in InitialPhase] , sm_case_case = False }) -- Don't do case-of-case transformations. -- This makes full laziness work better @@ -381,7 +383,35 @@ addPluginPasses builtin_passes query_plug todos (_, plug, options) = installCoreToDos plug options todos #endif -{- +{- Note [Inline in InitialPhase] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +In GHC 8 and earlier we did not inline anything in the InitialPhase. But that is +confusing for users because when they say INLINE they expect the function to inline +right away. + +So now we do inlining immediately, even in the InitialPhase, assuming that the +Id's Activation allows it. + +This is a surprisingly big deal. Compiler performance improved a lot +when I made this change: + + perf/compiler/T5837.run T5837 [stat too good] (normal) + perf/compiler/parsing001.run parsing001 [stat too good] (normal) + perf/compiler/T12234.run T12234 [stat too good] (optasm) + perf/compiler/T9020.run T9020 [stat too good] (optasm) + perf/compiler/T3064.run T3064 [stat too good] (normal) + perf/compiler/T9961.run T9961 [stat too good] (normal) + perf/compiler/T13056.run T13056 [stat too good] (optasm) + perf/compiler/T9872d.run T9872d [stat too good] (normal) + perf/compiler/T783.run T783 [stat too good] (normal) + perf/compiler/T12227.run T12227 [stat too good] (normal) + perf/should_run/lazy-bs-alloc.run lazy-bs-alloc [stat too good] (normal) + perf/compiler/T1969.run T1969 [stat too good] (normal) + perf/compiler/T9872a.run T9872a [stat too good] (normal) + perf/compiler/T9872c.run T9872c [stat too good] (normal) + perf/compiler/T9872b.run T9872b [stat too good] (normal) + perf/compiler/T9872d.run T9872d [stat too good] (normal) + Note [RULEs enabled in SimplGently] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ RULES are enabled when doing "gentle" simplification. Two reasons: @@ -432,7 +462,7 @@ doCorePass CoreLiberateCase = {-# SCC "LiberateCase" #-} doPassD liberateCase doCorePass CoreDoFloatInwards = {-# SCC "FloatInwards" #-} - doPassD floatInwards + floatInwards doCorePass (CoreDoFloatOutwards f) = {-# SCC "FloatOutwards" #-} doPassDUM (floatOutwards f) diff --git a/compiler/simplCore/SimplUtils.hs b/compiler/simplCore/SimplUtils.hs index 7deaf5bf0c..923a467c00 100644 --- a/compiler/simplCore/SimplUtils.hs +++ b/compiler/simplCore/SimplUtils.hs @@ -721,7 +721,8 @@ updModeForRules current_mode {- Note [Simplifying rules] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -When simplifying a rule, refrain from any inlining or applying of other RULES. +When simplifying a rule LHS, refrain from /any/ inlining or applying +of other RULES. Doing anything to the LHS is plain confusing, because it means that what the rule matches is not what the user wrote. c.f. Trac #10595, and #10528. @@ -868,11 +869,17 @@ continuation. -} activeUnfolding :: SimplEnv -> Id -> Bool -activeUnfolding env - | not (sm_inline mode) = active_unfolding_minimal - | otherwise = case sm_phase mode of - InitialPhase -> active_unfolding_gentle - Phase n -> active_unfolding n +activeUnfolding env id + | isCompulsoryUnfolding (realIdUnfolding id) + = True -- Even sm_inline can't override compulsory unfoldings + | otherwise + = isActive (sm_phase mode) (idInlineActivation id) + && sm_inline mode + -- `or` isStableUnfolding (realIdUnfolding id) + -- Inline things when + -- (a) they are active + -- (b) sm_inline says so, except that for stable unfoldings + -- (ie pragmas) we inline anyway where mode = getMode env @@ -891,35 +898,13 @@ getUnfoldingInRuleMatch env id_unf id | unf_is_active id = idUnfolding id | otherwise = NoUnfolding unf_is_active id - | not (sm_rules mode) = active_unfolding_minimal id + | not (sm_rules mode) = -- active_unfolding_minimal id + isStableUnfolding (realIdUnfolding id) + -- Do we even need to test this? I think this InScopeEnv + -- is only consulted if activeRule returns True, which + -- never happens if sm_rules is False | otherwise = isActive (sm_phase mode) (idInlineActivation id) -active_unfolding_minimal :: Id -> Bool --- Compuslory unfoldings only --- Ignore SimplGently, because we want to inline regardless; --- the Id has no top-level binding at all --- --- NB: we used to have a second exception, for data con wrappers. --- On the grounds that we use gentle mode for rule LHSs, and --- they match better when data con wrappers are inlined. --- But that only really applies to the trivial wrappers (like (:)), --- and they are now constructed as Compulsory unfoldings (in MkId) --- so they'll happen anyway. -active_unfolding_minimal id = isCompulsoryUnfolding (realIdUnfolding id) - -active_unfolding :: PhaseNum -> Id -> Bool -active_unfolding n id = isActiveIn n (idInlineActivation id) - -active_unfolding_gentle :: Id -> Bool --- Anything that is early-active --- See Note [Gentle mode] -active_unfolding_gentle id - = isInlinePragma prag - && isEarlyActive (inlinePragmaActivation prag) - -- NB: wrappers are not early-active - where - prag = idInlinePragma id - ---------------------- activeRule :: SimplEnv -> Activation -> Bool -- Nothing => No rules at all @@ -1027,10 +1012,11 @@ Example ...fInt...fInt...fInt... Here f occurs just once, in the RHS of fInt. But if we inline it there -we'll lose the opportunity to inline at each of fInt's call sites. -The INLINE pragma will only inline when the application is saturated -for exactly this reason; and we don't want PreInlineUnconditionally -to second-guess it. A live example is Trac #3736. +it might make fInt look big, and we'll lose the opportunity to inline f +at each of fInt's call sites. The INLINE pragma will only inline when +the application is saturated for exactly this reason; and we don't +want PreInlineUnconditionally to second-guess it. A live example is +Trac #3736. c.f. Note [Stable unfoldings and postInlineUnconditionally] Note [Top-level bottoming Ids] diff --git a/testsuite/tests/deSugar/should_compile/T2431.stderr b/testsuite/tests/deSugar/should_compile/T2431.stderr index 83826408cf..9ae527c942 100644 --- a/testsuite/tests/deSugar/should_compile/T2431.stderr +++ b/testsuite/tests/deSugar/should_compile/T2431.stderr @@ -4,7 +4,7 @@ Result size of Tidy Core = {terms: 44, types: 34, coercions: 1, joins: 0/0} -- RHS size: {terms: 2, types: 4, coercions: 1, joins: 0/0} -T2431.$WRefl [InlPrag=INLINE] :: forall a. a :~: a +T2431.$WRefl [InlPrag=INLINE[2]] :: forall a. a :~: a [GblId[DataConWrapper], Caf=NoCafRefs, Str=m, diff --git a/testsuite/tests/indexed-types/should_compile/T7837.stderr b/testsuite/tests/indexed-types/should_compile/T7837.stderr index a4d96b1dfb..7fd0a48058 100644 --- a/testsuite/tests/indexed-types/should_compile/T7837.stderr +++ b/testsuite/tests/indexed-types/should_compile/T7837.stderr @@ -1,6 +1,5 @@ Rule fired: Class op signum Rule fired: Class op abs -Rule fired: normalize/Double -Rule fired: Class op HEq_sc Rule fired: Class op HEq_sc +Rule fired: normalize/Double Rule fired: Class op HEq_sc diff --git a/testsuite/tests/numeric/should_compile/T7116.stdout b/testsuite/tests/numeric/should_compile/T7116.stdout index ee136c24e5..681d171350 100644 --- a/testsuite/tests/numeric/should_compile/T7116.stdout +++ b/testsuite/tests/numeric/should_compile/T7116.stdout @@ -1,7 +1,7 @@ ==================== Tidy Core ==================== Result size of Tidy Core - = {terms: 50, types: 25, coercions: 0, joins: 0/0} + = {terms: 36, types: 19, coercions: 0, joins: 0/0} -- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0} T7116.$trModule4 :: GHC.Prim.Addr# @@ -64,7 +64,7 @@ dr = \ (x :: Double) -> case x of { GHC.Types.D# x1 -> GHC.Types.D# (GHC.Prim.+## x1 x1) } --- RHS size: {terms: 8, types: 3, coercions: 0, joins: 0/0} +-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0} dl :: Double -> Double [GblId, Arity=1, @@ -75,9 +75,7 @@ dl :: Double -> Double Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=False) Tmpl= \ (x [Occ=Once!] :: Double) -> case x of { GHC.Types.D# y -> GHC.Types.D# (GHC.Prim.+## y y) }}] -dl - = \ (x :: Double) -> - case x of { GHC.Types.D# y -> GHC.Types.D# (GHC.Prim.+## y y) } +dl = dr -- RHS size: {terms: 8, types: 3, coercions: 0, joins: 0/0} fr :: Float -> Float @@ -98,7 +96,7 @@ fr GHC.Types.F# (GHC.Prim.plusFloat# x1 x1) } --- RHS size: {terms: 8, types: 3, coercions: 0, joins: 0/0} +-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0} fl :: Float -> Float [GblId, Arity=1, @@ -111,11 +109,7 @@ fl :: Float -> Float case x of { GHC.Types.F# y -> GHC.Types.F# (GHC.Prim.plusFloat# y y) }}] -fl - = \ (x :: Float) -> - case x of { GHC.Types.F# y -> - GHC.Types.F# (GHC.Prim.plusFloat# y y) - } +fl = fr diff --git a/testsuite/tests/perf/compiler/T4007.stdout b/testsuite/tests/perf/compiler/T4007.stdout index e7ccd42a05..59c81d9b51 100644 --- a/testsuite/tests/perf/compiler/T4007.stdout +++ b/testsuite/tests/perf/compiler/T4007.stdout @@ -1,12 +1,6 @@ -Rule fired: unpack -Rule fired: Class op >> -Rule fired: Class op return -Rule fired: Class op foldr -Rule fired: Class op >> -Rule fired: Class op return -Rule fired: Class op foldr Rule fired: Class op >> Rule fired: Class op return +Rule fired: unpack Rule fired: Class op foldr Rule fired: fold/build Rule fired: <# diff --git a/testsuite/tests/perf/compiler/all.T b/testsuite/tests/perf/compiler/all.T index 0592bd6800..cb86970563 100644 --- a/testsuite/tests/perf/compiler/all.T +++ b/testsuite/tests/perf/compiler/all.T @@ -53,6 +53,7 @@ test('T1969', # 2015-10-28 55, (amd64/Linux) emit Typeable at definition site # 2016-10-20 68, (amd64/Linux) allow top-level string literals # See the comment 16 on #8472. + compiler_stats_num_field('max_bytes_used', [(platform('i386-unknown-mingw32'), 5719436, 20), # 2010-05-17 5717704 (x86/Windows) @@ -67,6 +68,7 @@ test('T1969', # 2014-06-29 5949188 (x86/Linux) # 2015-07-11 6241108 (x86/Linux, 64bit machine) use +RTS -G1 # 2016-04-06 9093608 (x86/Linux, 64bit machine) + (wordsize(64), 19924328, 15)]), # 2014-09-10 10463640, 10 # post-AMP-update (somewhat stabelish) # looks like the peak is around ~10M, but we're @@ -80,6 +82,8 @@ test('T1969', # 2015-10-28 15017528 (amd64/Linux) emit typeable at definition site # 2016-10-12 17285216 (amd64/Linux) it's not entirely clear why # 2017-02-01 19924328 (amd64/Linux) Join points (#12988) + # 2017-02-14 16393848 Early inline patch + compiler_stats_num_field('bytes allocated', [(platform('i386-unknown-mingw32'), 301784492, 5), # 215582916 (x86/Windows) @@ -96,7 +100,8 @@ test('T1969', # 2014-06-29 303300692 (x86/Linux) # 2015-07-11 288699104 (x86/Linux, 64-bit machine) use +RTS -G1 # 2016-04-06 344730660 (x86/Linux, 64-bit machine) - (wordsize(64), 756138176, 5)]), + + (wordsize(64), 611386528, 5)]), # 17/11/2009 434845560 (amd64/Linux) # 08/12/2009 459776680 (amd64/Linux) # 17/05/2010 519377728 (amd64/Linux) @@ -117,6 +122,7 @@ test('T1969', # 03/06/2015 581460896 (x86_64/Linux) use +RTS -G1 # 28/10/2015 695430728 (x86_64/Linux) emit Typeable at definition site # 28/10/2015 756138176 (x86_64/Linux) inst-decl defaults go via typechecker (#12220) + # 2017-02-14 611386528 Early inline patch: 20% improvement only_ways(['normal']), extra_hc_opts('-dcore-lint -static'), @@ -312,7 +318,7 @@ test('T3064', # 2014-12-22: 122836340 (Windows) Death to silent superclasses # 2016-04-06: 153261024 (x86/Linux) probably wildcard refactor - (wordsize(64), 287460128, 5)]), + (wordsize(64), 259815560, 5)]), # (amd64/Linux) (28/06/2011): 73259544 # (amd64/Linux) (07/02/2013): 224798696 # (amd64/Linux) (02/08/2013): 236404384, increase from roles @@ -335,6 +341,7 @@ test('T3064', # Tracked as #11196 # (amd64/Linux) (15/4/2016): 287460128 Improvement due to using coercionKind instead # of zonkTcType (Trac #11882) + # 2017-02-14: 259815560 Early inline patch: 9% improvement ################################### # deactivated for now, as this metric became too volatile recently @@ -440,10 +447,11 @@ test('T5631', test('parsing001', [compiler_stats_num_field('bytes allocated', [(wordsize(32), 274000576, 10), - (wordsize(64), 493730288, 5)]), + (wordsize(64), 463931280, 5)]), # expected value: 587079016 (amd64/Linux) # 2016-09-01: 581551384 (amd64/Linux) Restore w/w limit (#11565) # 2016-12-19: 493730288 (amd64/Linux) Join points (#12988) + # 2017-02-14: 463931280 Early inlining patch; acutal improvement 7% only_ways(['normal']), ], compile_fail, ['']) @@ -462,7 +470,7 @@ test('T783', # 2014-12-22: 235002220 (Windows) not sure why # 2016-04-06: 249332816 (x86/Linux, 64-bit machine) - (wordsize(64), 488592288, 10)]), + (wordsize(64), 436978192, 10)]), # prev: 349263216 (amd64/Linux) # 07/08/2012: 384479856 (amd64/Linux) # 29/08/2012: 436927840 (amd64/Linux) @@ -491,6 +499,8 @@ test('T783', # (D1535: Major overhaul of pattern match checker, #11162) # 2016-02-03: 488592288 (amd64/Linux) # (D1795: Another overhaul of pattern match checker, #11374) + # 2017-02-14 436978192 Early inlining: 5% improvement + extra_hc_opts('-static') ], compile,['']) @@ -608,7 +618,7 @@ test('T5837', # 2014-12-08: 115905208 Constraint solver perf improvements (esp kick-out) # 2016-04-06: 24199320 (x86/Linux, 64-bit machine) TypeInType - (wordsize(64), 53592736, 7)]) + (wordsize(64), 48289024, 7)]) # sample: 3926235424 (amd64/Linux, 15/2/2012) # 2012-10-02 81879216 # 2012-09-20 87254264 amd64/Linux @@ -641,6 +651,7 @@ test('T5837', # 2017-02-07 53592736 amd64/Linux Simon's earlier decrease appears # to be environmentally-dependent. # Also bumped acceptance threshold to 7%. + # 2017-02-14 48289024 Early inlining patch; the actual improvement was 3% ], compile, ['-freduction-depth=50']) @@ -676,7 +687,7 @@ test('T9020', [(wordsize(32), 343005716, 10), # Original: 381360728 # 2014-07-31: 343005716 (Windows) (general round of updates) - (wordsize(64), 764866144, 10)]) + (wordsize(64), 500707080, 10)]) # prev: 795469104 # 2014-07-17: 728263536 (general round of updates) # 2014-09-10: 785871680 post-AMP-cleanup @@ -686,6 +697,8 @@ test('T9020', # 2016-04-06: 852298336 Refactoring of CSE #11781 # 2016-04-06: 698401736 Use thenIO in Applicative IO # 2017-02-03: 764866144 Join points + # 2017-02-14: 500707080 Early inline patch; 35% decrease! + # Program size collapses in first simplification ], compile,['']) @@ -737,13 +750,15 @@ test('T9675', test('T9872a', [ only_ways(['normal']), compiler_stats_num_field('bytes allocated', - [(wordsize(64), 3134866040 , 5), + [(wordsize(64), 2921927528, 5), # 2014-12-10 5521332656 Initally created # 2014-12-16 5848657456 Flattener parameterized over roles # 2014-12-18 2680733672 Reduce type families even more eagerly # 2015-12-11 3581500440 TypeInType (see #11196) # 2016-04-07 3352882080 CSE improvements # 2016-10-19 3134866040 Refactor traceRn interface (#12617) + # 2017-02-14 2921927528 Early inlining: 7% improvement + (wordsize(32), 1740903516, 5) # was 1325592896 # 2016-04-06 1740903516 x86/Linux @@ -755,7 +770,7 @@ test('T9872a', test('T9872b', [ only_ways(['normal']), compiler_stats_num_field('bytes allocated', - [(wordsize(64), 4069522928, 5), + [(wordsize(64), 3730686224, 5), # 2014-12-10 6483306280 Initally created # 2014-12-16 6892251912 Flattener parameterized over roles # 2014-12-18 3480212048 Reduce type families even more eagerly @@ -763,6 +778,8 @@ test('T9872b', # 2016-02-08 4918990352 Improved a bit by tyConRolesRepresentational # 2016-04-06: 4600233488 Refactoring of CSE #11781 # 2016-09-15: 4069522928 Fix #12422 + # 2017-02-14 3730686224 Early inlining: 5% improvement + (wordsize(32), 2422750696, 5) # was 1700000000 # 2016-04-06 2422750696 x86/Linux @@ -773,7 +790,7 @@ test('T9872b', test('T9872c', [ only_ways(['normal']), compiler_stats_num_field('bytes allocated', - [(wordsize(64), 3702580928, 5), + [(wordsize(64), 3404346032, 5), # 2014-12-10 5495850096 Initally created # 2014-12-16 5842024784 Flattener parameterized over roles # 2014-12-18 2963554096 Reduce type families even more eagerly @@ -781,6 +798,8 @@ test('T9872c', # 2016-02-08 4454071184 Improved a bit by tyConRolesRepresentational # 2016-04-06: 4306667256 Refactoring of CSE #11781 # 2016-09-15: 3702580928 Fixing #12422 + # 2017-02-14 3404346032 Early inlining: 5% improvement + (wordsize(32), 2257242896, 5) # was 1500000000 # 2016-04-06 2257242896 @@ -791,7 +810,7 @@ test('T9872c', test('T9872d', [ only_ways(['normal']), compiler_stats_num_field('bytes allocated', - [(wordsize(64), 478169352, 5), + [(wordsize(64), 441108888, 5), # 2014-12-18 796071864 Initally created # 2014-12-18 739189056 Reduce type families even more eagerly # 2015-01-07 687562440 TrieMap leaf compression @@ -801,6 +820,8 @@ test('T9872d', # 2016-03-18 506691240 optimize Unify & zonking # 2016-12-05 478169352 using tyConIsTyFamFree, I think, but only # a 1% improvement 482 -> 478 + # 2017-02-14 441108888 Early inlining: 5% improvement + (wordsize(32), 264566040, 5) # some date 328810212 # 2015-07-11 350369584 @@ -813,7 +834,7 @@ test('T9872d', test('T9961', [ only_ways(['normal']), compiler_stats_num_field('bytes allocated', - [(wordsize(64), 571246936, 5), + [(wordsize(64), 498326216, 5), # 2015-01-12 807117816 Initally created # 2015-spring 772510192 Got better # 2015-05-22 663978160 Fix for #10370 improves it more @@ -823,6 +844,8 @@ test('T9961', # 2016-03-24 568526784 x64_64/Linux Add eqInt* variants (#11688) # 2016-09-01 537297968 x64_64/Linux Restore w/w limit (#11565) # 2016-12-19 571246936 x64_64/Linux Join points (#12988) + # 2017-02-14 498326216 Early inline patch; 13% improvement + (wordsize(32), 275264188, 5) # was 375647160 # 2016-04-06 275264188 x86/Linux @@ -901,11 +924,12 @@ test('T10547', test('T12227', [ only_ways(['normal']), compiler_stats_num_field('bytes allocated', - [(wordsize(64), 1715827784, 5), + [(wordsize(64), 1060158624, 5), # 2016-07-11 5650186880 (Windows) before fix for #12227 # 2016-07-11 1822822016 (Windows) after fix for #12227 # 2016-12-20 1715827784 after d250d493 (INLINE in Traversable dms) # (or thereabouts in the commit history) + # 2017-02-14 1060158624 Early inlining: 35% improvement ]), ], compile, @@ -928,11 +952,12 @@ test('T12234', compiler_stats_num_field('bytes allocated', [(platform('x86_64-unknown-mingw32'), 77949232, 5), # initial: 77949232 - (wordsize(64), 74374440, 5), + (wordsize(64), 68682016, 5), # initial: 72958288 # 2016-01-17: 76848856 (x86-64, Linux. drift?) # 2017-02-01: 80882208 (Use superclass instances when solving) # 2017-02-05: 74374440 (Probably OccAnal fixes) + # 2017-02-14: 68682016 Early inlining; actual improvement 9% ]), ], compile, @@ -957,6 +982,7 @@ test('T13056', # 2017-01-06 520166912 initial # 2017-01-31 546800240 Join points (#12988) # 2017-02-07 524611224 new SetLevels + # 2017-02-14 407779656 Early inline patch: 20% improvement ]), ], compile, diff --git a/testsuite/tests/perf/should_run/all.T b/testsuite/tests/perf/should_run/all.T index c70fd44399..e0f76cc97b 100644 --- a/testsuite/tests/perf/should_run/all.T +++ b/testsuite/tests/perf/should_run/all.T @@ -87,6 +87,8 @@ test('lazy-bs-alloc', # 2014-01-22: 411500 (x86/Linux) # 2014-01-28: Widen 1->2% (x86/Windows was at 425212) # 2016-04-06: 429760 (x86/Linux) no idea what happened + # 2017-02-14: 421448 Early inline patch + only_ways(['normal']), extra_run_opts('arith011.stdout'), ignore_stdout, diff --git a/testsuite/tests/simplCore/should_compile/Makefile b/testsuite/tests/simplCore/should_compile/Makefile index ef3e74ad7f..64de7ec64d 100644 --- a/testsuite/tests/simplCore/should_compile/Makefile +++ b/testsuite/tests/simplCore/should_compile/Makefile @@ -7,13 +7,17 @@ T3990: '$(TEST_HC)' $(TEST_HC_OPTS) -O -c -ddump-simpl T3990.hs | grep 'test_case' # Grep output should show an unpacked constructor +T8848: + $(RM) -f T8848.o T8848.hi + '$(TEST_HC)' $(TEST_HC_OPTS) -O -c -ddump-rule-firings T8848.hs | grep 'SPEC map2' + # Should fire twice + T9509: $(RM) -f T9509*.o T9509*.hi '$(TEST_HC)' $(TEST_HC_OPTS) -O -c T9509a.hs - '$(TEST_HC)' $(TEST_HC_OPTS) -O -c T9509.hs -funfolding-use-threshold=20 \ + '$(TEST_HC)' $(TEST_HC_OPTS) -O -c T9509.hs \ -ddump-rule-rewrites | grep SPEC - # Grep output should show a SPEC rule firing - # The unfolding use threshold is to prevent foo inlining before it is specialised + # Grep output should show a SPEC rule firing, twice T8832: $(RM) -f T8832.o T8832.hi @@ -131,6 +135,7 @@ T4138: '$(TEST_HC)' $(TEST_HC_OPTS) -O -c T4138_A.hs '$(TEST_HC)' $(TEST_HC_OPTS) -O -c T4138.hs -ddump-simpl > T4138.simpl grep -c 'F#' T4138.simpl + # We expect to see this T7165: $(RM) -f T7165.hi T7165a.hi T7165a.o T7165.o diff --git a/testsuite/tests/simplCore/should_compile/T10181.hs b/testsuite/tests/simplCore/should_compile/T10181.hs index 1983507cd2..8de148a57f 100644 --- a/testsuite/tests/simplCore/should_compile/T10181.hs +++ b/testsuite/tests/simplCore/should_compile/T10181.hs @@ -1,3 +1,8 @@ module T10181 where +-- GHC 8.0 and previous wrongly eta-reduced this to +-- t = t +-- but GHC 8.2 does not; some kind of consequence of +-- better simplification in the early stages. + t a = t a diff --git a/testsuite/tests/simplCore/should_compile/T3234.stderr b/testsuite/tests/simplCore/should_compile/T3234.stderr index 9d87b3ecc1..23cccfae7d 100644 --- a/testsuite/tests/simplCore/should_compile/T3234.stderr +++ b/testsuite/tests/simplCore/should_compile/T3234.stderr @@ -12,12 +12,15 @@ ==================== Grand total simplifier statistics ==================== Total ticks: 54 -15 PreInlineUnconditionally +18 PreInlineUnconditionally + 1 c 1 n 1 g 1 a 1 xs 1 ys + 1 c + 1 n 1 k 1 z 1 g @@ -28,11 +31,7 @@ Total ticks: 54 1 lvl 1 lvl 1 lvl -4 PostInlineUnconditionally - 1 c - 1 n - 1 c - 1 c +1 PostInlineUnconditionally 1 c 1 UnfoldingDone 1 GHC.Base.build 5 RuleFired 1 ++ @@ -67,6 +66,6 @@ Total ticks: 54 1 c 1 n 1 a -11 SimplifierDone 11 +10 SimplifierDone 10 diff --git a/testsuite/tests/simplCore/should_compile/T3772.stdout b/testsuite/tests/simplCore/should_compile/T3772.stdout index f5de5d7835..ab24d94793 100644 --- a/testsuite/tests/simplCore/should_compile/T3772.stdout +++ b/testsuite/tests/simplCore/should_compile/T3772.stdout @@ -3,6 +3,50 @@ Result size of Tidy Core = {terms: 44, types: 19, coercions: 0, joins: 0/0} +-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0} +T3772.$trModule4 :: GHC.Prim.Addr# +[GblId, + Caf=NoCafRefs, + Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, + WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 20 0}] +T3772.$trModule4 = "main"# + +-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0} +T3772.$trModule3 :: GHC.Types.TrName +[GblId, + Caf=NoCafRefs, + Str=m1, + Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, + WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 20}] +T3772.$trModule3 = GHC.Types.TrNameS T3772.$trModule4 + +-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0} +T3772.$trModule2 :: GHC.Prim.Addr# +[GblId, + Caf=NoCafRefs, + Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, + WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 30 0}] +T3772.$trModule2 = "T3772"# + +-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0} +T3772.$trModule1 :: GHC.Types.TrName +[GblId, + Caf=NoCafRefs, + Str=m1, + Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, + WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 20}] +T3772.$trModule1 = GHC.Types.TrNameS T3772.$trModule2 + +-- RHS size: {terms: 3, types: 0, coercions: 0, joins: 0/0} +T3772.$trModule :: GHC.Types.Module +[GblId, + Caf=NoCafRefs, + Str=m, + Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, + WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 30}] +T3772.$trModule = + GHC.Types.Module T3772.$trModule3 T3772.$trModule1 + Rec { -- RHS size: {terms: 10, types: 2, coercions: 0, joins: 0/0} $wxs :: GHC.Prim.Int# -> () @@ -39,49 +83,5 @@ foo [InlPrag=INLINE[0]] :: Int -> () foo = \ (w :: Int) -> case w of { GHC.Types.I# ww1 -> T3772.$wfoo ww1 } --- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0} -T3772.$trModule2 :: GHC.Prim.Addr# -[GblId, - Caf=NoCafRefs, - Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, - WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 30 0}] -T3772.$trModule2 = "T3772"# - --- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0} -T3772.$trModule1 :: GHC.Types.TrName -[GblId, - Caf=NoCafRefs, - Str=m1, - Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, - WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 20}] -T3772.$trModule1 = GHC.Types.TrNameS T3772.$trModule2 - --- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0} -T3772.$trModule4 :: GHC.Prim.Addr# -[GblId, - Caf=NoCafRefs, - Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, - WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 20 0}] -T3772.$trModule4 = "main"# - --- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0} -T3772.$trModule3 :: GHC.Types.TrName -[GblId, - Caf=NoCafRefs, - Str=m1, - Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, - WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 20}] -T3772.$trModule3 = GHC.Types.TrNameS T3772.$trModule4 - --- RHS size: {terms: 3, types: 0, coercions: 0, joins: 0/0} -T3772.$trModule :: GHC.Types.Module -[GblId, - Caf=NoCafRefs, - Str=m, - Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, - WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 30}] -T3772.$trModule - = GHC.Types.Module T3772.$trModule3 T3772.$trModule1 - diff --git a/testsuite/tests/simplCore/should_compile/T4138.stdout b/testsuite/tests/simplCore/should_compile/T4138.stdout index 0cfbf08886..b8626c4cff 100644 --- a/testsuite/tests/simplCore/should_compile/T4138.stdout +++ b/testsuite/tests/simplCore/should_compile/T4138.stdout @@ -1 +1 @@ -2 +4 diff --git a/testsuite/tests/simplCore/should_compile/T7360.stderr b/testsuite/tests/simplCore/should_compile/T7360.stderr index e3fea9ba85..7aad0f0d54 100644 --- a/testsuite/tests/simplCore/should_compile/T7360.stderr +++ b/testsuite/tests/simplCore/should_compile/T7360.stderr @@ -4,7 +4,7 @@ Result size of Tidy Core = {terms: 94, types: 48, coercions: 0, joins: 0/0} -- RHS size: {terms: 6, types: 3, coercions: 0, joins: 0/0} -T7360.$WFoo3 [InlPrag=INLINE] :: Int -> Foo +T7360.$WFoo3 [InlPrag=INLINE[2]] :: Int -> Foo [GblId[DataConWrapper], Arity=1, Caf=NoCafRefs, diff --git a/testsuite/tests/simplCore/should_compile/T7785.hs b/testsuite/tests/simplCore/should_compile/T7785.hs index c6dacb705c..ecde1ff020 100644 --- a/testsuite/tests/simplCore/should_compile/T7785.hs +++ b/testsuite/tests/simplCore/should_compile/T7785.hs @@ -1,9 +1,21 @@ {-# LANGUAGE TypeFamilies, ConstraintKinds #-} module Foo( shared, foo, bar) where +-- module Foo where import GHC.Exts +{- +foo :: [Int] -> [Int] +foo = let f = map negate + in f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. + f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. + f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. + f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. + f . f . f . f . f . f . f . f . f . f . f . f . f + +-} + type family Domain (f :: * -> *) a :: Constraint type instance Domain [] a = () @@ -22,7 +34,7 @@ shared = let f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. f. - f + f . f . f . f . f . f . f . f . f . f . f . f . f foo xs = shared $ 0:xs bar xs = 0:shared xs diff --git a/testsuite/tests/simplCore/should_compile/T8848.hs b/testsuite/tests/simplCore/should_compile/T8848.hs index d0f48bdbda..605a7a86ad 100644 --- a/testsuite/tests/simplCore/should_compile/T8848.hs +++ b/testsuite/tests/simplCore/should_compile/T8848.hs @@ -18,10 +18,15 @@ instance A.Applicative (Shape r)=> A.Applicative (Shape (S r)) where instance Fun.Functor (Shape Z) where instance (Fun.Functor (Shape r)) => Fun.Functor (Shape (S r)) where -map2 :: (A.Applicative (Shape r))=> (a->b ->c) -> (Shape r a) -> (Shape r b) -> (Shape r c ) -map2 = \f l r -> A.pure f A.<*> l A.<*> r +map2 :: (A.Applicative (Shape r))=> (a->b ->c) -> (Shape r a) -> (Shape r b) -> [Shape r c] +-- Artifically made recursive so that it won't inline, +-- se we can see if the speicalisation happens +map2 = \f l r -> (A.pure f A.<*> l A.<*> r) : map2 f l r -{-# SPECIALIZE map2 :: (a->b->c)-> (Shape (S (S Z)) a )-> Shape (S (S Z)) b -> Shape (S (S Z)) c #-} +{-# SPECIALIZE map2 :: (a->b->c) + -> (Shape (S (S Z)) a ) + -> Shape (S (S Z)) b + -> [Shape (S (S Z)) c] #-} -map3 :: (a->b->c)-> (Shape (S (S Z)) a )-> Shape (S (S Z)) b -> Shape (S (S Z)) c +map3 :: (a->b->c)-> (Shape (S (S Z)) a )-> Shape (S (S Z)) b -> [Shape (S (S Z)) c] map3 x y z = map2 x y z
\ No newline at end of file diff --git a/testsuite/tests/simplCore/should_compile/T9509.stdout b/testsuite/tests/simplCore/should_compile/T9509.stdout index 0acd484558..c774c06096 100644 --- a/testsuite/tests/simplCore/should_compile/T9509.stdout +++ b/testsuite/tests/simplCore/should_compile/T9509.stdout @@ -1 +1,2 @@ Rule: SPEC/T9509 foo @ Int + Rule: SPEC/T9509 foo @ Int diff --git a/testsuite/tests/simplCore/should_compile/T9509a.hs b/testsuite/tests/simplCore/should_compile/T9509a.hs index bd6511eedb..b80cf6a7d4 100644 --- a/testsuite/tests/simplCore/should_compile/T9509a.hs +++ b/testsuite/tests/simplCore/should_compile/T9509a.hs @@ -6,5 +6,5 @@ foo :: Ord a => a -> IO a {-# INLINABLE foo #-} foo x = newIORef x >>= readIORef >>= \y -> case compare x y of - LT -> return x ; - _ -> return y + LT -> return x + _ -> foo x -- Recursive so it won't inline diff --git a/testsuite/tests/simplCore/should_compile/all.T b/testsuite/tests/simplCore/should_compile/all.T index 1dd4232b2d..1351680420 100644 --- a/testsuite/tests/simplCore/should_compile/all.T +++ b/testsuite/tests/simplCore/should_compile/all.T @@ -179,7 +179,9 @@ test('T8832', run_command, ['$MAKE -s --no-print-directory T8832 T8832_WORDSIZE_OPTS=' + ('-DT8832_WORDSIZE_64' if wordsize(64) else '')]) -test('T8848', only_ways(['optasm']), compile, ['-ddump-rule-firings -dsuppress-uniques']) +test('T8848', normal, + run_command, + ['$MAKE -s --no-print-directory T8848']) test('T8848a', only_ways(['optasm']), compile, ['-ddump-rules']) test('T8331', only_ways(['optasm']), compile, ['-ddump-rules']) test('T6056', only_ways(['optasm']), multimod_compile, ['T6056', '-v0 -ddump-rule-firings']) @@ -191,7 +193,7 @@ test('T10176', only_ways(['optasm']), compile, ['']) test('T10180', only_ways(['optasm']), compile, ['']) test('T10602', only_ways(['optasm']), multimod_compile, ['T10602','-v0']) test('T10627', only_ways(['optasm']), compile, ['']) -test('T10181', [expect_broken(10181), only_ways(['optasm'])], compile, ['']) +test('T10181', [only_ways(['optasm'])], compile, ['']) test('T10083', normal, run_command, diff --git a/testsuite/tests/simplCore/should_compile/rule2.stderr b/testsuite/tests/simplCore/should_compile/rule2.stderr index 572fac36a8..7444cc90a4 100644 --- a/testsuite/tests/simplCore/should_compile/rule2.stderr +++ b/testsuite/tests/simplCore/should_compile/rule2.stderr @@ -1,6 +1,6 @@ ==================== FloatOut stats: ==================== -1 Lets floated to top level; 0 Lets floated elsewhere; from 4 Lambda groups +0 Lets floated to top level; 0 Lets floated elsewhere; from 3 Lambda groups @@ -10,23 +10,20 @@ ==================== Grand total simplifier statistics ==================== -Total ticks: 15 +Total ticks: 13 -2 PreInlineUnconditionally - 1 f - 1 lvl +1 PreInlineUnconditionally 1 f 1 UnfoldingDone 1 Roman.bar 1 RuleFired 1 foo/bar 3 LetFloatFromLet 3 1 EtaReduction 1 ds -7 BetaReduction +6 BetaReduction 1 f 1 a 1 m - 1 b 1 m 1 b 1 m -9 SimplifierDone 9 +8 SimplifierDone 8 diff --git a/testsuite/tests/simplCore/should_run/T7611.hs b/testsuite/tests/simplCore/should_run/T7611.hs index 717a65518a..bd6baa4d85 100644 --- a/testsuite/tests/simplCore/should_run/T7611.hs +++ b/testsuite/tests/simplCore/should_run/T7611.hs @@ -4,6 +4,7 @@ import GHC.Exts newtype Age = Age Int +{-# NOINLINE myMap #-} myMap f [] = [] myMap f (x:xs) = f x : myMap f xs diff --git a/testsuite/tests/simplCore/should_run/simplrun002.hs b/testsuite/tests/simplCore/should_run/simplrun002.hs index e6cb8d7efc..ec4ea7c7ac 100644 --- a/testsuite/tests/simplCore/should_run/simplrun002.hs +++ b/testsuite/tests/simplCore/should_run/simplrun002.hs @@ -15,9 +15,13 @@ import System.IO.Unsafe ( unsafePerformIO ) -- Dont inline till last, to give the rule a chance sndSnd (a,(b,c)) = (a,c) +{-# NOINLINE [2] myFst #-} +-- Dont inline till last, to give the rule a chance +myFst (a,b) = a + trace x y = unsafePerformIO (hPutStr stderr x >> hPutStr stderr "\n" >> return y) -{-# RULES "foo" forall v . fst (sndSnd v) = trace "Yes" (fst v) #-} +{-# RULES "foo" forall v . myFst (sndSnd v) = trace "Yes" (fst v) #-} main :: IO () -main = print (fst (sndSnd (True, (False,True)))) +main = print (myFst (sndSnd (True, (False,True)))) diff --git a/testsuite/tests/simplCore/should_run/simplrun008.hs b/testsuite/tests/simplCore/should_run/simplrun008.hs index a562005ad4..c82040b683 100644 --- a/testsuite/tests/simplCore/should_run/simplrun008.hs +++ b/testsuite/tests/simplCore/should_run/simplrun008.hs @@ -10,9 +10,13 @@ f x = "NOT FIRED" neg :: Int -> Int neg = negate +{-# NOINLINE myord #-} +myord :: Char -> Int +myord = ord + {-# RULES "f" forall (c::Char->Int) (x::Char). f (c x) = "RULE FIRED" #-} -main = do { print (f (ord 'a')) -- Rule should fire +main = do { print (f (myord 'a')) -- Rule should fire ; print (f (neg 1)) } -- Rule should not fire |
