author | Simon Peyton Jones <simonpj@microsoft.com> | 2018-04-20 13:57:16 +0100 |
---|---|---|
committer | Simon Peyton Jones <simonpj@microsoft.com> | 2018-04-20 17:08:02 +0100 |
commit | 8b10b8968f25589b1857f12788fc79b3b142c467 (patch) | |
tree | 2750c2ccea607d1efdda84bc12cea92d3240fed9 /compiler | |
parent | 2fbe0b5171fd5639845b630faccb9a0c3b564df7 (diff) | |
Inline wrappers earlier
This patch has a single significant change:
strictness wrapper functions are inlined earlier,
in phase 2 rather than phase 0.
As shown by Trac #15056, this gives a better chance for RULEs to fire.
Before this change, a function that would have inlined early without
strictness analysis was instead inlined late. Result: applying
"optimisation" made the program worse.
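To make the problem concrete, here is a minimal sketch in the shape of the `f x = g x x` example from Note [Wrapper activation] in this patch. The names `g`, `f` and the rule name are hypothetical, and whether GHC actually worker/wrappers `f` or fires the rule depends on the optimisation level and the surrounding code, so treat this as an illustration rather than a reproduction of Trac #15056.

```haskell
-- Hypothetical definitions, made up for illustration only.

-- Keep g from being inlined before phase 0, so a RULE mentioning g
-- still has something to match against in the earlier phases.
g :: Bool -> Bool -> Int
g a b = fromEnum a + fromEnum b
{-# NOINLINE [0] g #-}

-- A rewrite rule for the call pattern (g True True).
-- It agrees with g's definition, so it never changes the result.
{-# RULES "g/True/True" g True True = 2 #-}

-- A call (f True) can only be rewritten by the rule once f itself
-- (or, if f was worker/wrappered, f's wrapper) has been inlined to
-- expose (g True True).
f :: Bool -> Int
f x = g x x
```

If the wrapper for a function like `f` only becomes active in phase 0, the call `g True True` is exposed at the same point at which `g` itself becomes inlinable, so the rule may never get its chance.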
This does not make too much difference in nofib, but I've stumbled
over the problem more than once, so even a "no-change" result would be
quite acceptable. Here are the headlines:
--------------------------------------------------------------------------------
Program            Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
cacheprof         -0.5%     -0.5%     +2.5%     +2.5%      0.0%
fulsom            -1.0%     +2.6%     -0.1%     -0.1%      0.0%
mate              -0.6%     +2.4%     -0.9%     -0.9%      0.0%
veritas           -0.7%    -23.2%     0.002     0.002      0.0%
--------------------------------------------------------------------------------
Min               -1.4%    -23.2%    -12.5%    -15.3%      0.0%
Max               +0.6%     +2.6%     +4.4%     +4.3%    +19.0%
Geometric Mean    -0.7%     -0.2%     -1.4%     -1.7%     +0.2%
* A worthwhile reduction in binary size.
* Runtimes are not to be trusted much but look as if they
are moving the right way.
* A really big win in veritas, described in comment:1 of
Trac #15056; more fusion rules fired.
* I investigated the losses in 'mate' and 'fulsom'; see #15056.
Diffstat (limited to 'compiler')
-rw-r--r-- | compiler/stranal/WorkWrap.hs | 137 |
1 files changed, 84 insertions, 53 deletions
diff --git a/compiler/stranal/WorkWrap.hs b/compiler/stranal/WorkWrap.hs
index ac8798e56e..9557cecdfe 100644
--- a/compiler/stranal/WorkWrap.hs
+++ b/compiler/stranal/WorkWrap.hs
@@ -242,8 +242,8 @@ NOINLINE pragma to the worker.
 
 (See Trac #13143 for a real-world example.)
 
-Note [Activation for workers]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Note [Worker activation]
+~~~~~~~~~~~~~~~~~~~~~~~~
 Follows on from Note [Worker-wrapper for INLINABLE functions]
 
 It is *vital* that if the worker gets an INLINABLE pragma (from the
@@ -260,7 +260,9 @@ original activation. Consider
 
   f y = let z = expensive y in ...
 
-If expensive's worker inherits the wrapper's activation, we'll get
+If expensive's worker inherits the wrapper's activation,
+we'll get this (because of the compromise in point (2) of
+Note [Wrapper activation])
 
   {-# NOINLINE[0] $wexpensive #-}
   $wexpensive x = x + 1
@@ -346,40 +348,63 @@ call:
 
 Note [Wrapper activation]
 ~~~~~~~~~~~~~~~~~~~~~~~~~
-When should the wrapper inlining be active? It must not be active
-earlier than the current Activation of the Id (eg it might have a
-NOINLINE pragma). But in fact strictness analysis happens fairly
-late in the pipeline, and we want to prioritise specialisations over
-strictness. Eg if we have
-  module Foo where
-    f :: Num a => a -> Int -> a
-    f n 0 = n              -- Strict in the Int, hence wrapper
-    f n x = f (n+n) (x-1)
-
-    g :: Int -> Int
-    g x = f x x            -- Provokes a specialisation for f
-
-  module Bar where
-    import Foo
-
-    h :: Int -> Int
-    h x = f 3 x
-
-Then we want the specialisation for 'f' to kick in before the wrapper does.
-
-Now in fact the 'gentle' simplification pass encourages this, by
-having rules on, but inlinings off. But that's kind of lucky. It seems
-more robust to give the wrapper an Activation of (ActiveAfter 0),
-so that it becomes active in an importing module at the same time that
-it appears in the first place in the defining module.
-
-At one stage I tried making the wrapper inlining always-active, and
-that had a very bad effect on nofib/imaginary/x2n1; a wrapper was
-inlined before the specialisation fired.
-
-The use an inl_inline of NoUserInline to distinguish this pragma from one
-that was given by the user. In particular, CSE will not happen if there is a
-user-specified pragma, but should happen for w/w’ed things (#14186).
+When should the wrapper inlining be active?
+
+1. It must not be active earlier than the current Activation of the
+   Id
+
+2. It should be active at some point, despite (1) because of
+   Note [Worker-wrapper for NOINLINE functions]
+
+3. For ordinary functions with no pragmas we want to inline the
+   wrapper as early as possible (Trac #15056). Suppose another module
+   defines    f x = g x x
+   and suppose there is some RULE for (g True True). Then if we have
+   a call (f True), we'd expect to inline 'f' and the RULE will fire.
+   But if f is w/w'd (which it might be), we want the inlining to
+   occur just as if it hadn't been.
+
+   (This only matters if f's RHS is big enough to w/w, but small
+   enough to inline given the call site, but that can happen.)
+
+4. We do not want to inline the wrapper before specialisation.
+          module Foo where
+            f :: Num a => a -> Int -> a
+            f n 0 = n              -- Strict in the Int, hence wrapper
+            f n x = f (n+n) (x-1)
+
+            g :: Int -> Int
+            g x = f x x            -- Provokes a specialisation for f
+
+          module Bar where
+            import Foo
+
+            h :: Int -> Int
+            h x = f 3 x
+
+   In module Bar we want to give specialisations a chance to fire
+   before inlining f's wrapper.
+
+Reminder: Note [Don't w/w INLINE things], so we don't need to worry
+          about INLINE things here.
+
+Conclusion:
+  - If the user said NOINLINE[n], respect that
+  - If the user said NOINLINE, inline the wrapper as late as
+    poss (phase 0). This is a compromise driven by (2) above
+  - Otherwise inline wrapper in phase 2. That allows the
+    'gentle' simplification pass to apply specialisation rules
+
+Historical note: At one stage I tried making the wrapper inlining
+always-active, and that had a very bad effect on nofib/imaginary/x2n1;
+a wrapper was inlined before the specialisation fired.
+
+Note [Wrapper NoUserInline]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The use an inl_inline of NoUserInline on the wrapper distinguishes
+this pragma from one that was given by the user. In particular, CSE
+will not happen if there is a user-specified pragma, but should happen
+for w/w’ed things (#14186).
 -}
 
 tryWW :: DynFlags
@@ -475,23 +500,24 @@ splitFun dflags fam_envs fn_id fn_info wrap_dmds res_info rhs
     Just (work_demands, join_arity, wrap_fn, work_fn) -> do
         work_uniq <- getUniqueM
         let work_rhs = work_fn rhs
-            work_inline = inl_inline inl_prag
-            work_act = case work_inline of
-              -- See Note [Activation for workers]
-              NoInline -> inl_act inl_prag
-              _        -> wrap_act
+            work_act = case fn_inline_spec of  -- See Note [Worker activation]
+                          NoInline -> fn_act
+                          _        -> wrap_act
+
             work_prag = InlinePragma { inl_src = SourceText "{-# INLINE"
-                                     , inl_inline = work_inline
+                                     , inl_inline = fn_inline_spec
                                      , inl_sat    = Nothing
                                      , inl_act    = work_act
                                      , inl_rule   = FunLike }
-              -- idl_inline: copy from fn_id; see Note [Worker-wrapper for INLINABLE functions]
-              -- idl_act:    see Note [Activation for workers]
-              -- inl_rule:   it does not make sense for workers to be constructorlike.
+              -- inl_inline: copy from fn_id; see Note [Worker-wrapper for INLINABLE functions]
+              -- inl_act:    see Note [Worker activation]
+              -- inl_rule:   it does not make sense for workers to be constructorlike.
+
             work_join_arity | isJoinId fn_id = Just join_arity
                             | otherwise      = Nothing
               -- worker is join point iff wrapper is join point
               -- (see Note [Don't CPR join points])
+
             work_id = mkWorkerId work_uniq fn_id (exprType work_rhs)
                         `setIdOccInfo` occInfo fn_info
                                 -- Copy over occurrence info from parent
@@ -523,16 +549,19 @@ splitFun dflags fam_envs fn_id fn_info wrap_dmds res_info rhs
             worker_demand | single_call = mkWorkerDemand work_arity
                           | otherwise   = topDmd
 
-
-            wrap_act  = ActiveAfter NoSourceText 0
             wrap_rhs  = wrap_fn work_id
-            wrap_prag = InlinePragma { inl_src = SourceText "{-# INLINE"
+            wrap_act  = case fn_act of  -- See Note [Wrapper activation]
+                           ActiveAfter {} -> fn_act
+                           NeverActive    -> ActiveAfter NoSourceText 0
+                           _              -> ActiveAfter NoSourceText 2
+            wrap_prag = InlinePragma { inl_src = SourceText "{-# INLINE"
                                      , inl_inline = NoUserInline
                                      , inl_sat    = Nothing
                                      , inl_act    = wrap_act
                                      , inl_rule   = rule_match_info }
-                -- See Note [Wrapper activation]
-                -- The RuleMatchInfo is (and must be) unaffected
+                -- inl_act:    see Note [Wrapper activation]
+                -- inl_inline: see Note [Wrapper NoUserInline]
+                -- inl_rule:   RuleMatchInfo is (and must be) unaffected
 
             wrap_id   = fn_id `setIdUnfolding`  mkWwInlineRule wrap_rhs arity
                               `setInlinePragma` wrap_prag
@@ -550,8 +579,10 @@ splitFun dflags fam_envs fn_id fn_info wrap_dmds res_info rhs
     mb_join_arity   = isJoinId_maybe fn_id
     rhs_fvs         = exprFreeVars rhs
     fun_ty          = idType fn_id
-    inl_prag        = inlinePragInfo fn_info
-    rule_match_info = inlinePragmaRuleMatchInfo inl_prag
+    fn_inl_prag     = inlinePragInfo fn_info
+    fn_inline_spec  = inl_inline fn_inl_prag
+    fn_act          = inl_act fn_inl_prag
+    rule_match_info = inlinePragmaRuleMatchInfo fn_inl_prag
     arity           = arityInfo fn_info
             -- The arity is set by the simplifier using exprEtaExpandArity
             -- So it may be more than the number of top-level-visible lambdas
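The `wrap_act` and `work_act` case expressions in the patch can be modelled outside GHC as two small functions. This is only a sketch under stated assumptions: `Activation` and `InlineSpec` below are simplified stand-ins for GHC's own types (the real `ActiveAfter` and `ActiveBefore` also carry a `SourceText`), and `wrapperActivation` and `workerActivation` are hypothetical names that do not appear in the compiler.

```haskell
-- Simplified stand-ins for GHC's types; not the real definitions.
data Activation
  = NeverActive
  | AlwaysActive
  | ActiveBefore Int
  | ActiveAfter Int
  deriving (Eq, Show)

data InlineSpec = Inline | Inlinable | NoInline | NoUserInline
  deriving (Eq, Show)

-- Models the new 'wrap_act': see the Conclusion of Note [Wrapper activation].
wrapperActivation :: Activation -> Activation
wrapperActivation fnAct = case fnAct of
  ActiveAfter _ -> fnAct          -- explicit phase, e.g. NOINLINE[n]: respect it
  NeverActive   -> ActiveAfter 0  -- plain NOINLINE: as late as possible
  _             -> ActiveAfter 2  -- everything else: phase 2

-- Models 'work_act': a NOINLINE function's worker keeps the original
-- activation, any other worker follows the wrapper.
workerActivation :: InlineSpec -> Activation -> Activation
workerActivation spec fnAct = case spec of
  NoInline -> fnAct
  _        -> wrapperActivation fnAct
```

For a function with no pragma, modelled here as `AlwaysActive`, the wrapper gets `ActiveAfter 2`, which is the headline change of the commit. For a plain `NOINLINE` function the wrapper gets `ActiveAfter 0` while the worker stays `NeverActive`.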