author: Simon Peyton Jones <simonpj@microsoft.com> 2018-04-20 13:57:16 +0100
committer: Simon Peyton Jones <simonpj@microsoft.com> 2018-04-20 17:08:02 +0100
commit: 8b10b8968f25589b1857f12788fc79b3b142c467
tree: 2750c2ccea607d1efdda84bc12cea92d3240fed9 /compiler
parent: 2fbe0b5171fd5639845b630faccb9a0c3b564df7
Inline wrappers earlier
This patch has a single significant change: strictness wrapper
functions are inlined earlier, in phase 2 rather than phase 0.

As shown by Trac #15056, this gives a better chance for RULEs to fire.
Before this change, a function that would have inlined early without
strictness analysis was instead inlining late. Result: applying
"optimisation" made the program worse.

This does not make too much difference in nofib, but I've stumbled
over the problem more than once, so even a "no-change" result would be
quite acceptable. Here are the headlines:

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
      cacheprof          -0.5%     -0.5%     +2.5%     +2.5%      0.0%
         fulsom          -1.0%     +2.6%     -0.1%     -0.1%      0.0%
           mate          -0.6%     +2.4%     -0.9%     -0.9%      0.0%
        veritas          -0.7%    -23.2%     0.002     0.002      0.0%
--------------------------------------------------------------------------------
            Min          -1.4%    -23.2%    -12.5%    -15.3%      0.0%
            Max          +0.6%     +2.6%     +4.4%     +4.3%    +19.0%
 Geometric Mean          -0.7%     -0.2%     -1.4%     -1.7%     +0.2%

* A worthwhile reduction in binary size.

* Runtimes are not to be trusted much but look as if they are moving
  the right way.

* A really big win in veritas, described in comment:1 of Trac #15056;
  more fusion rules fired.

* I investigated the losses in 'mate' and 'fulsom'; see #15056.
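To make "inlined in phase 2 rather than phase 0" concrete, here is a minimal hand-written sketch of a worker/wrapper split. It is only an analogy: GHC generates the split itself and marks the wrapper with an internal NoUserInline pragma, not a user-written INLINE. The names `f` and `wf` are hypothetical (a real worker would be called `$wf`, which cannot be written in source), and the worker is left without a pragma here.

```haskell
{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (Int (I#), Int#, (+#))

-- Hypothetical worker: does the real work on the unboxed argument.
wf :: Int# -> Int#
wf x = x +# 1#

-- Hypothetical wrapper: unboxes the argument, calls the worker, and
-- reboxes the result. The phase annotation [2] asks GHC not to inline
-- it before phase 2 and to be keen to inline it from phase 2 onwards.
-- Writing [0] instead would model the behaviour before this patch.
f :: Int -> Int
f (I# x) = I# (wf x)
{-# INLINE [2] f #-}

main :: IO ()
main = print (f 41)
```

Simplifier phases count down (2, then 1, then 0), so an activation of [2] exposes the wrapper's body to RULEs at call sites earlier than [0] does.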
Diffstat (limited to 'compiler')
-rw-r--r-- compiler/stranal/WorkWrap.hs 137
1 files changed, 84 insertions, 53 deletions
diff --git a/compiler/stranal/WorkWrap.hs b/compiler/stranal/WorkWrap.hs
index ac8798e56e..9557cecdfe 100644
--- a/compiler/stranal/WorkWrap.hs
+++ b/compiler/stranal/WorkWrap.hs
@@ -242,8 +242,8 @@ NOINLINE pragma to the worker.
(See Trac #13143 for a real-world example.)
-Note [Activation for workers]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Note [Worker activation]
+~~~~~~~~~~~~~~~~~~~~~~~~
Follows on from Note [Worker-wrapper for INLINABLE functions]
It is *vital* that if the worker gets an INLINABLE pragma (from the
@@ -260,7 +260,9 @@ original activation. Consider
f y = let z = expensive y in ...
-If expensive's worker inherits the wrapper's activation, we'll get
+If expensive's worker inherits the wrapper's activation,
+we'll get this (because of the compromise in point (2) of
+Note [Wrapper activation])
{-# NOINLINE[0] $wexpensive #-}
$wexpensive x = x + 1
@@ -346,40 +348,63 @@ call:
Note [Wrapper activation]
~~~~~~~~~~~~~~~~~~~~~~~~~
-When should the wrapper inlining be active? It must not be active
-earlier than the current Activation of the Id (eg it might have a
-NOINLINE pragma). But in fact strictness analysis happens fairly
-late in the pipeline, and we want to prioritise specialisations over
-strictness. Eg if we have
- module Foo where
- f :: Num a => a -> Int -> a
- f n 0 = n -- Strict in the Int, hence wrapper
- f n x = f (n+n) (x-1)
-
- g :: Int -> Int
- g x = f x x -- Provokes a specialisation for f
-
- module Bar where
- import Foo
-
- h :: Int -> Int
- h x = f 3 x
-
-Then we want the specialisation for 'f' to kick in before the wrapper does.
-
-Now in fact the 'gentle' simplification pass encourages this, by
-having rules on, but inlinings off. But that's kind of lucky. It seems
-more robust to give the wrapper an Activation of (ActiveAfter 0),
-so that it becomes active in an importing module at the same time that
-it appears in the first place in the defining module.
-
-At one stage I tried making the wrapper inlining always-active, and
-that had a very bad effect on nofib/imaginary/x2n1; a wrapper was
-inlined before the specialisation fired.
-
-The use an inl_inline of NoUserInline to distinguish this pragma from one
-that was given by the user. In particular, CSE will not happen if there is a
-user-specified pragma, but should happen for w/w’ed things (#14186).
+When should the wrapper inlining be active?
+
+1. It must not be active earlier than the current Activation of the
+ Id
+
+2. It should be active at some point, despite (1) because of
+ Note [Worker-wrapper for NOINLINE functions]
+
+3. For ordinary functions with no pragmas we want to inline the
+ wrapper as early as possible (Trac #15056). Suppose another module
+ defines f x = g x x
+ and suppose there is some RULE for (g True True). Then if we have
+ a call (f True), we'd expect to inline 'f' and the RULE will fire.
+ But if f is w/w'd (which it might be), we want the inlining to
+ occur just as if it hadn't been.
+
+ (This only matters if f's RHS is big enough to w/w, but small
+ enough to inline given the call site, but that can happen.)
+
+4. We do not want to inline the wrapper before specialisation.
+ module Foo where
+ f :: Num a => a -> Int -> a
+ f n 0 = n -- Strict in the Int, hence wrapper
+ f n x = f (n+n) (x-1)
+
+ g :: Int -> Int
+ g x = f x x -- Provokes a specialisation for f
+
+ module Bar where
+ import Foo
+
+ h :: Int -> Int
+ h x = f 3 x
+
+ In module Bar we want to give specialisations a chance to fire
+ before inlining f's wrapper.
+
+Reminder: Note [Don't w/w INLINE things], so we don't need to worry
+ about INLINE things here.
+
+Conclusion:
+ - If the user said NOINLINE[n], respect that
+ - If the user said NOINLINE, inline the wrapper as late as
+ poss (phase 0). This is a compromise driven by (2) above
+ - Otherwise inline wrapper in phase 2. That allows the
+ 'gentle' simplification pass to apply specialisation rules
+
+Historical note: At one stage I tried making the wrapper inlining
+always-active, and that had a very bad effect on nofib/imaginary/x2n1;
+a wrapper was inlined before the specialisation fired.
+
+Note [Wrapper NoUserInline]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The use an inl_inline of NoUserInline on the wrapper distinguishes
+this pragma from one that was given by the user. In particular, CSE
+will not happen if there is a user-specified pragma, but should happen
+for w/w’ed things (#14186).
-}
tryWW :: DynFlags
@@ -475,23 +500,24 @@ splitFun dflags fam_envs fn_id fn_info wrap_dmds res_info rhs
Just (work_demands, join_arity, wrap_fn, work_fn) -> do
work_uniq <- getUniqueM
let work_rhs = work_fn rhs
- work_inline = inl_inline inl_prag
- work_act = case work_inline of
- -- See Note [Activation for workers]
- NoInline -> inl_act inl_prag
- _ -> wrap_act
+ work_act = case fn_inline_spec of -- See Note [Worker activation]
+ NoInline -> fn_act
+ _ -> wrap_act
+
work_prag = InlinePragma { inl_src = SourceText "{-# INLINE"
- , inl_inline = work_inline
+ , inl_inline = fn_inline_spec
, inl_sat = Nothing
, inl_act = work_act
, inl_rule = FunLike }
- -- idl_inline: copy from fn_id; see Note [Worker-wrapper for INLINABLE functions]
- -- idl_act: see Note [Activation for workers]
- -- inl_rule: it does not make sense for workers to be constructorlike.
+ -- inl_inline: copy from fn_id; see Note [Worker-wrapper for INLINABLE functions]
+ -- inl_act: see Note [Worker activation]
+ -- inl_rule: it does not make sense for workers to be constructorlike.
+
work_join_arity | isJoinId fn_id = Just join_arity
| otherwise = Nothing
-- worker is join point iff wrapper is join point
-- (see Note [Don't CPR join points])
+
work_id = mkWorkerId work_uniq fn_id (exprType work_rhs)
`setIdOccInfo` occInfo fn_info
-- Copy over occurrence info from parent
@@ -523,16 +549,19 @@ splitFun dflags fam_envs fn_id fn_info wrap_dmds res_info rhs
worker_demand | single_call = mkWorkerDemand work_arity
| otherwise = topDmd
-
- wrap_act = ActiveAfter NoSourceText 0
wrap_rhs = wrap_fn work_id
- wrap_prag = InlinePragma { inl_src = SourceText "{-# INLINE"
+ wrap_act = case fn_act of -- See Note [Wrapper activation]
+ ActiveAfter {} -> fn_act
+ NeverActive -> ActiveAfter NoSourceText 0
+ _ -> ActiveAfter NoSourceText 2
+ wrap_prag = InlinePragma { inl_src = SourceText "{-# INLINE"
, inl_inline = NoUserInline
, inl_sat = Nothing
, inl_act = wrap_act
, inl_rule = rule_match_info }
- -- See Note [Wrapper activation]
- -- The RuleMatchInfo is (and must be) unaffected
+ -- inl_act: see Note [Wrapper activation]
+ -- inl_inline: see Note [Wrapper NoUserInline]
+ -- inl_rule: RuleMatchInfo is (and must be) unaffected
wrap_id = fn_id `setIdUnfolding` mkWwInlineRule wrap_rhs arity
`setInlinePragma` wrap_prag
@@ -550,8 +579,10 @@ splitFun dflags fam_envs fn_id fn_info wrap_dmds res_info rhs
mb_join_arity = isJoinId_maybe fn_id
rhs_fvs = exprFreeVars rhs
fun_ty = idType fn_id
- inl_prag = inlinePragInfo fn_info
- rule_match_info = inlinePragmaRuleMatchInfo inl_prag
+ fn_inl_prag = inlinePragInfo fn_info
+ fn_inline_spec = inl_inline fn_inl_prag
+ fn_act = inl_act fn_inl_prag
+ rule_match_info = inlinePragmaRuleMatchInfo fn_inl_prag
arity = arityInfo fn_info
-- The arity is set by the simplifier using exprEtaExpandArity
-- So it may be more than the number of top-level-visible lambdas
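
The activation choice made in the last two hunks can be modelled outside GHC as a small standalone sketch. The types below are simplified stand-ins for GHC's `Activation` and `InlineSpec` (the `SourceText` fields are dropped), and the function names `wrapperActivation` and `workerActivation` are hypothetical; in the patch the logic lives in the local bindings `wrap_act` and `work_act` of `splitFun`.

```haskell
module Main where

-- Simplified stand-in for GHC's Activation (SourceText field omitted).
data Activation
  = NeverActive
  | AlwaysActive
  | ActiveBefore Int
  | ActiveAfter Int
  deriving (Eq, Show)

-- Simplified stand-in for GHC's InlineSpec.
data InlineSpec = Inline | Inlinable | NoInline | NoUserInline
  deriving (Eq, Show)

-- Mirrors wrap_act in the patch; see Note [Wrapper activation].
wrapperActivation :: Activation -> Activation
wrapperActivation fnAct = case fnAct of
  ActiveAfter {} -> fnAct          -- user gave a phase: respect it
  NeverActive    -> ActiveAfter 0  -- plain NOINLINE: as late as possible
  _              -> ActiveAfter 2  -- otherwise: phase 2

-- Mirrors work_act in the patch; see Note [Worker activation].
workerActivation :: InlineSpec -> Activation -> Activation
workerActivation spec fnAct = case spec of
  NoInline -> fnAct
  _        -> wrapperActivation fnAct

main :: IO ()
main = do
  mapM_ (print . wrapperActivation)
        [AlwaysActive, NeverActive, ActiveAfter 1, ActiveBefore 1]
  print (workerActivation NoInline NeverActive)
```

In this model a function with no pragma (`AlwaysActive`) gets a wrapper active from phase 2, a plain NOINLINE function gets one active from phase 0, and an `ActiveAfter 1` activation is passed through unchanged. An `ActiveBefore` activation falls into the catch-all case and also yields phase 2, as it does in the patch.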