diff options
author | Simon Peyton Jones <simonpj@microsoft.com> | 2013-11-22 17:13:05 +0000 |
---|---|---|
committer | Joachim Breitner <mail@joachim-breitner.de> | 2013-12-12 11:26:58 +0000 |
commit | 80989de947dc7edb55999456d1c1e8c337efc951 (patch) | |
tree | 4c8fc179290159ee48b2995c58eaa4ce00e1ac61 /compiler/stranal | |
parent | 869f69fd4a78371c221e6d9abd69a71440a4679a (diff) | |
download | haskell-80989de947dc7edb55999456d1c1e8c337efc951.tar.gz |
Improve the handling of used-once stuff
Joachim and I are committing this onto a branch so that we can share it,
but we expect to do a bit more work before merging it onto head.
Nofib staus:
- Most programs, no change
- A few improve
- A couple get worse (cacheprof, tak, rfib)
Investigating the "get worse" set is what's holding up putting this
on head.
The major issue is this. Consider
map (f g) ys
where f's demand signature looks like
f :: <L,C1(C1(U))> -> <L,U> -> .
So 'f' is not saturated. What demand do we place on g?
Answer
C(C1(U))
That is, the inner C1 should stay, even though f is not saturated.
I found that this made a significant difference in the demand signatures
inferred in GHC.IO, which uses lots of higher-order exception handlers.
I also had to add used-once demand signatures for some of the
'catch' primops, so that we know their handlers are only called once.
Diffstat (limited to 'compiler/stranal')
-rw-r--r-- | compiler/stranal/WorkWrap.lhs | 53 | ||||
-rw-r--r-- | compiler/stranal/WwLib.lhs | 64 |
2 files changed, 58 insertions, 59 deletions
diff --git a/compiler/stranal/WorkWrap.lhs b/compiler/stranal/WorkWrap.lhs index 14a01d5097..3c7820cca3 100644 --- a/compiler/stranal/WorkWrap.lhs +++ b/compiler/stranal/WorkWrap.lhs @@ -47,9 +47,9 @@ analysis pass. \end{enumerate} and we return some ``plain'' bindings which have been -worker/wrapper-ified, meaning: +worker/wrapper-ified, meaning: -\begin{enumerate} +\begin{enumerate} \item Functions have been split into workers and wrappers where appropriate. If a function has both strictness and CPR properties @@ -156,13 +156,13 @@ It's very important to refrain from w/w-ing an INLINE function (ie one with an InlineRule) because the wrapper will then overwrite the InlineRule unfolding. -Furthermore, if the programmer has marked something as INLINE, +Furthermore, if the programmer has marked something as INLINE, we may lose by w/w'ing it. If the strictness analyser is run twice, this test also prevents wrappers (which are INLINEd) from being re-done. (You can end up with several liked-named Ids bouncing around at the same time---absolute -mischief.) +mischief.) Notice that we refrain from w/w'ing an INLINE function even if it is in a recursive group. It might not be the loop breaker. (We could @@ -179,7 +179,7 @@ one. So we leave INLINABLE things alone too. This is a slight infelicity really, because it means that adding an INLINABLE pragma could make a program a bit less efficient, -because you lose the worker/wrapper stuff. But I don't see a way +because you lose the worker/wrapper stuff. But I don't see a way to avoid that. Note [Don't w/w inline small non-loop-breaker things] @@ -214,7 +214,7 @@ When should the wrapper inlining be active? It must not be active earlier than the current Activation of the Id (eg it might have a NOINLINE pragma). But in fact strictness analysis happens fairly late in the pipeline, and we want to prioritise specialisations over -strictness. Eg if we have +strictness. Eg if we have module Foo where f :: Num a => a -> Int -> a f n 0 = n -- Strict in the Int, hence wrapper @@ -232,7 +232,7 @@ strictness. Eg if we have Then we want the specialisation for 'f' to kick in before the wrapper does. Now in fact the 'gentle' simplification pass encourages this, by -having rules on, but inlinings off. But that's kind of lucky. It seems +having rules on, but inlinings off. But that's kind of lucky. It seems more robust to give the wrapper an Activation of (ActiveAfter 0), so that it becomes active in an importing module at the same time that it appears in the first place in the defining module. @@ -252,8 +252,8 @@ tryWW dflags is_rec fn_id rhs | isNeverActive inline_act -- No point in worker/wrappering if the thing is never inlined! -- Because the no-inline prag will prevent the wrapper ever - -- being inlined at a call site. - -- + -- being inlined at a call site. + -- -- Furthermore, don't even expose strictness info = return [ (fn_id, rhs) ] @@ -286,7 +286,7 @@ tryWW dflags is_rec fn_id rhs strict_sig = strictnessInfo fn_info StrictSig (DmdType env wrap_dmds res_info) = strict_sig - -- new_fn_id has the DmdEnv zapped. + -- new_fn_id has the DmdEnv zapped. -- (a) it is never used again -- (b) it wastes space -- (c) it becomes incorrect as things are cloned, because @@ -323,14 +323,14 @@ checkSize dflags fn_id rhs thing_inside splitFun :: DynFlags -> Id -> IdInfo -> [Demand] -> DmdResult -> Expr Var -> UniqSM [(Id, CoreExpr)] splitFun dflags fn_id fn_info wrap_dmds res_info rhs - = WARN( not (wrap_dmds `lengthIs` arity), ppr fn_id <+> (ppr arity $$ ppr wrap_dmds $$ ppr res_info) ) + = WARN( not (wrap_dmds `lengthIs` arity), ppr fn_id <+> (ppr arity $$ ppr wrap_dmds $$ ppr res_info) ) (do { -- The arity should match the signature (work_demands, wrap_fn, work_fn) <- mkWwBodies dflags fun_ty wrap_dmds res_info one_shots ; work_uniq <- getUniqueM ; let work_rhs = work_fn rhs - work_id = mkWorkerId work_uniq fn_id (exprType work_rhs) + work_id = mkWorkerId work_uniq fn_id (exprType work_rhs) `setIdOccInfo` occInfo fn_info -- Copy over occurrence info from parent -- Notably whether it's a loop breaker @@ -338,20 +338,20 @@ splitFun dflags fn_id fn_info wrap_dmds res_info rhs -- seems right-er to do so `setInlineActivation` (inlinePragmaActivation inl_prag) - -- Any inline activation (which sets when inlining is active) + -- Any inline activation (which sets when inlining is active) -- on the original function is duplicated on the worker -- It *matters* that the pragma stays on the wrapper -- It seems sensible to have it on the worker too, although we - -- can't think of a compelling reason. (In ptic, INLINE things are + -- can't think of a compelling reason. (In ptic, INLINE things are -- not w/wd). However, the RuleMatchInfo is not transferred since -- it does not make sense for workers to be constructorlike. `setIdStrictness` mkClosedStrictSig work_demands work_res_info - -- Even though we may not be at top level, + -- Even though we may not be at top level, -- it's ok to give it an empty DmdEnv `setIdArity` (exprArity work_rhs) - -- Set the arity so that the Core Lint check that the + -- Set the arity so that the Core Lint check that the -- arity is consistent with the demand type goes through wrap_rhs = wrap_fn work_id @@ -377,7 +377,7 @@ splitFun dflags fn_id fn_info wrap_dmds res_info rhs fun_ty = idType fn_id inl_prag = inlinePragInfo fn_info rule_match_info = inlinePragmaRuleMatchInfo inl_prag - arity = arityInfo fn_info + arity = arityInfo fn_info -- The arity is set by the simplifier using exprEtaExpandArity -- So it may be more than the number of top-level-visible lambdas @@ -390,15 +390,12 @@ splitFun dflags fn_id fn_info wrap_dmds res_info rhs -- make the wrapper and worker have corresponding one-shot arguments too. -- Otherwise we spuriously float stuff out of case-expression join points, -- which is very annoying. -get_one_shots :: Expr Var -> [Bool] +get_one_shots :: Expr Var -> [OneShotInfo] get_one_shots (Lam b e) - | isId b = isOneShotLambda b : get_one_shots e + | isId b = idOneShotInfo b : get_one_shots e | otherwise = get_one_shots e get_one_shots (Tick _ e) = get_one_shots e -get_one_shots _ = noOneShotInfo - -noOneShotInfo :: [Bool] -noOneShotInfo = repeat False +get_one_shots _ = [] \end{code} Note [Do not split void functions] @@ -415,7 +412,7 @@ in w/w so that we don't pass the argument at all. Note [Thunk splitting] ~~~~~~~~~~~~~~~~~~~~~~ Suppose x is used strictly (never mind whether it has the CPR -property). +property). let x* = x-rhs @@ -429,8 +426,8 @@ splitThunk transforms like this: Now simplifier will transform to - case x-rhs of - I# a -> let x* = I# a + case x-rhs of + I# a -> let x* = I# a in body which is what we want. Now suppose x-rhs is itself a case: @@ -442,7 +439,7 @@ what would have happened before) which is fine. Notice that x certainly has the CPR property now! -In fact, splitThunk uses the function argument w/w splitting +In fact, splitThunk uses the function argument w/w splitting function, so that if x's demand is deeper (say U(U(L,L),L)) then the splitting will go deeper too. @@ -452,7 +449,7 @@ then the splitting will go deeper too. -- x = e -- into -- x = let x = e --- in case x of +-- in case x of -- I# y -> let x = I# y in x } -- See comments above. Is it not beautifully short? -- Moreover, it works just as well when there are diff --git a/compiler/stranal/WwLib.lhs b/compiler/stranal/WwLib.lhs index 5c4cdbdbf6..fc94c9b921 100644 --- a/compiler/stranal/WwLib.lhs +++ b/compiler/stranal/WwLib.lhs @@ -11,8 +11,8 @@ module WwLib ( mkWwBodies, mkWWstr, mkWorkerArgs, deepSplitProductType_maybe ) w import CoreSyn import CoreUtils ( exprType, mkCast ) import Id ( Id, idType, mkSysLocal, idDemandInfo, setIdDemandInfo, - isOneShotLambda, setOneShotLambda, setIdUnfolding, - setIdInfo + setIdUnfolding, + setIdInfo, idOneShotInfo, setIdOneShotInfo ) import IdInfo ( vanillaIdInfo ) import DataCon @@ -23,7 +23,7 @@ import TysPrim ( voidPrimTy ) import TysWiredIn ( tupleCon ) import Type import Coercion hiding ( substTy, substTyVarBndr ) -import BasicTypes ( TupleSort(..) ) +import BasicTypes ( TupleSort(..), OneShotInfo(..), worstOneShot ) import Literal ( absentLiteralOf ) import TyCon import UniqSupply @@ -108,7 +108,7 @@ mkWwBodies :: DynFlags -> Type -- Type of original function -> [Demand] -- Strictness of original function -> DmdResult -- Info about function result - -> [Bool] -- One-shot-ness of the function + -> [OneShotInfo] -- One-shot-ness of the function, value args only -> UniqSM ([Demand], -- Demands for worker (value) args Id -> CoreExpr, -- Wrapper body, lacking only the worker Id CoreExpr -> CoreExpr) -- Worker body, lacking the original function rhs @@ -125,8 +125,8 @@ mkWwBodies :: DynFlags -- E mkWwBodies dflags fun_ty demands res_info one_shots - = do { let arg_info = demands `zip` (one_shots ++ repeat False) - all_one_shots = all snd arg_info + = do { let arg_info = demands `zip` (one_shots ++ repeat NoOneShotInfo) + all_one_shots = foldr (worstOneShot . snd) OneShotLam arg_info ; (wrap_args, wrap_fn_args, work_fn_args, res_ty) <- mkWWargs emptyTvSubst fun_ty arg_info ; (work_args, wrap_fn_str, work_fn_str) <- mkWWstr dflags wrap_args @@ -178,7 +178,7 @@ We use the state-token type which generates no code. \begin{code} mkWorkerArgs :: DynFlags -> [Var] - -> Bool -- Whether all arguments are one-shot + -> OneShotInfo -- Whether all arguments are one-shot -> Type -- Type of body -> ([Var], -- Lambda bound args [Var]) -- Args at call site @@ -194,14 +194,11 @@ mkWorkerArgs dflags args all_one_shot res_ty -- see Note [Protecting the last value argument] -- see Note [All One-Shot Arguments of a Worker] - newArg = if all_one_shot - then setOneShotLambda voidArgId - else voidArgId + newArg = setIdOneShotInfo voidArgId all_one_shot \end{code} Note [Protecting the last value argument] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - If the user writes (\_ -> E), they might be intentionally disallowing the sharing of E. Since absence analysis and worker-wrapper are keen to remove such unused arguments, we add in a void argument to prevent @@ -215,21 +212,27 @@ so f can't be inlined *under a lambda*. Note [All One-Shot Arguments of a Worker] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Sometimes, derived joint-points are just lambda-lifted thunks, whose +Sometimes, derived join-points are just lambda-lifted thunks, whose only argument is of the unit type and is never used. This might interfere with the absence analysis, basing on which results these never-used arguments are eliminated in the worker. The additional argument `all_one_shot` of `mkWorkerArgs` is to prevent this. -An example for this phenomenon is a `treejoin` program from the -`nofib` suite, which features the following joint points: +Example. Suppose we have + foo = \p(one-shot) q(one-shot). y + 3 +Then we drop the unused args to give + foo = \pq. $wfoo void# + $wfoo = \void(one-shot). y + 3 + +But suppse foo didn't have all one-shot args: + foo = \p(not-one-shot) q(one-shot). expensive y + 3 +Then we drop the unused args to give + foo = \pq. $wfoo void# + $wfoo = \void(not-one-shot). y + 3 + +If we made the void-arg one-shot we might inline an expensive +computation for y, which would be terrible! -$j_s1l1 = - \ _ -> - case GHC.Prim.<=# 56320 y_aOy of _ { - GHC.Types.False -> $j_s1kP GHC.Prim.realWorld#; - GHC.Types.True -> ... } %************************************************************************ %* * @@ -271,8 +274,8 @@ the \x to get what we want. mkWWargs :: TvSubst -- Freshening substitution to apply to the type -- See Note [Freshen type variables] -> Type -- The type of the function - -> [(Demand,Bool)] -- Demands and one-shot info for value arguments - -> UniqSM ([Var], -- Wrapper args + -> [(Demand,OneShotInfo)] -- Demands and one-shot info for value arguments + -> UniqSM ([Var], -- Wrapper args CoreExpr -> CoreExpr, -- Wrapper fn CoreExpr -> CoreExpr, -- Worker fn Type) -- Type of wrapper body @@ -327,12 +330,11 @@ mkWWargs subst fun_ty arg_info applyToVars :: [Var] -> CoreExpr -> CoreExpr applyToVars vars fn = mkVarApps fn vars -mk_wrap_arg :: Unique -> Type -> Demand -> Bool -> Id +mk_wrap_arg :: Unique -> Type -> Demand -> OneShotInfo -> Id mk_wrap_arg uniq ty dmd one_shot - = set_one_shot one_shot (setIdDemandInfo (mkSysLocal (fsLit "w") uniq ty) dmd) - where - set_one_shot True id = setOneShotLambda id - set_one_shot False id = id + = mkSysLocal (fsLit "w") uniq ty + `setIdDemandInfo` dmd + `setIdOneShotInfo` one_shot \end{code} Note [Freshen type variables] @@ -462,13 +464,13 @@ mkWWstr_one dflags arg where dmd = idDemandInfo arg + one_shot = idOneShotInfo arg -- If the wrapper argument is a one-shot lambda, then -- so should (all) the corresponding worker arguments be -- This bites when we do w/w on a case join point - set_worker_arg_info worker_arg demand = set_one_shot (setIdDemandInfo worker_arg demand) - - set_one_shot | isOneShotLambda arg = setOneShotLambda - | otherwise = \x -> x + set_worker_arg_info worker_arg demand + = worker_arg `setIdDemandInfo` demand + `setIdOneShotInfo` one_shot ---------------------- nop_fn :: CoreExpr -> CoreExpr |