diff options
Diffstat (limited to 'compiler/GHC/Core/Op')
25 files changed, 30242 insertions, 1 deletions
diff --git a/compiler/GHC/Core/Op/CSE.hs b/compiler/GHC/Core/Op/CSE.hs
new file mode 100644
index 0000000000..dc93dacf07
--- /dev/null
+++ b/compiler/GHC/Core/Op/CSE.hs
@@ -0,0 +1,799 @@
+{-
+(c) The AQUA Project, Glasgow University, 1993-1998
+
+\section{Common subexpression}
+-}
+
+{-# LANGUAGE CPP #-}
+
+{-# OPTIONS_GHC -Wno-incomplete-record-updates #-}
+{-# OPTIONS_GHC -Wno-incomplete-uni-patterns #-}
+
+module GHC.Core.Op.CSE (cseProgram, cseOneExpr) where
+
+#include "HsVersions.h"
+
+import GhcPrelude
+
+import GHC.Core.Subst
+import Var ( Var )
+import VarEnv ( mkInScopeSet )
+import Id ( Id, idType, idHasRules
+ , idInlineActivation, setInlineActivation
+ , zapIdOccInfo, zapIdUsageInfo, idInlinePragma
+ , isJoinId, isJoinId_maybe )
+import GHC.Core.Utils ( mkAltExpr, eqExpr
+ , exprIsTickedString
+ , stripTicksE, stripTicksT, mkTicks )
+import GHC.Core.FVs ( exprFreeVars )
+import GHC.Core.Type ( tyConAppArgs )
+import GHC.Core
+import Outputable
+import BasicTypes
+import GHC.Core.Map
+import Util ( filterOut, equalLength, debugIsOn )
+import Data.List ( mapAccumL )
+
+{-
+ Simple common sub-expression
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+When we see
+ x1 = C a b
+ x2 = C x1 b
+we build up a reverse mapping: C a b -> x1
+ C x1 b -> x2
+and apply that to the rest of the program.
+
+When we then see
+ y1 = C a b
+ y2 = C y1 b
+we replace the C a b with x1. But then we *don't* want to
+add x1 -> y1 to the mapping. Rather, we want the reverse, y1 -> x1
+so that a subsequent binding
+ y2 = C y1 b
+will get transformed to C x1 b, and then to x2.
+
+So we carry an extra var->var substitution which we apply *before* looking up in the
+reverse mapping.
+
+
+Note [Shadowing]
+~~~~~~~~~~~~~~~~
+We have to be careful about shadowing.
+For example, consider
+ f = \x -> let y = x+x in
+ h = \x -> x+x
+ in ...
+
+Here we must *not* do CSE on the inner x+x! The simplifier used to guarantee no
+shadowing, but it doesn't any more (it proved too hard), so we clone as we go.
+We can simply add clones to the substitution already described. + + +Note [CSE for bindings] +~~~~~~~~~~~~~~~~~~~~~~~ +Let-bindings have two cases, implemented by addBinding. + +* SUBSTITUTE: applies when the RHS is a variable + + let x = y in ...(h x).... + + Here we want to extend the /substitution/ with x -> y, so that the + (h x) in the body might CSE with an enclosing (let v = h y in ...). + NB: the substitution maps InIds, so we extend the substitution with + a binding for the original InId 'x' + + How can we have a variable on the RHS? Doesn't the simplifier inline them? + + - First, the original RHS might have been (g z) which has CSE'd + with an enclosing (let y = g z in ...). This is super-important. + See #5996: + x1 = C a b + x2 = C x1 b + y1 = C a b + y2 = C y1 b + Here we CSE y1's rhs to 'x1', and then we must add (y1->x1) to + the substitution so that we can CSE the binding for y2. + + - Second, we use addBinding for case expression scrutinees too; + see Note [CSE for case expressions] + +* EXTEND THE REVERSE MAPPING: applies in all other cases + + let x = h y in ...(h y)... + + Here we want to extend the /reverse mapping (cs_map)/ so that + we CSE the (h y) call to x. + + Note that we use EXTEND even for a trivial expression, provided it + is not a variable or literal. In particular this /includes/ type + applications. This can be important (#13156); e.g. + case f @ Int of { r1 -> + case f @ Int of { r2 -> ... + Here we want to common-up the two uses of (f @ Int) so we can + remove one of the case expressions. + + See also Note [Corner case for case expressions] for another + reason not to use SUBSTITUTE for all trivial expressions. + +Notice that + - The SUBSTITUTE situation extends the substitution (cs_subst) + - The EXTEND situation extends the reverse mapping (cs_map) + +Notice also that in the SUBSTITUTE case we leave behind a binding + x = y +even though we /also/ carry a substitution x -> y. Can we just drop +the binding instead? 
Well, not at top level! See Note [Top level and
+postInlineUnconditionally] in GHC.Core.Op.Simplify.Utils; and in any
+case CSE applies only to the /bindings/ of the program, and we leave
+it to the simplifier to propagate effects to the RULES. Finally, it
+doesn't seem worth the effort to discard the nested bindings because
+the simplifier will do it next.
+
+Note [CSE for case expressions]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider
+ case scrut_expr of x { ...alts... }
+This is very like a strict let-binding
+ let !x = scrut_expr in ...
+So we use (addBinding x scrut_expr) to process scrut_expr and x, and as a
+result all the stuff under Note [CSE for bindings] applies directly.
+
+For example:
+
+* Trivial scrutinee
+ f = \x -> case x of wild {
+ (a:as) -> case a of wild1 {
+ (p,q) -> ...(wild1:as)...
+
+ Here, (wild1:as) is morally the same as (a:as) and hence equal to
+ wild. But that's not quite obvious. In the rest of the compiler we
+ want to keep it as (wild1:as), but for CSE purposes that's a bad
+ idea.
+
+ By using addBinding we add the binding (wild1 -> a) to the substitution,
+ which does exactly the right thing.
+
+ (Notice this is exactly backwards to what the simplifier does, which
+ is to try to replace uses of 'a' with uses of 'wild1'.)
+
+ This is the main reason that addBinding is called with a trivial rhs.
+
+* Non-trivial scrutinee
+ case (f x) of y { pat -> ...let z = f x in ... }
+
+ By using addBinding we'll add (f x :-> y) to the cs_map, and
+ thereby CSE the inner (f x) to y.
+
+Note [CSE for INLINE and NOINLINE]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+There are some subtle interactions of CSE with functions that the user
+has marked as INLINE or NOINLINE. (Examples from Roman Leshchinskiy.)
+Consider
+
+ yes :: Int {-# NOINLINE yes #-}
+ yes = undefined
+
+ no :: Int {-# NOINLINE no #-}
+ no = undefined
+
+ foo :: Int -> Int -> Int {-# NOINLINE foo #-}
+ foo m n = n
+
+ {-# RULES "foo/no" foo no = id #-}
+
+ bar :: Int -> Int
+ bar = foo yes
+
+We do not expect the rule to fire. But if we do CSE, then we risk
+getting yes=no, and the rule does fire. Actually, it won't because
+NOINLINE means that 'yes' will never be inlined, not even if we have
+yes=no. So that's fine (now; perhaps in the olden days, yes=no would
+have substituted even if 'yes' was NOINLINE).
+
+But we do need to take care. Consider
+
+ {-# NOINLINE bar #-}
+ bar = <rhs> -- Same rhs as foo
+
+ foo = <rhs>
+
+If CSE produces
+ foo = bar
+then foo will never be inlined to <rhs> (when it should be, if <rhs>
+is small). The conclusion here is this:
+
+ We should not add
+ <rhs> :-> bar
+ to the CSEnv if 'bar' has any constraints on when it can inline;
+ that is, if its 'activation' is not always active. Otherwise we
+ might replace <rhs> by 'bar', and then later be unable to see that it
+ really was <rhs>.
+
+An exception to the rule is when the INLINE pragma is not from the user, e.g. from
+WorkWrap (see Note [Wrapper activation]). We can tell because noUserInlineSpec
+is then true.
+
+Note that we do not (currently) do CSE on the unfolding stored inside
+an Id, even if it is a 'stable' unfolding. That means that when an
+unfolding happens, it is always faithful to what the stable unfolding
+originally was.
+
+Note [CSE for stable unfoldings]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider
+ {-# Unf = Stable (\pq. build blah) #-}
+ foo = x
+
+Here 'foo' has a stable unfolding, but its (optimised) RHS is trivial.
+(Turns out that this actually happens for the enumFromTo method of
+the Integer instance of Enum in GHC.Enum.) Suppose moreover that foo's
+stable unfolding originates from an INLINE or INLINEABLE pragma on foo.
+Then we obviously do NOT want to extend the substitution with (foo->x), +because we promised to inline foo as what the user wrote. See similar Note +[Stable unfoldings and postInlineUnconditionally] in GHC.Core.Op.Simplify.Utils. + +Nor do we want to change the reverse mapping. Suppose we have + + {-# Unf = Stable (\pq. build blah) #-} + foo = <expr> + bar = <expr> + +There could conceivably be merit in rewriting the RHS of bar: + bar = foo +but now bar's inlining behaviour will change, and importing +modules might see that. So it seems dodgy and we don't do it. + +Stable unfoldings are also created during worker/wrapper when we decide +that a function's definition is so small that it should always inline. +In this case we still want to do CSE (#13340). Hence the use of +isAnyInlinePragma rather than isStableUnfolding. + +Note [Corner case for case expressions] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Here is another reason that we do not use SUBSTITUTE for +all trivial expressions. Consider + case x |> co of (y::Array# Int) { ... } + +We do not want to extend the substitution with (y -> x |> co); since y +is of unlifted type, this would destroy the let/app invariant if (x |> +co) was not ok-for-speculation. + +But surely (x |> co) is ok-for-speculation, because it's a trivial +expression, and x's type is also unlifted, presumably. Well, maybe +not if you are using unsafe casts. I actually found a case where we +had + (x :: HValue) |> (UnsafeCo :: HValue ~ Array# Int) + +Note [CSE for join points?] +~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We must not be naive about join points in CSE: + join j = e in + if b then jump j else 1 + e +The expression (1 + jump j) is not good (see Note [Invariants on join points] in +GHC.Core). This seems to come up quite seldom, but it happens (first seen +compiling ppHtml in Haddock.Backends.Xhtml). 
+
+We could try and be careful by tracking which join points are still valid at
+each subexpression, but since join points aren't allocated or shared, there's
+less to gain by trying to CSE them. (#13219)
+
+Note [Look inside join-point binders]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Another way in which CSE for join points is tricky:
+
+ let join foo x = (x, 42)
+ join bar x = (x, 42)
+ in … jump foo 1 … jump bar 2 …
+
+Naively, CSE would turn this into
+
+ let join foo x = (x, 42)
+ join bar = foo
+ in … jump foo 1 … jump bar 2 …
+
+but now bar is a join point that claims arity one, but its right-hand side
+is not a lambda, breaking the join-point invariant (this was #15002).
+
+So `cse_bind` must zoom past the lambdas of a join point (using
+`collectNBinders`) and resume searching for CSE opportunities only in
+the body of the join point.
+
+Note [CSE for recursive bindings]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider
+ f = \x ... f....
+ g = \y ... g ...
+where the "..." are identical. Could we CSE them? In full generality
+with mutual recursion it's quite hard; but for self-recursive bindings
+(which are very common) it's rather easy:
+
+* Maintain a separate cs_rec_map, that maps
+ (\f. (\x. ...f...) ) -> f
+ Note the \f in the domain of the mapping!
+
+* When we come across the binding for 'g', look up (\g. (\y. ...g...))
+ Bingo we get a hit. So we can replace the 'g' binding with
+ g = f
+
+We can't use cs_map for this, because the key isn't an expression of
+the program; it's a kind of synthetic key for recursive bindings.
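The synthetic-key idea can be sketched with a toy expression type. This is an illustrative model only, not GHC's API: `Expr`, `alphaNorm`, and `recKey` are invented names, with a simple positional alpha-normalisation standing in for the alpha-invariant keying that CoreMap provides:

```haskell
import qualified Data.Map.Strict as M

-- A tiny lambda language standing in for Core.
data Expr = Var String | Lam String Expr | App Expr Expr
  deriving (Eq, Ord, Show)

-- Alpha-normalise by renaming binders to positional names, so that
-- alpha-equivalent terms compare equal. (A toy: assumes source
-- programs do not already use '#'-prefixed names.)
alphaNorm :: Expr -> Expr
alphaNorm = go M.empty (0 :: Int)
  where
    go env _ (Var v)   = Var (M.findWithDefault v v env)
    go env n (Lam v e) = Lam v' (go (M.insert v v' env) (n + 1) e)
      where v' = '#' : show n
    go env n (App f a) = App (go env n f) (go env n a)

-- The synthetic key for a self-recursive binding (self = rhs):
-- note the extra lambda over the binder itself, mirroring the
-- (\f. (\x. ...f...)) -> f mapping described above.
recKey :: String -> Expr -> Expr
recKey self rhs = alphaNorm (Lam self rhs)
```

With this, the key for `f = \x -> f x` equals the key for `g = \y -> g y`, so the second binding would hit the cache, mirroring the `g = f` replacement above.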
+ + +************************************************************************ +* * +\section{Common subexpression} +* * +************************************************************************ +-} + +cseProgram :: CoreProgram -> CoreProgram +cseProgram binds = snd (mapAccumL (cseBind TopLevel) emptyCSEnv binds) + +cseBind :: TopLevelFlag -> CSEnv -> CoreBind -> (CSEnv, CoreBind) +cseBind toplevel env (NonRec b e) + = (env2, NonRec b2 e2) + where + (env1, b1) = addBinder env b + (env2, (b2, e2)) = cse_bind toplevel env1 (b,e) b1 + +cseBind toplevel env (Rec [(in_id, rhs)]) + | noCSE in_id + = (env1, Rec [(out_id, rhs')]) + + -- See Note [CSE for recursive bindings] + | Just previous <- lookupCSRecEnv env out_id rhs'' + , let previous' = mkTicks ticks previous + out_id' = delayInlining toplevel out_id + = -- We have a hit in the recursive-binding cache + (extendCSSubst env1 in_id previous', NonRec out_id' previous') + + | otherwise + = (extendCSRecEnv env1 out_id rhs'' id_expr', Rec [(zapped_id, rhs')]) + + where + (env1, [out_id]) = addRecBinders env [in_id] + rhs' = cseExpr env1 rhs + rhs'' = stripTicksE tickishFloatable rhs' + ticks = stripTicksT tickishFloatable rhs' + id_expr' = varToCoreExpr out_id + zapped_id = zapIdUsageInfo out_id + +cseBind toplevel env (Rec pairs) + = (env2, Rec pairs') + where + (env1, bndrs1) = addRecBinders env (map fst pairs) + (env2, pairs') = mapAccumL do_one env1 (zip pairs bndrs1) + + do_one env (pr, b1) = cse_bind toplevel env pr b1 + +-- | Given a binding of @in_id@ to @in_rhs@, and a fresh name to refer +-- to @in_id@ (@out_id@, created from addBinder or addRecBinders), +-- first try to CSE @in_rhs@, and then add the resulting (possibly CSE'd) +-- binding to the 'CSEnv', so that we attempt to CSE any expressions +-- which are equal to @out_rhs@. 
+cse_bind :: TopLevelFlag -> CSEnv -> (InId, InExpr) -> OutId -> (CSEnv, (OutId, OutExpr))
+cse_bind toplevel env (in_id, in_rhs) out_id
+ | isTopLevel toplevel, exprIsTickedString in_rhs
+ -- See Note [Take care with literal strings]
+ = (env', (out_id', in_rhs))
+
+ | Just arity <- isJoinId_maybe in_id
+ -- See Note [Look inside join-point binders]
+ = let (params, in_body) = collectNBinders arity in_rhs
+ (env', params') = addBinders env params
+ out_body = tryForCSE env' in_body
+ in (env, (out_id, mkLams params' out_body))
+
+ | otherwise
+ = (env', (out_id'', out_rhs))
+ where
+ (env', out_id') = addBinding env in_id out_id out_rhs
+ (cse_done, out_rhs) = try_for_cse env in_rhs
+ out_id'' | cse_done = delayInlining toplevel out_id'
+ | otherwise = out_id'
+
+delayInlining :: TopLevelFlag -> Id -> Id
+-- Add a NOINLINE[2] if the Id doesn't have an INLINE pragma already
+-- See Note [Delay inlining after CSE]
+delayInlining top_lvl bndr
+ | isTopLevel top_lvl
+ , isAlwaysActive (idInlineActivation bndr)
+ , idHasRules bndr -- Only if the Id has some RULES,
+ -- which might otherwise get lost
+ -- These rules are probably auto-generated specialisations,
+ -- since Ids with manual rules usually have manually-inserted
+ -- delayed inlining anyway
+ = bndr `setInlineActivation` activeAfterInitial
+ | otherwise
+ = bndr
+
+addBinding :: CSEnv -- Includes InId->OutId cloning
+ -> InVar -- Could be a let-bound type
+ -> OutId -> OutExpr -- Processed binding
+ -> (CSEnv, OutId) -- Final env, final bndr
+-- Extend the CSE env with a mapping [rhs -> out-id]
+-- unless we can instead just substitute [in-id -> rhs]
+--
+-- It's possible for the binder to be a type variable (see
+-- Note [Type-let] in GHC.Core), in which case we can just substitute.
+addBinding env in_id out_id rhs' + | not (isId in_id) = (extendCSSubst env in_id rhs', out_id) + | noCSE in_id = (env, out_id) + | use_subst = (extendCSSubst env in_id rhs', out_id) + | otherwise = (extendCSEnv env rhs' id_expr', zapped_id) + where + id_expr' = varToCoreExpr out_id + zapped_id = zapIdUsageInfo out_id + -- Putting the Id into the cs_map makes it possible that + -- it'll become shared more than it is now, which would + -- invalidate (the usage part of) its demand info. + -- This caused #100218. + -- Easiest thing is to zap the usage info; subsequently + -- performing late demand-analysis will restore it. Don't zap + -- the strictness info; it's not necessary to do so, and losing + -- it is bad for performance if you don't do late demand + -- analysis + + -- Should we use SUBSTITUTE or EXTEND? + -- See Note [CSE for bindings] + use_subst = case rhs' of + Var {} -> True + _ -> False + +-- | Given a binder `let x = e`, this function +-- determines whether we should add `e -> x` to the cs_map +noCSE :: InId -> Bool +noCSE id = not (isAlwaysActive (idInlineActivation id)) && + not (noUserInlineSpec (inlinePragmaSpec (idInlinePragma id))) + -- See Note [CSE for INLINE and NOINLINE] + || isAnyInlinePragma (idInlinePragma id) + -- See Note [CSE for stable unfoldings] + || isJoinId id + -- See Note [CSE for join points?] + + +{- Note [Take care with literal strings] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Consider this example: + + x = "foo"# + y = "foo"# + ...x...y...x...y.... + +We would normally turn this into: + + x = "foo"# + y = x + ...x...x...x...x.... + +But this breaks an invariant of Core, namely that the RHS of a top-level binding +of type Addr# must be a string literal, not another variable. See Note +[Core top-level string literals] in GHC.Core. + +For this reason, we special case top-level bindings to literal strings and leave +the original RHS unmodified. This produces: + + x = "foo"# + y = "foo"# + ...x...x...x...x.... 
+ +Now 'y' will be discarded as dead code, and we are done. + +The net effect is that for the y-binding we want to + - Use SUBSTITUTE, by extending the substitution with y :-> x + - but leave the original binding for y undisturbed + +This is done by cse_bind. I got it wrong the first time (#13367). + +Note [Delay inlining after CSE] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Suppose (#15445) we have + f,g :: Num a => a -> a + f x = ...f (x-1)..... + g y = ...g (y-1) .... + +and we make some specialisations of 'g', either automatically, or via +a SPECIALISE pragma. Then CSE kicks in and notices that the RHSs of +'f' and 'g' are identical, so we get + f x = ...f (x-1)... + g = f + {-# RULES g @Int _ = $sg #-} + +Now there is terrible danger that, in an importing module, we'll inline +'g' before we have a chance to run its specialisation! + +Solution: during CSE, after a "hit" in the CSE cache + * when adding a binding + g = f + * for a top-level function g + * and g has specialisation RULES +add a NOINLINE[2] activation to it, to ensure it's not inlined +right away. + +Notes: +* Why top level only? Because for nested bindings we are already past + phase 2 and will never return there. + +* Why "only if g has RULES"? Because there is no point in + doing this if there are no RULES; and other things being + equal it delays optimisation to delay inlining (#17409) + + +---- Historical note --- + +This patch is simpler and more direct than an earlier +version: + + commit 2110738b280543698407924a16ac92b6d804dc36 + Author: Simon Peyton Jones <simonpj@microsoft.com> + Date: Mon Jul 30 13:43:56 2018 +0100 + + Don't inline functions with RULES too early + +We had to revert this patch because it made GHC itself slower. + +Why? It delayed inlining of /all/ functions with RULES, and that was +very bad in TcFlatten.flatten_ty_con_app + +* It delayed inlining of liftM +* That delayed the unravelling of the recursion in some dictionary + bindings. 
+* That delayed some eta expansion, leaving
+ flatten_ty_con_app = \x y. let <stuff> in \z. blah
+* That allowed the float-out pass to put stuff between
+ the \y and \z.
+* And that permanently stopped eta expansion of the function,
+ even once <stuff> was simplified.
+
+-}
+
+tryForCSE :: CSEnv -> InExpr -> OutExpr
+tryForCSE env expr = snd (try_for_cse env expr)
+
+try_for_cse :: CSEnv -> InExpr -> (Bool, OutExpr)
+-- (False, e') => We did not CSE the entire expression,
+-- but we might have CSE'd some sub-expressions,
+-- yielding e'
+--
+-- (True, te') => We CSE'd the entire expression,
+-- yielding the trivial expression te'
+try_for_cse env expr
+ | Just e <- lookupCSEnv env expr'' = (True, mkTicks ticks e)
+ | otherwise = (False, expr')
+ -- The varToCoreExpr is needed if we have
+ -- case e of xco { ...case e of yco { ... } ... }
+ -- Then CSE will substitute yco -> xco;
+ -- but these are /coercion/ variables
+ where
+ expr' = cseExpr env expr
+ expr'' = stripTicksE tickishFloatable expr'
+ ticks = stripTicksT tickishFloatable expr'
+ -- We don't want to lose the source notes when a common sub
+ -- expression gets eliminated. Hence we push all (!) of them on
+ -- top of the replaced sub-expression. This is probably not too
+ -- useful in practice, but upholds our semantics.
+
+-- | Runs CSE on a single expression.
+--
+-- This entry point is not used in the compiler itself, but is provided
+-- as a convenient entry point for users of the GHC API.
+cseOneExpr :: InExpr -> OutExpr +cseOneExpr e = cseExpr env e + where env = emptyCSEnv {cs_subst = mkEmptySubst (mkInScopeSet (exprFreeVars e)) } + +cseExpr :: CSEnv -> InExpr -> OutExpr +cseExpr env (Type t) = Type (substTy (csEnvSubst env) t) +cseExpr env (Coercion c) = Coercion (substCo (csEnvSubst env) c) +cseExpr _ (Lit lit) = Lit lit +cseExpr env (Var v) = lookupSubst env v +cseExpr env (App f a) = App (cseExpr env f) (tryForCSE env a) +cseExpr env (Tick t e) = Tick t (cseExpr env e) +cseExpr env (Cast e co) = Cast (tryForCSE env e) (substCo (csEnvSubst env) co) +cseExpr env (Lam b e) = let (env', b') = addBinder env b + in Lam b' (cseExpr env' e) +cseExpr env (Let bind e) = let (env', bind') = cseBind NotTopLevel env bind + in Let bind' (cseExpr env' e) +cseExpr env (Case e bndr ty alts) = cseCase env e bndr ty alts + +cseCase :: CSEnv -> InExpr -> InId -> InType -> [InAlt] -> OutExpr +cseCase env scrut bndr ty alts + = Case scrut1 bndr3 ty' $ + combineAlts alt_env (map cse_alt alts) + where + ty' = substTy (csEnvSubst env) ty + scrut1 = tryForCSE env scrut + + bndr1 = zapIdOccInfo bndr + -- Zapping the OccInfo is needed because the extendCSEnv + -- in cse_alt may mean that a dead case binder + -- becomes alive, and Lint rejects that + (env1, bndr2) = addBinder env bndr1 + (alt_env, bndr3) = addBinding env1 bndr bndr2 scrut1 + -- addBinding: see Note [CSE for case expressions] + + con_target :: OutExpr + con_target = lookupSubst alt_env bndr + + arg_tys :: [OutType] + arg_tys = tyConAppArgs (idType bndr3) + + -- See Note [CSE for case alternatives] + cse_alt (DataAlt con, args, rhs) + = (DataAlt con, args', tryForCSE new_env rhs) + where + (env', args') = addBinders alt_env args + new_env = extendCSEnv env' con_expr con_target + con_expr = mkAltExpr (DataAlt con) args' arg_tys + + cse_alt (con, args, rhs) + = (con, args', tryForCSE env' rhs) + where + (env', args') = addBinders alt_env args + +combineAlts :: CSEnv -> [OutAlt] -> [OutAlt] +-- See Note 
[Combine case alternatives] +combineAlts env alts + | (Just alt1, rest_alts) <- find_bndr_free_alt alts + , (_,bndrs1,rhs1) <- alt1 + , let filtered_alts = filterOut (identical_alt rhs1) rest_alts + , not (equalLength rest_alts filtered_alts) + = ASSERT2( null bndrs1, ppr alts ) + (DEFAULT, [], rhs1) : filtered_alts + + | otherwise + = alts + where + in_scope = substInScope (csEnvSubst env) + + find_bndr_free_alt :: [CoreAlt] -> (Maybe CoreAlt, [CoreAlt]) + -- The (Just alt) is a binder-free alt + -- See Note [Combine case alts: awkward corner] + find_bndr_free_alt [] + = (Nothing, []) + find_bndr_free_alt (alt@(_,bndrs,_) : alts) + | null bndrs = (Just alt, alts) + | otherwise = case find_bndr_free_alt alts of + (mb_bf, alts) -> (mb_bf, alt:alts) + + identical_alt rhs1 (_,_,rhs) = eqExpr in_scope rhs1 rhs + -- Even if this alt has binders, they will have been cloned + -- If any of these binders are mentioned in 'rhs', then + -- 'rhs' won't compare equal to 'rhs1' (which is from an + -- alt with no binders). + +{- Note [CSE for case alternatives] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Consider case e of x + K1 y -> ....(K1 y)... + K2 -> ....K2.... + +We definitely want to CSE that (K1 y) into just x. + +But what about the lone K2? At first you would think "no" because +turning K2 into 'x' increases the number of live variables. But + +* Turning K2 into x increases the chance of combining identical alts. + Example case xs of + (_:_) -> f xs + [] -> f [] + See #17901 and simplCore/should_compile/T17901 for more examples + of this kind. + +* The next run of the simplifier will turn 'x' back into K2, so we won't + permanently bloat the free-var count. + + +Note [Combine case alternatives] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +combineAlts is just a more heavyweight version of the use of +combineIdenticalAlts in GHC.Core.Op.Simplify.Utils.prepareAlts. 
The basic idea is
+to transform
+
+ DEFAULT -> e1
+ K x -> e1
+ W y z -> e2
+===>
+ DEFAULT -> e1
+ W y z -> e2
+
+In the simplifier we use cheapEqExpr, because it is called a lot.
+But here in CSE we use the full eqExpr. After all, two alternatives usually
+differ near the root, so it probably isn't expensive to compare the full
+alternative. It seems like the same kind of thing that CSE is supposed
+to be doing, which is why I put it here.
+
+I actually saw some examples in the wild, where some inlining made e1 too
+big for cheapEqExpr to catch it.
+
+Note [Combine case alts: awkward corner]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We would really like to check isDeadBinder on the binders in the
+alternative. But alas, the simplifier zaps occ-info on binders in case
+alternatives; see Note [Case alternative occ info] in GHC.Core.Op.Simplify.
+
+* One alternative (perhaps a good one) would be to do OccAnal
+ just before CSE. Then perhaps we could get rid of combineIdenticalAlts
+ in the Simplifier, which might save work.
+
+* Another would be for CSE to return free vars as it goes.
+
+* But the current solution is to find a nullary alternative (including
+ the DEFAULT alt, if any). This will not catch
+ case x of
+ A y -> blah
+ B z p -> blah
+ where no alternative is nullary or DEFAULT. But the current
+ solution is at least cheap.
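The transformation in the note above can be illustrated with a toy version of combineAlts over a simplified alternative type, using plain string equality on right-hand sides where the real code uses eqExpr. All names here (`Alt`, `combineAltsToy`, `findBndrFree`) are invented for this sketch:

```haskell
-- Toy alternatives: constructor name, binders, and an RHS modelled
-- as a string (standing in for a full eqExpr comparison).
data Alt = Alt { altCon :: String, altBndrs :: [String], altRhs :: String }
  deriving (Eq, Show)

-- Find a binder-free alternative; if any other alternative has an
-- identical RHS, drop those and promote the binder-free one to DEFAULT.
combineAltsToy :: [Alt] -> [Alt]
combineAltsToy alts
  | (Just bf, rest) <- findBndrFree alts
  , let kept = filter (\a -> altRhs a /= altRhs bf) rest
  , length kept < length rest          -- something was actually dropped
  = Alt "DEFAULT" [] (altRhs bf) : kept
  | otherwise
  = alts
  where
    findBndrFree [] = (Nothing, [])
    findBndrFree (a:as)
      | null (altBndrs a) = (Just a, as)
      | otherwise         = case findBndrFree as of
                              (mb, as') -> (mb, a : as')
```

On the note's example, `[DEFAULT -> e1, K x -> e1, W y z -> e2]` collapses to `[DEFAULT -> e1, W y z -> e2]`, exactly the transformation shown above.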
+
+
+************************************************************************
+* *
+\section{The CSE envt}
+* *
+************************************************************************
+-}
+
+data CSEnv
+ = CS { cs_subst :: Subst -- Maps InBndrs to OutExprs
+ -- The substitution maps variables to
+ -- /trivial/ OutExprs, not arbitrary expressions
+
+ , cs_map :: CoreMap OutExpr -- The reverse mapping
+ -- Maps an OutExpr to a /trivial/ OutExpr
+ -- The key of cs_map is stripped of all Ticks
+
+ , cs_rec_map :: CoreMap OutExpr
+ -- See Note [CSE for recursive bindings]
+ }
+
+emptyCSEnv :: CSEnv
+emptyCSEnv = CS { cs_map = emptyCoreMap, cs_rec_map = emptyCoreMap
+ , cs_subst = emptySubst }
+
+lookupCSEnv :: CSEnv -> OutExpr -> Maybe OutExpr
+lookupCSEnv (CS { cs_map = csmap }) expr
+ = lookupCoreMap csmap expr
+
+extendCSEnv :: CSEnv -> OutExpr -> OutExpr -> CSEnv
+extendCSEnv cse expr triv_expr
+ = cse { cs_map = extendCoreMap (cs_map cse) sexpr triv_expr }
+ where
+ sexpr = stripTicksE tickishFloatable expr
+
+extendCSRecEnv :: CSEnv -> OutId -> OutExpr -> OutExpr -> CSEnv
+-- See Note [CSE for recursive bindings]
+extendCSRecEnv cse bndr expr triv_expr
+ = cse { cs_rec_map = extendCoreMap (cs_rec_map cse) (Lam bndr expr) triv_expr }
+
+lookupCSRecEnv :: CSEnv -> OutId -> OutExpr -> Maybe OutExpr
+-- See Note [CSE for recursive bindings]
+lookupCSRecEnv (CS { cs_rec_map = csmap }) bndr expr
+ = lookupCoreMap csmap (Lam bndr expr)
+
+csEnvSubst :: CSEnv -> Subst
+csEnvSubst = cs_subst
+
+lookupSubst :: CSEnv -> Id -> OutExpr
+lookupSubst (CS { cs_subst = sub}) x = lookupIdSubst (text "CSE.lookupSubst") sub x
+
+extendCSSubst :: CSEnv -> Id -> CoreExpr -> CSEnv
+extendCSSubst cse x rhs = cse { cs_subst = extendSubst (cs_subst cse) x rhs }
+
+-- | Add clones to the substitution to deal with shadowing. See
+-- Note [Shadowing] for more details. You should call this whenever
+-- you go under a binder.
+addBinder :: CSEnv -> Var -> (CSEnv, Var) +addBinder cse v = (cse { cs_subst = sub' }, v') + where + (sub', v') = substBndr (cs_subst cse) v + +addBinders :: CSEnv -> [Var] -> (CSEnv, [Var]) +addBinders cse vs = (cse { cs_subst = sub' }, vs') + where + (sub', vs') = substBndrs (cs_subst cse) vs + +addRecBinders :: CSEnv -> [Id] -> (CSEnv, [Id]) +addRecBinders cse vs = (cse { cs_subst = sub' }, vs') + where + (sub', vs') = substRecBndrs (cs_subst cse) vs diff --git a/compiler/GHC/Core/Op/CallArity.hs b/compiler/GHC/Core/Op/CallArity.hs new file mode 100644 index 0000000000..aaf3372071 --- /dev/null +++ b/compiler/GHC/Core/Op/CallArity.hs @@ -0,0 +1,763 @@ +-- +-- Copyright (c) 2014 Joachim Breitner +-- + +module GHC.Core.Op.CallArity + ( callArityAnalProgram + , callArityRHS -- for testing + ) where + +import GhcPrelude + +import VarSet +import VarEnv +import GHC.Driver.Session ( DynFlags ) + +import BasicTypes +import GHC.Core +import Id +import GHC.Core.Arity ( typeArity ) +import GHC.Core.Utils ( exprIsCheap, exprIsTrivial ) +import UnVarGraph +import Demand +import Util + +import Control.Arrow ( first, second ) + + +{- +%************************************************************************ +%* * + Call Arity Analysis +%* * +%************************************************************************ + +Note [Call Arity: The goal] +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The goal of this analysis is to find out if we can eta-expand a local function, +based on how it is being called. The motivating example is this code, +which comes up when we implement foldl using foldr, and do list fusion: + + let go = \x -> let d = case ... of + False -> go (x+1) + True -> id + in \z -> d (x + z) + in go 1 0 + +If we do not eta-expand `go` to have arity 2, we are going to allocate a lot of +partial function applications, which would be bad. + +The function `go` has a type of arity two, but only one lambda is manifest. 
+Furthermore, an analysis that only looks at the RHS of go cannot be sufficient
+to eta-expand go: If `go` is ever called with one argument (and the result used
+multiple times), we would be doing the work in `...` multiple times.
+
+So `callArityAnalProgram` looks at the whole let expression to figure out if
+all calls are nice, i.e. have a high enough arity. It then stores the result in
+the `calledArity` field of the `IdInfo` of `go`, which the next simplifier
+phase will eta-expand.
+
+The specification of the `calledArity` field is:
+
+ No work will be lost if you eta-expand me to the arity in `calledArity`.
+
+What we want to know for a variable
+-----------------------------------
+
+For every let-bound variable we'd like to know:
+ 1. A lower bound on the arity of all calls to the variable, and
+ 2. whether the variable is being called at most once or possibly multiple
+ times.
+
+It is always ok to lower the arity, or pretend that there are multiple calls.
+In particular, "Minimum arity 0 and possibly called multiple times" is always
+correct.
+
+
+What we want to know from an expression
+---------------------------------------
+
+In order to obtain that information for variables, we analyze expressions and
+obtain bits of information:
+
+ I. The arity analysis:
+ For every variable, whether it is absent, or called,
+ and if called, with what arity.
+
+ II. The Co-Called analysis:
+ For every two variables, whether there is a possibility that both are being
+ called.
+ We obtain as a special case: For every variable, whether there is a
+ possibility that it is being called twice.
+
+For efficiency reasons, we gather this information only for a set of
+*interesting variables*, to avoid spending time on, e.g., variables from pattern matches.
+
+The two analyses are not completely independent, as a higher arity can improve
+the information about what variables are being called once or multiple times.
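The per-variable information and the two ways of combining environments can be modelled with a small lattice sketch. This is an assumption-laden toy (plain string keys and a Data.Map instead of GHC's VarEnv; `lubCall` and `bothCall` are invented names): `lubCall` combines alternative branches, of which only one runs, while `bothCall` combines parts of an expression that are both evaluated:

```haskell
import qualified Data.Map.Strict as M

data Card = Once | Many deriving (Eq, Show)

-- Per-variable result: a lower bound on call arity, plus whether the
-- variable may be called more than once. Absent variables are simply
-- missing from the map.
type CallEnv = M.Map String (Int, Card)

-- Combining alternative branches (e.g. case alts): only one runs, so
-- take the pointwise minimum arity, and Once stays Once if both sides
-- agree. Weakening (lower arity, or pretending Many) is always sound.
lubCall :: CallEnv -> CallEnv -> CallEnv
lubCall = M.unionWith combine
  where
    combine (a1, c1) (a2, c2)
      | c1 == Once && c2 == Once = (min a1 a2, Once)
      | otherwise                = (min a1 a2, Many)

-- Combining parts that are both evaluated (e.g. the two sides of +):
-- a variable called on both sides is certainly called more than once.
bothCall :: CallEnv -> CallEnv -> CallEnv
bothCall = M.unionWith (\(a1, _) (a2, _) -> (min a1 a2, Many))
```

For example, if `n` is called once at arity 1 in each branch of a case, `lubCall` still reports Once, but `bothCall` (as in `n 0 + n 1`) reports Many.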
+
+Note [Analysis I: The arity analysis]
+------------------------------------
+
+The arity analysis is quite straightforward: The information about an
+expression is a
+ VarEnv Arity
+where absent variables are bound to Nothing and otherwise to a lower bound to
+their arity.
+
+When we analyze an expression, we analyze it with a given context arity.
+Lambdas decrease and applications increase the incoming arity. Analysing a
+variable will put that arity in the environment. In lets or cases all the
+results from the various subexpressions are lubed, which takes the point-wise
+minimum (considering Nothing an infinity).
+
+
+Note [Analysis II: The Co-Called analysis]
+------------------------------------------
+
+The second part is more sophisticated. For reasons explained below, it is not
+sufficient to simply know how often an expression evaluates a variable. Instead
+we need to know which variables are possibly called together.
+
+The data structure here is an undirected graph of variables, which is provided
+by the abstract
+ UnVarGraph
+
+It is safe to return a larger graph, i.e. one with more edges. The worst case
+(i.e. the least useful and always correct result) is the complete graph on all
+free variables, which means that anything can be called together with anything
+(including itself).
+
+Notation for the following:
+C(e) is the co-called result for e.
+G₁∪G₂ is the union of two graphs
+fv is the set of free variables (conveniently the domain of the arity analysis result)
+S₁×S₂ is the complete bipartite graph { {a,b} | a ∈ S₁, b ∈ S₂ }
+S² is the complete graph on the set of variables S, S² = S×S
+C'(e) is a variant for bound expressions:
+ If e is called at most once, or it is and stays a thunk (after the analysis),
+ it is simply C(e). Otherwise, the expression can be called multiple times
+ and we return (fv e)²
+
+The interesting cases of the analysis:
+ * Var v:
+ No other variables are being called.
+   Return {} (the empty graph)
+ * Lambda v e, under arity 0:
+   This means that e can be evaluated many times and we cannot get
+   any useful co-call information.
+   Return (fv e)²
+ * Case alternatives alt₁,alt₂,...:
+   Only one can be executed, so
+   Return (C(alt₁) ∪ C(alt₂) ∪ ...)
+ * App e₁ e₂ (and analogously Case scrut alts), with non-trivial e₂:
+   We get the results from both sides, with the argument evaluated at most once.
+   Additionally, anything called by e₁ can possibly be called with anything
+   from e₂.
+   Return: C(e₁) ∪ C(e₂) ∪ (fv e₁) × (fv e₂)
+ * App e₁ x:
+   As this is already in A-normal form, CorePrep will not separately
+   lambda-bind (and hence share) x. So we conservatively assume multiple
+   calls to x here.
+   Return: C(e₁) ∪ (fv e₁) × {x} ∪ {(x,x)}
+ * Let v = rhs in body:
+   In addition to the results from the subexpressions, add all co-calls from
+   everything that the body calls together with v to everything that is called
+   by v.
+   Return: C'(rhs) ∪ C(body) ∪ (fv rhs) × {v' | {v,v'} ∈ C(body)}
+ * Letrec v₁ = rhs₁ ... vₙ = rhsₙ in body:
+   Tricky.
+   We assume that it is really mutually recursive, i.e. that every variable
+   calls one of the others, and that this is strongly connected (otherwise we
+   return an over-approximation, so that's ok); see Note [Recursion and fixpointing].
+
+   Let V = {v₁,...,vₙ}.
+   Assume that the vs have been analysed with an incoming demand and
+   cardinality consistent with the final result (this is the fixed-pointing).
+   Again we can use the results from all subexpressions.
+   In addition, for every variable vᵢ, we need to find out what it is called
+   with (call this set Sᵢ). There are two cases:
+   * If vᵢ is a function, we need to go through all right-hand sides and bodies,
+     and collect every variable that is called together with any variable from V:
+     Sᵢ = {v' | j ∈ {1,...,n}, {v',vⱼ} ∈ C'(rhs₁) ∪ ... ∪ C'(rhsₙ) ∪ C(body) }
+   * If vᵢ is a thunk, then its rhs is evaluated only once, so we need to
+     exclude it from this set:
+     Sᵢ = {v' | j ∈ {1,...,n}, j≠i, {v',vⱼ} ∈ C'(rhs₁) ∪ ... ∪ C'(rhsₙ) ∪ C(body) }
+   Finally, combine all this:
+   Return: C(body) ∪
+           C'(rhs₁) ∪ ... ∪ C'(rhsₙ) ∪
+           ((fv rhs₁) × S₁) ∪ ... ∪ ((fv rhsₙ) × Sₙ)
+
+Using the result: Eta-Expansion
+-------------------------------
+
+We use the result of these two analyses to decide whether we can eta-expand the
+rhs of a let-bound variable.
+
+If the variable is already a function (exprIsCheap), and all calls to the
+variable have a higher arity than the current manifest arity (i.e. the number
+of lambdas), expand.
+
+If the variable is a thunk we must be careful: eta-expansion will prevent
+sharing of work, so this is only safe if there is at most one call to the
+function. Therefore, we check that {v,v} ∉ G, i.e. that v cannot be called
+twice.
+
+  Example:
+
+      let n = case .. of ..   -- A thunk!
+      in n 0 + n 1
+
+  vs.
+
+      let n = case .. of ..
+      in case .. of T -> n 0
+                    F -> n 1
+
+  We are only allowed to eta-expand `n` if it is going to be called at most
+  once in the body of the outer let. So we need to know, for each variable
+  individually, that it is going to be called at most once.
+
+
+Why the co-call graph?
+----------------------
+
+Why is it not sufficient to simply remember which variables are called once and
+which are called multiple times? It would be in the previous example, but consider
+
+    let n = case .. of ..
+    in case .. of
+        True -> let go = \y -> case .. of
+                                 True -> go (y + n 1)
+                                 False -> n
+                in go 1
+        False -> n
+
+vs.
+
+    let n = case .. of ..
+    in case .. of
+        True -> let go = \y -> case .. of
+                                 True -> go (y+1)
+                                 False -> n
+                in go 1
+        False -> n
+
+In both cases, the body and the rhs of the inner let call n at most once.
+But only in the second case does that hold for the whole expression!
+The crucial difference is that in the first case, the rhs of `go` can call
+*both* `go` and `n`, and hence can call `n` multiple times as it recurses,
+while in the second case we find out that `go` and `n` are not called together.
+
+
+Why co-call information for functions?
+--------------------------------------
+
+Although for eta-expansion we need the information only for thunks, we still
+need to know whether functions are being called once or multiple times, and
+together with what other functions.
+
+  Example:
+
+      let n = case .. of ..
+          f x = n (x+1)
+      in f 1 + f 2
+
+  vs.
+
+      let n = case .. of ..
+          f x = n (x+1)
+      in case .. of T -> f 0
+                    F -> f 1
+
+  Here, the body of f calls n exactly once, but f itself is being called
+  multiple times, so eta-expansion is not allowed.
+
+
+Note [Analysis type signature]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The work-horse of the analysis is the function `callArityAnal`, with the
+following type:
+
+    type CallArityRes = (UnVarGraph, VarEnv Arity)
+    callArityAnal ::
+        Arity ->    -- The arity this expression is called with
+        VarSet ->   -- The set of interesting variables
+        CoreExpr -> -- The expression to analyse
+        (CallArityRes, CoreExpr)
+
+and the following specification:
+
+  ((coCalls, callArityEnv), expr') = callArityAnal arity interestingIds expr
+
+  <=>
+
+  Assume the expression `expr` is being passed `arity` arguments. Then it holds that
+  * The domain of `callArityEnv` is a subset of `interestingIds`.
+  * Any variable from `interestingIds` that is not mentioned in the `callArityEnv`
+    is absent, i.e. not called at all.
+  * Every call from `expr` to a variable bound to n in `callArityEnv` has at
+    least n value arguments.
+  * If two interesting variables `v1` and `v2` are not adjacent in `coCalls`,
+    then in no execution of `expr` are both called.
+  Furthermore, expr' is expr with the callArity field of the `IdInfo` updated.
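To make the shape of that result type concrete, here is a small stand-alone model (illustrative only: `Data.Map`, `Data.Set` and `String` stand in for GHC's `VarEnv`, `UnVarGraph` and `Var`, and `lookupRes` plays the role of `lookupCallArityRes` defined later in this file). The co-call graph is kept as a set of pairs with both directions present, and "called at most once" is read off as the absence of a self-loop.

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

-- A toy rendering of CallArityRes: an undirected co-call graph (encoded
-- as a set of ordered pairs, both directions present) together with the
-- arity environment.
type Var'          = String
type CoCallGraph   = S.Set (Var', Var')
type CallArityRes' = (CoCallGraph, M.Map Var' Int)

-- The arity is a lower bound (0 if the variable is unknown), and a
-- variable is called at most once exactly when the graph has no
-- self-loop at it.
lookupRes :: CallArityRes' -> Var' -> (Int, Bool)
lookupRes (g, ae) v = case M.lookup v ae of
  Just a  -> (a, not ((v, v) `S.member` g))
  Nothing -> (0, False)
```

For example, a self-loop at `n` (it may be called twice) forbids eta-expanding it, while `go` without a self-loop is called at most once.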
+
+
+Note [Which variables are interesting]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The analysis would quickly become prohibitively expensive if we analysed all
+variables; for most variables we simply do not care how often they are
+called, e.g. variables bound in a pattern match. So a variable is interesting
+if it is
+ * top-level or let-bound
+ * and possibly a function (typeArity > 0).
+
+Note [Taking boring variables into account]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If we decide that the variable bound in `let x = e1 in e2` is not interesting,
+the analysis of `e2` will not report anything about `x`. To ensure that
+`callArityBind` still does the right thing we have to take that into account
+every time we would look up `x` in the analysis result of `e2`.
+ * Instead of calling lookupCallArityRes, we return (0, False), indicating
+   that this variable might be called many times with no arguments.
+ * Instead of checking `calledWith x`, we assume that everything can be called
+   with it.
+ * In the recursive case, when calculating the `cross_calls`, if there is
+   any boring variable in the recursive group, we ignore all co-call results
+   and directly go to a very conservative assumption.
+
+The last point has the nice side effect that the relatively expensive
+integration of co-call results in recursive groups is often skipped. This
+helped to avoid the compile time blowup in some real-world code with large
+recursive groups (#10293).
+
+Note [Recursion and fixpointing]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For a mutually recursive let, we begin by
+ 1. analysing the body, using the same incoming arity as for the whole expression.
+ 2. Then we iterate, memoizing for each of the bound variables the last
+    analysis call, i.e. incoming arity, whether it is called once, and the CallArityRes.
+ 3. We combine the analysis result from the body and the memoized results for
+    the arguments (if already present).
+ 4. For each variable, we find out the incoming arity and whether it is called
+    once, based on the current analysis result. If this differs from the
+    memoized results, we re-analyse the rhs and update the memoized table.
+ 5. If nothing had to be reanalyzed, we are done.
+    Otherwise, repeat from step 3.
+
+
+Note [Thunks in recursive groups]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We never eta-expand a thunk in a recursive group, on the grounds that if it is
+part of a recursive group, then it will be called multiple times.
+
+This is not necessarily true, e.g. it would be safe to eta-expand t2 (but not
+t1) in the following code:
+
+  let go x = t1
+      t1 = if ... then t2 else ...
+      t2 = if ... then go 1 else ...
+  in go 0
+
+Detecting this would require finding out what variables are only ever called
+from thunks. While this is certainly possible, we have yet to see it be
+relevant in the wild.
+
+
+Note [Analysing top-level binds]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We can eta-expand top-level binds if they are not exported, as we see all calls
+to them. The plan is as follows: Treat the top-level binds as nested lets around
+a body representing “all external calls”, which returns a pessimistic
+CallArityRes (the co-call graph is the complete graph, all arities 0).
+
+Note [Trimming arity]
+~~~~~~~~~~~~~~~~~~~~~
+
+In the Call Arity papers, we are working on an untyped lambda calculus with no
+other id annotations, where eta-expansion is always possible. But this is not
+the case for Core!
+ 1. We need to ensure the invariant
+       callArity e <= typeArity (exprType e)
+    for the same reasons that exprArity needs this invariant (see Note
+    [exprArity invariant] in GHC.Core.Arity).
+
+    If we do not ensure this, a too-high arity annotation will be stored with
+    the id, confusing the simplifier later on.
+
+ 2. Eta-expanding a right hand side might invalidate existing annotations.
+    In particular, if an id has a strictness annotation of <...><...>b, then
+    passing two arguments to it will definitely bottom out, so the simplifier
+    will throw away additional parameters. This conflicts with Call Arity! So
+    we ensure that we never eta-expand such a value beyond the number of
+    arguments mentioned in the strictness signature.
+    See #10176 for a real-world example.
+
+Note [What is a thunk]
+~~~~~~~~~~~~~~~~~~~~~~
+
+Originally, everything that was not in WHNF (`exprIsWHNF`) was considered a
+thunk and not eta-expanded, to avoid losing any sharing. This is also how the
+published papers on Call Arity describe it.
+
+In practice, there are thunks that do just a little work, such as
+pattern-matching on a variable, and the benefits of eta-expansion likely
+outweigh the cost of doing that repeatedly. Therefore, this implementation of
+Call Arity considers everything that is not cheap (`exprIsCheap`) to be a thunk.
+
+Note [Call Arity and Join Points]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The Call Arity analysis does not care about join points, and treats them just
+like normal functions. This is ok.
+
+The analysis *could* make use of the fact that join points are always evaluated
+in the same context as the join-binding they are defined in and are always
+one-shot, and handle join points separately, as suggested in
+https://gitlab.haskell.org/ghc/ghc/issues/13479#note_134870.
+This *might* be more efficient (for example, join points would not have to be
+considered interesting variables), but it would also add redundant code. So for
+now we do not do that.
+
+The simplifier never eta-expands join points (it instead pushes extra arguments from
+an eta-expanded context into the join point’s RHS), so the call arity
+annotation on join points is not actually used. As it would be equally valid
+(though less efficient) to eta-expand join points, this is the simplifier's
+choice, and hence Call Arity sets the call arity for join points as well.
+-} + +-- Main entry point + +callArityAnalProgram :: DynFlags -> CoreProgram -> CoreProgram +callArityAnalProgram _dflags binds = binds' + where + (_, binds') = callArityTopLvl [] emptyVarSet binds + +-- See Note [Analysing top-level-binds] +callArityTopLvl :: [Var] -> VarSet -> [CoreBind] -> (CallArityRes, [CoreBind]) +callArityTopLvl exported _ [] + = ( calledMultipleTimes $ (emptyUnVarGraph, mkVarEnv $ [(v, 0) | v <- exported]) + , [] ) +callArityTopLvl exported int1 (b:bs) + = (ae2, b':bs') + where + int2 = bindersOf b + exported' = filter isExportedId int2 ++ exported + int' = int1 `addInterestingBinds` b + (ae1, bs') = callArityTopLvl exported' int' bs + (ae2, b') = callArityBind (boringBinds b) ae1 int1 b + + +callArityRHS :: CoreExpr -> CoreExpr +callArityRHS = snd . callArityAnal 0 emptyVarSet + +-- The main analysis function. See Note [Analysis type signature] +callArityAnal :: + Arity -> -- The arity this expression is called with + VarSet -> -- The set of interesting variables + CoreExpr -> -- The expression to analyse + (CallArityRes, CoreExpr) + -- How this expression uses its interesting variables + -- and the expression with IdInfo updated + +-- The trivial base cases +callArityAnal _ _ e@(Lit _) + = (emptyArityRes, e) +callArityAnal _ _ e@(Type _) + = (emptyArityRes, e) +callArityAnal _ _ e@(Coercion _) + = (emptyArityRes, e) +-- The transparent cases +callArityAnal arity int (Tick t e) + = second (Tick t) $ callArityAnal arity int e +callArityAnal arity int (Cast e co) + = second (\e -> Cast e co) $ callArityAnal arity int e + +-- The interesting case: Variables, Lambdas, Lets, Applications, Cases +callArityAnal arity int e@(Var v) + | v `elemVarSet` int + = (unitArityRes v arity, e) + | otherwise + = (emptyArityRes, e) + +-- Non-value lambdas are ignored +callArityAnal arity int (Lam v e) | not (isId v) + = second (Lam v) $ callArityAnal arity (int `delVarSet` v) e + +-- We have a lambda that may be called multiple times, so its free variables 
+-- can all be co-called. +callArityAnal 0 int (Lam v e) + = (ae', Lam v e') + where + (ae, e') = callArityAnal 0 (int `delVarSet` v) e + ae' = calledMultipleTimes ae +-- We have a lambda that we are calling. decrease arity. +callArityAnal arity int (Lam v e) + = (ae, Lam v e') + where + (ae, e') = callArityAnal (arity - 1) (int `delVarSet` v) e + +-- Application. Increase arity for the called expression, nothing to know about +-- the second +callArityAnal arity int (App e (Type t)) + = second (\e -> App e (Type t)) $ callArityAnal arity int e +callArityAnal arity int (App e1 e2) + = (final_ae, App e1' e2') + where + (ae1, e1') = callArityAnal (arity + 1) int e1 + (ae2, e2') = callArityAnal 0 int e2 + -- If the argument is trivial (e.g. a variable), then it will _not_ be + -- let-bound in the Core to STG transformation (CorePrep actually), + -- so no sharing will happen here, and we have to assume many calls. + ae2' | exprIsTrivial e2 = calledMultipleTimes ae2 + | otherwise = ae2 + final_ae = ae1 `both` ae2' + +-- Case expression. +callArityAnal arity int (Case scrut bndr ty alts) + = -- pprTrace "callArityAnal:Case" + -- (vcat [ppr scrut, ppr final_ae]) + (final_ae, Case scrut' bndr ty alts') + where + (alt_aes, alts') = unzip $ map go alts + go (dc, bndrs, e) = let (ae, e') = callArityAnal arity int e + in (ae, (dc, bndrs, e')) + alt_ae = lubRess alt_aes + (scrut_ae, scrut') = callArityAnal 0 int scrut + final_ae = scrut_ae `both` alt_ae + +-- For lets, use callArityBind +callArityAnal arity int (Let bind e) + = -- pprTrace "callArityAnal:Let" + -- (vcat [ppr v, ppr arity, ppr n, ppr final_ae ]) + (final_ae, Let bind' e') + where + int_body = int `addInterestingBinds` bind + (ae_body, e') = callArityAnal arity int_body e + (final_ae, bind') = callArityBind (boringBinds bind) ae_body int bind + +-- Which bindings should we look at? 
+-- See Note [Which variables are interesting] +isInteresting :: Var -> Bool +isInteresting v = not $ null (typeArity (idType v)) + +interestingBinds :: CoreBind -> [Var] +interestingBinds = filter isInteresting . bindersOf + +boringBinds :: CoreBind -> VarSet +boringBinds = mkVarSet . filter (not . isInteresting) . bindersOf + +addInterestingBinds :: VarSet -> CoreBind -> VarSet +addInterestingBinds int bind + = int `delVarSetList` bindersOf bind -- Possible shadowing + `extendVarSetList` interestingBinds bind + +-- Used for both local and top-level binds +-- Second argument is the demand from the body +callArityBind :: VarSet -> CallArityRes -> VarSet -> CoreBind -> (CallArityRes, CoreBind) +-- Non-recursive let +callArityBind boring_vars ae_body int (NonRec v rhs) + | otherwise + = -- pprTrace "callArityBind:NonRec" + -- (vcat [ppr v, ppr ae_body, ppr int, ppr ae_rhs, ppr safe_arity]) + (final_ae, NonRec v' rhs') + where + is_thunk = not (exprIsCheap rhs) -- see note [What is a thunk] + -- If v is boring, we will not find it in ae_body, but always assume (0, False) + boring = v `elemVarSet` boring_vars + + (arity, called_once) + | boring = (0, False) -- See Note [Taking boring variables into account] + | otherwise = lookupCallArityRes ae_body v + safe_arity | called_once = arity + | is_thunk = 0 -- A thunk! Do not eta-expand + | otherwise = arity + + -- See Note [Trimming arity] + trimmed_arity = trimArity v safe_arity + + (ae_rhs, rhs') = callArityAnal trimmed_arity int rhs + + + ae_rhs'| called_once = ae_rhs + | safe_arity == 0 = ae_rhs -- If it is not a function, its body is evaluated only once + | otherwise = calledMultipleTimes ae_rhs + + called_by_v = domRes ae_rhs' + called_with_v + | boring = domRes ae_body + | otherwise = calledWith ae_body v `delUnVarSet` v + final_ae = addCrossCoCalls called_by_v called_with_v $ ae_rhs' `lubRes` resDel v ae_body + + v' = v `setIdCallArity` trimmed_arity + + +-- Recursive let. 
See Note [Recursion and fixpointing] +callArityBind boring_vars ae_body int b@(Rec binds) + = -- (if length binds > 300 then + -- pprTrace "callArityBind:Rec" + -- (vcat [ppr (Rec binds'), ppr ae_body, ppr int, ppr ae_rhs]) else id) $ + (final_ae, Rec binds') + where + -- See Note [Taking boring variables into account] + any_boring = any (`elemVarSet` boring_vars) [ i | (i, _) <- binds] + + int_body = int `addInterestingBinds` b + (ae_rhs, binds') = fix initial_binds + final_ae = bindersOf b `resDelList` ae_rhs + + initial_binds = [(i,Nothing,e) | (i,e) <- binds] + + fix :: [(Id, Maybe (Bool, Arity, CallArityRes), CoreExpr)] -> (CallArityRes, [(Id, CoreExpr)]) + fix ann_binds + | -- pprTrace "callArityBind:fix" (vcat [ppr ann_binds, ppr any_change, ppr ae]) $ + any_change + = fix ann_binds' + | otherwise + = (ae, map (\(i, _, e) -> (i, e)) ann_binds') + where + aes_old = [ (i,ae) | (i, Just (_,_,ae), _) <- ann_binds ] + ae = callArityRecEnv any_boring aes_old ae_body + + rerun (i, mbLastRun, rhs) + | i `elemVarSet` int_body && not (i `elemUnVarSet` domRes ae) + -- No call to this yet, so do nothing + = (False, (i, Nothing, rhs)) + + | Just (old_called_once, old_arity, _) <- mbLastRun + , called_once == old_called_once + , new_arity == old_arity + -- No change, no need to re-analyze + = (False, (i, mbLastRun, rhs)) + + | otherwise + -- We previously analyzed this with a different arity (or not at all) + = let is_thunk = not (exprIsCheap rhs) -- see note [What is a thunk] + + safe_arity | is_thunk = 0 -- See Note [Thunks in recursive groups] + | otherwise = new_arity + + -- See Note [Trimming arity] + trimmed_arity = trimArity i safe_arity + + (ae_rhs, rhs') = callArityAnal trimmed_arity int_body rhs + + ae_rhs' | called_once = ae_rhs + | safe_arity == 0 = ae_rhs -- If it is not a function, its body is evaluated only once + | otherwise = calledMultipleTimes ae_rhs + + i' = i `setIdCallArity` trimmed_arity + + in (True, (i', Just (called_once, new_arity, ae_rhs'), 
rhs')) + where + -- See Note [Taking boring variables into account] + (new_arity, called_once) | i `elemVarSet` boring_vars = (0, False) + | otherwise = lookupCallArityRes ae i + + (changes, ann_binds') = unzip $ map rerun ann_binds + any_change = or changes + +-- Combining the results from body and rhs, (mutually) recursive case +-- See Note [Analysis II: The Co-Called analysis] +callArityRecEnv :: Bool -> [(Var, CallArityRes)] -> CallArityRes -> CallArityRes +callArityRecEnv any_boring ae_rhss ae_body + = -- (if length ae_rhss > 300 then pprTrace "callArityRecEnv" (vcat [ppr ae_rhss, ppr ae_body, ppr ae_new]) else id) $ + ae_new + where + vars = map fst ae_rhss + + ae_combined = lubRess (map snd ae_rhss) `lubRes` ae_body + + cross_calls + -- See Note [Taking boring variables into account] + | any_boring = completeGraph (domRes ae_combined) + -- Also, calculating cross_calls is expensive. Simply be conservative + -- if the mutually recursive group becomes too large. + | lengthExceeds ae_rhss 25 = completeGraph (domRes ae_combined) + | otherwise = unionUnVarGraphs $ map cross_call ae_rhss + cross_call (v, ae_rhs) = completeBipartiteGraph called_by_v called_with_v + where + is_thunk = idCallArity v == 0 + -- What rhs are relevant as happening before (or after) calling v? + -- If v is a thunk, everything from all the _other_ variables + -- If v is not a thunk, everything can happen. + ae_before_v | is_thunk = lubRess (map snd $ filter ((/= v) . fst) ae_rhss) `lubRes` ae_body + | otherwise = ae_combined + -- What do we want to know from these? + -- Which calls can happen next to any recursive call. 
+ called_with_v + = unionUnVarSets $ map (calledWith ae_before_v) vars + called_by_v = domRes ae_rhs + + ae_new = first (cross_calls `unionUnVarGraph`) ae_combined + +-- See Note [Trimming arity] +trimArity :: Id -> Arity -> Arity +trimArity v a = minimum [a, max_arity_by_type, max_arity_by_strsig] + where + max_arity_by_type = length (typeArity (idType v)) + max_arity_by_strsig + | isBotDiv result_info = length demands + | otherwise = a + + (demands, result_info) = splitStrictSig (idStrictness v) + +--------------------------------------- +-- Functions related to CallArityRes -- +--------------------------------------- + +-- Result type for the two analyses. +-- See Note [Analysis I: The arity analysis] +-- and Note [Analysis II: The Co-Called analysis] +type CallArityRes = (UnVarGraph, VarEnv Arity) + +emptyArityRes :: CallArityRes +emptyArityRes = (emptyUnVarGraph, emptyVarEnv) + +unitArityRes :: Var -> Arity -> CallArityRes +unitArityRes v arity = (emptyUnVarGraph, unitVarEnv v arity) + +resDelList :: [Var] -> CallArityRes -> CallArityRes +resDelList vs ae = foldr resDel ae vs + +resDel :: Var -> CallArityRes -> CallArityRes +resDel v (g, ae) = (g `delNode` v, ae `delVarEnv` v) + +domRes :: CallArityRes -> UnVarSet +domRes (_, ae) = varEnvDom ae + +-- In the result, find out the minimum arity and whether the variable is called +-- at most once. +lookupCallArityRes :: CallArityRes -> Var -> (Arity, Bool) +lookupCallArityRes (g, ae) v + = case lookupVarEnv ae v of + Just a -> (a, not (g `hasLoopAt` v)) + Nothing -> (0, False) + +calledWith :: CallArityRes -> Var -> UnVarSet +calledWith (g, _) v = neighbors g v + +addCrossCoCalls :: UnVarSet -> UnVarSet -> CallArityRes -> CallArityRes +addCrossCoCalls set1 set2 = first (completeBipartiteGraph set1 set2 `unionUnVarGraph`) + +-- Replaces the co-call graph by a complete graph (i.e. 
no information) +calledMultipleTimes :: CallArityRes -> CallArityRes +calledMultipleTimes res = first (const (completeGraph (domRes res))) res + +-- Used for application and cases +both :: CallArityRes -> CallArityRes -> CallArityRes +both r1 r2 = addCrossCoCalls (domRes r1) (domRes r2) $ r1 `lubRes` r2 + +-- Used when combining results from alternative cases; take the minimum +lubRes :: CallArityRes -> CallArityRes -> CallArityRes +lubRes (g1, ae1) (g2, ae2) = (g1 `unionUnVarGraph` g2, ae1 `lubArityEnv` ae2) + +lubArityEnv :: VarEnv Arity -> VarEnv Arity -> VarEnv Arity +lubArityEnv = plusVarEnv_C min + +lubRess :: [CallArityRes] -> CallArityRes +lubRess = foldl' lubRes emptyArityRes diff --git a/compiler/GHC/Core/Op/ConstantFold.hs b/compiler/GHC/Core/Op/ConstantFold.hs new file mode 100644 index 0000000000..ae9ba8f262 --- /dev/null +++ b/compiler/GHC/Core/Op/ConstantFold.hs @@ -0,0 +1,2257 @@ +{- +(c) The GRASP/AQUA Project, Glasgow University, 1992-1998 + +\section[ConFold]{Constant Folder} + +Conceptually, constant folding should be parameterized with the kind +of target machine to get identical behaviour during compilation time +and runtime. We cheat a little bit here... + +ToDo: + check boundaries before folding, e.g. we can fold the Float addition + (i1 + i2) only if it results in a valid Float. 
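As a sketch of what such a boundary check could look like (illustrative only, and using Int addition on a hypothetical 64-bit target rather than the Float case mentioned above): fold to a literal only when the mathematical result is representable, and otherwise decline to rewrite.

```haskell
import Data.Int (Int64)

-- Hypothetical helper: fold i1 + i2 at compile time only if the exact
-- result fits the (assumed 64-bit) target Int; Nothing means "leave the
-- expression alone" rather than folding to a wrapped value.
foldIntAdd :: Integer -> Integer -> Maybe Integer
foldIntAdd i1 i2
  | r >= toInteger (minBound :: Int64)
  , r <= toInteger (maxBound :: Int64) = Just r
  | otherwise                          = Nothing
  where r = i1 + i2
```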
+-} + +{-# LANGUAGE CPP, RankNTypes, PatternSynonyms, ViewPatterns, RecordWildCards, + DeriveFunctor #-} +{-# OPTIONS_GHC -optc-DNON_POSIX_SOURCE -Wno-incomplete-uni-patterns #-} + +module GHC.Core.Op.ConstantFold + ( primOpRules + , builtinRules + , caseRules + ) +where + +#include "HsVersions.h" + +import GhcPrelude + +import {-# SOURCE #-} MkId ( mkPrimOpId, magicDictId ) + +import GHC.Core +import GHC.Core.Make +import Id +import Literal +import GHC.Core.SimpleOpt ( exprIsLiteral_maybe ) +import PrimOp ( PrimOp(..), tagToEnumKey ) +import TysWiredIn +import TysPrim +import GHC.Core.TyCon + ( tyConDataCons_maybe, isAlgTyCon, isEnumerationTyCon + , isNewTyCon, unwrapNewTyCon_maybe, tyConDataCons + , tyConFamilySize ) +import GHC.Core.DataCon ( dataConTagZ, dataConTyCon, dataConWrapId, dataConWorkId ) +import GHC.Core.Utils ( cheapEqExpr, cheapEqExpr', exprIsHNF, exprType + , stripTicksTop, stripTicksTopT, mkTicks ) +import GHC.Core.Unfold ( exprIsConApp_maybe ) +import GHC.Core.Type +import OccName ( occNameFS ) +import PrelNames +import Maybes ( orElse ) +import Name ( Name, nameOccName ) +import Outputable +import FastString +import BasicTypes +import GHC.Driver.Session +import GHC.Platform +import Util +import GHC.Core.Coercion (mkUnbranchedAxInstCo,mkSymCo,Role(..)) + +import Control.Applicative ( Alternative(..) ) + +import Control.Monad +import qualified Control.Monad.Fail as MonadFail +import Data.Bits as Bits +import qualified Data.ByteString as BS +import Data.Int +import Data.Ratio +import Data.Word + +{- +Note [Constant folding] +~~~~~~~~~~~~~~~~~~~~~~~ +primOpRules generates a rewrite rule for each primop +These rules do what is often called "constant folding" +E.g. the rules for +# might say + 4 +# 5 = 9 +Well, of course you'd need a lot of rules if you did it +like that, so we use a BuiltinRule instead, so that we +can match in any two literal values. 
So the rule is really +more like + (Lit x) +# (Lit y) = Lit (x+#y) +where the (+#) on the rhs is done at compile time + +That is why these rules are built in here. +-} + +primOpRules :: Name -> PrimOp -> Maybe CoreRule + -- ToDo: something for integer-shift ops? + -- NotOp +primOpRules nm TagToEnumOp = mkPrimOpRule nm 2 [ tagToEnumRule ] +primOpRules nm DataToTagOp = mkPrimOpRule nm 2 [ dataToTagRule ] + +-- Int operations +primOpRules nm IntAddOp = mkPrimOpRule nm 2 [ binaryLit (intOp2 (+)) + , identityDynFlags zeroi + , numFoldingRules IntAddOp intPrimOps + ] +primOpRules nm IntSubOp = mkPrimOpRule nm 2 [ binaryLit (intOp2 (-)) + , rightIdentityDynFlags zeroi + , equalArgs >> retLit zeroi + , numFoldingRules IntSubOp intPrimOps + ] +primOpRules nm IntAddCOp = mkPrimOpRule nm 2 [ binaryLit (intOpC2 (+)) + , identityCDynFlags zeroi ] +primOpRules nm IntSubCOp = mkPrimOpRule nm 2 [ binaryLit (intOpC2 (-)) + , rightIdentityCDynFlags zeroi + , equalArgs >> retLitNoC zeroi ] +primOpRules nm IntMulOp = mkPrimOpRule nm 2 [ binaryLit (intOp2 (*)) + , zeroElem zeroi + , identityDynFlags onei + , numFoldingRules IntMulOp intPrimOps + ] +primOpRules nm IntQuotOp = mkPrimOpRule nm 2 [ nonZeroLit 1 >> binaryLit (intOp2 quot) + , leftZero zeroi + , rightIdentityDynFlags onei + , equalArgs >> retLit onei ] +primOpRules nm IntRemOp = mkPrimOpRule nm 2 [ nonZeroLit 1 >> binaryLit (intOp2 rem) + , leftZero zeroi + , do l <- getLiteral 1 + dflags <- getDynFlags + guard (l == onei dflags) + retLit zeroi + , equalArgs >> retLit zeroi + , equalArgs >> retLit zeroi ] +primOpRules nm AndIOp = mkPrimOpRule nm 2 [ binaryLit (intOp2 (.&.)) + , idempotent + , zeroElem zeroi ] +primOpRules nm OrIOp = mkPrimOpRule nm 2 [ binaryLit (intOp2 (.|.)) + , idempotent + , identityDynFlags zeroi ] +primOpRules nm XorIOp = mkPrimOpRule nm 2 [ binaryLit (intOp2 xor) + , identityDynFlags zeroi + , equalArgs >> retLit zeroi ] +primOpRules nm NotIOp = mkPrimOpRule nm 1 [ unaryLit complementOp + , 
inversePrimOp NotIOp ] +primOpRules nm IntNegOp = mkPrimOpRule nm 1 [ unaryLit negOp + , inversePrimOp IntNegOp ] +primOpRules nm ISllOp = mkPrimOpRule nm 2 [ shiftRule (const Bits.shiftL) + , rightIdentityDynFlags zeroi ] +primOpRules nm ISraOp = mkPrimOpRule nm 2 [ shiftRule (const Bits.shiftR) + , rightIdentityDynFlags zeroi ] +primOpRules nm ISrlOp = mkPrimOpRule nm 2 [ shiftRule shiftRightLogical + , rightIdentityDynFlags zeroi ] + +-- Word operations +primOpRules nm WordAddOp = mkPrimOpRule nm 2 [ binaryLit (wordOp2 (+)) + , identityDynFlags zerow + , numFoldingRules WordAddOp wordPrimOps + ] +primOpRules nm WordSubOp = mkPrimOpRule nm 2 [ binaryLit (wordOp2 (-)) + , rightIdentityDynFlags zerow + , equalArgs >> retLit zerow + , numFoldingRules WordSubOp wordPrimOps + ] +primOpRules nm WordAddCOp = mkPrimOpRule nm 2 [ binaryLit (wordOpC2 (+)) + , identityCDynFlags zerow ] +primOpRules nm WordSubCOp = mkPrimOpRule nm 2 [ binaryLit (wordOpC2 (-)) + , rightIdentityCDynFlags zerow + , equalArgs >> retLitNoC zerow ] +primOpRules nm WordMulOp = mkPrimOpRule nm 2 [ binaryLit (wordOp2 (*)) + , identityDynFlags onew + , numFoldingRules WordMulOp wordPrimOps + ] +primOpRules nm WordQuotOp = mkPrimOpRule nm 2 [ nonZeroLit 1 >> binaryLit (wordOp2 quot) + , rightIdentityDynFlags onew ] +primOpRules nm WordRemOp = mkPrimOpRule nm 2 [ nonZeroLit 1 >> binaryLit (wordOp2 rem) + , leftZero zerow + , do l <- getLiteral 1 + dflags <- getDynFlags + guard (l == onew dflags) + retLit zerow + , equalArgs >> retLit zerow ] +primOpRules nm AndOp = mkPrimOpRule nm 2 [ binaryLit (wordOp2 (.&.)) + , idempotent + , zeroElem zerow ] +primOpRules nm OrOp = mkPrimOpRule nm 2 [ binaryLit (wordOp2 (.|.)) + , idempotent + , identityDynFlags zerow ] +primOpRules nm XorOp = mkPrimOpRule nm 2 [ binaryLit (wordOp2 xor) + , identityDynFlags zerow + , equalArgs >> retLit zerow ] +primOpRules nm NotOp = mkPrimOpRule nm 1 [ unaryLit complementOp + , inversePrimOp NotOp ] +primOpRules nm SllOp = 
mkPrimOpRule nm 2 [ shiftRule (const Bits.shiftL) ] +primOpRules nm SrlOp = mkPrimOpRule nm 2 [ shiftRule shiftRightLogical ] + +-- coercions +primOpRules nm Word2IntOp = mkPrimOpRule nm 1 [ liftLitDynFlags word2IntLit + , inversePrimOp Int2WordOp ] +primOpRules nm Int2WordOp = mkPrimOpRule nm 1 [ liftLitDynFlags int2WordLit + , inversePrimOp Word2IntOp ] +primOpRules nm Narrow8IntOp = mkPrimOpRule nm 1 [ liftLit narrow8IntLit + , subsumedByPrimOp Narrow8IntOp + , Narrow8IntOp `subsumesPrimOp` Narrow16IntOp + , Narrow8IntOp `subsumesPrimOp` Narrow32IntOp + , narrowSubsumesAnd AndIOp Narrow8IntOp 8 ] +primOpRules nm Narrow16IntOp = mkPrimOpRule nm 1 [ liftLit narrow16IntLit + , subsumedByPrimOp Narrow8IntOp + , subsumedByPrimOp Narrow16IntOp + , Narrow16IntOp `subsumesPrimOp` Narrow32IntOp + , narrowSubsumesAnd AndIOp Narrow16IntOp 16 ] +primOpRules nm Narrow32IntOp = mkPrimOpRule nm 1 [ liftLit narrow32IntLit + , subsumedByPrimOp Narrow8IntOp + , subsumedByPrimOp Narrow16IntOp + , subsumedByPrimOp Narrow32IntOp + , removeOp32 + , narrowSubsumesAnd AndIOp Narrow32IntOp 32 ] +primOpRules nm Narrow8WordOp = mkPrimOpRule nm 1 [ liftLit narrow8WordLit + , subsumedByPrimOp Narrow8WordOp + , Narrow8WordOp `subsumesPrimOp` Narrow16WordOp + , Narrow8WordOp `subsumesPrimOp` Narrow32WordOp + , narrowSubsumesAnd AndOp Narrow8WordOp 8 ] +primOpRules nm Narrow16WordOp = mkPrimOpRule nm 1 [ liftLit narrow16WordLit + , subsumedByPrimOp Narrow8WordOp + , subsumedByPrimOp Narrow16WordOp + , Narrow16WordOp `subsumesPrimOp` Narrow32WordOp + , narrowSubsumesAnd AndOp Narrow16WordOp 16 ] +primOpRules nm Narrow32WordOp = mkPrimOpRule nm 1 [ liftLit narrow32WordLit + , subsumedByPrimOp Narrow8WordOp + , subsumedByPrimOp Narrow16WordOp + , subsumedByPrimOp Narrow32WordOp + , removeOp32 + , narrowSubsumesAnd AndOp Narrow32WordOp 32 ] +primOpRules nm OrdOp = mkPrimOpRule nm 1 [ liftLit char2IntLit + , inversePrimOp ChrOp ] +primOpRules nm ChrOp = mkPrimOpRule nm 1 [ do [Lit lit] <- getArgs + 
guard (litFitsInChar lit) + liftLit int2CharLit + , inversePrimOp OrdOp ] +primOpRules nm Float2IntOp = mkPrimOpRule nm 1 [ liftLit float2IntLit ] +primOpRules nm Int2FloatOp = mkPrimOpRule nm 1 [ liftLit int2FloatLit ] +primOpRules nm Double2IntOp = mkPrimOpRule nm 1 [ liftLit double2IntLit ] +primOpRules nm Int2DoubleOp = mkPrimOpRule nm 1 [ liftLit int2DoubleLit ] +-- SUP: Not sure what the standard says about precision in the following 2 cases +primOpRules nm Float2DoubleOp = mkPrimOpRule nm 1 [ liftLit float2DoubleLit ] +primOpRules nm Double2FloatOp = mkPrimOpRule nm 1 [ liftLit double2FloatLit ] + +-- Float +primOpRules nm FloatAddOp = mkPrimOpRule nm 2 [ binaryLit (floatOp2 (+)) + , identity zerof ] +primOpRules nm FloatSubOp = mkPrimOpRule nm 2 [ binaryLit (floatOp2 (-)) + , rightIdentity zerof ] +primOpRules nm FloatMulOp = mkPrimOpRule nm 2 [ binaryLit (floatOp2 (*)) + , identity onef + , strengthReduction twof FloatAddOp ] + -- zeroElem zerof doesn't hold because of NaN +primOpRules nm FloatDivOp = mkPrimOpRule nm 2 [ guardFloatDiv >> binaryLit (floatOp2 (/)) + , rightIdentity onef ] +primOpRules nm FloatNegOp = mkPrimOpRule nm 1 [ unaryLit negOp + , inversePrimOp FloatNegOp ] + +-- Double +primOpRules nm DoubleAddOp = mkPrimOpRule nm 2 [ binaryLit (doubleOp2 (+)) + , identity zerod ] +primOpRules nm DoubleSubOp = mkPrimOpRule nm 2 [ binaryLit (doubleOp2 (-)) + , rightIdentity zerod ] +primOpRules nm DoubleMulOp = mkPrimOpRule nm 2 [ binaryLit (doubleOp2 (*)) + , identity oned + , strengthReduction twod DoubleAddOp ] + -- zeroElem zerod doesn't hold because of NaN +primOpRules nm DoubleDivOp = mkPrimOpRule nm 2 [ guardDoubleDiv >> binaryLit (doubleOp2 (/)) + , rightIdentity oned ] +primOpRules nm DoubleNegOp = mkPrimOpRule nm 1 [ unaryLit negOp + , inversePrimOp DoubleNegOp ] + +-- Relational operators + +primOpRules nm IntEqOp = mkRelOpRule nm (==) [ litEq True ] +primOpRules nm IntNeOp = mkRelOpRule nm (/=) [ litEq False ] +primOpRules nm CharEqOp = 
mkRelOpRule nm (==) [ litEq True ] +primOpRules nm CharNeOp = mkRelOpRule nm (/=) [ litEq False ] + +primOpRules nm IntGtOp = mkRelOpRule nm (>) [ boundsCmp Gt ] +primOpRules nm IntGeOp = mkRelOpRule nm (>=) [ boundsCmp Ge ] +primOpRules nm IntLeOp = mkRelOpRule nm (<=) [ boundsCmp Le ] +primOpRules nm IntLtOp = mkRelOpRule nm (<) [ boundsCmp Lt ] + +primOpRules nm CharGtOp = mkRelOpRule nm (>) [ boundsCmp Gt ] +primOpRules nm CharGeOp = mkRelOpRule nm (>=) [ boundsCmp Ge ] +primOpRules nm CharLeOp = mkRelOpRule nm (<=) [ boundsCmp Le ] +primOpRules nm CharLtOp = mkRelOpRule nm (<) [ boundsCmp Lt ] + +primOpRules nm FloatGtOp = mkFloatingRelOpRule nm (>) +primOpRules nm FloatGeOp = mkFloatingRelOpRule nm (>=) +primOpRules nm FloatLeOp = mkFloatingRelOpRule nm (<=) +primOpRules nm FloatLtOp = mkFloatingRelOpRule nm (<) +primOpRules nm FloatEqOp = mkFloatingRelOpRule nm (==) +primOpRules nm FloatNeOp = mkFloatingRelOpRule nm (/=) + +primOpRules nm DoubleGtOp = mkFloatingRelOpRule nm (>) +primOpRules nm DoubleGeOp = mkFloatingRelOpRule nm (>=) +primOpRules nm DoubleLeOp = mkFloatingRelOpRule nm (<=) +primOpRules nm DoubleLtOp = mkFloatingRelOpRule nm (<) +primOpRules nm DoubleEqOp = mkFloatingRelOpRule nm (==) +primOpRules nm DoubleNeOp = mkFloatingRelOpRule nm (/=) + +primOpRules nm WordGtOp = mkRelOpRule nm (>) [ boundsCmp Gt ] +primOpRules nm WordGeOp = mkRelOpRule nm (>=) [ boundsCmp Ge ] +primOpRules nm WordLeOp = mkRelOpRule nm (<=) [ boundsCmp Le ] +primOpRules nm WordLtOp = mkRelOpRule nm (<) [ boundsCmp Lt ] +primOpRules nm WordEqOp = mkRelOpRule nm (==) [ litEq True ] +primOpRules nm WordNeOp = mkRelOpRule nm (/=) [ litEq False ] + +primOpRules nm AddrAddOp = mkPrimOpRule nm 2 [ rightIdentityDynFlags zeroi ] + +primOpRules nm SeqOp = mkPrimOpRule nm 4 [ seqRule ] +primOpRules nm SparkOp = mkPrimOpRule nm 4 [ sparkRule ] + +primOpRules _ _ = Nothing + +{- +************************************************************************ +* * +\subsection{Doing the 
business} +* * +************************************************************************ +-} + +-- useful shorthands +mkPrimOpRule :: Name -> Int -> [RuleM CoreExpr] -> Maybe CoreRule +mkPrimOpRule nm arity rules = Just $ mkBasicRule nm arity (msum rules) + +mkRelOpRule :: Name -> (forall a . Ord a => a -> a -> Bool) + -> [RuleM CoreExpr] -> Maybe CoreRule +mkRelOpRule nm cmp extra + = mkPrimOpRule nm 2 $ + binaryCmpLit cmp : equal_rule : extra + where + -- x `cmp` x does not depend on x, so + -- compute it for the arbitrary value 'True' + -- and use that result + equal_rule = do { equalArgs + ; dflags <- getDynFlags + ; return (if cmp True True + then trueValInt dflags + else falseValInt dflags) } + +{- Note [Rules for floating-point comparisons] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We need different rules for floating-point values because for floats +it is not true that x = x (for NaNs); so we do not want the equal_rule +rule that mkRelOpRule uses. + +Note also that, in the case of equality/inequality, we do /not/ +want to switch to a case-expression. For example, we do not want +to convert + case (eqFloat# x 3.8#) of + True -> this + False -> that +to + case x of + 3.8#::Float# -> this + _ -> that +See #9238. Reason: comparing floating-point values for equality is +delicate, and we don't want to implement that delicacy in the code for +case expressions. So we make it an invariant of Core that a case +expression never scrutinises a Float# or Double#. + +This transformation is what the litEq rule does; +see Note [The litEq rule: converting equality to case]. +So we /refrain/ from using litEq for mkFloatingRelOpRule. +-} + +mkFloatingRelOpRule :: Name -> (forall a .
Ord a => a -> a -> Bool) + -> Maybe CoreRule +-- See Note [Rules for floating-point comparisons] +mkFloatingRelOpRule nm cmp + = mkPrimOpRule nm 2 [binaryCmpLit cmp] + +-- common constants +zeroi, onei, zerow, onew :: DynFlags -> Literal +zeroi dflags = mkLitInt dflags 0 +onei dflags = mkLitInt dflags 1 +zerow dflags = mkLitWord dflags 0 +onew dflags = mkLitWord dflags 1 + +zerof, onef, twof, zerod, oned, twod :: Literal +zerof = mkLitFloat 0.0 +onef = mkLitFloat 1.0 +twof = mkLitFloat 2.0 +zerod = mkLitDouble 0.0 +oned = mkLitDouble 1.0 +twod = mkLitDouble 2.0 + +cmpOp :: DynFlags -> (forall a . Ord a => a -> a -> Bool) + -> Literal -> Literal -> Maybe CoreExpr +cmpOp dflags cmp = go + where + done True = Just $ trueValInt dflags + done False = Just $ falseValInt dflags + + -- These compares are at different types + go (LitChar i1) (LitChar i2) = done (i1 `cmp` i2) + go (LitFloat i1) (LitFloat i2) = done (i1 `cmp` i2) + go (LitDouble i1) (LitDouble i2) = done (i1 `cmp` i2) + go (LitNumber nt1 i1 _) (LitNumber nt2 i2 _) + | nt1 /= nt2 = Nothing + | otherwise = done (i1 `cmp` i2) + go _ _ = Nothing + +-------------------------- + +negOp :: DynFlags -> Literal -> Maybe CoreExpr -- Negate +negOp _ (LitFloat 0.0) = Nothing -- can't represent -0.0 as a Rational +negOp dflags (LitFloat f) = Just (mkFloatVal dflags (-f)) +negOp _ (LitDouble 0.0) = Nothing +negOp dflags (LitDouble d) = Just (mkDoubleVal dflags (-d)) +negOp dflags (LitNumber nt i t) + | litNumIsSigned nt = Just (Lit (mkLitNumberWrap dflags nt (-i) t)) +negOp _ _ = Nothing + +complementOp :: DynFlags -> Literal -> Maybe CoreExpr -- Binary complement +complementOp dflags (LitNumber nt i t) = + Just (Lit (mkLitNumberWrap dflags nt (complement i) t)) +complementOp _ _ = Nothing + +-------------------------- +intOp2 :: (Integral a, Integral b) + => (a -> b -> Integer) + -> DynFlags -> Literal -> Literal -> Maybe CoreExpr +intOp2 = intOp2' . 
const + +intOp2' :: (Integral a, Integral b) + => (DynFlags -> a -> b -> Integer) + -> DynFlags -> Literal -> Literal -> Maybe CoreExpr +intOp2' op dflags (LitNumber LitNumInt i1 _) (LitNumber LitNumInt i2 _) = + let o = op dflags + in intResult dflags (fromInteger i1 `o` fromInteger i2) +intOp2' _ _ _ _ = Nothing -- Could find LitLit + +intOpC2 :: (Integral a, Integral b) + => (a -> b -> Integer) + -> DynFlags -> Literal -> Literal -> Maybe CoreExpr +intOpC2 op dflags (LitNumber LitNumInt i1 _) (LitNumber LitNumInt i2 _) = do + intCResult dflags (fromInteger i1 `op` fromInteger i2) +intOpC2 _ _ _ _ = Nothing -- Could find LitLit + +shiftRightLogical :: DynFlags -> Integer -> Int -> Integer +-- Shift right, putting zeros in rather than sign-propagating as Bits.shiftR would do +-- Do this by converting to Word and back. Obviously this won't work for big +-- values, but it's ok as we use it here +shiftRightLogical dflags x n = + case platformWordSize (targetPlatform dflags) of + PW4 -> fromIntegral (fromInteger x `shiftR` n :: Word32) + PW8 -> fromIntegral (fromInteger x `shiftR` n :: Word64) + +-------------------------- +retLit :: (DynFlags -> Literal) -> RuleM CoreExpr +retLit l = do dflags <- getDynFlags + return $ Lit $ l dflags + +retLitNoC :: (DynFlags -> Literal) -> RuleM CoreExpr +retLitNoC l = do dflags <- getDynFlags + let lit = l dflags + let ty = literalType lit + return $ mkCoreUbxTup [ty, ty] [Lit lit, Lit (zeroi dflags)] + +wordOp2 :: (Integral a, Integral b) + => (a -> b -> Integer) + -> DynFlags -> Literal -> Literal -> Maybe CoreExpr +wordOp2 op dflags (LitNumber LitNumWord w1 _) (LitNumber LitNumWord w2 _) + = wordResult dflags (fromInteger w1 `op` fromInteger w2) +wordOp2 _ _ _ _ = Nothing -- Could find LitLit + +wordOpC2 :: (Integral a, Integral b) + => (a -> b -> Integer) + -> DynFlags -> Literal -> Literal -> Maybe CoreExpr +wordOpC2 op dflags (LitNumber LitNumWord w1 _) (LitNumber LitNumWord w2 _) = + wordCResult dflags (fromInteger w1 `op`
fromInteger w2) +wordOpC2 _ _ _ _ = Nothing -- Could find LitLit + +shiftRule :: (DynFlags -> Integer -> Int -> Integer) -> RuleM CoreExpr +-- Shifts take an Int; hence third arg of op is Int +-- Used for shift primops +-- ISllOp, ISraOp, ISrlOp :: Int# -> Int# -> Int# +-- SllOp, SrlOp :: Word# -> Int# -> Word# +shiftRule shift_op + = do { dflags <- getDynFlags + ; [e1, Lit (LitNumber LitNumInt shift_len _)] <- getArgs + ; case e1 of + _ | shift_len == 0 + -> return e1 + -- See Note [Guarding against silly shifts] + | shift_len < 0 || shift_len > wordSizeInBits dflags + -> return $ Lit $ mkLitNumberWrap dflags LitNumInt 0 (exprType e1) + + -- Do the shift at type Integer, but shift length is Int + Lit (LitNumber nt x t) + | 0 < shift_len + , shift_len <= wordSizeInBits dflags + -> let op = shift_op dflags + y = x `op` fromInteger shift_len + in liftMaybe $ Just (Lit (mkLitNumberWrap dflags nt y t)) + + _ -> mzero } + +wordSizeInBits :: DynFlags -> Integer +wordSizeInBits dflags = toInteger (platformWordSizeInBits (targetPlatform dflags)) + +-------------------------- +floatOp2 :: (Rational -> Rational -> Rational) + -> DynFlags -> Literal -> Literal + -> Maybe (Expr CoreBndr) +floatOp2 op dflags (LitFloat f1) (LitFloat f2) + = Just (mkFloatVal dflags (f1 `op` f2)) +floatOp2 _ _ _ _ = Nothing + +-------------------------- +doubleOp2 :: (Rational -> Rational -> Rational) + -> DynFlags -> Literal -> Literal + -> Maybe (Expr CoreBndr) +doubleOp2 op dflags (LitDouble f1) (LitDouble f2) + = Just (mkDoubleVal dflags (f1 `op` f2)) +doubleOp2 _ _ _ _ = Nothing + +-------------------------- +{- Note [The litEq rule: converting equality to case] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +This stuff turns + n ==# 3# +into + case n of + 3# -> True + m -> False + +This is a Good Thing, because it allows case-of case things +to happen, and case-default absorption to happen.
For +example: + + if (n ==# 3#) || (n ==# 4#) then e1 else e2 +will transform to + case n of + 3# -> e1 + 4# -> e1 + m -> e2 +(modulo the usual precautions to avoid duplicating e1) +-} + +litEq :: Bool -- True <=> equality, False <=> inequality + -> RuleM CoreExpr +litEq is_eq = msum + [ do [Lit lit, expr] <- getArgs + dflags <- getDynFlags + do_lit_eq dflags lit expr + , do [expr, Lit lit] <- getArgs + dflags <- getDynFlags + do_lit_eq dflags lit expr ] + where + do_lit_eq dflags lit expr = do + guard (not (litIsLifted lit)) + return (mkWildCase expr (literalType lit) intPrimTy + [(DEFAULT, [], val_if_neq), + (LitAlt lit, [], val_if_eq)]) + where + val_if_eq | is_eq = trueValInt dflags + | otherwise = falseValInt dflags + val_if_neq | is_eq = falseValInt dflags + | otherwise = trueValInt dflags + + +-- | Check if there is a comparison with minBound or maxBound that is +-- always true or false. For instance, an Int cannot be smaller than its +-- minBound, so we can replace such a comparison with False.
+boundsCmp :: Comparison -> RuleM CoreExpr +boundsCmp op = do + dflags <- getDynFlags + [a, b] <- getArgs + liftMaybe $ mkRuleFn dflags op a b + +data Comparison = Gt | Ge | Lt | Le + +mkRuleFn :: DynFlags -> Comparison -> CoreExpr -> CoreExpr -> Maybe CoreExpr +mkRuleFn dflags Gt (Lit lit) _ | isMinBound dflags lit = Just $ falseValInt dflags +mkRuleFn dflags Le (Lit lit) _ | isMinBound dflags lit = Just $ trueValInt dflags +mkRuleFn dflags Ge _ (Lit lit) | isMinBound dflags lit = Just $ trueValInt dflags +mkRuleFn dflags Lt _ (Lit lit) | isMinBound dflags lit = Just $ falseValInt dflags +mkRuleFn dflags Ge (Lit lit) _ | isMaxBound dflags lit = Just $ trueValInt dflags +mkRuleFn dflags Lt (Lit lit) _ | isMaxBound dflags lit = Just $ falseValInt dflags +mkRuleFn dflags Gt _ (Lit lit) | isMaxBound dflags lit = Just $ falseValInt dflags +mkRuleFn dflags Le _ (Lit lit) | isMaxBound dflags lit = Just $ trueValInt dflags +mkRuleFn _ _ _ _ = Nothing + +isMinBound :: DynFlags -> Literal -> Bool +isMinBound _ (LitChar c) = c == minBound +isMinBound dflags (LitNumber nt i _) = case nt of + LitNumInt -> i == tARGET_MIN_INT dflags + LitNumInt64 -> i == toInteger (minBound :: Int64) + LitNumWord -> i == 0 + LitNumWord64 -> i == 0 + LitNumNatural -> i == 0 + LitNumInteger -> False +isMinBound _ _ = False + +isMaxBound :: DynFlags -> Literal -> Bool +isMaxBound _ (LitChar c) = c == maxBound +isMaxBound dflags (LitNumber nt i _) = case nt of + LitNumInt -> i == tARGET_MAX_INT dflags + LitNumInt64 -> i == toInteger (maxBound :: Int64) + LitNumWord -> i == tARGET_MAX_WORD dflags + LitNumWord64 -> i == toInteger (maxBound :: Word64) + LitNumNatural -> False + LitNumInteger -> False +isMaxBound _ _ = False + +-- | Create an Int literal expression while ensuring the given Integer is in the +-- target Int range +intResult :: DynFlags -> Integer -> Maybe CoreExpr +intResult dflags result = Just (intResult' dflags result) + +intResult' :: DynFlags -> Integer -> CoreExpr +intResult' 
dflags result = Lit (mkLitIntWrap dflags result) + +-- | Create an unboxed pair of an Int literal expression, ensuring the given +-- Integer is in the target Int range and the corresponding overflow flag +-- (@0#@/@1#@) if it wasn't. +intCResult :: DynFlags -> Integer -> Maybe CoreExpr +intCResult dflags result = Just (mkPair [Lit lit, Lit c]) + where + mkPair = mkCoreUbxTup [intPrimTy, intPrimTy] + (lit, b) = mkLitIntWrapC dflags result + c = if b then onei dflags else zeroi dflags + +-- | Create a Word literal expression while ensuring the given Integer is in the +-- target Word range +wordResult :: DynFlags -> Integer -> Maybe CoreExpr +wordResult dflags result = Just (wordResult' dflags result) + +wordResult' :: DynFlags -> Integer -> CoreExpr +wordResult' dflags result = Lit (mkLitWordWrap dflags result) + +-- | Create an unboxed pair of a Word literal expression, ensuring the given +-- Integer is in the target Word range and the corresponding carry flag +-- (@0#@/@1#@) if it wasn't. +wordCResult :: DynFlags -> Integer -> Maybe CoreExpr +wordCResult dflags result = Just (mkPair [Lit lit, Lit c]) + where + mkPair = mkCoreUbxTup [wordPrimTy, intPrimTy] + (lit, b) = mkLitWordWrapC dflags result + c = if b then onei dflags else zeroi dflags + +inversePrimOp :: PrimOp -> RuleM CoreExpr +inversePrimOp primop = do + [Var primop_id `App` e] <- getArgs + matchPrimOpId primop primop_id + return e + +subsumesPrimOp :: PrimOp -> PrimOp -> RuleM CoreExpr +this `subsumesPrimOp` that = do + [Var primop_id `App` e] <- getArgs + matchPrimOpId that primop_id + return (Var (mkPrimOpId this) `App` e) + +subsumedByPrimOp :: PrimOp -> RuleM CoreExpr +subsumedByPrimOp primop = do + [e@(Var primop_id `App` _)] <- getArgs + matchPrimOpId primop primop_id + return e + +-- | narrow subsumes bitwise `and` with full mask (cf #16402): +-- +-- narrowN (x .&. m) +-- m .&. (2^N-1) = 2^N-1 +-- ==> narrowN x +-- +-- e.g. narrow16 (x .&. 
0xFFFF) +-- ==> narrow16 x +-- +narrowSubsumesAnd :: PrimOp -> PrimOp -> Int -> RuleM CoreExpr +narrowSubsumesAnd and_primop narrw n = do + [Var primop_id `App` x `App` y] <- getArgs + matchPrimOpId and_primop primop_id + let mask = bit n -1 + g v (Lit (LitNumber _ m _)) = do + guard (m .&. mask == mask) + return (Var (mkPrimOpId narrw) `App` v) + g _ _ = mzero + g x y <|> g y x + +idempotent :: RuleM CoreExpr +idempotent = do [e1, e2] <- getArgs + guard $ cheapEqExpr e1 e2 + return e1 + +{- +Note [Guarding against silly shifts] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Consider this code: + + import Data.Bits( (.|.), shiftL ) + chunkToBitmap :: [Bool] -> Word32 + chunkToBitmap chunk = foldr (.|.) 0 [ 1 `shiftL` n | (True,n) <- zip chunk [0..] ] + +This optimises to: +Shift.$wgo = \ (w_sCS :: GHC.Prim.Int#) (w1_sCT :: [GHC.Types.Bool]) -> + case w1_sCT of _ { + [] -> 0##; + : x_aAW xs_aAX -> + case x_aAW of _ { + GHC.Types.False -> + case w_sCS of wild2_Xh { + __DEFAULT -> Shift.$wgo (GHC.Prim.+# wild2_Xh 1) xs_aAX; + 9223372036854775807 -> 0## }; + GHC.Types.True -> + case GHC.Prim.>=# w_sCS 64 of _ { + GHC.Types.False -> + case w_sCS of wild3_Xh { + __DEFAULT -> + case Shift.$wgo (GHC.Prim.+# wild3_Xh 1) xs_aAX of ww_sCW { __DEFAULT -> + GHC.Prim.or# (GHC.Prim.narrow32Word# + (GHC.Prim.uncheckedShiftL# 1## wild3_Xh)) + ww_sCW + }; + 9223372036854775807 -> + GHC.Prim.narrow32Word# +!!!!--> (GHC.Prim.uncheckedShiftL# 1## 9223372036854775807) + }; + GHC.Types.True -> + case w_sCS of wild3_Xh { + __DEFAULT -> Shift.$wgo (GHC.Prim.+# wild3_Xh 1) xs_aAX; + 9223372036854775807 -> 0## + } } } } + +Note the massive shift on line "!!!!". It can't happen, because we've checked +that w < 64, but the optimiser didn't spot that. We DO NOT want to constant-fold this! 
+Moreover, if the programmer writes (n `uncheckedShiftL` 9223372036854775807), we +can't constant fold it, but if it gets to the assembler we get + Error: operand type mismatch for `shl' + +So the best thing to do is to rewrite the shift with a call to error, +when the second arg is large. However, in general we cannot do this; consider +this case + + let x = I# (uncheckedIShiftL# n 80) + in ... + +Here x contains an invalid shift and consequently we would like to rewrite it +as follows: + + let x = I# (error "invalid shift") + in ... + +This was originally done in the fix to #16449 but this breaks the let/app +invariant (see Note [Core let/app invariant] in GHC.Core) as noted in #16742. +For the reasons discussed in Note [Checking versus non-checking primops] (in +the PrimOp module) there is no safe way to rewrite the argument of I# such that +it bottoms. + +Consequently we instead take advantage of the fact that large shifts are +undefined behavior (see associated documentation in primops.txt.pp) and +transform the invalid shift into an "obviously incorrect" value. + +There are two cases: + +- Shifting fixed-width things: the primops ISll, Sll, etc + These are handled by shiftRule. + + We are happy to shift by any amount up to wordSize but no more. + +- Shifting Integers: the function shiftLInteger, shiftRInteger + from the 'integer' library. These are handled by rule_shift_op, + and match_Integer_shift_op. + + Here we could in principle shift by any amount, but we arbitrarily + limit the shift to 4 bits; in particular we do not want to shift by a + huge amount, which can happen in code like that above. + +The two cases are more different in their code paths than is comfortable, +but that is only a historical accident.
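+{- As a standalone sketch (not GHC code; the names srl, guardedSrl and the
+fixed 64-bit word size are assumptions for illustration), the fixed-width
+case of this Note -- a zero-filling shift performed via a word-sized type,
+with out-of-range shift amounts folded to an "obviously incorrect" zero
+rather than an error call -- looks like this: -}

```haskell
import Data.Bits (shiftR)
import Data.Word (Word64)

-- Assumed target word size; the real shiftRule consults DynFlags.
wordSizeBits :: Integer
wordSizeBits = 64

-- Logical (zero-filling) shift right at type Integer, done by a
-- round-trip through Word64, as shiftRightLogical does in the PW8 case.
srl :: Integer -> Int -> Integer
srl x n = fromIntegral (fromInteger x `shiftR` n :: Word64)

-- The shiftRule guards: a zero shift is the identity, an out-of-range
-- (negative or too-large) shift folds to 0, and only in-range shifts
-- are actually performed.
guardedSrl :: Integer -> Integer -> Integer
guardedSrl x n
  | n == 0                    = x
  | n < 0 || n > wordSizeBits = 0
  | otherwise                 = srl x (fromInteger n)

main :: IO ()
main = do
  print (srl (-1) 1)      -- zero-filled, unlike Bits.shiftR at Integer
  print (guardedSrl 5 70) -- silly shift folds to 0, not an error call
```

+{- Folding to zero instead of rewriting to error is what keeps the let/app
+invariant intact, per the discussion of #16742 above. -}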
+ + +************************************************************************ +* * +\subsection{Vaguely generic functions} +* * +************************************************************************ +-} + +mkBasicRule :: Name -> Int -> RuleM CoreExpr -> CoreRule +-- Gives the Rule the same name as the primop itself +mkBasicRule op_name n_args rm + = BuiltinRule { ru_name = occNameFS (nameOccName op_name), + ru_fn = op_name, + ru_nargs = n_args, + ru_try = runRuleM rm } + +newtype RuleM r = RuleM + { runRuleM :: DynFlags -> InScopeEnv -> Id -> [CoreExpr] -> Maybe r } + deriving (Functor) + +instance Applicative RuleM where + pure x = RuleM $ \_ _ _ _ -> Just x + (<*>) = ap + +instance Monad RuleM where + RuleM f >>= g + = RuleM $ \dflags iu fn args -> + case f dflags iu fn args of + Nothing -> Nothing + Just r -> runRuleM (g r) dflags iu fn args + +#if !MIN_VERSION_base(4,13,0) + fail = MonadFail.fail +#endif + +instance MonadFail.MonadFail RuleM where + fail _ = mzero + +instance Alternative RuleM where + empty = RuleM $ \_ _ _ _ -> Nothing + RuleM f1 <|> RuleM f2 = RuleM $ \dflags iu fn args -> + f1 dflags iu fn args <|> f2 dflags iu fn args + +instance MonadPlus RuleM + +instance HasDynFlags RuleM where + getDynFlags = RuleM $ \dflags _ _ _ -> Just dflags + +liftMaybe :: Maybe a -> RuleM a +liftMaybe Nothing = mzero +liftMaybe (Just x) = return x + +liftLit :: (Literal -> Literal) -> RuleM CoreExpr +liftLit f = liftLitDynFlags (const f) + +liftLitDynFlags :: (DynFlags -> Literal -> Literal) -> RuleM CoreExpr +liftLitDynFlags f = do + dflags <- getDynFlags + [Lit lit] <- getArgs + return $ Lit (f dflags lit) + +removeOp32 :: RuleM CoreExpr +removeOp32 = do + dflags <- getDynFlags + case platformWordSize (targetPlatform dflags) of + PW4 -> do + [e] <- getArgs + return e + PW8 -> + mzero + +getArgs :: RuleM [CoreExpr] +getArgs = RuleM $ \_ _ _ args -> Just args + +getInScopeEnv :: RuleM InScopeEnv +getInScopeEnv = RuleM $ \_ iu _ _ -> Just iu + +getFunction :: 
RuleM Id +getFunction = RuleM $ \_ _ fn _ -> Just fn + +-- return the n-th argument of this rule, if it is a literal +-- argument indices start from 0 +getLiteral :: Int -> RuleM Literal +getLiteral n = RuleM $ \_ _ _ exprs -> case drop n exprs of + (Lit l:_) -> Just l + _ -> Nothing + +unaryLit :: (DynFlags -> Literal -> Maybe CoreExpr) -> RuleM CoreExpr +unaryLit op = do + dflags <- getDynFlags + [Lit l] <- getArgs + liftMaybe $ op dflags (convFloating dflags l) + +binaryLit :: (DynFlags -> Literal -> Literal -> Maybe CoreExpr) -> RuleM CoreExpr +binaryLit op = do + dflags <- getDynFlags + [Lit l1, Lit l2] <- getArgs + liftMaybe $ op dflags (convFloating dflags l1) (convFloating dflags l2) + +binaryCmpLit :: (forall a . Ord a => a -> a -> Bool) -> RuleM CoreExpr +binaryCmpLit op = do + dflags <- getDynFlags + binaryLit (\_ -> cmpOp dflags op) + +leftIdentity :: Literal -> RuleM CoreExpr +leftIdentity id_lit = leftIdentityDynFlags (const id_lit) + +rightIdentity :: Literal -> RuleM CoreExpr +rightIdentity id_lit = rightIdentityDynFlags (const id_lit) + +identity :: Literal -> RuleM CoreExpr +identity lit = leftIdentity lit `mplus` rightIdentity lit + +leftIdentityDynFlags :: (DynFlags -> Literal) -> RuleM CoreExpr +leftIdentityDynFlags id_lit = do + dflags <- getDynFlags + [Lit l1, e2] <- getArgs + guard $ l1 == id_lit dflags + return e2 + +-- | Left identity rule for PrimOps like 'IntAddC' and 'WordAddC', where, in +-- addition to the result, we have to indicate that no carry/overflow occurred. 
+leftIdentityCDynFlags :: (DynFlags -> Literal) -> RuleM CoreExpr +leftIdentityCDynFlags id_lit = do + dflags <- getDynFlags + [Lit l1, e2] <- getArgs + guard $ l1 == id_lit dflags + let no_c = Lit (zeroi dflags) + return (mkCoreUbxTup [exprType e2, intPrimTy] [e2, no_c]) + +rightIdentityDynFlags :: (DynFlags -> Literal) -> RuleM CoreExpr +rightIdentityDynFlags id_lit = do + dflags <- getDynFlags + [e1, Lit l2] <- getArgs + guard $ l2 == id_lit dflags + return e1 + +-- | Right identity rule for PrimOps like 'IntSubC' and 'WordSubC', where, in +-- addition to the result, we have to indicate that no carry/overflow occurred. +rightIdentityCDynFlags :: (DynFlags -> Literal) -> RuleM CoreExpr +rightIdentityCDynFlags id_lit = do + dflags <- getDynFlags + [e1, Lit l2] <- getArgs + guard $ l2 == id_lit dflags + let no_c = Lit (zeroi dflags) + return (mkCoreUbxTup [exprType e1, intPrimTy] [e1, no_c]) + +identityDynFlags :: (DynFlags -> Literal) -> RuleM CoreExpr +identityDynFlags lit = + leftIdentityDynFlags lit `mplus` rightIdentityDynFlags lit + +-- | Identity rule for PrimOps like 'IntAddC' and 'WordAddC', where, in addition +-- to the result, we have to indicate that no carry/overflow occurred. +identityCDynFlags :: (DynFlags -> Literal) -> RuleM CoreExpr +identityCDynFlags lit = + leftIdentityCDynFlags lit `mplus` rightIdentityCDynFlags lit + +leftZero :: (DynFlags -> Literal) -> RuleM CoreExpr +leftZero zero = do + dflags <- getDynFlags + [Lit l1, _] <- getArgs + guard $ l1 == zero dflags + return $ Lit l1 + +rightZero :: (DynFlags -> Literal) -> RuleM CoreExpr +rightZero zero = do + dflags <- getDynFlags + [_, Lit l2] <- getArgs + guard $ l2 == zero dflags + return $ Lit l2 + +zeroElem :: (DynFlags -> Literal) -> RuleM CoreExpr +zeroElem lit = leftZero lit `mplus` rightZero lit + +equalArgs :: RuleM () +equalArgs = do + [e1, e2] <- getArgs + guard $ e1 `cheapEqExpr` e2 + +nonZeroLit :: Int -> RuleM () +nonZeroLit n = getLiteral n >>= guard . not . 
isZeroLit + +-- When excess precision is not requested, cut down the precision of the +-- Rational value to that of Float/Double. We confuse host architecture +-- and target architecture here, but it's convenient (and wrong :-). +convFloating :: DynFlags -> Literal -> Literal +convFloating dflags (LitFloat f) | not (gopt Opt_ExcessPrecision dflags) = + LitFloat (toRational (fromRational f :: Float )) +convFloating dflags (LitDouble d) | not (gopt Opt_ExcessPrecision dflags) = + LitDouble (toRational (fromRational d :: Double)) +convFloating _ l = l + +guardFloatDiv :: RuleM () +guardFloatDiv = do + [Lit (LitFloat f1), Lit (LitFloat f2)] <- getArgs + guard $ (f1 /=0 || f2 > 0) -- see Note [negative zero] + && f2 /= 0 -- avoid NaN and Infinity/-Infinity + +guardDoubleDiv :: RuleM () +guardDoubleDiv = do + [Lit (LitDouble d1), Lit (LitDouble d2)] <- getArgs + guard $ (d1 /=0 || d2 > 0) -- see Note [negative zero] + && d2 /= 0 -- avoid NaN and Infinity/-Infinity +-- Note [negative zero] Avoid (0 / -d), otherwise 0/(-1) reduces to +-- zero, but we might want to preserve the negative zero here which +-- is representable in Float/Double but not in (normalised) +-- Rational. (#3676) Perhaps we should generate (0 :% (-1)) instead? + +strengthReduction :: Literal -> PrimOp -> RuleM CoreExpr +strengthReduction two_lit add_op = do -- Note [Strength reduction] + arg <- msum [ do [arg, Lit mult_lit] <- getArgs + guard (mult_lit == two_lit) + return arg + , do [Lit mult_lit, arg] <- getArgs + guard (mult_lit == two_lit) + return arg ] + return $ Var (mkPrimOpId add_op) `App` arg `App` arg + +-- Note [Strength reduction] +-- ~~~~~~~~~~~~~~~~~~~~~~~~~ +-- +-- This rule turns floating point multiplications of the form 2.0 * x and +-- x * 2.0 into x + x addition, because addition costs less than multiplication. 
+-- See #7116 + +-- Note [What's true and false] +-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +-- +-- trueValInt and falseValInt represent true and false values returned by +-- comparison primops for Char, Int, Word, Integer, Double, Float and Addr. +-- True is represented as an unboxed 1# literal, while false is represented +-- as a 0# literal. +-- We still need Bool data constructors (True and False) to use in a rule +-- for constant folding of equal Strings + +trueValInt, falseValInt :: DynFlags -> Expr CoreBndr +trueValInt dflags = Lit $ onei dflags -- see Note [What's true and false] +falseValInt dflags = Lit $ zeroi dflags + +trueValBool, falseValBool :: Expr CoreBndr +trueValBool = Var trueDataConId -- see Note [What's true and false] +falseValBool = Var falseDataConId + +ltVal, eqVal, gtVal :: Expr CoreBndr +ltVal = Var ordLTDataConId +eqVal = Var ordEQDataConId +gtVal = Var ordGTDataConId + +mkIntVal :: DynFlags -> Integer -> Expr CoreBndr +mkIntVal dflags i = Lit (mkLitInt dflags i) +mkFloatVal :: DynFlags -> Rational -> Expr CoreBndr +mkFloatVal dflags f = Lit (convFloating dflags (LitFloat f)) +mkDoubleVal :: DynFlags -> Rational -> Expr CoreBndr +mkDoubleVal dflags d = Lit (convFloating dflags (LitDouble d)) + +matchPrimOpId :: PrimOp -> Id -> RuleM () +matchPrimOpId op id = do + op' <- liftMaybe $ isPrimOpId_maybe id + guard $ op == op' + +{- +************************************************************************ +* * +\subsection{Special rules for seq, tagToEnum, dataToTag} +* * +************************************************************************ + +Note [tagToEnum#] +~~~~~~~~~~~~~~~~~ +Nasty check to ensure that tagToEnum# is applied to a type that is an +enumeration TyCon. Unification may refine the type later, but this +check won't see that, alas. It's crude but it works. + +Here are two cases that should fail + f :: forall a.
a + f = tagToEnum# 0 -- Can't do tagToEnum# at a type variable + + g :: Int + g = tagToEnum# 0 -- Int is not an enumeration + +We used to make this check in the type inference engine, but it's quite +ugly to do so, because the delayed constraint solving means that we don't +really know what's going on until the end. It's very much a corner case +because we don't expect the user to call tagToEnum# at all; we merely +generate calls in derived instances of Enum. So we compromise: a +rewrite rule rewrites a bad instance of tagToEnum# to an error call, +and emits a warning. +-} + +tagToEnumRule :: RuleM CoreExpr +-- If data T a = A | B | C +-- then tagToEnum# (T ty) 2# --> B ty +tagToEnumRule = do + [Type ty, Lit (LitNumber LitNumInt i _)] <- getArgs + case splitTyConApp_maybe ty of + Just (tycon, tc_args) | isEnumerationTyCon tycon -> do + let tag = fromInteger i + correct_tag dc = (dataConTagZ dc) == tag + (dc:rest) <- return $ filter correct_tag (tyConDataCons_maybe tycon `orElse` []) + ASSERT(null rest) return () + return $ mkTyApps (Var (dataConWorkId dc)) tc_args + + -- See Note [tagToEnum#] + _ -> WARN( True, text "tagToEnum# on non-enumeration type" <+> ppr ty ) + return $ mkRuntimeErrorApp rUNTIME_ERROR_ID ty "tagToEnum# on non-enumeration type" + +------------------------------ +dataToTagRule :: RuleM CoreExpr +-- See Note [dataToTag#] in primops.txt.pp +dataToTagRule = a `mplus` b + where + -- dataToTag (tagToEnum x) ==> x + a = do + [Type ty1, Var tag_to_enum `App` Type ty2 `App` tag] <- getArgs + guard $ tag_to_enum `hasKey` tagToEnumKey + guard $ ty1 `eqType` ty2 + return tag + + -- dataToTag (K e1 e2) ==> tag-of K + -- This also works (via exprIsConApp_maybe) for + -- dataToTag x + -- where x's unfolding is a constructor application + b = do + dflags <- getDynFlags + [_, val_arg] <- getArgs + in_scope <- getInScopeEnv + (_,floats, dc,_,_) <- liftMaybe $ exprIsConApp_maybe in_scope val_arg + ASSERT( not (isNewTyCon (dataConTyCon dc)) ) return () + return $ 
wrapFloats floats (mkIntVal dflags (toInteger (dataConTagZ dc))) + +{- Note [dataToTag# magic] +~~~~~~~~~~~~~~~~~~~~~~~~~~ +The primop dataToTag# is unusual because it evaluates its argument. +Only `SeqOp` shares that property. (Other primops do not do anything +as fancy as argument evaluation.) The special handling for dataToTag# +is: + +* GHC.Core.Utils.exprOkForSpeculation has a special case for DataToTagOp, + (actually in app_ok). Most primops with lifted arguments do not + evaluate those arguments, but DataToTagOp and SeqOp are two + exceptions. We say that they are /never/ ok-for-speculation, + regardless of the evaluated-ness of their argument. + See GHC.Core.Utils Note [exprOkForSpeculation and SeqOp/DataToTagOp] + +* There is a special case for DataToTagOp in GHC.StgToCmm.Expr.cgExpr, + that evaluates its argument and then extracts the tag from + the returned value. + +* An application like (dataToTag# (Just x)) is optimised by + dataToTagRule in GHC.Core.Op.ConstantFold. + +* A case expression like + case (dataToTag# e) of <alts> + gets transformed to + case e of <transformed alts> + by GHC.Core.Op.ConstantFold.caseRules; see Note [caseRules for dataToTag] + +See #15696 for a long saga. +-} + +{- ********************************************************************* +* * + unsafeEqualityProof +* * +********************************************************************* -} + +-- unsafeEqualityProof k t t ==> UnsafeRefl (Refl t) +-- That is, if the two types are equal, it's not unsafe! + +unsafeEqualityProofRule :: RuleM CoreExpr +unsafeEqualityProofRule + = do { [Type rep, Type t1, Type t2] <- getArgs + ; guard (t1 `eqType` t2) + ; fn <- getFunction + ; let (_, ue) = splitForAllTys (idType fn) + tc = tyConAppTyCon ue -- tycon: UnsafeEquality + (dc:_) = tyConDataCons tc -- data con: UnsafeRefl + -- UnsafeRefl :: forall (r :: RuntimeRep) (a :: TYPE r).
+ -- UnsafeEquality r a a
+ ; return (mkTyApps (Var (dataConWrapId dc)) [rep, t1]) }
+
+
+{- *********************************************************************
+* *
+ Rules for seq# and spark#
+* *
+********************************************************************* -}
+
+{- Note [seq# magic]
+~~~~~~~~~~~~~~~~~~~~
+The primop
+ seq# :: forall a s . a -> State# s -> (# State# s, a #)
+
+is /not/ the same as the Prelude function seq :: a -> b -> b
+as you can see from its type. In fact, seq# is the implementation
+mechanism for 'evaluate'
+
+ evaluate :: a -> IO a
+ evaluate a = IO $ \s -> seq# a s
+
+The semantics of seq# is
+ * evaluate its first argument
+ * and return it
+
+Things to note
+
+* Why do we need a primop at all? That is, instead of
+ case seq# x s of (# x, s #) -> blah
+ why not instead say this?
+ case x of { DEFAULT -> blah }
+
+ Reason (see #5129): if we saw
+ catch# (\s -> case x of { DEFAULT -> raiseIO# exn s }) handler
+
+ then we'd drop the 'case x' because the body of the case is bottom
+ anyway. But we don't want to do that; the whole /point/ of
+ seq#/evaluate is to evaluate 'x' first in the IO monad.
+
+ In short, we /always/ evaluate the first argument and never
+ just discard it.
+
+* Why return the value? So that we can control sharing of seq'd
+ values: in
+ let x = e in x `seq` ... x ...
+ We don't want to inline x, so better to represent it as
+ let x = e in case seq# x RW of (# _, x' #) -> ... x' ...
+ Also, it matches the type of rseq in the Eval monad.
+
+Implementing seq#. 
The compiler has magic for SeqOp in + +- GHC.Core.Op.ConstantFold.seqRule: eliminate (seq# <whnf> s) + +- GHC.StgToCmm.Expr.cgExpr, and cgCase: special case for seq# + +- GHC.Core.Utils.exprOkForSpeculation; + see Note [exprOkForSpeculation and SeqOp/DataToTagOp] in GHC.Core.Utils + +- Simplify.addEvals records evaluated-ness for the result; see + Note [Adding evaluatedness info to pattern-bound variables] + in GHC.Core.Op.Simplify +-} + +seqRule :: RuleM CoreExpr +seqRule = do + [Type ty_a, Type _ty_s, a, s] <- getArgs + guard $ exprIsHNF a + return $ mkCoreUbxTup [exprType s, ty_a] [s, a] + +-- spark# :: forall a s . a -> State# s -> (# State# s, a #) +sparkRule :: RuleM CoreExpr +sparkRule = seqRule -- reduce on HNF, just the same + -- XXX perhaps we shouldn't do this, because a spark eliminated by + -- this rule won't be counted as a dud at runtime? + +{- +************************************************************************ +* * +\subsection{Built in rules} +* * +************************************************************************ + +Note [Scoping for Builtin rules] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +When compiling a (base-package) module that defines one of the +functions mentioned in the RHS of a built-in rule, there's a danger +that we'll see + + f = ...(eq String x).... + + ....and lower down... + + eqString = ... + +Then a rewrite would give + + f = ...(eqString x)... + ....and lower down... + eqString = ... + +and lo, eqString is not in scope. This only really matters when we +get to code generation. But the occurrence analyser does a GlomBinds +step when necessary, that does a new SCC analysis on the whole set of +bindings (see occurAnalysePgm), which sorts out the dependency, so all +is fine. 
+-} + +builtinRules :: [CoreRule] +-- Rules for non-primops that can't be expressed using a RULE pragma +builtinRules + = [BuiltinRule { ru_name = fsLit "AppendLitString", + ru_fn = unpackCStringFoldrName, + ru_nargs = 4, ru_try = match_append_lit }, + BuiltinRule { ru_name = fsLit "EqString", ru_fn = eqStringName, + ru_nargs = 2, ru_try = match_eq_string }, + BuiltinRule { ru_name = fsLit "Inline", ru_fn = inlineIdName, + ru_nargs = 2, ru_try = \_ _ _ -> match_inline }, + BuiltinRule { ru_name = fsLit "MagicDict", ru_fn = idName magicDictId, + ru_nargs = 4, ru_try = \_ _ _ -> match_magicDict }, + + mkBasicRule unsafeEqualityProofName 3 unsafeEqualityProofRule, + + mkBasicRule divIntName 2 $ msum + [ nonZeroLit 1 >> binaryLit (intOp2 div) + , leftZero zeroi + , do + [arg, Lit (LitNumber LitNumInt d _)] <- getArgs + Just n <- return $ exactLog2 d + dflags <- getDynFlags + return $ Var (mkPrimOpId ISraOp) `App` arg `App` mkIntVal dflags n + ], + + mkBasicRule modIntName 2 $ msum + [ nonZeroLit 1 >> binaryLit (intOp2 mod) + , leftZero zeroi + , do + [arg, Lit (LitNumber LitNumInt d _)] <- getArgs + Just _ <- return $ exactLog2 d + dflags <- getDynFlags + return $ Var (mkPrimOpId AndIOp) + `App` arg `App` mkIntVal dflags (d - 1) + ] + ] + ++ builtinIntegerRules + ++ builtinNaturalRules +{-# NOINLINE builtinRules #-} +-- there is no benefit to inlining these yet, despite this, GHC produces +-- unfoldings for this regardless since the floated list entries look small. 
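The divIntName/modIntName rules above strength-reduce division by a power of two (detected via exactLog2) into an arithmetic right shift (ISraOp) and a mask (AndIOp). A small stand-alone sketch of why that rewrite is sound, including for negative operands; this is ordinary Haskell, not GHC Core, and `divPow2`/`modPow2` are illustrative names, not GHC functions:

```haskell
-- Plain-Haskell model of the two power-of-two rules in builtinRules:
-- for a divisor d = 2^n,
--   x `div` d  ==  x `shiftR` n       (the ISraOp rewrite)
--   x `mod` d  ==  x .&. (d - 1)      (the AndIOp rewrite)
-- Both identities hold for negative x as well, because div/mod round
-- toward negative infinity, which matches arithmetic right shift and
-- masking on a two's-complement representation. (quot/rem, which round
-- toward zero, would NOT satisfy these identities.)
import Data.Bits (shiftR, (.&.))

divPow2 :: Int -> Int -> Int
divPow2 x n = x `shiftR` n       -- models x `div` (2^n)

modPow2 :: Int -> Int -> Int
modPow2 x d = x .&. (d - 1)      -- models x `mod` d, for d a power of two

main :: IO ()
main = print [ (-5) `div` 4 == divPow2 (-5) 2   -- both -2
             , (-5) `mod` 4 == modPow2 (-5) 4 ] -- both 3
```

This is why the rules are attached to divInt/modInt rather than quotInt/remInt: the shift-and-mask encoding agrees with flooring division only.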
+ +builtinIntegerRules :: [CoreRule] +builtinIntegerRules = + [rule_IntToInteger "smallInteger" smallIntegerName, + rule_WordToInteger "wordToInteger" wordToIntegerName, + rule_Int64ToInteger "int64ToInteger" int64ToIntegerName, + rule_Word64ToInteger "word64ToInteger" word64ToIntegerName, + rule_convert "integerToWord" integerToWordName mkWordLitWord, + rule_convert "integerToInt" integerToIntName mkIntLitInt, + rule_convert "integerToWord64" integerToWord64Name (\_ -> mkWord64LitWord64), + rule_convert "integerToInt64" integerToInt64Name (\_ -> mkInt64LitInt64), + rule_binop "plusInteger" plusIntegerName (+), + rule_binop "minusInteger" minusIntegerName (-), + rule_binop "timesInteger" timesIntegerName (*), + rule_unop "negateInteger" negateIntegerName negate, + rule_binop_Prim "eqInteger#" eqIntegerPrimName (==), + rule_binop_Prim "neqInteger#" neqIntegerPrimName (/=), + rule_unop "absInteger" absIntegerName abs, + rule_unop "signumInteger" signumIntegerName signum, + rule_binop_Prim "leInteger#" leIntegerPrimName (<=), + rule_binop_Prim "gtInteger#" gtIntegerPrimName (>), + rule_binop_Prim "ltInteger#" ltIntegerPrimName (<), + rule_binop_Prim "geInteger#" geIntegerPrimName (>=), + rule_binop_Ordering "compareInteger" compareIntegerName compare, + rule_encodeFloat "encodeFloatInteger" encodeFloatIntegerName mkFloatLitFloat, + rule_convert "floatFromInteger" floatFromIntegerName (\_ -> mkFloatLitFloat), + rule_encodeFloat "encodeDoubleInteger" encodeDoubleIntegerName mkDoubleLitDouble, + rule_decodeDouble "decodeDoubleInteger" decodeDoubleIntegerName, + rule_convert "doubleFromInteger" doubleFromIntegerName (\_ -> mkDoubleLitDouble), + rule_rationalTo "rationalToFloat" rationalToFloatName mkFloatExpr, + rule_rationalTo "rationalToDouble" rationalToDoubleName mkDoubleExpr, + rule_binop "gcdInteger" gcdIntegerName gcd, + rule_binop "lcmInteger" lcmIntegerName lcm, + rule_binop "andInteger" andIntegerName (.&.), + rule_binop "orInteger" orIntegerName (.|.), + 
rule_binop "xorInteger" xorIntegerName xor, + rule_unop "complementInteger" complementIntegerName complement, + rule_shift_op "shiftLInteger" shiftLIntegerName shiftL, + rule_shift_op "shiftRInteger" shiftRIntegerName shiftR, + rule_bitInteger "bitInteger" bitIntegerName, + -- See Note [Integer division constant folding] in libraries/base/GHC/Real.hs + rule_divop_one "quotInteger" quotIntegerName quot, + rule_divop_one "remInteger" remIntegerName rem, + rule_divop_one "divInteger" divIntegerName div, + rule_divop_one "modInteger" modIntegerName mod, + rule_divop_both "divModInteger" divModIntegerName divMod, + rule_divop_both "quotRemInteger" quotRemIntegerName quotRem, + -- These rules below don't actually have to be built in, but if we + -- put them in the Haskell source then we'd have to duplicate them + -- between all Integer implementations + rule_XToIntegerToX "smallIntegerToInt" integerToIntName smallIntegerName, + rule_XToIntegerToX "wordToIntegerToWord" integerToWordName wordToIntegerName, + rule_XToIntegerToX "int64ToIntegerToInt64" integerToInt64Name int64ToIntegerName, + rule_XToIntegerToX "word64ToIntegerToWord64" integerToWord64Name word64ToIntegerName, + rule_smallIntegerTo "smallIntegerToWord" integerToWordName Int2WordOp, + rule_smallIntegerTo "smallIntegerToFloat" floatFromIntegerName Int2FloatOp, + rule_smallIntegerTo "smallIntegerToDouble" doubleFromIntegerName Int2DoubleOp + ] + where rule_convert str name convert + = BuiltinRule { ru_name = fsLit str, ru_fn = name, ru_nargs = 1, + ru_try = match_Integer_convert convert } + rule_IntToInteger str name + = BuiltinRule { ru_name = fsLit str, ru_fn = name, ru_nargs = 1, + ru_try = match_IntToInteger } + rule_WordToInteger str name + = BuiltinRule { ru_name = fsLit str, ru_fn = name, ru_nargs = 1, + ru_try = match_WordToInteger } + rule_Int64ToInteger str name + = BuiltinRule { ru_name = fsLit str, ru_fn = name, ru_nargs = 1, + ru_try = match_Int64ToInteger } + rule_Word64ToInteger str name + = 
BuiltinRule { ru_name = fsLit str, ru_fn = name, ru_nargs = 1, + ru_try = match_Word64ToInteger } + rule_unop str name op + = BuiltinRule { ru_name = fsLit str, ru_fn = name, ru_nargs = 1, + ru_try = match_Integer_unop op } + rule_bitInteger str name + = BuiltinRule { ru_name = fsLit str, ru_fn = name, ru_nargs = 1, + ru_try = match_bitInteger } + rule_binop str name op + = BuiltinRule { ru_name = fsLit str, ru_fn = name, ru_nargs = 2, + ru_try = match_Integer_binop op } + rule_divop_both str name op + = BuiltinRule { ru_name = fsLit str, ru_fn = name, ru_nargs = 2, + ru_try = match_Integer_divop_both op } + rule_divop_one str name op + = BuiltinRule { ru_name = fsLit str, ru_fn = name, ru_nargs = 2, + ru_try = match_Integer_divop_one op } + rule_shift_op str name op + = BuiltinRule { ru_name = fsLit str, ru_fn = name, ru_nargs = 2, + ru_try = match_Integer_shift_op op } + rule_binop_Prim str name op + = BuiltinRule { ru_name = fsLit str, ru_fn = name, ru_nargs = 2, + ru_try = match_Integer_binop_Prim op } + rule_binop_Ordering str name op + = BuiltinRule { ru_name = fsLit str, ru_fn = name, ru_nargs = 2, + ru_try = match_Integer_binop_Ordering op } + rule_encodeFloat str name op + = BuiltinRule { ru_name = fsLit str, ru_fn = name, ru_nargs = 2, + ru_try = match_Integer_Int_encodeFloat op } + rule_decodeDouble str name + = BuiltinRule { ru_name = fsLit str, ru_fn = name, ru_nargs = 1, + ru_try = match_decodeDouble } + rule_XToIntegerToX str name toIntegerName + = BuiltinRule { ru_name = fsLit str, ru_fn = name, ru_nargs = 1, + ru_try = match_XToIntegerToX toIntegerName } + rule_smallIntegerTo str name primOp + = BuiltinRule { ru_name = fsLit str, ru_fn = name, ru_nargs = 1, + ru_try = match_smallIntegerTo primOp } + rule_rationalTo str name mkLit + = BuiltinRule { ru_name = fsLit str, ru_fn = name, ru_nargs = 2, + ru_try = match_rationalTo mkLit } + +builtinNaturalRules :: [CoreRule] +builtinNaturalRules = + [rule_binop "plusNatural" plusNaturalName (+) + 
,rule_partial_binop "minusNatural" minusNaturalName (\a b -> if a >= b then Just (a - b) else Nothing) + ,rule_binop "timesNatural" timesNaturalName (*) + ,rule_NaturalFromInteger "naturalFromInteger" naturalFromIntegerName + ,rule_NaturalToInteger "naturalToInteger" naturalToIntegerName + ,rule_WordToNatural "wordToNatural" wordToNaturalName + ] + where rule_binop str name op + = BuiltinRule { ru_name = fsLit str, ru_fn = name, ru_nargs = 2, + ru_try = match_Natural_binop op } + rule_partial_binop str name op + = BuiltinRule { ru_name = fsLit str, ru_fn = name, ru_nargs = 2, + ru_try = match_Natural_partial_binop op } + rule_NaturalToInteger str name + = BuiltinRule { ru_name = fsLit str, ru_fn = name, ru_nargs = 1, + ru_try = match_NaturalToInteger } + rule_NaturalFromInteger str name + = BuiltinRule { ru_name = fsLit str, ru_fn = name, ru_nargs = 1, + ru_try = match_NaturalFromInteger } + rule_WordToNatural str name + = BuiltinRule { ru_name = fsLit str, ru_fn = name, ru_nargs = 1, + ru_try = match_WordToNatural } + +--------------------------------------------------- +-- The rule is this: +-- unpackFoldrCString# "foo" c (unpackFoldrCString# "baz" c n) +-- = unpackFoldrCString# "foobaz" c n + +match_append_lit :: RuleFun +match_append_lit _ id_unf _ + [ Type ty1 + , lit1 + , c1 + , e2 + ] + -- N.B. Ensure that we strip off any ticks (e.g. source notes) from the + -- `lit` and `c` arguments, lest this may fail to fire when building with + -- -g3. See #16740. 
+ | (strTicks, Var unpk `App` Type ty2
+ `App` lit2
+ `App` c2
+ `App` n) <- stripTicksTop tickishFloatable e2
+ , unpk `hasKey` unpackCStringFoldrIdKey
+ , cheapEqExpr' tickishFloatable c1 c2
+ , (c1Ticks, c1') <- stripTicksTop tickishFloatable c1
+ , c2Ticks <- stripTicksTopT tickishFloatable c2
+ , Just (LitString s1) <- exprIsLiteral_maybe id_unf lit1
+ , Just (LitString s2) <- exprIsLiteral_maybe id_unf lit2
+ = ASSERT( ty1 `eqType` ty2 )
+ Just $ mkTicks strTicks
+ $ Var unpk `App` Type ty1
+ `App` Lit (LitString (s1 `BS.append` s2))
+ `App` mkTicks (c1Ticks ++ c2Ticks) c1'
+ `App` n
+
+match_append_lit _ _ _ _ = Nothing
+
+---------------------------------------------------
+-- The rule is this:
+-- eqString (unpackCString# (Lit s1)) (unpackCString# (Lit s2)) = s1==s2
+
+match_eq_string :: RuleFun
+match_eq_string _ id_unf _
+ [Var unpk1 `App` lit1, Var unpk2 `App` lit2]
+ | unpk1 `hasKey` unpackCStringIdKey
+ , unpk2 `hasKey` unpackCStringIdKey
+ , Just (LitString s1) <- exprIsLiteral_maybe id_unf lit1
+ , Just (LitString s2) <- exprIsLiteral_maybe id_unf lit2
+ = Just (if s1 == s2 then trueValBool else falseValBool)
+
+match_eq_string _ _ _ _ = Nothing
+
+
+---------------------------------------------------
+-- The rule is this:
+-- inline f_ty (f a b c) = <f's unfolding> a b c
+-- (if f has an unfolding, EVEN if it's a loop breaker)
+--
+-- It's important to allow the argument to 'inline' to have args itself
+-- (a) because it's more forgiving to allow the programmer to write
+-- inline f a b c
+-- or inline (f a b c)
+-- (b) because a polymorphic f will get a type argument that the
+-- programmer can't avoid
+--
+-- Also, don't forget about 'inline's type argument!
+match_inline :: [Expr CoreBndr] -> Maybe (Expr CoreBndr)
+match_inline (Type _ : e : _)
+ | (Var f, args1) <- collectArgs e,
+ Just unf <- maybeUnfoldingTemplate (realIdUnfolding f)
+ -- Ignore the IdUnfoldingFun here! 
+ = Just (mkApps unf args1) + +match_inline _ = Nothing + + +-- See Note [magicDictId magic] in `basicTypes/MkId.hs` +-- for a description of what is going on here. +match_magicDict :: [Expr CoreBndr] -> Maybe (Expr CoreBndr) +match_magicDict [Type _, Var wrap `App` Type a `App` Type _ `App` f, x, y ] + | Just (fieldTy, _) <- splitFunTy_maybe $ dropForAlls $ idType wrap + , Just (dictTy, _) <- splitFunTy_maybe fieldTy + , Just dictTc <- tyConAppTyCon_maybe dictTy + , Just (_,_,co) <- unwrapNewTyCon_maybe dictTc + = Just + $ f `App` Cast x (mkSymCo (mkUnbranchedAxInstCo Representational co [a] [])) + `App` y + +match_magicDict _ = Nothing + +------------------------------------------------- +-- Integer rules +-- smallInteger (79::Int#) = 79::Integer +-- wordToInteger (79::Word#) = 79::Integer +-- Similarly Int64, Word64 + +match_IntToInteger :: RuleFun +match_IntToInteger = match_IntToInteger_unop id + +match_WordToInteger :: RuleFun +match_WordToInteger _ id_unf id [xl] + | Just (LitNumber LitNumWord x _) <- exprIsLiteral_maybe id_unf xl + = case splitFunTy_maybe (idType id) of + Just (_, integerTy) -> + Just (Lit (mkLitInteger x integerTy)) + _ -> + panic "match_WordToInteger: Id has the wrong type" +match_WordToInteger _ _ _ _ = Nothing + +match_Int64ToInteger :: RuleFun +match_Int64ToInteger _ id_unf id [xl] + | Just (LitNumber LitNumInt64 x _) <- exprIsLiteral_maybe id_unf xl + = case splitFunTy_maybe (idType id) of + Just (_, integerTy) -> + Just (Lit (mkLitInteger x integerTy)) + _ -> + panic "match_Int64ToInteger: Id has the wrong type" +match_Int64ToInteger _ _ _ _ = Nothing + +match_Word64ToInteger :: RuleFun +match_Word64ToInteger _ id_unf id [xl] + | Just (LitNumber LitNumWord64 x _) <- exprIsLiteral_maybe id_unf xl + = case splitFunTy_maybe (idType id) of + Just (_, integerTy) -> + Just (Lit (mkLitInteger x integerTy)) + _ -> + panic "match_Word64ToInteger: Id has the wrong type" +match_Word64ToInteger _ _ _ _ = Nothing + +match_NaturalToInteger :: 
RuleFun +match_NaturalToInteger _ id_unf id [xl] + | Just (LitNumber LitNumNatural x _) <- exprIsLiteral_maybe id_unf xl + = case splitFunTy_maybe (idType id) of + Just (_, naturalTy) -> + Just (Lit (LitNumber LitNumInteger x naturalTy)) + _ -> + panic "match_NaturalToInteger: Id has the wrong type" +match_NaturalToInteger _ _ _ _ = Nothing + +match_NaturalFromInteger :: RuleFun +match_NaturalFromInteger _ id_unf id [xl] + | Just (LitNumber LitNumInteger x _) <- exprIsLiteral_maybe id_unf xl + , x >= 0 + = case splitFunTy_maybe (idType id) of + Just (_, naturalTy) -> + Just (Lit (LitNumber LitNumNatural x naturalTy)) + _ -> + panic "match_NaturalFromInteger: Id has the wrong type" +match_NaturalFromInteger _ _ _ _ = Nothing + +match_WordToNatural :: RuleFun +match_WordToNatural _ id_unf id [xl] + | Just (LitNumber LitNumWord x _) <- exprIsLiteral_maybe id_unf xl + = case splitFunTy_maybe (idType id) of + Just (_, naturalTy) -> + Just (Lit (LitNumber LitNumNatural x naturalTy)) + _ -> + panic "match_WordToNatural: Id has the wrong type" +match_WordToNatural _ _ _ _ = Nothing + +------------------------------------------------- +{- Note [Rewriting bitInteger] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +For most types the bitInteger operation can be implemented in terms of shifts. +The integer-gmp package, however, can do substantially better than this if +allowed to provide its own implementation. However, in so doing it previously lost +constant-folding (see #8832). The bitInteger rule above provides constant folding +specifically for this function. + +There is, however, a bit of trickiness here when it comes to ranges. While the +AST encodes all integers as Integers, `bit` expects the bit +index to be given as an Int. Hence we coerce to an Int in the rule definition. +This will behave a bit funny for constants larger than the word size, but the user +should expect some funniness given that they will have at very least ignored a +warning in this case. 
+-} + +match_bitInteger :: RuleFun +-- Just for GHC.Integer.Type.bitInteger :: Int# -> Integer +match_bitInteger dflags id_unf fn [arg] + | Just (LitNumber LitNumInt x _) <- exprIsLiteral_maybe id_unf arg + , x >= 0 + , x <= (wordSizeInBits dflags - 1) + -- Make sure x is small enough to yield a decently small integer + -- Attempting to construct the Integer for + -- (bitInteger 9223372036854775807#) + -- would be a bad idea (#14959) + , let x_int = fromIntegral x :: Int + = case splitFunTy_maybe (idType fn) of + Just (_, integerTy) + -> Just (Lit (LitNumber LitNumInteger (bit x_int) integerTy)) + _ -> panic "match_IntToInteger_unop: Id has the wrong type" + +match_bitInteger _ _ _ _ = Nothing + + +------------------------------------------------- +match_Integer_convert :: Num a + => (DynFlags -> a -> Expr CoreBndr) + -> RuleFun +match_Integer_convert convert dflags id_unf _ [xl] + | Just (LitNumber LitNumInteger x _) <- exprIsLiteral_maybe id_unf xl + = Just (convert dflags (fromInteger x)) +match_Integer_convert _ _ _ _ _ = Nothing + +match_Integer_unop :: (Integer -> Integer) -> RuleFun +match_Integer_unop unop _ id_unf _ [xl] + | Just (LitNumber LitNumInteger x i) <- exprIsLiteral_maybe id_unf xl + = Just (Lit (LitNumber LitNumInteger (unop x) i)) +match_Integer_unop _ _ _ _ _ = Nothing + +match_IntToInteger_unop :: (Integer -> Integer) -> RuleFun +match_IntToInteger_unop unop _ id_unf fn [xl] + | Just (LitNumber LitNumInt x _) <- exprIsLiteral_maybe id_unf xl + = case splitFunTy_maybe (idType fn) of + Just (_, integerTy) -> + Just (Lit (LitNumber LitNumInteger (unop x) integerTy)) + _ -> + panic "match_IntToInteger_unop: Id has the wrong type" +match_IntToInteger_unop _ _ _ _ _ = Nothing + +match_Integer_binop :: (Integer -> Integer -> Integer) -> RuleFun +match_Integer_binop binop _ id_unf _ [xl,yl] + | Just (LitNumber LitNumInteger x i) <- exprIsLiteral_maybe id_unf xl + , Just (LitNumber LitNumInteger y _) <- exprIsLiteral_maybe id_unf yl + = Just (Lit 
(mkLitInteger (x `binop` y) i)) +match_Integer_binop _ _ _ _ _ = Nothing + +match_Natural_binop :: (Integer -> Integer -> Integer) -> RuleFun +match_Natural_binop binop _ id_unf _ [xl,yl] + | Just (LitNumber LitNumNatural x i) <- exprIsLiteral_maybe id_unf xl + , Just (LitNumber LitNumNatural y _) <- exprIsLiteral_maybe id_unf yl + = Just (Lit (mkLitNatural (x `binop` y) i)) +match_Natural_binop _ _ _ _ _ = Nothing + +match_Natural_partial_binop :: (Integer -> Integer -> Maybe Integer) -> RuleFun +match_Natural_partial_binop binop _ id_unf _ [xl,yl] + | Just (LitNumber LitNumNatural x i) <- exprIsLiteral_maybe id_unf xl + , Just (LitNumber LitNumNatural y _) <- exprIsLiteral_maybe id_unf yl + , Just z <- x `binop` y + = Just (Lit (mkLitNatural z i)) +match_Natural_partial_binop _ _ _ _ _ = Nothing + +-- This helper is used for the quotRem and divMod functions +match_Integer_divop_both + :: (Integer -> Integer -> (Integer, Integer)) -> RuleFun +match_Integer_divop_both divop _ id_unf _ [xl,yl] + | Just (LitNumber LitNumInteger x t) <- exprIsLiteral_maybe id_unf xl + , Just (LitNumber LitNumInteger y _) <- exprIsLiteral_maybe id_unf yl + , y /= 0 + , (r,s) <- x `divop` y + = Just $ mkCoreUbxTup [t,t] [Lit (mkLitInteger r t), Lit (mkLitInteger s t)] +match_Integer_divop_both _ _ _ _ _ = Nothing + +-- This helper is used for the quot and rem functions +match_Integer_divop_one :: (Integer -> Integer -> Integer) -> RuleFun +match_Integer_divop_one divop _ id_unf _ [xl,yl] + | Just (LitNumber LitNumInteger x i) <- exprIsLiteral_maybe id_unf xl + , Just (LitNumber LitNumInteger y _) <- exprIsLiteral_maybe id_unf yl + , y /= 0 + = Just (Lit (mkLitInteger (x `divop` y) i)) +match_Integer_divop_one _ _ _ _ _ = Nothing + +match_Integer_shift_op :: (Integer -> Int -> Integer) -> RuleFun +-- Used for shiftLInteger, shiftRInteger :: Integer -> Int# -> Integer +-- See Note [Guarding against silly shifts] +match_Integer_shift_op binop _ id_unf _ [xl,yl] + | Just (LitNumber 
LitNumInteger x i) <- exprIsLiteral_maybe id_unf xl + , Just (LitNumber LitNumInt y _) <- exprIsLiteral_maybe id_unf yl + , y >= 0 + , y <= 4 -- Restrict constant-folding of shifts on Integers, somewhat + -- arbitrary. We can get huge shifts in inaccessible code + -- (#15673) + = Just (Lit (mkLitInteger (x `binop` fromIntegral y) i)) +match_Integer_shift_op _ _ _ _ _ = Nothing + +match_Integer_binop_Prim :: (Integer -> Integer -> Bool) -> RuleFun +match_Integer_binop_Prim binop dflags id_unf _ [xl, yl] + | Just (LitNumber LitNumInteger x _) <- exprIsLiteral_maybe id_unf xl + , Just (LitNumber LitNumInteger y _) <- exprIsLiteral_maybe id_unf yl + = Just (if x `binop` y then trueValInt dflags else falseValInt dflags) +match_Integer_binop_Prim _ _ _ _ _ = Nothing + +match_Integer_binop_Ordering :: (Integer -> Integer -> Ordering) -> RuleFun +match_Integer_binop_Ordering binop _ id_unf _ [xl, yl] + | Just (LitNumber LitNumInteger x _) <- exprIsLiteral_maybe id_unf xl + , Just (LitNumber LitNumInteger y _) <- exprIsLiteral_maybe id_unf yl + = Just $ case x `binop` y of + LT -> ltVal + EQ -> eqVal + GT -> gtVal +match_Integer_binop_Ordering _ _ _ _ _ = Nothing + +match_Integer_Int_encodeFloat :: RealFloat a + => (a -> Expr CoreBndr) + -> RuleFun +match_Integer_Int_encodeFloat mkLit _ id_unf _ [xl,yl] + | Just (LitNumber LitNumInteger x _) <- exprIsLiteral_maybe id_unf xl + , Just (LitNumber LitNumInt y _) <- exprIsLiteral_maybe id_unf yl + = Just (mkLit $ encodeFloat x (fromInteger y)) +match_Integer_Int_encodeFloat _ _ _ _ _ = Nothing + +--------------------------------------------------- +-- constant folding for Float/Double +-- +-- This turns +-- rationalToFloat n d +-- into a literal Float, and similarly for Doubles. 
+-- +-- it's important to not match d == 0, because that may represent a +-- literal "0/0" or similar, and we can't produce a literal value for +-- NaN or +-Inf +match_rationalTo :: RealFloat a + => (a -> Expr CoreBndr) + -> RuleFun +match_rationalTo mkLit _ id_unf _ [xl, yl] + | Just (LitNumber LitNumInteger x _) <- exprIsLiteral_maybe id_unf xl + , Just (LitNumber LitNumInteger y _) <- exprIsLiteral_maybe id_unf yl + , y /= 0 + = Just (mkLit (fromRational (x % y))) +match_rationalTo _ _ _ _ _ = Nothing + +match_decodeDouble :: RuleFun +match_decodeDouble dflags id_unf fn [xl] + | Just (LitDouble x) <- exprIsLiteral_maybe id_unf xl + = case splitFunTy_maybe (idType fn) of + Just (_, res) + | Just [_lev1, _lev2, integerTy, intHashTy] <- tyConAppArgs_maybe res + -> case decodeFloat (fromRational x :: Double) of + (y, z) -> + Just $ mkCoreUbxTup [integerTy, intHashTy] + [Lit (mkLitInteger y integerTy), + Lit (mkLitInt dflags (toInteger z))] + _ -> + pprPanic "match_decodeDouble: Id has the wrong type" + (ppr fn <+> dcolon <+> ppr (idType fn)) +match_decodeDouble _ _ _ _ = Nothing + +match_XToIntegerToX :: Name -> RuleFun +match_XToIntegerToX n _ _ _ [App (Var x) y] + | idName x == n + = Just y +match_XToIntegerToX _ _ _ _ _ = Nothing + +match_smallIntegerTo :: PrimOp -> RuleFun +match_smallIntegerTo primOp _ _ _ [App (Var x) y] + | idName x == smallIntegerName + = Just $ App (Var (mkPrimOpId primOp)) y +match_smallIntegerTo _ _ _ _ _ = Nothing + + + +-------------------------------------------------------- +-- Note [Constant folding through nested expressions] +-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +-- +-- We use rewrites rules to perform constant folding. It means that we don't +-- have a global view of the expression we are trying to optimise. 
As a +-- consequence we only perform local (small-step) transformations that either: +-- 1) reduce the number of operations +-- 2) rearrange the expression to increase the odds that other rules will +-- match +-- +-- We don't try to handle more complex expression optimisation cases that would +-- require a global view. For example, rewriting expressions to increase +-- sharing (e.g., Horner's method); optimisations that require local +-- transformations increasing the number of operations; rearrangements to +-- cancel/factorize terms (e.g., (a+b-a-b) isn't rearranged to reduce to 0). +-- +-- We already have rules to perform constant folding on expressions with the +-- following shape (where a and/or b are literals): +-- +-- D) op +-- /\ +-- / \ +-- / \ +-- a b +-- +-- To support nested expressions, we match three other shapes of expression +-- trees: +-- +-- A) op1 B) op1 C) op1 +-- /\ /\ /\ +-- / \ / \ / \ +-- / \ / \ / \ +-- a op2 op2 c op2 op3 +-- /\ /\ /\ /\ +-- / \ / \ / \ / \ +-- b c a b a b c d +-- +-- +-- R1) +/- simplification: +-- ops = + or -, two literals (not siblings) +-- +-- Examples: +-- A: 5 + (10-x) ==> 15-x +-- B: (10+x) + 5 ==> 15+x +-- C: (5+a)-(5-b) ==> 0+(a+b) +-- +-- R2) * simplification +-- ops = *, two literals (not siblings) +-- +-- Examples: +-- A: 5 * (10*x) ==> 50*x +-- B: (10*x) * 5 ==> 50*x +-- C: (5*a)*(5*b) ==> 25*(a*b) +-- +-- R3) * distribution over +/- +-- op1 = *, op2 = + or -, two literals (not siblings) +-- +-- This transformation doesn't reduce the number of operations but switches +-- the outer and the inner operations so that the outer is (+) or (-) instead +-- of (*). It increases the odds that other rules will match after this one. 
+-- +-- Examples: +-- A: 5 * (10-x) ==> 50 - (5*x) +-- B: (10+x) * 5 ==> 50 + (5*x) +-- C: Not supported as it would increase the number of operations: +-- (5+a)*(5-b) ==> 25 - 5*b + 5*a - a*b +-- +-- R4) Simple factorization +-- +-- op1 = + or -, op2/op3 = *, +-- one literal for each innermost * operation (except in the D case), +-- the two other terms are equals +-- +-- Examples: +-- A: x - (10*x) ==> (-9)*x +-- B: (10*x) + x ==> 11*x +-- C: (5*x)-(x*3) ==> 2*x +-- D: x+x ==> 2*x +-- +-- R5) +/- propagation +-- +-- ops = + or -, one literal +-- +-- This transformation doesn't reduce the number of operations but propagates +-- the constant to the outer level. It increases the odds that other rules +-- will match after this one. +-- +-- Examples: +-- A: x - (10-y) ==> (x+y) - 10 +-- B: (10+x) - y ==> 10 + (x-y) +-- C: N/A (caught by the A and B cases) +-- +-------------------------------------------------------- + +-- | Rules to perform constant folding into nested expressions +-- +--See Note [Constant folding through nested expressions] +numFoldingRules :: PrimOp -> (DynFlags -> PrimOps) -> RuleM CoreExpr +numFoldingRules op dict = do + [e1,e2] <- getArgs + dflags <- getDynFlags + let PrimOps{..} = dict dflags + if not (gopt Opt_NumConstantFolding dflags) + then mzero + else case BinOpApp e1 op e2 of + -- R1) +/- simplification + x :++: (y :++: v) -> return $ mkL (x+y) `add` v + x :++: (L y :-: v) -> return $ mkL (x+y) `sub` v + x :++: (v :-: L y) -> return $ mkL (x-y) `add` v + L x :-: (y :++: v) -> return $ mkL (x-y) `sub` v + L x :-: (L y :-: v) -> return $ mkL (x-y) `add` v + L x :-: (v :-: L y) -> return $ mkL (x+y) `sub` v + + (y :++: v) :-: L x -> return $ mkL (y-x) `add` v + (L y :-: v) :-: L x -> return $ mkL (y-x) `sub` v + (v :-: L y) :-: L x -> return $ mkL (0-y-x) `add` v + + (x :++: w) :+: (y :++: v) -> return $ mkL (x+y) `add` (w `add` v) + (w :-: L x) :+: (L y :-: v) -> return $ mkL (y-x) `add` (w `sub` v) + (w :-: L x) :+: (v :-: L y) -> return $ 
mkL (0-x-y) `add` (w `add` v) + (L x :-: w) :+: (L y :-: v) -> return $ mkL (x+y) `sub` (w `add` v) + (L x :-: w) :+: (v :-: L y) -> return $ mkL (x-y) `add` (v `sub` w) + (w :-: L x) :+: (y :++: v) -> return $ mkL (y-x) `add` (w `add` v) + (L x :-: w) :+: (y :++: v) -> return $ mkL (x+y) `add` (v `sub` w) + (y :++: v) :+: (w :-: L x) -> return $ mkL (y-x) `add` (w `add` v) + (y :++: v) :+: (L x :-: w) -> return $ mkL (x+y) `add` (v `sub` w) + + (v :-: L y) :-: (w :-: L x) -> return $ mkL (x-y) `add` (v `sub` w) + (v :-: L y) :-: (L x :-: w) -> return $ mkL (0-x-y) `add` (v `add` w) + (L y :-: v) :-: (w :-: L x) -> return $ mkL (x+y) `sub` (v `add` w) + (L y :-: v) :-: (L x :-: w) -> return $ mkL (y-x) `add` (w `sub` v) + (x :++: w) :-: (y :++: v) -> return $ mkL (x-y) `add` (w `sub` v) + (w :-: L x) :-: (y :++: v) -> return $ mkL (0-y-x) `add` (w `sub` v) + (L x :-: w) :-: (y :++: v) -> return $ mkL (x-y) `sub` (v `add` w) + (y :++: v) :-: (w :-: L x) -> return $ mkL (y+x) `add` (v `sub` w) + (y :++: v) :-: (L x :-: w) -> return $ mkL (y-x) `add` (v `add` w) + + -- R2) * simplification + x :**: (y :**: v) -> return $ mkL (x*y) `mul` v + (x :**: w) :*: (y :**: v) -> return $ mkL (x*y) `mul` (w `mul` v) + + -- R3) * distribution over +/- + x :**: (y :++: v) -> return $ mkL (x*y) `add` (mkL x `mul` v) + x :**: (L y :-: v) -> return $ mkL (x*y) `sub` (mkL x `mul` v) + x :**: (v :-: L y) -> return $ (mkL x `mul` v) `sub` mkL (x*y) + + -- R4) Simple factorization + v :+: w + | w `cheapEqExpr` v -> return $ mkL 2 `mul` v + w :+: (y :**: v) + | w `cheapEqExpr` v -> return $ mkL (1+y) `mul` v + w :-: (y :**: v) + | w `cheapEqExpr` v -> return $ mkL (1-y) `mul` v + (y :**: v) :+: w + | w `cheapEqExpr` v -> return $ mkL (y+1) `mul` v + (y :**: v) :-: w + | w `cheapEqExpr` v -> return $ mkL (y-1) `mul` v + (x :**: w) :+: (y :**: v) + | w `cheapEqExpr` v -> return $ mkL (x+y) `mul` v + (x :**: w) :-: (y :**: v) + | w `cheapEqExpr` v -> return $ mkL (x-y) `mul` v + + -- R5) +/- 
propagation + w :+: (y :++: v) -> return $ mkL y `add` (w `add` v) + (y :++: v) :+: w -> return $ mkL y `add` (w `add` v) + w :-: (y :++: v) -> return $ (w `sub` v) `sub` mkL y + (y :++: v) :-: w -> return $ mkL y `add` (v `sub` w) + w :-: (L y :-: v) -> return $ (w `add` v) `sub` mkL y + (L y :-: v) :-: w -> return $ mkL y `sub` (w `add` v) + w :+: (L y :-: v) -> return $ mkL y `add` (w `sub` v) + w :+: (v :-: L y) -> return $ (w `add` v) `sub` mkL y + (L y :-: v) :+: w -> return $ mkL y `add` (w `sub` v) + (v :-: L y) :+: w -> return $ (w `add` v) `sub` mkL y + + _ -> mzero + + + +-- | Match the application of a binary primop +pattern BinOpApp :: Arg CoreBndr -> PrimOp -> Arg CoreBndr -> CoreExpr +pattern BinOpApp x op y = OpVal op `App` x `App` y + +-- | Match a primop +pattern OpVal :: PrimOp -> Arg CoreBndr +pattern OpVal op <- Var (isPrimOpId_maybe -> Just op) where + OpVal op = Var (mkPrimOpId op) + + + +-- | Match a literal +pattern L :: Integer -> Arg CoreBndr +pattern L l <- Lit (isLitValue_maybe -> Just l) + +-- | Match an addition +pattern (:+:) :: Arg CoreBndr -> Arg CoreBndr -> CoreExpr +pattern x :+: y <- BinOpApp x (isAddOp -> True) y + +-- | Match an addition with a literal (handle commutativity) +pattern (:++:) :: Integer -> Arg CoreBndr -> CoreExpr +pattern l :++: x <- (isAdd -> Just (l,x)) + +isAdd :: CoreExpr -> Maybe (Integer,CoreExpr) +isAdd e = case e of + L l :+: x -> Just (l,x) + x :+: L l -> Just (l,x) + _ -> Nothing + +-- | Match a multiplication +pattern (:*:) :: Arg CoreBndr -> Arg CoreBndr -> CoreExpr +pattern x :*: y <- BinOpApp x (isMulOp -> True) y + +-- | Match a multiplication with a literal (handle commutativity) +pattern (:**:) :: Integer -> Arg CoreBndr -> CoreExpr +pattern l :**: x <- (isMul -> Just (l,x)) + +isMul :: CoreExpr -> Maybe (Integer,CoreExpr) +isMul e = case e of + L l :*: x -> Just (l,x) + x :*: L l -> Just (l,x) + _ -> Nothing + + +-- | Match a subtraction +pattern (:-:) :: Arg CoreBndr -> Arg CoreBndr -> 
CoreExpr +pattern x :-: y <- BinOpApp x (isSubOp -> True) y + +isSubOp :: PrimOp -> Bool +isSubOp IntSubOp = True +isSubOp WordSubOp = True +isSubOp _ = False + +isAddOp :: PrimOp -> Bool +isAddOp IntAddOp = True +isAddOp WordAddOp = True +isAddOp _ = False + +isMulOp :: PrimOp -> Bool +isMulOp IntMulOp = True +isMulOp WordMulOp = True +isMulOp _ = False + +-- | Explicit "type-class"-like dictionary for numeric primops +-- +-- Depends on DynFlags because creating a literal value depends on DynFlags +data PrimOps = PrimOps + { add :: CoreExpr -> CoreExpr -> CoreExpr -- ^ Add two numbers + , sub :: CoreExpr -> CoreExpr -> CoreExpr -- ^ Sub two numbers + , mul :: CoreExpr -> CoreExpr -> CoreExpr -- ^ Multiply two numbers + , mkL :: Integer -> CoreExpr -- ^ Create a literal value + } + +intPrimOps :: DynFlags -> PrimOps +intPrimOps dflags = PrimOps + { add = \x y -> BinOpApp x IntAddOp y + , sub = \x y -> BinOpApp x IntSubOp y + , mul = \x y -> BinOpApp x IntMulOp y + , mkL = intResult' dflags + } + +wordPrimOps :: DynFlags -> PrimOps +wordPrimOps dflags = PrimOps + { add = \x y -> BinOpApp x WordAddOp y + , sub = \x y -> BinOpApp x WordSubOp y + , mul = \x y -> BinOpApp x WordMulOp y + , mkL = wordResult' dflags + } + + +-------------------------------------------------------- +-- Constant folding through case-expressions +-- +-- cf Scrutinee Constant Folding in simplCore/GHC.Core.Op.Simplify.Utils +-------------------------------------------------------- + +-- | Match the scrutinee of a case and potentially return a new scrutinee and a +-- function to apply to each literal alternative. +caseRules :: DynFlags + -> CoreExpr -- Scrutinee + -> Maybe ( CoreExpr -- New scrutinee + , AltCon -> Maybe AltCon -- How to fix up the alt pattern + -- Nothing <=> Unreachable + -- See Note [Unreachable caseRules alternatives] + , Id -> CoreExpr) -- How to reconstruct the original scrutinee + -- from the new case-binder +-- e.g case e of b { +-- ...; +-- con bs -> rhs; +-- ... 
} +-- ==> +-- case e' of b' { +-- ...; +-- fixup_altcon[con] bs -> let b = mk_orig[b] in rhs; +-- ... } + +caseRules dflags (App (App (Var f) v) (Lit l)) -- v `op` x# + | Just op <- isPrimOpId_maybe f + , Just x <- isLitValue_maybe l + , Just adjust_lit <- adjustDyadicRight op x + = Just (v, tx_lit_con dflags adjust_lit + , \v -> (App (App (Var f) (Var v)) (Lit l))) + +caseRules dflags (App (App (Var f) (Lit l)) v) -- x# `op` v + | Just op <- isPrimOpId_maybe f + , Just x <- isLitValue_maybe l + , Just adjust_lit <- adjustDyadicLeft x op + = Just (v, tx_lit_con dflags adjust_lit + , \v -> (App (App (Var f) (Lit l)) (Var v))) + + +caseRules dflags (App (Var f) v ) -- op v + | Just op <- isPrimOpId_maybe f + , Just adjust_lit <- adjustUnary op + = Just (v, tx_lit_con dflags adjust_lit + , \v -> App (Var f) (Var v)) + +-- See Note [caseRules for tagToEnum] +caseRules dflags (App (App (Var f) type_arg) v) + | Just TagToEnumOp <- isPrimOpId_maybe f + = Just (v, tx_con_tte dflags + , \v -> (App (App (Var f) type_arg) (Var v))) + +-- See Note [caseRules for dataToTag] +caseRules _ (App (App (Var f) (Type ty)) v) -- dataToTag x + | Just DataToTagOp <- isPrimOpId_maybe f + , Just (tc, _) <- tcSplitTyConApp_maybe ty + , isAlgTyCon tc + = Just (v, tx_con_dtt ty + , \v -> App (App (Var f) (Type ty)) (Var v)) + +caseRules _ _ = Nothing + + +tx_lit_con :: DynFlags -> (Integer -> Integer) -> AltCon -> Maybe AltCon +tx_lit_con _ _ DEFAULT = Just DEFAULT +tx_lit_con dflags adjust (LitAlt l) = Just $ LitAlt (mapLitValue dflags adjust l) +tx_lit_con _ _ alt = pprPanic "caseRules" (ppr alt) + -- NB: mapLitValue uses mkLitIntWrap etc, to ensure that the + -- literal alternatives remain in Word/Int target ranges + -- (See Note [Word/Int underflow/overflow] in Literal and #13172). + +adjustDyadicRight :: PrimOp -> Integer -> Maybe (Integer -> Integer) +-- Given (x `op` lit) return a function 'f' s.t. 
f (x `op` lit) = x +adjustDyadicRight op lit + = case op of + WordAddOp -> Just (\y -> y-lit ) + IntAddOp -> Just (\y -> y-lit ) + WordSubOp -> Just (\y -> y+lit ) + IntSubOp -> Just (\y -> y+lit ) + XorOp -> Just (\y -> y `xor` lit) + XorIOp -> Just (\y -> y `xor` lit) + _ -> Nothing + +adjustDyadicLeft :: Integer -> PrimOp -> Maybe (Integer -> Integer) +-- Given (lit `op` x) return a function 'f' s.t. f (lit `op` x) = x +adjustDyadicLeft lit op + = case op of + WordAddOp -> Just (\y -> y-lit ) + IntAddOp -> Just (\y -> y-lit ) + WordSubOp -> Just (\y -> lit-y ) + IntSubOp -> Just (\y -> lit-y ) + XorOp -> Just (\y -> y `xor` lit) + XorIOp -> Just (\y -> y `xor` lit) + _ -> Nothing + + +adjustUnary :: PrimOp -> Maybe (Integer -> Integer) +-- Given (op x) return a function 'f' s.t. f (op x) = x +adjustUnary op + = case op of + NotOp -> Just (\y -> complement y) + NotIOp -> Just (\y -> complement y) + IntNegOp -> Just (\y -> negate y ) + _ -> Nothing + +tx_con_tte :: DynFlags -> AltCon -> Maybe AltCon +tx_con_tte _ DEFAULT = Just DEFAULT +tx_con_tte _ alt@(LitAlt {}) = pprPanic "caseRules" (ppr alt) +tx_con_tte dflags (DataAlt dc) -- See Note [caseRules for tagToEnum] + = Just $ LitAlt $ mkLitInt dflags $ toInteger $ dataConTagZ dc + +tx_con_dtt :: Type -> AltCon -> Maybe AltCon +tx_con_dtt _ DEFAULT = Just DEFAULT +tx_con_dtt ty (LitAlt (LitNumber LitNumInt i _)) + | tag >= 0 + , tag < n_data_cons + = Just (DataAlt (data_cons !! tag)) -- tag is zero-indexed, as is (!!) + | otherwise + = Nothing + where + tag = fromInteger i :: ConTagZ + tc = tyConAppTyCon ty + n_data_cons = tyConFamilySize tc + data_cons = tyConDataCons tc + +tx_con_dtt _ alt = pprPanic "caseRules" (ppr alt) + + +{- Note [caseRules for tagToEnum] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We want to transform + case tagToEnum x of + False -> e1 + True -> e2 +into + case x of + 0# -> e1 + 1# -> e2 + +This rule eliminates a lot of boilerplate. 
For + if (x>y) then e2 else e1 +we generate + case tagToEnum (x ># y) of + False -> e1 + True -> e2 +and it is nice to then get rid of the tagToEnum. + +Beware (#14768): avoid the temptation to map constructor 0 to +DEFAULT, in the hope of getting this + case (x ># y) of + DEFAULT -> e1 + 1# -> e2 +That fails utterly in the case of + data Colour = Red | Green | Blue + case tagToEnum x of + DEFAULT -> e1 + Red -> e2 + +We don't want to get this! + case x of + DEFAULT -> e1 + DEFAULT -> e2 + +Instead, we deal with turning one branch into DEFAULT in GHC.Core.Op.Simplify.Utils +(add_default in mkCase3). + +Note [caseRules for dataToTag] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +See also Note [dataToTag#] in primops.txt.pp + +We want to transform + case dataToTag x of + DEFAULT -> e1 + 1# -> e2 +into + case x of + DEFAULT -> e1 + (:) _ _ -> e2 + +Note the need for some wildcard binders in +the 'cons' case. + +For the time being, we only apply this transformation when the type of `x` is a type +headed by a normal tycon. In particular, we do not apply this in the case of a +data family tycon, since that would require carefully applying coercion(s) +between the data family and the data family instance's representation type, +which caseRules isn't currently engineered to handle (#14680). + +Note [Unreachable caseRules alternatives] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Take care if we see something like + case dataToTag x of + DEFAULT -> e1 + -1# -> e2 + 100# -> e3 +because there isn't a data constructor with tag -1 or 100. In this case the +out-of-range alternative is dead code -- we know the range of tags for x. + +Hence caseRules returns (AltCon -> Maybe AltCon), with Nothing indicating +an alternative that is unreachable. + +You may wonder how this can happen: check out #15436.
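The adjustDyadic*/adjustUnary functions above are all built around one round-trip law: given a scrutinee (x `op` lit), the returned function f satisfies f (x `op` lit) == x, so rewriting every literal alternative l to f l lets the case scrutinise x directly. A minimal sketch of that law over plain Integers (hypothetical helper names, not the Core machinery in this file):

```haskell
import Data.Bits (xor)

-- Plain-Integer analogues of adjustDyadicRight: for a scrutinee
-- (x `op` lit), each returns the function that is applied to every
-- literal alternative so that the rewritten case scrutinises x itself.
adjustAddRight, adjustSubRight, adjustXorRight :: Integer -> Integer -> Integer
adjustAddRight lit y = y - lit     -- inverts (x + lit)
adjustSubRight lit y = y + lit     -- inverts (x - lit)
adjustXorRight lit y = y `xor` lit -- xor by lit is its own inverse
```

Each of these satisfies the round-trip property, which is exactly what caseRules needs in order to relabel literal alternatives soundly.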
+-} diff --git a/compiler/GHC/Core/Op/CprAnal.hs b/compiler/GHC/Core/Op/CprAnal.hs new file mode 100644 index 0000000000..c8f7e314e9 --- /dev/null +++ b/compiler/GHC/Core/Op/CprAnal.hs @@ -0,0 +1,669 @@ +{-# LANGUAGE CPP #-} + +-- | Constructed Product Result analysis. Identifies functions that surely +-- return heap-allocated records on every code path, so that we can eliminate +-- said heap allocation by performing a worker/wrapper split. +-- +-- See https://www.microsoft.com/en-us/research/publication/constructed-product-result-analysis-haskell/. +-- CPR analysis should happen after strictness analysis. +-- See Note [Phase ordering]. +module GHC.Core.Op.CprAnal ( cprAnalProgram ) where + +#include "HsVersions.h" + +import GhcPrelude + +import GHC.Core.Op.WorkWrap.Lib ( deepSplitProductType_maybe ) +import GHC.Driver.Session +import Demand +import Cpr +import GHC.Core +import GHC.Core.Seq +import Outputable +import VarEnv +import BasicTypes +import Data.List +import GHC.Core.DataCon +import Id +import IdInfo +import GHC.Core.Utils ( exprIsHNF, dumpIdInfoOfProgram ) +import GHC.Core.TyCon +import GHC.Core.Type +import GHC.Core.FamInstEnv +import Util +import ErrUtils ( dumpIfSet_dyn, DumpFormat (..) ) +import Maybes ( isJust, isNothing ) + +{- Note [Constructed Product Result] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +The goal of Constructed Product Result analysis is to identify functions that +surely return heap-allocated records on every code path, so that we can +eliminate said heap allocation by performing a worker/wrapper split. + +@swap@ below is such a function: + + swap (a, b) = (b, a) + +A @case@ on an application of @swap@, like +@case swap (10, 42) of (a, b) -> a + b@ could cancel away +(by case-of-known-constructor) if we "inlined" @swap@ and simplified. We then +say that @swap@ has the CPR property. 
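As a source-level (not Core) illustration of that cancellation, inlining swap into a case-of-known-constructor lets the pair allocation cancel:

```haskell
-- The note's swap, in ordinary Haskell:
swap :: (a, b) -> (b, a)
swap (a, b) = (b, a)

-- case swap (10, 42) of (a, b) -> a + b
--   ==> {inline swap}                   case (42, 10) of (a, b) -> a + b
--   ==> {case-of-known-constructor}     42 + 10
-- so, after simplification, no pair need ever be allocated.
useSwap :: Int
useSwap = case swap (10, 42) of (a, b) -> a + b
```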
+ +We can't inline recursive functions, but similar reasoning applies there: + + f x n = case n of + 0 -> (x, 0) + _ -> f (x+1) (n-1) + +Inductively, @case f 1 2 of (a, b) -> a + b@ could cancel away the constructed +product with the case. So @f@, too, has the CPR property. But we can't really +"inline" @f@, because it's recursive. Also, non-recursive functions like @swap@ +might be too big to inline (or even marked NOINLINE). We still want to exploit +the CPR property, and that is exactly what the worker/wrapper transformation +can do for us: + + $wf x n = case n of + 0 -> case (x, 0) of (a, b) -> (# a, b #) + _ -> case f (x+1) (n-1) of (a, b) -> (# a, b #) + f x n = case $wf x n of (# a, b #) -> (a, b) + +where $wf readily simplifies (by case-of-known-constructor and inlining @f@) to: + + $wf x n = case n of + 0 -> (# x, 0 #) + _ -> $wf (x+1) (n-1) + +Now, a call site like @case f 1 2 of (a, b) -> a + b@ can inline @f@ and +eliminate the heap-allocated pair constructor. + +Note [Phase ordering] +~~~~~~~~~~~~~~~~~~~~~ +We need to perform strictness analysis before CPR analysis, because that might +unbox some arguments, in turn leading to more constructed products. +Ideally, we would want the following pipeline: + +1. Strictness +2. worker/wrapper (for strictness) +3. CPR +4. worker/wrapper (for CPR) + +Currently, we omit 2. and anticipate the results of worker/wrapper. +See Note [CPR in a DataAlt case alternative] and Note [CPR for strict binders]. +An additional w/w pass would simplify things, but probably add slight overhead. +So currently we have + +1. Strictness +2. CPR +3.
worker/wrapper (for strictness and CPR) +-} + +-- +-- * Analysing programs +-- + +cprAnalProgram :: DynFlags -> FamInstEnvs -> CoreProgram -> IO CoreProgram +cprAnalProgram dflags fam_envs binds = do + let env = emptyAnalEnv fam_envs + let binds_plus_cpr = snd $ mapAccumL cprAnalTopBind env binds + dumpIfSet_dyn dflags Opt_D_dump_cpr_signatures "Cpr signatures" FormatText $ + dumpIdInfoOfProgram (ppr . cprInfo) binds_plus_cpr + -- See Note [Stamp out space leaks in demand analysis] in GHC.Core.Op.DmdAnal + seqBinds binds_plus_cpr `seq` return binds_plus_cpr + +-- Analyse a (group of) top-level binding(s) +cprAnalTopBind :: AnalEnv + -> CoreBind + -> (AnalEnv, CoreBind) +cprAnalTopBind env (NonRec id rhs) + = (extendAnalEnv env id' (idCprInfo id'), NonRec id' rhs') + where + (id', rhs') = cprAnalBind TopLevel env id rhs + +cprAnalTopBind env (Rec pairs) + = (env', Rec pairs') + where + (env', pairs') = cprFix TopLevel env pairs + +-- +-- * Analysing expressions +-- + +-- | The abstract semantic function ⟦_⟧ : Expr -> Env -> A from +-- "Constructed Product Result Analysis for Haskell" +cprAnal, cprAnal' + :: AnalEnv + -> CoreExpr -- ^ expression to be denoted by a 'CprType' + -> (CprType, CoreExpr) -- ^ the updated expression and its 'CprType' + +cprAnal env e = -- pprTraceWith "cprAnal" (\res -> ppr (fst (res)) $$ ppr e) $ + cprAnal' env e + +cprAnal' _ (Lit lit) = (topCprType, Lit lit) +cprAnal' _ (Type ty) = (topCprType, Type ty) -- Doesn't happen, in fact +cprAnal' _ (Coercion co) = (topCprType, Coercion co) + +cprAnal' env (Var var) = (cprTransform env var, Var var) + +cprAnal' env (Cast e co) + = (cpr_ty, Cast e' co) + where + (cpr_ty, e') = cprAnal env e + +cprAnal' env (Tick t e) + = (cpr_ty, Tick t e') + where + (cpr_ty, e') = cprAnal env e + +cprAnal' env (App fun (Type ty)) + = (fun_ty, App fun' (Type ty)) + where + (fun_ty, fun') = cprAnal env fun + +cprAnal' env (App fun arg) + = (res_ty, App fun' arg') + where + (fun_ty, fun') = cprAnal env fun + -- In 
contrast to DmdAnal, there is no useful (non-nested) CPR info to be + -- had by looking into the CprType of arg. + (_, arg') = cprAnal env arg + res_ty = applyCprTy fun_ty + +cprAnal' env (Lam var body) + | isTyVar var + , (body_ty, body') <- cprAnal env body + = (body_ty, Lam var body') + | otherwise + = (lam_ty, Lam var body') + where + env' = extendSigsWithLam env var + (body_ty, body') = cprAnal env' body + lam_ty = abstractCprTy body_ty + +cprAnal' env (Case scrut case_bndr ty alts) + = (res_ty, Case scrut' case_bndr ty alts') + where + (_, scrut') = cprAnal env scrut + -- Regardless whether scrut had the CPR property or not, the case binder + -- certainly has it. See 'extendEnvForDataAlt'. + (alt_tys, alts') = mapAndUnzip (cprAnalAlt env scrut case_bndr) alts + res_ty = foldl' lubCprType botCprType alt_tys + +cprAnal' env (Let (NonRec id rhs) body) + = (body_ty, Let (NonRec id' rhs') body') + where + (id', rhs') = cprAnalBind NotTopLevel env id rhs + env' = extendAnalEnv env id' (idCprInfo id') + (body_ty, body') = cprAnal env' body + +cprAnal' env (Let (Rec pairs) body) + = body_ty `seq` (body_ty, Let (Rec pairs') body') + where + (env', pairs') = cprFix NotTopLevel env pairs + (body_ty, body') = cprAnal env' body + +cprAnalAlt + :: AnalEnv + -> CoreExpr -- ^ scrutinee + -> Id -- ^ case binder + -> Alt Var -- ^ current alternative + -> (CprType, Alt Var) +cprAnalAlt env scrut case_bndr (con@(DataAlt dc),bndrs,rhs) + -- See 'extendEnvForDataAlt' and Note [CPR in a DataAlt case alternative] + = (rhs_ty, (con, bndrs, rhs')) + where + env_alt = extendEnvForDataAlt env scrut case_bndr dc bndrs + (rhs_ty, rhs') = cprAnal env_alt rhs +cprAnalAlt env _ _ (con,bndrs,rhs) + = (rhs_ty, (con, bndrs, rhs')) + where + (rhs_ty, rhs') = cprAnal env rhs + +-- +-- * CPR transformer +-- + +cprTransform :: AnalEnv -- ^ The analysis environment + -> Id -- ^ The function + -> CprType -- ^ The demand type of the function +cprTransform env id + = -- pprTrace "cprTransform" (vcat 
[ppr id, ppr sig]) + sig + where + sig + | isGlobalId id -- imported function or data con worker + = getCprSig (idCprInfo id) + | Just sig <- lookupSigEnv env id -- local let-bound + = getCprSig sig + | otherwise + = topCprType + +-- +-- * Bindings +-- + +-- Recursive bindings +cprFix :: TopLevelFlag + -> AnalEnv -- Does not include bindings for this binding + -> [(Id,CoreExpr)] + -> (AnalEnv, [(Id,CoreExpr)]) -- Binders annotated with strictness info + +cprFix top_lvl env orig_pairs + = loop 1 initial_pairs + where + bot_sig = mkCprSig 0 botCpr + -- See Note [Initialising strictness] in GHC.Core.Op.DmdAnal + initial_pairs | ae_virgin env = [(setIdCprInfo id bot_sig, rhs) | (id, rhs) <- orig_pairs ] + | otherwise = orig_pairs + + -- The fixed-point varies the idCprInfo field of the binders, and terminates if that + -- annotation does not change any more. + loop :: Int -> [(Id,CoreExpr)] -> (AnalEnv, [(Id,CoreExpr)]) + loop n pairs + | found_fixpoint = (final_anal_env, pairs') + | otherwise = loop (n+1) pairs' + where + found_fixpoint = map (idCprInfo . fst) pairs' == map (idCprInfo . fst) pairs + first_round = n == 1 + pairs' = step first_round pairs + final_anal_env = extendAnalEnvs env (map fst pairs') + + step :: Bool -> [(Id, CoreExpr)] -> [(Id, CoreExpr)] + step first_round pairs = pairs' + where + -- In all but the first iteration, delete the virgin flag + start_env | first_round = env + | otherwise = nonVirgin env + + start = extendAnalEnvs start_env (map fst pairs) + + (_, pairs') = mapAccumL my_downRhs start pairs + + my_downRhs env (id,rhs) + = (env', (id', rhs')) + where + (id', rhs') = cprAnalBind top_lvl env id rhs + env' = extendAnalEnv env id (idCprInfo id') + +-- | Process the RHS of the binding for a sensible arity, add the CPR signature +-- to the Id, and augment the environment with the signature as well.
+cprAnalBind + :: TopLevelFlag + -> AnalEnv + -> Id + -> CoreExpr + -> (Id, CoreExpr) +cprAnalBind top_lvl env id rhs + = (id', rhs') + where + (rhs_ty, rhs') = cprAnal env rhs + -- possibly trim thunk CPR info + rhs_ty' + -- See Note [CPR for thunks] + | stays_thunk = trimCprTy rhs_ty + -- See Note [CPR for sum types] + | returns_sum = trimCprTy rhs_ty + | otherwise = rhs_ty + -- See Note [Arity trimming for CPR signatures] + sig = mkCprSigForArity (idArity id) rhs_ty' + id' = setIdCprInfo id sig + + -- See Note [CPR for thunks] + stays_thunk = is_thunk && not_strict + is_thunk = not (exprIsHNF rhs) && not (isJoinId id) + not_strict = not (isStrictDmd (idDemandInfo id)) + -- See Note [CPR for sum types] + (_, ret_ty) = splitPiTys (idType id) + not_a_prod = isNothing (deepSplitProductType_maybe (ae_fam_envs env) ret_ty) + returns_sum = not (isTopLevel top_lvl) && not_a_prod + +{- Note [Arity trimming for CPR signatures] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Although it doesn't affect correctness of the analysis per se, we have to trim +CPR signatures to idArity. Here's what might happen if we don't: + + f x = if expensive + then \y. Box y + else \z. Box z + g a b = f a b + +The two lambdas will have a CPR type of @1m@ (so construct a product after +applied to one argument). Thus, @f@ will have a CPR signature of @2m@ +(constructs a product after applied to two arguments). +But WW will never eta-expand @f@! In this case that would amount to possibly +duplicating @expensive@ work. + +(Side note: Even if @f@'s 'idArity' happened to be 2, it would not do so, see +Note [Don't eta expand in w/w].) + +So @f@ will not be worker/wrappered. But @g@ also inherited its CPR signature +from @f@'s, so it *will* be WW'd: + + f x = if expensive + then \y. Box y + else \z. Box z + $wg a b = case f a b of Box x -> x + g a b = Box ($wg a b) + +And the case in @g@ can never cancel away, thus we introduced extra reboxing. 
+Hence we always trim the CPR signature of a binding to idArity. +-} + +data AnalEnv + = AE + { ae_sigs :: SigEnv + -- ^ Current approximation of signatures for local ids + , ae_virgin :: Bool + -- ^ True only on every first iteration in a fixed-point + -- iteration. See Note [Initialising strictness] in "DmdAnal" + , ae_fam_envs :: FamInstEnvs + -- ^ Needed when expanding type families and synonyms of product types. + } + +type SigEnv = VarEnv CprSig + +instance Outputable AnalEnv where + ppr (AE { ae_sigs = env, ae_virgin = virgin }) + = text "AE" <+> braces (vcat + [ text "ae_virgin =" <+> ppr virgin + , text "ae_sigs =" <+> ppr env ]) + +emptyAnalEnv :: FamInstEnvs -> AnalEnv +emptyAnalEnv fam_envs + = AE + { ae_sigs = emptyVarEnv + , ae_virgin = True + , ae_fam_envs = fam_envs + } + +-- | Extend an environment with the strictness IDs attached to the id +extendAnalEnvs :: AnalEnv -> [Id] -> AnalEnv +extendAnalEnvs env ids + = env { ae_sigs = sigs' } + where + sigs' = extendVarEnvList (ae_sigs env) [ (id, idCprInfo id) | id <- ids ] + +extendAnalEnv :: AnalEnv -> Id -> CprSig -> AnalEnv +extendAnalEnv env id sig + = env { ae_sigs = extendVarEnv (ae_sigs env) id sig } + +lookupSigEnv :: AnalEnv -> Id -> Maybe CprSig +lookupSigEnv env id = lookupVarEnv (ae_sigs env) id + +nonVirgin :: AnalEnv -> AnalEnv +nonVirgin env = env { ae_virgin = False } + +extendSigsWithLam :: AnalEnv -> Id -> AnalEnv +-- Extend the AnalEnv when we meet a lambda binder +extendSigsWithLam env id + | isId id + , isStrictDmd (idDemandInfo id) -- See Note [CPR for strict binders] + , Just (dc,_,_,_) <- deepSplitProductType_maybe (ae_fam_envs env) $ idType id + = extendAnalEnv env id (CprSig (conCprType (dataConTag dc))) + | otherwise + = env + +extendEnvForDataAlt :: AnalEnv -> CoreExpr -> Id -> DataCon -> [Var] -> AnalEnv +-- See Note [CPR in a DataAlt case alternative] +extendEnvForDataAlt env scrut case_bndr dc bndrs + = foldl' do_con_arg env' ids_w_strs + where + env' = extendAnalEnv env 
case_bndr (CprSig case_bndr_ty) + + ids_w_strs = filter isId bndrs `zip` dataConRepStrictness dc + + tycon = dataConTyCon dc + is_product = isJust (isDataProductTyCon_maybe tycon) + is_sum = isJust (isDataSumTyCon_maybe tycon) + case_bndr_ty + | is_product || is_sum = conCprType (dataConTag dc) + -- Any of the constructors had existentials. This is a little too + -- conservative (after all, we only care about the particular data con), + -- but there is no easy way to write is_sum and this won't happen much. + | otherwise = topCprType + + -- We could have much deeper CPR info here with Nested CPR, which could + -- propagate available unboxed things from the scrutinee, getting rid of + -- the is_var_scrut heuristic. See Note [CPR in a DataAlt case alternative]. + -- Giving strict binders the CPR property only makes sense for products, as + -- the arguments in Note [CPR for strict binders] don't apply to sums (yet); + -- we lack WW for strict binders of sum type. + do_con_arg env (id, str) + | let is_strict = isStrictDmd (idDemandInfo id) || isMarkedStrict str + , is_var_scrut && is_strict + , let fam_envs = ae_fam_envs env + , Just (dc,_,_,_) <- deepSplitProductType_maybe fam_envs $ idType id + = extendAnalEnv env id (CprSig (conCprType (dataConTag dc))) + | otherwise + = env + + is_var_scrut = is_var scrut + is_var (Cast e _) = is_var e + is_var (Var v) = isLocalId v + is_var _ = False + +{- Note [Safe abortion in the fixed-point iteration] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Fixed-point iteration may fail to terminate. But we cannot simply give up and +return the environment and code unchanged! We still need to do one additional +round, to ensure that all expressions have been traversed at least once, and any +unsound CPR annotations have been updated. + +Note [CPR in a DataAlt case alternative] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +In a case alternative, we want to give some of the binders the CPR property. 
+Specifically + + * The case binder; inside the alternative, the case binder always has + the CPR property, meaning that a case on it will successfully cancel. + Example: + f True x = case x of y { I# x' -> if x' ==# 3 + then y + else I# 8 } + f False x = I# 3 + + By giving 'y' the CPR property, we ensure that 'f' does too, so we get + f b x = case fw b x of { r -> I# r } + fw True x = case x of y { I# x' -> if x' ==# 3 then x' else 8 } + fw False x = 3 + + Of course there is the usual risk of re-boxing: we have 'x' available + boxed and unboxed, but we return the unboxed version for the wrapper to + box. If the wrapper doesn't cancel with its caller, we'll end up + re-boxing something that we did have available in boxed form. + + * Any strict binder with product type can use Note [CPR for strict binders] + to anticipate worker/wrappering for strictness info. + But we can go a little further. Consider + + data T = MkT !Int Int + + f2 (MkT x y) | y>0 = f2 (MkT x (y-1)) + | otherwise = x + + For $wf2 we are going to unbox the MkT *and*, since it is strict, the + first argument of the MkT; see Note [Add demands for strict constructors]. + But then we don't want to box it up again when returning it! We want + 'f2' to have the CPR property, so we give 'x' the CPR property. + + * It's a bit delicate because we're brittly anticipating worker/wrapper here. + If the case above is scrutinising something other than an argument of the + original function, we really don't have the unboxed version available. E.g. + g v = case foo v of + MkT x y | y>0 -> ... + | otherwise -> x + Here we don't have the unboxed 'x' available. Hence the + is_var_scrut test when making use of the strictness annotation. + Slightly ad-hoc, because even if the scrutinee *is* a variable it + might not be one of the arguments to the original function, or a + sub-component thereof. But it's simple, and nothing terrible + happens if we get it wrong. e.g. Trac #10694.
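The f/fw split in the first bullet can be written out as a runnable sketch (modern GHC spells the Int# comparison with isTrue#, and the wrapper's rebox is made explicit; otherwise this follows the note's code):

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), Int#, isTrue#, (==#))

-- Wrapper: rebox the worker's unboxed result.
f :: Bool -> Int -> Int
f b x = case fw b x of r -> I# r

-- Worker: returns a bare Int#, so neither branch allocates an I# box.
fw :: Bool -> Int -> Int#
fw True x  = case x of I# x' -> if isTrue# (x' ==# 3#) then x' else 8#
fw False _ = 3#
```

If a caller cases on the result of f, the wrapper inlines and the I# cancels; if not, the wrapper reboxes, which is the re-boxing risk the bullet describes.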
+ +Note [CPR for strict binders] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +If a lambda-bound variable is marked demanded with a strict demand, then give it +a CPR signature, anticipating the results of worker/wrapper. Here's a concrete +example ('f1' in test T10482a), assuming h is strict: + + f1 :: Int -> Int + f1 x = case h x of + A -> x + B -> f1 (x-1) + C -> x+1 + +If we notice that 'x' is used strictly, we can give it the CPR +property; and hence f1 gets the CPR property too. It's sound (doesn't +change strictness) to give it the CPR property because by the time 'x' +is returned (case A above), it'll have been evaluated (by the wrapper +of 'h' in the example). + +Moreover, if f itself is strict in x, then we'll pass x unboxed to +f1, and so the boxed version *won't* be available; in that case it's +very helpful to give 'x' the CPR property. + +Note that + + * We only want to do this for something that definitely + has product type, else we may get over-optimistic CPR results + (e.g. from \x -> x!). + + * See Note [CPR examples] + +Note [CPR for sum types] +~~~~~~~~~~~~~~~~~~~~~~~~ +At the moment we do not do CPR for let-bindings that + * non-top level + * bind a sum type +Reason: I found that in some benchmarks we were losing let-no-escapes, +which messed it all up. Example + let j = \x. .... + in case y of + True -> j False + False -> j True +If we w/w this we get + let j' = \x. .... + in case y of + True -> case j' False of { (# a #) -> Just a } + False -> case j' True of { (# a #) -> Just a } +Notice that j' is not a let-no-escape any more. + +However this means in turn that the *enclosing* function +may be CPR'd (via the returned Justs). But in the case of +sums, there may be Nothing alternatives; and that messes +up the sum-type CPR. + +Conclusion: only do this for products. It's still not +guaranteed OK for products, but sums definitely lose sometimes. 
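The let-no-escape shape the Note worries about can be rendered as runnable source (j's body is a stand-in for the "\x. ...." in the Note; the Just-wrapped result is the sum type in question):

```haskell
-- j is a would-be join point (a let-no-escape): it is only ever
-- tail-called. Worker/wrappering j for CPR would turn each call into
--   case j' ... of { (# a #) -> Just a }
-- and j' would no longer be a let-no-escape, which is why CPR is
-- trimmed for non-top-level bindings that return a sum type.
g :: Bool -> Maybe Bool
g y = let j x = Just x
      in case y of
           True  -> j False
           False -> j True
```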
+ +Note [CPR for thunks] +~~~~~~~~~~~~~~~~~~~~~ +If the rhs is a thunk, we usually forget the CPR info, because +it is presumably shared (else it would have been inlined, and +so we'd lose sharing if w/w'd it into a function). E.g. + + let r = case expensive of + (a,b) -> (b,a) + in ... + +If we marked r as having the CPR property, then we'd w/w into + + let $wr = \() -> case expensive of + (a,b) -> (# b, a #) + r = case $wr () of + (# b,a #) -> (b,a) + in ... + +But now r is a thunk, which won't be inlined, so we are no further ahead. +But consider + + f x = let r = case expensive of (a,b) -> (b,a) + in if foo r then r else (x,x) + +Does f have the CPR property? Well, no. + +However, if the strictness analyser has figured out (in a previous +iteration) that it's strict, then we DON'T need to forget the CPR info. +Instead we can retain the CPR info and do the thunk-splitting transform +(see WorkWrap.splitThunk). + +This made a big difference to PrelBase.modInt, which had something like + modInt = \ x -> let r = ... -> I# v in + ...body strict in r... +r's RHS isn't a value yet; but modInt returns r in various branches, so +if r doesn't have the CPR property then neither does modInt. +Another case I found in practice (in Complex.magnitude) looks like this: + let k = if ... then I# a else I# b + in ... body strict in k .... +(For this example, it doesn't matter whether k is returned as part of +the overall result; but it does matter that k's RHS has the CPR property.) +Left to itself, the simplifier will make a join point thus: + let $j k = ...body strict in k... + if ... then $j (I# a) else $j (I# b) +With thunk-splitting, we get instead + let $j x = let k = I# x in ...body strict in k... + in if ... then $j a else $j b +This is much better; there's a good chance the I# won't get allocated. + +But what about botCpr? Consider + lvl = error "boom" + fac -1 = lvl + fac 0 = 1 + fac n = n * fac (n-1) +fac won't have the CPR property here when we trim every thunk!
But the +assumption is that error cases are rarely entered and we are diverging anyway, +so WW doesn't hurt. + +Note [CPR examples] +~~~~~~~~~~~~~~~~~~~~ +Here are some examples (stranal/should_compile/T10482a) of the +usefulness of Note [CPR in a DataAlt case alternative]. The main +point: all of these functions can have the CPR property. + + ------- f1 ----------- + -- x is used strictly by h, so it'll be available + -- unboxed before it is returned in the True branch + + f1 :: Int -> Int + f1 x = case h x x of + True -> x + False -> f1 (x-1) + + + ------- f2 ----------- + -- x is a strict field of MkT2, so we'll pass it unboxed + -- to $wf2, so it's available unboxed. This depends on + -- the case expression analysing (a subcomponent of) one + -- of the original arguments to the function, so it's + -- a bit more delicate. + + data T2 = MkT2 !Int Int + + f2 :: T2 -> Int + f2 (MkT2 x y) | y>0 = f2 (MkT2 x (y-1)) + | otherwise = x + + + ------- f3 ----------- + -- h is strict in x, so x will be unboxed before it + -- is returned in the otherwise case.
+ + data T3 = MkT3 Int Int + + f3 :: T3 -> Int + f3 (MkT3 x y) | h x y = f3 (MkT3 x (y-1)) + | otherwise = x + + + ------- f4 ----------- + -- Just like f2, but MkT4 can't unbox its strict + -- argument automatically, as f2 can + + data family Foo a + newtype instance Foo Int = Foo Int + + data T4 a = MkT4 !(Foo a) Int + + f4 :: T4 Int -> Int + f4 (MkT4 x@(Foo v) y) | y>0 = f4 (MkT4 x (y-1)) + | otherwise = v +-} diff --git a/compiler/GHC/Core/Op/DmdAnal.hs b/compiler/GHC/Core/Op/DmdAnal.hs new file mode 100644 index 0000000000..57544519d3 --- /dev/null +++ b/compiler/GHC/Core/Op/DmdAnal.hs @@ -0,0 +1,1265 @@ +{- +(c) The GRASP/AQUA Project, Glasgow University, 1993-1998 + + + ----------------- + A demand analysis + ----------------- +-} + +{-# LANGUAGE CPP #-} + +module GHC.Core.Op.DmdAnal ( dmdAnalProgram ) where + +#include "HsVersions.h" + +import GhcPrelude + +import GHC.Driver.Session +import GHC.Core.Op.WorkWrap.Lib ( findTypeShape ) +import Demand -- All of it +import GHC.Core +import GHC.Core.Seq ( seqBinds ) +import Outputable +import VarEnv +import BasicTypes +import Data.List ( mapAccumL ) +import GHC.Core.DataCon +import Id +import IdInfo +import GHC.Core.Utils +import GHC.Core.TyCon +import GHC.Core.Type +import GHC.Core.Coercion ( Coercion, coVarsOfCo ) +import GHC.Core.FamInstEnv +import Util +import Maybes ( isJust ) +import TysWiredIn +import TysPrim ( realWorldStatePrimTy ) +import ErrUtils ( dumpIfSet_dyn, DumpFormat (..)
) +import UniqSet + +{- +************************************************************************ +* * +\subsection{Top level stuff} +* * +************************************************************************ +-} + +dmdAnalProgram :: DynFlags -> FamInstEnvs -> CoreProgram -> IO CoreProgram +dmdAnalProgram dflags fam_envs binds = do + let env = emptyAnalEnv dflags fam_envs + let binds_plus_dmds = snd $ mapAccumL dmdAnalTopBind env binds + dumpIfSet_dyn dflags Opt_D_dump_str_signatures "Strictness signatures" FormatText $ + dumpIdInfoOfProgram (pprIfaceStrictSig . strictnessInfo) binds_plus_dmds + -- See Note [Stamp out space leaks in demand analysis] + seqBinds binds_plus_dmds `seq` return binds_plus_dmds + +-- Analyse a (group of) top-level binding(s) +dmdAnalTopBind :: AnalEnv + -> CoreBind + -> (AnalEnv, CoreBind) +dmdAnalTopBind env (NonRec id rhs) + = (extendAnalEnv TopLevel env id' (idStrictness id'), NonRec id' rhs') + where + ( _, id', rhs') = dmdAnalRhsLetDown Nothing env cleanEvalDmd id rhs + +dmdAnalTopBind env (Rec pairs) + = (env', Rec pairs') + where + (env', _, pairs') = dmdFix TopLevel env cleanEvalDmd pairs + -- We get two iterations automatically + -- c.f. the NonRec case above + +{- Note [Stamp out space leaks in demand analysis] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +The demand analysis pass outputs a new copy of the Core program in +which binders have been annotated with demand and strictness +information. It's tiresome to ensure that this information is fully +evaluated everywhere that we produce it, so we just run a single +seqBinds over the output before returning it, to ensure that there are +no references holding on to the input Core program. + +This makes a ~30% reduction in peak memory usage when compiling +DynFlags (cf #9675 and #13426). + +This is particularly important when we are doing late demand analysis, +since we don't do a seqBinds at any point thereafter. 
Hence code +generation would hold on to an extra copy of the Core program, via +unforced thunks in demand or strictness information; and it is the +most memory-intensive part of the compilation process, so this added +seqBinds makes a big difference in peak memory usage. +-} + + +{- +************************************************************************ +* * +\subsection{The analyser itself} +* * +************************************************************************ + +Note [Ensure demand is strict] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +It's important not to analyse e with a lazy demand because +a) When we encounter case s of (a,b) -> + we demand s with U(d1d2)... but if the overall demand is lazy + that is wrong, and we'd need to reduce the demand on s, + which is inconvenient +b) More important, consider + f (let x = R in x+x), where f is lazy + We still want to mark x as demanded, because it will be when we + enter the let. If we analyse f's arg with a Lazy demand, we'll + just mark x as Lazy +c) The application rule wouldn't be right either + Evaluating (f x) in a L demand does *not* cause + evaluation of f in a C(L) demand! +-} + +-- If e is complicated enough to become a thunk, its contents will be evaluated +-- at most once, so oneify it. 
+dmdTransformThunkDmd :: CoreExpr -> Demand -> Demand
+dmdTransformThunkDmd e
+  | exprIsTrivial e = id
+  | otherwise       = oneifyDmd
+
+-- Do not process absent demands
+-- Otherwise act like in a normal demand analysis
+-- See ↦* relation in the Cardinality Analysis paper
+dmdAnalStar :: AnalEnv
+            -> Demand   -- This one takes a *Demand*
+            -> CoreExpr -- Should obey the let/app invariant
+            -> (BothDmdArg, CoreExpr)
+dmdAnalStar env dmd e
+  | (dmd_shell, cd) <- toCleanDmd dmd
+  , (dmd_ty, e')    <- dmdAnal env cd e
+  = ASSERT2( not (isUnliftedType (exprType e)) || exprOkForSpeculation e, ppr e )
+    -- The argument 'e' should satisfy the let/app invariant
+    -- See Note [Analysing with absent demand] in Demand.hs
+    (postProcessDmdType dmd_shell dmd_ty, e')
+
+-- Main Demand Analysis machinery
+dmdAnal, dmdAnal' :: AnalEnv
+                  -> CleanDemand  -- The main one takes a *CleanDemand*
+                  -> CoreExpr -> (DmdType, CoreExpr)
+
+-- The CleanDemand is always strict and not absent
+-- See Note [Ensure demand is strict]
+
+dmdAnal env d e = -- pprTrace "dmdAnal" (ppr d <+> ppr e) $
+                  dmdAnal' env d e
+
+dmdAnal' _ _ (Lit lit)     = (nopDmdType, Lit lit)
+dmdAnal' _ _ (Type ty)     = (nopDmdType, Type ty) -- Doesn't happen, in fact
+dmdAnal' _ _ (Coercion co)
+  = (unitDmdType (coercionDmdEnv co), Coercion co)
+
+dmdAnal' env dmd (Var var)
+  = (dmdTransform env var dmd, Var var)
+
+dmdAnal' env dmd (Cast e co)
+  = (dmd_ty `bothDmdType` mkBothDmdArg (coercionDmdEnv co), Cast e' co)
+  where
+    (dmd_ty, e') = dmdAnal env dmd e
+
+dmdAnal' env dmd (Tick t e)
+  = (dmd_ty, Tick t e')
+  where
+    (dmd_ty, e') = dmdAnal env dmd e
+
+dmdAnal' env dmd (App fun (Type ty))
+  = (fun_ty, App fun' (Type ty))
+  where
+    (fun_ty, fun') = dmdAnal env dmd fun
+
+-- Lots of the other code is there to make this
+-- beautiful, compositional, application rule :-)
+dmdAnal' env dmd (App fun arg)
+  = -- This case handles value arguments (type args handled above)
+    -- Crucially, coercions /are/ handled here, because they are
+    --
value arguments (#10288) + let + call_dmd = mkCallDmd dmd + (fun_ty, fun') = dmdAnal env call_dmd fun + (arg_dmd, res_ty) = splitDmdTy fun_ty + (arg_ty, arg') = dmdAnalStar env (dmdTransformThunkDmd arg arg_dmd) arg + in +-- pprTrace "dmdAnal:app" (vcat +-- [ text "dmd =" <+> ppr dmd +-- , text "expr =" <+> ppr (App fun arg) +-- , text "fun dmd_ty =" <+> ppr fun_ty +-- , text "arg dmd =" <+> ppr arg_dmd +-- , text "arg dmd_ty =" <+> ppr arg_ty +-- , text "res dmd_ty =" <+> ppr res_ty +-- , text "overall res dmd_ty =" <+> ppr (res_ty `bothDmdType` arg_ty) ]) + (res_ty `bothDmdType` arg_ty, App fun' arg') + +dmdAnal' env dmd (Lam var body) + | isTyVar var + = let + (body_ty, body') = dmdAnal env dmd body + in + (body_ty, Lam var body') + + | otherwise + = let (body_dmd, defer_and_use) = peelCallDmd dmd + -- body_dmd: a demand to analyze the body + + (body_ty, body') = dmdAnal env body_dmd body + (lam_ty, var') = annotateLamIdBndr env notArgOfDfun body_ty var + in + (postProcessUnsat defer_and_use lam_ty, Lam var' body') + +dmdAnal' env dmd (Case scrut case_bndr ty [(DataAlt dc, bndrs, rhs)]) + -- Only one alternative with a product constructor + | let tycon = dataConTyCon dc + , isJust (isDataProductTyCon_maybe tycon) + , Just rec_tc' <- checkRecTc (ae_rec_tc env) tycon + = let + env_alt = env { ae_rec_tc = rec_tc' } + (rhs_ty, rhs') = dmdAnal env_alt dmd rhs + (alt_ty1, dmds) = findBndrsDmds env rhs_ty bndrs + (alt_ty2, case_bndr_dmd) = findBndrDmd env False alt_ty1 case_bndr + id_dmds = addCaseBndrDmd case_bndr_dmd dmds + alt_ty3 | io_hack_reqd scrut dc bndrs = deferAfterIO alt_ty2 + | otherwise = alt_ty2 + + -- Compute demand on the scrutinee + -- See Note [Demand on scrutinee of a product case] + scrut_dmd = mkProdDmd id_dmds + (scrut_ty, scrut') = dmdAnal env scrut_dmd scrut + res_ty = alt_ty3 `bothDmdType` toBothDmdArg scrut_ty + case_bndr' = setIdDemandInfo case_bndr case_bndr_dmd + bndrs' = setBndrsDemandInfo bndrs id_dmds + in +-- pprTrace "dmdAnal:Case1" 
(vcat [ text "scrut" <+> ppr scrut
+--                                   , text "dmd" <+> ppr dmd
+--                                   , text "case_bndr_dmd" <+> ppr (idDemandInfo case_bndr')
+--                                   , text "id_dmds" <+> ppr id_dmds
+--                                   , text "scrut_dmd" <+> ppr scrut_dmd
+--                                   , text "scrut_ty" <+> ppr scrut_ty
+--                                   , text "alt_ty" <+> ppr alt_ty2
+--                                   , text "res_ty" <+> ppr res_ty ]) $
+    (res_ty, Case scrut' case_bndr' ty [(DataAlt dc, bndrs', rhs')])
+
+dmdAnal' env dmd (Case scrut case_bndr ty alts)
+  = let -- Case expression with multiple alternatives
+        (alt_tys, alts')     = mapAndUnzip (dmdAnalAlt env dmd case_bndr) alts
+        (scrut_ty, scrut')   = dmdAnal env cleanEvalDmd scrut
+        (alt_ty, case_bndr') = annotateBndr env (foldr lubDmdType botDmdType alt_tys) case_bndr
+                               -- NB: Base case is botDmdType, for empty case alternatives
+                               --     This is a unit for lubDmdType, and the right result
+                               --     when there really are no alternatives
+        res_ty               = alt_ty `bothDmdType` toBothDmdArg scrut_ty
+    in
+--    pprTrace "dmdAnal:Case2" (vcat [ text "scrut" <+> ppr scrut
+--                                   , text "scrut_ty" <+> ppr scrut_ty
+--                                   , text "alt_tys" <+> ppr alt_tys
+--                                   , text "alt_ty" <+> ppr alt_ty
+--                                   , text "res_ty" <+> ppr res_ty ]) $
+    (res_ty, Case scrut' case_bndr' ty alts')
+
+-- Let bindings can be processed in two ways:
+-- Down (RHS before body) or Up (body before RHS).
+-- The following case handles the up variant.
+--
+-- It is very simple. For  let x = rhs in body
+--   * Demand-analyse 'body' in the current environment
+--   * Find the demand, 'rhs_dmd', placed on 'x' by 'body'
+--   * Demand-analyse 'rhs' in 'rhs_dmd'
+--
+-- This is used for a non-recursive local let without manifest lambdas.
+-- This is the LetUp rule in the paper “Higher-Order Cardinality Analysis”.
+dmdAnal' env dmd (Let (NonRec id rhs) body)
+  | useLetUp id
+  = (final_ty, Let (NonRec id' rhs') body')
+  where
+    (body_ty, body')   = dmdAnal env dmd body
+    (body_ty', id_dmd) = findBndrDmd env notArgOfDfun body_ty id
+    id'                = setIdDemandInfo id id_dmd
+
+    (rhs_ty, rhs')     = dmdAnalStar env (dmdTransformThunkDmd rhs id_dmd) rhs
+    final_ty           = body_ty' `bothDmdType` rhs_ty
+
+dmdAnal' env dmd (Let (NonRec id rhs) body)
+  = (body_ty2, Let (NonRec id2 rhs') body')
+  where
+    (lazy_fv, id1, rhs') = dmdAnalRhsLetDown Nothing env dmd id rhs
+    env1                 = extendAnalEnv NotTopLevel env id1 (idStrictness id1)
+    (body_ty, body')     = dmdAnal env1 dmd body
+    (body_ty1, id2)      = annotateBndr env body_ty id1
+    body_ty2             = addLazyFVs body_ty1 lazy_fv -- see Note [Lazy and unleashable free variables]
+
+    -- If the actual demand is better than the vanilla call
+    -- demand, you might think we could do better to re-analyse
+    -- the RHS with the stronger demand.
+    -- But (a) That seldom happens, because it means that *every* path in
+    --         the body of the let has to use that stronger demand
+    --     (b) It often happens temporarily when fixpointing, because
+    --         the recursive function at first seems to place a massive demand.
+    --         But we don't want to go to extra work when the function will
+    --         probably iterate to something less demanding.
+    -- In practice, all the times the actual demand on id2 is more than
+    -- the vanilla call demand seem to be due to (b).  So we don't
+    -- bother to re-analyse the RHS.
+ +dmdAnal' env dmd (Let (Rec pairs) body) + = let + (env', lazy_fv, pairs') = dmdFix NotTopLevel env dmd pairs + (body_ty, body') = dmdAnal env' dmd body + body_ty1 = deleteFVs body_ty (map fst pairs) + body_ty2 = addLazyFVs body_ty1 lazy_fv -- see Note [Lazy and unleashable free variables] + in + body_ty2 `seq` + (body_ty2, Let (Rec pairs') body') + +io_hack_reqd :: CoreExpr -> DataCon -> [Var] -> Bool +-- See Note [IO hack in the demand analyser] +io_hack_reqd scrut con bndrs + | (bndr:_) <- bndrs + , con == tupleDataCon Unboxed 2 + , idType bndr `eqType` realWorldStatePrimTy + , (fun, _) <- collectArgs scrut + = case fun of + Var f -> not (isPrimOpId f) + _ -> True + | otherwise + = False + +dmdAnalAlt :: AnalEnv -> CleanDemand -> Id -> Alt Var -> (DmdType, Alt Var) +dmdAnalAlt env dmd case_bndr (con,bndrs,rhs) + | null bndrs -- Literals, DEFAULT, and nullary constructors + , (rhs_ty, rhs') <- dmdAnal env dmd rhs + = (rhs_ty, (con, [], rhs')) + + | otherwise -- Non-nullary data constructors + , (rhs_ty, rhs') <- dmdAnal env dmd rhs + , (alt_ty, dmds) <- findBndrsDmds env rhs_ty bndrs + , let case_bndr_dmd = findIdDemand alt_ty case_bndr + id_dmds = addCaseBndrDmd case_bndr_dmd dmds + = (alt_ty, (con, setBndrsDemandInfo bndrs id_dmds, rhs')) + + +{- Note [IO hack in the demand analyser] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +There's a hack here for I/O operations. Consider + + case foo x s of { (# s', r #) -> y } + +Is this strict in 'y'? Often not! If foo x s performs some observable action +(including raising an exception with raiseIO#, modifying a mutable variable, or +even ending the program normally), then we must not force 'y' (which may fail +to terminate) until we have performed foo x s. 
+ +Hackish solution: spot the IO-like situation and add a virtual branch, +as if we had + case foo x s of + (# s, r #) -> y + other -> return () +So the 'y' isn't necessarily going to be evaluated + +A more complete example (#148, #1592) where this shows up is: + do { let len = <expensive> ; + ; when (...) (exitWith ExitSuccess) + ; print len } + +However, consider + f x s = case getMaskingState# s of + (# s, r #) -> + case x of I# x2 -> ... + +Here it is terribly sad to make 'f' lazy in 's'. After all, +getMaskingState# is not going to diverge or throw an exception! This +situation actually arises in GHC.IO.Handle.Internals.wantReadableHandle +(on an MVar not an Int), and made a material difference. + +So if the scrutinee is a primop call, we *don't* apply the +state hack: + - If it is a simple, terminating one like getMaskingState, + applying the hack is over-conservative. + - If the primop is raise# then it returns bottom, so + the case alternatives are already discarded. + - If the primop can raise a non-IO exception, like + divide by zero or seg-fault (eg writing an array + out of bounds) then we don't mind evaluating 'x' first. + +Note [Demand on the scrutinee of a product case] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +When figuring out the demand on the scrutinee of a product case, +we use the demands of the case alternative, i.e. id_dmds. +But note that these include the demand on the case binder; +see Note [Demand on case-alternative binders] in Demand.hs. +This is crucial. Example: + f x = case x of y { (a,b) -> k y a } +If we just take scrut_demand = U(L,A), then we won't pass x to the +worker, so the worker will rebuild + x = (a, absent-error) +and that'll crash. + +Note [Aggregated demand for cardinality] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We use different strategies for strictness and usage/cardinality to +"unleash" demands captured on free variables by bindings. 
Let us
+consider the example:
+
+f1 y = let {-# NOINLINE h #-}
+           h = y
+       in (h, h)
+
+We are interested in obtaining cardinality demand U1 on |y|, as it is
+used only in a thunk, and, therefore, is not going to be updated any
+more. Therefore, the demand on |y|, captured and unleashed by usage of
+|h| is U1. However, if we unleash this demand every time |h| is used,
+and then sum up the effects, the ultimate demand on |y| will be U1 +
+U1 = U. In order to avoid it, we *first* collect the aggregate demand
+on |h| in the body of the let-expression, and only then apply the demand
+transformer:
+
+transf[x](U) = {y |-> U1}
+
+so the resulting demand on |y| is U1.
+
+The situation is, however, different for strictness, where this
+aggregating approach exhibits worse results because of the nature of
+the |both| operation for strictness. Consider the example:
+
+f y c =
+  let h x = y |seq| x
+  in case c of
+       True  -> h True
+       False -> y
+
+It is clear that |f| is strict in |y|, however, the suggested analysis
+will infer from the body of |let| that |h| is used lazily (as it is
+used in one branch only), therefore lazy demand will be put on its
+free variable |y|. Conversely, if the demand on |h| is unleashed right
+on the spot, we will get the desired result, namely, that |f| is
+strict in |y|.
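+The sharing behaviour this note relies on can be observed at source level.
+The standalone sketch below (not part of this module; the name
+`observedOnce` and the use of Debug.Trace are illustrative only) shows that
+although the pair uses the thunk twice, the thunk's body — and hence |y| —
+is evaluated only once:

```haskell
import Debug.Trace (trace)

-- Analogue of the note's f1: h is a single shared thunk, so forcing
-- both components of the pair evaluates y's use inside h only once.
-- That is the "used once" (U1) cardinality the analysis wants for y.
observedOnce :: Int -> (Int, Int)
observedOnce y =
  let h = trace "h forced" (y + 1)  -- a thunk, updated after its first force
  in (h, h)

main :: IO ()
main = do
  let (a, b) = observedOnce 41
  print (a, b)  -- "h forced" appears once on stderr, not twice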
+ + +************************************************************************ +* * + Demand transformer +* * +************************************************************************ +-} + +dmdTransform :: AnalEnv -- The strictness environment + -> Id -- The function + -> CleanDemand -- The demand on the function + -> DmdType -- The demand type of the function in this context + -- Returned DmdEnv includes the demand on + -- this function plus demand on its free variables + +dmdTransform env var dmd + | isDataConWorkId var -- Data constructor + = dmdTransformDataConSig (idArity var) (idStrictness var) dmd + + | gopt Opt_DmdTxDictSel (ae_dflags env), + Just _ <- isClassOpId_maybe var -- Dictionary component selector + = dmdTransformDictSelSig (idStrictness var) dmd + + | isGlobalId var -- Imported function + , let res = dmdTransformSig (idStrictness var) dmd + = -- pprTrace "dmdTransform" (vcat [ppr var, ppr (idStrictness var), ppr dmd, ppr res]) + res + + | Just (sig, top_lvl) <- lookupSigEnv env var -- Local letrec bound thing + , let fn_ty = dmdTransformSig sig dmd + = -- pprTrace "dmdTransform" (vcat [ppr var, ppr sig, ppr dmd, ppr fn_ty]) $ + if isTopLevel top_lvl + then fn_ty -- Don't record top level things + else addVarDmd fn_ty var (mkOnceUsedDmd dmd) + + | otherwise -- Local non-letrec-bound thing + = unitDmdType (unitVarEnv var (mkOnceUsedDmd dmd)) + +{- +************************************************************************ +* * +\subsection{Bindings} +* * +************************************************************************ +-} + +-- Recursive bindings +dmdFix :: TopLevelFlag + -> AnalEnv -- Does not include bindings for this binding + -> CleanDemand + -> [(Id,CoreExpr)] + -> (AnalEnv, DmdEnv, [(Id,CoreExpr)]) -- Binders annotated with strictness info + +dmdFix top_lvl env let_dmd orig_pairs + = loop 1 initial_pairs + where + bndrs = map fst orig_pairs + + -- See Note [Initialising strictness] + initial_pairs | ae_virgin env = [(setIdStrictness id 
botSig, rhs) | (id, rhs) <- orig_pairs ] + | otherwise = orig_pairs + + -- If fixed-point iteration does not yield a result we use this instead + -- See Note [Safe abortion in the fixed-point iteration] + abort :: (AnalEnv, DmdEnv, [(Id,CoreExpr)]) + abort = (env, lazy_fv', zapped_pairs) + where (lazy_fv, pairs') = step True (zapIdStrictness orig_pairs) + -- Note [Lazy and unleashable free variables] + non_lazy_fvs = plusVarEnvList $ map (strictSigDmdEnv . idStrictness . fst) pairs' + lazy_fv' = lazy_fv `plusVarEnv` mapVarEnv (const topDmd) non_lazy_fvs + zapped_pairs = zapIdStrictness pairs' + + -- The fixed-point varies the idStrictness field of the binders, and terminates if that + -- annotation does not change any more. + loop :: Int -> [(Id,CoreExpr)] -> (AnalEnv, DmdEnv, [(Id,CoreExpr)]) + loop n pairs + | found_fixpoint = (final_anal_env, lazy_fv, pairs') + | n == 10 = abort + | otherwise = loop (n+1) pairs' + where + found_fixpoint = map (idStrictness . fst) pairs' == map (idStrictness . 
fst) pairs + first_round = n == 1 + (lazy_fv, pairs') = step first_round pairs + final_anal_env = extendAnalEnvs top_lvl env (map fst pairs') + + step :: Bool -> [(Id, CoreExpr)] -> (DmdEnv, [(Id, CoreExpr)]) + step first_round pairs = (lazy_fv, pairs') + where + -- In all but the first iteration, delete the virgin flag + start_env | first_round = env + | otherwise = nonVirgin env + + start = (extendAnalEnvs top_lvl start_env (map fst pairs), emptyDmdEnv) + + ((_,lazy_fv), pairs') = mapAccumL my_downRhs start pairs + -- mapAccumL: Use the new signature to do the next pair + -- The occurrence analyser has arranged them in a good order + -- so this can significantly reduce the number of iterations needed + + my_downRhs (env, lazy_fv) (id,rhs) + = ((env', lazy_fv'), (id', rhs')) + where + (lazy_fv1, id', rhs') = dmdAnalRhsLetDown (Just bndrs) env let_dmd id rhs + lazy_fv' = plusVarEnv_C bothDmd lazy_fv lazy_fv1 + env' = extendAnalEnv top_lvl env id (idStrictness id') + + + zapIdStrictness :: [(Id, CoreExpr)] -> [(Id, CoreExpr)] + zapIdStrictness pairs = [(setIdStrictness id nopSig, rhs) | (id, rhs) <- pairs ] + +{- +Note [Safe abortion in the fixed-point iteration] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Fixed-point iteration may fail to terminate. But we cannot simply give up and +return the environment and code unchanged! We still need to do one additional +round, for two reasons: + + * To get information on used free variables (both lazy and strict!) + (see Note [Lazy and unleashable free variables]) + * To ensure that all expressions have been traversed at least once, and any left-over + strictness annotations have been updated. + +This final iteration does not add the variables to the strictness signature +environment, which effectively assigns them 'nopSig' (see "getStrictness") + +-} + +-- Let bindings can be processed in two ways: +-- Down (RHS before body) or Up (body before RHS). 
+-- dmdAnalRhsLetDown implements the Down variant: +-- * assuming a demand of <L,U> +-- * looking at the definition +-- * determining a strictness signature +-- +-- It is used for toplevel definition, recursive definitions and local +-- non-recursive definitions that have manifest lambdas. +-- Local non-recursive definitions without a lambda are handled with LetUp. +-- +-- This is the LetDown rule in the paper “Higher-Order Cardinality Analysis”. +dmdAnalRhsLetDown + :: Maybe [Id] -- Just bs <=> recursive, Nothing <=> non-recursive + -> AnalEnv -> CleanDemand + -> Id -> CoreExpr + -> (DmdEnv, Id, CoreExpr) +-- Process the RHS of the binding, add the strictness signature +-- to the Id, and augment the environment with the signature as well. +dmdAnalRhsLetDown rec_flag env let_dmd id rhs + = (lazy_fv, id', rhs') + where + rhs_arity = idArity id + rhs_dmd + -- See Note [Demand analysis for join points] + -- See Note [Invariants on join points] invariant 2b, in GHC.Core + -- rhs_arity matches the join arity of the join point + | isJoinId id + = mkCallDmds rhs_arity let_dmd + | otherwise + -- NB: rhs_arity + -- See Note [Demand signatures are computed for a threshold demand based on idArity] + = mkRhsDmd env rhs_arity rhs + (DmdType rhs_fv rhs_dmds rhs_div, rhs') + = dmdAnal env rhs_dmd rhs + -- TODO: Won't the following line unnecessarily trim down arity for join + -- points returning a lambda in a C(S) context? 
+ sig = mkStrictSigForArity rhs_arity (mkDmdType sig_fv rhs_dmds rhs_div) + id' = set_idStrictness env id sig + -- See Note [NOINLINE and strictness] + + + -- See Note [Aggregated demand for cardinality] + rhs_fv1 = case rec_flag of + Just bs -> reuseEnv (delVarEnvList rhs_fv bs) + Nothing -> rhs_fv + + -- See Note [Lazy and unleashable free variables] + (lazy_fv, sig_fv) = splitFVs is_thunk rhs_fv1 + is_thunk = not (exprIsHNF rhs) && not (isJoinId id) + +-- | @mkRhsDmd env rhs_arity rhs@ creates a 'CleanDemand' for +-- unleashing on the given function's @rhs@, by creating a call demand of +-- @rhs_arity@ with a body demand appropriate for possible product types. +-- See Note [Product demands for function body]. +-- For example, a call of the form @mkRhsDmd _ 2 (\x y -> (x, y))@ returns a +-- clean usage demand of @C1(C1(U(U,U)))@. +mkRhsDmd :: AnalEnv -> Arity -> CoreExpr -> CleanDemand +mkRhsDmd env rhs_arity rhs = + case peelTsFuns rhs_arity (findTypeShape (ae_fam_envs env) (exprType rhs)) of + Just (TsProd tss) -> mkCallDmds rhs_arity (cleanEvalProdDmd (length tss)) + _ -> mkCallDmds rhs_arity cleanEvalDmd + +-- | If given the let-bound 'Id', 'useLetUp' determines whether we should +-- process the binding up (body before rhs) or down (rhs before body). +-- +-- We use LetDown if there is a chance to get a useful strictness signature to +-- unleash at call sites. LetDown is generally more precise than LetUp if we can +-- correctly guess how it will be used in the body, that is, for which incoming +-- demand the strictness signature should be computed, which allows us to +-- unleash higher-order demands on arguments at call sites. This is mostly the +-- case when +-- +-- * The binding takes any arguments before performing meaningful work (cf. +-- 'idArity'), in which case we are interested to see how it uses them. +-- * The binding is a join point, hence acting like a function, not a value. 
+-- As a big plus, we know *precisely* how it will be used in the body; since +-- it's always tail-called, we can directly unleash the incoming demand of +-- the let binding on its RHS when computing a strictness signature. See +-- [Demand analysis for join points]. +-- +-- Thus, if the binding is not a join point and its arity is 0, we have a thunk +-- and use LetUp, implying that we have no usable demand signature available +-- when we analyse the let body. +-- +-- Since thunk evaluation is memoised, we want to unleash its 'DmdEnv' of free +-- vars at most once, regardless of how many times it was forced in the body. +-- This makes a real difference wrt. usage demands. The other reason is being +-- able to unleash a more precise product demand on its RHS once we know how the +-- thunk was used in the let body. +-- +-- Characteristic examples, always assuming a single evaluation: +-- +-- * @let x = 2*y in x + x@ => LetUp. Compared to LetDown, we find out that +-- the expression uses @y@ at most once. +-- * @let x = (a,b) in fst x@ => LetUp. Compared to LetDown, we find out that +-- @b@ is absent. +-- * @let f x = x*2 in f y@ => LetDown. Compared to LetUp, we find out that +-- the expression uses @y@ strictly, because we have @f@'s demand signature +-- available at the call site. +-- * @join exit = 2*y in if a then exit else if b then exit else 3*y@ => +-- LetDown. Compared to LetUp, we find out that the expression uses @y@ +-- strictly, because we can unleash @exit@'s signature at each call site. +-- * For a more convincing example with join points, see Note [Demand analysis +-- for join points]. 
+-- +useLetUp :: Var -> Bool +useLetUp f = idArity f == 0 && not (isJoinId f) + +{- Note [Demand analysis for join points] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Consider + g :: (Int,Int) -> Int + g (p,q) = p+q + + f :: T -> Int -> Int + f x p = g (join j y = (p,y) + in case x of + A -> j 3 + B -> j 4 + C -> (p,7)) + +If j was a vanilla function definition, we'd analyse its body with +evalDmd, and think that it was lazy in p. But for join points we can +do better! We know that j's body will (if called at all) be evaluated +with the demand that consumes the entire join-binding, in this case +the argument demand from g. Whizzo! g evaluates both components of +its argument pair, so p will certainly be evaluated if j is called. + +For f to be strict in p, we need /all/ paths to evaluate p; in this +case the C branch does so too, so we are fine. So, as usual, we need +to transport demands on free variables to the call site(s). Compare +Note [Lazy and unleashable free variables]. + +The implementation is easy. When analysing a join point, we can +analyse its body with the demand from the entire join-binding (written +let_dmd here). + +Another win for join points! #13543. + +However, note that the strictness signature for a join point can +look a little puzzling. E.g. + + (join j x = \y. error "urk") + (in case v of ) + ( A -> j 3 ) x + ( B -> j 4 ) + ( C -> \y. blah ) + +The entire thing is in a C(S) context, so j's strictness signature +will be [A]b +meaning one absent argument, returns bottom. That seems odd because +there's a \y inside. But it's right because when consumed in a C(1) +context the RHS of the join point is indeed bottom. + +Note [Demand signatures are computed for a threshold demand based on idArity] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We compute demand signatures assuming idArity incoming arguments to approximate +behavior for when we have a call site with at least that many arguments. 
idArity +is /at least/ the number of manifest lambdas, but might be higher for PAPs and +trivial RHS (see Note [Demand analysis for trivial right-hand sides]). + +Because idArity of a function varies independently of its cardinality properties +(cf. Note [idArity varies independently of dmdTypeDepth]), we implicitly encode +the arity for when a demand signature is sound to unleash in its 'dmdTypeDepth' +(cf. Note [Understanding DmdType and StrictSig] in Demand). It is unsound to +unleash a demand signature when the incoming number of arguments is less than +that. See Note [What are demand signatures?] for more details on soundness. + +Why idArity arguments? Because that's a conservative estimate of how many +arguments we must feed a function before it does anything interesting with them. +Also it elegantly subsumes the trivial RHS and PAP case. + +There might be functions for which we might want to analyse for more incoming +arguments than idArity. Example: + + f x = + if expensive + then \y -> ... y ... + else \y -> ... y ... + +We'd analyse `f` under a unary call demand C(S), corresponding to idArity +being 1. That's enough to look under the manifest lambda and find out how a +unary call would use `x`, but not enough to look into the lambdas in the if +branches. + +On the other hand, if we analysed for call demand C(C(S)), we'd get useful +strictness info for `y` (and more precise info on `x`) and possibly CPR +information, but + + * We would no longer be able to unleash the signature at unary call sites + * Performing the worker/wrapper split based on this information would be + implicitly eta-expanding `f`, playing fast and loose with divergence and + even being unsound in the presence of newtypes, so we refrain from doing so. + Also see Note [Don't eta expand in w/w] in GHC.Core.Op.WorkWrap. + +Since we only compute one signature, we do so for arity 1. 
Computing multiple
+signatures for different arities (i.e., polyvariance) would be entirely
+possible, if it weren't for the additional runtime and implementation
+complexity.
+
+Note [idArity varies independently of dmdTypeDepth]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We used to check in GHC.Core.Lint that dmdTypeDepth <= idArity for a let-bound
+identifier. But that means we would have to zap demand signatures every time we
+reset or decrease arity. That's an unnecessary dependency, because
+
+  * The demand signature captures a semantic property that is independent of
+    what the binding's current arity is
+  * idArity is analysis information itself, thus volatile
+  * We already *have* dmdTypeDepth, so why not just use it to encode the
+    threshold for when to unleash the signature
+    (cf. Note [Understanding DmdType and StrictSig] in Demand)
+
+Consider the following expression, for example:
+
+    (let go x y = `x` seq ... in go) |> co
+
+`go` might have a strictness signature of `<S><L>`. The simplifier will identify
+`go` as a nullary join point through `joinPointBinding_maybe` and float the
+coercion into the binding, leading to an arity decrease:
+
+    join go = (\x y -> `x` seq ...) |> co in go
+
+With the CoreLint check, we would have to zap `go`'s perfectly viable strictness
+signature.
+
+Note [What are demand signatures?]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Demand analysis interprets expressions in the abstract domain of demand
+transformers. Given an incoming demand we put an expression under, its abstract
+transformer gives us back a demand type denoting how other things (like
+arguments and free vars) were used when the expression was evaluated.
+Here's an example:
+
+  f x y =
+    if x + expensive
+      then \z -> z + y * ...
+      else \z -> z * ...
+
+The abstract transformer (let's call it F_e) of the if expression (let's call it
+e) would transform an incoming head demand <S,HU> into a demand type like
+{x-><S,1*U>,y-><L,U>}<L,U>.
In pictures:
+
+     Demand ---F_e---> DmdType
+     <S,HU>            {x-><S,1*U>,y-><L,U>}<L,U>
+
+Let's assume that the demand transformers we compute for an expression are
+correct wrt. some concrete semantics for Core. How do demand signatures fit
+in? They are strange beasts, given that they come with strict rules about
+when it's sound to unleash them.
+
+Fortunately, we can formalise the rules with Galois connections. Consider
+f's strictness signature, {}<S,1*U><L,U>. It's a single-point approximation of
+the actual abstract transformer of f's RHS for arity 2. So, what happens is that
+we abstract *once more* from the abstract domain we already are in, replacing
+the incoming Demand by a simple lattice with two elements denoting incoming
+arity: A_2 = {<2, >=2} (where '<2' is the top element and >=2 the bottom
+element). Here's the diagram:
+
+     A_2 -----f_f----> DmdType
+      ^                   |
+      | α               γ |
+      |                   v
+     Demand ---F_f---> DmdType
+
+With
+  α(C1(C1(_))) = >=2 -- example for usage demands, but similar for strictness
+  α(_)         = <2
+  γ(ty)        = ty
+and F_f being the abstract transformer of f's RHS and f_f being the abstracted
+abstract transformer computable from our demand signature simply by
+
+  f_f(>=2) = {}<S,1*U><L,U>
+  f_f(<2)  = postProcessUnsat {}<S,1*U><L,U>
+
+where postProcessUnsat makes a proper top element out of the given demand type.
+
+Note [Demand analysis for trivial right-hand sides]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider
+    foo = plusInt |> co
+where plusInt is an arity-2 function with known strictness. Clearly
+we want plusInt's strictness to propagate to foo! But because it has
+no manifest lambdas, it won't do so automatically, and indeed 'co' might
+have type (Int->Int->Int) ~ T.
+
+Fortunately, GHC.Core.Arity gives 'foo' arity 2, which is enough for LetDown to
+forward plusInt's demand signature, and all is well (see Note [Newtype arity] in
+GHC.Core.Arity)! A small example is the test case NewtypeArity.
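+A source-level analogue of the trivial-RHS situation (a standalone sketch,
+not part of this module; the names `T`, `plusInt` and `foo` are illustrative,
+with `T` playing the role of the type that 'co' casts to):

```haskell
import Data.Coerce (coerce)

newtype T = MkT (Int -> Int -> Int)

plusInt :: Int -> Int -> Int
plusInt x y = x + y   -- arity 2, strict in both arguments

-- After desugaring, foo's RHS is just 'plusInt' behind a cast: there are
-- no manifest lambdas, yet arity analysis still assigns foo arity 2, so
-- LetDown can forward plusInt's demand signature to foo's call sites.
foo :: T
foo = coerce plusInt

main :: IO ()
main = let MkT f = foo in print (f 2 3)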
+ + +Note [Product demands for function body] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +This example comes from shootout/binary_trees: + + Main.check' = \ b z ds. case z of z' { I# ip -> + case ds_d13s of + Main.Nil -> z' + Main.Node s14k s14l s14m -> + Main.check' (not b) + (Main.check' b + (case b { + False -> I# (-# s14h s14k); + True -> I# (+# s14h s14k) + }) + s14l) + s14m } } } + +Here we *really* want to unbox z, even though it appears to be used boxed in +the Nil case. Partly the Nil case is not a hot path. But more specifically, +the whole function gets the CPR property if we do. + +So for the demand on the body of a RHS we use a product demand if it's +a product type. + +************************************************************************ +* * +\subsection{Strictness signatures and types} +* * +************************************************************************ +-} + +unitDmdType :: DmdEnv -> DmdType +unitDmdType dmd_env = DmdType dmd_env [] topDiv + +coercionDmdEnv :: Coercion -> DmdEnv +coercionDmdEnv co = mapVarEnv (const topDmd) (getUniqSet $ coVarsOfCo co) + -- The VarSet from coVarsOfCo is really a VarEnv Var + +addVarDmd :: DmdType -> Var -> Demand -> DmdType +addVarDmd (DmdType fv ds res) var dmd + = DmdType (extendVarEnv_C bothDmd fv var dmd) ds res + +addLazyFVs :: DmdType -> DmdEnv -> DmdType +addLazyFVs dmd_ty lazy_fvs + = dmd_ty `bothDmdType` mkBothDmdArg lazy_fvs + -- Using bothDmdType (rather than just both'ing the envs) + -- is vital. Consider + -- let f = \x -> (x,y) + -- in error (f 3) + -- Here, y is treated as a lazy-fv of f, but we must `bothDmd` that L + -- demand with the bottom coming up from 'error' + -- + -- I got a loop in the fixpointer without this, due to an interaction + -- with the lazy_fv filtering in dmdAnalRhsLetDown. Roughly, it was + -- letrec f n x + -- = letrec g y = x `fatbar` + -- letrec h z = z + ...g... + -- in h (f (n-1) x) + -- in ... 
+ -- In the initial iteration for f, f=Bot + -- Suppose h is found to be strict in z, but the occurrence of g in its RHS + -- is lazy. Now consider the fixpoint iteration for g, esp the demands it + -- places on its free variables. Suppose it places none. Then the + -- x `fatbar` ...call to h... + -- will give a x->V demand for x. That turns into a L demand for x, + -- which floats out of the defn for h. Without the modifyEnv, that + -- L demand doesn't get both'd with the Bot coming up from the inner + -- call to f. So we just get an L demand for x for g. + +{- +Note [Do not strictify the argument dictionaries of a dfun] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +The typechecker can tie recursive knots involving dfuns, so we do the +conservative thing and refrain from strictifying a dfun's argument +dictionaries. +-} + +setBndrsDemandInfo :: [Var] -> [Demand] -> [Var] +setBndrsDemandInfo (b:bs) (d:ds) + | isTyVar b = b : setBndrsDemandInfo bs (d:ds) + | otherwise = setIdDemandInfo b d : setBndrsDemandInfo bs ds +setBndrsDemandInfo [] ds = ASSERT( null ds ) [] +setBndrsDemandInfo bs _ = pprPanic "setBndrsDemandInfo" (ppr bs) + +annotateBndr :: AnalEnv -> DmdType -> Var -> (DmdType, Var) +-- The returned env has the var deleted +-- The returned var is annotated with demand info +-- according to the result demand of the provided demand type +-- No effect on the argument demands +annotateBndr env dmd_ty var + | isId var = (dmd_ty', setIdDemandInfo var dmd) + | otherwise = (dmd_ty, var) + where + (dmd_ty', dmd) = findBndrDmd env False dmd_ty var + +annotateLamIdBndr :: AnalEnv + -> DFunFlag -- is this lambda at the top of the RHS of a dfun? 
+ -> DmdType -- Demand type of body + -> Id -- Lambda binder + -> (DmdType, -- Demand type of lambda + Id) -- and binder annotated with demand + +annotateLamIdBndr env arg_of_dfun dmd_ty id +-- For lambdas we add the demand to the argument demands +-- Only called for Ids + = ASSERT( isId id ) + -- pprTrace "annLamBndr" (vcat [ppr id, ppr _dmd_ty]) $ + (final_ty, setIdDemandInfo id dmd) + where + -- Watch out! See Note [Lambda-bound unfoldings] + final_ty = case maybeUnfoldingTemplate (idUnfolding id) of + Nothing -> main_ty + Just unf -> main_ty `bothDmdType` unf_ty + where + (unf_ty, _) = dmdAnalStar env dmd unf + + main_ty = addDemand dmd dmd_ty' + (dmd_ty', dmd) = findBndrDmd env arg_of_dfun dmd_ty id + +deleteFVs :: DmdType -> [Var] -> DmdType +deleteFVs (DmdType fvs dmds res) bndrs + = DmdType (delVarEnvList fvs bndrs) dmds res + +{- +Note [NOINLINE and strictness] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +The strictness analyser used to have a HACK which ensured that NOINLINE +things were not strictness-analysed. The reason was unsafePerformIO. +Left to itself, the strictness analyser would discover this strictness +for unsafePerformIO: + unsafePerformIO: C(U(AV)) +But then consider this sub-expression + unsafePerformIO (\s -> let r = f x in + case writeIORef v r s of (# s1, _ #) -> + (# s1, r #) +The strictness analyser will now find that r is sure to be eval'd, +and may then hoist it out. This makes tests/lib/should_run/memo002 +deadlock. + +Solving this by making all NOINLINE things have no strictness info is overkill. +In particular, it's overkill for runST, which is perfectly respectable. +Consider + f x = runST (return x) +This should be strict in x. + +So the new plan is to define unsafePerformIO using the 'lazy' combinator: + + unsafePerformIO (IO m) = lazy (case m realWorld# of (# _, r #) -> r) + +Remember, 'lazy' is a wired-in identity-function Id, of type a->a, which is +magically NON-STRICT, and is inlined after strictness analysis.
So +unsafePerformIO will look non-strict, and that's what we want. + +Now we don't need the hack in the strictness analyser. HOWEVER, this +decision does mean that even a NOINLINE function is not entirely +opaque: some aspect of its implementation leaks out, notably its +strictness. For example, if you have a function implemented by an +error stub, but which has RULES, you may want it not to be eliminated +in favour of error! + +Note [Lazy and unleashable free variables] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We put the strict and once-used FVs in the DmdType of the Id, so +that at its call sites we unleash demands on its strict fvs. +An example is 'roll' in imaginary/wheel-sieve2 +Something like this: + roll x = letrec + go y = if ... then roll (x-1) else x+1 + in + go ms +We want to see that roll is strict in x, which is because +go is called. So we put the DmdEnv for x in go's DmdType. + +Another example: + + f :: Int -> Int -> Int + f x y = let t = x+1 + h z = if z==0 then t else + if z==1 then x+1 else + x + h (z-1) + in h y + +Calling h does indeed evaluate x, but we can only see +that if we unleash a demand on x at the call site for t. + +Incidentally, here's a place where lambda-lifting h would +lose the cigar --- we couldn't see the joint strictness in t/x + + ON THE OTHER HAND + +We don't want to put *all* the fv's from the RHS into the +DmdType. Because + + * it makes the strictness signatures larger, and hence slows down fixpointing + +and + + * it is useless information at the call site anyways: + For lazy, used-many times fv's we will never get any better result than + that, no matter how good the actual demand on the function at the call site + is (unless it is always absent, but then the whole binder is useless). + +Therefore we exclude lazy multiple-used fv's from the environment in the +DmdType. + +But now the signature lies! (Missing variables are assumed to be absent.) 
To +make up for this, the code that analyses the binding keeps the demand on those +variables separate (usually called "lazy_fv") and adds it to the demand of the +whole binding later. + +What if we decide _not_ to store a strictness signature for a binding at all, as +we do when aborting a fixed-point iteration? Then we risk losing the information +that the strict variables are being used. In that case, we take all free variables +mentioned in the (unsound) strictness signature, conservatively approximate the +demand put on them (topDmd), and add that to the "lazy_fv" returned by "dmdFix". + + +Note [Lambda-bound unfoldings] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We allow a lambda-bound variable to carry an unfolding, a facility that is used +exclusively for join points; see Note [Case binders and join points]. If so, +we must be careful to demand-analyse the RHS of the unfolding! Example + \x. \y{=Just x}. <body> +Then if <body> uses 'y', then transitively it uses 'x', and we must not +forget that fact, otherwise we might make 'x' absent when it isn't.
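As a source-level sanity check of Note [Lazy and unleashable free variables]: in the `f`/`h` example there, every branch of `h` forces the free variable `x`, which is exactly the strictness the analyser wants to unleash at call sites. This standalone sketch only mirrors the note's example (the names `f`, `t`, `h` come from the note; nothing else is GHC code):

```haskell
import Control.Exception (SomeException, evaluate, try)

-- The example from Note [Lazy and unleashable free variables]:
-- calling h always evaluates the free variable x.
f :: Int -> Int -> Int
f x y = let t = x + 1
            h z = if z == 0 then t else
                  if z == 1 then x + 1 else
                  x + h (z - 1)
        in h y

main :: IO ()
main = do
  print (f 10 2)   -- defined arguments: fine
  -- With a bottom first argument, every branch of h diverges,
  -- showing that f is strict in x.
  r <- try (evaluate (f undefined 2)) :: IO (Either SomeException Int)
  putStrLn (either (const "f is strict in x") show r)
```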
+ + +************************************************************************ +* * +\subsection{Strictness signatures} +* * +************************************************************************ +-} + +type DFunFlag = Bool -- indicates if the lambda being considered is in the + -- sequence of lambdas at the top of the RHS of a dfun +notArgOfDfun :: DFunFlag +notArgOfDfun = False + +data AnalEnv + = AE { ae_dflags :: DynFlags + , ae_sigs :: SigEnv + , ae_virgin :: Bool -- True on first iteration only + -- See Note [Initialising strictness] + , ae_rec_tc :: RecTcChecker + , ae_fam_envs :: FamInstEnvs + } + + -- We use the se_env to tell us whether to + -- record info about a variable in the DmdEnv + -- We do so if it's a LocalId, but not top-level + -- + -- The DmdEnv gives the demand on the free vars of the function + -- when it is given enough args to satisfy the strictness signature + +type SigEnv = VarEnv (StrictSig, TopLevelFlag) + +instance Outputable AnalEnv where + ppr (AE { ae_sigs = env, ae_virgin = virgin }) + = text "AE" <+> braces (vcat + [ text "ae_virgin =" <+> ppr virgin + , text "ae_sigs =" <+> ppr env ]) + +emptyAnalEnv :: DynFlags -> FamInstEnvs -> AnalEnv +emptyAnalEnv dflags fam_envs + = AE { ae_dflags = dflags + , ae_sigs = emptySigEnv + , ae_virgin = True + , ae_rec_tc = initRecTc + , ae_fam_envs = fam_envs + } + +emptySigEnv :: SigEnv +emptySigEnv = emptyVarEnv + +-- | Extend an environment with the strictness IDs attached to the id +extendAnalEnvs :: TopLevelFlag -> AnalEnv -> [Id] -> AnalEnv +extendAnalEnvs top_lvl env vars + = env { ae_sigs = extendSigEnvs top_lvl (ae_sigs env) vars } + +extendSigEnvs :: TopLevelFlag -> SigEnv -> [Id] -> SigEnv +extendSigEnvs top_lvl sigs vars + = extendVarEnvList sigs [ (var, (idStrictness var, top_lvl)) | var <- vars] + +extendAnalEnv :: TopLevelFlag -> AnalEnv -> Id -> StrictSig -> AnalEnv +extendAnalEnv top_lvl env var sig + = env { ae_sigs = extendSigEnv top_lvl (ae_sigs env) var sig } + 
+extendSigEnv :: TopLevelFlag -> SigEnv -> Id -> StrictSig -> SigEnv +extendSigEnv top_lvl sigs var sig = extendVarEnv sigs var (sig, top_lvl) + +lookupSigEnv :: AnalEnv -> Id -> Maybe (StrictSig, TopLevelFlag) +lookupSigEnv env id = lookupVarEnv (ae_sigs env) id + +nonVirgin :: AnalEnv -> AnalEnv +nonVirgin env = env { ae_virgin = False } + +findBndrsDmds :: AnalEnv -> DmdType -> [Var] -> (DmdType, [Demand]) +-- Return the demands on the Ids in the [Var] +findBndrsDmds env dmd_ty bndrs + = go dmd_ty bndrs + where + go dmd_ty [] = (dmd_ty, []) + go dmd_ty (b:bs) + | isId b = let (dmd_ty1, dmds) = go dmd_ty bs + (dmd_ty2, dmd) = findBndrDmd env False dmd_ty1 b + in (dmd_ty2, dmd : dmds) + | otherwise = go dmd_ty bs + +findBndrDmd :: AnalEnv -> Bool -> DmdType -> Id -> (DmdType, Demand) +-- See Note [Trimming a demand to a type] in Demand.hs +findBndrDmd env arg_of_dfun dmd_ty id + = (dmd_ty', dmd') + where + dmd' = killUsageDemand (ae_dflags env) $ + strictify $ + trimToType starting_dmd (findTypeShape fam_envs id_ty) + + (dmd_ty', starting_dmd) = peelFV dmd_ty id + + id_ty = idType id + + strictify dmd + | gopt Opt_DictsStrict (ae_dflags env) + -- We never want to strictify a recursive let. At the moment + -- annotateBndr is only call for non-recursive lets; if that + -- changes, we need a RecFlag parameter and another guard here. + , not arg_of_dfun -- See Note [Do not strictify the argument dictionaries of a dfun] + = strictifyDictDmd id_ty dmd + | otherwise + = dmd + + fam_envs = ae_fam_envs env + +set_idStrictness :: AnalEnv -> Id -> StrictSig -> Id +set_idStrictness env id sig + = setIdStrictness id (killUsageSig (ae_dflags env) sig) + +{- Note [Initialising strictness] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +See section 9.2 (Finding fixpoints) of the paper. + +Our basic plan is to initialise the strictness of each Id in a +recursive group to "bottom", and find a fixpoint from there. 
However, +this group B might be inside an *enclosing* recursive group A, in +which case we'll do the entire fixpoint shebang for each iteration +of A. This can be illustrated by the following example: + +Example: + + f [] = [] + f (x:xs) = let g [] = f xs + g (y:ys) = y+1 : g ys + in g (h x) + +At each iteration of the fixpoint for f, the analyser has to find a +fixpoint for the enclosed function g. In the meantime, the demand +values for g at each iteration for f are *greater* than those we +encountered in the previous iteration for f. Therefore, we can begin +the fixpoint for g not with the bottom value but rather with the +result of the previous analysis. I.e., when beginning the fixpoint +process for g, we can start from the demand signature computed for g +previously and attached to the binding occurrence of g. + +To speed things up, we initialise each iteration of A (the enclosing +one) from the result of the last one, which is neatly recorded in each +binder. That way we make use of earlier iterations of the fixpoint +algorithm. (Cunning plan.) + +But on the *first* iteration we want to *ignore* the current strictness +of the Id, and start from "bottom". Nowadays the Id can have a current +strictness, because interface files record strictness for nested bindings. +To know when we are in the first iteration, we look at the ae_virgin +field of the AnalEnv. + + +Note [Final Demand Analyser run] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Some of the information that the demand analyser determines is not always +preserved by the simplifier. For example, the simplifier will happily rewrite + \y [Demand=1*U] let x = y in x + x +to + \y [Demand=1*U] y + y +which is quite a lie. + +The once-used information is (currently) only used by the code +generator, though. So: + + * We zap the used-once info in the worker-wrapper; + see Note [Zapping Used Once info in WorkWrap] in + GHC.Core.Op.WorkWrap. + If it's not reliable, it's better not to have it at all.
+ + * Just before TidyCore, we add a pass of the demand analyser, + but WITHOUT subsequent worker/wrapper and simplifier. + See SimplCore.getCoreToDo. + + This way, correct information finds its way into the module interface + (strictness signatures!) and the code generator (single-entry thunks!) + +Note that, in contrast, the single-call information (C1(..)) /can/ be +relied upon, as the simplifier tends to be very careful about not +duplicating actual function calls. + +Also see #11731. +-} diff --git a/compiler/GHC/Core/Op/Exitify.hs b/compiler/GHC/Core/Op/Exitify.hs new file mode 100644 index 0000000000..45f9451787 --- /dev/null +++ b/compiler/GHC/Core/Op/Exitify.hs @@ -0,0 +1,499 @@ +module GHC.Core.Op.Exitify ( exitifyProgram ) where + +{- +Note [Exitification] +~~~~~~~~~~~~~~~~~~~~ + +This module implements Exitification. The goal is to pull as much code out of +recursive functions as possible, as the simplifier is better at inlining into +call-sites that are not in recursive functions. + +Example: + + let t = foo bar + joinrec go 0 x y = t (x*x) + go (n-1) x y = jump go (n-1) (x+y) + in … + +We’d like to inline `t`, but that does not happen: because t is a thunk and is +used in a recursive function, doing so might lose sharing in general. In +this case, however, `t` is on the _exit path_ of `go`, so it is called at most once. +How do we make this clearly visible to the simplifier? + +A code path (i.e., an expression in a tail-recursive position) in a recursive +function is an exit path if it does not contain a recursive call. We can bind +this expression outside the recursive function, as a join-point. + +Example result: + + let t = foo bar + join exit x = t (x*x) + joinrec go 0 x y = jump exit x + go (n-1) x y = jump go (n-1) (x+y) + in … + +Now `t` is no longer in a recursive function, and good things happen!
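The before/after shapes of this transformation can be imitated with ordinary Haskell. Join points exist only in Core, so plain `where`-bound functions stand in for `joinrec`/`join` here, and `expensive` is a made-up placeholder for `foo bar`; the point is only that both versions compute the same thing while the exit path moves out of the recursive `go`.

```haskell
-- A source-level analogue of exitification (a sketch, not GHC code).
expensive :: Int -> Int
expensive = (+ 1)   -- placeholder for 'foo bar'

-- Before: the exit path 't * (x*x)' lives inside the recursive 'go'.
before :: Int -> Int -> Int -> Int
before t0 n0 x0 = go n0 x0 0
  where
    t = expensive t0
    go 0 x _ = t * (x * x)          -- exit path: no recursive call
    go n x y = go (n - 1) (x + y) y

-- After: the exit path is bound outside of 'go' as 'exit'.
after :: Int -> Int -> Int -> Int
after t0 n0 x0 = go n0 x0 0
  where
    t = expensive t0
    exit x = t * (x * x)            -- floated exit path, entered at most once
    go 0 x _ = exit x
    go n x y = go (n - 1) (x + y) y

main :: IO ()
main = print (before 1 3 2, after 1 3 2)   -- both compute the same result
```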
+-} + +import GhcPrelude +import Var +import Id +import IdInfo +import GHC.Core +import GHC.Core.Utils +import State +import Unique +import VarSet +import VarEnv +import GHC.Core.FVs +import FastString +import GHC.Core.Type +import Util( mapSnd ) + +import Data.Bifunctor +import Control.Monad + +-- | Traverses the AST, simply to find all joinrecs and call 'exitify' on them. +-- The really interesting function is exitifyRec +exitifyProgram :: CoreProgram -> CoreProgram +exitifyProgram binds = map goTopLvl binds + where + goTopLvl (NonRec v e) = NonRec v (go in_scope_toplvl e) + goTopLvl (Rec pairs) = Rec (map (second (go in_scope_toplvl)) pairs) + -- Top-level bindings are never join points + + in_scope_toplvl = emptyInScopeSet `extendInScopeSetList` bindersOfBinds binds + + go :: InScopeSet -> CoreExpr -> CoreExpr + go _ e@(Var{}) = e + go _ e@(Lit {}) = e + go _ e@(Type {}) = e + go _ e@(Coercion {}) = e + go in_scope (Cast e' c) = Cast (go in_scope e') c + go in_scope (Tick t e') = Tick t (go in_scope e') + go in_scope (App e1 e2) = App (go in_scope e1) (go in_scope e2) + + go in_scope (Lam v e') + = Lam v (go in_scope' e') + where in_scope' = in_scope `extendInScopeSet` v + + go in_scope (Case scrut bndr ty alts) + = Case (go in_scope scrut) bndr ty (map go_alt alts) + where + in_scope1 = in_scope `extendInScopeSet` bndr + go_alt (dc, pats, rhs) = (dc, pats, go in_scope' rhs) + where in_scope' = in_scope1 `extendInScopeSetList` pats + + go in_scope (Let (NonRec bndr rhs) body) + = Let (NonRec bndr (go in_scope rhs)) (go in_scope' body) + where + in_scope' = in_scope `extendInScopeSet` bndr + + go in_scope (Let (Rec pairs) body) + | is_join_rec = mkLets (exitifyRec in_scope' pairs') body' + | otherwise = Let (Rec pairs') body' + where + is_join_rec = any (isJoinId . 
fst) pairs + in_scope' = in_scope `extendInScopeSetList` bindersOf (Rec pairs) + pairs' = mapSnd (go in_scope') pairs + body' = go in_scope' body + + +-- | State Monad used inside `exitify` +type ExitifyM = State [(JoinId, CoreExpr)] + +-- | Given a recursive group of a joinrec, identifies “exit paths” and binds them as +-- join-points outside the joinrec. +exitifyRec :: InScopeSet -> [(Var,CoreExpr)] -> [CoreBind] +exitifyRec in_scope pairs + = [ NonRec xid rhs | (xid,rhs) <- exits ] ++ [Rec pairs'] + where + -- We need the set of free variables of many subexpressions here, so + -- annotate the AST with them + -- see Note [Calculating free variables] + ann_pairs = map (second freeVars) pairs + + -- Which are the recursive calls? + recursive_calls = mkVarSet $ map fst pairs + + (pairs',exits) = (`runState` []) $ do + forM ann_pairs $ \(x,rhs) -> do + -- go past the lambdas of the join point + let (args, body) = collectNAnnBndrs (idJoinArity x) rhs + body' <- go args body + let rhs' = mkLams args body' + return (x, rhs') + + --------------------- + -- 'go' is the main working function. + -- It goes through the RHS (tail-call positions only), + -- checks if there are no more recursive calls, if so, abstracts over + -- variables bound on the way and lifts it out as a join point. + -- + -- ExitifyM is a state monad to keep track of floated binds + go :: [Var] -- ^ Variables that are in-scope here, but + -- not in scope at the joinrec; that is, + -- we must potentially abstract over them. 
+ -- Invariant: they are kept in dependency order + -> CoreExprWithFVs -- ^ Current expression in tail position + -> ExitifyM CoreExpr + + -- We first look at the expression (no matter what its shape is) + -- and determine if we can turn it into an exit join point + go captured ann_e + | -- An exit expression has no recursive calls + let fvs = dVarSetToVarSet (freeVarsOf ann_e) + , disjointVarSet fvs recursive_calls + = go_exit captured (deAnnotate ann_e) fvs + + -- We could not turn it into an exit join point. So now recurse + -- into all expressions where eligible exit join points might sit, + -- i.e. into all tail-call positions: + + -- Case right hand sides are in tail-call position + go captured (_, AnnCase scrut bndr ty alts) = do + alts' <- forM alts $ \(dc, pats, rhs) -> do + rhs' <- go (captured ++ [bndr] ++ pats) rhs + return (dc, pats, rhs') + return $ Case (deAnnotate scrut) bndr ty alts' + + go captured (_, AnnLet ann_bind body) + -- join point, RHS and body are in tail-call position + | AnnNonRec j rhs <- ann_bind + , Just join_arity <- isJoinId_maybe j + = do let (params, join_body) = collectNAnnBndrs join_arity rhs + join_body' <- go (captured ++ params) join_body + let rhs' = mkLams params join_body' + body' <- go (captured ++ [j]) body + return $ Let (NonRec j rhs') body' + + -- rec join point, RHSs and body are in tail-call position + | AnnRec pairs <- ann_bind + , isJoinId (fst (head pairs)) + = do let js = map fst pairs + pairs' <- forM pairs $ \(j,rhs) -> do + let join_arity = idJoinArity j + (params, join_body) = collectNAnnBndrs join_arity rhs + join_body' <- go (captured ++ js ++ params) join_body + let rhs' = mkLams params join_body' + return (j, rhs') + body' <- go (captured ++ js) body + return $ Let (Rec pairs') body' + + -- normal Let, only the body is in tail-call position + | otherwise + = do body' <- go (captured ++ bindersOf bind ) body + return $ Let bind body' + where bind = deAnnBind ann_bind + + -- Cannot be turned into an exit
join point, but also has no + -- tail-call subexpression. Nothing to do here. + go _ ann_e = return (deAnnotate ann_e) + + --------------------- + go_exit :: [Var] -- Variables captured locally + -> CoreExpr -- An exit expression + -> VarSet -- Free vars of the expression + -> ExitifyM CoreExpr + -- go_exit deals with a tail expression that is floatable + -- out as an exit point; that is, it mentions no recursive calls + go_exit captured e fvs + -- Do not touch an expression that is already a join jump where all arguments + -- are captured variables. See Note [Idempotency] + -- But _do_ float join jumps with interesting arguments. + -- See Note [Jumps can be interesting] + | (Var f, args) <- collectArgs e + , isJoinId f + , all isCapturedVarArg args + = return e + + -- Do not touch a boring expression (see Note [Interesting expression]) + | not is_interesting + = return e + + -- Cannot float out if local join points are used, as + -- we cannot abstract over them + | captures_join_points + = return e + + -- We have something to float out! 
+ | otherwise + = do { -- Assemble the RHS of the exit join point + let rhs = mkLams abs_vars e + avoid = in_scope `extendInScopeSetList` captured + -- Remember this binding under a suitable name + ; v <- addExit avoid (length abs_vars) rhs + -- And jump to it from here + ; return $ mkVarApps (Var v) abs_vars } + + where + -- Used to detect exit expressions that are already proper exit jumps + isCapturedVarArg (Var v) = v `elem` captured + isCapturedVarArg _ = False + + -- An interesting exit expression has free, non-imported + -- variables from outside the recursive group + -- See Note [Interesting expression] + is_interesting = anyVarSet isLocalId $ + fvs `minusVarSet` mkVarSet captured + + -- The arguments of this exit join point + -- See Note [Picking arguments to abstract over] + abs_vars = snd $ foldr pick (fvs, []) captured + where + pick v (fvs', acc) | v `elemVarSet` fvs' = (fvs' `delVarSet` v, zap v : acc) + | otherwise = (fvs', acc) + + -- We are going to abstract over these variables, so we must + -- zap any IdInfo they have; see #15005 + -- cf. GHC.Core.Op.SetLevels.abstractVars + zap v | isId v = setIdInfo v vanillaIdInfo + | otherwise = v + + -- We cannot abstract over join points + captures_join_points = any isJoinId abs_vars + + +-- Picks a new unique, which is disjoint from +-- * the free variables of the whole joinrec +-- * any bound variables (captured) +-- * any exit join points created so far. 
+mkExitJoinId :: InScopeSet -> Type -> JoinArity -> ExitifyM JoinId +mkExitJoinId in_scope ty join_arity = do + fs <- get + let avoid = in_scope `extendInScopeSetList` (map fst fs) + `extendInScopeSet` exit_id_tmpl -- just cosmetics + return (uniqAway avoid exit_id_tmpl) + where + exit_id_tmpl = mkSysLocal (fsLit "exit") initExitJoinUnique ty + `asJoinId` join_arity + +addExit :: InScopeSet -> JoinArity -> CoreExpr -> ExitifyM JoinId +addExit in_scope join_arity rhs = do + -- Pick a suitable name + let ty = exprType rhs + v <- mkExitJoinId in_scope ty join_arity + fs <- get + put ((v,rhs):fs) + return v + +{- +Note [Interesting expression] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We do not want this to happen: + + joinrec go 0 x y = x + go (n-1) x y = jump go (n-1) (x+y) + in … +==> + join exit x = x + joinrec go 0 x y = jump exit x + go (n-1) x y = jump go (n-1) (x+y) + in … + +because the floated exit path (`x`) is simply a parameter of `go`; no useful +interactions are exposed this way. + +Neither do we want this to happen + + joinrec go 0 x y = x+x + go (n-1) x y = jump go (n-1) (x+y) + in … +==> + join exit x = x+x + joinrec go 0 x y = jump exit x + go (n-1) x y = jump go (n-1) (x+y) + in … + +where the floated expression `x+x` is a bit more complicated, but still not +interesting. + +Expressions are interesting when they move an occurrence of a variable outside +the recursive `go` that can benefit from being obviously called once, for example: + * a local thunk that can then be inlined (see example in Note [Exitification]) + * the parameter of a function, where the demand analyser can then + see that it is called at most once, and hence improve the function’s + strictness signature + +So we only hoist an exit expression out if it mentions at least one free, +non-imported variable.
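The "interesting" test above boils down to: after removing the captured variables, does the expression still mention a free local (non-imported) variable? A hedged toy version over plain sets (the `V` type here is made up, standing in for GHC's `Var` together with `isLocalId`):

```haskell
import qualified Data.Set as Set

-- Toy variable: a name plus whether it is a local (not imported) Id.
data V = V { vName :: String, vIsLocal :: Bool }
  deriving (Eq, Ord, Show)

-- Mirrors: is_interesting = anyVarSet isLocalId
--                             (fvs `minusVarSet` mkVarSet captured)
isInteresting :: Set.Set V -> [V] -> Bool
isInteresting fvs captured =
  any vIsLocal (Set.toList (fvs `Set.difference` Set.fromList captured))

main :: IO ()
main = do
  let x = V "x" True     -- parameter of go, captured
      t = V "t" True     -- local thunk from outside the joinrec
      f = V "f" False    -- imported function
  print (isInteresting (Set.fromList [x])    [x])  -- a bare parameter: boring
  print (isInteresting (Set.fromList [t, x]) [x])  -- mentions local 't': interesting
  print (isInteresting (Set.fromList [f])    [])   -- only imported vars: boring
```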
+ +Note [Jumps can be interesting] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +A jump to a join point can be interesting, if its arguments contain free +non-exported variables (z in the following example): + + joinrec go 0 x y = jump j (x+z) + go (n-1) x y = jump go (n-1) (x+y) + in … +==> + join exit x = jump j (x+z) + joinrec go 0 x y = jump exit x + go (n-1) x y = jump go (n-1) (x+y) + + +The join point itself can be interesting, even if none of its +arguments mentions a variable free in the joinrec. For example + + join j p = case p of (x,y) -> x+y + joinrec go 0 x y = jump j (x,y) + go (n-1) x y = jump go (n-1) (x+y) y + in … + +Here, `j` would not be inlined because we do not inline something that looks +like an exit join point (see Note [Do not inline exit join points]). But +if we exitify the 'jump j (x,y)' we get + + join j p = case p of (x,y) -> x+y + join exit x y = jump j (x,y) + joinrec go 0 x y = jump exit x y + go (n-1) x y = jump go (n-1) (x+y) y + in … + +and now 'j' can inline, and we get rid of the pair. Here's another +example (assume `g` to be an imported function that, on its own, +does not make this interesting): + + join j y = map f y + joinrec go 0 x y = jump j (map g x) + go (n-1) x y = jump go (n-1) (x+y) + in … + +Again, `j` would not be inlined because we do not inline something that looks +like an exit join point (see Note [Do not inline exit join points]). + +But after exitification we have + + join j y = map f y + join exit x = jump j (map g x) + joinrec go 0 x y = jump exit x + go (n-1) x y = jump go (n-1) (x+y) + in … + +and now we can inline `j` and this will allow `map/map` to fire.
+ + +Note [Idempotency] +~~~~~~~~~~~~~~~~~~ + +We do not want this to happen, where we replace the floated expression with +essentially the same expression: + + join exit x = t (x*x) + joinrec go 0 x y = jump exit x + go (n-1) x y = jump go (n-1) (x+y) + in … +==> + join exit x = t (x*x) + join exit' x = jump exit x + joinrec go 0 x y = jump exit' x + go (n-1) x y = jump go (n-1) (x+y) + in … + +So when the RHS is a join jump, and all of its arguments are captured variables, +then we leave it in place. + +Note that `jump exit x` in this example looks interesting, as `exit` is a free +variable. Therefore, idempotency does not simply follow from floating only +interesting expressions. + +Note [Calculating free variables] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We have two options where to annotate the tree with free variables: + + A) The whole tree. + B) Each individual joinrec as we come across it. + +Downside of A: We pay the price on the whole module, even outside any joinrecs. +Downside of B: We pay the price per joinrec, possibly multiple times when +joinrecs are nested. + +Further downside of A: If the exitify function returns annotated expressions, +it would have to ensure that the annotations are correct. + +We therefore choose B, and calculate the free variables in `exitify`. + + +Note [Do not inline exit join points] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +When we have + + let t = foo bar + join exit x = t (x*x) + joinrec go 0 x y = jump exit x + go (n-1) x y = jump go (n-1) (x+y) + in … + +we do not want the simplifier to simply inline `exit` back in (which it happily +would). + +To prevent this, we need to recognize exit join points, and then disable +inlining. + +Exit join points are join points that occur in a recursive group; they can be +recognized (after the occurrence analyser has run!) using `isExitJoinId`.
This function detects join points with `occ_in_lam (idOccInfo id) == True`, +because the lambdas of a non-recursive join point are not considered for +`occ_in_lam`. For example, in the following code, `j1` is /not/ marked +occ_in_lam, because `j2` is called only once. + + join j1 x = x+1 + join j2 y = join j1 (y+2) + +To prevent inlining, we check for isExitJoinId +* In `preInlineUnconditionally` directly. +* In `simplLetUnfolding` we simply give exit join points no unfolding, which + prevents inlining in `postInlineUnconditionally` and call sites. + +Note [Placement of the exitification pass] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +I (Joachim) experimented with multiple positions for the Exitification pass in +the Core2Core pipeline: + + A) Before the `simpl_phases` + B) Between the `simpl_phases` and the "main" simplifier pass + C) After demand_analyser + D) Before the final simplification phase + +Here is the table (this is without inlining join exit points in the final +simplifier run): + + Program | Allocs | Instrs + | ABCD.log A.log B.log C.log D.log | ABCD.log A.log B.log C.log D.log +----------------|---------------------------------------------------|------------------------------------------------- + fannkuch-redux | -99.9% +0.0% -99.9% -99.9% -99.9% | -3.9% +0.5% -3.0% -3.9% -3.9% + fasta | -0.0% +0.0% +0.0% -0.0% -0.0% | -8.5% +0.0% +0.0% -0.0% -8.5% + fem | 0.0% 0.0% 0.0% 0.0% +0.0% | -2.2% -0.1% -0.1% -2.1% -2.1% + fish | 0.0% 0.0% 0.0% 0.0% +0.0% | -3.1% +0.0% -1.1% -1.1% -0.0% + k-nucleotide | -91.3% -91.0% -91.0% -91.3% -91.3% | -6.3% +11.4% +11.4% -6.3% -6.2% + scs | -0.0% -0.0% -0.0% -0.0% -0.0% | -3.4% -3.0% -3.1% -3.3% -3.3% + simple | -6.0% 0.0% -6.0% -6.0% +0.0% | -3.4% +0.0% -5.2% -3.4% -0.1% + spectral-norm | -0.0% 0.0% 0.0% -0.0% +0.0% | -2.7% +0.0% -2.7% -5.4% -5.4% +----------------|---------------------------------------------------|------------------------------------------------- + Min | -95.0% -91.0% -95.0% -95.0% -95.0% | -8.5%
-3.0% -5.2% -6.3% -8.5% + Max | +0.2% +0.2% +0.2% +0.2% +1.5% | +0.4% +11.4% +11.4% +0.4% +1.5% + Geometric Mean | -4.7% -2.1% -4.7% -4.7% -4.6% | -0.4% +0.1% -0.1% -0.3% -0.2% + +Position A is disqualified, as it does not get rid of the allocations in +fannkuch-redux. +Positions A and B are disqualified because they increase instructions in k-nucleotide. +Positions C and D have their advantages: C decreases allocations in simple, while D decreases instructions in fasta. + +Assuming we have a budget of _one_ run of Exitification, then C wins (but we +could get more from running it multiple times, as seen in fish). + +Note [Picking arguments to abstract over] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +When we create an exit join point, we need to abstract over those of its +free variables that are out of scope at the destination of the exit join +point. So we go through the list `captured` and pick those that are actually +free variables of the join point. + +We do not just `filter (`elemVarSet` fvs) captured`, as there might be +shadowing, and `captured` may contain multiple variables with the same Unique. In +these cases we want to abstract only over the last occurrence, hence the `foldr` +(with emphasis on the `r`). This is #15110. + +-} diff --git a/compiler/GHC/Core/Op/FloatIn.hs b/compiler/GHC/Core/Op/FloatIn.hs new file mode 100644 index 0000000000..ac4ef8088e --- /dev/null +++ b/compiler/GHC/Core/Op/FloatIn.hs @@ -0,0 +1,772 @@ +{- +(c) The GRASP/AQUA Project, Glasgow University, 1992-1998 + +************************************************************************ +* * +\section[FloatIn]{Floating Inwards pass} +* * +************************************************************************ + +The main purpose of @floatInwards@ is floating into branches of a +case, so that we don't allocate things, save them on the stack, and +then discover that they aren't needed in the chosen branch.
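At source level the effect looks roughly like this (a hedged illustration; GHC performs float-in on Core, not Haskell source, and laziness means `v` is never *evaluated* on the `True` path either way — the win is not building the thunk's closure at all):

```haskell
-- Before float-in: 'v' is allocated as a thunk on every call,
-- even when the True branch never touches it.
beforeFI :: Bool -> Int -> Int
beforeFI b x =
  let v = x * x
  in case b of
       True  -> x
       False -> v + 1

-- After float-in: the binding lives only in the branch that uses it,
-- so no closure is built on the True path.
afterFI :: Bool -> Int -> Int
afterFI b x =
  case b of
    True  -> x
    False -> let v = x * x in v + 1

main :: IO ()
main = print ( beforeFI True 5, afterFI True 5
             , beforeFI False 5, afterFI False 5 )
```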
+-} + +{-# LANGUAGE CPP #-} +{-# OPTIONS_GHC -fprof-auto #-} +{-# OPTIONS_GHC -Wno-incomplete-uni-patterns #-} + +module GHC.Core.Op.FloatIn ( floatInwards ) where + +#include "HsVersions.h" + +import GhcPrelude + +import GHC.Core +import GHC.Core.Make hiding ( wrapFloats ) +import GHC.Driver.Types ( ModGuts(..) ) +import GHC.Core.Utils +import GHC.Core.FVs +import GHC.Core.Op.Monad ( CoreM ) +import Id ( isOneShotBndr, idType, isJoinId, isJoinId_maybe ) +import Var +import GHC.Core.Type +import VarSet +import Util +import GHC.Driver.Session +import Outputable +-- import Data.List ( mapAccumL ) +import BasicTypes ( RecFlag(..), isRec ) + +{- +Top-level interface function, @floatInwards@. Note that we do not +actually float any bindings downwards from the top-level. +-} + +floatInwards :: ModGuts -> CoreM ModGuts +floatInwards pgm@(ModGuts { mg_binds = binds }) + = do { dflags <- getDynFlags + ; return (pgm { mg_binds = map (fi_top_bind dflags) binds }) } + where + fi_top_bind dflags (NonRec binder rhs) + = NonRec binder (fiExpr dflags [] (freeVars rhs)) + fi_top_bind dflags (Rec pairs) + = Rec [ (b, fiExpr dflags [] (freeVars rhs)) | (b, rhs) <- pairs ] + + +{- +************************************************************************ +* * +\subsection{Mail from Andr\'e [edited]} +* * +************************************************************************ + +{\em Will wrote: What??? I thought the idea was to float as far +inwards as possible, no matter what. This is dropping all bindings +every time it sees a lambda of any kind. Help! } + +You are assuming we DO DO full laziness AFTER floating inwards! We +have to [not float inside lambdas] if we don't. + +If we indeed do full laziness after the floating inwards (we could +check the compilation flags for that) then I agree we could be more +aggressive and do float inwards past lambdas. + +Actually we are not doing a proper full laziness (see below), which +was another reason for not floating inwards past a lambda. 
+ +This can easily be fixed. The problem is that we float lets outwards, +but there are a few expressions which are not let bound, like case +scrutinees and case alternatives. After floating inwards the +simplifier could decide to inline the let and the laziness would be +lost, e.g. + +\begin{verbatim} +let a = expensive ==> \b -> case expensive of ... +in \ b -> case a of ... +\end{verbatim} +The fix is +\begin{enumerate} +\item +to let bind the algebraic case scrutinees (done, I think) and +the case alternatives (except the ones with an +unboxed type)(not done, I think). This is best done in the +GHC.Core.Op.SetLevels.hs module, which tags things with their level numbers. +\item +do the full laziness pass (floating lets outwards). +\item +simplify. The simplifier inlines the (trivial) lets that were + created but were not floated outwards. +\end{enumerate} + +With the fix I think Will's suggestion that we can gain even more from +strictness by floating inwards past lambdas makes sense. + +We still gain even without going past lambdas, as things may be +strict in the (new) context of a branch (where it was floated to) or +of a let rhs, e.g. +\begin{verbatim} +let a = something case x of +in case x of alt1 -> case something of a -> a + a + alt1 -> a + a ==> alt2 -> b + alt2 -> b + +let a = something let b = case something of a -> a + a +in let b = a + a ==> in (b,b) +in (b,b) +\end{verbatim} +Also, even if a is not found to be strict in the new context and is +still left as a let, if the branch is not taken (or b is not entered) +the closure for a is not built. + +************************************************************************ +* * +\subsection{Main floating-inwards code} +* * +************************************************************************ +-} + +type FreeVarSet = DIdSet +type BoundVarSet = DIdSet + +data FloatInBind = FB BoundVarSet FreeVarSet FloatBind + -- The FreeVarSet is the free variables of the binding. 
In the case + -- of recursive bindings, the set doesn't include the bound + -- variables. + +type FloatInBinds = [FloatInBind] + -- In reverse dependency order (innermost binder first) + +fiExpr :: DynFlags + -> FloatInBinds -- Binds we're trying to drop + -- as far "inwards" as possible + -> CoreExprWithFVs -- Input expr + -> CoreExpr -- Result + +fiExpr _ to_drop (_, AnnLit lit) = wrapFloats to_drop (Lit lit) + -- See Note [Dead bindings] +fiExpr _ to_drop (_, AnnType ty) = ASSERT( null to_drop ) Type ty +fiExpr _ to_drop (_, AnnVar v) = wrapFloats to_drop (Var v) +fiExpr _ to_drop (_, AnnCoercion co) = wrapFloats to_drop (Coercion co) +fiExpr dflags to_drop (_, AnnCast expr (co_ann, co)) + = wrapFloats (drop_here ++ co_drop) $ + Cast (fiExpr dflags e_drop expr) co + where + [drop_here, e_drop, co_drop] + = sepBindsByDropPoint dflags False + [freeVarsOf expr, freeVarsOfAnn co_ann] + to_drop + +{- +Applications: we do float inside applications, mainly because we +need to get at all the arguments. The next simplifier run will +pull out any silly ones. 
+-} + +fiExpr dflags to_drop ann_expr@(_,AnnApp {}) + = wrapFloats drop_here $ wrapFloats extra_drop $ + mkTicks ticks $ + mkApps (fiExpr dflags fun_drop ann_fun) + (zipWith (fiExpr dflags) arg_drops ann_args) + where + (ann_fun, ann_args, ticks) = collectAnnArgsTicks tickishFloatable ann_expr + fun_ty = exprType (deAnnotate ann_fun) + fun_fvs = freeVarsOf ann_fun + arg_fvs = map freeVarsOf ann_args + + (drop_here : extra_drop : fun_drop : arg_drops) + = sepBindsByDropPoint dflags False + (extra_fvs : fun_fvs : arg_fvs) + to_drop + -- Shortcut behaviour: if to_drop is empty, + -- sepBindsByDropPoint returns a suitable bunch of empty + -- lists without evaluating extra_fvs, and hence without + -- peering into each argument + + (_, extra_fvs) = foldl' add_arg (fun_ty, extra_fvs0) ann_args + extra_fvs0 = case ann_fun of + (_, AnnVar _) -> fun_fvs + _ -> emptyDVarSet + -- Don't float the binding for f into f x y z; see Note [Join points] + -- for why we *can't* do it when f is a join point. (If f isn't a + -- join point, floating it in isn't especially harmful but it's + -- useless since the simplifier will immediately float it back out.) + + add_arg :: (Type,FreeVarSet) -> CoreExprWithFVs -> (Type,FreeVarSet) + add_arg (fun_ty, extra_fvs) (_, AnnType ty) + = (piResultTy fun_ty ty, extra_fvs) + + add_arg (fun_ty, extra_fvs) (arg_fvs, arg) + | noFloatIntoArg arg arg_ty + = (res_ty, extra_fvs `unionDVarSet` arg_fvs) + | otherwise + = (res_ty, extra_fvs) + where + (arg_ty, res_ty) = splitFunTy fun_ty + +{- Note [Dead bindings] +~~~~~~~~~~~~~~~~~~~~~~~ +At a literal we won't usually have any floated bindings; the +only way that can happen is if the binding wrapped the literal +/in the original input program/. e.g. + case x of { DEFAULT -> 1# } +But, while this may be unusual it is not actually wrong, and it did +once happen (#15696). 
+ +Note [Do not destroy the let/app invariant] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Watch out for + f (x +# y) +We don't want to float bindings into here + f (case ... of { x -> x +# y }) +because that might destroy the let/app invariant, which requires +unlifted function arguments to be ok-for-speculation. + +Note [Join points] +~~~~~~~~~~~~~~~~~~ +Generally, we don't need to worry about join points - there are places we're +not allowed to float them, but since they can't have occurrences in those +places, we're not tempted. + +We do need to be careful about jumps, however: + + joinrec j x y z = ... in + jump j a b c + +Previous versions often floated the definition of a recursive function into its +only non-recursive occurrence. But for a join point, this is a disaster: + + (joinrec j x y z = ... in + jump j) a b c -- wrong! + +Every jump must be exact, so the jump to j must have three arguments. Hence +we're careful not to float into the target of a jump (though we can float into +the arguments just fine). + +Note [Floating in past a lambda group] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +* We must be careful about floating inside a value lambda. + That risks losing laziness. + The float-out pass might rescue us, but then again it might not. + +* We must be careful about type lambdas too. At one time we did, and + there is no risk of duplicating work thereby, but we do need to be + careful. In particular, here is a bad case (it happened in the + cichelli benchmark: + let v = ... + in let f = /\t -> \a -> ... + ==> + let f = /\t -> let v = ... in \a -> ... + This is bad as now f is an updatable closure (update PAP) + and has arity 0. + +* Hack alert! We only float in through one-shot lambdas, + not (as you might guess) through lone big lambdas. + Reason: we float *out* past big lambdas (see the test in the Lam + case of FloatOut.floatExpr) and we don't want to float straight + back in again. 
+ + It *is* important to float into one-shot lambdas, however; + see the remarks with noFloatIntoRhs. + +So we treat lambda in groups, using the following rule: + + Float in if (a) there is at least one Id, + and (b) there are no non-one-shot Ids + + Otherwise drop all the bindings outside the group. + +This is what the 'go' function in the AnnLam case is doing. + +(Join points are handled similarly: a join point is considered one-shot iff +it's non-recursive, so we float only into non-recursive join points.) + +Urk! if all are tyvars, and we don't float in, we may miss an + opportunity to float inside a nested case branch + + +Note [Floating coercions] +~~~~~~~~~~~~~~~~~~~~~~~~~ +We could, in principle, have a coercion binding like + case f x of co { DEFAULT -> e1 e2 } +It's not common to have a function that returns a coercion, but nothing +in Core prohibits it. If so, 'co' might be mentioned in e1 or e2 +/only in a type/. E.g. suppose e1 was + let (x :: Int |> co) = blah in blah2 + + +But, with coercions appearing in types, there is a complication: we +might be floating in a "strict let" -- that is, a case. Case expressions +mention their return type. We absolutely can't float a coercion binding +inward to the point that the type of the expression it's about to wrap +mentions the coercion. So we include the union of the sets of free variables +of the types of all the drop points involved. If any of the floaters +bind a coercion variable mentioned in any of the types, that binder must +be dropped right away. + +-} + +fiExpr dflags to_drop lam@(_, AnnLam _ _) + | noFloatIntoLam bndrs -- Dump it all here + -- NB: Must line up with noFloatIntoRhs (AnnLam...); see #7088 + = wrapFloats to_drop (mkLams bndrs (fiExpr dflags [] body)) + + | otherwise -- Float inside + = mkLams bndrs (fiExpr dflags to_drop body) + + where + (bndrs, body) = collectAnnBndrs lam + +{- +We don't float lets inwards past an SCC. 
+ ToDo: keep info on current cc, and when passing + one, if it is not the same, annotate all lets in binds with current + cc, change current cc to the new one and float binds into expr. +-} + +fiExpr dflags to_drop (_, AnnTick tickish expr) + | tickish `tickishScopesLike` SoftScope + = Tick tickish (fiExpr dflags to_drop expr) + + | otherwise -- Wimp out for now - we could push values in + = wrapFloats to_drop (Tick tickish (fiExpr dflags [] expr)) + +{- +For @Lets@, the possible ``drop points'' for the \tr{to_drop} +bindings are: (a)~in the body, (b1)~in the RHS of a NonRec binding, +or~(b2), in each of the RHSs of the pairs of a @Rec@. + +Note that we do {\em weird things} with this let's binding. Consider: +\begin{verbatim} +let + w = ... +in { + let v = ... w ... + in ... v .. w ... +} +\end{verbatim} +Look at the inner \tr{let}. As \tr{w} is used in both the bind and +body of the inner let, we could panic and leave \tr{w}'s binding where +it is. But \tr{v} is floatable further into the body of the inner let, and +{\em then} \tr{w} will also be only in the body of that inner let. + +So: rather than drop \tr{w}'s binding here, we add it onto the list of +things to drop in the outer let's body, and let nature take its +course. + +Note [extra_fvs (1): avoid floating into RHS] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Consider let x=\y....t... in body. We do not necessarily want to float +a binding for t into the RHS, because it'll immediately be floated out +again. (It won't go inside the lambda else we risk losing work.) +In letrec, we need to be more careful still. We don't want to transform + let x# = y# +# 1# + in + letrec f = \z. ...x#...f... + in ... +into + letrec f = let x# = y# +# 1# in \z. ...x#...f... in ... +because now we can't float the let out again, because a letrec +can't have unboxed bindings. + +So we make "extra_fvs" which is the rhs_fvs of such bindings, and +arrange to dump bindings that bind extra_fvs before the entire let. 
+ +Note [extra_fvs (2): free variables of rules] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Consider + let x{rule mentioning y} = rhs in body +Here y is not free in rhs or body; but we still want to dump bindings +that bind y outside the let. So we augment extra_fvs with the +idRuleAndUnfoldingVars of x. No need for type variables, hence not using +idFreeVars. +-} + +fiExpr dflags to_drop (_,AnnLet bind body) + = fiExpr dflags (after ++ new_float : before) body + -- to_drop is in reverse dependency order + where + (before, new_float, after) = fiBind dflags to_drop bind body_fvs + body_fvs = freeVarsOf body + +{- Note [Floating primops] +~~~~~~~~~~~~~~~~~~~~~~~~~~ +We try to float-in a case expression over an unlifted type. The +motivating example was #5658: in particular, this change allows +array indexing operations, which have a single DEFAULT alternative +without any binders, to be floated inward. + +SIMD primops for unpacking SIMD vectors into an unboxed tuple of unboxed +scalars also need to be floated inward, but unpacks have a single non-DEFAULT +alternative that binds the elements of the tuple. We now therefore also support +floating in cases with a single alternative that may bind values. + +But there are wrinkles + +* Which unlifted cases do we float? See PrimOp.hs + Note [PrimOp can_fail and has_side_effects] which explains: + - We can float-in can_fail primops, but we can't float them out. + - But we can float a has_side_effects primop, but NOT inside a lambda, + so for now we don't float them at all. + Hence exprOkForSideEffects + +* Because we can float can-fail primops (array indexing, division) inwards + but not outwards, we must be careful not to transform + case a /# b of r -> f (F# r) + ===> + f (case a /# b of r -> F# r) + because that creates a new thunk that wasn't there before. And + because it can't be floated out (can_fail), the thunk will stay + there. Disaster! (This happened in nofib 'simple' and 'scs'.) 
+
+  Solution: only float cases into the branches of other cases, and
+  not into the arguments of an application, or the RHS of a let. This
+  is somewhat conservative, but it's simple.  And it still hits the
+  cases like #5658.  This is implemented in sepBindsByDropPoint;
+  if is_case is False we dump all floating cases right here.
+
+* #14511 is another example of why we want to restrict float-in
+  of case-expressions.  Consider
+     case indexArray# a n of (# r #) -> writeArray# ma i (f r)
+  Now, floating that indexing operation into the (f r) thunk will
+  not create any new thunks, but it will keep the array 'a' alive
+  for much longer than the programmer expected.
+
+  So again, not floating a case into a let or argument seems like
+  the Right Thing.
+
+For @Case@, the possible drop points for the 'to_drop'
+bindings are:
+  (a) inside the scrutinee
+  (b) inside one of the alternatives/default (default FVs always /first/!).
+
+-}
+
+fiExpr dflags to_drop (_, AnnCase scrut case_bndr _ [(con,alt_bndrs,rhs)])
+  | isUnliftedType (idType case_bndr)
+  , exprOkForSideEffects (deAnnotate scrut)
+      -- See Note [Floating primops]
+  = wrapFloats shared_binds $
+    fiExpr dflags (case_float : rhs_binds) rhs
+  where
+    case_float = FB (mkDVarSet (case_bndr : alt_bndrs)) scrut_fvs
+                    (FloatCase scrut' case_bndr con alt_bndrs)
+    scrut' = fiExpr dflags scrut_binds scrut
+    rhs_fvs   = freeVarsOf rhs `delDVarSetList` (case_bndr : alt_bndrs)
+    scrut_fvs = freeVarsOf scrut
+
+    [shared_binds, scrut_binds, rhs_binds]
+       = sepBindsByDropPoint dflags False
+           [scrut_fvs, rhs_fvs]
+           to_drop
+
+fiExpr dflags to_drop (_, AnnCase scrut case_bndr ty alts)
+  = wrapFloats drop_here1 $
+    wrapFloats drop_here2 $
+    Case (fiExpr dflags scrut_drops scrut) case_bndr ty
+         (zipWith fi_alt alts_drops_s alts)
+  where
+        -- Float into the scrut and alts-considered-together just like App
+    [drop_here1, scrut_drops, alts_drops]
+       = sepBindsByDropPoint dflags False
+           [scrut_fvs, all_alts_fvs]
+           to_drop
+
+        -- Float into the 
alts with the is_case flag set + (drop_here2 : alts_drops_s) + | [ _ ] <- alts = [] : [alts_drops] + | otherwise = sepBindsByDropPoint dflags True alts_fvs alts_drops + + scrut_fvs = freeVarsOf scrut + alts_fvs = map alt_fvs alts + all_alts_fvs = unionDVarSets alts_fvs + alt_fvs (_con, args, rhs) + = foldl' delDVarSet (freeVarsOf rhs) (case_bndr:args) + -- Delete case_bndr and args from free vars of rhs + -- to get free vars of alt + + fi_alt to_drop (con, args, rhs) = (con, args, fiExpr dflags to_drop rhs) + +------------------ +fiBind :: DynFlags + -> FloatInBinds -- Binds we're trying to drop + -- as far "inwards" as possible + -> CoreBindWithFVs -- Input binding + -> DVarSet -- Free in scope of binding + -> ( FloatInBinds -- Land these before + , FloatInBind -- The binding itself + , FloatInBinds) -- Land these after + +fiBind dflags to_drop (AnnNonRec id ann_rhs@(rhs_fvs, rhs)) body_fvs + = ( extra_binds ++ shared_binds -- Land these before + -- See Note [extra_fvs (1,2)] + , FB (unitDVarSet id) rhs_fvs' -- The new binding itself + (FloatLet (NonRec id rhs')) + , body_binds ) -- Land these after + + where + body_fvs2 = body_fvs `delDVarSet` id + + rule_fvs = bndrRuleAndUnfoldingVarsDSet id -- See Note [extra_fvs (2): free variables of rules] + extra_fvs | noFloatIntoRhs NonRecursive id rhs + = rule_fvs `unionDVarSet` rhs_fvs + | otherwise + = rule_fvs + -- See Note [extra_fvs (1): avoid floating into RHS] + -- No point in floating in only to float straight out again + -- We *can't* float into ok-for-speculation unlifted RHSs + -- But do float into join points + + [shared_binds, extra_binds, rhs_binds, body_binds] + = sepBindsByDropPoint dflags False + [extra_fvs, rhs_fvs, body_fvs2] + to_drop + + -- Push rhs_binds into the right hand side of the binding + rhs' = fiRhs dflags rhs_binds id ann_rhs + rhs_fvs' = rhs_fvs `unionDVarSet` floatedBindsFVs rhs_binds `unionDVarSet` rule_fvs + -- Don't forget the rule_fvs; the binding mentions them! 
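
The "Land these before / Land these after" split relies on `FloatInBinds` being kept in reverse dependency order (innermost binder first), which is also what `wrapFloats` further down in this module assumes. Here is a tiny, self-contained sketch of why that orientation wraps correctly, with strings standing in for real bindings (the `Toy*` names are illustrative only, not GHC's):

```haskell
-- Floats are kept innermost-first; wrapping walks the list and puts each
-- successive binding *outside* the expression built so far, so the
-- outermost binder ends up on the outside, as required.
newtype ToyFloat = ToyFloat String

toyWrapFloats :: [ToyFloat] -> String -> String
toyWrapFloats []                e = e
toyWrapFloats (ToyFloat b : bs) e =
  toyWrapFloats bs ("let " ++ b ++ " in (" ++ e ++ ")")

-- toyWrapFloats [ToyFloat "y = x + 1", ToyFloat "x = 0"] "y"
--   == "let x = 0 in (let y = x + 1 in (y))"
```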
+ +fiBind dflags to_drop (AnnRec bindings) body_fvs + = ( extra_binds ++ shared_binds + , FB (mkDVarSet ids) rhs_fvs' + (FloatLet (Rec (fi_bind rhss_binds bindings))) + , body_binds ) + where + (ids, rhss) = unzip bindings + rhss_fvs = map freeVarsOf rhss + + -- See Note [extra_fvs (1,2)] + rule_fvs = mapUnionDVarSet bndrRuleAndUnfoldingVarsDSet ids + extra_fvs = rule_fvs `unionDVarSet` + unionDVarSets [ rhs_fvs | (bndr, (rhs_fvs, rhs)) <- bindings + , noFloatIntoRhs Recursive bndr rhs ] + + (shared_binds:extra_binds:body_binds:rhss_binds) + = sepBindsByDropPoint dflags False + (extra_fvs:body_fvs:rhss_fvs) + to_drop + + rhs_fvs' = unionDVarSets rhss_fvs `unionDVarSet` + unionDVarSets (map floatedBindsFVs rhss_binds) `unionDVarSet` + rule_fvs -- Don't forget the rule variables! + + -- Push rhs_binds into the right hand side of the binding + fi_bind :: [FloatInBinds] -- one per "drop pt" conjured w/ fvs_of_rhss + -> [(Id, CoreExprWithFVs)] + -> [(Id, CoreExpr)] + + fi_bind to_drops pairs + = [ (binder, fiRhs dflags to_drop binder rhs) + | ((binder, rhs), to_drop) <- zipEqual "fi_bind" pairs to_drops ] + +------------------ +fiRhs :: DynFlags -> FloatInBinds -> CoreBndr -> CoreExprWithFVs -> CoreExpr +fiRhs dflags to_drop bndr rhs + | Just join_arity <- isJoinId_maybe bndr + , let (bndrs, body) = collectNAnnBndrs join_arity rhs + = mkLams bndrs (fiExpr dflags to_drop body) + | otherwise + = fiExpr dflags to_drop rhs + +------------------ +noFloatIntoLam :: [Var] -> Bool +noFloatIntoLam bndrs = any bad bndrs + where + bad b = isId b && not (isOneShotBndr b) + -- Don't float inside a non-one-shot lambda + +noFloatIntoRhs :: RecFlag -> Id -> CoreExprWithFVs' -> Bool +-- ^ True if it's a bad idea to float bindings into this RHS +noFloatIntoRhs is_rec bndr rhs + | isJoinId bndr + = isRec is_rec -- Joins are one-shot iff non-recursive + + | otherwise + = noFloatIntoArg rhs (idType bndr) + +noFloatIntoArg :: CoreExprWithFVs' -> Type -> Bool +noFloatIntoArg expr expr_ty + | 
isUnliftedType expr_ty
+  = True   -- See Note [Do not destroy the let/app invariant]
+
+  | AnnLam bndr e <- expr
+  , (bndrs, _) <- collectAnnBndrs e
+  = noFloatIntoLam (bndr:bndrs)  -- Wrinkle 1 (a)
+    || all isTyVar (bndr:bndrs)  -- Wrinkle 1 (b)
+    -- See Note [noFloatInto considerations] wrinkle 2
+
+  | otherwise  -- Note [noFloatInto considerations] wrinkle 2
+  = exprIsTrivial deann_expr || exprIsHNF deann_expr
+  where
+    deann_expr = deAnnotate' expr
+
+{- Note [noFloatInto considerations]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+When do we want to float bindings into
+   - noFloatIntoRhs: the RHS of a let-binding
+   - noFloatIntoArg: the argument of a function application
+
+Definitely don't float in if it has unlifted type; that
+would destroy the let/app invariant.
+
+* Wrinkle 1: do not float in if the expression has
+     (a) any non-one-shot value lambdas,
+  or (b) only type lambdas
+  In both cases we'll float straight back out again.
+  NB: Must line up with fiExpr (AnnLam...); see #7088
+
+  (a) is important: we /must/ float into a one-shot lambda group
+  (which includes join points). This makes a big difference
+  for things like
+     f x# = let x = I# x#
+            in let j = \() -> ...x...
+               in if <condition> then normal-path else j ()
+  If x is used only in the error case join point, j, we must float the
+  boxing constructor into it, else we box it every time which is very
+  bad news indeed.
+
+* Wrinkle 2: for RHSs, do not float into a HNF; we'll just float right
+  back out again... not tragic, but a waste of time.
+
+  For function arguments we will still end up with this
+  in-then-out stuff; consider
+     letrec x = e in f x
+  Here x is not a HNF, so we'll produce
+     f (letrec x = e in x)
+  which is OK... it's not that common, and we'll end up
+  floating out again, in CorePrep if not earlier. 
+  Still, we use exprIsTrivial to catch this case (sigh)
+
+
+************************************************************************
+*                                                                      *
+\subsection{@sepBindsByDropPoint@}
+*                                                                      *
+************************************************************************
+
+This is the crucial function.  The idea is: We have a wad of bindings
+that we'd like to distribute inside a collection of {\em drop points};
+inside the alternatives of a \tr{case} would be one example of some
+drop points; the RHS and body of a non-recursive \tr{let} binding
+would be another (2-element) collection.
+
+So: We're given a list of sets-of-free-variables, one per drop point,
+and a list of floating-inwards bindings.  If a binding can go into
+only one drop point (without suddenly making something out-of-scope),
+in it goes.  If a binding is used inside {\em multiple} drop points,
+then it has to go in a you-must-drop-it-above-all-these-drop-points
+point.
+
+We have to maintain the order on these drop-point-related lists.
+-}
+
+-- pprFIB :: FloatInBinds -> SDoc
+-- pprFIB fibs = text "FIB:" <+> ppr [b | FB _ _ b <- fibs]
+
+sepBindsByDropPoint
+    :: DynFlags
+    -> Bool             -- True <=> is case expression
+    -> [FreeVarSet]     -- One set of FVs per drop point
+                        -- Always at least two long!
+    -> FloatInBinds     -- Candidate floaters
+    -> [FloatInBinds]   -- FIRST one is bindings which must not be floated
+                        -- inside any drop point; the rest correspond
+                        -- one-to-one with the input list of FV sets
+
+-- Every input floater is returned somewhere in the result;
+-- none are dropped, not even ones which don't seem to be
+-- free in *any* of the drop-point fvs.  Why?  Because, for example,
+-- a binding (let x = E in B) might have a specialised version of
+-- x (say x') stored inside x, but x' isn't free in E or B.
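
The partitioning described above can be modelled in a few lines of self-contained Haskell. This is a simplification for exposition only: plain string sets instead of `DVarSet`, none of the `is_case`/dupable refinements handled below, and floaters used in zero or several drop points are conservatively kept at the fork:

```haskell
import qualified Data.Set as Set

type FV = Set.Set String

-- One candidate floater: the variables it binds, its free variables, a label.
data Floater = Floater { fltBndrs, fltFvs :: FV, fltLbl :: String }

type Box = (FV, [String])   -- fvs seen at a drop point + floats landed there

-- Head of the result is the "must drop at the fork" group; the rest
-- correspond one-to-one with the input drop points.
sepByDropPoint :: [FV] -> [Floater] -> [[String]]
sepByDropPoint points floaters =
    map (reverse . snd) (go floaters ((Set.empty, []) : [ (p, []) | p <- points ]))
  where
    go :: [Floater] -> [Box] -> [Box]
    go []       boxes          = boxes
    go (f : fs) (here : forks)
      | usedIn f here || length (filter (usedIn f) forks) /= 1
                  = go fs (ins f here : forks)
      | otherwise = go fs (here : [ if usedIn f b then ins f b else b | b <- forks ])
    go _ [] = error "sepByDropPoint: no boxes"

    usedIn f (bfvs, _) = not (Set.null (bfvs `Set.intersection` fltBndrs f))

    -- Landing a floater extends the box's fvs with the floater's own fvs,
    -- so that anything it depends on is later sent to the same place.
    ins f (bfvs, ls) = (bfvs `Set.union` fltFvs f, fltLbl f : ls)
```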
+ +type DropBox = (FreeVarSet, FloatInBinds) + +sepBindsByDropPoint dflags is_case drop_pts floaters + | null floaters -- Shortcut common case + = [] : [[] | _ <- drop_pts] + + | otherwise + = ASSERT( drop_pts `lengthAtLeast` 2 ) + go floaters (map (\fvs -> (fvs, [])) (emptyDVarSet : drop_pts)) + where + n_alts = length drop_pts + + go :: FloatInBinds -> [DropBox] -> [FloatInBinds] + -- The *first* one in the argument list is the drop_here set + -- The FloatInBinds in the lists are in the reverse of + -- the normal FloatInBinds order; that is, they are the right way round! + + go [] drop_boxes = map (reverse . snd) drop_boxes + + go (bind_w_fvs@(FB bndrs bind_fvs bind) : binds) drop_boxes@(here_box : fork_boxes) + = go binds new_boxes + where + -- "here" means the group of bindings dropped at the top of the fork + + (used_here : used_in_flags) = [ fvs `intersectsDVarSet` bndrs + | (fvs, _) <- drop_boxes] + + drop_here = used_here || cant_push + + n_used_alts = count id used_in_flags -- returns number of Trues in list. 
+
+        cant_push
+          | is_case   = n_used_alts == n_alts   -- Used in all, don't push
+                                                -- Remember n_alts > 1
+                        || (n_used_alts > 1 && not (floatIsDupable dflags bind))
+                           -- floatIsDupable: see Note [Duplicating floats]
+
+          | otherwise = floatIsCase bind || n_used_alts > 1
+                           -- floatIsCase: see Note [Floating primops]
+
+        new_boxes | drop_here = (insert here_box : fork_boxes)
+                  | otherwise = (here_box : new_fork_boxes)
+
+        new_fork_boxes = zipWithEqual "FloatIn.sepBinds" insert_maybe
+                                      fork_boxes used_in_flags
+
+        insert :: DropBox -> DropBox
+        insert (fvs,drops) = (fvs `unionDVarSet` bind_fvs, bind_w_fvs:drops)
+
+        insert_maybe box True  = insert box
+        insert_maybe box False = box
+
+    go _ _ = panic "sepBindsByDropPoint/go"
+
+
+{- Note [Duplicating floats]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For case expressions we duplicate the binding if it is reasonably
+small, and if it is not used in all the RHSs.  This is good for
+situations like
+     let x = I# y in
+     case e of
+       C -> error x
+       D -> error x
+       E -> ...not mentioning x...
+
+If the thing is used in all RHSs there is nothing gained,
+so we don't duplicate then.
+-}
+
+floatedBindsFVs :: FloatInBinds -> FreeVarSet
+floatedBindsFVs binds = mapUnionDVarSet fbFVs binds
+
+fbFVs :: FloatInBind -> DVarSet
+fbFVs (FB _ fvs _) = fvs
+
+wrapFloats :: FloatInBinds -> CoreExpr -> CoreExpr
+-- Remember FloatInBinds is in *reverse* dependency order
+wrapFloats []               e = e
+wrapFloats (FB _ _ fl : bs) e = wrapFloats bs (wrapFloat fl e)
+
+floatIsDupable :: DynFlags -> FloatBind -> Bool
+floatIsDupable dflags (FloatCase scrut _ _ _) = exprIsDupable dflags scrut
+floatIsDupable dflags (FloatLet (Rec prs))    = all (exprIsDupable dflags . 
snd) prs +floatIsDupable dflags (FloatLet (NonRec _ r)) = exprIsDupable dflags r + +floatIsCase :: FloatBind -> Bool +floatIsCase (FloatCase {}) = True +floatIsCase (FloatLet {}) = False diff --git a/compiler/GHC/Core/Op/FloatOut.hs b/compiler/GHC/Core/Op/FloatOut.hs new file mode 100644 index 0000000000..fb47b2b3ed --- /dev/null +++ b/compiler/GHC/Core/Op/FloatOut.hs @@ -0,0 +1,757 @@ +{- +(c) The GRASP/AQUA Project, Glasgow University, 1992-1998 + +\section[FloatOut]{Float bindings outwards (towards the top level)} + +``Long-distance'' floating of bindings towards the top level. +-} + +{-# LANGUAGE CPP #-} + +module GHC.Core.Op.FloatOut ( floatOutwards ) where + +import GhcPrelude + +import GHC.Core +import GHC.Core.Utils +import GHC.Core.Make +import GHC.Core.Arity ( etaExpand ) +import GHC.Core.Op.Monad ( FloatOutSwitches(..) ) + +import GHC.Driver.Session +import ErrUtils ( dumpIfSet_dyn, DumpFormat (..) ) +import Id ( Id, idArity, idType, isBottomingId, + isJoinId, isJoinId_maybe ) +import GHC.Core.Op.SetLevels +import UniqSupply ( UniqSupply ) +import Bag +import Util +import Maybes +import Outputable +import GHC.Core.Type +import qualified Data.IntMap as M + +import Data.List ( partition ) + +#include "HsVersions.h" + +{- + ----------------- + Overall game plan + ----------------- + +The Big Main Idea is: + + To float out sub-expressions that can thereby get outside + a non-one-shot value lambda, and hence may be shared. + + +To achieve this we may need to do two things: + + a) Let-bind the sub-expression: + + f (g x) ==> let lvl = f (g x) in lvl + + Now we can float the binding for 'lvl'. + + b) More than that, we may need to abstract wrt a type variable + + \x -> ... /\a -> let v = ...a... in .... + + Here the binding for v mentions 'a' but not 'x'. So we + abstract wrt 'a', to give this binding for 'v': + + vp = /\a -> ...a... + v = vp a + + Now the binding for vp can float out unimpeded. 
+ I can't remember why this case seemed important enough to + deal with, but I certainly found cases where important floats + didn't happen if we did not abstract wrt tyvars. + +With this in mind we can also achieve another goal: lambda lifting. +We can make an arbitrary (function) binding float to top level by +abstracting wrt *all* local variables, not just type variables, leaving +a binding that can be floated right to top level. Whether or not this +happens is controlled by a flag. + + +Random comments +~~~~~~~~~~~~~~~ + +At the moment we never float a binding out to between two adjacent +lambdas. For example: + +@ + \x y -> let t = x+x in ... +===> + \x -> let t = x+x in \y -> ... +@ +Reason: this is less efficient in the case where the original lambda +is never partially applied. + +But there's a case I've seen where this might not be true. Consider: +@ +elEm2 x ys + = elem' x ys + where + elem' _ [] = False + elem' x (y:ys) = x==y || elem' x ys +@ +It turns out that this generates a subexpression of the form +@ + \deq x ys -> let eq = eqFromEqDict deq in ... +@ +which might usefully be separated to +@ + \deq -> let eq = eqFromEqDict deq in \xy -> ... +@ +Well, maybe. We don't do this at the moment. + +Note [Join points] +~~~~~~~~~~~~~~~~~~ +Every occurrence of a join point must be a tail call (see Note [Invariants on +join points] in GHC.Core), so we must be careful with how far we float them. The +mechanism for doing so is the *join ceiling*, detailed in Note [Join ceiling] +in GHC.Core.Op.SetLevels. For us, the significance is that a binder might be marked to be +dropped at the nearest boundary between tail calls and non-tail calls. For +example: + + (< join j = ... in + let x = < ... > in + case < ... > of + A -> ... + B -> ... + >) < ... > < ... > + +Here the join ceilings are marked with angle brackets. Either side of an +application is a join ceiling, as is the scrutinee position of a case +expression or the RHS of a let binding (but not a join point). 
+
+Why do we *want* to float join points at all? After all, they're never
+allocated, so there's no sharing to be gained by floating them. However, the
+other benefit of floating is making RHSes small, and this can have a significant
+impact. In particular, stream fusion has been known to produce nested loops like
+this:
+
+  joinrec j1 x1 =
+    joinrec j2 x2 =
+      joinrec j3 x3 = ... jump j1 (x3 + 1) ... jump j2 (x3 + 1) ...
+      in jump j3 x2
+    in jump j2 x1
+  in jump j1 x
+
+(Assume x1 and x2 do *not* occur free in j3.)
+
+Here j1 and j2 are wholly superfluous---each of them merely forwards its
+argument to j3. Since j3 only refers to x3, we can float j2 and j3 to make
+everything one big mutual recursion:
+
+  joinrec j1 x1 = jump j2 x1
+          j2 x2 = jump j3 x2
+          j3 x3 = ... jump j1 (x3 + 1) ... jump j2 (x3 + 1) ...
+  in jump j1 x
+
+Now the simplifier will happily inline the trivial j1 and j2, leaving only j3.
+Without floating, we're stuck with three loops instead of one.
+
+************************************************************************
+*                                                                      *
+\subsection[floatOutwards]{@floatOutwards@: let-floating interface function}
+*                                                                      *
+************************************************************************
+-}
+
+floatOutwards :: FloatOutSwitches
+              -> DynFlags
+              -> UniqSupply
+              -> CoreProgram -> IO CoreProgram
+
+floatOutwards float_sws dflags us pgm
+  = do {
+        let { annotated_w_levels = setLevels float_sws pgm us ;
+              (fss, binds_s')    = unzip (map floatTopBind annotated_w_levels)
+            } ;
+
+        dumpIfSet_dyn dflags Opt_D_verbose_core2core "Levels added:"
+                      FormatCore
+                      (vcat (map ppr annotated_w_levels));
+
+        let { (tlets, ntlets, lams) = get_stats (sum_stats fss) };
+
+        dumpIfSet_dyn dflags Opt_D_dump_simpl_stats "FloatOut stats:"
+                      FormatText
+                      (hcat [ int tlets,  text " Lets floated to top level; ",
+                              int ntlets, text " Lets floated elsewhere; from ",
+                              int lams,   text " Lambda groups"]);
+
+        return (bagToList (unionManyBags binds_s'))
+    }
+
+floatTopBind :: LevelledBind -> 
(FloatStats, Bag CoreBind) +floatTopBind bind + = case (floatBind bind) of { (fs, floats, bind') -> + let float_bag = flattenTopFloats floats + in case bind' of + -- bind' can't have unlifted values or join points, so can only be one + -- value bind, rec or non-rec (see comment on floatBind) + [Rec prs] -> (fs, unitBag (Rec (addTopFloatPairs float_bag prs))) + [NonRec b e] -> (fs, float_bag `snocBag` NonRec b e) + _ -> pprPanic "floatTopBind" (ppr bind') } + +{- +************************************************************************ +* * +\subsection[FloatOut-Bind]{Floating in a binding (the business end)} +* * +************************************************************************ +-} + +floatBind :: LevelledBind -> (FloatStats, FloatBinds, [CoreBind]) + -- Returns a list with either + -- * A single non-recursive binding (value or join point), or + -- * The following, in order: + -- * Zero or more non-rec unlifted bindings + -- * One or both of: + -- * A recursive group of join binds + -- * A recursive group of value binds + -- See Note [Floating out of Rec rhss] for why things get arranged this way. +floatBind (NonRec (TB var _) rhs) + = case (floatRhs var rhs) of { (fs, rhs_floats, rhs') -> + + -- A tiresome hack: + -- see Note [Bottoming floats: eta expansion] in GHC.Core.Op.SetLevels + let rhs'' | isBottomingId var = etaExpand (idArity var) rhs' + | otherwise = rhs' + + in (fs, rhs_floats, [NonRec var rhs'']) } + +floatBind (Rec pairs) + = case floatList do_pair pairs of { (fs, rhs_floats, new_pairs) -> + let (new_ul_pairss, new_other_pairss) = unzip new_pairs + (new_join_pairs, new_l_pairs) = partition (isJoinId . 
fst) + (concat new_other_pairss) + -- Can't put the join points and the values in the same rec group + new_rec_binds | null new_join_pairs = [ Rec new_l_pairs ] + | null new_l_pairs = [ Rec new_join_pairs ] + | otherwise = [ Rec new_l_pairs + , Rec new_join_pairs ] + new_non_rec_binds = [ NonRec b e | (b, e) <- concat new_ul_pairss ] + in + (fs, rhs_floats, new_non_rec_binds ++ new_rec_binds) } + where + do_pair :: (LevelledBndr, LevelledExpr) + -> (FloatStats, FloatBinds, + ([(Id,CoreExpr)], -- Non-recursive unlifted value bindings + [(Id,CoreExpr)])) -- Join points and lifted value bindings + do_pair (TB name spec, rhs) + | isTopLvl dest_lvl -- See Note [floatBind for top level] + = case (floatRhs name rhs) of { (fs, rhs_floats, rhs') -> + (fs, emptyFloats, ([], addTopFloatPairs (flattenTopFloats rhs_floats) + [(name, rhs')]))} + | otherwise -- Note [Floating out of Rec rhss] + = case (floatRhs name rhs) of { (fs, rhs_floats, rhs') -> + case (partitionByLevel dest_lvl rhs_floats) of { (rhs_floats', heres) -> + case (splitRecFloats heres) of { (ul_pairs, pairs, case_heres) -> + let pairs' = (name, installUnderLambdas case_heres rhs') : pairs in + (fs, rhs_floats', (ul_pairs, pairs')) }}} + where + dest_lvl = floatSpecLevel spec + +splitRecFloats :: Bag FloatBind + -> ([(Id,CoreExpr)], -- Non-recursive unlifted value bindings + [(Id,CoreExpr)], -- Join points and lifted value bindings + Bag FloatBind) -- A tail of further bindings +-- The "tail" begins with a case +-- See Note [Floating out of Rec rhss] +splitRecFloats fs + = go [] [] (bagToList fs) + where + go ul_prs prs (FloatLet (NonRec b r) : fs) | isUnliftedType (idType b) + , not (isJoinId b) + = go ((b,r):ul_prs) prs fs + | otherwise + = go ul_prs ((b,r):prs) fs + go ul_prs prs (FloatLet (Rec prs') : fs) = go ul_prs (prs' ++ prs) fs + go ul_prs prs fs = (reverse ul_prs, prs, + listToBag fs) + -- Order only matters for + -- non-rec + +installUnderLambdas :: Bag FloatBind -> CoreExpr -> CoreExpr +-- Note 
[Floating out of Rec rhss]
+installUnderLambdas floats e
+ | isEmptyBag floats = e
+ | otherwise = go e
+ where
+ go (Lam b e) = Lam b (go e)
+ go e = install floats e
+
+---------------
+floatList :: (a -> (FloatStats, FloatBinds, b)) -> [a] -> (FloatStats, FloatBinds, [b])
+floatList _ [] = (zeroStats, emptyFloats, [])
+floatList f (a:as) = case f a of { (fs_a, binds_a, b) ->
+ case floatList f as of { (fs_as, binds_as, bs) ->
+ (fs_a `add_stats` fs_as, binds_a `plusFloats` binds_as, b:bs) }}
+
+{-
+Note [Floating out of Rec rhss]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider Rec { f<1,0> = \xy. body }
+From the body we may get some floats. The ones with level <1,0> must
+stay here, since they may mention f. Ideally we'd like to make them
+part of the Rec block pairs -- but we can't if there are any
+FloatCases involved.
+
+Nor is it a good idea to dump them in the rhs, but outside the lambda
+ f = case x of I# y -> \xy. body
+because now f's arity might get worse, which is Not Good. (And if
+there's an SCC around the RHS it might not get better again.
+See #5342.)
+
+So, gruesomely, we split the floats into
+ * the outer FloatLets, which can join the Rec, and
+ * an inner batch starting in a FloatCase, which are then
+ pushed *inside* the lambdas.
+This loses full laziness in the rare situation where there is a
+FloatCase and a Rec interacting.
+
+If there are unlifted FloatLets (that *aren't* join points) among the floats,
+we can't add them to the recursive group without angering Core Lint, but since
+they must be ok-for-speculation, they can't actually be making any recursive
+calls, so we can safely pull them out and keep them non-recursive.
+
+(Why is something getting floated to <1,0> that doesn't make a recursive call?
+The case that came up in testing was that f *and* the unlifted binding were
+getting floated *to the same place*:
+
+ \x<2,0> ->
+ ... <3,0>
+ letrec { f<F<2,0>> =
+ ... let x'<F<2,0>> = x +# 1# in ...
+ } in ... 
+
+Everything gets labeled "float to <2,0>" because it all depends on x, but this
+makes f and x' look mutually recursive when they're not.
+
+The test was shootout/k-nucleotide, as compiled using commit 47d5dd68 on the
+wip/join-points branch.
+
+TODO: This can probably be solved somehow in GHC.Core.Op.SetLevels. The difference between
+"this *is at* level <2,0>" and "this *depends on* level <2,0>" is very
+important.)
+
+Note [floatBind for top level]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We may have a *nested* binding whose destination level is (FloatMe tOP_LEVEL), thus
+ letrec { foo <0,0> = .... (let bar<0,0> = .. in ..) .... }
+The binding for bar will be in the "tops" part of the floating binds,
+and thus not partitioned by floatBody.
+
+We could perhaps get rid of the 'tops' component of the floating binds,
+but this case works just as well.
+
+
+************************************************************************
+* *
+\subsection[FloatOut-Expr]{Floating in expressions}
+* *
+************************************************************************
+-}
+
+floatBody :: Level
+ -> LevelledExpr
+ -> (FloatStats, FloatBinds, CoreExpr)
+
+floatBody lvl arg -- Used for rec rhss, and case-alternative rhss
+ = case (floatExpr arg) of { (fsa, floats, arg') ->
+ case (partitionByLevel lvl floats) of { (floats', heres) ->
+ -- The dumped bindings ('heres') are installed here
+ (fsa, floats', install heres arg') }}
+
+-----------------
+
+{- Note [Floating past breakpoints]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We used to disallow floating out of breakpoint ticks (see #10052). However, I
+think this is too restrictive.
+
+Consider the case of an expression scoped over by a breakpoint tick,
+
+ tick<...> (let x = ... in f x)
+
+In this case it is completely legal to float out x, despite the fact that
+breakpoint ticks are scoped,
+
+ let x = ... 
in (tick<...> f x) + +The reason here is that we know that the breakpoint will still be hit when the +expression is entered since the tick still scopes over the RHS. + +-} + +floatExpr :: LevelledExpr + -> (FloatStats, FloatBinds, CoreExpr) +floatExpr (Var v) = (zeroStats, emptyFloats, Var v) +floatExpr (Type ty) = (zeroStats, emptyFloats, Type ty) +floatExpr (Coercion co) = (zeroStats, emptyFloats, Coercion co) +floatExpr (Lit lit) = (zeroStats, emptyFloats, Lit lit) + +floatExpr (App e a) + = case (atJoinCeiling $ floatExpr e) of { (fse, floats_e, e') -> + case (atJoinCeiling $ floatExpr a) of { (fsa, floats_a, a') -> + (fse `add_stats` fsa, floats_e `plusFloats` floats_a, App e' a') }} + +floatExpr lam@(Lam (TB _ lam_spec) _) + = let (bndrs_w_lvls, body) = collectBinders lam + bndrs = [b | TB b _ <- bndrs_w_lvls] + bndr_lvl = asJoinCeilLvl (floatSpecLevel lam_spec) + -- All the binders have the same level + -- See GHC.Core.Op.SetLevels.lvlLamBndrs + -- Use asJoinCeilLvl to make this the join ceiling + in + case (floatBody bndr_lvl body) of { (fs, floats, body') -> + (add_to_stats fs floats, floats, mkLams bndrs body') } + +floatExpr (Tick tickish expr) + | tickish `tickishScopesLike` SoftScope -- not scoped, can just float + = case (atJoinCeiling $ floatExpr expr) of { (fs, floating_defns, expr') -> + (fs, floating_defns, Tick tickish expr') } + + | not (tickishCounts tickish) || tickishCanSplit tickish + = case (atJoinCeiling $ floatExpr expr) of { (fs, floating_defns, expr') -> + let -- Annotate bindings floated outwards past an scc expression + -- with the cc. We mark that cc as "duplicated", though. 
+ annotated_defns = wrapTick (mkNoCount tickish) floating_defns + in + (fs, annotated_defns, Tick tickish expr') } + + -- Note [Floating past breakpoints] + | Breakpoint{} <- tickish + = case (floatExpr expr) of { (fs, floating_defns, expr') -> + (fs, floating_defns, Tick tickish expr') } + + | otherwise + = pprPanic "floatExpr tick" (ppr tickish) + +floatExpr (Cast expr co) + = case (atJoinCeiling $ floatExpr expr) of { (fs, floating_defns, expr') -> + (fs, floating_defns, Cast expr' co) } + +floatExpr (Let bind body) + = case bind_spec of + FloatMe dest_lvl + -> case (floatBind bind) of { (fsb, bind_floats, binds') -> + case (floatExpr body) of { (fse, body_floats, body') -> + let new_bind_floats = foldr plusFloats emptyFloats + (map (unitLetFloat dest_lvl) binds') in + ( add_stats fsb fse + , bind_floats `plusFloats` new_bind_floats + `plusFloats` body_floats + , body') }} + + StayPut bind_lvl -- See Note [Avoiding unnecessary floating] + -> case (floatBind bind) of { (fsb, bind_floats, binds') -> + case (floatBody bind_lvl body) of { (fse, body_floats, body') -> + ( add_stats fsb fse + , bind_floats `plusFloats` body_floats + , foldr Let body' binds' ) }} + where + bind_spec = case bind of + NonRec (TB _ s) _ -> s + Rec ((TB _ s, _) : _) -> s + Rec [] -> panic "floatExpr:rec" + +floatExpr (Case scrut (TB case_bndr case_spec) ty alts) + = case case_spec of + FloatMe dest_lvl -- Case expression moves + | [(con@(DataAlt {}), bndrs, rhs)] <- alts + -> case atJoinCeiling $ floatExpr scrut of { (fse, fde, scrut') -> + case floatExpr rhs of { (fsb, fdb, rhs') -> + let + float = unitCaseFloat dest_lvl scrut' + case_bndr con [b | TB b _ <- bndrs] + in + (add_stats fse fsb, fde `plusFloats` float `plusFloats` fdb, rhs') }} + | otherwise + -> pprPanic "Floating multi-case" (ppr alts) + + StayPut bind_lvl -- Case expression stays put + -> case atJoinCeiling $ floatExpr scrut of { (fse, fde, scrut') -> + case floatList (float_alt bind_lvl) alts of { (fsa, fda, alts') -> + 
(add_stats fse fsa, fda `plusFloats` fde, Case scrut' case_bndr ty alts')
+ }}
+ where
+ float_alt bind_lvl (con, bs, rhs)
+ = case (floatBody bind_lvl rhs) of { (fs, rhs_floats, rhs') ->
+ (fs, rhs_floats, (con, [b | TB b _ <- bs], rhs')) }
+
+floatRhs :: CoreBndr
+ -> LevelledExpr
+ -> (FloatStats, FloatBinds, CoreExpr)
+floatRhs bndr rhs
+ | Just join_arity <- isJoinId_maybe bndr
+ , Just (bndrs, body) <- try_collect join_arity rhs []
+ = case bndrs of
+ [] -> floatExpr rhs
+ (TB _ lam_spec):_ ->
+ let lvl = floatSpecLevel lam_spec in
+ case floatBody lvl body of { (fs, floats, body') ->
+ (fs, floats, mkLams [b | TB b _ <- bndrs] body') }
+ | otherwise
+ = atJoinCeiling $ floatExpr rhs
+ where
+ try_collect 0 expr acc = Just (reverse acc, expr)
+ try_collect n (Lam b e) acc = try_collect (n-1) e (b:acc)
+ try_collect _ _ _ = Nothing
+
+{-
+Note [Avoiding unnecessary floating]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+In general we want to avoid floating a let unnecessarily, because
+it might worsen strictness:
+ let
+ x = ...(let y = e in y+y)....
+Here y is demanded. If we float it outside the lazy 'x=..' then
+we'd have to zap its demand info, and it may never be restored.
+
+So at a 'let' we leave the binding right where it is unless
+the binding will escape a value lambda, e.g.
+
+(\x -> let y = fac 100 in y)
+
+That's what the partitionByMajorLevel does in the floatExpr (Let ...)
+case.
+
+Notice, though, that we must take care to drop any bindings
+from the body of the let that depend on the staying-put bindings.
+
+We used to do the partitionByMajorLevel on the RHS of an '='
+instead, in floatRhs. But that was quite tiresome. We needed to test for
+values or trivial rhss, because (in particular) we don't want to insert
+new bindings between the "=" and the "\". E.g. 
+ f = \x -> let <bind> in <body> +We do not want + f = let <bind> in \x -> <body> +(a) The simplifier will immediately float it further out, so we may + as well do so right now; in general, keeping rhss as manifest + values is good +(b) If a float-in pass follows immediately, it might add yet more + bindings just after the '='. And some of them might (correctly) + be strict even though the 'let f' is lazy, because f, being a value, + gets its demand-info zapped by the simplifier. +And even all that turned out to be very fragile, and broke +altogether when profiling got in the way. + +So now we do the partition right at the (Let..) itself. + +************************************************************************ +* * +\subsection{Utility bits for floating stats} +* * +************************************************************************ + +I didn't implement this with unboxed numbers. I don't want to be too +strict in this stuff, as it is rarely turned on. (WDP 95/09) +-} + +data FloatStats + = FlS Int -- Number of top-floats * lambda groups they've been past + Int -- Number of non-top-floats * lambda groups they've been past + Int -- Number of lambda (groups) seen + +get_stats :: FloatStats -> (Int, Int, Int) +get_stats (FlS a b c) = (a, b, c) + +zeroStats :: FloatStats +zeroStats = FlS 0 0 0 + +sum_stats :: [FloatStats] -> FloatStats +sum_stats xs = foldr add_stats zeroStats xs + +add_stats :: FloatStats -> FloatStats -> FloatStats +add_stats (FlS a1 b1 c1) (FlS a2 b2 c2) + = FlS (a1 + a2) (b1 + b2) (c1 + c2) + +add_to_stats :: FloatStats -> FloatBinds -> FloatStats +add_to_stats (FlS a b c) (FB tops ceils others) + = FlS (a + lengthBag tops) + (b + lengthBag ceils + lengthBag (flattenMajor others)) + (c + 1) + +{- +************************************************************************ +* * +\subsection{Utility bits for floating} +* * +************************************************************************ + +Note [Representation of FloatBinds] 
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The FloatBinds type is somewhat important. We can get very large numbers
+of floating bindings, often all destined for the top level. A typical example
+is x = [4,2,5,2,5, .... ]
+Then we get lots of small expressions like (fromInteger 4), which all get
+lifted to top level.
+
+The trouble is that
+ (a) we partition these floating bindings *at every binding site*
+ (b) GHC.Core.Op.SetLevels introduces a new binding site for every float
+So we had better not look at each binding at each binding site!
+
+That is why MajorEnv is represented as a finite map.
+
+We keep the bindings destined for the *top* level separate, because
+we float them out even if they don't escape a *value* lambda; see
+partitionByMajorLevel.
+-}
+
+type FloatLet = CoreBind -- INVARIANT: a FloatLet is always lifted
+type MajorEnv = M.IntMap MinorEnv -- Keyed by major level
+type MinorEnv = M.IntMap (Bag FloatBind) -- Keyed by minor level
+
+data FloatBinds = FB !(Bag FloatLet) -- Destined for top level
+ !(Bag FloatBind) -- Destined for join ceiling
+ !MajorEnv -- Other levels
+ -- See Note [Representation of FloatBinds]
+
+instance Outputable FloatBinds where
+ ppr (FB fbs ceils defs)
+ = text "FB" <+> (braces $ vcat
+ [ text "tops =" <+> ppr fbs
+ , text "ceils =" <+> ppr ceils
+ , text "non-tops =" <+> ppr defs ])
+
+flattenTopFloats :: FloatBinds -> Bag CoreBind
+flattenTopFloats (FB tops ceils defs)
+ = ASSERT2( isEmptyBag (flattenMajor defs), ppr defs )
+ ASSERT2( isEmptyBag ceils, ppr ceils )
+ tops
+
+addTopFloatPairs :: Bag CoreBind -> [(Id,CoreExpr)] -> [(Id,CoreExpr)]
+addTopFloatPairs float_bag prs
+ = foldr add prs float_bag
+ where
+ add (NonRec b r) prs = (b,r):prs
+ add (Rec prs1) prs2 = prs1 ++ prs2
+
+flattenMajor :: MajorEnv -> Bag FloatBind
+flattenMajor = M.foldr (unionBags . 
flattenMinor) emptyBag + +flattenMinor :: MinorEnv -> Bag FloatBind +flattenMinor = M.foldr unionBags emptyBag + +emptyFloats :: FloatBinds +emptyFloats = FB emptyBag emptyBag M.empty + +unitCaseFloat :: Level -> CoreExpr -> Id -> AltCon -> [Var] -> FloatBinds +unitCaseFloat (Level major minor t) e b con bs + | t == JoinCeilLvl + = FB emptyBag floats M.empty + | otherwise + = FB emptyBag emptyBag (M.singleton major (M.singleton minor floats)) + where + floats = unitBag (FloatCase e b con bs) + +unitLetFloat :: Level -> FloatLet -> FloatBinds +unitLetFloat lvl@(Level major minor t) b + | isTopLvl lvl = FB (unitBag b) emptyBag M.empty + | t == JoinCeilLvl = FB emptyBag floats M.empty + | otherwise = FB emptyBag emptyBag (M.singleton major + (M.singleton minor floats)) + where + floats = unitBag (FloatLet b) + +plusFloats :: FloatBinds -> FloatBinds -> FloatBinds +plusFloats (FB t1 c1 l1) (FB t2 c2 l2) + = FB (t1 `unionBags` t2) (c1 `unionBags` c2) (l1 `plusMajor` l2) + +plusMajor :: MajorEnv -> MajorEnv -> MajorEnv +plusMajor = M.unionWith plusMinor + +plusMinor :: MinorEnv -> MinorEnv -> MinorEnv +plusMinor = M.unionWith unionBags + +install :: Bag FloatBind -> CoreExpr -> CoreExpr +install defn_groups expr + = foldr wrapFloat expr defn_groups + +partitionByLevel + :: Level -- Partitioning level + -> FloatBinds -- Defns to be divided into 2 piles... + -> (FloatBinds, -- Defns with level strictly < partition level, + Bag FloatBind) -- The rest + +{- +-- ---- partitionByMajorLevel ---- +-- Float it if we escape a value lambda, +-- *or* if we get to the top level +-- *or* if it's a case-float and its minor level is < current +-- +-- If we can get to the top level, say "yes" anyway. 
This means that +-- x = f e +-- transforms to +-- lvl = e +-- x = f lvl +-- which is as it should be + +partitionByMajorLevel (Level major _) (FB tops defns) + = (FB tops outer, heres `unionBags` flattenMajor inner) + where + (outer, mb_heres, inner) = M.splitLookup major defns + heres = case mb_heres of + Nothing -> emptyBag + Just h -> flattenMinor h +-} + +partitionByLevel (Level major minor typ) (FB tops ceils defns) + = (FB tops ceils' (outer_maj `plusMajor` M.singleton major outer_min), + here_min `unionBags` here_ceil + `unionBags` flattenMinor inner_min + `unionBags` flattenMajor inner_maj) + + where + (outer_maj, mb_here_maj, inner_maj) = M.splitLookup major defns + (outer_min, mb_here_min, inner_min) = case mb_here_maj of + Nothing -> (M.empty, Nothing, M.empty) + Just min_defns -> M.splitLookup minor min_defns + here_min = mb_here_min `orElse` emptyBag + (here_ceil, ceils') | typ == JoinCeilLvl = (ceils, emptyBag) + | otherwise = (emptyBag, ceils) + +-- Like partitionByLevel, but instead split out the bindings that are marked +-- to float to the nearest join ceiling (see Note [Join points]) +partitionAtJoinCeiling :: FloatBinds -> (FloatBinds, Bag FloatBind) +partitionAtJoinCeiling (FB tops ceils defs) + = (FB tops emptyBag defs, ceils) + +-- Perform some action at a join ceiling, i.e., don't let join points float out +-- (see Note [Join points]) +atJoinCeiling :: (FloatStats, FloatBinds, CoreExpr) + -> (FloatStats, FloatBinds, CoreExpr) +atJoinCeiling (fs, floats, expr') + = (fs, floats', install ceils expr') + where + (floats', ceils) = partitionAtJoinCeiling floats + +wrapTick :: Tickish Id -> FloatBinds -> FloatBinds +wrapTick t (FB tops ceils defns) + = FB (mapBag wrap_bind tops) (wrap_defns ceils) + (M.map (M.map wrap_defns) defns) + where + wrap_defns = mapBag wrap_one + + wrap_bind (NonRec binder rhs) = NonRec binder (maybe_tick rhs) + wrap_bind (Rec pairs) = Rec (mapSnd maybe_tick pairs) + + wrap_one (FloatLet bind) = FloatLet (wrap_bind bind) + 
wrap_one (FloatCase e b c bs) = FloatCase (maybe_tick e) b c bs
+
+ maybe_tick e | exprIsHNF e = tickHNFArgs t e
+ | otherwise = mkTick t e
+ -- we don't need to wrap a tick around an HNF when we float it
+ -- outside a tick: that is an invariant of the tick semantics
+ -- Conversely, inlining of HNFs inside an SCC is allowed, and
+ -- indeed the HNF we're floating here might well be inlined back
+ -- again, and we don't want to end up with duplicate ticks.
diff --git a/compiler/GHC/Core/Op/LiberateCase.hs b/compiler/GHC/Core/Op/LiberateCase.hs
new file mode 100644
index 0000000000..399abf4c67
--- /dev/null
+++ b/compiler/GHC/Core/Op/LiberateCase.hs
@@ -0,0 +1,442 @@
+{-
+(c) The AQUA Project, Glasgow University, 1994-1998
+
+\section[LiberateCase]{Unroll recursion to allow evals to be lifted from a loop}
+-}
+
+{-# LANGUAGE CPP #-}
+module GHC.Core.Op.LiberateCase ( liberateCase ) where
+
+#include "HsVersions.h"
+
+import GhcPrelude
+
+import GHC.Driver.Session
+import GHC.Core
+import GHC.Core.Unfold ( couldBeSmallEnoughToInline )
+import TysWiredIn ( unitDataConId )
+import Id
+import VarEnv
+import Util ( notNull )
+
+{-
+The liberate-case transformation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+This module walks over @Core@, and looks for @case@ on free variables.
+The criterion is:
+ if there is a case on a free variable on the route to the recursive call,
+ then the recursive call is replaced with an unfolding.
+
+Example
+
+ f = \ t -> case v of
+ V a b -> a : f t
+
+=> the inner f is replaced.
+
+ f = \ t -> case v of
+ V a b -> a : (letrec
+ f = \ t -> case v of
+ V a b -> a : f t
+ in f) t
+(note the NEED for shadowing)
+
+=> Simplify
+
+ f = \ t -> case v of
+ V a b -> a : (letrec
+ f = \ t -> a : f t
+ in f t)
+
+Better code, because 'a' is free inside the inner letrec, rather
+than needing projection from v.
+
+Note that this deals with *free variables*. SpecConstr deals with
+*arguments* that are of known form. E.g. 
+ + last [] = error + last (x:[]) = x + last (x:xs) = last xs + + +Note [Scrutinee with cast] +~~~~~~~~~~~~~~~~~~~~~~~~~~ +Consider this: + f = \ t -> case (v `cast` co) of + V a b -> a : f t + +Exactly the same optimisation (unrolling one call to f) will work here, +despite the cast. See mk_alt_env in the Case branch of libCase. + + +To think about (Apr 94) +~~~~~~~~~~~~~~ +Main worry: duplicating code excessively. At the moment we duplicate +the entire binding group once at each recursive call. But there may +be a group of recursive calls which share a common set of evaluated +free variables, in which case the duplication is a plain waste. + +Another thing we could consider adding is some unfold-threshold thing, +so that we'll only duplicate if the size of the group rhss isn't too +big. + +Data types +~~~~~~~~~~ +The ``level'' of a binder tells how many +recursive defns lexically enclose the binding +A recursive defn "encloses" its RHS, not its +scope. For example: +\begin{verbatim} + letrec f = let g = ... in ... + in + let h = ... + in ... +\end{verbatim} +Here, the level of @f@ is zero, the level of @g@ is one, +and the level of @h@ is zero (NB not one). 
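The level-numbering rule above can be sketched with a toy model (illustrative only, not GHC code: the `Expr` type, `levels` function, and `example` here are hypothetical stand-ins). A recursive definition raises the level for binders inside its RHS, but not for binders in its body (its scope), which is why @g@ gets level one while @h@ stays at zero:

```haskell
-- Toy sketch of the level rule: a Letrec "encloses" its RHS, not its scope.
-- Ignores shadowing for simplicity.
import qualified Data.Map as Map

data Expr = Var String
          | Letrec String Expr Expr   -- letrec b = rhs in body
          | Let    String Expr Expr   -- let b = rhs in body (non-recursive)

-- Record each binder's level; only a Letrec RHS increments the level.
levels :: Int -> Expr -> Map.Map String Int
levels _   (Var _) = Map.empty
levels lvl (Letrec b rhs body) =
  Map.insert b lvl (levels (lvl + 1) rhs `Map.union` levels lvl body)
levels lvl (Let b rhs body) =
  Map.insert b lvl (levels lvl rhs `Map.union` levels lvl body)

-- The example from the comment above: g is inside f's RHS, h is in f's scope.
example :: Expr
example = Letrec "f" (Let "g" (Var "x") (Var "y"))
                     (Let "h" (Var "x") (Var "y"))
```

Running `levels 0 example` yields f at level 0, g at level 1, and h at level 0, matching the verbatim example above.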
+ + +************************************************************************ +* * + Top-level code +* * +************************************************************************ +-} + +liberateCase :: DynFlags -> CoreProgram -> CoreProgram +liberateCase dflags binds = do_prog (initEnv dflags) binds + where + do_prog _ [] = [] + do_prog env (bind:binds) = bind' : do_prog env' binds + where + (env', bind') = libCaseBind env bind + +{- +************************************************************************ +* * + Main payload +* * +************************************************************************ + +Bindings +~~~~~~~~ +-} + +libCaseBind :: LibCaseEnv -> CoreBind -> (LibCaseEnv, CoreBind) + +libCaseBind env (NonRec binder rhs) + = (addBinders env [binder], NonRec binder (libCase env rhs)) + +libCaseBind env (Rec pairs) + = (env_body, Rec pairs') + where + binders = map fst pairs + + env_body = addBinders env binders + + pairs' = [(binder, libCase env_rhs rhs) | (binder,rhs) <- pairs] + + -- We extend the rec-env by binding each Id to its rhs, first + -- processing the rhs with an *un-extended* environment, so + -- that the same process doesn't occur for ever! + env_rhs | is_dupable_bind = addRecBinds env dup_pairs + | otherwise = env + + dup_pairs = [ (localiseId binder, libCase env_body rhs) + | (binder, rhs) <- pairs ] + -- localiseID : see Note [Need to localiseId in libCaseBind] + + is_dupable_bind = small_enough && all ok_pair pairs + + -- Size: we are going to duplicate dup_pairs; to find their + -- size, build a fake binding (let { dup_pairs } in (), + -- and find the size of that + -- See Note [Small enough] + small_enough = case bombOutSize env of + Nothing -> True -- Infinity + Just size -> couldBeSmallEnoughToInline (lc_dflags env) size $ + Let (Rec dup_pairs) (Var unitDataConId) + + ok_pair (id,_) + = idArity id > 0 -- Note [Only functions!] 
&& not (isBottomingId id) -- Note [Not bottoming ids]
+
+{- Note [Not bottoming Ids]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Do not specialise error functions (this is unusual, but I once saw it
+happen, in Data.Typeable.Internal).
+
+Note [Only functions!]
+~~~~~~~~~~~~~~~~~~~~~~
+Consider the following code
+
+ f = g (case v of V a b -> a : t f)
+
+where g is expensive. If we aren't careful, liberate case will turn this into
+
+ f = g (case v of
+ V a b -> a : t (letrec f = g (case v of V a b -> a : t f)
+ in f)
+ )
+
+Yikes! We evaluate g twice. This leads to an O(2^n) explosion
+if g calls back to the same code recursively.
+
+Solution: make sure that we only do the liberate-case thing on *functions*
+
+Note [Small enough]
+~~~~~~~~~~~~~~~~~~~
+Consider
+ \fv. letrec
+ f = \x. BIG...(case fv of { (a,b) -> ...g.. })...
+ g = \y. SMALL...f...
+
+Then we *can* in principle do liberate-case on 'g' (small RHS) but not
+for 'f' (too big). But doing so is not profitable, because duplicating
+'g' at its call site in 'f' doesn't get rid of any cases. So we just
+ask for the whole group to be small enough.
+
+Note [Need to localiseId in libCaseBind]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The call to localiseId is needed for two subtle reasons
+(a) Reset the export flags on the binders so
+ that we don't get name clashes on exported things if the
+ local binding floats out to top level. This is most unlikely
+ to happen, since the whole point concerns free variables.
+ But resetting the export flag is right regardless.
+
+(b) Make the name an Internal one. External Names should never be
+ nested; if it were floated to the top level, we'd get a name
+ clash at code generation time. 
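The duplicability gate computed by ok_pair and small_enough above can be sketched abstractly (illustrative only: the `Pair` type, `dupable` function, and threshold handling are hypothetical stand-ins for idArity, isBottomingId, and couldBeSmallEnoughToInline):

```haskell
-- Toy model of the "is this group duplicable?" check: every binder must be
-- a non-bottoming function, and the whole group must fit under the size
-- threshold (Nothing = no limit, mirroring the bombOutSize Maybe).
data Pair = Pair { arity :: Int, bottoms :: Bool, size :: Int }

dupable :: Maybe Int -> [Pair] -> Bool
dupable threshold pairs = all ok pairs && smallEnough
  where
    ok p        = arity p > 0 && not (bottoms p)   -- Note [Only functions!]
                                                    -- and [Not bottoming Ids]
    groupSize   = sum (map size pairs)              -- Note [Small enough]:
                                                    -- the *whole group* counts
    smallEnough = maybe True (groupSize <=) threshold
```

Note that a single over-threshold or zero-arity binder vetoes the whole group, which is the point of Note [Small enough].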
+ +Expressions +~~~~~~~~~~~ +-} + +libCase :: LibCaseEnv + -> CoreExpr + -> CoreExpr + +libCase env (Var v) = libCaseApp env v [] +libCase _ (Lit lit) = Lit lit +libCase _ (Type ty) = Type ty +libCase _ (Coercion co) = Coercion co +libCase env e@(App {}) | let (fun, args) = collectArgs e + , Var v <- fun + = libCaseApp env v args +libCase env (App fun arg) = App (libCase env fun) (libCase env arg) +libCase env (Tick tickish body) = Tick tickish (libCase env body) +libCase env (Cast e co) = Cast (libCase env e) co + +libCase env (Lam binder body) + = Lam binder (libCase (addBinders env [binder]) body) + +libCase env (Let bind body) + = Let bind' (libCase env_body body) + where + (env_body, bind') = libCaseBind env bind + +libCase env (Case scrut bndr ty alts) + = Case (libCase env scrut) bndr ty (map (libCaseAlt env_alts) alts) + where + env_alts = addBinders (mk_alt_env scrut) [bndr] + mk_alt_env (Var scrut_var) = addScrutedVar env scrut_var + mk_alt_env (Cast scrut _) = mk_alt_env scrut -- Note [Scrutinee with cast] + mk_alt_env _ = env + +libCaseAlt :: LibCaseEnv -> (AltCon, [CoreBndr], CoreExpr) + -> (AltCon, [CoreBndr], CoreExpr) +libCaseAlt env (con,args,rhs) = (con, args, libCase (addBinders env args) rhs) + +{- +Ids +~~~ + +To unfold, we can't just wrap the id itself in its binding if it's a join point: + + jump j a b c => (joinrec j x y z = ... in jump j) a b c -- wrong!!! + +Every jump must provide all arguments, so we have to be careful to wrap the +whole jump instead: + + jump j a b c => joinrec j x y z = ... 
in jump j a b c -- right
+
+-}
+
+libCaseApp :: LibCaseEnv -> Id -> [CoreExpr] -> CoreExpr
+libCaseApp env v args
+ | Just the_bind <- lookupRecId env v -- It's a use of a recursive thing
+ , notNull free_scruts -- with free vars scrutinised in RHS
+ = Let the_bind expr'
+
+ | otherwise
+ = expr'
+
+ where
+ rec_id_level = lookupLevel env v
+ free_scruts = freeScruts env rec_id_level
+ expr' = mkApps (Var v) (map (libCase env) args)
+
+freeScruts :: LibCaseEnv
+ -> LibCaseLevel -- Level of the recursive Id
+ -> [Id] -- Ids that are scrutinised between the binding
+ -- of the recursive Id and here
+freeScruts env rec_bind_lvl
+ = [v | (v, scrut_bind_lvl, scrut_at_lvl) <- lc_scruts env
+ , scrut_bind_lvl <= rec_bind_lvl
+ , scrut_at_lvl > rec_bind_lvl]
+ -- Note [When to specialise]
+ -- Note [Avoiding fruitless liberate-case]
+
+{-
+Note [When to specialise]
+~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider
+ f = \x. letrec g = \y. case x of
+ True -> ... (f a) ...
+ False -> ... (g b) ...
+
+We get the following levels
+ f 0
+ x 1
+ g 1
+ y 2
+
+Then 'x' is being scrutinised at a deeper level than its binding, so
+it's added to lc_scruts: [(x,1)]
+
+We do *not* want to specialise the call to 'f', because 'x' is not free
+in 'f'. So here the bind-level of 'x' (=1) is not <= the bind-level of 'f' (=0).
+
+We *do* want to specialise the call to 'g', because 'x' is free in g.
+Here the bind-level of 'x' (=1) is <= the bind-level of 'g' (=1).
+
+Note [Avoiding fruitless liberate-case]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider also:
+ f = \x. case top_lvl_thing of
+ I# _ -> let g = \y. ... g ...
+ in ...
+
+Here, top_lvl_thing is scrutinised at a level (1) deeper than its
+binding site (0). Nevertheless, we do NOT want to specialise the call
+to 'g' because all the structure in its free variables is already
+visible at the definition site for g. 
Hence, when considering specialising +an occurrence of 'g', we want to check that there's a scruted-var v st + + a) v's binding site is *outside* g + b) v's scrutinisation site is *inside* g + + +************************************************************************ +* * + Utility functions +* * +************************************************************************ +-} + +addBinders :: LibCaseEnv -> [CoreBndr] -> LibCaseEnv +addBinders env@(LibCaseEnv { lc_lvl = lvl, lc_lvl_env = lvl_env }) binders + = env { lc_lvl_env = lvl_env' } + where + lvl_env' = extendVarEnvList lvl_env (binders `zip` repeat lvl) + +addRecBinds :: LibCaseEnv -> [(Id,CoreExpr)] -> LibCaseEnv +addRecBinds env@(LibCaseEnv {lc_lvl = lvl, lc_lvl_env = lvl_env, + lc_rec_env = rec_env}) pairs + = env { lc_lvl = lvl', lc_lvl_env = lvl_env', lc_rec_env = rec_env' } + where + lvl' = lvl + 1 + lvl_env' = extendVarEnvList lvl_env [(binder,lvl) | (binder,_) <- pairs] + rec_env' = extendVarEnvList rec_env [(binder, Rec pairs) | (binder,_) <- pairs] + +addScrutedVar :: LibCaseEnv + -> Id -- This Id is being scrutinised by a case expression + -> LibCaseEnv + +addScrutedVar env@(LibCaseEnv { lc_lvl = lvl, lc_lvl_env = lvl_env, + lc_scruts = scruts }) scrut_var + | bind_lvl < lvl + = env { lc_scruts = scruts' } + -- Add to scruts iff the scrut_var is being scrutinised at + -- a deeper level than its defn + + | otherwise = env + where + scruts' = (scrut_var, bind_lvl, lvl) : scruts + bind_lvl = case lookupVarEnv lvl_env scrut_var of + Just lvl -> lvl + Nothing -> topLevel + +lookupRecId :: LibCaseEnv -> Id -> Maybe CoreBind +lookupRecId env id = lookupVarEnv (lc_rec_env env) id + +lookupLevel :: LibCaseEnv -> Id -> LibCaseLevel +lookupLevel env id + = case lookupVarEnv (lc_lvl_env env) id of + Just lvl -> lvl + Nothing -> topLevel + +{- +************************************************************************ +* * + The environment +* * 
+************************************************************************ +-} + +type LibCaseLevel = Int + +topLevel :: LibCaseLevel +topLevel = 0 + +data LibCaseEnv + = LibCaseEnv { + lc_dflags :: DynFlags, + + lc_lvl :: LibCaseLevel, -- Current level + -- The level is incremented when (and only when) going + -- inside the RHS of a (sufficiently small) recursive + -- function. + + lc_lvl_env :: IdEnv LibCaseLevel, + -- Binds all non-top-level in-scope Ids (top-level and + -- imported things have a level of zero) + + lc_rec_env :: IdEnv CoreBind, + -- Binds *only* recursively defined ids, to their own + -- binding group, and *only* in their own RHSs + + lc_scruts :: [(Id, LibCaseLevel, LibCaseLevel)] + -- Each of these Ids was scrutinised by an enclosing + -- case expression, at a level deeper than its binding + -- level. + -- + -- The first LibCaseLevel is the *binding level* of + -- the scrutinised Id, + -- The second is the level *at which it was scrutinised*. + -- (see Note [Avoiding fruitless liberate-case]) + -- The former is a bit redundant, since you could always + -- look it up in lc_lvl_env, but it's just cached here + -- + -- The order is insignificant; it's a bag really + -- + -- There's one element per scrutinisation; + -- in principle the same Id may appear multiple times, + -- although that'd be unusual: + -- case x of { (a,b) -> ....(case x of ...) .. } + } + +initEnv :: DynFlags -> LibCaseEnv +initEnv dflags + = LibCaseEnv { lc_dflags = dflags, + lc_lvl = 0, + lc_lvl_env = emptyVarEnv, + lc_rec_env = emptyVarEnv, + lc_scruts = [] } + +-- Bomb-out size for deciding if +-- potential liberatees are too big. +-- (passed in from cmd-line args) +bombOutSize :: LibCaseEnv -> Maybe Int +bombOutSize = liberateCaseThreshold . 
lc_dflags diff --git a/compiler/GHC/Core/Op/Monad.hs b/compiler/GHC/Core/Op/Monad.hs new file mode 100644 index 0000000000..a0a15bba6f --- /dev/null +++ b/compiler/GHC/Core/Op/Monad.hs @@ -0,0 +1,828 @@ +{- +(c) The AQUA Project, Glasgow University, 1993-1998 + +-} + +{-# LANGUAGE CPP #-} +{-# LANGUAGE DeriveFunctor #-} + +{-# OPTIONS_GHC -Wno-incomplete-record-updates #-} + +module GHC.Core.Op.Monad ( + -- * Configuration of the core-to-core passes + CoreToDo(..), runWhen, runMaybe, + SimplMode(..), + FloatOutSwitches(..), + pprPassDetails, + + -- * Plugins + CorePluginPass, bindsOnlyPass, + + -- * Counting + SimplCount, doSimplTick, doFreeSimplTick, simplCountN, + pprSimplCount, plusSimplCount, zeroSimplCount, + isZeroSimplCount, hasDetailedCounts, Tick(..), + + -- * The monad + CoreM, runCoreM, + + -- ** Reading from the monad + getHscEnv, getRuleBase, getModule, + getDynFlags, getPackageFamInstEnv, + getVisibleOrphanMods, getUniqMask, + getPrintUnqualified, getSrcSpanM, + + -- ** Writing to the monad + addSimplCount, + + -- ** Lifting into the monad + liftIO, liftIOWithCount, + + -- ** Dealing with annotations + getAnnotations, getFirstAnnotations, + + -- ** Screen output + putMsg, putMsgS, errorMsg, errorMsgS, warnMsg, + fatalErrorMsg, fatalErrorMsgS, + debugTraceMsg, debugTraceMsgS, + dumpIfSet_dyn + ) where + +import GhcPrelude hiding ( read ) + +import GHC.Core +import GHC.Driver.Types +import Module +import GHC.Driver.Session +import BasicTypes ( CompilerPhase(..) 
) +import Annotations + +import IOEnv hiding ( liftIO, failM, failWithM ) +import qualified IOEnv ( liftIO ) +import Var +import Outputable +import FastString +import ErrUtils( Severity(..), DumpFormat (..), dumpOptionsFromFlag ) +import UniqSupply +import MonadUtils +import NameEnv +import SrcLoc +import Data.Bifunctor ( bimap ) +import ErrUtils (dumpAction) +import Data.List (intersperse, groupBy, sortBy) +import Data.Ord +import Data.Dynamic +import Data.Map (Map) +import qualified Data.Map as Map +import qualified Data.Map.Strict as MapStrict +import Data.Word +import Control.Monad +import Control.Applicative ( Alternative(..) ) +import Panic (throwGhcException, GhcException(..)) + +{- +************************************************************************ +* * + The CoreToDo type and related types + Abstraction of core-to-core passes to run. +* * +************************************************************************ +-} + +data CoreToDo -- These are diff core-to-core passes, + -- which may be invoked in any order, + -- as many times as you like. + + = CoreDoSimplify -- The core-to-core simplifier. + Int -- Max iterations + SimplMode + | CoreDoPluginPass String CorePluginPass + | CoreDoFloatInwards + | CoreDoFloatOutwards FloatOutSwitches + | CoreLiberateCase + | CoreDoPrintCore + | CoreDoStaticArgs + | CoreDoCallArity + | CoreDoExitify + | CoreDoDemand + | CoreDoCpr + | CoreDoWorkerWrapper + | CoreDoSpecialising + | CoreDoSpecConstr + | CoreCSE + | CoreDoRuleCheck CompilerPhase String -- Check for non-application of rules + -- matching this string + | CoreDoNothing -- Useful when building up + | CoreDoPasses [CoreToDo] -- lists of these things + + | CoreDesugar -- Right after desugaring, no simple optimisation yet! 
+ | CoreDesugarOpt -- CoreDesugarXXX: Not strictly a core-to-core pass, but produces + -- Core output, and hence useful to pass to endPass + + | CoreTidy + | CorePrep + | CoreOccurAnal + +instance Outputable CoreToDo where + ppr (CoreDoSimplify _ _) = text "Simplifier" + ppr (CoreDoPluginPass s _) = text "Core plugin: " <+> text s + ppr CoreDoFloatInwards = text "Float inwards" + ppr (CoreDoFloatOutwards f) = text "Float out" <> parens (ppr f) + ppr CoreLiberateCase = text "Liberate case" + ppr CoreDoStaticArgs = text "Static argument" + ppr CoreDoCallArity = text "Called arity analysis" + ppr CoreDoExitify = text "Exitification transformation" + ppr CoreDoDemand = text "Demand analysis" + ppr CoreDoCpr = text "Constructed Product Result analysis" + ppr CoreDoWorkerWrapper = text "Worker Wrapper binds" + ppr CoreDoSpecialising = text "Specialise" + ppr CoreDoSpecConstr = text "SpecConstr" + ppr CoreCSE = text "Common sub-expression" + ppr CoreDesugar = text "Desugar (before optimization)" + ppr CoreDesugarOpt = text "Desugar (after optimization)" + ppr CoreTidy = text "Tidy Core" + ppr CorePrep = text "CorePrep" + ppr CoreOccurAnal = text "Occurrence analysis" + ppr CoreDoPrintCore = text "Print core" + ppr (CoreDoRuleCheck {}) = text "Rule check" + ppr CoreDoNothing = text "CoreDoNothing" + ppr (CoreDoPasses passes) = text "CoreDoPasses" <+> ppr passes + +pprPassDetails :: CoreToDo -> SDoc +pprPassDetails (CoreDoSimplify n md) = vcat [ text "Max iterations =" <+> int n + , ppr md ] +pprPassDetails _ = Outputable.empty + +data SimplMode -- See comments in GHC.Core.Op.Simplify.Monad + = SimplMode + { sm_names :: [String] -- Name(s) of the phase + , sm_phase :: CompilerPhase + , sm_dflags :: DynFlags -- Just for convenient non-monadic + -- access; we don't override these + , sm_rules :: Bool -- Whether RULES are enabled + , sm_inline :: Bool -- Whether inlining is enabled + , sm_case_case :: Bool -- Whether case-of-case is enabled + , sm_eta_expand :: Bool -- Whether 
eta-expansion is enabled + } + +instance Outputable SimplMode where + ppr (SimplMode { sm_phase = p, sm_names = ss + , sm_rules = r, sm_inline = i + , sm_eta_expand = eta, sm_case_case = cc }) + = text "SimplMode" <+> braces ( + sep [ text "Phase =" <+> ppr p <+> + brackets (text (concat $ intersperse "," ss)) <> comma + , pp_flag i (sLit "inline") <> comma + , pp_flag r (sLit "rules") <> comma + , pp_flag eta (sLit "eta-expand") <> comma + , pp_flag cc (sLit "case-of-case") ]) + where + pp_flag f s = ppUnless f (text "no") <+> ptext s + +data FloatOutSwitches = FloatOutSwitches { + floatOutLambdas :: Maybe Int, -- ^ Just n <=> float lambdas to top level, if + -- doing so will abstract over n or fewer + -- value variables + -- Nothing <=> float all lambdas to top level, + -- regardless of how many free variables + -- Just 0 is the vanilla case: float a lambda + -- iff it has no free vars + + floatOutConstants :: Bool, -- ^ True <=> float constants to top level, + -- even if they do not escape a lambda + floatOutOverSatApps :: Bool, + -- ^ True <=> float out over-saturated applications + -- based on arity information. + -- See Note [Floating over-saturated applications] + -- in GHC.Core.Op.SetLevels + floatToTopLevelOnly :: Bool -- ^ Allow floating to the top level only. 
+ } +instance Outputable FloatOutSwitches where + ppr = pprFloatOutSwitches + +pprFloatOutSwitches :: FloatOutSwitches -> SDoc +pprFloatOutSwitches sw + = text "FOS" <+> (braces $ + sep $ punctuate comma $ + [ text "Lam =" <+> ppr (floatOutLambdas sw) + , text "Consts =" <+> ppr (floatOutConstants sw) + , text "OverSatApps =" <+> ppr (floatOutOverSatApps sw) ]) + +-- The core-to-core pass ordering is derived from the DynFlags: +runWhen :: Bool -> CoreToDo -> CoreToDo +runWhen True do_this = do_this +runWhen False _ = CoreDoNothing + +runMaybe :: Maybe a -> (a -> CoreToDo) -> CoreToDo +runMaybe (Just x) f = f x +runMaybe Nothing _ = CoreDoNothing + +{- + +************************************************************************ +* * + Types for Plugins +* * +************************************************************************ +-} + +-- | A description of the plugin pass itself +type CorePluginPass = ModGuts -> CoreM ModGuts + +bindsOnlyPass :: (CoreProgram -> CoreM CoreProgram) -> ModGuts -> CoreM ModGuts +bindsOnlyPass pass guts + = do { binds' <- pass (mg_binds guts) + ; return (guts { mg_binds = binds' }) } + +{- +************************************************************************ +* * + Counting and logging +* * +************************************************************************ +-} + +getVerboseSimplStats :: (Bool -> SDoc) -> SDoc +getVerboseSimplStats = getPprDebug -- For now, anyway + +zeroSimplCount :: DynFlags -> SimplCount +isZeroSimplCount :: SimplCount -> Bool +hasDetailedCounts :: SimplCount -> Bool +pprSimplCount :: SimplCount -> SDoc +doSimplTick :: DynFlags -> Tick -> SimplCount -> SimplCount +doFreeSimplTick :: Tick -> SimplCount -> SimplCount +plusSimplCount :: SimplCount -> SimplCount -> SimplCount + +data SimplCount + = VerySimplCount !Int -- Used when don't want detailed stats + + | SimplCount { + ticks :: !Int, -- Total ticks + details :: !TickCounts, -- How many of each type + + n_log :: !Int, -- N + log1 :: [Tick], -- Last N 
events; <= opt_HistorySize, + -- most recent first + log2 :: [Tick] -- Last opt_HistorySize events before that + -- Having log1, log2 lets us accumulate the + -- recent history reasonably efficiently + } + +type TickCounts = Map Tick Int + +simplCountN :: SimplCount -> Int +simplCountN (VerySimplCount n) = n +simplCountN (SimplCount { ticks = n }) = n + +zeroSimplCount dflags + -- This is where we decide whether to do + -- the VerySimpl version or the full-stats version + | dopt Opt_D_dump_simpl_stats dflags + = SimplCount {ticks = 0, details = Map.empty, + n_log = 0, log1 = [], log2 = []} + | otherwise + = VerySimplCount 0 + +isZeroSimplCount (VerySimplCount n) = n==0 +isZeroSimplCount (SimplCount { ticks = n }) = n==0 + +hasDetailedCounts (VerySimplCount {}) = False +hasDetailedCounts (SimplCount {}) = True + +doFreeSimplTick tick sc@SimplCount { details = dts } + = sc { details = dts `addTick` tick } +doFreeSimplTick _ sc = sc + +doSimplTick dflags tick + sc@(SimplCount { ticks = tks, details = dts, n_log = nl, log1 = l1 }) + | nl >= historySize dflags = sc1 { n_log = 1, log1 = [tick], log2 = l1 } + | otherwise = sc1 { n_log = nl+1, log1 = tick : l1 } + where + sc1 = sc { ticks = tks+1, details = dts `addTick` tick } + +doSimplTick _ _ (VerySimplCount n) = VerySimplCount (n+1) + + +addTick :: TickCounts -> Tick -> TickCounts +addTick fm tick = MapStrict.insertWith (+) tick 1 fm + +plusSimplCount sc1@(SimplCount { ticks = tks1, details = dts1 }) + sc2@(SimplCount { ticks = tks2, details = dts2 }) + = log_base { ticks = tks1 + tks2 + , details = MapStrict.unionWith (+) dts1 dts2 } + where + -- A hackish way of getting recent log info + log_base | null (log1 sc2) = sc1 -- Nothing at all in sc2 + | null (log2 sc2) = sc2 { log2 = log1 sc1 } + | otherwise = sc2 + +plusSimplCount (VerySimplCount n) (VerySimplCount m) = VerySimplCount (n+m) +plusSimplCount lhs rhs = + throwGhcException . 
PprProgramError "plusSimplCount" $ vcat + [ text "lhs" + , pprSimplCount lhs + , text "rhs" + , pprSimplCount rhs + ] + -- We use one or the other consistently + +pprSimplCount (VerySimplCount n) = text "Total ticks:" <+> int n +pprSimplCount (SimplCount { ticks = tks, details = dts, log1 = l1, log2 = l2 }) + = vcat [text "Total ticks: " <+> int tks, + blankLine, + pprTickCounts dts, + getVerboseSimplStats $ \dbg -> if dbg + then + vcat [blankLine, + text "Log (most recent first)", + nest 4 (vcat (map ppr l1) $$ vcat (map ppr l2))] + else Outputable.empty + ] + +{- Note [Which transformations are innocuous] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +At one point (Jun 18) I wondered if some transformations (ticks) +might be "innocuous", in the sense that they do not unlock a later +transformation that does not occur in the same pass. If so, we could +refrain from bumping the overall tick-count for such innocuous +transformations, and perhaps terminate the simplifier one pass +earlier. + +But alas I found that virtually nothing was innocuous! This Note +just records what I learned, in case anyone wants to try again. + +These transformations are not innocuous: + +*** NB: I think these ones could be made innocuous + EtaExpansion + LetFloatFromLet + +LetFloatFromLet + x = K (let z = e2 in Just z) + prepareRhs transforms to + x2 = let z=e2 in Just z + x = K x2 + And now more let-floating can happen in the + next pass, on x2 + +PreInlineUnconditionally + Example in spectral/cichelli/Auxil + hinsert = ...let lo = e in + let j = ...lo... in + case x of + False -> () + True -> case lo of I# lo' -> + ...j... + When we PreInlineUnconditionally j, lo's occ-info changes to once, + so it can be PreInlineUnconditionally in the next pass, and a + cascade of further things can happen. + +PostInlineUnconditionally + let x = e in + let y = ...x.. in + case .. of { A -> ...x...y... + B -> ...x...y... } + The current postInlineUnconditionally will inline y, and then x; sigh. 
+ + But PostInlineUnconditionally might also unlock subsequent + transformations for the same reason as PreInlineUnconditionally, + so it's probably not innocuous anyway. + +KnownBranch, BetaReduction: + May drop chunks of code, and thereby enable PreInlineUnconditionally + for some let-binding which now occurs once + +EtaExpansion: + Example in imaginary/digits-of-e1 + fail = \void. e where e :: IO () + --> etaExpandRhs + fail = \void. (\s. (e |> g) s) |> sym g where g :: IO () ~ S -> (S,()) + --> Next iteration of simplify + fail1 = \void. \s. (e |> g) s + fail = fail1 |> Void#->sym g + And now inline 'fail' + +CaseMerge: + case x of y { + DEFAULT -> case y of z { pi -> ei } + alts2 } + ---> CaseMerge + case x of { pi -> let z = y in ei + ; alts2 } + The "let z=y" case-binder-swap gets dealt with in the next pass +-} + +pprTickCounts :: Map Tick Int -> SDoc +pprTickCounts counts + = vcat (map pprTickGroup groups) + where + groups :: [[(Tick,Int)]] -- Each group shares a common tag + -- toList returns common tags adjacent + groups = groupBy same_tag (Map.toList counts) + same_tag (tick1,_) (tick2,_) = tickToTag tick1 == tickToTag tick2 + +pprTickGroup :: [(Tick, Int)] -> SDoc +pprTickGroup group@((tick1,_):_) + = hang (int (sum [n | (_,n) <- group]) <+> text (tickString tick1)) + 2 (vcat [ int n <+> pprTickCts tick + -- flip as we want largest first + | (tick,n) <- sortBy (flip (comparing snd)) group]) +pprTickGroup [] = panic "pprTickGroup" + +data Tick -- See Note [Which transformations are innocuous] + = PreInlineUnconditionally Id + | PostInlineUnconditionally Id + + | UnfoldingDone Id + | RuleFired FastString -- Rule name + + | LetFloatFromLet + | EtaExpansion Id -- LHS binder + | EtaReduction Id -- Binder on outer lambda + | BetaReduction Id -- Lambda binder + + + | CaseOfCase Id -- Bndr on *inner* case + | KnownBranch Id -- Case binder + | CaseMerge Id -- Binder on outer case + | AltMerge Id -- Case binder + | CaseElim Id -- Case binder + | CaseIdentity Id 
-- Case binder + | FillInCaseDefault Id -- Case binder + + | SimplifierDone -- Ticked at each iteration of the simplifier + +instance Outputable Tick where + ppr tick = text (tickString tick) <+> pprTickCts tick + +instance Eq Tick where + a == b = case a `cmpTick` b of + EQ -> True + _ -> False + +instance Ord Tick where + compare = cmpTick + +tickToTag :: Tick -> Int +tickToTag (PreInlineUnconditionally _) = 0 +tickToTag (PostInlineUnconditionally _) = 1 +tickToTag (UnfoldingDone _) = 2 +tickToTag (RuleFired _) = 3 +tickToTag LetFloatFromLet = 4 +tickToTag (EtaExpansion _) = 5 +tickToTag (EtaReduction _) = 6 +tickToTag (BetaReduction _) = 7 +tickToTag (CaseOfCase _) = 8 +tickToTag (KnownBranch _) = 9 +tickToTag (CaseMerge _) = 10 +tickToTag (CaseElim _) = 11 +tickToTag (CaseIdentity _) = 12 +tickToTag (FillInCaseDefault _) = 13 +tickToTag SimplifierDone = 16 +tickToTag (AltMerge _) = 17 + +tickString :: Tick -> String +tickString (PreInlineUnconditionally _) = "PreInlineUnconditionally" +tickString (PostInlineUnconditionally _)= "PostInlineUnconditionally" +tickString (UnfoldingDone _) = "UnfoldingDone" +tickString (RuleFired _) = "RuleFired" +tickString LetFloatFromLet = "LetFloatFromLet" +tickString (EtaExpansion _) = "EtaExpansion" +tickString (EtaReduction _) = "EtaReduction" +tickString (BetaReduction _) = "BetaReduction" +tickString (CaseOfCase _) = "CaseOfCase" +tickString (KnownBranch _) = "KnownBranch" +tickString (CaseMerge _) = "CaseMerge" +tickString (AltMerge _) = "AltMerge" +tickString (CaseElim _) = "CaseElim" +tickString (CaseIdentity _) = "CaseIdentity" +tickString (FillInCaseDefault _) = "FillInCaseDefault" +tickString SimplifierDone = "SimplifierDone" + +pprTickCts :: Tick -> SDoc +pprTickCts (PreInlineUnconditionally v) = ppr v +pprTickCts (PostInlineUnconditionally v)= ppr v +pprTickCts (UnfoldingDone v) = ppr v +pprTickCts (RuleFired v) = ppr v +pprTickCts LetFloatFromLet = Outputable.empty +pprTickCts (EtaExpansion v) = ppr v +pprTickCts 
(EtaReduction v) = ppr v +pprTickCts (BetaReduction v) = ppr v +pprTickCts (CaseOfCase v) = ppr v +pprTickCts (KnownBranch v) = ppr v +pprTickCts (CaseMerge v) = ppr v +pprTickCts (AltMerge v) = ppr v +pprTickCts (CaseElim v) = ppr v +pprTickCts (CaseIdentity v) = ppr v +pprTickCts (FillInCaseDefault v) = ppr v +pprTickCts _ = Outputable.empty + +cmpTick :: Tick -> Tick -> Ordering +cmpTick a b = case (tickToTag a `compare` tickToTag b) of + GT -> GT + EQ -> cmpEqTick a b + LT -> LT + +cmpEqTick :: Tick -> Tick -> Ordering +cmpEqTick (PreInlineUnconditionally a) (PreInlineUnconditionally b) = a `compare` b +cmpEqTick (PostInlineUnconditionally a) (PostInlineUnconditionally b) = a `compare` b +cmpEqTick (UnfoldingDone a) (UnfoldingDone b) = a `compare` b +cmpEqTick (RuleFired a) (RuleFired b) = a `compare` b +cmpEqTick (EtaExpansion a) (EtaExpansion b) = a `compare` b +cmpEqTick (EtaReduction a) (EtaReduction b) = a `compare` b +cmpEqTick (BetaReduction a) (BetaReduction b) = a `compare` b +cmpEqTick (CaseOfCase a) (CaseOfCase b) = a `compare` b +cmpEqTick (KnownBranch a) (KnownBranch b) = a `compare` b +cmpEqTick (CaseMerge a) (CaseMerge b) = a `compare` b +cmpEqTick (AltMerge a) (AltMerge b) = a `compare` b +cmpEqTick (CaseElim a) (CaseElim b) = a `compare` b +cmpEqTick (CaseIdentity a) (CaseIdentity b) = a `compare` b +cmpEqTick (FillInCaseDefault a) (FillInCaseDefault b) = a `compare` b +cmpEqTick _ _ = EQ + +{- +************************************************************************ +* * + Monad and carried data structure definitions +* * +************************************************************************ +-} + +data CoreReader = CoreReader { + cr_hsc_env :: HscEnv, + cr_rule_base :: RuleBase, + cr_module :: Module, + cr_print_unqual :: PrintUnqualified, + cr_loc :: SrcSpan, -- Use this for log/error messages so they + -- are at least tagged with the right source file + cr_visible_orphan_mods :: !ModuleSet, + cr_uniq_mask :: !Char -- Mask for creating 
unique values +} + +-- Note: CoreWriter used to be defined with data, rather than newtype. If it +-- is defined that way again, the cw_simpl_count field, at least, must be +-- strict to avoid a space leak (#7702). +newtype CoreWriter = CoreWriter { + cw_simpl_count :: SimplCount +} + +emptyWriter :: DynFlags -> CoreWriter +emptyWriter dflags = CoreWriter { + cw_simpl_count = zeroSimplCount dflags + } + +plusWriter :: CoreWriter -> CoreWriter -> CoreWriter +plusWriter w1 w2 = CoreWriter { + cw_simpl_count = (cw_simpl_count w1) `plusSimplCount` (cw_simpl_count w2) + } + +type CoreIOEnv = IOEnv CoreReader + +-- | The monad used by Core-to-Core passes to register simplification statistics. +-- Also used to have common state (in the form of UniqueSupply) for generating Uniques. +newtype CoreM a = CoreM { unCoreM :: CoreIOEnv (a, CoreWriter) } + deriving (Functor) + +instance Monad CoreM where + mx >>= f = CoreM $ do + (x, w1) <- unCoreM mx + (y, w2) <- unCoreM (f x) + let w = w1 `plusWriter` w2 + return $ seq w (y, w) + -- forcing w before building the tuple avoids a space leak + -- (#7702) + +instance Applicative CoreM where + pure x = CoreM $ nop x + (<*>) = ap + m *> k = m >>= \_ -> k + +instance Alternative CoreM where + empty = CoreM Control.Applicative.empty + m <|> n = CoreM (unCoreM m <|> unCoreM n) + +instance MonadPlus CoreM + +instance MonadUnique CoreM where + getUniqueSupplyM = do + mask <- read cr_uniq_mask + liftIO $! mkSplitUniqSupply mask + + getUniqueM = do + mask <- read cr_uniq_mask + liftIO $! 
uniqFromMask mask + +runCoreM :: HscEnv + -> RuleBase + -> Char -- ^ Mask + -> Module + -> ModuleSet + -> PrintUnqualified + -> SrcSpan + -> CoreM a + -> IO (a, SimplCount) +runCoreM hsc_env rule_base mask mod orph_imps print_unqual loc m + = liftM extract $ runIOEnv reader $ unCoreM m + where + reader = CoreReader { + cr_hsc_env = hsc_env, + cr_rule_base = rule_base, + cr_module = mod, + cr_visible_orphan_mods = orph_imps, + cr_print_unqual = print_unqual, + cr_loc = loc, + cr_uniq_mask = mask + } + + extract :: (a, CoreWriter) -> (a, SimplCount) + extract (value, writer) = (value, cw_simpl_count writer) + +{- +************************************************************************ +* * + Core combinators, not exported +* * +************************************************************************ +-} + +nop :: a -> CoreIOEnv (a, CoreWriter) +nop x = do + r <- getEnv + return (x, emptyWriter $ (hsc_dflags . cr_hsc_env) r) + +read :: (CoreReader -> a) -> CoreM a +read f = CoreM $ getEnv >>= (\r -> nop (f r)) + +write :: CoreWriter -> CoreM () +write w = CoreM $ return ((), w) + +-- \subsection{Lifting IO into the monad} + +-- | Lift an 'IOEnv' operation into 'CoreM' +liftIOEnv :: CoreIOEnv a -> CoreM a +liftIOEnv mx = CoreM (mx >>= (\x -> nop x)) + +instance MonadIO CoreM where + liftIO = liftIOEnv . 
IOEnv.liftIO + +-- | Lift an 'IO' operation into 'CoreM' while consuming its 'SimplCount' +liftIOWithCount :: IO (SimplCount, a) -> CoreM a +liftIOWithCount what = liftIO what >>= (\(count, x) -> addSimplCount count >> return x) + +{- +************************************************************************ +* * + Reader, writer and state accessors +* * +************************************************************************ +-} + +getHscEnv :: CoreM HscEnv +getHscEnv = read cr_hsc_env + +getRuleBase :: CoreM RuleBase +getRuleBase = read cr_rule_base + +getVisibleOrphanMods :: CoreM ModuleSet +getVisibleOrphanMods = read cr_visible_orphan_mods + +getPrintUnqualified :: CoreM PrintUnqualified +getPrintUnqualified = read cr_print_unqual + +getSrcSpanM :: CoreM SrcSpan +getSrcSpanM = read cr_loc + +addSimplCount :: SimplCount -> CoreM () +addSimplCount count = write (CoreWriter { cw_simpl_count = count }) + +getUniqMask :: CoreM Char +getUniqMask = read cr_uniq_mask + +-- Convenience accessors for useful fields of HscEnv + +instance HasDynFlags CoreM where + getDynFlags = fmap hsc_dflags getHscEnv + +instance HasModule CoreM where + getModule = read cr_module + +getPackageFamInstEnv :: CoreM PackageFamInstEnv +getPackageFamInstEnv = do + hsc_env <- getHscEnv + eps <- liftIO $ hscEPS hsc_env + return $ eps_fam_inst_env eps + +{- +************************************************************************ +* * + Dealing with annotations +* * +************************************************************************ +-} + +-- | Get all annotations of a given type. This happens lazily, that is +-- no deserialization will take place until the [a] is actually demanded and +-- the [a] can also be empty (the UniqFM is not filtered). +-- +-- This should be done once at the start of a Core-to-Core pass that uses +-- annotations. 
+-- +-- See Note [Annotations] +getAnnotations :: Typeable a => ([Word8] -> a) -> ModGuts -> CoreM (ModuleEnv [a], NameEnv [a]) +getAnnotations deserialize guts = do + hsc_env <- getHscEnv + ann_env <- liftIO $ prepareAnnotations hsc_env (Just guts) + return (deserializeAnns deserialize ann_env) + +-- | Get at most one annotation of a given type per annotatable item. +getFirstAnnotations :: Typeable a => ([Word8] -> a) -> ModGuts -> CoreM (ModuleEnv a, NameEnv a) +getFirstAnnotations deserialize guts + = bimap mod name <$> getAnnotations deserialize guts + where + mod = mapModuleEnv head . filterModuleEnv (const $ not . null) + name = mapNameEnv head . filterNameEnv (not . null) + +{- +Note [Annotations] +~~~~~~~~~~~~~~~~~~ +A Core-to-Core pass that wants to make use of annotations calls +getAnnotations or getFirstAnnotations at the beginning to obtain a UniqFM with +annotations of a specific type. This produces all annotations from interface +files read so far. However, annotations from interface files read during the +pass will not be visible until getAnnotations is called again. This is similar +to how rules work and probably isn't too bad. + +The current implementation could be optimised a bit: when looking up +annotations for a thing from the HomePackageTable, we could search directly in +the module where the thing is defined rather than building one UniqFM which +contains all annotations we know of. This would work because annotations can +only be given to things defined in the same module. However, since we would +only want to deserialise every annotation once, we would have to build a cache +for every module in the HPT. In the end, it's probably not worth it as long as +we aren't using annotations heavily. 
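The per-item filtering that getFirstAnnotations performs can be sketched with ordinary Data.Map standing in for GHC's NameEnv/ModuleEnv; the firstAnns name and the sample keys below are illustrative, not GHC API:

```haskell
import qualified Data.Map as Map

-- Sketch of the getFirstAnnotations idea: given every annotation per key,
-- keep at most one per key and drop keys with no annotations at all,
-- mirroring `mapNameEnv head . filterNameEnv (not . null)` above.
firstAnns :: Ord k => Map.Map k [a] -> Map.Map k a
firstAnns = Map.map head . Map.filter (not . null)

main :: IO ()
main = print (firstAnns (Map.fromList [("f", [1, 2 :: Int]), ("g", []), ("h", [3])]))
```

With the sample input this prints `fromList [("f",1),("h",3)]`: "g" is dropped because its list is empty, and only the first annotation survives for "f".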
+ +************************************************************************ +* * + Direct screen output +* * +************************************************************************ +-} + +msg :: Severity -> WarnReason -> SDoc -> CoreM () +msg sev reason doc + = do { dflags <- getDynFlags + ; loc <- getSrcSpanM + ; unqual <- getPrintUnqualified + ; let sty = case sev of + SevError -> err_sty + SevWarning -> err_sty + SevDump -> dump_sty + _ -> user_sty + err_sty = mkErrStyle dflags unqual + user_sty = mkUserStyle dflags unqual AllTheWay + dump_sty = mkDumpStyle dflags unqual + ; liftIO $ putLogMsg dflags reason sev loc sty doc } + +-- | Output a String message to the screen +putMsgS :: String -> CoreM () +putMsgS = putMsg . text + +-- | Output a message to the screen +putMsg :: SDoc -> CoreM () +putMsg = msg SevInfo NoReason + +-- | Output an error to the screen. Does not cause the compiler to die. +errorMsgS :: String -> CoreM () +errorMsgS = errorMsg . text + +-- | Output an error to the screen. Does not cause the compiler to die. +errorMsg :: SDoc -> CoreM () +errorMsg = msg SevError NoReason + +warnMsg :: WarnReason -> SDoc -> CoreM () +warnMsg = msg SevWarning + +-- | Output a fatal error to the screen. Does not cause the compiler to die. +fatalErrorMsgS :: String -> CoreM () +fatalErrorMsgS = fatalErrorMsg . text + +-- | Output a fatal error to the screen. Does not cause the compiler to die. +fatalErrorMsg :: SDoc -> CoreM () +fatalErrorMsg = msg SevFatal NoReason + +-- | Output a string debugging message at verbosity level of @-v@ or higher +debugTraceMsgS :: String -> CoreM () +debugTraceMsgS = debugTraceMsg . 
text + +-- | Outputs a debugging message at verbosity level of @-v@ or higher +debugTraceMsg :: SDoc -> CoreM () +debugTraceMsg = msg SevDump NoReason + +-- | Show some labelled 'SDoc' if a particular flag is set or at a verbosity level of @-v -ddump-most@ or higher +dumpIfSet_dyn :: DumpFlag -> String -> DumpFormat -> SDoc -> CoreM () +dumpIfSet_dyn flag str fmt doc + = do { dflags <- getDynFlags + ; unqual <- getPrintUnqualified + ; when (dopt flag dflags) $ liftIO $ do + let sty = mkDumpStyle dflags unqual + dumpAction dflags sty (dumpOptionsFromFlag flag) str fmt doc } diff --git a/compiler/GHC/Core/Op/Monad.hs-boot b/compiler/GHC/Core/Op/Monad.hs-boot new file mode 100644 index 0000000000..4ca105a66c --- /dev/null +++ b/compiler/GHC/Core/Op/Monad.hs-boot @@ -0,0 +1,30 @@ +-- Created this hs-boot file to remove circular dependencies from the use of +-- Plugins. Plugins needs CoreToDo and CoreM types to define core-to-core +-- transformations. +-- However GHC.Core.Op.Monad does much more than defining these, and because Plugins are +-- activated in various modules, the imports become circular. To solve this I +-- extracted CoreToDo and CoreM into this file. +-- I needed to write the whole definition of these types, otherwise it created +-- a data-newtype conflict. 
+ +module GHC.Core.Op.Monad ( CoreToDo, CoreM ) where + +import GhcPrelude + +import IOEnv ( IOEnv ) + +type CoreIOEnv = IOEnv CoreReader + +data CoreReader + +newtype CoreWriter = CoreWriter { + cw_simpl_count :: SimplCount +} + +data SimplCount + +newtype CoreM a = CoreM { unCoreM :: CoreIOEnv (a, CoreWriter) } + +instance Monad CoreM + +data CoreToDo diff --git a/compiler/GHC/Core/Op/OccurAnal.hs b/compiler/GHC/Core/Op/OccurAnal.hs new file mode 100644 index 0000000000..b676be38ae --- /dev/null +++ b/compiler/GHC/Core/Op/OccurAnal.hs @@ -0,0 +1,2898 @@ +{- +(c) The GRASP/AQUA Project, Glasgow University, 1992-1998 + +************************************************************************ +* * +\section[OccurAnal]{Occurrence analysis pass} +* * +************************************************************************ + +The occurrence analyser re-typechecks a core expression, returning a new +core expression with (hopefully) improved usage information. +-} + +{-# LANGUAGE CPP, BangPatterns, MultiWayIf, ViewPatterns #-} + +{-# OPTIONS_GHC -Wno-incomplete-record-updates #-} + +module GHC.Core.Op.OccurAnal ( + occurAnalysePgm, occurAnalyseExpr, occurAnalyseExpr_NoBinderSwap + ) where + +#include "HsVersions.h" + +import GhcPrelude + +import GHC.Core +import GHC.Core.FVs +import GHC.Core.Utils ( exprIsTrivial, isDefaultAlt, isExpandableApp, + stripTicksTopE, mkTicks ) +import GHC.Core.Arity ( joinRhsArity ) +import Id +import IdInfo +import Name( localiseName ) +import BasicTypes +import Module( Module ) +import GHC.Core.Coercion +import GHC.Core.Type + +import VarSet +import VarEnv +import Var +import Demand ( argOneShots, argsOneShots ) +import Digraph ( SCC(..), Node(..) 
+ , stronglyConnCompFromEdgedVerticesUniq + , stronglyConnCompFromEdgedVerticesUniqR ) +import Unique +import UniqFM +import UniqSet +import Util +import Outputable +import Data.List +import Control.Arrow ( second ) + +{- +************************************************************************ +* * + occurAnalysePgm, occurAnalyseExpr, occurAnalyseExpr_NoBinderSwap +* * +************************************************************************ + +Here's the externally-callable interface: +-} + +occurAnalysePgm :: Module -- Used only in debug output + -> (Id -> Bool) -- Active unfoldings + -> (Activation -> Bool) -- Active rules + -> [CoreRule] + -> CoreProgram -> CoreProgram +occurAnalysePgm this_mod active_unf active_rule imp_rules binds + | isEmptyDetails final_usage + = occ_anald_binds + + | otherwise -- See Note [Glomming] + = WARN( True, hang (text "Glomming in" <+> ppr this_mod <> colon) + 2 (ppr final_usage ) ) + occ_anald_glommed_binds + where + init_env = initOccEnv { occ_rule_act = active_rule + , occ_unf_act = active_unf } + + (final_usage, occ_anald_binds) = go init_env binds + (_, occ_anald_glommed_binds) = occAnalRecBind init_env TopLevel + imp_rule_edges + (flattenBinds binds) + initial_uds + -- It's crucial to re-analyse the glommed-together bindings + -- so that we establish the right loop breakers. Otherwise + -- we can easily create an infinite loop (#9583 is an example) + -- + -- Also crucial to re-analyse the /original/ bindings + -- in case the first pass accidentally discarded as dead code + -- a binding that was actually needed (albeit before its + -- definition site). #17724 threw this up. + + initial_uds = addManyOccsSet emptyDetails + (rulesFreeVars imp_rules) + -- The RULES declarations keep things alive! 
+ + -- Note [Preventing loops due to imported functions rules] + imp_rule_edges = foldr (plusVarEnv_C unionVarSet) emptyVarEnv + [ mapVarEnv (const maps_to) $ + getUniqSet (exprFreeIds arg `delVarSetList` ru_bndrs imp_rule) + | imp_rule <- imp_rules + , not (isBuiltinRule imp_rule) -- See Note [Plugin rules] + , let maps_to = exprFreeIds (ru_rhs imp_rule) + `delVarSetList` ru_bndrs imp_rule + , arg <- ru_args imp_rule ] + + go :: OccEnv -> [CoreBind] -> (UsageDetails, [CoreBind]) + go _ [] + = (initial_uds, []) + go env (bind:binds) + = (final_usage, bind' ++ binds') + where + (bs_usage, binds') = go env binds + (final_usage, bind') = occAnalBind env TopLevel imp_rule_edges bind + bs_usage + +occurAnalyseExpr :: CoreExpr -> CoreExpr + -- Do occurrence analysis, and discard occurrence info returned +occurAnalyseExpr = occurAnalyseExpr' True -- do binder swap + +occurAnalyseExpr_NoBinderSwap :: CoreExpr -> CoreExpr +occurAnalyseExpr_NoBinderSwap = occurAnalyseExpr' False -- do not do binder swap + +occurAnalyseExpr' :: Bool -> CoreExpr -> CoreExpr +occurAnalyseExpr' enable_binder_swap expr + = snd (occAnal env expr) + where + env = initOccEnv { occ_binder_swap = enable_binder_swap } + +{- Note [Plugin rules] +~~~~~~~~~~~~~~~~~~~~~~ +Conal Elliott (#11651) built a GHC plugin that added some +BuiltinRules (for imported Ids) to the mg_rules field of ModGuts, to +do some domain-specific transformations that could not be expressed +with an ordinary pattern-matching CoreRule. But then we can't extract +the dependencies (in imp_rule_edges) from ru_rhs etc, because a +BuiltinRule doesn't have any of that stuff. + +So we simply assume that BuiltinRules have no dependencies, and filter +them out from the imp_rule_edges comprehension. 
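The imp_rule_edges comprehension above can be sketched with Data.Map/Data.Set in place of GHC's VarEnv/VarSet; the Rule record here is a made-up stand-in for CoreRule, not GHC's type:

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Hypothetical stand-in for a CoreRule: its binders, the free vars of each
-- LHS argument, the free vars of the RHS, and whether it is a BuiltinRule
-- (which carries no inspectable args/RHS, so it must be skipped).
data Rule = Rule
  { ruBndrs   :: Set.Set String
  , ruArgFvs  :: [Set.Set String]
  , ruRhsFvs  :: Set.Set String
  , ruBuiltin :: Bool
  }

-- Mirror of imp_rule_edges: every free var of a rule argument maps to the
-- free vars of that rule's RHS (both minus the rule binders); built-in
-- rules are filtered out, as Note [Plugin rules] prescribes.
ruleEdges :: [Rule] -> Map.Map String (Set.Set String)
ruleEdges rules = Map.unionsWith Set.union
  [ Map.fromSet (const maps_to) (arg `Set.difference` ruBndrs r)
  | r <- rules
  , not (ruBuiltin r)
  , let maps_to = ruRhsFvs r `Set.difference` ruBndrs r
  , arg <- ruArgFvs r
  ]

main :: IO ()
main = print (Map.toList (Set.toList <$> ruleEdges [demo]))
  where
    -- RULE forall x. g x = h x   (binder x; arg fvs {g,x}; rhs fvs {h,x})
    demo = Rule (Set.fromList ["x"]) [Set.fromList ["g", "x"]]
                (Set.fromList ["h", "x"]) False
```

For the demo rule this prints `[("g",["h"])]`: an occurrence of g gains an edge to h, just as the comprehension above adds f -> h when f calls g and g has a rule mentioning h.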
+-}
+
+{-
+************************************************************************
+*                                                                      *
+                Bindings
+*                                                                      *
+************************************************************************
+
+Note [Recursive bindings: the grand plan]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+When we come across a binding group
+  Rec { x1 = r1; ...; xn = rn }
+we treat it like this (occAnalRecBind):
+
+1. Occurrence-analyse each right hand side, and build a
+   "Details" for each binding to capture the results.
+
+   Wrap the details in a Node (details, node-id, dep-node-ids),
+   where node-id is just the unique of the binder, and
+   dep-node-ids lists all binders on which this binding depends.
+   We'll call these the "scope edges".
+   See Note [Forming Rec groups].
+
+   All this is done by makeNode.
+
+2. Do SCC-analysis on these Nodes.  Each SCC will become a new Rec or
+   NonRec.  The key property is that every free variable of a binding
+   is accounted for by the scope edges, so that when we are done
+   everything is still in scope.
+
+3. For each Cyclic SCC of the scope-edge SCC-analysis in (2), we
+   identify suitable loop-breakers to ensure that inlining terminates.
+   This is done by occAnalRec.
+
+4. To do so we form a new set of Nodes, with the same details, but
+   different edges, the "loop-breaker nodes". The loop-breaker nodes
+   have both more and fewer dependencies than the scope edges
+   (see Note [Choosing loop breakers])
+
+   More edges: if f calls g, and g has an active rule that mentions h
+               then we add an edge from f -> h
+
+   Fewer edges: we only include dependencies on active rules, on rule
+                RHSs (not LHSs) and if there is an INLINE pragma only
+                on the stable unfolding (and vice versa).  The scope
+                edges must be much more inclusive.
+
+5. The "weak fvs" of a node are, by definition:
+     the scope fvs - the loop-breaker fvs
+   See Note [Weak loop breakers], and the nd_weak field of Details
+
+6.
Having formed the loop-breaker nodes, we do SCC-analysis on them
+   once more; for each cyclic SCC we choose a loop breaker and iterate
+   (see loopBreakNodes and reOrderNodes below).
+
+Note [Dead code]
+~~~~~~~~~~~~~~~~
+Dropping dead code for a cyclic Strongly Connected Component is done
+in a very simple way:
+
+        the entire SCC is dropped if none of its binders are mentioned
+        in the body; otherwise the whole thing is kept.
+
+The key observation is that dead code elimination happens after
+dependency analysis: so 'occAnalBind' processes SCCs instead of the
+original term's binding groups.
+
+Thus 'occAnalBind' does indeed drop 'f' in an example like
+
+        letrec f = ...g...
+               g = ...(...g...)...
+        in
+           ...g...
+
+when 'g' no longer uses 'f' at all (eg 'f' does not occur in a RULE in
+'g'). 'occAnalBind' first consumes 'CyclicSCC g' and then it consumes
+'AcyclicSCC f', where 'body_usage' won't contain 'f'.
+
+------------------------------------------------------------
+Note [Forming Rec groups]
+~~~~~~~~~~~~~~~~~~~~~~~~~
+We put bindings {f = ef; g = eg } in a Rec group if "f uses g"
+and "g uses f", no matter how indirectly.  We do a SCC analysis
+with an edge f -> g if "f uses g".
+
+More precisely, "f uses g" iff g should be in scope wherever f is.
+That is, g is free in:
+  a) the rhs 'ef'
+  b) or the RHS of a rule for f (Note [Rules are extra RHSs])
+  c) or the LHS of a rule for f (Note [Rule dependency info])
+
+These conditions apply regardless of the activation of the RULE (eg it might be
+inactive in this phase but become active later).  Once a Rec is broken up
+it can never be put back together, so we must be conservative.
+
+The principle is that, regardless of rule firings, every variable is
+always in scope.
+
+  * Note [Rules are extra RHSs]
+    ~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    A RULE for 'f' is like an extra RHS for 'f'. That way the "parent"
+    keeps the specialised "children" alive.  If the parent dies
+    (because it isn't referenced any more), then the children will die
+    too (unless they are already referenced directly).
+
+    To that end, we build a Rec group for each cyclic strongly
+    connected component,
+        *treating f's rules as extra RHSs for 'f'*.
+    More concretely, the SCC analysis runs on a graph with an edge
+    from f -> g iff g is mentioned in
+        (a) f's rhs
+        (b) f's RULES
+    These are rec_edges.
+
+    Under (b) we include variables free in *either* LHS *or* RHS of
+    the rule.  The former might seem silly, but see Note [Rule
+    dependency info].  So in Example [eftInt], eftInt and eftIntFB
+    will be put in the same Rec, even though their 'main' RHSs are
+    both non-recursive.
+
+  * Note [Rule dependency info]
+    ~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    The VarSet in a RuleInfo is used for dependency analysis in the
+    occurrence analyser.  We must track free vars in *both* lhs and rhs.
+    Hence use of idRuleVars, rather than idRuleRhsVars in occAnalBind.
+    Why both?  Consider
+        x = y
+        RULE f x = v+4
+    Then if we substitute y for x, we'd better do so in the
+    rule's LHS too, so we'd better ensure the RULE appears to mention 'x'
+    as well as 'v'
+
+  * Note [Rules are visible in their own rec group]
+    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    We want the rules for 'f' to be visible in f's right-hand side.
+    And we'd like them to be visible in other functions in f's Rec
+    group.  E.g. in Note [Specialisation rules] we want f's rule
+    to be visible in both f's RHS, and fs's RHS.
+
+    This means that we must simplify the RULEs first, before looking
+    at any of the definitions.  This is done by Simplify.simplRecBind,
+    when it calls addLetIdInfo.
+
+------------------------------------------------------------
+Note [Choosing loop breakers]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Loop breaking is surprisingly subtle.  First read section 4 of
+"Secrets of the GHC inliner".  This describes our basic plan.
+We avoid infinite inlinings by choosing loop breakers, and
+ensuring that a loop breaker cuts each loop.
+
+See also Note [Inlining and hs-boot files] in GHC.Core.ToIface, which
+deals with a closely related source of infinite loops.
+
+Fundamentally, we do SCC analysis on a graph.  For each recursive
+group we choose a loop breaker, delete all edges to that node,
+re-analyse the SCC, and iterate.
+
+But what is the graph?  NOT the same graph as was used for Note
+[Forming Rec groups]!  In particular, a RULE is like an equation for
+'f' that is *always* inlined if it is applicable.  We do *not* disable
+rules for loop-breakers.  It's up to whoever makes the rules to make
+sure that the rules themselves always terminate.  See Note [Rules for
+recursive functions] in GHC.Core.Op.Simplify
+
+Hence, if
+    f's RHS (or its INLINE template if it has one) mentions g, and
+    g has a RULE that mentions h, and
+    h has a RULE that mentions f
+
+then we *must* choose f to be a loop breaker.  Example: see Note
+[Specialisation rules].
+
+In general, take the free variables of f's RHS, and augment them with
+all the variables reachable by RULES from those starting points.  That
+is the whole reason for computing rule_fv_env in occAnalBind.  (Of
+course we only consider free vars that are also binders in this Rec
+group.)  See also Note [Finding rule RHS free vars]
+
+Note that when we compute this rule_fv_env, we only consider variables
+free in the *RHS* of the rule, in contrast to the way we build the
+Rec group in the first place (Note [Rule dependency info])
+
+Note that if 'g' has a RHS that mentions 'w', we should add w to
+g's loop-breaker edges.  More concretely there is an edge from f -> g
+iff
+        (a) g is mentioned in f's RHS `xor` f's INLINE rhs
+            (see Note [Inline rules])
+        (b) or h is mentioned in f's RHS, and
+            g appears in the RHS of an active RULE of h
+            or a transitive sequence of active rules starting with h
+
+Why "active rules"?
See Note [Finding rule RHS free vars]
+
+Note that in Example [eftInt], *neither* eftInt *nor* eftIntFB is
+chosen as a loop breaker, because their RHSs don't mention each other.
+And indeed both can be inlined safely.
+
+Note again that the edges of the graph we use for computing loop breakers
+are not the same as the edges we use for computing the Rec blocks.
+That's why we compute
+
+- rec_edges for the Rec block analysis
+- loop_breaker_nodes for the loop breaker analysis
+
+  * Note [Finding rule RHS free vars]
+    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    Consider this real example from Data Parallel Haskell
+         tagZero :: Array Int -> Array Tag
+         {-# INLINE [1] tagZero #-}
+         tagZero xs = pmap (\x -> fromBool (x==0)) xs
+
+         {-# RULES "tagZero" [~1] forall xs n.
+             pmap fromBool <blah blah> = tagZero xs #-}
+    So tagZero's RHS mentions pmap, and pmap's RULE mentions tagZero.
+    However, tagZero can only be inlined in phase 1 and later, while
+    the RULE is only active *before* phase 1.  So there's no problem.
+
+    To make this work, we look for the RHS free vars only for
+    *active* rules.  That's the reason for the occ_rule_act field
+    of the OccEnv.
+
+  * Note [Weak loop breakers]
+    ~~~~~~~~~~~~~~~~~~~~~~~~~
+    There is a last nasty wrinkle.  Suppose we have
+
+        Rec { f = f_rhs
+              RULE f [] = g
+
+              h = h_rhs
+              g = h
+              ...more...
+        }
+
+    Remember that we simplify the RULES before any RHS (see Note
+    [Rules are visible in their own rec group] above).
+
+    So we must *not* postInlineUnconditionally 'g', even though
+    its RHS turns out to be trivial.  (I'm assuming that 'g' is
+    not chosen as a loop breaker.)  Why not?  Because then we
+    drop the binding for 'g', which leaves it out of scope in the
+    RULE!
+
+    Here's a somewhat different example of the same thing
+        Rec { g = h
+            ; h = ...f...
+            ; f = f_rhs
+              RULE f [] = g }
+    Here the RULE is "below" g, but we *still* can't postInlineUnconditionally
+    g, because the RULE for f is active throughout.
So the RHS of h
+    might rewrite to     h = ...g...
+    So g must remain in scope in the output program!
+
+    We "solve" this by:
+
+       Make g a "weak" loop breaker (OccInfo = IAmLoopBreaker True)
+       iff g is a "missing free variable" of the Rec group
+
+    A "missing free variable" x is one that is mentioned in an RHS or
+    INLINE or RULE of a binding in the Rec group, but where the
+    dependency on x may not show up in the loop_breaker_nodes (see
+    Note [Choosing loop breakers] above).
+
+    A normal "strong" loop breaker has IAmLoopBreaker False.  So
+
+                                    Inline  postInlineUnconditionally
+        strong   IAmLoopBreaker False  no      no
+        weak     IAmLoopBreaker True   yes     no
+        other                          yes     yes
+
+    The **sole** reason for this kind of loop breaker is so that
+    postInlineUnconditionally does not fire.  Ugh.  (Typically it'll
+    inline via the usual callSiteInline stuff, so it'll be dead in the
+    next pass, so the main Ugh is the tiresome complication.)
+
+Note [Rules for imported functions]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider this
+   f = /\a. B.g a
+   RULE B.g Int = 1 + f Int
+Note that
+   * The RULE is for an imported function.
+   * f is non-recursive
+Now we can get
+   f Int --> B.g Int           Inlining f
+         --> 1 + f Int         Firing RULE
+and so the simplifier goes into an infinite loop.  This
+would not happen if the RULE was for a local function,
+because we keep track of dependencies through rules.  But
+that is pretty much impossible to do for imported Ids.  Suppose
+f's definition had been
+   f = /\a. C.h a
+where (by some long and devious process), C.h eventually inlines to
+B.g.  We could only spot such loops by exhaustively following
+unfoldings of C.h etc, in case we reach B.g, and hence (via the RULE)
+f.
+
+Note that RULES for imported functions are important in practice; they
+occur a lot in the libraries.
+
+We regard this potential infinite loop as a *programmer* error.
+It's up to the programmer not to write silly rules like
+    RULE f x = f x
+and the example above is just a more complicated version.
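
The loop described in this Note is just graph reachability: the RULE
contributes a hidden edge from the imported function back to f. A toy
sketch (names invented; not GHC code) using Data.Graph from the
containers library shows how that one extra edge turns an apparently
acyclic group into a cycle:

```haskell
import Data.Graph (SCC (..), stronglyConnComp)

-- Dependency triples: (payload, key, keys it depends on).
-- Without the rule, f merely calls the imported g.
withoutRule :: [(String, String, [String])]
withoutRule = [("f", "f", ["g"]), ("g", "g", [])]

-- The RULE "B.g Int = 1 + f Int" mentions f on its RHS,
-- which amounts to a hidden edge g -> f.
withRule :: [(String, String, [String])]
withRule = [("f", "f", ["g"]), ("g", "g", ["f"])]

isCyclic :: SCC a -> Bool
isCyclic (CyclicSCC _)  = True
isCyclic (AcyclicSCC _) = False
```

In GHCi, `map isCyclic (stronglyConnComp withoutRule)` has no cyclic
component, while `map isCyclic (stronglyConnComp withRule)` reports a
single cycle; GHC cannot see the rule edge for imported Ids, which is
why the loop goes undetected.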
+
+Note [Preventing loops due to imported functions rules]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider:
+  import GHC.Base (foldr)
+
+  {-# RULES "filterList" forall p. foldr (filterFB (:) p) [] = filter p #-}
+  filter p xs = build (\c n -> foldr (filterFB c p) n xs)
+  filterFB c p = ...
+
+  f = filter p xs
+
+Note that filter is not a loop-breaker, so what happens is:
+  f = filter p xs
+    = {inline} build (\c n -> foldr (filterFB c p) n xs)
+    = {inline} foldr (filterFB (:) p) [] xs
+    = {RULE}   filter p xs
+
+We are in an infinite loop.
+
+A more elaborate example (that I actually saw in practice when I went to
+mark GHC.List.filter as INLINABLE) is as follows. Say I have this module:
+  {-# LANGUAGE RankNTypes #-}
+  module GHCList where
+
+  import Prelude hiding (filter)
+  import GHC.Base (build)
+
+  {-# INLINABLE filter #-}
+  filter :: (a -> Bool) -> [a] -> [a]
+  filter p []     = []
+  filter p (x:xs) = if p x then x : filter p xs else filter p xs
+
+  {-# NOINLINE [0] filterFB #-}
+  filterFB :: (a -> b -> b) -> (a -> Bool) -> a -> b -> b
+  filterFB c p x r | p x       = x `c` r
+                   | otherwise = r
+
+  {-# RULES
+  "filter"     [~1] forall p xs.  filter p xs = build (\c n -> foldr
+  (filterFB c p) n xs)
+  "filterList" [1]  forall p.     foldr (filterFB (:) p) [] = filter p
+   #-}
+
+Then (because RULES are applied inside INLINABLE unfoldings, but inlinings
+are not), the unfolding given to "filter" in the interface file will be:
+  filter p []     = []
+  filter p (x:xs) = if p x then x : build (\c n -> foldr (filterFB c p) n xs)
+                           else build (\c n -> foldr (filterFB c p) n xs)
+
+Note that because this unfolding does not mention "filter", filter is not
Therefore at a use site in another module: + filter p xs + = {inline} + case xs of [] -> [] + (x:xs) -> if p x then x : build (\c n -> foldr (filterFB c p) n xs) + else build (\c n -> foldr (filterFB c p) n xs) + + build (\c n -> foldr (filterFB c p) n xs) + = {inline} foldr (filterFB (:) p) [] xs + = {RULE} filter p xs + +And we are in an infinite loop again, except that this time the loop is producing an +infinitely large *term* (an unrolling of filter) and so the simplifier finally +dies with "ticks exhausted" + +Because of this problem, we make a small change in the occurrence analyser +designed to mark functions like "filter" as strong loop breakers on the basis that: + 1. The RHS of filter mentions the local function "filterFB" + 2. We have a rule which mentions "filterFB" on the LHS and "filter" on the RHS + +So for each RULE for an *imported* function we are going to add +dependency edges between the *local* FVS of the rule LHS and the +*local* FVS of the rule RHS. We don't do anything special for RULES on +local functions because the standard occurrence analysis stuff is +pretty good at getting loop-breakerness correct there. + +It is important to note that even with this extra hack we aren't always going to get +things right. For example, it might be that the rule LHS mentions an imported Id, +and another module has a RULE that can rewrite that imported Id to one of our local +Ids. + +Note [Specialising imported functions] (referred to from Specialise) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +BUT for *automatically-generated* rules, the programmer can't be +responsible for the "programmer error" in Note [Rules for imported +functions]. In particular, consider specialising a recursive function +defined in another module. If we specialise a recursive function B.g, +we get + g_spec = .....(B.g Int)..... + RULE B.g Int = g_spec +Here, g_spec doesn't look recursive, but when the rule fires, it +becomes so. 
And if B.g was mutually recursive, the loop might +not be as obvious as it is here. + +To avoid this, + * When specialising a function that is a loop breaker, + give a NOINLINE pragma to the specialised function + +Note [Glomming] +~~~~~~~~~~~~~~~ +RULES for imported Ids can make something at the top refer to something at the bottom: + f = \x -> B.g (q x) + h = \y -> 3 + + RULE: B.g (q x) = h x + +Applying this rule makes f refer to h, although f doesn't appear to +depend on h. (And, as in Note [Rules for imported functions], the +dependency might be more indirect. For example, f might mention C.t +rather than B.g, where C.t eventually inlines to B.g.) + +NOTICE that this cannot happen for rules whose head is a +locally-defined function, because we accurately track dependencies +through RULES. It only happens for rules whose head is an imported +function (B.g in the example above). + +Solution: + - When simplifying, bring all top level identifiers into + scope at the start, ignoring the Rec/NonRec structure, so + that when 'h' pops up in f's rhs, we find it in the in-scope set + (as the simplifier generally expects). This happens in simplTopBinds. + + - In the occurrence analyser, if there are any out-of-scope + occurrences that pop out of the top, which will happen after + firing the rule: f = \x -> h x + h = \y -> 3 + then just glom all the bindings into a single Rec, so that + the *next* iteration of the occurrence analyser will sort + them all out. This part happens in occurAnalysePgm. + +------------------------------------------------------------ +Note [Inline rules] +~~~~~~~~~~~~~~~~~~~ +None of the above stuff about RULES applies to Inline Rules, +stored in a CoreUnfolding. The unfolding, if any, is simplified +at the same time as the regular RHS of the function (ie *not* like +Note [Rules are visible in their own rec group]), so it should be +treated *exactly* like an extra RHS. 
+
+Or, rather, when computing loop-breaker edges,
+  * If f has an INLINE pragma, and it is active, we treat the
+    INLINE rhs as f's rhs
+  * If it's inactive, we treat f as having no rhs
+  * If it has no INLINE pragma, we look at f's actual rhs
+
+
+There is a danger that we'll be sub-optimal if we see this
+     f = ...f...
+     [INLINE f = ..no f...]
+where f is recursive, but the INLINE is not. This can just about
+happen with a sufficiently odd set of rules; eg
+
+        foo :: Int -> Int
+        {-# INLINE [1] foo #-}
+        foo x = x+1
+
+        bar :: Int -> Int
+        {-# INLINE [1] bar #-}
+        bar x = foo x + 1
+
+        {-# RULES "foo" [~1] forall x. foo x = bar x #-}
+
+Here the RULE makes bar recursive; but its INLINE pragma remains
+non-recursive. It's tempting to then say that 'bar' should not be
+a loop breaker, but an attempt to do so goes wrong in two ways:
+  a) We may get
+         $df = ...$cfoo...
+         $cfoo = ...$df....
+         [INLINE $cfoo = ...no-$df...]
+     But we want $cfoo to depend on $df explicitly so that we
+     put the bindings in the right order to inline $df in $cfoo
+     and perhaps break the loop altogether.  (Maybe this
+  b)
+
+
+Example [eftInt]
+~~~~~~~~~~~~~~~
+Example (from GHC.Enum):
+
+  eftInt :: Int# -> Int# -> [Int]
+  eftInt x y = ...(non-recursive)...
+
+  {-# INLINE [0] eftIntFB #-}
+  eftIntFB :: (Int -> r -> r) -> r -> Int# -> Int# -> r
+  eftIntFB c n x y = ...(non-recursive)...
+
+  {-# RULES
+  "eftInt"  [~1] forall x y. eftInt x y = build (\ c n -> eftIntFB c n x y)
+  "eftIntList"  [1] eftIntFB  (:) [] = eftInt
+  #-}
+
+Note [Specialisation rules]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider this group, which is typical of what SpecConstr builds:
+
+   fs a = ....f (C a)....
+   f  x = ....f (C a)....
+   {-# RULE f (C a) = fs a #-}
+
+So 'f' and 'fs' are in the same Rec group (since f refers to fs via its RULE).
+
+But watch out!
If 'fs' is not chosen as a loop breaker, we may get an infinite loop:
+  - the RULE is applied in f's RHS (see Note [Self-recursive rules] in GHC.Core.Op.Simplify)
+  - fs is inlined (say it's small)
+  - now there's another opportunity to apply the RULE
+
+This showed up when compiling Control.Concurrent.Chan.getChanContents.
+
+------------------------------------------------------------
+Note [Finding join points]
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+It's the occurrence analyser's job to find bindings that we can turn into join
+points, but it doesn't perform that transformation right away. Rather, it marks
+the eligible bindings as part of their occurrence data, leaving it to the
+simplifier (or to simpleOptPgm) to actually change the binder's 'IdDetails'.
+The simplifier then eta-expands the RHS if needed and then updates the
+occurrence sites.  Dividing the work this way means that the occurrence analyser
+still only takes one pass, yet one can always tell the difference between a
+function call and a jump by looking at the occurrence (because the same pass
+changes the 'IdDetails' and propagates the binders to their occurrence sites).
+
+To track potential join points, we use the 'occ_tail' field of OccInfo. A value
+of `AlwaysTailCalled n` indicates that every occurrence of the variable is a
+tail call with `n` arguments (counting both value and type arguments). Otherwise
+'occ_tail' will be 'NoTailCallInfo'. The tail call info flows bottom-up with the
+rest of 'OccInfo' until it goes on the binder.
+
+Note [Rules and join points]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Things get fiddly with rules. Suppose we have:
+
+  let j :: Int -> Int
+      j y = 2 * y
+      k :: Int -> Int -> Int
+      {-# RULES "SPEC k 0" k 0 = j #-}
+      k x y = x + 2 * y
+  in ...
+
+Now suppose that both j and k appear only as saturated tail calls in the body.
+Thus we would like to make them both join points. The rule complicates matters,
+though, as its RHS has an unapplied occurrence of j.
*However*, if we were to +eta-expand the rule, all would be well: + + {-# RULES "SPEC k 0" forall a. k 0 a = j a #-} + +So conceivably we could notice that a potential join point would have an +"undersaturated" rule and account for it. This would mean we could make +something that's been specialised a join point, for instance. But local bindings +are rarely specialised, and being overly cautious about rules only +costs us anything when, for some `j`: + + * Before specialisation, `j` has non-tail calls, so it can't be a join point. + * During specialisation, `j` gets specialised and thus acquires rules. + * Sometime afterward, the non-tail calls to `j` disappear (as dead code, say), + and so now `j` *could* become a join point. + +This appears to be very rare in practice. TODO Perhaps we should gather +statistics to be sure. + +------------------------------------------------------------ +Note [Adjusting right-hand sides] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +There's a bit of a dance we need to do after analysing a lambda expression or +a right-hand side. In particular, we need to + + a) call 'markAllInsideLam' *unless* the binding is for a thunk, a one-shot + lambda, or a non-recursive join point; and + b) call 'markAllNonTailCalled' *unless* the binding is for a join point. + +Some examples, with how the free occurrences in e (assumed not to be a value +lambda) get marked: + + inside lam non-tail-called + ------------------------------------------------------------ + let x = e No Yes + let f = \x -> e Yes Yes + let f = \x{OneShot} -> e No Yes + \x -> e Yes Yes + join j x = e No No + joinrec j x = e Yes No + +There are a few other caveats; most importantly, if we're marking a binding as +'AlwaysTailCalled', it's *going* to be a join point, so we treat it as one so +that the effect cascades properly. 
Consequently, at the time the RHS is +analysed, we won't know what adjustments to make; thus 'occAnalLamOrRhs' must +return the unadjusted 'UsageDetails', to be adjusted by 'adjustRhsUsage' once +join-point-hood has been decided. + +Thus the overall sequence taking place in 'occAnalNonRecBind' and +'occAnalRecBind' is as follows: + + 1. Call 'occAnalLamOrRhs' to find usage information for the RHS. + 2. Call 'tagNonRecBinder' or 'tagRecBinders', which decides whether to make + the binding a join point. + 3. Call 'adjustRhsUsage' accordingly. (Done as part of 'tagRecBinders' when + recursive.) + +(In the recursive case, this logic is spread between 'makeNode' and +'occAnalRec'.) +-} + +------------------------------------------------------------------ +-- occAnalBind +------------------------------------------------------------------ + +occAnalBind :: OccEnv -- The incoming OccEnv + -> TopLevelFlag + -> ImpRuleEdges + -> CoreBind + -> UsageDetails -- Usage details of scope + -> (UsageDetails, -- Of the whole let(rec) + [CoreBind]) + +occAnalBind env lvl top_env (NonRec binder rhs) body_usage + = occAnalNonRecBind env lvl top_env binder rhs body_usage +occAnalBind env lvl top_env (Rec pairs) body_usage + = occAnalRecBind env lvl top_env pairs body_usage + +----------------- +occAnalNonRecBind :: OccEnv -> TopLevelFlag -> ImpRuleEdges -> Var -> CoreExpr + -> UsageDetails -> (UsageDetails, [CoreBind]) +occAnalNonRecBind env lvl imp_rule_edges binder rhs body_usage + | isTyVar binder -- A type let; we don't gather usage info + = (body_usage, [NonRec binder rhs]) + + | not (binder `usedIn` body_usage) -- It's not mentioned + = (body_usage, []) + + | otherwise -- It's mentioned in the body + = (body_usage' `andUDs` rhs_usage', [NonRec tagged_binder rhs']) + where + (body_usage', tagged_binder) = tagNonRecBinder lvl body_usage binder + mb_join_arity = willBeJoinId_maybe tagged_binder + + (bndrs, body) = collectBinders rhs + + (rhs_usage1, bndrs', body') = occAnalNonRecRhs 
env tagged_binder bndrs body + rhs' = mkLams (markJoinOneShots mb_join_arity bndrs') body' + -- For a /non-recursive/ join point we can mark all + -- its join-lambda as one-shot; and it's a good idea to do so + + -- Unfoldings + -- See Note [Unfoldings and join points] + rhs_usage2 = case occAnalUnfolding env NonRecursive binder of + Just unf_usage -> rhs_usage1 `andUDs` unf_usage + Nothing -> rhs_usage1 + + -- Rules + -- See Note [Rules are extra RHSs] and Note [Rule dependency info] + rules_w_uds = occAnalRules env mb_join_arity NonRecursive tagged_binder + rule_uds = map (\(_, l, r) -> l `andUDs` r) rules_w_uds + rhs_usage3 = foldr andUDs rhs_usage2 rule_uds + rhs_usage4 = case lookupVarEnv imp_rule_edges binder of + Nothing -> rhs_usage3 + Just vs -> addManyOccsSet rhs_usage3 vs + -- See Note [Preventing loops due to imported functions rules] + + -- Final adjustment + rhs_usage' = adjustRhsUsage mb_join_arity NonRecursive bndrs' rhs_usage4 + +----------------- +occAnalRecBind :: OccEnv -> TopLevelFlag -> ImpRuleEdges -> [(Var,CoreExpr)] + -> UsageDetails -> (UsageDetails, [CoreBind]) +occAnalRecBind env lvl imp_rule_edges pairs body_usage + = foldr (occAnalRec env lvl) (body_usage, []) sccs + -- For a recursive group, we + -- * occ-analyse all the RHSs + -- * compute strongly-connected components + -- * feed those components to occAnalRec + -- See Note [Recursive bindings: the grand plan] + where + sccs :: [SCC Details] + sccs = {-# SCC "occAnalBind.scc" #-} + stronglyConnCompFromEdgedVerticesUniq nodes + + nodes :: [LetrecNode] + nodes = {-# SCC "occAnalBind.assoc" #-} + map (makeNode env imp_rule_edges bndr_set) pairs + + bndr_set = mkVarSet (map fst pairs) + +{- +Note [Unfoldings and join points] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +We assume that anything in an unfolding occurs multiple times, since unfoldings +are often copied (that's the whole point!). But we still need to track tail +calls for the purpose of finding join points. 
+-} + +----------------------------- +occAnalRec :: OccEnv -> TopLevelFlag + -> SCC Details + -> (UsageDetails, [CoreBind]) + -> (UsageDetails, [CoreBind]) + + -- The NonRec case is just like a Let (NonRec ...) above +occAnalRec _ lvl (AcyclicSCC (ND { nd_bndr = bndr, nd_rhs = rhs + , nd_uds = rhs_uds, nd_rhs_bndrs = rhs_bndrs })) + (body_uds, binds) + | not (bndr `usedIn` body_uds) + = (body_uds, binds) -- See Note [Dead code] + + | otherwise -- It's mentioned in the body + = (body_uds' `andUDs` rhs_uds', + NonRec tagged_bndr rhs : binds) + where + (body_uds', tagged_bndr) = tagNonRecBinder lvl body_uds bndr + rhs_uds' = adjustRhsUsage (willBeJoinId_maybe tagged_bndr) NonRecursive + rhs_bndrs rhs_uds + + -- The Rec case is the interesting one + -- See Note [Recursive bindings: the grand plan] + -- See Note [Loop breaking] +occAnalRec env lvl (CyclicSCC details_s) (body_uds, binds) + | not (any (`usedIn` body_uds) bndrs) -- NB: look at body_uds, not total_uds + = (body_uds, binds) -- See Note [Dead code] + + | otherwise -- At this point we always build a single Rec + = -- pprTrace "occAnalRec" (vcat + -- [ text "weak_fvs" <+> ppr weak_fvs + -- , text "lb nodes" <+> ppr loop_breaker_nodes]) + (final_uds, Rec pairs : binds) + + where + bndrs = map nd_bndr details_s + bndr_set = mkVarSet bndrs + + ------------------------------ + -- See Note [Choosing loop breakers] for loop_breaker_nodes + final_uds :: UsageDetails + loop_breaker_nodes :: [LetrecNode] + (final_uds, loop_breaker_nodes) + = mkLoopBreakerNodes env lvl bndr_set body_uds details_s + + ------------------------------ + weak_fvs :: VarSet + weak_fvs = mapUnionVarSet nd_weak details_s + + --------------------------- + -- Now reconstruct the cycle + pairs :: [(Id,CoreExpr)] + pairs | isEmptyVarSet weak_fvs = reOrderNodes 0 bndr_set weak_fvs loop_breaker_nodes [] + | otherwise = loopBreakNodes 0 bndr_set weak_fvs loop_breaker_nodes [] + -- If weak_fvs is empty, the loop_breaker_nodes will include + -- all the 
edges in the original scope edges [remember,
+                -- weak_fvs is the difference between scope edges and
+                -- lb-edges], so a fresh SCC computation would yield a
+                -- single CyclicSCC result; and reOrderNodes deals with
+                -- exactly that case
+
+
+------------------------------------------------------------------
+--                 Loop breaking
+------------------------------------------------------------------
+
+type Binding = (Id,CoreExpr)
+
+loopBreakNodes :: Int
+               -> VarSet        -- All binders
+               -> VarSet        -- Binders whose dependencies may be "missing"
+                                -- See Note [Weak loop breakers]
+               -> [LetrecNode]
+               -> [Binding]     -- Append these to the end
+               -> [Binding]
+{-
+loopBreakNodes is applied to the list of nodes for a cyclic strongly
+connected component (there's guaranteed to be a cycle).  It returns
+the same nodes, but
+        a) in a better order,
+        b) with some of the Ids having an IAmALoopBreaker pragma
+
+The "loop-breaker" Ids are sufficient to break all cycles in the SCC.  This means
+that the simplifier can guarantee not to loop provided it never records an inlining
+for these no-inline guys.
+
+Furthermore, the order of the binds is such that if we neglect dependencies
+on the no-inline Ids then the binds are topologically sorted.  This means
+that the simplifier will generally do a good job if it works from top to bottom,
+recording inlinings for any Ids which aren't marked as "no-inline" as it goes.
+-}
+
+-- Return the bindings sorted into a plausible order, and marked with loop breakers.
+loopBreakNodes depth bndr_set weak_fvs nodes binds + = -- pprTrace "loopBreakNodes" (ppr nodes) $ + go (stronglyConnCompFromEdgedVerticesUniqR nodes) binds + where + go [] binds = binds + go (scc:sccs) binds = loop_break_scc scc (go sccs binds) + + loop_break_scc scc binds + = case scc of + AcyclicSCC node -> mk_non_loop_breaker weak_fvs node : binds + CyclicSCC nodes -> reOrderNodes depth bndr_set weak_fvs nodes binds + +---------------------------------- +reOrderNodes :: Int -> VarSet -> VarSet -> [LetrecNode] -> [Binding] -> [Binding] + -- Choose a loop breaker, mark it no-inline, + -- and call loopBreakNodes on the rest +reOrderNodes _ _ _ [] _ = panic "reOrderNodes" +reOrderNodes _ _ _ [node] binds = mk_loop_breaker node : binds +reOrderNodes depth bndr_set weak_fvs (node : nodes) binds + = -- pprTrace "reOrderNodes" (vcat [ text "unchosen" <+> ppr unchosen + -- , text "chosen" <+> ppr chosen_nodes ]) $ + loopBreakNodes new_depth bndr_set weak_fvs unchosen $ + (map mk_loop_breaker chosen_nodes ++ binds) + where + (chosen_nodes, unchosen) = chooseLoopBreaker approximate_lb + (nd_score (node_payload node)) + [node] [] nodes + + approximate_lb = depth >= 2 + new_depth | approximate_lb = 0 + | otherwise = depth+1 + -- After two iterations (d=0, d=1) give up + -- and approximate, returning to d=0 + +mk_loop_breaker :: LetrecNode -> Binding +mk_loop_breaker (node_payload -> ND { nd_bndr = bndr, nd_rhs = rhs}) + = (bndr `setIdOccInfo` strongLoopBreaker { occ_tail = tail_info }, rhs) + where + tail_info = tailCallInfo (idOccInfo bndr) + +mk_non_loop_breaker :: VarSet -> LetrecNode -> Binding +-- See Note [Weak loop breakers] +mk_non_loop_breaker weak_fvs (node_payload -> ND { nd_bndr = bndr + , nd_rhs = rhs}) + | bndr `elemVarSet` weak_fvs = (setIdOccInfo bndr occ', rhs) + | otherwise = (bndr, rhs) + where + occ' = weakLoopBreaker { occ_tail = tail_info } + tail_info = tailCallInfo (idOccInfo bndr) + +---------------------------------- +chooseLoopBreaker :: Bool -- 
True <=> Too many iterations,
+                           --          so approximate
+                  -> NodeScore            -- Best score so far
+                  -> [LetrecNode]         -- Nodes with this score
+                  -> [LetrecNode]         -- Nodes with higher scores
+                  -> [LetrecNode]         -- Unprocessed nodes
+                  -> ([LetrecNode], [LetrecNode])
+    -- This loop looks for the bind with the lowest score
+    -- to pick as the loop breaker.  The rest accumulate in 'acc'.
+chooseLoopBreaker _ _ loop_nodes acc []
+  = (loop_nodes, acc)        -- Done
+
+    -- If approximate_loop_breaker is True, we pick *all*
+    -- nodes with lowest score, else just one
+    -- See Note [Complexity of loop breaking]
+chooseLoopBreaker approx_lb loop_sc loop_nodes acc (node : nodes)
+  | approx_lb
+  , rank sc == rank loop_sc
+  = chooseLoopBreaker approx_lb loop_sc (node : loop_nodes) acc nodes
+
+  | sc `betterLB` loop_sc  -- Better score so pick this new one
+  = chooseLoopBreaker approx_lb sc [node] (loop_nodes ++ acc) nodes
+
+  | otherwise              -- Worse score so don't pick it
+  = chooseLoopBreaker approx_lb loop_sc loop_nodes (node : acc) nodes
+  where
+    sc = nd_score (node_payload node)
+
+{-
+Note [Complexity of loop breaking]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The loop-breaking algorithm knocks out one binder at a time, and
+performs a new SCC analysis on the remaining binders.  That can
+behave very badly in tightly-coupled groups of bindings; in the
+worst case it can be (N**2)*log N, because it does a full SCC
+on N, then N-1, then N-2 and so on.
+
+To avoid this, we switch plans after 2 (or whatever) attempts:
+  Plan A: pick one binder with the lowest score, make it
+          a loop breaker, and try again
+  Plan B: pick *all* binders with the lowest score, make them
+          all loop breakers, and try again
+Since there are only a small finite number of scores, this will
+terminate in a constant number of iterations, rather than O(N)
+iterations.
+
+You might think that it's very unlikely, but RULES make it much
+more likely.  Here's a real example from #1969:
+  Rec { $dm = \d.\x. op d
+        {-# RULES forall d.
$dm Int d = $s$dm1
+                  forall d. $dm Bool d = $s$dm2 #-}
+
+        dInt = MkD .... opInt ...
+        dBool = MkD .... opBool ...
+        opInt  = $dm dInt
+        opBool = $dm dBool
+
+        $s$dm1 = \x. op dInt
+        $s$dm2 = \x. op dBool }
+The RULES stuff means that we can't choose $dm as a loop breaker
+(Note [Choosing loop breakers]), so we must choose at least (say)
+opInt *and* opBool, and so on.  The number of loop breakers is
+linear in the number of instance declarations.
+
+Note [Loop breakers and INLINE/INLINABLE pragmas]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Avoid choosing a function with an INLINE pragma as the loop breaker!
+If such a function is mutually-recursive with a non-INLINE thing,
+then the latter should be the loop-breaker.
+
+It's vital to distinguish between INLINE and INLINABLE (the
+Bool returned by hasStableCoreUnfolding_maybe).  If we start with
+   Rec { {-# INLINABLE f #-}
+         f x = ...f... }
+and then worker/wrapper it through strictness analysis, we'll get
+   Rec { {-# INLINABLE $wf #-}
+         $wf p q = let x = (p,q) in ...f...
+
+         {-# INLINE f #-}
+         f x = case x of (p,q) -> $wf p q }
+
+Now it is vital that we choose $wf as the loop breaker, so we can
+inline 'f' in '$wf'.
+
+Note [DFuns should not be loop breakers]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+It's particularly bad to make a DFun into a loop breaker.  See
+Note [How instance declarations are translated] in TcInstDcls
+
+We give DFuns a higher score than ordinary CONLIKE things because
+if there's a choice we want the DFun to be the non-loop breaker.  Eg
+
+rec { sc = /\ a \$dC. $fBWrap (T a) ($fCT @ a $dC)
+
+      $fCT :: forall a_afE. (Roman.C a_afE) => Roman.C (Roman.T a_afE)
+      {-# DFUN #-}
+      $fCT = /\a \$dC. MkD (T a) ((sc @ a $dC) |> blah) ($ctoF @ a $dC)
+    }
+
+Here 'sc' (the superclass) looks CONLIKE, but we'll never get to it
+if we can't unravel the DFun first.
+
+Note [Constructor applications]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+It's really really important to inline dictionaries.
Real +example (the Enum Ordering instance from GHC.Base): + + rec f = \ x -> case d of (p,q,r) -> p x + g = \ x -> case d of (p,q,r) -> q x + d = (v, f, g) + +Here, f and g occur just once; but we can't inline them into d. +On the other hand we *could* simplify those case expressions if +we didn't stupidly choose d as the loop breaker. +But we won't because constructor args are marked "Many". +Inlining dictionaries is really essential to unravelling +the loops in static numeric dictionaries, see GHC.Float. + +Note [Closure conversion] +~~~~~~~~~~~~~~~~~~~~~~~~~ +We treat (\x. C p q) as a high-score candidate in the letrec scoring algorithm. +The immediate motivation came from the result of a closure-conversion transformation +which generated code like this: + + data Clo a b = forall c. Clo (c -> a -> b) c + + ($:) :: Clo a b -> a -> b + Clo f env $: x = f env x + + rec { plus = Clo plus1 () + + ; plus1 _ n = Clo plus2 n + + ; plus2 Zero n = n + ; plus2 (Succ m) n = Succ (plus $: m $: n) } + +If we inline 'plus' and 'plus1', everything unravels nicely. But if +we choose 'plus1' as the loop breaker (which is entirely possible +otherwise), the loop does not unravel nicely. + + +@occAnalUnfolding@ deals with the question of bindings where the Id is marked +by an INLINE pragma. For these we record that anything which occurs +in its RHS occurs many times. This pessimistically assumes that this +inlined binder also occurs many times in its scope, but if it doesn't +we'll catch it next time round. At worst this costs an extra simplifier pass. +ToDo: try using the occurrence info for the inline'd binder. + +[March 97] We do the same for atomic RHSs. Reason: see notes with loopBreakSCC. +[June 98, SLPJ] I've undone this change; I don't understand it. See notes with loopBreakSCC. 
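The Plan A / Plan B switch described in Note [Complexity of loop breaking] can be sketched in miniature. This is a toy model, not GHC's code: scores are bare Ints, there is no fresh SCC analysis between rounds, and the names (`pickBreakers`, the tuple representation) are invented for illustration only.

```haskell
import Data.List (partition)

-- Toy model of the Plan A / Plan B switch: for the first two rounds
-- we knock out exactly one lowest-scoring binder (Plan A); after that
-- we knock out *all* binders sharing the lowest score at once
-- (Plan B), so the number of rounds is bounded by the number of
-- distinct scores rather than the number of binders.
pickBreakers :: Int -> [(String, Int)] -> [String]
pickBreakers _     []  = []
pickBreakers depth nds = map fst chosen ++ pickBreakers (depth + 1) rest
  where
    planB          = depth >= 2              -- give up and approximate
    lo             = minimum (map snd nds)
    (best, others) = partition ((== lo) . snd) nds
    chosen | planB     = best                -- Plan B: all of them
           | otherwise = [head best]         -- Plan A: just one
    rest   | planB     = others
           | otherwise = tail best ++ others
```

On a #1969-style group where many bindings tie on the lowest score, Plan A fires for two rounds and Plan B then clears all remaining ties in one round, so the recursion depth stays small.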
+ + +************************************************************************ +* * + Making nodes +* * +************************************************************************ +-} + +type ImpRuleEdges = IdEnv IdSet -- Mapping from FVs of imported RULE LHSs to RHS FVs + +noImpRuleEdges :: ImpRuleEdges +noImpRuleEdges = emptyVarEnv + +type LetrecNode = Node Unique Details -- Node comes from Digraph + -- The Unique key is gotten from the Id +data Details + = ND { nd_bndr :: Id -- Binder + , nd_rhs :: CoreExpr -- RHS, already occ-analysed + , nd_rhs_bndrs :: [CoreBndr] -- Outer lambdas of RHS + -- INVARIANT: (nd_rhs_bndrs nd, _) == + -- collectBinders (nd_rhs nd) + + , nd_uds :: UsageDetails -- Usage from RHS, and RULES, and stable unfoldings + -- ignoring phase (ie assuming all are active) + -- See Note [Forming Rec groups] + + , nd_inl :: IdSet -- Free variables of + -- the stable unfolding (if present and active) + -- or the RHS (if not) + -- but excluding any RULES + -- This is the IdSet that may be used if the Id is inlined + + , nd_weak :: IdSet -- Binders of this Rec that are mentioned in nd_uds + -- but are *not* in nd_inl. These are the ones whose + -- dependencies might not be respected by loop_breaker_nodes + -- See Note [Weak loop breakers] + + , nd_active_rule_fvs :: IdSet -- Free variables of the RHS of active RULES + + , nd_score :: NodeScore + } + +instance Outputable Details where + ppr nd = text "ND" <> braces + (sep [ text "bndr =" <+> ppr (nd_bndr nd) + , text "uds =" <+> ppr (nd_uds nd) + , text "inl =" <+> ppr (nd_inl nd) + , text "weak =" <+> ppr (nd_weak nd) + , text "rule =" <+> ppr (nd_active_rule_fvs nd) + , text "score =" <+> ppr (nd_score nd) + ]) + +-- The NodeScore is compared lexicographically; +-- e.g. 
lower rank wins regardless of size +type NodeScore = ( Int -- Rank: lower => more likely to be picked as loop breaker + , Int -- Size of rhs: higher => more likely to be picked as LB + -- Maxes out at maxExprSize; we just use it to prioritise + -- small functions + , Bool ) -- Was it a loop breaker before? + -- True => more likely to be picked + -- Note [Loop breakers, node scoring, and stability] + +rank :: NodeScore -> Int +rank (r, _, _) = r + +makeNode :: OccEnv -> ImpRuleEdges -> VarSet + -> (Var, CoreExpr) -> LetrecNode +-- See Note [Recursive bindings: the grand plan] +makeNode env imp_rule_edges bndr_set (bndr, rhs) + = DigraphNode details (varUnique bndr) (nonDetKeysUniqSet node_fvs) + -- It's OK to use nonDetKeysUniqSet here as stronglyConnCompFromEdgedVerticesR + -- is still deterministic with edges in nondeterministic order as + -- explained in Note [Deterministic SCC] in Digraph. + where + details = ND { nd_bndr = bndr + , nd_rhs = rhs' + , nd_rhs_bndrs = bndrs' + , nd_uds = rhs_usage3 + , nd_inl = inl_fvs + , nd_weak = node_fvs `minusVarSet` inl_fvs + , nd_active_rule_fvs = active_rule_fvs + , nd_score = pprPanic "makeNodeDetails" (ppr bndr) } + + -- Constructing the edges for the main Rec computation + -- See Note [Forming Rec groups] + (bndrs, body) = collectBinders rhs + (rhs_usage1, bndrs', body') = occAnalRecRhs env bndrs body + rhs' = mkLams bndrs' body' + rhs_usage2 = foldr andUDs rhs_usage1 rule_uds + -- Note [Rules are extra RHSs] + -- Note [Rule dependency info] + rhs_usage3 = case mb_unf_uds of + Just unf_uds -> rhs_usage2 `andUDs` unf_uds + Nothing -> rhs_usage2 + node_fvs = udFreeVars bndr_set rhs_usage3 + + -- Finding the free variables of the rules + is_active = occ_rule_act env :: Activation -> Bool + + rules_w_uds :: [(CoreRule, UsageDetails, UsageDetails)] + rules_w_uds = occAnalRules env (Just (length bndrs)) Recursive bndr + + rules_w_rhs_fvs :: [(Activation, VarSet)] -- Find the RHS fvs + rules_w_rhs_fvs = maybe id (\ids -> 
((AlwaysActive, ids):)) + (lookupVarEnv imp_rule_edges bndr) + -- See Note [Preventing loops due to imported functions rules] + [ (ru_act rule, udFreeVars bndr_set rhs_uds) + | (rule, _, rhs_uds) <- rules_w_uds ] + rule_uds = map (\(_, l, r) -> l `andUDs` r) rules_w_uds + active_rule_fvs = unionVarSets [fvs | (a,fvs) <- rules_w_rhs_fvs + , is_active a] + + -- Finding the usage details of the INLINE pragma (if any) + mb_unf_uds = occAnalUnfolding env Recursive bndr + + -- Find the "nd_inl" free vars; for the loop-breaker phase + inl_fvs = case mb_unf_uds of + Nothing -> udFreeVars bndr_set rhs_usage1 -- No INLINE, use RHS + Just unf_uds -> udFreeVars bndr_set unf_uds + -- We could check for an *active* INLINE (returning + -- emptyVarSet for an inactive one), but is_active + -- isn't the right thing (it tells about + -- RULE activation), so we'd need more plumbing + +mkLoopBreakerNodes :: OccEnv -> TopLevelFlag + -> VarSet + -> UsageDetails -- for BODY of let + -> [Details] + -> (UsageDetails, -- adjusted + [LetrecNode]) +-- Does four things +-- a) tag each binder with its occurrence info +-- b) add a NodeScore to each node +-- c) make a Node with the right dependency edges for +-- the loop-breaker SCC analysis +-- d) adjust each RHS's usage details according to +-- the binder's (new) shotness and join-point-hood +mkLoopBreakerNodes env lvl bndr_set body_uds details_s + = (final_uds, zipWith mk_lb_node details_s bndrs') + where + (final_uds, bndrs') = tagRecBinders lvl body_uds + [ ((nd_bndr nd) + ,(nd_uds nd) + ,(nd_rhs_bndrs nd)) + | nd <- details_s ] + mk_lb_node nd@(ND { nd_bndr = bndr, nd_rhs = rhs, nd_inl = inl_fvs }) bndr' + = DigraphNode nd' (varUnique bndr) (nonDetKeysUniqSet lb_deps) + -- It's OK to use nonDetKeysUniqSet here as + -- stronglyConnCompFromEdgedVerticesR is still deterministic with edges + -- in nondeterministic order as explained in + -- Note [Deterministic SCC] in Digraph. 
+ where + nd' = nd { nd_bndr = bndr', nd_score = score } + score = nodeScore env bndr bndr' rhs lb_deps + lb_deps = extendFvs_ rule_fv_env inl_fvs + + rule_fv_env :: IdEnv IdSet + -- Maps a variable f to the variables from this group + -- mentioned in RHS of active rules for f + -- Domain is *subset* of bound vars (others have no rule fvs) + rule_fv_env = transClosureFV (mkVarEnv init_rule_fvs) + init_rule_fvs -- See Note [Finding rule RHS free vars] + = [ (b, trimmed_rule_fvs) + | ND { nd_bndr = b, nd_active_rule_fvs = rule_fvs } <- details_s + , let trimmed_rule_fvs = rule_fvs `intersectVarSet` bndr_set + , not (isEmptyVarSet trimmed_rule_fvs) ] + + +------------------------------------------ +nodeScore :: OccEnv + -> Id -- Binder has old occ-info (just for loop-breaker-ness) + -> Id -- Binder with new occ-info + -> CoreExpr -- RHS + -> VarSet -- Loop-breaker dependencies + -> NodeScore +nodeScore env old_bndr new_bndr bind_rhs lb_deps + | not (isId old_bndr) -- A type or coercion variable is never a loop breaker + = (100, 0, False) + + | old_bndr `elemVarSet` lb_deps -- Self-recursive things are great loop breakers + = (0, 0, True) -- See Note [Self-recursion and loop breakers] + + | not (occ_unf_act env old_bndr) -- A binder whose inlining is inactive (e.g. has + = (0, 0, True) -- a NOINLINE pragma) makes a great loop breaker + + | exprIsTrivial rhs + = mk_score 10 -- Practically certain to be inlined + -- Used to have also: && not (isExportedId bndr) + -- But I found this sometimes cost an extra iteration when we have + -- rec { d = (a,b); a = ...df...; b = ...df...; df = d } + -- where df is the exported dictionary. 
Then df makes a really + -- bad choice for loop breaker + + | DFunUnfolding { df_args = args } <- id_unfolding + -- Never choose a DFun as a loop breaker + -- Note [DFuns should not be loop breakers] + = (9, length args, is_lb) + + -- Data structures are more important than INLINE pragmas + -- so that dictionary/method recursion unravels + + | CoreUnfolding { uf_guidance = UnfWhen {} } <- id_unfolding + = mk_score 6 + + | is_con_app rhs -- Data types help with cases: + = mk_score 5 -- Note [Constructor applications] + + | isStableUnfolding id_unfolding + , can_unfold + = mk_score 3 + + | isOneOcc (idOccInfo new_bndr) + = mk_score 2 -- Likely to be inlined + + | can_unfold -- The Id has some kind of unfolding + = mk_score 1 + + | otherwise + = (0, 0, is_lb) + + where + mk_score :: Int -> NodeScore + mk_score rank = (rank, rhs_size, is_lb) + + is_lb = isStrongLoopBreaker (idOccInfo old_bndr) + rhs = case id_unfolding of + CoreUnfolding { uf_src = src, uf_tmpl = unf_rhs } + | isStableSource src + -> unf_rhs + _ -> bind_rhs + -- 'bind_rhs' is irrelevant for inlining things with a stable unfolding + rhs_size = case id_unfolding of + CoreUnfolding { uf_guidance = guidance } + | UnfIfGoodArgs { ug_size = size } <- guidance + -> size + _ -> cheapExprSize rhs + + can_unfold = canUnfold id_unfolding + id_unfolding = realIdUnfolding old_bndr + -- realIdUnfolding: Ignore loop-breaker-ness here because + -- that is what we are setting! + + -- Checking for a constructor application + -- Cheap and cheerful; the simplifier moves casts out of the way + -- The lambda case is important to spot x = /\a. C (f a) + -- which comes up when C is a dictionary constructor and + -- f is a default method. + -- Example: the instance for Show (ST s a) in GHC.ST + -- + -- However we *also* treat (\x. 
C p q) as a con-app-like thing, + -- Note [Closure conversion] + is_con_app (Var v) = isConLikeId v + is_con_app (App f _) = is_con_app f + is_con_app (Lam _ e) = is_con_app e + is_con_app (Tick _ e) = is_con_app e + is_con_app _ = False + +maxExprSize :: Int +maxExprSize = 20 -- Rather arbitrary + +cheapExprSize :: CoreExpr -> Int +-- Maxes out at maxExprSize +cheapExprSize e + = go 0 e + where + go n e | n >= maxExprSize = n + | otherwise = go1 n e + + go1 n (Var {}) = n+1 + go1 n (Lit {}) = n+1 + go1 n (Type {}) = n + go1 n (Coercion {}) = n + go1 n (Tick _ e) = go1 n e + go1 n (Cast e _) = go1 n e + go1 n (App f a) = go (go1 n f) a + go1 n (Lam b e) + | isTyVar b = go1 n e + | otherwise = go (n+1) e + go1 n (Let b e) = gos (go1 n e) (rhssOfBind b) + go1 n (Case e _ _ as) = gos (go1 n e) (rhssOfAlts as) + + gos n [] = n + gos n (e:es) | n >= maxExprSize = n + | otherwise = gos (go1 n e) es + +betterLB :: NodeScore -> NodeScore -> Bool +-- If n1 `betterLB` n2 then choose n1 as the loop breaker +betterLB (rank1, size1, lb1) (rank2, size2, _) + | rank1 < rank2 = True + | rank1 > rank2 = False + | size1 < size2 = False -- Make the bigger n2 into the loop breaker + | size1 > size2 = True + | lb1 = True -- Tie-break: if n1 was a loop breaker before, choose it + | otherwise = False -- See Note [Loop breakers, node scoring, and stability] + +{- Note [Self-recursion and loop breakers] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +If we have + rec { f = ...f...g... + ; g = .....f... } +then 'f' has to be a loop breaker anyway, so we may as well choose it +right away, so that g can inline freely. + +This is really just a cheap hack. Consider + rec { f = ...g... + ; g = ..f..h... + ; h = ...f....} +Here f or g are better loop breakers than h; but we might accidentally +choose h. Finding the minimal set of loop breakers is hard. 
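The lexicographic behaviour of the scoring can be checked with a standalone copy of the comparison above (a sketch for experimentation; NodeScore is just the triple described earlier):

```haskell
-- Standalone copy of betterLB: rank beats size beats
-- previous loop-breaker-hood.
type NodeScore = (Int, Int, Bool)  -- (rank, RHS size, was loop breaker before)

betterLB :: NodeScore -> NodeScore -> Bool
betterLB (rank1, size1, lb1) (rank2, size2, _)
  | rank1 < rank2 = True
  | rank1 > rank2 = False
  | size1 < size2 = False   -- make the bigger one the loop breaker
  | size1 > size2 = True
  | lb1           = True    -- tie-break on stability
  | otherwise     = False
```

So a rank-0 node (e.g. one whose inlining is inactive) beats everything; among equal ranks the larger RHS wins, keeping small, inlinable bindings out of the loop-breaker set; and an exact tie goes to last iteration's choice, which is exactly the stability property the flip-flop example above requires.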
+
+Note [Loop breakers, node scoring, and stability]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+To choose a loop breaker, we give a NodeScore to each node in the SCC,
+and pick the one with the best score (according to 'betterLB').
+
+We need to be jolly careful (#12425, #12234) about the stability
+of this choice.  Suppose we have
+
+    let rec { f = ...g...g...
+            ; g = ...f...f... }
+    in
+    case x of
+      True  -> ...f..
+      False -> ..f...
+
+In each iteration of the simplifier the occurrence analyser OccAnal
+chooses a loop breaker.  Suppose in iteration 1 it chooses g as the loop
+breaker.  That means it is free to inline f.
+
+Suppose that GHC decides to inline f in the branches of the case, but
+(for some reason; e.g. it is not saturated) not in the rhs of g.  So we get
+
+    let rec { f = ...g...g...
+            ; g = ...f...f... }
+    in
+    case x of
+      True  -> ...g...g.....
+      False -> ..g..g....
+
+Now suppose that, for some reason, in the next iteration the occurrence
+analyser chooses f as the loop breaker, so it can freely inline g.  And
+again for some reason the simplifier inlines g at its calls in the case
+branches, but not in the RHS of f.  Then we get
+
+    let rec { f = ...g...g...
+            ; g = ...f...f... }
+    in
+    case x of
+      True  -> ...(...f...f...)...(...f..f..).....
+      False -> ..(...f...f...)...(..f..f...)....
+
+You can see where this is going!  Each iteration of the simplifier
+doubles the number of calls to f or g.  No wonder GHC is slow!
+
+(In the particular example in comment:3 of #12425, f and g are the two
+mutually recursive fmap instances for CondT and Result.  They are both
+marked INLINE which, oddly, is why they don't inline in each other's
+RHS, because the call there is not saturated.)
+
+The root cause is that we flip-flop on our choice of loop breaker.  I
+always thought it didn't matter, and indeed for any single iteration
+to terminate, it doesn't matter.  But when we iterate, it matters a
+lot!!
+ +So The Plan is this: + If there is a tie, choose the node that + was a loop breaker last time round + +Hence the is_lb field of NodeScore + +************************************************************************ +* * + Right hand sides +* * +************************************************************************ +-} + +occAnalRhs :: OccEnv -> RecFlag -> Id -> [CoreBndr] -> CoreExpr + -> (UsageDetails, [CoreBndr], CoreExpr) + -- Returned usage details covers only the RHS, + -- and *not* the RULE or INLINE template for the Id +occAnalRhs env Recursive _ bndrs body + = occAnalRecRhs env bndrs body +occAnalRhs env NonRecursive id bndrs body + = occAnalNonRecRhs env id bndrs body + +occAnalRecRhs :: OccEnv -> [CoreBndr] -> CoreExpr -- Rhs lambdas, body + -> (UsageDetails, [CoreBndr], CoreExpr) + -- Returned usage details covers only the RHS, + -- and *not* the RULE or INLINE template for the Id +occAnalRecRhs env bndrs body = occAnalLamOrRhs (rhsCtxt env) bndrs body + +occAnalNonRecRhs :: OccEnv + -> Id -> [CoreBndr] -> CoreExpr -- Binder; rhs lams, body + -- Binder is already tagged with occurrence info + -> (UsageDetails, [CoreBndr], CoreExpr) + -- Returned usage details covers only the RHS, + -- and *not* the RULE or INLINE template for the Id +occAnalNonRecRhs env bndr bndrs body + = occAnalLamOrRhs rhs_env bndrs body + where + env1 | is_join_point = env -- See Note [Join point RHSs] + | certainly_inline = env -- See Note [Cascading inlines] + | otherwise = rhsCtxt env + + -- See Note [Sources of one-shot information] + rhs_env = env1 { occ_one_shots = argOneShots dmd } + + certainly_inline -- See Note [Cascading inlines] + = case occ of + OneOcc { occ_in_lam = NotInsideLam, occ_one_br = InOneBranch } + -> active && not_stable + _ -> False + + is_join_point = isAlwaysTailCalled occ + -- Like (isJoinId bndr) but happens one step earlier + -- c.f. 
willBeJoinId_maybe + + occ = idOccInfo bndr + dmd = idDemandInfo bndr + active = isAlwaysActive (idInlineActivation bndr) + not_stable = not (isStableUnfolding (idUnfolding bndr)) + +occAnalUnfolding :: OccEnv + -> RecFlag + -> Id + -> Maybe UsageDetails + -- Just the analysis, not a new unfolding. The unfolding + -- got analysed when it was created and we don't need to + -- update it. +occAnalUnfolding env rec_flag id + = case realIdUnfolding id of -- ignore previous loop-breaker flag + CoreUnfolding { uf_tmpl = rhs, uf_src = src } + | not (isStableSource src) + -> Nothing + | otherwise + -> Just $ markAllMany usage + where + (bndrs, body) = collectBinders rhs + (usage, _, _) = occAnalRhs env rec_flag id bndrs body + + DFunUnfolding { df_bndrs = bndrs, df_args = args } + -> Just $ zapDetails (delDetailsList usage bndrs) + where + usage = andUDsList (map (fst . occAnal env) args) + + _ -> Nothing + +occAnalRules :: OccEnv + -> Maybe JoinArity -- If the binder is (or MAY become) a join + -- point, what its join arity is (or WOULD + -- become). See Note [Rules and join points]. 
+             -> RecFlag
+             -> Id
+             -> [(CoreRule,      -- Each (non-built-in) rule
+                  UsageDetails,  -- Usage details for LHS
+                  UsageDetails)] -- Usage details for RHS
+occAnalRules env mb_expected_join_arity rec_flag id
+  = [ (rule, lhs_uds, rhs_uds) | rule@Rule {} <- idCoreRules id
+    , let (lhs_uds, rhs_uds) = occ_anal_rule rule ]
+  where
+    occ_anal_rule (Rule { ru_bndrs = bndrs, ru_args = args, ru_rhs = rhs })
+      = (lhs_uds, final_rhs_uds)
+      where
+        lhs_uds = addManyOccsSet emptyDetails $
+                  (exprsFreeVars args `delVarSetList` bndrs)
+        (rhs_bndrs, rhs_body) = collectBinders rhs
+        (rhs_uds, _, _) = occAnalRhs env rec_flag id rhs_bndrs rhs_body
+                          -- Note [Rules are extra RHSs]
+                          -- Note [Rule dependency info]
+        final_rhs_uds = adjust_tail_info args $ markAllMany $
+                        (rhs_uds `delDetailsList` bndrs)
+    occ_anal_rule _
+      = (emptyDetails, emptyDetails)
+
+    adjust_tail_info args uds -- see Note [Rules and join points]
+      = case mb_expected_join_arity of
+          Just ar | args `lengthIs` ar -> uds
+          _                            -> markAllNonTailCalled uds
+{- Note [Join point RHSs]
+~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider
+   x = e
+   join j = Just x
+
+We want to inline x into j right away, so we don't want to give
+the join point a RhsCtxt (#14137).  It's not a huge deal, because
+the FloatIn pass knows to float into join point RHSs; and the simplifier
+does not float things out of join point RHSs.  But it's a simple, cheap
+thing to do.  See #14137.
+
+Note [Cascading inlines]
+~~~~~~~~~~~~~~~~~~~~~~~~
+By default we use an rhsCtxt for the RHS of a binding.  This tells the
+occ anal that it's looking at an RHS, which has an effect in
+occAnalApp.  In particular, for constructor applications, it makes
+the arguments appear to have NoOccInfo, so that we don't inline into
+them.  Thus   x = f y
+              k = Just x
+we do not want to inline x.
+
+But there's a problem.
Consider + x1 = a0 : [] + x2 = a1 : x1 + x3 = a2 : x2 + g = f x3 +First time round, it looks as if x1 and x2 occur as an arg of a +let-bound constructor ==> give them a many-occurrence. +But then x3 is inlined (unconditionally as it happens) and +next time round, x2 will be, and the next time round x1 will be +Result: multiple simplifier iterations. Sigh. + +So, when analysing the RHS of x3 we notice that x3 will itself +definitely inline the next time round, and so we analyse x3's rhs in +an ordinary context, not rhsCtxt. Hence the "certainly_inline" stuff. + +Annoyingly, we have to approximate GHC.Core.Op.Simplify.Utils.preInlineUnconditionally. +If (a) the RHS is expandable (see isExpandableApp in occAnalApp), and + (b) certainly_inline says "yes" when preInlineUnconditionally says "no" +then the simplifier iterates indefinitely: + x = f y + k = Just x -- We decide that k is 'certainly_inline' + v = ...k... -- but preInlineUnconditionally doesn't inline it +inline ==> + k = Just (f y) + v = ...k... +float ==> + x1 = f y + k = Just x1 + v = ...k... + +This is worse than the slow cascade, so we only want to say "certainly_inline" +if it really is certain. Look at the note with preInlineUnconditionally +for the various clauses. + + +************************************************************************ +* * + Expressions +* * +************************************************************************ +-} + +occAnal :: OccEnv + -> CoreExpr + -> (UsageDetails, -- Gives info only about the "interesting" Ids + CoreExpr) + +occAnal _ expr@(Type _) = (emptyDetails, expr) +occAnal _ expr@(Lit _) = (emptyDetails, expr) +occAnal env expr@(Var _) = occAnalApp env (expr, [], []) + -- At one stage, I gathered the idRuleVars for the variable here too, + -- which in a way is the right thing to do. 
+ -- But that went wrong right after specialisation, when + -- the *occurrences* of the overloaded function didn't have any + -- rules in them, so the *specialised* versions looked as if they + -- weren't used at all. + +occAnal _ (Coercion co) + = (addManyOccsSet emptyDetails (coVarsOfCo co), Coercion co) + -- See Note [Gather occurrences of coercion variables] + +{- +Note [Gather occurrences of coercion variables] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We need to gather info about what coercion variables appear, so that +we can sort them into the right place when doing dependency analysis. +-} + +occAnal env (Tick tickish body) + | SourceNote{} <- tickish + = (usage, Tick tickish body') + -- SourceNotes are best-effort; so we just proceed as usual. + -- If we drop a tick due to the issues described below it's + -- not the end of the world. + + | tickish `tickishScopesLike` SoftScope + = (markAllNonTailCalled usage, Tick tickish body') + + | Breakpoint _ ids <- tickish + = (usage_lam `andUDs` foldr addManyOccs emptyDetails ids, Tick tickish body') + -- never substitute for any of the Ids in a Breakpoint + + | otherwise + = (usage_lam, Tick tickish body') + where + !(usage,body') = occAnal env body + -- for a non-soft tick scope, we can inline lambdas only + usage_lam = markAllNonTailCalled (markAllInsideLam usage) + -- TODO There may be ways to make ticks and join points play + -- nicer together, but right now there are problems: + -- let j x = ... in tick<t> (j 1) + -- Making j a join point may cause the simplifier to drop t + -- (if the tick is put into the continuation). So we don't + -- count j 1 as a tail call. + -- See #14242. + +occAnal env (Cast expr co) + = case occAnal env expr of { (usage, expr') -> + let usage1 = zapDetailsIf (isRhsEnv env) usage + -- usage1: if we see let x = y `cast` co + -- then mark y as 'Many' so that we don't + -- immediately inline y again. 
+ usage2 = addManyOccsSet usage1 (coVarsOfCo co) + -- usage2: see Note [Gather occurrences of coercion variables] + in (markAllNonTailCalled usage2, Cast expr' co) + } + +occAnal env app@(App _ _) + = occAnalApp env (collectArgsTicks tickishFloatable app) + +-- Ignore type variables altogether +-- (a) occurrences inside type lambdas only not marked as InsideLam +-- (b) type variables not in environment + +occAnal env (Lam x body) + | isTyVar x + = case occAnal env body of { (body_usage, body') -> + (markAllNonTailCalled body_usage, Lam x body') + } + +-- For value lambdas we do a special hack. Consider +-- (\x. \y. ...x...) +-- If we did nothing, x is used inside the \y, so would be marked +-- as dangerous to dup. But in the common case where the abstraction +-- is applied to two arguments this is over-pessimistic. +-- So instead, we just mark each binder with its occurrence +-- info in the *body* of the multiple lambda. +-- Then, the simplifier is careful when partially applying lambdas. 
+ +occAnal env expr@(Lam _ _) + = case occAnalLamOrRhs env binders body of { (usage, tagged_binders, body') -> + let + expr' = mkLams tagged_binders body' + usage1 = markAllNonTailCalled usage + one_shot_gp = all isOneShotBndr tagged_binders + final_usage | one_shot_gp = usage1 + | otherwise = markAllInsideLam usage1 + in + (final_usage, expr') } + where + (binders, body) = collectBinders expr + +occAnal env (Case scrut bndr ty alts) + = case occ_anal_scrut scrut alts of { (scrut_usage, scrut') -> + case mapAndUnzip occ_anal_alt alts of { (alts_usage_s, alts') -> + let + alts_usage = foldr orUDs emptyDetails alts_usage_s + (alts_usage1, tagged_bndr) = tagLamBinder alts_usage bndr + total_usage = markAllNonTailCalled scrut_usage `andUDs` alts_usage1 + -- Alts can have tail calls, but the scrutinee can't + in + total_usage `seq` (total_usage, Case scrut' tagged_bndr ty alts') }} + where + alt_env = mkAltEnv env scrut bndr + occ_anal_alt = occAnalAlt alt_env + + occ_anal_scrut (Var v) (alt1 : other_alts) + | not (null other_alts) || not (isDefaultAlt alt1) + = (mkOneOcc env v IsInteresting 0, Var v) + -- The 'True' says that the variable occurs in an interesting + -- context; the case has at least one non-default alternative + occ_anal_scrut (Tick t e) alts + | t `tickishScopesLike` SoftScope + -- No reason to not look through all ticks here, but only + -- for soft-scoped ticks we can do so without having to + -- update returned occurrence info (see occAnal) + = second (Tick t) $ occ_anal_scrut e alts + + occ_anal_scrut scrut _alts + = occAnal (vanillaCtxt env) scrut -- No need for rhsCtxt + +occAnal env (Let bind body) + = case occAnal env body of { (body_usage, body') -> + case occAnalBind env NotTopLevel + noImpRuleEdges bind + body_usage of { (final_usage, new_binds) -> + (final_usage, mkLets new_binds body') }} + +occAnalArgs :: OccEnv -> [CoreExpr] -> [OneShots] -> (UsageDetails, [CoreExpr]) +occAnalArgs _ [] _ + = (emptyDetails, []) + +occAnalArgs env 
(arg:args) one_shots + | isTypeArg arg + = case occAnalArgs env args one_shots of { (uds, args') -> + (uds, arg:args') } + + | otherwise + = case argCtxt env one_shots of { (arg_env, one_shots') -> + case occAnal arg_env arg of { (uds1, arg') -> + case occAnalArgs env args one_shots' of { (uds2, args') -> + (uds1 `andUDs` uds2, arg':args') }}} + +{- +Applications are dealt with specially because we want +the "build hack" to work. + +Note [Arguments of let-bound constructors] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Consider + f x = let y = expensive x in + let z = (True,y) in + (case z of {(p,q)->q}, case z of {(p,q)->q}) +We feel free to duplicate the WHNF (True,y), but that means +that y may be duplicated thereby. + +If we aren't careful we duplicate the (expensive x) call! +Constructors are rather like lambdas in this way. +-} + +occAnalApp :: OccEnv + -> (Expr CoreBndr, [Arg CoreBndr], [Tickish Id]) + -> (UsageDetails, Expr CoreBndr) +occAnalApp env (Var fun, args, ticks) + | null ticks = (uds, mkApps (Var fun) args') + | otherwise = (uds, mkTicks ticks $ mkApps (Var fun) args') + where + uds = fun_uds `andUDs` final_args_uds + + !(args_uds, args') = occAnalArgs env args one_shots + !final_args_uds + | isRhsEnv env && is_exp = markAllNonTailCalled $ + markAllInsideLam args_uds + | otherwise = markAllNonTailCalled args_uds + -- We mark the free vars of the argument of a constructor or PAP + -- as "inside-lambda", if it is the RHS of a let(rec). + -- This means that nothing gets inlined into a constructor or PAP + -- argument position, which is what we want. Typically those + -- constructor arguments are just variables, or trivial expressions. + -- We use inside-lam because it's like eta-expanding the PAP. 
+ -- + -- This is the *whole point* of the isRhsEnv predicate + -- See Note [Arguments of let-bound constructors] + + n_val_args = valArgCount args + n_args = length args + fun_uds = mkOneOcc env fun (if n_val_args > 0 then IsInteresting else NotInteresting) n_args + is_exp = isExpandableApp fun n_val_args + -- See Note [CONLIKE pragma] in BasicTypes + -- The definition of is_exp should match that in GHC.Core.Op.Simplify.prepareRhs + + one_shots = argsOneShots (idStrictness fun) guaranteed_val_args + guaranteed_val_args = n_val_args + length (takeWhile isOneShotInfo + (occ_one_shots env)) + -- See Note [Sources of one-shot information], bullet point A'] + +occAnalApp env (fun, args, ticks) + = (markAllNonTailCalled (fun_uds `andUDs` args_uds), + mkTicks ticks $ mkApps fun' args') + where + !(fun_uds, fun') = occAnal (addAppCtxt env args) fun + -- The addAppCtxt is a bit cunning. One iteration of the simplifier + -- often leaves behind beta redexs like + -- (\x y -> e) a1 a2 + -- Here we would like to mark x,y as one-shot, and treat the whole + -- thing much like a let. We do this by pushing some True items + -- onto the context stack. + !(args_uds, args') = occAnalArgs env args [] + +zapDetailsIf :: Bool -- If this is true + -> UsageDetails -- Then do zapDetails on this + -> UsageDetails +zapDetailsIf True uds = zapDetails uds +zapDetailsIf False uds = uds + +{- +Note [Sources of one-shot information] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +The occurrence analyser obtains one-shot-lambda information from two sources: + +A: Saturated applications: eg f e1 .. en + + In general, given a call (f e1 .. en) we can propagate one-shot info from + f's strictness signature into e1 .. en, but /only/ if n is enough to + saturate the strictness signature. A strictness signature like + + f :: C1(C1(L))LS + + means that *if f is applied to three arguments* then it will guarantee to + call its first argument at most once, and to call the result of that at + most once. 
But if f has fewer than three arguments, all bets are off; e.g.
+
+ map (f (\x y. expensive) e2) xs
+
+ Here the \x y abstraction may be called many times (once for each element of
+ xs) so we should not mark x and y as one-shot. But if it was
+
+ map (f (\x y. expensive) 3 2) xs
+
+ then the first argument of f will be called at most once.
+
+ The one-shot info, derived from f's strictness signature, is
+ computed by 'argsOneShots', called in occAnalApp.
+
+A': Non-obviously saturated applications: eg build (f (\x y -> expensive))
+ where f is as above.
+
+ In this case, f is only manifestly applied to one argument, so it does not
+ look saturated. So by the previous point, we should not use its strictness
+ signature to learn about the one-shotness of \x y. But in this case we can:
+ build is fully applied, so we may use its strictness signature; and from
+ that we learn that build calls its argument with two arguments *at most once*.
+
+ So there is really only one call to f, and it will have three arguments. In
+ that sense, f is saturated, and we may proceed as described above.
+
+ Hence the computation of 'guaranteed_val_args' in occAnalApp, using
+ '(occ_one_shots env)'. See also #13227, comment:9
+
+B: Let-bindings: eg let f = \c. let ... in \n -> blah
+ in (build f, build f)
+
+ Propagate one-shot info from the demand-info on 'f' to the
+ lambdas in its RHS (which may not be syntactically at the top)
+
+ This information must have come from a previous run of the demand
+ analyser.
+
+Previously, the demand analyser would *also* set the one-shot information, but
+that code was buggy (see #11770), so doing it in only one place, namely here, is
+saner.
+
+Note [OneShots]
+~~~~~~~~~~~~~~~
+When analysing an expression, the occ_one_shots argument contains information
+about how the function is being used.
The length of the list indicates
+how many arguments will eventually be passed to the analysed expression,
+and each OneShotInfo indicates whether the corresponding application
+happens at most once or possibly many times.
+
+Example:
+
+ Context of f occ_one_shots when analysing f
+
+ f 1 2 [OneShot, OneShot]
+ map (f 1) [OneShot, NoOneShotInfo]
+ build f [OneShot, OneShot]
+ f 1 2 `seq` f 2 1 [NoOneShotInfo, OneShot]
+
+Note [Binders in case alternatives]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider
+ case x of y { (a,b) -> f y }
+We treat 'a', 'b' as dead, because they don't physically occur in the
+case alternative. (Indeed, a variable is dead iff it doesn't occur in
+its scope in the output of OccAnal.) It really helps to know when
+binders are unused. See esp the call to isDeadBinder in
+Simplify.mkDupableAlt
+
+In this example, though, the Simplifier will bring 'a' and 'b' back to
+life, because it binds 'y' to (a,b); imagine that binding got inlined
+and 'y' scrutinised.
+-}
+
+occAnalLamOrRhs :: OccEnv -> [CoreBndr] -> CoreExpr
+ -> (UsageDetails, [CoreBndr], CoreExpr)
+occAnalLamOrRhs env [] body
+ = case occAnal env body of (body_usage, body') -> (body_usage, [], body')
+ -- RHS of thunk or nullary join point
+occAnalLamOrRhs env (bndr:bndrs) body
+ | isTyVar bndr
+ = -- Important: Keep the environment so that we don't inline into an RHS like
+ -- \(@ x) -> C @x (f @x)
+ -- (see the beginning of Note [Cascading inlines]).
+ case occAnalLamOrRhs env bndrs body of
+ (body_usage, bndrs', body') -> (body_usage, bndr:bndrs', body')
+occAnalLamOrRhs env binders body
+ = case occAnal env_body body of { (body_usage, body') ->
+ let
+ (final_usage, tagged_binders) = tagLamBinders body_usage binders'
+ -- Use binders' to put one-shot info on the lambdas
+ in
+ (final_usage, tagged_binders, body') }
+ where
+ (env_body, binders') = oneShotGroup env binders
+
+occAnalAlt :: (OccEnv, Maybe (Id, CoreExpr))
+ -> CoreAlt
+ -> (UsageDetails, Alt IdWithOccInfo)
+occAnalAlt (env, scrut_bind) (con, bndrs, rhs)
+ = case occAnal env rhs of { (rhs_usage1, rhs1) ->
+ let
+ (alt_usg, tagged_bndrs) = tagLamBinders rhs_usage1 bndrs
+ -- See Note [Binders in case alternatives]
+ (alt_usg', rhs2) = wrapAltRHS env scrut_bind alt_usg tagged_bndrs rhs1
+ in
+ (alt_usg', (con, tagged_bndrs, rhs2)) }
+
+wrapAltRHS :: OccEnv
+ -> Maybe (Id, CoreExpr) -- proxy mapping generated by mkAltEnv
+ -> UsageDetails -- usage for entire alt (p -> rhs)
+ -> [Var] -- alt binders
+ -> CoreExpr -- alt RHS
+ -> (UsageDetails, CoreExpr)
+wrapAltRHS env (Just (scrut_var, let_rhs)) alt_usg bndrs alt_rhs
+ | occ_binder_swap env
+ , scrut_var `usedIn` alt_usg -- bndrs are not present in alt_usg so this
+ -- handles condition (a) in Note [Binder swap]
+ , not captured -- See condition (b) in Note [Binder swap]
+ = ( alt_usg' `andUDs` let_rhs_usg
+ , Let (NonRec tagged_scrut_var let_rhs') alt_rhs )
+ where
+ captured = any (`usedIn` let_rhs_usg) bndrs -- Check condition (b)
+
+ -- The rhs of the let may include coercion variables
+ -- if the scrutinee was a cast, so we must gather their
+ -- usage.
See Note [Gather occurrences of coercion variables]
+ -- Moreover, the rhs of the let may mention the case-binder, and
+ -- we want to gather its occ-info as well
+ (let_rhs_usg, let_rhs') = occAnal env let_rhs
+
+ (alt_usg', tagged_scrut_var) = tagLamBinder alt_usg scrut_var
+
+wrapAltRHS _ _ alt_usg _ alt_rhs
+ = (alt_usg, alt_rhs)
+
+{-
+************************************************************************
+* *
+ OccEnv
+* *
+************************************************************************
+-}
+
+data OccEnv
+ = OccEnv { occ_encl :: !OccEncl -- Enclosing context information
+ , occ_one_shots :: !OneShots -- See Note [OneShots]
+ , occ_gbl_scrut :: GlobalScruts
+
+ , occ_unf_act :: Id -> Bool -- Which Id unfoldings are active
+
+ , occ_rule_act :: Activation -> Bool -- Which rules are active
+ -- See Note [Finding rule RHS free vars]
+
+ , occ_binder_swap :: !Bool -- enable the binder_swap
+ -- See CorePrep Note [Dead code in CorePrep]
+ }
+
+type GlobalScruts = IdSet -- See Note [Binder swap on GlobalId scrutinees]
+
+-----------------------------
+-- OccEncl is used to control whether to inline into constructor arguments
+-- For example:
+-- x = (p,q) -- Don't inline p or q
+-- y = /\a -> (p a, q a) -- Still don't inline p or q
+-- z = f (p,q) -- Do inline p,q; it may make a rule fire
+-- So OccEncl tells enough about the context to know what to do when
+-- we encounter a constructor application or PAP.
+
+data OccEncl
+ = OccRhs -- RHS of let(rec), albeit perhaps inside a type lambda
+ -- Don't inline into constructor args here
+ | OccVanilla -- Argument of function, body of lambda, scrutinee of case etc.
+ -- Do inline into constructor args here + +instance Outputable OccEncl where + ppr OccRhs = text "occRhs" + ppr OccVanilla = text "occVanilla" + +-- See note [OneShots] +type OneShots = [OneShotInfo] + +initOccEnv :: OccEnv +initOccEnv + = OccEnv { occ_encl = OccVanilla + , occ_one_shots = [] + , occ_gbl_scrut = emptyVarSet + -- To be conservative, we say that all + -- inlines and rules are active + , occ_unf_act = \_ -> True + , occ_rule_act = \_ -> True + , occ_binder_swap = True } + +vanillaCtxt :: OccEnv -> OccEnv +vanillaCtxt env = env { occ_encl = OccVanilla, occ_one_shots = [] } + +rhsCtxt :: OccEnv -> OccEnv +rhsCtxt env = env { occ_encl = OccRhs, occ_one_shots = [] } + +argCtxt :: OccEnv -> [OneShots] -> (OccEnv, [OneShots]) +argCtxt env [] + = (env { occ_encl = OccVanilla, occ_one_shots = [] }, []) +argCtxt env (one_shots:one_shots_s) + = (env { occ_encl = OccVanilla, occ_one_shots = one_shots }, one_shots_s) + +isRhsEnv :: OccEnv -> Bool +isRhsEnv (OccEnv { occ_encl = OccRhs }) = True +isRhsEnv (OccEnv { occ_encl = OccVanilla }) = False + +oneShotGroup :: OccEnv -> [CoreBndr] + -> ( OccEnv + , [CoreBndr] ) + -- The result binders have one-shot-ness set that they might not have had originally. + -- This happens in (build (\c n -> e)). Here the occurrence analyser + -- linearity context knows that c,n are one-shot, and it records that fact in + -- the binder. 
This is useful to guide subsequent float-in/float-out transformations + +oneShotGroup env@(OccEnv { occ_one_shots = ctxt }) bndrs + = go ctxt bndrs [] + where + go ctxt [] rev_bndrs + = ( env { occ_one_shots = ctxt, occ_encl = OccVanilla } + , reverse rev_bndrs ) + + go [] bndrs rev_bndrs + = ( env { occ_one_shots = [], occ_encl = OccVanilla } + , reverse rev_bndrs ++ bndrs ) + + go ctxt@(one_shot : ctxt') (bndr : bndrs) rev_bndrs + | isId bndr = go ctxt' bndrs (bndr': rev_bndrs) + | otherwise = go ctxt bndrs (bndr : rev_bndrs) + where + bndr' = updOneShotInfo bndr one_shot + -- Use updOneShotInfo, not setOneShotInfo, as pre-existing + -- one-shot info might be better than what we can infer, e.g. + -- due to explicit use of the magic 'oneShot' function. + -- See Note [The oneShot function] + + +markJoinOneShots :: Maybe JoinArity -> [Var] -> [Var] +-- Mark the lambdas of a non-recursive join point as one-shot. +-- This is good to prevent gratuitous float-out etc +markJoinOneShots mb_join_arity bndrs + = case mb_join_arity of + Nothing -> bndrs + Just n -> go n bndrs + where + go 0 bndrs = bndrs + go _ [] = [] -- This can legitimately happen. + -- e.g. let j = case ... in j True + -- This will become an arity-1 join point after the + -- simplifier has eta-expanded it; but it may not have + -- enough lambdas /yet/. (Lint checks that JoinIds do + -- have enough lambdas.) 
+ go n (b:bs) = b' : go (n-1) bs + where + b' | isId b = setOneShotLambda b + | otherwise = b + +addAppCtxt :: OccEnv -> [Arg CoreBndr] -> OccEnv +addAppCtxt env@(OccEnv { occ_one_shots = ctxt }) args + = env { occ_one_shots = replicate (valArgCount args) OneShotLam ++ ctxt } + +transClosureFV :: UniqFM VarSet -> UniqFM VarSet +-- If (f,g), (g,h) are in the input, then (f,h) is in the output +-- as well as (f,g), (g,h) +transClosureFV env + | no_change = env + | otherwise = transClosureFV (listToUFM new_fv_list) + where + (no_change, new_fv_list) = mapAccumL bump True (nonDetUFMToList env) + -- It's OK to use nonDetUFMToList here because we'll forget the + -- ordering by creating a new set with listToUFM + bump no_change (b,fvs) + | no_change_here = (no_change, (b,fvs)) + | otherwise = (False, (b,new_fvs)) + where + (new_fvs, no_change_here) = extendFvs env fvs + +------------- +extendFvs_ :: UniqFM VarSet -> VarSet -> VarSet +extendFvs_ env s = fst (extendFvs env s) -- Discard the Bool flag + +extendFvs :: UniqFM VarSet -> VarSet -> (VarSet, Bool) +-- (extendFVs env s) returns +-- (s `union` env(s), env(s) `subset` s) +extendFvs env s + | isNullUFM env + = (s, True) + | otherwise + = (s `unionVarSet` extras, extras `subVarSet` s) + where + extras :: VarSet -- env(s) + extras = nonDetFoldUFM unionVarSet emptyVarSet $ + -- It's OK to use nonDetFoldUFM here because unionVarSet commutes + intersectUFM_C (\x _ -> x) env (getUniqSet s) + +{- +************************************************************************ +* * + Binder swap +* * +************************************************************************ + +Note [Binder swap] +~~~~~~~~~~~~~~~~~~ +The "binder swap" transformation swaps occurrence of the +scrutinee of a case for occurrences of the case-binder: + + (1) case x of b { pi -> ri } + ==> + case x of b { pi -> let x=b in ri } + + (2) case (x |> co) of b { pi -> ri } + ==> + case (x |> co) of b { pi -> let x = b |> sym co in ri } + +In both cases, the 
trivial 'let' can be eliminated by the +immediately following simplifier pass. + +There are two reasons for making this swap: + +(A) It reduces the number of occurrences of the scrutinee, x. + That in turn might reduce its occurrences to one, so we + can inline it and save an allocation. E.g. + let x = factorial y in case x of b { I# v -> ...x... } + If we replace 'x' by 'b' in the alternative we get + let x = factorial y in case x of b { I# v -> ...b... } + and now we can inline 'x', thus + case (factorial y) of b { I# v -> ...b... } + +(B) The case-binder b has unfolding information; in the + example above we know that b = I# v. That in turn allows + nested cases to simplify. Consider + case x of b { I# v -> + ...(case x of b2 { I# v2 -> rhs })... + If we replace 'x' by 'b' in the alternative we get + case x of b { I# v -> + ...(case b of b2 { I# v2 -> rhs })... + and now it is trivial to simplify the inner case: + case x of b { I# v -> + ...(let b2 = b in rhs)... + + The same can happen even if the scrutinee is a variable + with a cast: see Note [Case of cast] + +In both cases, in a particular alternative (pi -> ri), we only +add the binding if + (a) x occurs free in (pi -> ri) + (ie it occurs in ri, but is not bound in pi) + (b) the pi does not bind b (or the free vars of co) +We need (a) and (b) for the inserted binding to be correct. + +For the alternatives where we inject the binding, we can transfer +all x's OccInfo to b. And that is the point. + +Notice that + * The deliberate shadowing of 'x'. + * That (a) rapidly becomes false, so no bindings are injected. + +The reason for doing these transformations /here in the occurrence +analyser/ is because it allows us to adjust the OccInfo for 'x' and +'b' as we go. + + * Suppose the only occurrences of 'x' are the scrutinee and in the + ri; then this transformation makes it occur just once, and hence + get inlined right away. 
+
+ * If instead we do this in the Simplifier, we don't know whether 'x'
+ is used in ri, so we are forced to pessimistically zap b's OccInfo
+ even though it is typically dead (ie neither it nor x appear in
+ the ri). There's nothing actually wrong with zapping it, except
+ that it's kind of nice to know which variables are dead. My nose
+ tells me to keep this information as robustly as possible.
+
+The Maybe (Id,CoreExpr) passed to occAnalAlt is the extra let-binding
+{x=b}; it's Nothing if the binder-swap doesn't happen.
+
+There is a danger though. Consider
+ let v = x +# y
+ in case (f v) of w -> ...v...v...
+And suppose that (f v) expands to just v. Then we'd like to
+use 'w' instead of 'v' in the alternative. But it may be too
+late; we may have substituted the (cheap) x+#y for v in the
+same simplifier pass that reduced (f v) to v.
+
+I think this is just too bad. CSE will recover some of it.
+
+Note [Case of cast]
+~~~~~~~~~~~~~~~~~~~
+Consider case (x `cast` co) of b { I# ->
+ ... (case (x `cast` co) of {...}) ...
+We'd like to eliminate the inner case. That is the motivation for
+equation (2) in Note [Binder swap]. When we get to the inner case, we
+inline x, cancel the casts, and away we go.
+
+Note [Binder swap on GlobalId scrutinees]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+When the scrutinee is a GlobalId we must take care in two ways
+
+ i) In order to *know* whether 'x' occurs free in the RHS, we need its
+ occurrence info. BUT, we don't gather occurrence info for
+ GlobalIds. That's what the (small) occ_gbl_scrut env in
+ OccEnv is for: it says "gather occurrence info for these".
+
+ ii) We must call localiseId on 'x' first, in case it's a GlobalId, or
+ has an External Name. See, for example, SimplEnv Note [Global Ids in
+ the substitution].
+
+Note [Zap case binders in proxy bindings]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+From the original
+ case x of cb(dead) { p -> ...x...
}
+we will get
+ case x of cb(live) { p -> let x = cb in ...x... }
+
+Core Lint never expects to find an *occurrence* of an Id marked
+as Dead, so we must zap the OccInfo on cb before making the
+binding x = cb. See #5028.
+
+NB: the OccInfo on /occurrences/ really doesn't matter much; the simplifier
+doesn't use it. So this is only to satisfy the perhaps-over-picky Lint.
+
+Historical note [no-case-of-case]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We *used* to suppress the binder-swap in case expressions when
+-fno-case-of-case is on. Old remarks:
+ "This happens in the first simplifier pass,
+ and enhances full laziness. Here's the bad case:
+ f = \ y -> ...(case x of I# v -> ...(case x of ...) ... )
+ If we eliminate the inner case, we trap it inside the I# v -> arm,
+ which might prevent some full laziness happening. I've seen this
+ in action in spectral/cichelli/Prog.hs:
+ [(m,n) | m <- [1..max], n <- [1..max]]
+ Hence the check for NoCaseOfCase."
+However, now the full-laziness pass itself reverses the binder-swap, so this
+check is no longer necessary.
+
+Historical note [Suppressing the case binder-swap]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+This old note describes a problem that is also fixed by doing the
+binder-swap in OccAnal:
+
+ There is another situation when it might make sense to suppress the
+ case-expression binder-swap. If we have
+
+ case x of w1 { DEFAULT -> case x of w2 { A -> e1; B -> e2 }
+ ...other cases .... }
+
+ We'll perform the binder-swap for the outer case, giving
+
+ case x of w1 { DEFAULT -> case w1 of w2 { A -> e1; B -> e2 }
+ ...other cases .... }
+
+ But there is no point in doing it for the inner case, because w1 can't
+ be inlined anyway. Furthermore, doing the case-swapping involves
+ zapping w2's occurrence info (see paragraphs that follow), and that
+ forces us to bind w2 when doing case merging. So we get
+
+ case x of w1 { A -> let w2 = w1 in e1
+ B -> let w2 = w1 in e2
+ ...other cases ....
}
+
+ This is plain silly in the common case where w2 is dead.
+
+ Even so, I can't see a good way to implement this idea. I tried
+ not doing the binder-swap if the scrutinee was already evaluated
+ but that failed big-time:
+
+ data T = MkT !Int
+
+ case v of w { MkT x ->
+ case x of x1 { I# y1 ->
+ case x of x2 { I# y2 -> ...
+
+ Notice that because MkT is strict, x is marked "evaluated". But to
+ eliminate the last case, we must make sure that x (as well as
+ x1) has unfolding MkT y1. The straightforward thing to do is to do
+ the binder-swap. So this whole note is a no-op.
+
+It's fixed by doing the binder-swap in OccAnal because we can do the
+binder-swap unconditionally and still get occurrence analysis
+information right.
+-}
+
+mkAltEnv :: OccEnv -> CoreExpr -> Id -> (OccEnv, Maybe (Id, CoreExpr))
+-- Does three things: a) makes the occ_encl = OccVanilla
+-- b) extends the GlobalScruts if possible
+-- c) returns a proxy mapping, binding the scrutinee
+-- to the case binder, if possible
+mkAltEnv env@(OccEnv { occ_gbl_scrut = pe }) scrut case_bndr
+ = case stripTicksTopE (const True) scrut of
+ Var v -> add_scrut v case_bndr'
+ Cast (Var v) co -> add_scrut v (Cast case_bndr' (mkSymCo co))
+ -- See Note [Case of cast]
+ _ -> (env { occ_encl = OccVanilla }, Nothing)
+
+ where
+ add_scrut v rhs
+ | isGlobalId v = (env { occ_encl = OccVanilla }, Nothing)
+ | otherwise = ( env { occ_encl = OccVanilla
+ , occ_gbl_scrut = pe `extendVarSet` v }
+ , Just (localise v, rhs) )
+ -- TODO: this isGlobalId stuff is a TEMPORARY FIX
+ -- to avoid the binder-swap for GlobalIds
+ -- See #16346
+
+ case_bndr' = Var (zapIdOccInfo case_bndr)
+ -- See Note [Zap case binders in proxy bindings]
+
+ -- Localise the scrut_var before shadowing it; we're making a
+ -- new binding for it, and it might have an External Name, or
+ -- even be a GlobalId; Note [Binder swap on GlobalId scrutinees]
+ -- Also we don't want any INLINE or NOINLINE pragmas!
+ localise scrut_var = mkLocalIdOrCoVar (localiseName (idName scrut_var)) + (idType scrut_var) + +{- +************************************************************************ +* * +\subsection[OccurAnal-types]{OccEnv} +* * +************************************************************************ + +Note [UsageDetails and zapping] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +On many occasions, we must modify all gathered occurrence data at once. For +instance, all occurrences underneath a (non-one-shot) lambda set the +'occ_in_lam' flag to become 'True'. We could use 'mapVarEnv' to do this, but +that takes O(n) time and we will do this often---in particular, there are many +places where tail calls are not allowed, and each of these causes all variables +to get marked with 'NoTailCallInfo'. + +Instead of relying on `mapVarEnv`, then, we carry three 'IdEnv's around along +with the 'OccInfoEnv'. Each of these extra environments is a "zapped set" +recording which variables have been zapped in some way. Zapping all occurrence +info then simply means setting the corresponding zapped set to the whole +'OccInfoEnv', a fast O(1) operation. 
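The zapped-set trick can be illustrated with a small self-contained sketch. This is illustrative only, not GHC's actual UsageDetails: Data.Map stands in for IdEnv, and a Bool ("occurs inside a lambda?") stands in for OccInfo.

```haskell
import qualified Data.Map as Map

-- A toy UsageDetails: the env maps a variable to "occurs inside a lambda?",
-- and the zapped set records variables whose info must be adjusted on lookup.
data UD = UD
  { udEnv    :: Map.Map String Bool
  , udZInLam :: Map.Map String Bool  -- values ignored, as with ZappedSet
  }

oneOcc :: String -> UD
oneOcc v = UD (Map.singleton v False) Map.empty

-- O(1) no matter how many variables udEnv holds: just remember that
-- everything currently in the env is to be marked inside-lam later.
markAllInsideLamSketch :: UD -> UD
markAllInsideLamSketch ud = ud { udZInLam = udEnv ud }

-- The deferred zap is applied per variable, at lookup time.
lookupUD :: UD -> String -> Maybe Bool
lookupUD ud v = fmap zap (Map.lookup v (udEnv ud))
  where zap b = b || v `Map.member` udZInLam ud
```

The real code keeps three such zapped sets (many, inside-lam, no-tail) and flattens them back into the env only when it must, in flattenUsageDetails.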
+-} + +type OccInfoEnv = IdEnv OccInfo -- A finite map from ids to their usage + -- INVARIANT: never IAmDead + -- (Deadness is signalled by not being in the map at all) + +type ZappedSet = OccInfoEnv -- Values are ignored + +data UsageDetails + = UD { ud_env :: !OccInfoEnv + , ud_z_many :: ZappedSet -- apply 'markMany' to these + , ud_z_in_lam :: ZappedSet -- apply 'markInsideLam' to these + , ud_z_no_tail :: ZappedSet } -- apply 'markNonTailCalled' to these + -- INVARIANT: All three zapped sets are subsets of the OccInfoEnv + +instance Outputable UsageDetails where + ppr ud = ppr (ud_env (flattenUsageDetails ud)) + +------------------- +-- UsageDetails API + +andUDs, orUDs + :: UsageDetails -> UsageDetails -> UsageDetails +andUDs = combineUsageDetailsWith addOccInfo +orUDs = combineUsageDetailsWith orOccInfo + +andUDsList :: [UsageDetails] -> UsageDetails +andUDsList = foldl' andUDs emptyDetails + +mkOneOcc :: OccEnv -> Id -> InterestingCxt -> JoinArity -> UsageDetails +mkOneOcc env id int_cxt arity + | isLocalId id + = singleton $ OneOcc { occ_in_lam = NotInsideLam + , occ_one_br = InOneBranch + , occ_int_cxt = int_cxt + , occ_tail = AlwaysTailCalled arity } + | id `elemVarSet` occ_gbl_scrut env + = singleton noOccInfo + + | otherwise + = emptyDetails + where + singleton info = emptyDetails { ud_env = unitVarEnv id info } + +addOneOcc :: UsageDetails -> Id -> OccInfo -> UsageDetails +addOneOcc ud id info + = ud { ud_env = extendVarEnv_C plus_zapped (ud_env ud) id info } + `alterZappedSets` (`delVarEnv` id) + where + plus_zapped old new = doZapping ud id old `addOccInfo` new + +addManyOccsSet :: UsageDetails -> VarSet -> UsageDetails +addManyOccsSet usage id_set = nonDetFoldUniqSet addManyOccs usage id_set + -- It's OK to use nonDetFoldUFM here because addManyOccs commutes + +-- Add several occurrences, assumed not to be tail calls +addManyOccs :: Var -> UsageDetails -> UsageDetails +addManyOccs v u | isId v = addOneOcc u v noOccInfo + | otherwise = u + -- Give a 
non-committal binder info (i.e. noOccInfo) because
+ -- a) Many copies of the specialised thing can appear
+ -- b) We don't want to substitute a BIG expression inside a RULE
+ -- even if that's the only occurrence of the thing
+ -- (Same goes for INLINE.)
+
+delDetails :: UsageDetails -> Id -> UsageDetails
+delDetails ud bndr
+ = ud `alterUsageDetails` (`delVarEnv` bndr)
+
+delDetailsList :: UsageDetails -> [Id] -> UsageDetails
+delDetailsList ud bndrs
+ = ud `alterUsageDetails` (`delVarEnvList` bndrs)
+
+emptyDetails :: UsageDetails
+emptyDetails = UD { ud_env = emptyVarEnv
+ , ud_z_many = emptyVarEnv
+ , ud_z_in_lam = emptyVarEnv
+ , ud_z_no_tail = emptyVarEnv }
+
+isEmptyDetails :: UsageDetails -> Bool
+isEmptyDetails = isEmptyVarEnv . ud_env
+
+markAllMany, markAllInsideLam, markAllNonTailCalled, zapDetails
+ :: UsageDetails -> UsageDetails
+markAllMany ud = ud { ud_z_many = ud_env ud }
+markAllInsideLam ud = ud { ud_z_in_lam = ud_env ud }
+markAllNonTailCalled ud = ud { ud_z_no_tail = ud_env ud }
+
+zapDetails = markAllMany . markAllNonTailCalled -- effectively sets to noOccInfo
+
+lookupDetails :: UsageDetails -> Id -> OccInfo
+lookupDetails ud id
+ | isCoVar id -- We do not currently gather occurrence info (from types)
+ = noOccInfo -- for CoVars, so we must conservatively mark them as used
+ -- See Note [Do not mark CoVars as dead]
+ | otherwise
+ = case lookupVarEnv (ud_env ud) id of
+ Just occ -> doZapping ud id occ
+ Nothing -> IAmDead
+
+usedIn :: Id -> UsageDetails -> Bool
+v `usedIn` ud = isExportedId v || v `elemVarEnv` ud_env ud
+
+udFreeVars :: VarSet -> UsageDetails -> VarSet
+-- Find the subset of bndrs that are mentioned in uds
+udFreeVars bndrs ud = restrictUniqSetToUFM bndrs (ud_env ud)
+
+{- Note [Do not mark CoVars as dead]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+It's obviously wrong to mark CoVars as dead if they are used.
+Currently we don't traverse types to gather usage info for CoVars,
+so we had better treat them as having noOccInfo.
+
+This showed up in #15696, where we had something like
+ case eq_sel d of co -> ...(typeError @(...co...) "urk")...
+
+Then 'd' was substituted by a dictionary, so the expression
+simplified to
+ case (Coercion <blah>) of co -> ...(typeError @(...co...) "urk")...
+
+But then the "drop the case altogether" equation of rebuildCase
+thought that 'co' was dead, and discarded the entire case. Urk!
+
+I have no idea how we managed to avoid this pitfall for so long!
+-}
+
+-------------------
+-- Auxiliary functions for UsageDetails implementation
+
+combineUsageDetailsWith :: (OccInfo -> OccInfo -> OccInfo)
+ -> UsageDetails -> UsageDetails -> UsageDetails
+combineUsageDetailsWith plus_occ_info ud1 ud2
+ | isEmptyDetails ud1 = ud2
+ | isEmptyDetails ud2 = ud1
+ | otherwise
+ = UD { ud_env = plusVarEnv_C plus_occ_info (ud_env ud1) (ud_env ud2)
+ , ud_z_many = plusVarEnv (ud_z_many ud1) (ud_z_many ud2)
+ , ud_z_in_lam = plusVarEnv (ud_z_in_lam ud1) (ud_z_in_lam ud2)
+ , ud_z_no_tail = plusVarEnv (ud_z_no_tail ud1) (ud_z_no_tail ud2) }
+
+doZapping :: UsageDetails -> Var -> OccInfo -> OccInfo
+doZapping ud var occ
+ = doZappingByUnique ud (varUnique var) occ
+
+doZappingByUnique :: UsageDetails -> Unique -> OccInfo -> OccInfo
+doZappingByUnique ud uniq
+ = (if | in_subset ud_z_many -> markMany
+ | in_subset ud_z_in_lam -> markInsideLam
+ | otherwise -> id) .
+ (if | in_subset ud_z_no_tail -> markNonTailCalled + | otherwise -> id) + where + in_subset field = uniq `elemVarEnvByKey` field ud + +alterZappedSets :: UsageDetails -> (ZappedSet -> ZappedSet) -> UsageDetails +alterZappedSets ud f + = ud { ud_z_many = f (ud_z_many ud) + , ud_z_in_lam = f (ud_z_in_lam ud) + , ud_z_no_tail = f (ud_z_no_tail ud) } + +alterUsageDetails :: UsageDetails -> (OccInfoEnv -> OccInfoEnv) -> UsageDetails +alterUsageDetails ud f + = ud { ud_env = f (ud_env ud) } + `alterZappedSets` f + +flattenUsageDetails :: UsageDetails -> UsageDetails +flattenUsageDetails ud + = ud { ud_env = mapUFM_Directly (doZappingByUnique ud) (ud_env ud) } + `alterZappedSets` const emptyVarEnv + +------------------- +-- See Note [Adjusting right-hand sides] +adjustRhsUsage :: Maybe JoinArity -> RecFlag + -> [CoreBndr] -- Outer lambdas, AFTER occ anal + -> UsageDetails -> UsageDetails +adjustRhsUsage mb_join_arity rec_flag bndrs usage + = maybe_mark_lam (maybe_drop_tails usage) + where + maybe_mark_lam ud | one_shot = ud + | otherwise = markAllInsideLam ud + maybe_drop_tails ud | exact_join = ud + | otherwise = markAllNonTailCalled ud + + one_shot = case mb_join_arity of + Just join_arity + | isRec rec_flag -> False + | otherwise -> all isOneShotBndr (drop join_arity bndrs) + Nothing -> all isOneShotBndr bndrs + + exact_join = case mb_join_arity of + Just join_arity -> bndrs `lengthIs` join_arity + _ -> False + +type IdWithOccInfo = Id + +tagLamBinders :: UsageDetails -- Of scope + -> [Id] -- Binders + -> (UsageDetails, -- Details with binders removed + [IdWithOccInfo]) -- Tagged binders +tagLamBinders usage binders + = usage' `seq` (usage', bndrs') + where + (usage', bndrs') = mapAccumR tagLamBinder usage binders + +tagLamBinder :: UsageDetails -- Of scope + -> Id -- Binder + -> (UsageDetails, -- Details with binder removed + IdWithOccInfo) -- Tagged binders +-- Used for lambda and case binders +-- It copes with the fact that lambda bindings can have a +-- stable 
unfolding, used for join points +tagLamBinder usage bndr + = (usage2, bndr') + where + occ = lookupDetails usage bndr + bndr' = setBinderOcc (markNonTailCalled occ) bndr + -- Don't try to make an argument into a join point + usage1 = usage `delDetails` bndr + usage2 | isId bndr = addManyOccsSet usage1 (idUnfoldingVars bndr) + -- This is effectively the RHS of a + -- non-join-point binding, so it's okay to use + -- addManyOccsSet, which assumes no tail calls + | otherwise = usage1 + +tagNonRecBinder :: TopLevelFlag -- At top level? + -> UsageDetails -- Of scope + -> CoreBndr -- Binder + -> (UsageDetails, -- Details with binder removed + IdWithOccInfo) -- Tagged binder + +tagNonRecBinder lvl usage binder + = let + occ = lookupDetails usage binder + will_be_join = decideJoinPointHood lvl usage [binder] + occ' | will_be_join = -- must already be marked AlwaysTailCalled + ASSERT(isAlwaysTailCalled occ) occ + | otherwise = markNonTailCalled occ + binder' = setBinderOcc occ' binder + usage' = usage `delDetails` binder + in + usage' `seq` (usage', binder') + +tagRecBinders :: TopLevelFlag -- At top level? + -> UsageDetails -- Of body of let ONLY + -> [(CoreBndr, -- Binder + UsageDetails, -- RHS usage details + [CoreBndr])] -- Lambdas in new RHS + -> (UsageDetails, -- Adjusted details for whole scope, + -- with binders removed + [IdWithOccInfo]) -- Tagged binders +-- Substantially more complicated than non-recursive case. Need to adjust RHS +-- details *before* tagging binders (because the tags depend on the RHSes). +tagRecBinders lvl body_uds triples + = let + (bndrs, rhs_udss, _) = unzip3 triples + + -- 1. Determine join-point-hood of whole group, as determined by + -- the *unadjusted* usage details + unadj_uds = foldr andUDs body_uds rhs_udss + will_be_joins = decideJoinPointHood lvl unadj_uds bndrs + + -- 2. 
Adjust usage details of each RHS, taking into account the + -- join-point-hood decision + rhs_udss' = map adjust triples + adjust (bndr, rhs_uds, rhs_bndrs) + = adjustRhsUsage mb_join_arity Recursive rhs_bndrs rhs_uds + where + -- Can't use willBeJoinId_maybe here because we haven't tagged the + -- binder yet (the tag depends on these adjustments!) + mb_join_arity + | will_be_joins + , let occ = lookupDetails unadj_uds bndr + , AlwaysTailCalled arity <- tailCallInfo occ + = Just arity + | otherwise + = ASSERT(not will_be_joins) -- Should be AlwaysTailCalled if + Nothing -- we are making join points! + + -- 3. Compute final usage details from adjusted RHS details + adj_uds = foldr andUDs body_uds rhs_udss' + + -- 4. Tag each binder with its adjusted details + bndrs' = [ setBinderOcc (lookupDetails adj_uds bndr) bndr + | bndr <- bndrs ] + + -- 5. Drop the binders from the adjusted details and return + usage' = adj_uds `delDetailsList` bndrs + in + (usage', bndrs') + +setBinderOcc :: OccInfo -> CoreBndr -> CoreBndr +setBinderOcc occ_info bndr + | isTyVar bndr = bndr + | isExportedId bndr = if isManyOccs (idOccInfo bndr) + then bndr + else setIdOccInfo bndr noOccInfo + -- Don't use local usage info for visible-elsewhere things + -- BUT *do* erase any IAmALoopBreaker annotation, because we're + -- about to re-generate it and it shouldn't be "sticky" + + | otherwise = setIdOccInfo bndr occ_info + +-- | Decide whether some bindings should be made into join points or not. +-- Returns `False` if they can't be join points. Note that it's an +-- all-or-nothing decision, as if multiple binders are given, they're +-- assumed to be mutually recursive. +-- +-- It must, however, be a final decision. If we say "True" for 'f', +-- and then subsequently decide /not/ make 'f' into a join point, then +-- the decision about another binding 'g' might be invalidated if (say) +-- 'f' tail-calls 'g'. +-- +-- See Note [Invariants on join points] in GHC.Core. 
+decideJoinPointHood :: TopLevelFlag -> UsageDetails + -> [CoreBndr] + -> Bool +decideJoinPointHood TopLevel _ _ + = False +decideJoinPointHood NotTopLevel usage bndrs + | isJoinId (head bndrs) + = WARN(not all_ok, text "OccurAnal failed to rediscover join point(s):" <+> + ppr bndrs) + all_ok + | otherwise + = all_ok + where + -- See Note [Invariants on join points]; invariants cited by number below. + -- Invariant 2 is always satisfiable by the simplifier by eta expansion. + all_ok = -- Invariant 3: Either all are join points or none are + all ok bndrs + + ok bndr + | -- Invariant 1: Only tail calls, all same join arity + AlwaysTailCalled arity <- tailCallInfo (lookupDetails usage bndr) + + , -- Invariant 1 as applied to LHSes of rules + all (ok_rule arity) (idCoreRules bndr) + + -- Invariant 2a: stable unfoldings + -- See Note [Join points and INLINE pragmas] + , ok_unfolding arity (realIdUnfolding bndr) + + -- Invariant 4: Satisfies polymorphism rule + , isValidJoinPointType arity (idType bndr) + = True + + | otherwise + = False + + ok_rule _ BuiltinRule{} = False -- only possible with plugin shenanigans + ok_rule join_arity (Rule { ru_args = args }) + = args `lengthIs` join_arity + -- Invariant 1 as applied to LHSes of rules + + -- ok_unfolding returns False if we should /not/ convert a non-join-id + -- into a join-id, even though it is AlwaysTailCalled + ok_unfolding join_arity (CoreUnfolding { uf_src = src, uf_tmpl = rhs }) + = not (isStableSource src && join_arity > joinRhsArity rhs) + ok_unfolding _ (DFunUnfolding {}) + = False + ok_unfolding _ _ + = True + +willBeJoinId_maybe :: CoreBndr -> Maybe JoinArity +willBeJoinId_maybe bndr + = case tailCallInfo (idOccInfo bndr) of + AlwaysTailCalled arity -> Just arity + _ -> isJoinId_maybe bndr + + +{- Note [Join points and INLINE pragmas] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Consider + f x = let g = \x. 
not -- Arity 1 + {-# INLINE g #-} + in case x of + A -> g True True + B -> g True False + C -> blah2 + +Here 'g' is always tail-called applied to 2 args, but the stable +unfolding captured by the INLINE pragma has arity 1. If we try to +convert g to be a join point, its unfolding will still have arity 1 +(since it is stable, and we don't meddle with stable unfoldings), and +Lint will complain (see Note [Invariants on join points], (2a), in +GHC.Core). #13413. + +Moreover, since g is going to be inlined anyway, there is no benefit +from making it a join point. + +If it is recursive, and uselessly marked INLINE, this will stop us +making it a join point, which is annoying. But occasionally +(notably in class methods; see Note [Instances and loop breakers] in +TcInstDcls) we mark recursive things as INLINE but the recursion +unravels; so ignoring INLINE pragmas on recursive things isn't good +either. + +See Invariant 2a of Note [Invariants on join points] in GHC.Core. + + +************************************************************************ +* * +\subsection{Operations over OccInfo} +* * +************************************************************************ +-} + +markMany, markInsideLam, markNonTailCalled :: OccInfo -> OccInfo + +markMany IAmDead = IAmDead +markMany occ = ManyOccs { occ_tail = occ_tail occ } + +markInsideLam occ@(OneOcc {}) = occ { occ_in_lam = IsInsideLam } +markInsideLam occ = occ + +markNonTailCalled IAmDead = IAmDead +markNonTailCalled occ = occ { occ_tail = NoTailCallInfo } + +addOccInfo, orOccInfo :: OccInfo -> OccInfo -> OccInfo + +addOccInfo a1 a2 = ASSERT( not (isDeadOcc a1 || isDeadOcc a2) ) + ManyOccs { occ_tail = tailCallInfo a1 `andTailCallInfo` + tailCallInfo a2 } + -- Both branches are at least One + -- (Argument is never IAmDead) + +-- (orOccInfo orig new) is used +-- when combining occurrence info from branches of a case + +orOccInfo (OneOcc { occ_in_lam = in_lam1, occ_int_cxt = int_cxt1 + , occ_tail = tail1 }) + (OneOcc {
occ_in_lam = in_lam2, occ_int_cxt = int_cxt2 + , occ_tail = tail2 }) + = OneOcc { occ_one_br = MultipleBranches -- because it occurs in both branches + , occ_in_lam = in_lam1 `mappend` in_lam2 + , occ_int_cxt = int_cxt1 `mappend` int_cxt2 + , occ_tail = tail1 `andTailCallInfo` tail2 } + +orOccInfo a1 a2 = ASSERT( not (isDeadOcc a1 || isDeadOcc a2) ) + ManyOccs { occ_tail = tailCallInfo a1 `andTailCallInfo` + tailCallInfo a2 } + +andTailCallInfo :: TailCallInfo -> TailCallInfo -> TailCallInfo +andTailCallInfo info@(AlwaysTailCalled arity1) (AlwaysTailCalled arity2) + | arity1 == arity2 = info +andTailCallInfo _ _ = NoTailCallInfo diff --git a/compiler/GHC/Core/Op/SetLevels.hs b/compiler/GHC/Core/Op/SetLevels.hs new file mode 100644 index 0000000000..a3b1fd75b3 --- /dev/null +++ b/compiler/GHC/Core/Op/SetLevels.hs @@ -0,0 +1,1771 @@ +{- +(c) The GRASP/AQUA Project, Glasgow University, 1992-1998 + +\section{GHC.Core.Op.SetLevels} + + *************************** + Overview + *************************** + +1. We attach binding levels to Core bindings, in preparation for floating + outwards (@FloatOut@). + +2. We also let-ify many expressions (notably case scrutinees), so they + will have a fighting chance of being floated sensibly. + +3. Note [Need for cloning during float-out] + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + We clone the binders of any floatable let-binding, so that when it is + floated out it will be unique. Example + (let x=2 in x) + (let x=3 in x) + we must clone before floating so we get + let x1=2 in + let x2=3 in + x1+x2 + + NOTE: this can't be done using the uniqAway idea, because the variable + must be unique in the whole program, not just its current scope, + because two variables in different scopes may float out to the + same top level place + + NOTE: Very tiresomely, we must apply this substitution to + the rules stored inside a variable too.
+ + We do *not* clone top-level bindings, because some of them must not change, + but we *do* clone bindings that are heading for the top level + +4. Note [Binder-swap during float-out] + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + In the expression + case x of wild { p -> ...wild... } + we substitute x for wild in the RHS of the case alternatives: + case x of wild { p -> ...x... } + This means that a sub-expression involving x is not "trapped" inside the RHS. + And it's not inconvenient because we already have a substitution. + + Note that this is EXACTLY BACKWARDS from what the simplifier does. + The simplifier tries to get rid of occurrences of x, in favour of wild, + in the hope that there will only be one remaining occurrence of x, namely + the scrutinee of the case, and we can inline it. +-} + +{-# LANGUAGE CPP, MultiWayIf #-} + +{-# OPTIONS_GHC -Wno-incomplete-uni-patterns #-} +module GHC.Core.Op.SetLevels ( + setLevels, + + Level(..), LevelType(..), tOP_LEVEL, isJoinCeilLvl, asJoinCeilLvl, + LevelledBind, LevelledExpr, LevelledBndr, + FloatSpec(..), floatSpecLevel, + + incMinorLvl, ltMajLvl, ltLvl, isTopLvl + ) where + +#include "HsVersions.h" + +import GhcPrelude + +import GHC.Core +import GHC.Core.Op.Monad ( FloatOutSwitches(..)
) +import GHC.Core.Utils ( exprType, exprIsHNF + , exprOkForSpeculation + , exprIsTopLevelBindable + , isExprLevPoly + , collectMakeStaticArgs + ) +import GHC.Core.Arity ( exprBotStrictness_maybe ) +import GHC.Core.FVs -- all of it +import GHC.Core.Subst +import GHC.Core.Make ( sortQuantVars ) + +import Id +import IdInfo +import Var +import VarSet +import UniqSet ( nonDetFoldUniqSet ) +import UniqDSet ( getUniqDSet ) +import VarEnv +import Literal ( litIsTrivial ) +import Demand ( StrictSig, Demand, isStrictDmd, splitStrictSig, increaseStrictSigArity ) +import Cpr ( mkCprSig, botCpr ) +import Name ( getOccName, mkSystemVarName ) +import OccName ( occNameString ) +import GHC.Core.Type ( Type, mkLamTypes, splitTyConApp_maybe, tyCoVarsOfType + , mightBeUnliftedType, closeOverKindsDSet ) +import BasicTypes ( Arity, RecFlag(..), isRec ) +import GHC.Core.DataCon ( dataConOrigResTy ) +import TysWiredIn +import UniqSupply +import Util +import Outputable +import FastString +import UniqDFM +import FV +import Data.Maybe +import MonadUtils ( mapAccumLM ) + +{- +************************************************************************ +* * +\subsection{Level numbers} +* * +************************************************************************ +-} + +type LevelledExpr = TaggedExpr FloatSpec +type LevelledBind = TaggedBind FloatSpec +type LevelledBndr = TaggedBndr FloatSpec + +data Level = Level Int -- Level number of enclosing lambdas + Int -- Number of big-lambda and/or case expressions and/or + -- context boundaries between + -- here and the nearest enclosing lambda + LevelType -- Binder or join ceiling? 
+data LevelType = BndrLvl | JoinCeilLvl deriving (Eq) + +data FloatSpec + = FloatMe Level -- Float to just inside the binding + -- tagged with this level + | StayPut Level -- Stay where it is; binding is + -- tagged with this level + +floatSpecLevel :: FloatSpec -> Level +floatSpecLevel (FloatMe l) = l +floatSpecLevel (StayPut l) = l + +{- +The {\em level number} on a (type-)lambda-bound variable is the +nesting depth of the (type-)lambda which binds it. The outermost lambda +has level 1, so (Level 0 0) means that the variable is bound outside any lambda. + +On an expression, it's the maximum level number of its free +(type-)variables. On a let(rec)-bound variable, it's the level of its +RHS. On a case-bound variable, it's the number of enclosing lambdas. + +Top-level variables: level~0. Those bound on the RHS of a top-level +definition but ``before'' a lambda; e.g., the \tr{x} in (levels shown +as ``subscripts'')... +\begin{verbatim} +a_0 = let b_? = ... in + x_1 = ... b ... in ... +\end{verbatim} + +The main function @lvlExpr@ carries a ``context level'' (@le_ctxt_lvl@). +That's meant to be the level number of the enclosing binder in the +final (floated) program. If the level number of a sub-expression is +less than that of the context, then it might be worth let-binding the +sub-expression so that it will indeed float. + +If you can float to level @Level 0 0@, it's worth doing so because then your +allocation becomes static instead of dynamic. We always start with +context @Level 0 0@. + + +Note [FloatOut inside INLINE] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +@InlineCtxt@ is very similar to @Level 0 0@, but is used for one purpose: +to say "don't float anything out of here". That's exactly what we +want for the body of an INLINE, where we don't want to float anything +out at all. See notes with lvlMFE below. + +But, check this out: + +-- At one time I tried the effect of not floating anything out of an InlineMe, +-- but it sometimes works badly.
For example, consider PrelArr.done. It +-- has the form __inline (\d. e) +-- where e doesn't mention d. If we float this to +-- __inline (let x = e in \d. x) +-- things are bad. The inliner doesn't even inline it because it doesn't look +-- like a head-normal form. So it seems a lesser evil to let things float. +-- In GHC.Core.Op.SetLevels we do set the context to (Level 0 0) when we get to an InlineMe +-- which discourages floating out. + +So the conclusion is: don't do any floating at all inside an InlineMe. +(In the above example, don't float the {x=e} out of the \d.) + +One particular case is that of workers: we don't want to float the +call to the worker outside the wrapper, otherwise the worker might get +inlined into the floated expression, and an importing module won't see +the worker at all. + +Note [Join ceiling] +~~~~~~~~~~~~~~~~~~~ +Join points can't float very far; too far, and they can't remain join points. +So, suppose we have: + + f x = (joinrec j y = ... x ... in jump j x) + 1 + +One may be tempted to float j out to the top of f's RHS, but then the jump +would not be a tail call. Thus we keep track of a level called the *join +ceiling* past which join points are not allowed to float. + +The troublesome thing is that, unlike most levels to which something might +float, there is not necessarily an identifier to which the join ceiling is +attached. Fortunately, if something is to be floated to a join ceiling, it must +be dropped at the *nearest* join ceiling. Thus each level is marked as to +whether it is a join ceiling, so that FloatOut can tell which binders are being +floated to the nearest join ceiling and which to a particular binder (or set of +binders).
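To make the join-ceiling shape concrete, here is a rough source-level analogue (hypothetical code; at the Haskell level 'j' is just a local function, but it plays the role of the join point):

```haskell
-- f's RHS is (join-bind ... in jump j x) + 1: every call of j is a
-- tail call inside the parenthesised expression, but the surrounding
-- (+ 1) sits past the join ceiling. Floating j above it would leave
-- the call in a non-tail position, so j could no longer be a join point.
f :: Int -> Int
f x = (let j y | y > 10    = y
               | otherwise = j (y + 2)
       in j x) + 1
```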
+-} + +instance Outputable FloatSpec where + ppr (FloatMe l) = char 'F' <> ppr l + ppr (StayPut l) = ppr l + +tOP_LEVEL :: Level +tOP_LEVEL = Level 0 0 BndrLvl + +incMajorLvl :: Level -> Level +incMajorLvl (Level major _ _) = Level (major + 1) 0 BndrLvl + +incMinorLvl :: Level -> Level +incMinorLvl (Level major minor _) = Level major (minor+1) BndrLvl + +asJoinCeilLvl :: Level -> Level +asJoinCeilLvl (Level major minor _) = Level major minor JoinCeilLvl + +maxLvl :: Level -> Level -> Level +maxLvl l1@(Level maj1 min1 _) l2@(Level maj2 min2 _) + | (maj1 > maj2) || (maj1 == maj2 && min1 > min2) = l1 + | otherwise = l2 + +ltLvl :: Level -> Level -> Bool +ltLvl (Level maj1 min1 _) (Level maj2 min2 _) + = (maj1 < maj2) || (maj1 == maj2 && min1 < min2) + +ltMajLvl :: Level -> Level -> Bool + -- Tells if one level belongs to a difft *lambda* level to another +ltMajLvl (Level maj1 _ _) (Level maj2 _ _) = maj1 < maj2 + +isTopLvl :: Level -> Bool +isTopLvl (Level 0 0 _) = True +isTopLvl _ = False + +isJoinCeilLvl :: Level -> Bool +isJoinCeilLvl (Level _ _ t) = t == JoinCeilLvl + +instance Outputable Level where + ppr (Level maj min typ) + = hcat [ char '<', int maj, char ',', int min, char '>' + , ppWhen (typ == JoinCeilLvl) (char 'C') ] + +instance Eq Level where + (Level maj1 min1 _) == (Level maj2 min2 _) = maj1 == maj2 && min1 == min2 + +{- +************************************************************************ +* * +\subsection{Main level-setting code} +* * +************************************************************************ +-} + +setLevels :: FloatOutSwitches + -> CoreProgram + -> UniqSupply + -> [LevelledBind] + +setLevels float_lams binds us + = initLvl us (do_them init_env binds) + where + init_env = initialEnv float_lams + + do_them :: LevelEnv -> [CoreBind] -> LvlM [LevelledBind] + do_them _ [] = return [] + do_them env (b:bs) + = do { (lvld_bind, env') <- lvlTopBind env b + ; lvld_binds <- do_them env' bs + ; return (lvld_bind : lvld_binds) } + 
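The partial order on levels used by maxLvl/ltLvl above can be sketched in isolation (names re-declared with primes so the fragment stands alone; the LevelType field is omitted since, as in the Eq Level instance, it does not participate in the comparison):

```haskell
-- Levels compare lexicographically: major (lambda) level first,
-- then minor level; the join-ceiling tag is ignored.
data Lvl = Lvl Int Int deriving (Eq, Show)

ltLvl' :: Lvl -> Lvl -> Bool
ltLvl' (Lvl maj1 min1) (Lvl maj2 min2)
  = maj1 < maj2 || (maj1 == maj2 && min1 < min2)

maxLvl' :: Lvl -> Lvl -> Lvl
maxLvl' l1 l2 | ltLvl' l1 l2 = l2
              | otherwise    = l1
```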
+lvlTopBind :: LevelEnv -> Bind Id -> LvlM (LevelledBind, LevelEnv) +lvlTopBind env (NonRec bndr rhs) + = do { rhs' <- lvl_top env NonRecursive bndr rhs + ; let (env', [bndr']) = substAndLvlBndrs NonRecursive env tOP_LEVEL [bndr] + ; return (NonRec bndr' rhs', env') } + +lvlTopBind env (Rec pairs) + = do { let (env', bndrs') = substAndLvlBndrs Recursive env tOP_LEVEL + (map fst pairs) + ; rhss' <- mapM (\(b,r) -> lvl_top env' Recursive b r) pairs + ; return (Rec (bndrs' `zip` rhss'), env') } + +lvl_top :: LevelEnv -> RecFlag -> Id -> CoreExpr -> LvlM LevelledExpr +lvl_top env is_rec bndr rhs + = lvlRhs env is_rec + (isBottomingId bndr) + Nothing -- Not a join point + (freeVars rhs) + +{- +************************************************************************ +* * +\subsection{Setting expression levels} +* * +************************************************************************ + +Note [Floating over-saturated applications] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +If we see (f x y), and (f x) is a redex (ie f's arity is 1), +we call (f x) an "over-saturated application". + +Should we float out an over-sat app, if it can escape a value lambda? +It is sometimes very beneficial (-7% runtime -4% alloc over nofib -O2). +But we don't want to do it for class selectors, because the work saved +is minimal, and the extra local thunks allocated cost money. + +Arguably we could float even class-op applications if they were going to +top level -- but then they must be applied to a constant dictionary and +will almost certainly be optimised away anyway. +-} + +lvlExpr :: LevelEnv -- Context + -> CoreExprWithFVs -- Input expression + -> LvlM LevelledExpr -- Result expression + +{- +The @le_ctxt_lvl@ is, roughly, the level of the innermost enclosing +binder. Here's an example + + v = \x -> ...\y -> let r = case (..x..) of + ..x.. + in .. + +When looking at the rhs of @r@, @le_ctxt_lvl@ will be 1 because that's +the level of @r@, even though it's inside a level-2 @\y@.
It's +important that @le_ctxt_lvl@ is 1 and not 2 in @r@'s rhs, because we +don't want @lvlExpr@ to turn the scrutinee of the @case@ into an MFE +--- because it isn't a *maximal* free expression. + +If there were another lambda in @r@'s rhs, it would get level-2 as well. +-} + +lvlExpr env (_, AnnType ty) = return (Type (GHC.Core.Subst.substTy (le_subst env) ty)) +lvlExpr env (_, AnnCoercion co) = return (Coercion (substCo (le_subst env) co)) +lvlExpr env (_, AnnVar v) = return (lookupVar env v) +lvlExpr _ (_, AnnLit lit) = return (Lit lit) + +lvlExpr env (_, AnnCast expr (_, co)) = do + expr' <- lvlNonTailExpr env expr + return (Cast expr' (substCo (le_subst env) co)) + +lvlExpr env (_, AnnTick tickish expr) = do + expr' <- lvlNonTailExpr env expr + let tickish' = substTickish (le_subst env) tickish + return (Tick tickish' expr') + +lvlExpr env expr@(_, AnnApp _ _) = lvlApp env expr (collectAnnArgs expr) + +-- We don't split adjacent lambdas. That is, given +-- \x y -> (x+1,y) +-- we don't float to give +-- \x -> let v = x+1 in \y -> (v,y) +-- Why not? Because partial applications are fairly rare, and splitting +-- lambdas makes them more expensive. + +lvlExpr env expr@(_, AnnLam {}) + = do { new_body <- lvlNonTailMFE new_env True body + ; return (mkLams new_bndrs new_body) } + where + (bndrs, body) = collectAnnBndrs expr + (env1, bndrs1) = substBndrsSL NonRecursive env bndrs + (new_env, new_bndrs) = lvlLamBndrs env1 (le_ctxt_lvl env) bndrs1 + -- At one time we called a special version of collectBinders, + -- which ignored coercions, because we don't want to split + -- a lambda like this (\x -> coerce t (\s -> ...)) + -- This used to happen quite a bit in state-transformer programs, + -- but not nearly so much now non-recursive newtypes are transparent. + -- [See GHC.Core.Op.SetLevels rev 1.50 for a version with this approach.] 
+ +lvlExpr env (_, AnnLet bind body) + = do { (bind', new_env) <- lvlBind env bind + ; body' <- lvlExpr new_env body + -- No point in going via lvlMFE here. If the binding is alive + -- (mentioned in body), and the whole let-expression doesn't + -- float, then neither will the body + ; return (Let bind' body') } + +lvlExpr env (_, AnnCase scrut case_bndr ty alts) + = do { scrut' <- lvlNonTailMFE env True scrut + ; lvlCase env (freeVarsOf scrut) scrut' case_bndr ty alts } + +lvlNonTailExpr :: LevelEnv -- Context + -> CoreExprWithFVs -- Input expression + -> LvlM LevelledExpr -- Result expression +lvlNonTailExpr env expr + = lvlExpr (placeJoinCeiling env) expr + +------------------------------------------- +lvlApp :: LevelEnv + -> CoreExprWithFVs + -> (CoreExprWithFVs, [CoreExprWithFVs]) -- Input application + -> LvlM LevelledExpr -- Result expression +lvlApp env orig_expr ((_,AnnVar fn), args) + | floatOverSat env -- See Note [Floating over-saturated applications] + , arity > 0 + , arity < n_val_args + , Nothing <- isClassOpId_maybe fn + = do { rargs' <- mapM (lvlNonTailMFE env False) rargs + ; lapp' <- lvlNonTailMFE env False lapp + ; return (foldl' App lapp' rargs') } + + | otherwise + = do { (_, args') <- mapAccumLM lvl_arg stricts args + -- Take account of argument strictness; see + -- Note [Floating to the top] + ; return (foldl' App (lookupVar env fn) args') } + where + n_val_args = count (isValArg . deAnnotate) args + arity = idArity fn + + stricts :: [Demand] -- True for strict /value/ arguments + stricts = case splitStrictSig (idStrictness fn) of + (arg_ds, _) | arg_ds `lengthExceeds` n_val_args + -> [] + | otherwise + -> arg_ds + + -- Separate out the PAP that we are floating from the extra + -- arguments, by traversing the spine until we have collected + -- (n_val_args - arity) value arguments. 
+ (lapp, rargs) = left (n_val_args - arity) orig_expr [] + + left 0 e rargs = (e, rargs) + left n (_, AnnApp f a) rargs + | isValArg (deAnnotate a) = left (n-1) f (a:rargs) + | otherwise = left n f (a:rargs) + left _ _ _ = panic "GHC.Core.Op.SetLevels.lvlExpr.left" + + is_val_arg :: CoreExprWithFVs -> Bool + is_val_arg (_, AnnType {}) = False + is_val_arg _ = True + + lvl_arg :: [Demand] -> CoreExprWithFVs -> LvlM ([Demand], LevelledExpr) + lvl_arg strs arg | (str1 : strs') <- strs + , is_val_arg arg + = do { arg' <- lvlMFE env (isStrictDmd str1) arg + ; return (strs', arg') } + | otherwise + = do { arg' <- lvlMFE env False arg + ; return (strs, arg') } + +lvlApp env _ (fun, args) + = -- No PAPs that we can float: just carry on with the + -- arguments and the function. + do { args' <- mapM (lvlNonTailMFE env False) args + ; fun' <- lvlNonTailExpr env fun + ; return (foldl' App fun' args') } + +------------------------------------------- +lvlCase :: LevelEnv -- Level of in-scope names/tyvars + -> DVarSet -- Free vars of input scrutinee + -> LevelledExpr -- Processed scrutinee + -> Id -> Type -- Case binder and result type + -> [CoreAltWithFVs] -- Input alternatives + -> LvlM LevelledExpr -- Result expression +lvlCase env scrut_fvs scrut' case_bndr ty alts + -- See Note [Floating single-alternative cases] + | [(con@(DataAlt {}), bs, body)] <- alts + , exprIsHNF (deTagExpr scrut') -- See Note [Check the output scrutinee for exprIsHNF] + , not (isTopLvl dest_lvl) -- Can't have top-level cases + , not (floatTopLvlOnly env) -- Can float anywhere + = -- Always float the case if possible + -- Unlike lets we don't insist that it escapes a value lambda + do { (env1, (case_bndr' : bs')) <- cloneCaseBndrs env dest_lvl (case_bndr : bs) + ; let rhs_env = extendCaseBndrEnv env1 case_bndr scrut' + ; body' <- lvlMFE rhs_env True body + ; let alt' = (con, map (stayPut dest_lvl) bs', body') + ; return (Case scrut' (TB case_bndr' (FloatMe dest_lvl)) ty' [alt']) } + + | otherwise -- 
Stays put + = do { let (alts_env1, [case_bndr']) = substAndLvlBndrs NonRecursive env incd_lvl [case_bndr] + alts_env = extendCaseBndrEnv alts_env1 case_bndr scrut' + ; alts' <- mapM (lvl_alt alts_env) alts + ; return (Case scrut' case_bndr' ty' alts') } + where + ty' = substTy (le_subst env) ty + + incd_lvl = incMinorLvl (le_ctxt_lvl env) + dest_lvl = maxFvLevel (const True) env scrut_fvs + -- Don't abstract over type variables, hence const True + + lvl_alt alts_env (con, bs, rhs) + = do { rhs' <- lvlMFE new_env True rhs + ; return (con, bs', rhs') } + where + (new_env, bs') = substAndLvlBndrs NonRecursive alts_env incd_lvl bs + +{- Note [Floating single-alternative cases] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Consider this: + data T a = MkT !a + f :: T Int -> blah + f x vs = case x of { MkT y -> + let f vs = ...(case y of I# w -> e)...f.. + in f vs + +Here we can float the (case y ...) out, because y is sure +to be evaluated, to give + f x vs = case x of { MkT y -> + case y of I# w -> + let f vs = ...(e)...f.. + in f vs + +That saves unboxing it every time round the loop. It's important in +some DPH stuff where we really want to avoid that repeated unboxing in +the inner loop. + +Things to note: + + * The test we perform is exprIsHNF, and /not/ exprOkForSpeculation. + + - exprIsHNF catches the key case of an evaluated variable + + - exprOkForSpeculation is /false/ of an evaluated variable; + See Note [exprOkForSpeculation and evaluated variables] in GHC.Core.Utils + So we'd actually miss the key case! + + - Nothing is gained from the extra generality of exprOkForSpeculation + since we only consider floating a case whose single alternative + is a DataAlt K a b -> rhs + + * We can't float a case to top level + + * It's worth doing this float even if we don't float + the case outside a value lambda. Example + case x of { + MkT y -> (case y of I# w2 -> ..., case y of I# w2 -> ...) + If we floated the cases out we could eliminate one of them.
+ + * We only do this with a single-alternative case + + +Note [Setting levels when floating single-alternative cases] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Handling level-setting when floating a single-alternative case binding +is a bit subtle, as evidenced by #16978. In particular, we must keep +in mind that we are merely moving the case and its binders, not the +body. For example, suppose 'a' is known to be evaluated and we have + + \z -> case a of + (x,_) -> <body involving x and z> + +After floating we may have: + + case a of + (x,_) -> \z -> <body involving x and z> + {- some expression involving x and z -} + +When analysing <body involving...> we want to use the /ambient/ level, +and /not/ the destination level of the 'case a of (x,_) ->' binding. + +#16978 was caused by us setting the context level to the destination +level of `x` when analysing <body>. This led us to conclude that we +needed to quantify over some of its free variables (e.g. z), resulting +in shadowing and very confusing Core Lint failures. + + +Note [Check the output scrutinee for exprIsHNF] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Consider this: + case x of y { + A -> ....(case y of alts).... + } + +Because of the binder-swap, the inner case will get substituted to +(case x of ..). So when testing whether the scrutinee is in HNF we +must be careful to test the *result* scrutinee ('x' in this case), not +the *input* one 'y'. The latter *is* in HNF here (because y is +evaluated), but the former is not -- and indeed we can't float the +inner case out, at least not unless x is also evaluated at its binding +site. See #5453. + +That's why we apply exprIsHNF to scrut' and not to scrut. + +See Note [Floating single-alternative cases] for why +we use exprIsHNF in the first place.
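A source-level caricature of Note [Floating single-alternative cases] (hypothetical code, not compiler output): both functions compute the same result, but the second corresponds to the case having been floated out, so the strict field is scrutinised once rather than on every iteration of the loop.

```haskell
data T = MkT !Int

-- Before the float: the case on x sits inside the loop body.
beforeFloat :: T -> [Int] -> Int
beforeFloat x = go
  where
    go []     = 0
    go (v:vs) = (case x of MkT y -> y * v) + go vs

-- After the float: x is scrutinised once, outside the loop.
afterFloat :: T -> [Int] -> Int
afterFloat x = case x of
  MkT y -> let go []     = 0
               go (v:vs) = y * v + go vs
           in go
```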
+-} + +lvlNonTailMFE :: LevelEnv -- Level of in-scope names/tyvars + -> Bool -- True <=> strict context [body of case + -- or let] + -> CoreExprWithFVs -- input expression + -> LvlM LevelledExpr -- Result expression +lvlNonTailMFE env strict_ctxt ann_expr + = lvlMFE (placeJoinCeiling env) strict_ctxt ann_expr + +lvlMFE :: LevelEnv -- Level of in-scope names/tyvars + -> Bool -- True <=> strict context [body of case or let] + -> CoreExprWithFVs -- input expression + -> LvlM LevelledExpr -- Result expression +-- lvlMFE is just like lvlExpr, except that it might let-bind +-- the expression, so that it can itself be floated. + +lvlMFE env _ (_, AnnType ty) + = return (Type (GHC.Core.Subst.substTy (le_subst env) ty)) + +-- No point in floating out an expression wrapped in a coercion or note +-- If we do we'll transform lvl = e |> co +-- to lvl' = e; lvl = lvl' |> co +-- and then inline lvl. Better just to float out the payload. +lvlMFE env strict_ctxt (_, AnnTick t e) + = do { e' <- lvlMFE env strict_ctxt e + ; let t' = substTickish (le_subst env) t + ; return (Tick t' e') } + +lvlMFE env strict_ctxt (_, AnnCast e (_, co)) + = do { e' <- lvlMFE env strict_ctxt e + ; return (Cast e' (substCo (le_subst env) co)) } + +lvlMFE env strict_ctxt e@(_, AnnCase {}) + | strict_ctxt -- Don't share cases in a strict context + = lvlExpr env e -- See Note [Case MFEs] + +lvlMFE env strict_ctxt ann_expr + | floatTopLvlOnly env && not (isTopLvl dest_lvl) + -- Only floating to the top level is allowed. 
+ || anyDVarSet isJoinId fvs -- If there is a free join, don't float + -- See Note [Free join points] + || isExprLevPoly expr + -- We can't let-bind levity polymorphic expressions + -- See Note [Levity polymorphism invariants] in GHC.Core + || notWorthFloating expr abs_vars + || not float_me + = -- Don't float it out + lvlExpr env ann_expr + + | float_is_new_lam || exprIsTopLevelBindable expr expr_ty + -- No wrapping needed if the type is lifted, or is a literal string + -- or if we are wrapping it in one or more value lambdas + = do { expr1 <- lvlFloatRhs abs_vars dest_lvl rhs_env NonRecursive + (isJust mb_bot_str) + join_arity_maybe + ann_expr + -- Treat the expr just like a right-hand side + ; var <- newLvlVar expr1 join_arity_maybe is_mk_static + ; let var2 = annotateBotStr var float_n_lams mb_bot_str + ; return (Let (NonRec (TB var2 (FloatMe dest_lvl)) expr1) + (mkVarApps (Var var2) abs_vars)) } + + -- OK, so the float has an unlifted type (not top-level bindable) + -- and no new value lambdas (float_is_new_lam is False) + -- Try for the boxing strategy + -- See Note [Floating MFEs of unlifted type] + | escapes_value_lam + , not expr_ok_for_spec -- Boxing/unboxing isn't worth it for cheap expressions + -- See Note [Test cheapness with exprOkForSpeculation] + , Just (tc, _) <- splitTyConApp_maybe expr_ty + , Just dc <- boxingDataCon_maybe tc + , let dc_res_ty = dataConOrigResTy dc -- No free type variables + [bx_bndr, ubx_bndr] = mkTemplateLocals [dc_res_ty, expr_ty] + = do { expr1 <- lvlExpr rhs_env ann_expr + ; let l1r = incMinorLvlFrom rhs_env + float_rhs = mkLams abs_vars_w_lvls $ + Case expr1 (stayPut l1r ubx_bndr) dc_res_ty + [(DEFAULT, [], mkConApp dc [Var ubx_bndr])] + + ; var <- newLvlVar float_rhs Nothing is_mk_static + ; let l1u = incMinorLvlFrom env + use_expr = Case (mkVarApps (Var var) abs_vars) + (stayPut l1u bx_bndr) expr_ty + [(DataAlt dc, [stayPut l1u ubx_bndr], Var ubx_bndr)] + ; return (Let (NonRec (TB var (FloatMe dest_lvl)) float_rhs) + 
use_expr) } + + | otherwise -- e.g. do not float unboxed tuples + = lvlExpr env ann_expr + + where + expr = deAnnotate ann_expr + expr_ty = exprType expr + fvs = freeVarsOf ann_expr + fvs_ty = tyCoVarsOfType expr_ty + is_bot = isBottomThunk mb_bot_str + is_function = isFunction ann_expr + mb_bot_str = exprBotStrictness_maybe expr + -- See Note [Bottoming floats] + -- esp Bottoming floats (2) + expr_ok_for_spec = exprOkForSpeculation expr + dest_lvl = destLevel env fvs fvs_ty is_function is_bot False + abs_vars = abstractVars dest_lvl env fvs + + -- float_is_new_lam: the floated thing will be a new value lambda + -- replacing, say (g (x+4)) by (lvl x). No work is saved, nor is + -- allocation saved. The benefit is to get it to the top level + -- and hence out of the body of this function altogether, making + -- it smaller and more inlinable + float_is_new_lam = float_n_lams > 0 + float_n_lams = count isId abs_vars + + (rhs_env, abs_vars_w_lvls) = lvlLamBndrs env dest_lvl abs_vars + + join_arity_maybe = Nothing + + is_mk_static = isJust (collectMakeStaticArgs expr) + -- Yuk: See Note [Grand plan for static forms] in main/StaticPtrTable + + -- A decision to float entails let-binding this thing, and we only do + -- that if we'll escape a value lambda, or will go to the top level. 
+ float_me = saves_work || saves_alloc || is_mk_static + + -- We can save work if we can move a redex outside a value lambda + -- But if float_is_new_lam is True, then the redex is wrapped in + -- a new lambda, so no work is saved + saves_work = escapes_value_lam && not float_is_new_lam + + escapes_value_lam = dest_lvl `ltMajLvl` (le_ctxt_lvl env) + -- See Note [Escaping a value lambda] + + -- See Note [Floating to the top] + saves_alloc = isTopLvl dest_lvl + && floatConsts env + && (not strict_ctxt || is_bot || exprIsHNF expr) + +isBottomThunk :: Maybe (Arity, s) -> Bool +-- See Note [Bottoming floats] (2) +isBottomThunk (Just (0, _)) = True -- Zero arity +isBottomThunk _ = False + +{- Note [Floating to the top] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We are keen to float something to the top level, even if it does not +escape a value lambda (and hence save work), for two reasons: + + * Doing so makes the function smaller, by floating out + bottoming expressions, or integer or string literals. That in + turn makes it easier to inline, with less duplication. + + * (Minor) Doing so may turn a dynamic allocation (done by machine + instructions) into a static one. Minor because we are assuming + we are not escaping a value lambda. + +But do not do so if: + - the context is strict, and + - the expression is not a HNF, and + - the expression is not bottoming + +Examples: + +* Bottoming + f x = case x of + 0 -> error <big thing> + _ -> x+1 + Here we want to float (error <big thing>) to top level, abstracting + over 'x', so as to make f's RHS smaller. + +* HNF + f = case y of + True -> p:q + False -> blah + We may as well float the (p:q) so it becomes a static data structure. + +* Case scrutinee + f = case g True of .... + Don't float (g True) to top level; then we have the admin of a + top-level thunk to worry about, with zero gain.
+ +* Case alternative + h = case y of + True -> g True + False -> False + Don't float (g True) to the top level + +* Arguments + t = f (g True) + If f is lazy, we /do/ float (g True) because then we can allocate + the thunk statically rather than dynamically. But if f is strict + we don't (see the use of idStrictness in lvlApp). It's not clear + if this test is worth the bother: it's only about CAFs! + +It's controlled by a flag (floatConsts), because doing this too +early loses opportunities for RULES which (needless to say) are +important in some nofib programs (gcd is an example). [SPJ note: +I think this is obsolete; the flag seems always on.] + +Note [Floating join point bindings] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Mostly we only float a join point if it can /stay/ a join point. But +there is one exception: if it can go to the top level (#13286). +Consider + f x = joinrec j y n = <...j y' n'...> + in jump j x 0 + +Here we may just as well produce + j y n = <....j y' n'...> + f x = j x 0 + +and now there is a chance that 'f' will be inlined at its call sites. +It shouldn't make a lot of difference, but these tests + perf/should_run/MethSharing + simplCore/should_compile/spec-inline +and one nofib program, all improve if you do float to top, because +of the resulting inlining of f. So ok, let's do it. + +Note [Free join points] +~~~~~~~~~~~~~~~~~~~~~~~ +We never float a MFE that has a free join-point variable. You might think +this can never occur. After all, consider + join j x = ... + in ....(jump j x).... +How might we ever want to float that (jump j x)? + * If it would escape a value lambda, thus + join j x = ... in (\y. ...(jump j x)... ) + then 'j' isn't a valid join point in the first place. + +But consider + join j x = .... in + joinrec j2 y = ...(jump j x)...(a+b).... + +Since j2 is recursive, it /is/ worth floating (a+b) out of the joinrec. 
+But it is emphatically /not/ good to float the (jump j x) out: + (a) 'j' will stop being a join point + (b) In any case, jumping to 'j' must be an exit of the j2 loop, so no + work would be saved by floating it out of the \y. + +Even if we floated 'j' to top level, (b) would still hold. + +Bottom line: never float a MFE that has a free JoinId. + +Note [Floating MFEs of unlifted type] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Suppose we have + case f x of (r::Int#) -> blah +we'd like to float (f x). But it's not trivial because it has type +Int#, and we don't want to evaluate it too early. But we can instead +float a boxed version + y = case f x of r -> I# r +and replace the original (f x) with + case (case y of I# r -> r) of r -> blah + +Being able to float unboxed expressions is sometimes important; see +#12603. I'm not sure how /often/ it is important, but it's +not hard to achieve. + +We only do it for a fixed collection of types for which we have a +convenient boxing constructor (see boxingDataCon_maybe). In +particular we /don't/ do it for unboxed tuples; it's better to float +the components of the tuple individually. + +I did experiment with a form of boxing that works for any type, namely +wrapping in a function. In our example + + let y = case f x of r -> \v. f x + in case y void of r -> blah + +It works fine, but it's 50% slower (based on some crude benchmarking). +I suppose we could do it for types not covered by boxingDataCon_maybe, +but it's more code and I'll wait to see if anyone wants it. + +Note [Test cheapness with exprOkForSpeculation] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We don't want to float very cheap expressions by boxing and unboxing. +But we use exprOkForSpeculation for the test, not exprIsCheap. +Why? Because it's important /not/ to transform + f (a /# 3) +to + f (case bx of I# a -> a /# 3) +and float bx = I# (a /# 3), because the application of f no +longer obeys the let/app invariant. 
But (a /# 3) is ok-for-spec +due to a special hack that says division operators can't fail +when the denominator is definitely non-zero. And yet that +same expression says False to exprIsCheap. Simplest way to +guarantee the let/app invariant is to use the same function! + +If an expression is okay for speculation, we could also float it out +*without* boxing and unboxing, since evaluating it early is okay. +However, it turned out to usually be better not to float such expressions, +since they tend to be extremely cheap things like (x +# 1#). Even the +cost of spilling the let-bound variable to the stack across a call may +exceed the cost of recomputing such an expression. (And we can't float +unlifted bindings to top-level.) + +We could try to do something smarter here, and float out expensive yet +okay-for-speculation things, such as division by non-zero constants. +But I suspect it's a narrow target. + +Note [Bottoming floats] +~~~~~~~~~~~~~~~~~~~~~~~ +If we see + f = \x. g (error "urk") +we'd like to float the call to error, to get + lvl = error "urk" + f = \x. g lvl + +But, as ever, we need to be careful: + +(1) We want to float a bottoming + expression even if it has free variables: + f = \x. g (let v = h x in error ("urk" ++ v)) + Then we'd like to abstract over 'x' and float the whole arg of g: + lvl = \x. let v = h x in error ("urk" ++ v) + f = \x. g (lvl x) + To achieve this we pass is_bot to destLevel + +(2) We do not do this for lambdas that return + bottom. Instead we treat the /body/ of such a function specially, + via point (1). For example: + f = \x. ....(\y z. if x then error y else error z).... + ===> + lvl = \x z y. if x then error y else error z + f = \x. ...(\y z. lvl x z y)... + (There is no guarantee that we'll choose the perfect argument order.) + +(3) If we have a /binding/ that returns bottom, we want to float it to top + level, even if it has free vars (point (1)), and even if it has lambdas. + Example: + ... let { v = \y. 
error (show x ++ show y) } in ... + We want to abstract over x and float the whole thing to top: + lvl = \x y. error (show x ++ show y) + ...let {v = lvl x} in ... + + Then of course we don't want to separately float the body (error ...) + as /another/ MFE, so we tell lvlFloatRhs not to do that, via the is_bot + argument. + +See Maessen's paper 1999 "Bottom extraction: factoring error handling out +of functional programs" (unpublished I think). + +When we do this, we set the strictness and arity of the new bottoming +Id, *immediately*, for three reasons: + + * To prevent the abstracted thing being immediately inlined back in again + via preInlineUnconditionally. The latter has a test for bottoming Ids + to stop inlining them, so we'd better make sure it *is* a bottoming Id! + + * So that it's properly exposed as such in the interface file, even if + this is all happening after strictness analysis. + + * In case we do CSE with the same expression that *is* marked bottom + lvl = error "urk" + x{str=bot} = error "urk" + Here we don't want to replace 'x' with 'lvl', else we may get Lint + errors, e.g. via a case with empty alternatives: (case x of {}) + Lint complains unless the scrutinee of such a case is clearly bottom. + + This was reported in #11290. But since the whole bottoming-float + thing is based on the cheap-and-cheerful exprIsBottom, I'm not sure + that it'll nail all such cases. + +Note [Bottoming floats: eta expansion] c.f. Note [Bottoming floats] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Tiresomely, though, the simplifier has an invariant that the manifest +arity of the RHS should be the same as the arity; but we can't call +etaExpand during GHC.Core.Op.SetLevels because it works over a decorated form of +CoreExpr. So we do the eta expansion later, in GHC.Core.Op.FloatOut. + +Note [Case MFEs] +~~~~~~~~~~~~~~~~ +We don't float a case expression as an MFE from a strict context. Why not? 
+Because in doing so we share a tiny bit of computation (the switch) but +in exchange we build a thunk, which is bad. This case reduces allocation +by 7% in spectral/puzzle (a rather strange benchmark) and 1.2% in real/fem. +Doesn't change any other allocation at all. + +We will make a separate decision for the scrutinee and alternatives. + +However this can have a knock-on effect for fusion: consider + \v -> foldr k z (case x of I# y -> build ..y..) +Perhaps we can float the entire (case x of ...) out of the \v. Then +fusion will not happen, but we will get more sharing. But if we don't +float the case (as advocated here) we won't float the (build ...y..) +either, so fusion will happen. It can be a big effect, esp in some +artificial benchmarks (e.g. integer, queens), but there is no perfect +answer. + +-} + +annotateBotStr :: Id -> Arity -> Maybe (Arity, StrictSig) -> Id +-- See Note [Bottoming floats] for why we want to add +-- bottoming information right now +-- +-- n_extra are the number of extra value arguments added during floating +annotateBotStr id n_extra mb_str + = case mb_str of + Nothing -> id + Just (arity, sig) -> id `setIdArity` (arity + n_extra) + `setIdStrictness` (increaseStrictSigArity n_extra sig) + `setIdCprInfo` mkCprSig (arity + n_extra) botCpr + +notWorthFloating :: CoreExpr -> [Var] -> Bool +-- Returns True if the expression would be replaced by +-- something bigger than it is now. For example: +-- abs_vars = tvars only: return True if e is trivial, +-- but False for anything bigger +-- abs_vars = [x] (an Id): return True for trivial, or an application (f x) +-- but False for (f x x) +-- +-- One big goal is that floating should be idempotent. Eg if +-- we replace e with (lvl79 x y) and then run FloatOut again, don't want +-- to replace (lvl79 x y) with (lvl83 x y)! 
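+-- A worked instance of the size test (an editor's sketch; 'lvl' names a +-- hypothetical floated binder): with abs_vars = [x], a single Id, 'go' +-- starts with n=1. +-- * (f x) : the trivial argument consumes the budget (n becomes 0) +-- and the Var head yields True: floating would replace +-- (f x) with the equally-big (lvl x), so not worth it +-- * (f x x) : the second trivial argument finds n==0, so 'go' returns +-- False and the expression is worth floating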
+ +notWorthFloating e abs_vars + = go e (count isId abs_vars) + where + go (Var {}) n = n >= 0 + go (Lit lit) n = ASSERT( n==0 ) + litIsTrivial lit -- Note [Floating literals] + go (Tick t e) n = not (tickishIsCode t) && go e n + go (Cast e _) n = go e n + go (App e arg) n + -- See Note [Floating applications to coercions] + | Type {} <- arg = go e n + | n==0 = False + | is_triv arg = go e (n-1) + | otherwise = False + go _ _ = False + + is_triv (Lit {}) = True -- Treat all literals as trivial + is_triv (Var {}) = True -- (ie not worth floating) + is_triv (Cast e _) = is_triv e + is_triv (App e (Type {})) = is_triv e -- See Note [Floating applications to coercions] + is_triv (Tick t e) = not (tickishIsCode t) && is_triv e + is_triv _ = False + +{- +Note [Floating literals] +~~~~~~~~~~~~~~~~~~~~~~~~ +It's important to float Integer literals, so that they get shared, +rather than being allocated every time round the loop. +Hence the litIsTrivial. + +Ditto literal strings (LitString), which we'd like to float to top +level, which is now possible. + +Note [Floating applications to coercions] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We don’t float out variables applied only to type arguments, since the +extra binding would be pointless: type arguments are completely erased. +But *coercion* arguments aren’t (see Note [Coercion tokens] in +CoreToStg.hs and Note [Count coercion arguments in boring contexts] in +CoreUnfold.hs), so we still want to float out variables applied only to +coercion arguments. + +Note [Escaping a value lambda] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We want to float even cheap expressions out of value lambdas, +because that saves allocation. Consider + f = \x. .. (\y.e) ... +Then we'd like to avoid allocating the (\y.e) every time we call f, +(assuming e does not mention x). An example where this really makes a +difference is simplrun009. + +Another reason it's good is that it makes SpecConstr fire on functions. +Consider + f = \x. 
....(f (\y.e)).... +After floating we get + lvl = \y.e + f = \x. ....(f lvl)... +and that is much easier for SpecConstr to generate a robust +specialisation for. + +However, if we are wrapping the thing in extra value lambdas (in +abs_vars), then nothing is saved. E.g. + f = \xyz. ...(e1[y],e2).... +If we float + lvl = \y. (e1[y],e2) + f = \xyz. ...(lvl y)... +we have saved nothing: one pair will still be allocated for each +call of 'f'. Hence the (not float_is_new_lam) in float_me. + + +************************************************************************ +* * +\subsection{Bindings} +* * +************************************************************************ + +The binding stuff works for top level too. +-} + +lvlBind :: LevelEnv + -> CoreBindWithFVs + -> LvlM (LevelledBind, LevelEnv) + +lvlBind env (AnnNonRec bndr rhs) + | isTyVar bndr -- Don't do anything for TyVar binders + -- (simplifier gets rid of them pronto) + || isCoVar bndr -- Difficult to fix up CoVar occurrences (see extendPolyLvlEnv) + -- so we will ignore this case for now + || not (profitableFloat env dest_lvl) + || (isTopLvl dest_lvl && not (exprIsTopLevelBindable deann_rhs bndr_ty)) + -- We can't float an unlifted binding to top level (except + -- literal strings), so we don't float it at all. It's a 
It's a + -- bit brutal, but unlifted bindings aren't expensive either + + = -- No float + do { rhs' <- lvlRhs env NonRecursive is_bot mb_join_arity rhs + ; let bind_lvl = incMinorLvl (le_ctxt_lvl env) + (env', [bndr']) = substAndLvlBndrs NonRecursive env bind_lvl [bndr] + ; return (NonRec bndr' rhs', env') } + + -- Otherwise we are going to float + | null abs_vars + = do { -- No type abstraction; clone existing binder + rhs' <- lvlFloatRhs [] dest_lvl env NonRecursive + is_bot mb_join_arity rhs + ; (env', [bndr']) <- cloneLetVars NonRecursive env dest_lvl [bndr] + ; let bndr2 = annotateBotStr bndr' 0 mb_bot_str + ; return (NonRec (TB bndr2 (FloatMe dest_lvl)) rhs', env') } + + | otherwise + = do { -- Yes, type abstraction; create a new binder, extend substitution, etc + rhs' <- lvlFloatRhs abs_vars dest_lvl env NonRecursive + is_bot mb_join_arity rhs + ; (env', [bndr']) <- newPolyBndrs dest_lvl env abs_vars [bndr] + ; let bndr2 = annotateBotStr bndr' n_extra mb_bot_str + ; return (NonRec (TB bndr2 (FloatMe dest_lvl)) rhs', env') } + + where + bndr_ty = idType bndr + ty_fvs = tyCoVarsOfType bndr_ty + rhs_fvs = freeVarsOf rhs + bind_fvs = rhs_fvs `unionDVarSet` dIdFreeVars bndr + abs_vars = abstractVars dest_lvl env bind_fvs + dest_lvl = destLevel env bind_fvs ty_fvs (isFunction rhs) is_bot is_join + + deann_rhs = deAnnotate rhs + mb_bot_str = exprBotStrictness_maybe deann_rhs + is_bot = isJust mb_bot_str + -- NB: not isBottomThunk! See Note [Bottoming floats] point (3) + + n_extra = count isId abs_vars + mb_join_arity = isJoinId_maybe bndr + is_join = isJust mb_join_arity + +lvlBind env (AnnRec pairs) + | floatTopLvlOnly env && not (isTopLvl dest_lvl) + -- Only floating to the top level is allowed. + || not (profitableFloat env dest_lvl) + || (isTopLvl dest_lvl && any (mightBeUnliftedType . 
idType) bndrs) + -- This mightBeUnliftedType stuff is the same test as in the non-rec case + -- You might wonder whether we can have a recursive binding for + -- an unlifted value -- but we can if it's a /join binding/ (#16978) + -- (Ultimately I think we should not use GHC.Core.Op.SetLevels to + -- float join bindings at all, but that's another story.) + = -- No float + do { let bind_lvl = incMinorLvl (le_ctxt_lvl env) + (env', bndrs') = substAndLvlBndrs Recursive env bind_lvl bndrs + lvl_rhs (b,r) = lvlRhs env' Recursive is_bot (isJoinId_maybe b) r + ; rhss' <- mapM lvl_rhs pairs + ; return (Rec (bndrs' `zip` rhss'), env') } + + -- Otherwise we are going to float + | null abs_vars + = do { (new_env, new_bndrs) <- cloneLetVars Recursive env dest_lvl bndrs + ; new_rhss <- mapM (do_rhs new_env) pairs + ; return ( Rec ([TB b (FloatMe dest_lvl) | b <- new_bndrs] `zip` new_rhss) + , new_env) } + +-- ToDo: when enabling the floatLambda stuff, +-- I think we want to stop doing this + | [(bndr,rhs)] <- pairs + , count isId abs_vars > 1 + = do -- Special case for self recursion where there are + -- several variables carried around: build a local loop: + -- poly_f = \abs_vars. \lam_vars . letrec f = \lam_vars. rhs in f lam_vars + -- This just makes the closures a bit smaller. 
If we don't do + -- this, allocation rises significantly on some programs + -- + -- We could elaborate it for the case where there are several + -- mutually recursive functions, but it's quite a bit more complicated + -- + -- This all seems a bit ad hoc -- sigh + let (rhs_env, abs_vars_w_lvls) = lvlLamBndrs env dest_lvl abs_vars + rhs_lvl = le_ctxt_lvl rhs_env + + (rhs_env', [new_bndr]) <- cloneLetVars Recursive rhs_env rhs_lvl [bndr] + let + (lam_bndrs, rhs_body) = collectAnnBndrs rhs + (body_env1, lam_bndrs1) = substBndrsSL NonRecursive rhs_env' lam_bndrs + (body_env2, lam_bndrs2) = lvlLamBndrs body_env1 rhs_lvl lam_bndrs1 + new_rhs_body <- lvlRhs body_env2 Recursive is_bot (get_join bndr) rhs_body + (poly_env, [poly_bndr]) <- newPolyBndrs dest_lvl env abs_vars [bndr] + return (Rec [(TB poly_bndr (FloatMe dest_lvl) + , mkLams abs_vars_w_lvls $ + mkLams lam_bndrs2 $ + Let (Rec [( TB new_bndr (StayPut rhs_lvl) + , mkLams lam_bndrs2 new_rhs_body)]) + (mkVarApps (Var new_bndr) lam_bndrs1))] + , poly_env) + + | otherwise -- Non-null abs_vars + = do { (new_env, new_bndrs) <- newPolyBndrs dest_lvl env abs_vars bndrs + ; new_rhss <- mapM (do_rhs new_env) pairs + ; return ( Rec ([TB b (FloatMe dest_lvl) | b <- new_bndrs] `zip` new_rhss) + , new_env) } + + where + (bndrs,rhss) = unzip pairs + is_join = isJoinId (head bndrs) + -- bndrs is always non-empty and if one is a join they all are + -- Both are checked by Lint + is_fun = all isFunction rhss + is_bot = False -- It's odd to have an unconditionally divergent + -- function in a Rec, and we don't much care what + -- happens to it. False is simple! 
+ + do_rhs env (bndr,rhs) = lvlFloatRhs abs_vars dest_lvl env Recursive + is_bot (get_join bndr) + rhs + + get_join bndr | need_zap = Nothing + | otherwise = isJoinId_maybe bndr + need_zap = dest_lvl `ltLvl` joinCeilingLevel env + + -- Finding the free vars of the binding group is annoying + bind_fvs = ((unionDVarSets [ freeVarsOf rhs | (_, rhs) <- pairs]) + `unionDVarSet` + (fvDVarSet $ unionsFV [ idFVs bndr + | (bndr, (_,_)) <- pairs])) + `delDVarSetList` + bndrs + + ty_fvs = foldr (unionVarSet . tyCoVarsOfType . idType) emptyVarSet bndrs + dest_lvl = destLevel env bind_fvs ty_fvs is_fun is_bot is_join + abs_vars = abstractVars dest_lvl env bind_fvs + +profitableFloat :: LevelEnv -> Level -> Bool +profitableFloat env dest_lvl + = (dest_lvl `ltMajLvl` le_ctxt_lvl env) -- Escapes a value lambda + || isTopLvl dest_lvl -- Going all the way to top level + + +---------------------------------------------------- +-- Three help functions for the type-abstraction case + +lvlRhs :: LevelEnv + -> RecFlag + -> Bool -- Is this a bottoming function + -> Maybe JoinArity + -> CoreExprWithFVs + -> LvlM LevelledExpr +lvlRhs env rec_flag is_bot mb_join_arity expr + = lvlFloatRhs [] (le_ctxt_lvl env) env + rec_flag is_bot mb_join_arity expr + +lvlFloatRhs :: [OutVar] -> Level -> LevelEnv -> RecFlag + -> Bool -- Binding is for a bottoming function + -> Maybe JoinArity + -> CoreExprWithFVs + -> LvlM (Expr LevelledBndr) +-- Ignores the le_ctxt_lvl in env; treats dest_lvl as the baseline +lvlFloatRhs abs_vars dest_lvl env rec is_bot mb_join_arity rhs + = do { body' <- if not is_bot -- See Note [Floating from a RHS] + && any isId bndrs + then lvlMFE body_env True body + else lvlExpr body_env body + ; return (mkLams bndrs' body') } + where + (bndrs, body) | Just join_arity <- mb_join_arity + = collectNAnnBndrs join_arity rhs + | otherwise + = collectAnnBndrs rhs + (env1, bndrs1) = substBndrsSL NonRecursive env bndrs + all_bndrs = abs_vars ++ bndrs1 + (body_env, bndrs') | Just _ <- 
mb_join_arity + = lvlJoinBndrs env1 dest_lvl rec all_bndrs + | otherwise + = case lvlLamBndrs env1 dest_lvl all_bndrs of + (env2, bndrs') -> (placeJoinCeiling env2, bndrs') + -- The important thing here is that we call lvlLamBndrs on + -- all these binders at once (abs_vars and bndrs), so they + -- all get the same major level. Otherwise we create stupid + -- let-bindings inside, joyfully thinking they can float; but + -- in the end they don't because we never float bindings in + -- between lambdas + +{- Note [Floating from a RHS] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +When floating the RHS of a let-binding, we don't always want to apply +lvlMFE to the body of a lambda, as we usually do, because the entire +binding body is already going to the right place (dest_lvl). + +A particular example is the top level. Consider + concat = /\ a -> foldr ..a.. (++) [] +We don't want to float the body of the lambda to get + lvl = /\ a -> foldr ..a.. (++) [] + concat = /\ a -> lvl a +That would be stupid. + +Previously this was avoided in a much nastier way, by testing strict_ctxt +in float_me in lvlMFE. But that wasn't even right because it would fail +to float out the error sub-expression in + f = \x. case x of + True -> error ("blah" ++ show x) + False -> ... + +But we must be careful: + +* If we had + f = \x -> factorial 20 + we /would/ want to float that (factorial 20) out! Functions are treated + differently: see the use of isFunction in the calls to destLevel. If + there are only type lambdas, then destLevel will say "go to top, and + abstract over the free tyvars" and we don't want that here. + +* But if we had + f = \x -> error (...x....) + we would NOT want to float the bottoming expression out to give + lvl = \x -> error (...x...) + f = \x -> lvl x + +Conclusion: use lvlMFE if there are + * any value lambdas in the original function, and + * this is not a bottoming function (the is_bot argument) +Use lvlExpr otherwise. A little subtle, and I got it wrong at least twice +(e.g. 
#13369). +-} + +{- +************************************************************************ +* * +\subsection{Deciding floatability} +* * +************************************************************************ +-} + +substAndLvlBndrs :: RecFlag -> LevelEnv -> Level -> [InVar] -> (LevelEnv, [LevelledBndr]) +substAndLvlBndrs is_rec env lvl bndrs + = lvlBndrs subst_env lvl subst_bndrs + where + (subst_env, subst_bndrs) = substBndrsSL is_rec env bndrs + +substBndrsSL :: RecFlag -> LevelEnv -> [InVar] -> (LevelEnv, [OutVar]) +-- So named only to avoid the name clash with GHC.Core.Subst.substBndrs +substBndrsSL is_rec env@(LE { le_subst = subst, le_env = id_env }) bndrs + = ( env { le_subst = subst' + , le_env = foldl' add_id id_env (bndrs `zip` bndrs') } + , bndrs') + where + (subst', bndrs') = case is_rec of + NonRecursive -> substBndrs subst bndrs + Recursive -> substRecBndrs subst bndrs + +lvlLamBndrs :: LevelEnv -> Level -> [OutVar] -> (LevelEnv, [LevelledBndr]) +-- Compute the levels for the binders of a lambda group +lvlLamBndrs env lvl bndrs + = lvlBndrs env new_lvl bndrs + where + new_lvl | any is_major bndrs = incMajorLvl lvl + | otherwise = incMinorLvl lvl + + is_major bndr = isId bndr && not (isProbablyOneShotLambda bndr) + -- The "probably" part says "don't float things out of a + -- probable one-shot lambda" + -- See Note [Computing one-shot info] in Demand.hs + +lvlJoinBndrs :: LevelEnv -> Level -> RecFlag -> [OutVar] + -> (LevelEnv, [LevelledBndr]) +lvlJoinBndrs env lvl rec bndrs + = lvlBndrs env new_lvl bndrs + where + new_lvl | isRec rec = incMajorLvl lvl + | otherwise = incMinorLvl lvl + -- Non-recursive join points are one-shot; recursive ones are not + +lvlBndrs :: LevelEnv -> Level -> [CoreBndr] -> (LevelEnv, [LevelledBndr]) +-- The binders returned are exactly the same as the ones passed, +-- apart from applying the substitution, but they are now paired +-- with a (StayPut level) +-- +-- The returned envt has le_ctxt_lvl updated to the new_lvl 
+-- +-- All the new binders get the same level, because +-- any floating binding is either going to float past +-- all or none. We never separate binders. +lvlBndrs env@(LE { le_lvl_env = lvl_env }) new_lvl bndrs + = ( env { le_ctxt_lvl = new_lvl + , le_join_ceil = new_lvl + , le_lvl_env = addLvls new_lvl lvl_env bndrs } + , map (stayPut new_lvl) bndrs) + +stayPut :: Level -> OutVar -> LevelledBndr +stayPut new_lvl bndr = TB bndr (StayPut new_lvl) + + -- Destination level is the max Id level of the expression + -- (We'll abstract the type variables, if any.) +destLevel :: LevelEnv + -> DVarSet -- Free vars of the term + -> TyCoVarSet -- Free in the /type/ of the term + -- (a subset of the previous argument) + -> Bool -- True <=> is function + -> Bool -- True <=> is bottom + -> Bool -- True <=> is a join point + -> Level +-- INVARIANT: if is_join=True then result >= join_ceiling +destLevel env fvs fvs_ty is_function is_bot is_join + | isTopLvl max_fv_id_level -- Float even joins if they get to top level + -- See Note [Floating join point bindings] + = tOP_LEVEL + + | is_join -- Never float a join point past the join ceiling + -- See Note [Join points] in GHC.Core.Op.FloatOut + = if max_fv_id_level `ltLvl` join_ceiling + then join_ceiling + else max_fv_id_level + + | is_bot -- Send bottoming bindings to the top + = as_far_as_poss -- regardless; see Note [Bottoming floats] + -- Esp Bottoming floats (1) + + | Just n_args <- floatLams env + , n_args > 0 -- n=0 case handled uniformly by the 'otherwise' case + , is_function + , countFreeIds fvs <= n_args + = as_far_as_poss -- Send functions to top level; see + -- the comments with isFunction + + | otherwise = max_fv_id_level + where + join_ceiling = joinCeilingLevel env + max_fv_id_level = maxFvLevel isId env fvs -- Max over Ids only; the + -- tyvars will be abstracted + + as_far_as_poss = maxFvLevel' isId env fvs_ty + -- See Note [Floating and kind casts] + +{- Note [Floating and kind casts] 
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Consider this + case x of + K (co :: * ~# k) -> let v :: Int |> co + v = e + in blah + +Then, even if we are abstracting over Ids, or if e is bottom, we can't +float v outside the 'co' binding. Reason: if we did we'd get + v' :: forall k. (Int ~# Age) => Int |> co +and now 'co' isn't in scope in that type. The underlying reason is +that 'co' is a value-level thing and we can't abstract over that in a +type (else we'd get a dependent type). So if v's /type/ mentions 'co' +we can't float it out beyond the binding site of 'co'. + +That's why we have this as_far_as_poss stuff. Usually as_far_as_poss +is just tOP_LEVEL; but occasionally a coercion variable (which is an +Id) mentioned in a type prevents this. + +Example #14270 comment:15. +-} + + +isFunction :: CoreExprWithFVs -> Bool +-- The idea here is that we want to float *functions* to +-- the top level. This saves no work, but +-- (a) it can make the host function body a lot smaller, +-- and hence inlinable. +-- (b) it can also save allocation when the function is recursive: +-- h = \x -> letrec f = \y -> ...f...y...x... +-- in f x +-- becomes +-- f = \x y -> ...(f x)...y...x... +-- h = \x -> f x x +-- No allocation for f now. +-- We may only want to do this if there are sufficiently few free +-- variables. We certainly only want to do it for values, and not for +-- constructors. So the simple thing is just to look for lambdas +isFunction (_, AnnLam b e) | isId b = True + | otherwise = isFunction e +-- isFunction (_, AnnTick _ e) = isFunction e -- dubious +isFunction _ = False + +countFreeIds :: DVarSet -> Int +countFreeIds = nonDetFoldUDFM add 0 . getUniqDSet + -- It's OK to use nonDetFoldUDFM here because we're just counting things. 
+ where + add :: Var -> Int -> Int + add v n | isId v = n+1 + | otherwise = n + +{- +************************************************************************ +* * +\subsection{Free-To-Level Monad} +* * +************************************************************************ +-} + +data LevelEnv + = LE { le_switches :: FloatOutSwitches + , le_ctxt_lvl :: Level -- The current level + , le_lvl_env :: VarEnv Level -- Domain is *post-cloned* TyVars and Ids + , le_join_ceil:: Level -- Highest level to which joins float + -- Invariant: always >= le_ctxt_lvl + + -- See Note [le_subst and le_env] + , le_subst :: Subst -- Domain is pre-cloned TyVars and Ids + -- The Id -> CoreExpr in the Subst is ignored + -- (since we want to substitute a LevelledExpr for + -- an Id via le_env) but we do use the Co/TyVar substs + , le_env :: IdEnv ([OutVar], LevelledExpr) -- Domain is pre-cloned Ids + } + +{- Note [le_subst and le_env] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We clone let- and case-bound variables so that they are still distinct +when floated out; hence the le_subst/le_env. (see point 3 of the +module overview comment). We also use these envs when making a +variable polymorphic because we want to float it out past a big +lambda. + +The le_subst and le_env always implement the same mapping, + in_x :-> out_x a b +where out_x is an OutVar, and a,b are its arguments (when +we perform abstraction at the same time as floating). + + le_subst maps to CoreExpr + le_env maps to LevelledExpr + +Since the range is always a variable or application, there is never +any difference between the two, but sadly the types differ. The +le_subst is used when substituting in a variable's IdInfo; the le_env +when we find a Var. + +In addition the le_env records a [OutVar] of variables free in the +OutExpr/LevelledExpr, just so we don't have to call freeVars +repeatedly. 
This list is always non-empty, and the first element is +out_x + +The domain of both envs is *pre-cloned* Ids, though + +The domain of the le_lvl_env is the *post-cloned* Ids +-} + +initialEnv :: FloatOutSwitches -> LevelEnv +initialEnv float_lams + = LE { le_switches = float_lams + , le_ctxt_lvl = tOP_LEVEL + , le_join_ceil = panic "initialEnv" + , le_lvl_env = emptyVarEnv + , le_subst = emptySubst + , le_env = emptyVarEnv } + +addLvl :: Level -> VarEnv Level -> OutVar -> VarEnv Level +addLvl dest_lvl env v' = extendVarEnv env v' dest_lvl + +addLvls :: Level -> VarEnv Level -> [OutVar] -> VarEnv Level +addLvls dest_lvl env vs = foldl' (addLvl dest_lvl) env vs + +floatLams :: LevelEnv -> Maybe Int +floatLams le = floatOutLambdas (le_switches le) + +floatConsts :: LevelEnv -> Bool +floatConsts le = floatOutConstants (le_switches le) + +floatOverSat :: LevelEnv -> Bool +floatOverSat le = floatOutOverSatApps (le_switches le) + +floatTopLvlOnly :: LevelEnv -> Bool +floatTopLvlOnly le = floatToTopLevelOnly (le_switches le) + +incMinorLvlFrom :: LevelEnv -> Level +incMinorLvlFrom env = incMinorLvl (le_ctxt_lvl env) + +-- extendCaseBndrEnv adds the mapping case-bndr->scrut-var if it can +-- See Note [Binder-swap during float-out] +extendCaseBndrEnv :: LevelEnv + -> Id -- Pre-cloned case binder + -> Expr LevelledBndr -- Post-cloned scrutinee + -> LevelEnv +extendCaseBndrEnv le@(LE { le_subst = subst, le_env = id_env }) + case_bndr (Var scrut_var) + = le { le_subst = extendSubstWithVar subst case_bndr scrut_var + , le_env = add_id id_env (case_bndr, scrut_var) } +extendCaseBndrEnv env _ _ = env + +-- See Note [Join ceiling] +placeJoinCeiling :: LevelEnv -> LevelEnv +placeJoinCeiling le@(LE { le_ctxt_lvl = lvl }) + = le { le_ctxt_lvl = lvl', le_join_ceil = lvl' } + where + lvl' = asJoinCeilLvl (incMinorLvl lvl) + +maxFvLevel :: (Var -> Bool) -> LevelEnv -> DVarSet -> Level +maxFvLevel max_me env var_set + = foldDVarSet (maxIn max_me env) tOP_LEVEL var_set + +maxFvLevel' 
:: (Var -> Bool) -> LevelEnv -> TyCoVarSet -> Level +-- Same but for TyCoVarSet +maxFvLevel' max_me env var_set + = nonDetFoldUniqSet (maxIn max_me env) tOP_LEVEL var_set + +maxIn :: (Var -> Bool) -> LevelEnv -> InVar -> Level -> Level +maxIn max_me (LE { le_lvl_env = lvl_env, le_env = id_env }) in_var lvl + = case lookupVarEnv id_env in_var of + Just (abs_vars, _) -> foldr max_out lvl abs_vars + Nothing -> max_out in_var lvl + where + max_out out_var lvl + | max_me out_var = case lookupVarEnv lvl_env out_var of + Just lvl' -> maxLvl lvl' lvl + Nothing -> lvl + | otherwise = lvl -- Ignore some vars depending on max_me + +lookupVar :: LevelEnv -> Id -> LevelledExpr +lookupVar le v = case lookupVarEnv (le_env le) v of + Just (_, expr) -> expr + _ -> Var v + +-- Level to which join points are allowed to float (boundary of current tail +-- context). See Note [Join ceiling] +joinCeilingLevel :: LevelEnv -> Level +joinCeilingLevel = le_join_ceil + +abstractVars :: Level -> LevelEnv -> DVarSet -> [OutVar] + -- Find the variables in fvs, free vars of the target expression, + -- whose level is greater than the destination level + -- These are the ones we are going to abstract out + -- + -- Note that to get reproducible builds, the variables need to be + -- abstracted in deterministic order, not dependent on the values of + -- Uniques. This is achieved by using DVarSets, deterministic free + -- variable computation and deterministic sort. + -- See Note [Unique Determinism] in Unique for explanation of why + -- Uniques are not deterministic. 
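+ -- For example (an editor's sketch): if we float a binding to tOP_LEVEL + -- whose free vars are {x, y}, where x is let-bound at an inner level but + -- y is already at top level, then abstract_me selects only x, giving + -- lvl = \x. <rhs> + -- and the occurrence at the original site becomes (lvl x)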
+abstractVars dest_lvl (LE { le_subst = subst, le_lvl_env = lvl_env }) in_fvs + = -- NB: sortQuantVars might not put duplicates next to each other + map zap $ sortQuantVars $ + filter abstract_me $ + dVarSetElems $ + closeOverKindsDSet $ + substDVarSet subst in_fvs + -- NB: it's important to call abstract_me only on the OutIds that + -- come from substDVarSet (not on fv, which is an InId) + where + abstract_me v = case lookupVarEnv lvl_env v of + Just lvl -> dest_lvl `ltLvl` lvl + Nothing -> False + + -- We are going to lambda-abstract, so nuke any IdInfo, + -- and add the tyvars of the Id (if necessary) + zap v | isId v = WARN( isStableUnfolding (idUnfolding v) || + not (isEmptyRuleInfo (idSpecialisation v)), + text "absVarsOf: discarding info on" <+> ppr v ) + setIdInfo v vanillaIdInfo + | otherwise = v + +type LvlM result = UniqSM result + +initLvl :: UniqSupply -> UniqSM a -> a +initLvl = initUs_ + +newPolyBndrs :: Level -> LevelEnv -> [OutVar] -> [InId] + -> LvlM (LevelEnv, [OutId]) +-- The envt is extended to bind the new bndrs to dest_lvl, but +-- the le_ctxt_lvl is unaffected +newPolyBndrs dest_lvl + env@(LE { le_lvl_env = lvl_env, le_subst = subst, le_env = id_env }) + abs_vars bndrs + = ASSERT( all (not . isCoVar) bndrs ) -- What would we add to the CoSubst in this case. No easy answer. 
+ do { uniqs <- getUniquesM + ; let new_bndrs = zipWith mk_poly_bndr bndrs uniqs + bndr_prs = bndrs `zip` new_bndrs + env' = env { le_lvl_env = addLvls dest_lvl lvl_env new_bndrs + , le_subst = foldl' add_subst subst bndr_prs + , le_env = foldl' add_id id_env bndr_prs } + ; return (env', new_bndrs) } + where + add_subst env (v, v') = extendIdSubst env v (mkVarApps (Var v') abs_vars) + add_id env (v, v') = extendVarEnv env v ((v':abs_vars), mkVarApps (Var v') abs_vars) + + mk_poly_bndr bndr uniq = transferPolyIdInfo bndr abs_vars $ -- Note [transferPolyIdInfo] in Id.hs + transfer_join_info bndr $ + mkSysLocal (mkFastString str) uniq poly_ty + where + str = "poly_" ++ occNameString (getOccName bndr) + poly_ty = mkLamTypes abs_vars (GHC.Core.Subst.substTy subst (idType bndr)) + + -- If we are floating a join point to top level, it stops being + -- a join point. Otherwise it continues to be a join point, + -- but we may need to adjust its arity + dest_is_top = isTopLvl dest_lvl + transfer_join_info bndr new_bndr + | Just join_arity <- isJoinId_maybe bndr + , not dest_is_top + = new_bndr `asJoinId` join_arity + length abs_vars + | otherwise + = new_bndr + +newLvlVar :: LevelledExpr -- The RHS of the new binding + -> Maybe JoinArity -- Its join arity, if it is a join point + -> Bool -- True <=> the RHS looks like (makeStatic ...) + -> LvlM Id +newLvlVar lvld_rhs join_arity_maybe is_mk_static + = do { uniq <- getUniqueM + ; return (add_join_info (mk_id uniq rhs_ty)) + } + where + add_join_info var = var `asJoinId_maybe` join_arity_maybe + de_tagged_rhs = deTagExpr lvld_rhs + rhs_ty = exprType de_tagged_rhs + + mk_id uniq rhs_ty + -- See Note [Grand plan for static forms] in StaticPtrTable. + | is_mk_static + = mkExportedVanillaId (mkSystemVarName uniq (mkFastString "static_ptr")) + rhs_ty + | otherwise + = mkSysLocal (mkFastString "lvl") uniq rhs_ty + +-- | Clone the binders bound by a single-alternative case. 
+cloneCaseBndrs :: LevelEnv -> Level -> [Var] -> LvlM (LevelEnv, [Var]) +cloneCaseBndrs env@(LE { le_subst = subst, le_lvl_env = lvl_env, le_env = id_env }) + new_lvl vs + = do { us <- getUniqueSupplyM + ; let (subst', vs') = cloneBndrs subst us vs + -- N.B. We are not moving the body of the case, merely its case + -- binders. Consequently we should *not* set le_ctxt_lvl and + -- le_join_ceil. See Note [Setting levels when floating + -- single-alternative cases]. + env' = env { le_lvl_env = addLvls new_lvl lvl_env vs' + , le_subst = subst' + , le_env = foldl' add_id id_env (vs `zip` vs') } + + ; return (env', vs') } + +cloneLetVars :: RecFlag -> LevelEnv -> Level -> [InVar] + -> LvlM (LevelEnv, [OutVar]) +-- See Note [Need for cloning during float-out] +-- Works for Ids bound by let(rec) +-- The dest_lvl is attributed to the binders in the new env, +-- but cloneVars doesn't affect the le_ctxt_lvl of the incoming env +cloneLetVars is_rec + env@(LE { le_subst = subst, le_lvl_env = lvl_env, le_env = id_env }) + dest_lvl vs + = do { us <- getUniqueSupplyM + ; let vs1 = map zap vs + -- See Note [Zapping the demand info] + (subst', vs2) = case is_rec of + NonRecursive -> cloneBndrs subst us vs1 + Recursive -> cloneRecIdBndrs subst us vs1 + prs = vs `zip` vs2 + env' = env { le_lvl_env = addLvls dest_lvl lvl_env vs2 + , le_subst = subst' + , le_env = foldl' add_id id_env prs } + + ; return (env', vs2) } + where + zap :: Var -> Var + zap v | isId v = zap_join (zapIdDemandInfo v) + | otherwise = v + + zap_join | isTopLvl dest_lvl = zapJoinId + | otherwise = id + +add_id :: IdEnv ([Var], LevelledExpr) -> (Var, Var) -> IdEnv ([Var], LevelledExpr) +add_id id_env (v, v1) + | isTyVar v = delVarEnv id_env v + | otherwise = extendVarEnv id_env v ([v1], ASSERT(not (isCoVar v1)) Var v1) + +{- +Note [Zapping the demand info] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +VERY IMPORTANT: we must zap the demand info if the thing is going to +float out, because it may be less demanded than at its 
original +binding site. Eg + f :: Int -> Int + f x = let v = 3*4 in v+x +Here v is strict; but if we float v to top level, it isn't any more. + +Similarly, if we're floating a join point, it won't be one anymore, so we zap +join point information as well. +-} diff --git a/compiler/GHC/Core/Op/Simplify.hs b/compiler/GHC/Core/Op/Simplify.hs new file mode 100644 index 0000000000..448edd21f6 --- /dev/null +++ b/compiler/GHC/Core/Op/Simplify.hs @@ -0,0 +1,3666 @@ +{- +(c) The AQUA Project, Glasgow University, 1993-1998 + +\section[Simplify]{The main module of the simplifier} +-} + +{-# LANGUAGE CPP #-} + +{-# OPTIONS_GHC -Wno-incomplete-record-updates #-} +module GHC.Core.Op.Simplify ( simplTopBinds, simplExpr, simplRules ) where + +#include "HsVersions.h" + +import GhcPrelude + +import GHC.Driver.Session +import GHC.Core.Op.Simplify.Monad +import GHC.Core.Type hiding ( substTy, substTyVar, extendTvSubst, extendCvSubst ) +import GHC.Core.Op.Simplify.Env +import GHC.Core.Op.Simplify.Utils +import GHC.Core.Op.OccurAnal ( occurAnalyseExpr ) +import GHC.Core.FamInstEnv ( FamInstEnv ) +import Literal ( litIsLifted ) --, mkLitInt ) -- temporarily commented out. See #8326 +import Id +import MkId ( seqId ) +import GHC.Core.Make ( FloatBind, mkImpossibleExpr, castBottomExpr ) +import qualified GHC.Core.Make +import IdInfo +import Name ( mkSystemVarName, isExternalName, getOccFS ) +import GHC.Core.Coercion hiding ( substCo, substCoVar ) +import GHC.Core.Coercion.Opt ( optCoercion ) +import GHC.Core.FamInstEnv ( topNormaliseType_maybe ) +import GHC.Core.DataCon + ( DataCon, dataConWorkId, dataConRepStrictness + , dataConRepArgTys, isUnboxedTupleCon + , StrictnessMark (..) ) +import GHC.Core.Op.Monad ( Tick(..), SimplMode(..)
) +import GHC.Core +import Demand ( StrictSig(..), dmdTypeDepth, isStrictDmd + , mkClosedStrictSig, topDmd, botDiv ) +import Cpr ( mkCprSig, botCpr ) +import GHC.Core.Ppr ( pprCoreExpr ) +import GHC.Core.Unfold +import GHC.Core.Utils +import GHC.Core.SimpleOpt ( pushCoTyArg, pushCoValArg + , joinPointBinding_maybe, joinPointBindings_maybe ) +import GHC.Core.Rules ( mkRuleInfo, lookupRule, getRules ) +import BasicTypes ( TopLevelFlag(..), isNotTopLevel, isTopLevel, + RecFlag(..), Arity ) +import MonadUtils ( mapAccumLM, liftIO ) +import Var ( isTyCoVar ) +import Maybes ( orElse ) +import Control.Monad +import Outputable +import FastString +import Util +import ErrUtils +import Module ( moduleName, pprModuleName ) +import PrimOp ( PrimOp (SeqOp) ) + + +{- +The guts of the simplifier is in this module, but the driver loop for +the simplifier is in GHC.Core.Op.Simplify.Driver + +Note [The big picture] +~~~~~~~~~~~~~~~~~~~~~~ +The general shape of the simplifier is this: + + simplExpr :: SimplEnv -> InExpr -> SimplCont -> SimplM (SimplFloats, OutExpr) + simplBind :: SimplEnv -> InBind -> SimplM (SimplFloats, SimplEnv) + + * SimplEnv contains + - Simplifier mode (which includes DynFlags for convenience) + - Ambient substitution + - InScopeSet + + * SimplFloats contains + - Let-floats (which includes ok-for-spec case-floats) + - Join floats + - InScopeSet (including all the floats) + + * Expressions + simplExpr :: SimplEnv -> InExpr -> SimplCont + -> SimplM (SimplFloats, OutExpr) + The result of simplifying an /expression/ is (floats, expr) + - A bunch of floats (let bindings, join bindings) + - A simplified expression. 
+ The overall result is effectively (let floats in expr) + + * Bindings + simplBind :: SimplEnv -> InBind -> SimplM (SimplFloats, SimplEnv) + The result of simplifying a binding is + - A bunch of floats, the last of which is the simplified binding + There may be auxiliary bindings too; see prepareRhs + - An environment suitable for simplifying the scope of the binding + + The floats may also be empty, if the binding is inlined unconditionally; + in that case the returned SimplEnv will have an augmented substitution. + + The returned floats and env both have an in-scope set, and they are + guaranteed to be the same. + + +Note [Shadowing] +~~~~~~~~~~~~~~~~ +The simplifier used to guarantee that the output had no shadowing, but +it does not do so any more. (Actually, it never did!) The reason is +documented with simplifyArgs. + + +Eta expansion +~~~~~~~~~~~~~~ +For eta expansion, we want to catch things like + + case e of (a,b) -> \x -> case a of (p,q) -> \y -> r + +If the \x was on the RHS of a let, we'd eta expand to bring the two +lambdas together. And in general that's a good thing to do. Perhaps +we should eta expand wherever we find a (value) lambda? Then the eta +expansion at a let RHS can concentrate solely on the PAP case. + +Note [In-scope set as a substitution] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +As per Note [Lookups in in-scope set], an in-scope set can act as +a substitution. Specifically, it acts as a substitution from variable to +variables /with the same unique/. + +Why do we need this? Well, during the course of the simplifier, we may want to +adjust inessential properties of a variable. For instance, when performing a +beta-reduction, we change + + (\x. e) u ==> let x = u in e + +We typically want to add an unfolding to `x` so that it inlines to (the +simplification of) `u`. + +We do that by adding the unfolding to the binder `x`, which is added to the +in-scope set. 
When simplifying occurrences of `x` (every occurrence!), they are +replaced by their “updated” version from the in-scope set, hence inherit the +unfolding. This happens in `SimplEnv.substId`. + +Another example. Consider + + case x of y { Node a b -> ...y... + ; Leaf v -> ...y... } + +In the Node branch we want y's unfolding to be (Node a b); in the Leaf branch we +want y's unfolding to be (Leaf v). We achieve this by adding the appropriate +unfolding to y, and re-adding it to the in-scope set. See the calls to +`addBinderUnfolding` in `Simplify.addAltUnfoldings` and elsewhere. + +It's quite convenient. This way we don't need to manipulate the substitution all +the time: every update to a binder is automatically reflected in its bound +occurrences. + +************************************************************************ +* * +\subsection{Bindings} +* * +************************************************************************ +-} + +simplTopBinds :: SimplEnv -> [InBind] -> SimplM (SimplFloats, SimplEnv) +-- See Note [The big picture] +simplTopBinds env0 binds0 + = do { -- Put all the top-level binders into scope at the start + -- so that if a transformation rule has unexpectedly brought + -- anything into scope, then we don't get a complaint about that. + -- It's rather as if the top-level binders were imported. + -- See note [Glomming] in OccurAnal. + ; env1 <- {-#SCC "simplTopBinds-simplRecBndrs" #-} simplRecBndrs env0 (bindersOfBinds binds0) + ; (floats, env2) <- {-#SCC "simplTopBinds-simpl_binds" #-} simpl_binds env1 binds0 + ; freeTick SimplifierDone + ; return (floats, env2) } + where + -- We need to track the zapped top-level binders, because + -- they should have their fragile IdInfo zapped (notably occurrence info) + -- That's why we run down binds and bndrs' simultaneously.
+ -- + simpl_binds :: SimplEnv -> [InBind] -> SimplM (SimplFloats, SimplEnv) + simpl_binds env [] = return (emptyFloats env, env) + simpl_binds env (bind:binds) = do { (float, env1) <- simpl_bind env bind + ; (floats, env2) <- simpl_binds env1 binds + ; return (float `addFloats` floats, env2) } + + simpl_bind env (Rec pairs) + = simplRecBind env TopLevel Nothing pairs + simpl_bind env (NonRec b r) + = do { (env', b') <- addBndrRules env b (lookupRecBndr env b) Nothing + ; simplRecOrTopPair env' TopLevel NonRecursive Nothing b b' r } + +{- +************************************************************************ +* * + Lazy bindings +* * +************************************************************************ + +simplRecBind is used for + * recursive bindings only +-} + +simplRecBind :: SimplEnv -> TopLevelFlag -> MaybeJoinCont + -> [(InId, InExpr)] + -> SimplM (SimplFloats, SimplEnv) +simplRecBind env0 top_lvl mb_cont pairs0 + = do { (env_with_info, triples) <- mapAccumLM add_rules env0 pairs0 + ; (rec_floats, env1) <- go env_with_info triples + ; return (mkRecFloats rec_floats, env1) } + where + add_rules :: SimplEnv -> (InBndr,InExpr) -> SimplM (SimplEnv, (InBndr, OutBndr, InExpr)) + -- Add the (substituted) rules to the binder + add_rules env (bndr, rhs) + = do { (env', bndr') <- addBndrRules env bndr (lookupRecBndr env bndr) mb_cont + ; return (env', (bndr, bndr', rhs)) } + + go env [] = return (emptyFloats env, env) + + go env ((old_bndr, new_bndr, rhs) : pairs) + = do { (float, env1) <- simplRecOrTopPair env top_lvl Recursive mb_cont + old_bndr new_bndr rhs + ; (floats, env2) <- go env1 pairs + ; return (float `addFloats` floats, env2) } + +{- +simplRecOrTopPair is used for + * recursive bindings (whether top level or not) + * top-level non-recursive bindings + +It assumes the binder has already been simplified, but not its IdInfo.
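+
+In outline, simplRecOrTopPair (defined just below) dispatches three ways:
+
+    simplRecOrTopPair env ... old_bndr new_bndr rhs
+      | preInlineUnconditionally fires  -->  no binding at all; return an
+                                             extended substitution
+      | Just cont <- mb_cont            -->  simplJoinBind (join points only)
+      | otherwise                       -->  simplLazyBind (the normal case)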
+-} + +simplRecOrTopPair :: SimplEnv + -> TopLevelFlag -> RecFlag -> MaybeJoinCont + -> InId -> OutBndr -> InExpr -- Binder and rhs + -> SimplM (SimplFloats, SimplEnv) + +simplRecOrTopPair env top_lvl is_rec mb_cont old_bndr new_bndr rhs + | Just env' <- preInlineUnconditionally env top_lvl old_bndr rhs env + = {-#SCC "simplRecOrTopPair-pre-inline-uncond" #-} + trace_bind "pre-inline-uncond" $ + do { tick (PreInlineUnconditionally old_bndr) + ; return ( emptyFloats env, env' ) } + + | Just cont <- mb_cont + = {-#SCC "simplRecOrTopPair-join" #-} + ASSERT( isNotTopLevel top_lvl && isJoinId new_bndr ) + trace_bind "join" $ + simplJoinBind env cont old_bndr new_bndr rhs env + + | otherwise + = {-#SCC "simplRecOrTopPair-normal" #-} + trace_bind "normal" $ + simplLazyBind env top_lvl is_rec old_bndr new_bndr rhs env + + where + dflags = seDynFlags env + + -- trace_bind emits a trace for each top-level binding, which + -- helps to locate the tracing for inlining and rule firing + trace_bind what thing_inside + | not (dopt Opt_D_verbose_core2core dflags) + = thing_inside + | otherwise + = traceAction dflags ("SimplBind " ++ what) + (ppr old_bndr) thing_inside + +-------------------------- +simplLazyBind :: SimplEnv + -> TopLevelFlag -> RecFlag + -> InId -> OutId -- Binder, both pre-and post simpl + -- Not a JoinId + -- The OutId has IdInfo, except arity, unfolding + -- Ids only, no TyVars + -> InExpr -> SimplEnv -- The RHS and its environment + -> SimplM (SimplFloats, SimplEnv) +-- Precondition: not a JoinId +-- Precondition: rhs obeys the let/app invariant +-- NOT used for JoinIds +simplLazyBind env top_lvl is_rec bndr bndr1 rhs rhs_se + = ASSERT( isId bndr ) + ASSERT2( not (isJoinId bndr), ppr bndr ) + -- pprTrace "simplLazyBind" ((ppr bndr <+> ppr bndr1) $$ ppr rhs $$ ppr (seIdSubst rhs_se)) $ + do { let rhs_env = rhs_se `setInScopeFromE` env + (tvs, body) = case collectTyAndValBinders rhs of + (tvs, [], body) + | surely_not_lam body -> (tvs, body) + _ -> ([], rhs) + + 
surely_not_lam (Lam {}) = False + surely_not_lam (Tick t e) + | not (tickishFloatable t) = surely_not_lam e + -- eta-reduction could float + surely_not_lam _ = True + -- Do not do the "abstract tyvar" thing if there's + -- a lambda inside, because it defeats eta-reduction + -- f = /\a. \x. g a x + -- should eta-reduce. + + + ; (body_env, tvs') <- {-#SCC "simplBinders" #-} simplBinders rhs_env tvs + -- See Note [Floating and type abstraction] in GHC.Core.Op.Simplify.Utils + + -- Simplify the RHS + ; let rhs_cont = mkRhsStop (substTy body_env (exprType body)) + ; (body_floats0, body0) <- {-#SCC "simplExprF" #-} simplExprF body_env body rhs_cont + + -- Never float join-floats out of a non-join let-binding + -- So wrap the body in the join-floats right now + -- Hence: body_floats1 consists only of let-floats + ; let (body_floats1, body1) = wrapJoinFloatsX body_floats0 body0 + + -- ANF-ise a constructor or PAP rhs + -- We get at most one float per argument here + ; (let_floats, body2) <- {-#SCC "prepareRhs" #-} prepareRhs (getMode env) top_lvl + (getOccFS bndr1) (idInfo bndr1) body1 + ; let body_floats2 = body_floats1 `addLetFloats` let_floats + + ; (rhs_floats, rhs') + <- if not (doFloatFromRhs top_lvl is_rec False body_floats2 body2) + then -- No floating, revert to body1 + {-#SCC "simplLazyBind-no-floating" #-} + do { rhs' <- mkLam env tvs' (wrapFloats body_floats2 body1) rhs_cont + ; return (emptyFloats env, rhs') } + + else if null tvs then -- Simple floating + {-#SCC "simplLazyBind-simple-floating" #-} + do { tick LetFloatFromLet + ; return (body_floats2, body2) } + + else -- Do type-abstraction first + {-#SCC "simplLazyBind-type-abstraction-first" #-} + do { tick LetFloatFromLet + ; (poly_binds, body3) <- abstractFloats (seDynFlags env) top_lvl + tvs' body_floats2 body2 + ; let floats = foldl' extendFloats (emptyFloats env) poly_binds + ; rhs' <- mkLam env tvs' body3 rhs_cont + ; return (floats, rhs') } + + ; (bind_float, env2) <- completeBind (env 
`setInScopeFromF` rhs_floats) + top_lvl Nothing bndr bndr1 rhs' + ; return (rhs_floats `addFloats` bind_float, env2) } + +-------------------------- +simplJoinBind :: SimplEnv + -> SimplCont + -> InId -> OutId -- Binder, both pre-and post simpl + -- The OutId has IdInfo, except arity, + -- unfolding + -> InExpr -> SimplEnv -- The right hand side and its env + -> SimplM (SimplFloats, SimplEnv) +simplJoinBind env cont old_bndr new_bndr rhs rhs_se + = do { let rhs_env = rhs_se `setInScopeFromE` env + ; rhs' <- simplJoinRhs rhs_env old_bndr rhs cont + ; completeBind env NotTopLevel (Just cont) old_bndr new_bndr rhs' } + +-------------------------- +simplNonRecX :: SimplEnv + -> InId -- Old binder; not a JoinId + -> OutExpr -- Simplified RHS + -> SimplM (SimplFloats, SimplEnv) +-- A specialised variant of simplNonRec used when the RHS is already +-- simplified, notably in knownCon. It uses case-binding where necessary. +-- +-- Precondition: rhs satisfies the let/app invariant + +simplNonRecX env bndr new_rhs + | ASSERT2( not (isJoinId bndr), ppr bndr ) + isDeadBinder bndr -- Not uncommon; e.g. 
case (a,b) of c { (p,q) -> p } + = return (emptyFloats env, env) -- Here c is dead, and we avoid + -- creating the binding c = (a,b) + + | Coercion co <- new_rhs + = return (emptyFloats env, extendCvSubst env bndr co) + + | otherwise + = do { (env', bndr') <- simplBinder env bndr + ; completeNonRecX NotTopLevel env' (isStrictId bndr) bndr bndr' new_rhs } + -- simplNonRecX is only used for NotTopLevel things + +-------------------------- +completeNonRecX :: TopLevelFlag -> SimplEnv + -> Bool + -> InId -- Old binder; not a JoinId + -> OutId -- New binder + -> OutExpr -- Simplified RHS + -> SimplM (SimplFloats, SimplEnv) -- The new binding is in the floats +-- Precondition: rhs satisfies the let/app invariant +-- See Note [Core let/app invariant] in GHC.Core + +completeNonRecX top_lvl env is_strict old_bndr new_bndr new_rhs + = ASSERT2( not (isJoinId new_bndr), ppr new_bndr ) + do { (prepd_floats, rhs1) <- prepareRhs (getMode env) top_lvl (getOccFS new_bndr) + (idInfo new_bndr) new_rhs + ; let floats = emptyFloats env `addLetFloats` prepd_floats + ; (rhs_floats, rhs2) <- + if doFloatFromRhs NotTopLevel NonRecursive is_strict floats rhs1 + then -- Add the floats to the main env + do { tick LetFloatFromLet + ; return (floats, rhs1) } + else -- Do not float; wrap the floats around the RHS + return (emptyFloats env, wrapFloats floats rhs1) + + ; (bind_float, env2) <- completeBind (env `setInScopeFromF` rhs_floats) + NotTopLevel Nothing + old_bndr new_bndr rhs2 + ; return (rhs_floats `addFloats` bind_float, env2) } + + +{- ********************************************************************* +* * + prepareRhs, makeTrivial +* * +************************************************************************ + +Note [prepareRhs] +~~~~~~~~~~~~~~~~~ +prepareRhs takes a putative RHS, checks whether it's a PAP or +constructor application and, if so, converts it to ANF, so that the +resulting thing can be inlined more easily. 
Thus + x = (f a, g b) +becomes + t1 = f a + t2 = g b + x = (t1,t2) + +We also want to deal well with cases like this + v = (f e1 `cast` co) e2 +Here we want to make e1,e2 trivial and get + x1 = e1; x2 = e2; v = (f x1 `cast` co) x2 +That's what the 'go' loop in prepareRhs does +-} + +prepareRhs :: SimplMode -> TopLevelFlag + -> FastString -- Base for any new variables + -> IdInfo -- IdInfo for the LHS of this binding + -> OutExpr + -> SimplM (LetFloats, OutExpr) +-- Transforms a RHS into a better RHS by adding floats +-- e.g. x = Just e +-- becomes a = e +-- x = Just a +-- See Note [prepareRhs] +prepareRhs mode top_lvl occ info (Cast rhs co) -- Note [Float coercions] + | let ty1 = coercionLKind co -- Do *not* do this if rhs has an unlifted type + , not (isUnliftedType ty1) -- see Note [Float coercions (unlifted)] + = do { (floats, rhs') <- makeTrivialWithInfo mode top_lvl occ sanitised_info rhs + ; return (floats, Cast rhs' co) } + where + sanitised_info = vanillaIdInfo `setStrictnessInfo` strictnessInfo info + `setCprInfo` cprInfo info + `setDemandInfo` demandInfo info + +prepareRhs mode top_lvl occ _ rhs0 + = do { (_is_exp, floats, rhs1) <- go 0 rhs0 + ; return (floats, rhs1) } + where + go :: Int -> OutExpr -> SimplM (Bool, LetFloats, OutExpr) + go n_val_args (Cast rhs co) + = do { (is_exp, floats, rhs') <- go n_val_args rhs + ; return (is_exp, floats, Cast rhs' co) } + go n_val_args (App fun (Type ty)) + = do { (is_exp, floats, rhs') <- go n_val_args fun + ; return (is_exp, floats, App rhs' (Type ty)) } + go n_val_args (App fun arg) + = do { (is_exp, floats1, fun') <- go (n_val_args+1) fun + ; case is_exp of + False -> return (False, emptyLetFloats, App fun arg) + True -> do { (floats2, arg') <- makeTrivial mode top_lvl occ arg + ; return (True, floats1 `addLetFlts` floats2, App fun' arg') } } + go n_val_args (Var fun) + = return (is_exp, emptyLetFloats, Var fun) + where + is_exp = isExpandableApp fun n_val_args -- The fun is a constructor or PAP + -- See Note [CONLIKE
pragma] in BasicTypes + -- The definition of is_exp should match that in + -- OccurAnal.occAnalApp + + go n_val_args (Tick t rhs) + -- We want to be able to float bindings past this + -- tick. Non-scoping ticks don't care. + | tickishScoped t == NoScope + = do { (is_exp, floats, rhs') <- go n_val_args rhs + ; return (is_exp, floats, Tick t rhs') } + + -- On the other hand, for scoping ticks we need to be able to + -- copy them on the floats, which in turn is only allowed if + -- we can obtain non-counting ticks. + | (not (tickishCounts t) || tickishCanSplit t) + = do { (is_exp, floats, rhs') <- go n_val_args rhs + ; let tickIt (id, expr) = (id, mkTick (mkNoCount t) expr) + floats' = mapLetFloats floats tickIt + ; return (is_exp, floats', Tick t rhs') } + + go _ other + = return (False, emptyLetFloats, other) + +{- +Note [Float coercions] +~~~~~~~~~~~~~~~~~~~~~~ +When we find the binding + x = e `cast` co +we'd like to transform it to + x' = e + x = x' `cast` co -- A trivial binding +There's a chance that e will be a constructor application or function, or something +like that, so moving the coercion to the usage site may well cancel the coercions +and lead to further optimisation. Example: + + data family T a :: * + data instance T Int = T Int + + foo :: Int -> Int -> Int + foo m n = ... + where + x = T m + go 0 = 0 + go n = case x of { T m -> go (n-m) } + -- This case should optimise + +Note [Preserve strictness when floating coercions] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +In the Note [Float coercions] transformation, keep the strictness info. +Eg + f = e `cast` co -- f has strictness SSL +When we transform to + f' = e -- f' also has strictness SSL + f = f' `cast` co -- f still has strictness SSL + +It's not wrong to drop it on the floor, but better to keep it. + +Note [Float coercions (unlifted)] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +BUT don't do [Float coercions] if 'e' has an unlifted type.
+This *can* happen: + + foo :: Int = (error (# Int,Int #) "urk") + `cast` CoUnsafe (# Int,Int #) Int + +If we do the makeTrivial thing to the error call, we'll get + foo = case error (# Int,Int #) "urk" of v -> v `cast` ... +But 'v' isn't in scope! + +These strange casts can happen as a result of case-of-case + bar = case (case x of { T -> (# 2,3 #); F -> error "urk" }) of + (# p,q #) -> p+q +-} + +makeTrivialArg :: SimplMode -> ArgSpec -> SimplM (LetFloats, ArgSpec) +makeTrivialArg mode (ValArg e) + = do { (floats, e') <- makeTrivial mode NotTopLevel (fsLit "arg") e + ; return (floats, ValArg e') } +makeTrivialArg _ arg + = return (emptyLetFloats, arg) -- CastBy, TyArg + +makeTrivial :: SimplMode -> TopLevelFlag + -> FastString -- ^ A "friendly name" to build the new binder from + -> OutExpr -- ^ This expression satisfies the let/app invariant + -> SimplM (LetFloats, OutExpr) +-- Binds the expression to a variable, if it's not trivial, returning the variable +makeTrivial mode top_lvl context expr + = makeTrivialWithInfo mode top_lvl context vanillaIdInfo expr + +makeTrivialWithInfo :: SimplMode -> TopLevelFlag + -> FastString -- ^ a "friendly name" to build the new binder from + -> IdInfo + -> OutExpr -- ^ This expression satisfies the let/app invariant + -> SimplM (LetFloats, OutExpr) +-- Propagate strictness and demand info to the new binder +-- Note [Preserve strictness when floating coercions] +-- Returned SimplEnv has same substitution as incoming one +makeTrivialWithInfo mode top_lvl occ_fs info expr + | exprIsTrivial expr -- Already trivial + || not (bindingOk top_lvl expr expr_ty) -- Cannot trivialise + -- See Note [Cannot trivialise] + = return (emptyLetFloats, expr) + + | otherwise + = do { (floats, expr1) <- prepareRhs mode top_lvl occ_fs info expr + ; if exprIsTrivial expr1 -- See Note [Trivial after prepareRhs] + then return (floats, expr1) + else do + { uniq <- getUniqueM + ; let name = mkSystemVarName uniq occ_fs + var = mkLocalIdWithInfo name expr_ty
info + + -- Now something very like completeBind, + -- but without the postInlineUnconditionally part + ; (arity, is_bot, expr2) <- tryEtaExpandRhs mode var expr1 + ; unf <- mkLetUnfolding (sm_dflags mode) top_lvl InlineRhs var expr2 + + ; let final_id = addLetBndrInfo var arity is_bot unf + bind = NonRec final_id expr2 + + ; return ( floats `addLetFlts` unitLetFloat bind, Var final_id ) }} + where + expr_ty = exprType expr + +bindingOk :: TopLevelFlag -> CoreExpr -> Type -> Bool +-- True iff we can have a binding of this expression at this level +-- Precondition: the type is the type of the expression +bindingOk top_lvl expr expr_ty + | isTopLevel top_lvl = exprIsTopLevelBindable expr expr_ty + | otherwise = True + +{- Note [Trivial after prepareRhs] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +If we call makeTrivial on (e |> co), the recursive use of prepareRhs +may leave us with + { a1 = e } and (a1 |> co) +Now the latter is trivial, so we don't want to let-bind it. + +Note [Cannot trivialise] +~~~~~~~~~~~~~~~~~~~~~~~~ +Consider: + f :: Int -> Addr# + + foo :: Bar + foo = Bar (f 3) + +Then we can't ANF-ise foo, even though we'd like to, because +we can't make a top-level binding for the Addr# (f 3). And if +so we don't want to turn it into + foo = let x = f 3 in Bar x +because we'll just end up inlining x back, and that makes the +simplifier loop. Better not to ANF-ise it at all. + +Literal strings are an exception. + + foo = Ptr "blob"# + +We want to turn this into: + + foo1 = "blob"# + foo = Ptr foo1 + +See Note [Core top-level string literals] in GHC.Core.
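+
+For example, the overall effect of makeTrivial on a non-trivial argument
+is, schematically (the binder's name is built from the FastString supplied
+by the caller, e.g. "arg" in makeTrivialArg):
+
+    f (g x)   ==>   floats: arg = g x
+                    result: f arg
+
+provided (g x) is not already trivial and bindingOk holds at this level.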
+ +************************************************************************ +* * + Completing a lazy binding +* * +************************************************************************ + +completeBind + * deals only with Ids, not TyVars + * takes an already-simplified binder and RHS + * is used for both recursive and non-recursive bindings + * is used for both top-level and non-top-level bindings + +It does the following: + - tries discarding a dead binding + - tries PostInlineUnconditionally + - add unfolding [this is the only place we add an unfolding] + - add arity + +It does *not* attempt to do let-to-case. Why? Because it is used for + - top-level bindings (when let-to-case is impossible) + - many situations where the "rhs" is known to be a WHNF + (so let-to-case is inappropriate). + +Nor does it do the atomic-argument thing +-} + +completeBind :: SimplEnv + -> TopLevelFlag -- Flag stuck into unfolding + -> MaybeJoinCont -- Required only for join point + -> InId -- Old binder + -> OutId -> OutExpr -- New binder and RHS + -> SimplM (SimplFloats, SimplEnv) +-- completeBind may choose to do its work +-- * by extending the substitution (e.g. let x = y in ...) 
+-- * or by adding to the floats in the envt +-- +-- Binder /can/ be a JoinId +-- Precondition: rhs obeys the let/app invariant +completeBind env top_lvl mb_cont old_bndr new_bndr new_rhs + | isCoVar old_bndr + = case new_rhs of + Coercion co -> return (emptyFloats env, extendCvSubst env old_bndr co) + _ -> return (mkFloatBind env (NonRec new_bndr new_rhs)) + + | otherwise + = ASSERT( isId new_bndr ) + do { let old_info = idInfo old_bndr + old_unf = unfoldingInfo old_info + occ_info = occInfo old_info + + -- Do eta-expansion on the RHS of the binding + -- See Note [Eta-expanding at let bindings] in GHC.Core.Op.Simplify.Utils + ; (new_arity, is_bot, final_rhs) <- tryEtaExpandRhs (getMode env) + new_bndr new_rhs + + -- Simplify the unfolding + ; new_unfolding <- simplLetUnfolding env top_lvl mb_cont old_bndr + final_rhs (idType new_bndr) old_unf + + ; let final_bndr = addLetBndrInfo new_bndr new_arity is_bot new_unfolding + -- See Note [In-scope set as a substitution] + + ; if postInlineUnconditionally env top_lvl final_bndr occ_info final_rhs + + then -- Inline and discard the binding + do { tick (PostInlineUnconditionally old_bndr) + ; return ( emptyFloats env + , extendIdSubst env old_bndr $ + DoneEx final_rhs (isJoinId_maybe new_bndr)) } + -- Use the substitution to make quite, quite sure that the + -- substitution will happen, since we are going to discard the binding + + else -- Keep the binding + -- pprTrace "Binding" (ppr final_bndr <+> ppr new_unfolding) $ + return (mkFloatBind env (NonRec final_bndr final_rhs)) } + +addLetBndrInfo :: OutId -> Arity -> Bool -> Unfolding -> OutId +addLetBndrInfo new_bndr new_arity is_bot new_unf + = new_bndr `setIdInfo` info5 + where + info1 = idInfo new_bndr `setArityInfo` new_arity + + -- Unfolding info: Note [Setting the new unfolding] + info2 = info1 `setUnfoldingInfo` new_unf + + -- Demand info: Note [Setting the demand info] + -- We also have to nuke demand info if for some reason + -- eta-expansion *reduces* the arity 
of the binding to less + -- than that of the strictness sig. This can happen: see Note [Arity decrease]. + info3 | isEvaldUnfolding new_unf + || (case strictnessInfo info2 of + StrictSig dmd_ty -> new_arity < dmdTypeDepth dmd_ty) + = zapDemandInfo info2 `orElse` info2 + | otherwise + = info2 + + -- Bottoming bindings: see Note [Bottoming bindings] + info4 | is_bot = info3 + `setStrictnessInfo` + mkClosedStrictSig (replicate new_arity topDmd) botDiv + `setCprInfo` mkCprSig new_arity botCpr + | otherwise = info3 + + -- Zap call arity info. We have used it by now (via + -- `tryEtaExpandRhs`), and the simplifier can invalidate this + -- information, leading to broken code later (e.g. #13479) + info5 = zapCallArityInfo info4 + + +{- Note [Arity decrease] +~~~~~~~~~~~~~~~~~~~~~~~~ +Generally speaking the arity of a binding should not decrease. But it *can* +legitimately happen because of RULES. Eg + f = g Int +where g has arity 2, will have arity 2. But if there's a rewrite rule + g Int --> h +where h has arity 1, then f's arity will decrease. Here's a real-life example, +which is in the output of Specialise: + + Rec { + $dm {Arity 2} = \d.\x. op d + {-# RULES forall d. $dm Int d = $s$dm #-} + + dInt = MkD .... opInt ... + opInt {Arity 1} = $dm dInt + + $s$dm {Arity 0} = \x. op dInt } + +Here opInt has arity 1; but when we apply the rule its arity drops to 0. +That's why Specialise goes to a little trouble to pin the right arity +on specialised functions too. + +Note [Bottoming bindings] +~~~~~~~~~~~~~~~~~~~~~~~~~ +Suppose we have + let x = error "urk" + in ...(case x of <alts>)... +or + let f = \x. error (x ++ "urk") + in ...(case f "foo" of <alts>)... + +Then we'd like to drop the dead <alts> immediately. So it's good to +propagate the info that x's RHS is bottom to x's IdInfo as rapidly as +possible. 
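+
+Concretely, once tryEtaExpandRhs reports is_bot, addLetBndrInfo (above)
+gives the binder a bottoming signature. E.g. for
+    x = error "urk"
+x gets strictness (mkClosedStrictSig [] botDiv) and CPR info
+(mkCprSig 0 botCpr), which is what lets the dead <alts> be discarded.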
+
+We use tryEtaExpandRhs on every binding, and it turns out that the
+arity computation it performs (via GHC.Core.Arity.findRhsArity) already
+does a simple bottoming-expression analysis. So all we need to do
+is propagate that info to the binder's IdInfo.
+
+This showed up in #12150; see comment:16.
+
+Note [Setting the demand info]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+If the unfolding is a value, the demand info may
+go pear-shaped, so we nuke it. Example:
+ let x = (a,b) in
+ case x of (p,q) -> h p q x
+Here x is certainly demanded. But after we've nuked
+the case, we'll get just
+ let x = (a,b) in h a b x
+and now x is not demanded (I'm assuming h is lazy)
+This really happens. Similarly
+ let f = \x -> e in ...f..f...
+After inlining f at some of its call sites the original binding may
+(for example) be no longer strictly demanded.
+The solution here is a bit ad hoc...
+
+
+************************************************************************
+* *
+\subsection[Simplify-simplExpr]{The main function: simplExpr}
+* *
+************************************************************************
+
+The reason for this OutExprStuff stuff is that we want to float *after*
+simplifying a RHS, not before. If we do so naively we get quadratic
+behaviour as things float out.
+
+To see why it's important to do it after, consider this (real) example:
+
+ let t = f x
+ in fst t
+==>
+ let t = let a = e1
+ b = e2
+ in (a,b)
+ in fst t
+==>
+ let a = e1
+ b = e2
+ t = (a,b)
+ in
+ a -- Can't inline a this round, cos it appears twice
+==>
+ e1
+
+Each of the ==> steps is a round of simplification. We'd save a
+whole round if we float first. This can cascade. Consider
+
+ let f = g d
+ in \x -> ...f...
+==>
+ let f = let d1 = ..d.. in \y -> e
+ in \x -> ...f...
+==>
+ let d1 = ..d..
+ in \x -> ...(\y ->e)...
+
+Only in this second round can the \y be applied, and it
+might do the same again. 
+-}
+
+simplExpr :: SimplEnv -> CoreExpr -> SimplM CoreExpr
+simplExpr env (Type ty)
+ = do { ty' <- simplType env ty -- See Note [Avoiding space leaks in OutType]
+ ; return (Type ty') }
+
+simplExpr env expr
+ = simplExprC env expr (mkBoringStop expr_out_ty)
+ where
+ expr_out_ty :: OutType
+ expr_out_ty = substTy env (exprType expr)
+ -- NB: Since 'expr' is term-valued, not (Type ty), this call
+ -- to exprType will succeed. exprType fails on (Type ty).
+
+simplExprC :: SimplEnv
+ -> InExpr -- A term-valued expression, never (Type ty)
+ -> SimplCont
+ -> SimplM OutExpr
+ -- Simplify an expression, given a continuation
+simplExprC env expr cont
+ = -- pprTrace "simplExprC" (ppr expr $$ ppr cont {- $$ ppr (seIdSubst env) -} $$ ppr (seLetFloats env) ) $
+ do { (floats, expr') <- simplExprF env expr cont
+ ; -- pprTrace "simplExprC ret" (ppr expr $$ ppr expr') $
+ -- pprTrace "simplExprC ret3" (ppr (seInScope env')) $
+ -- pprTrace "simplExprC ret4" (ppr (seLetFloats env')) $
+ return (wrapFloats floats expr') }
+
+--------------------------------------------------
+simplExprF :: SimplEnv
+ -> InExpr -- A term-valued expression, never (Type ty)
+ -> SimplCont
+ -> SimplM (SimplFloats, OutExpr)
+
+simplExprF env e cont
+ = {- pprTrace "simplExprF" (vcat
+ [ ppr e
+ , text "cont =" <+> ppr cont
+ , text "inscope =" <+> ppr (seInScope env)
+ , text "tvsubst =" <+> ppr (seTvSubst env)
+ , text "idsubst =" <+> ppr (seIdSubst env)
+ , text "cvsubst =" <+> ppr (seCvSubst env)
+ ]) $ -}
+ simplExprF1 env e cont
+
+simplExprF1 :: SimplEnv -> InExpr -> SimplCont
+ -> SimplM (SimplFloats, OutExpr)
+
+simplExprF1 _ (Type ty) _
+ = pprPanic "simplExprF: type" (ppr ty)
+ -- simplExprF deals only with term-valued expressions
+ -- The (Type ty) case is handled separately by simplExpr
+ -- and by the other callers of simplExprF
+
+simplExprF1 env (Var v) cont = {-#SCC "simplIdF" #-} simplIdF env v cont
+simplExprF1 env (Lit lit) cont = {-#SCC "rebuild" #-} rebuild env (Lit lit) cont 
+simplExprF1 env (Tick t expr) cont = {-#SCC "simplTick" #-} simplTick env t expr cont +simplExprF1 env (Cast body co) cont = {-#SCC "simplCast" #-} simplCast env body co cont +simplExprF1 env (Coercion co) cont = {-#SCC "simplCoercionF" #-} simplCoercionF env co cont + +simplExprF1 env (App fun arg) cont + = {-#SCC "simplExprF1-App" #-} case arg of + Type ty -> do { -- The argument type will (almost) certainly be used + -- in the output program, so just force it now. + -- See Note [Avoiding space leaks in OutType] + arg' <- simplType env ty + + -- But use substTy, not simplType, to avoid forcing + -- the hole type; it will likely not be needed. + -- See Note [The hole type in ApplyToTy] + ; let hole' = substTy env (exprType fun) + + ; simplExprF env fun $ + ApplyToTy { sc_arg_ty = arg' + , sc_hole_ty = hole' + , sc_cont = cont } } + _ -> simplExprF env fun $ + ApplyToVal { sc_arg = arg, sc_env = env + , sc_dup = NoDup, sc_cont = cont } + +simplExprF1 env expr@(Lam {}) cont + = {-#SCC "simplExprF1-Lam" #-} + simplLam env zapped_bndrs body cont + -- The main issue here is under-saturated lambdas + -- (\x1. \x2. e) arg1 + -- Here x1 might have "occurs-once" occ-info, because occ-info + -- is computed assuming that a group of lambdas is applied + -- all at once. 
If there are too few args, we must zap the
+ -- occ-info, UNLESS the remaining binders are one-shot
+ where
+ (bndrs, body) = collectBinders expr
+ zapped_bndrs | need_to_zap = map zap bndrs
+ | otherwise = bndrs
+
+ need_to_zap = any zappable_bndr (drop n_args bndrs)
+ n_args = countArgs cont
+ -- NB: countArgs counts all the args (incl type args)
+ -- and likewise drop counts all binders (incl type lambdas)
+
+ zappable_bndr b = isId b && not (isOneShotBndr b)
+ zap b | isTyVar b = b
+ | otherwise = zapLamIdInfo b
+
+simplExprF1 env (Case scrut bndr _ alts) cont
+ = {-#SCC "simplExprF1-Case" #-}
+ simplExprF env scrut (Select { sc_dup = NoDup, sc_bndr = bndr
+ , sc_alts = alts
+ , sc_env = env, sc_cont = cont })
+
+simplExprF1 env (Let (Rec pairs) body) cont
+ | Just pairs' <- joinPointBindings_maybe pairs
+ = {-#SCC "simplRecJoinPoint" #-} simplRecJoinPoint env pairs' body cont
+
+ | otherwise
+ = {-#SCC "simplRecE" #-} simplRecE env pairs body cont
+
+simplExprF1 env (Let (NonRec bndr rhs) body) cont
+ | Type ty <- rhs -- First deal with type lets (let a = Type ty in e)
+ = {-#SCC "simplExprF1-NonRecLet-Type" #-}
+ ASSERT( isTyVar bndr )
+ do { ty' <- simplType env ty
+ ; simplExprF (extendTvSubst env bndr ty') body cont }
+
+ | Just (bndr', rhs') <- joinPointBinding_maybe bndr rhs
+ = {-#SCC "simplNonRecJoinPoint" #-} simplNonRecJoinPoint env bndr' rhs' body cont
+
+ | otherwise
+ = {-#SCC "simplNonRecE" #-} simplNonRecE env bndr (rhs, env) ([], body) cont
+
+{- Note [Avoiding space leaks in OutType]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Since the simplifier is run for multiple iterations, we need to ensure
+that any thunks in the output of one simplifier iteration are forced
+by the evaluation of the next simplifier iteration. Otherwise we may
+retain multiple copies of the Core program and leak a terrible amount
+of memory (as in #13426). 
+ +The simplifier is naturally strict in the entire "Expr part" of the +input Core program, because any expression may contain binders, which +we must find in order to extend the SimplEnv accordingly. But types +do not contain binders and so it is tempting to write things like + + simplExpr env (Type ty) = return (Type (substTy env ty)) -- Bad! + +This is Bad because the result includes a thunk (substTy env ty) which +retains a reference to the whole simplifier environment; and the next +simplifier iteration will not force this thunk either, because the +line above is not strict in ty. + +So instead our strategy is for the simplifier to fully evaluate +OutTypes when it emits them into the output Core program, for example + + simplExpr env (Type ty) = do { ty' <- simplType env ty -- Good + ; return (Type ty') } + +where the only difference from above is that simplType calls seqType +on the result of substTy. + +However, SimplCont can also contain OutTypes and it's not necessarily +a good idea to force types on the way in to SimplCont, because they +may end up not being used and forcing them could be a lot of wasted +work. T5631 is a good example of this. + +- For ApplyToTy's sc_arg_ty, we force the type on the way in because + the type will almost certainly appear as a type argument in the + output program. + +- For the hole types in Stop and ApplyToTy, we force the type when we + emit it into the output program, after obtaining it from + contResultType. (The hole type in ApplyToTy is only directly used + to form the result type in a new Stop continuation.) +-} + +--------------------------------- +-- Simplify a join point, adding the context. +-- Context goes *inside* the lambdas. IOW, if the join point has arity n, we do: +-- \x1 .. xn -> e => \x1 .. xn -> E[e] +-- Note that we need the arity of the join point, since e may be a lambda +-- (though this is unlikely). See Note [Join points and case-of-case]. 
+simplJoinRhs :: SimplEnv -> InId -> InExpr -> SimplCont + -> SimplM OutExpr +simplJoinRhs env bndr expr cont + | Just arity <- isJoinId_maybe bndr + = do { let (join_bndrs, join_body) = collectNBinders arity expr + ; (env', join_bndrs') <- simplLamBndrs env join_bndrs + ; join_body' <- simplExprC env' join_body cont + ; return $ mkLams join_bndrs' join_body' } + + | otherwise + = pprPanic "simplJoinRhs" (ppr bndr) + +--------------------------------- +simplType :: SimplEnv -> InType -> SimplM OutType + -- Kept monadic just so we can do the seqType + -- See Note [Avoiding space leaks in OutType] +simplType env ty + = -- pprTrace "simplType" (ppr ty $$ ppr (seTvSubst env)) $ + seqType new_ty `seq` return new_ty + where + new_ty = substTy env ty + +--------------------------------- +simplCoercionF :: SimplEnv -> InCoercion -> SimplCont + -> SimplM (SimplFloats, OutExpr) +simplCoercionF env co cont + = do { co' <- simplCoercion env co + ; rebuild env (Coercion co') cont } + +simplCoercion :: SimplEnv -> InCoercion -> SimplM OutCoercion +simplCoercion env co + = do { dflags <- getDynFlags + ; let opt_co = optCoercion dflags (getTCvSubst env) co + ; seqCo opt_co `seq` return opt_co } + +----------------------------------- +-- | Push a TickIt context outwards past applications and cases, as +-- long as this is a non-scoping tick, to let case and application +-- optimisations apply. + +simplTick :: SimplEnv -> Tickish Id -> InExpr -> SimplCont + -> SimplM (SimplFloats, OutExpr) +simplTick env tickish expr cont + -- A scoped tick turns into a continuation, so that we can spot + -- (scc t (\x . e)) in simplLam and eliminate the scc. If we didn't do + -- it this way, then it would take two passes of the simplifier to + -- reduce ((scc t (\x . e)) e'). + -- NB, don't do this with counting ticks, because if the expr is + -- bottom, then rebuildCall will discard the continuation. 
+
+-- XXX: we cannot do this, because the simplifier assumes that
+-- the context can be pushed into a case with a single branch. e.g.
+-- scc<f> case expensive of p -> e
+-- becomes
+-- case expensive of p -> scc<f> e
+--
+-- So I'm disabling this for now. It just means we will do more
+-- simplifier iterations than necessary in some cases.
+
+-- | tickishScoped tickish && not (tickishCounts tickish)
+-- = simplExprF env expr (TickIt tickish cont)
+
+ -- For unscoped or soft-scoped ticks, we are allowed to float in new
+ -- cost, so we simply push the continuation inside the tick. This
+ -- has the effect of moving the tick to the outside of a case or
+ -- application context, allowing the normal case and application
+ -- optimisations to fire.
+ | tickish `tickishScopesLike` SoftScope
+ = do { (floats, expr') <- simplExprF env expr cont
+ ; return (floats, mkTick tickish expr')
+ }
+
+ -- Push tick inside if the context looks like this will allow us to
+ -- do a case-of-case - see Note [case-of-scc-of-case]
+ | Select {} <- cont, Just expr' <- push_tick_inside
+ = simplExprF env expr' cont
+
+ -- We don't want to move the tick, but we might still want to allow
+ -- floats to pass through with appropriate wrapping (or not, see
+ -- wrap_floats below)
+ --- | not (tickishCounts tickish) || tickishCanSplit tickish
+ -- = wrap_floats
+
+ | otherwise
+ = no_floating_past_tick
+
+ where
+
+ -- Try to push tick inside a case, see Note [case-of-scc-of-case].
+ push_tick_inside =
+ case expr0 of
+ Case scrut bndr ty alts
+ -> Just $ Case (tickScrut scrut) bndr ty (map tickAlt alts)
+ _other -> Nothing
+ where (ticks, expr0) = stripTicksTop movable (Tick tickish expr)
+ movable t = not (tickishCounts t) ||
+ t `tickishScopesLike` NoScope ||
+ tickishCanSplit t
+ tickScrut e = foldr mkTick e ticks
+ -- Alternatives get annotated with all ticks that scope in some way,
+ -- but we don't want to count entries. 
+ tickAlt (c,bs,e) = (c,bs, foldr mkTick e ts_scope) + ts_scope = map mkNoCount $ + filter (not . (`tickishScopesLike` NoScope)) ticks + + no_floating_past_tick = + do { let (inc,outc) = splitCont cont + ; (floats, expr1) <- simplExprF env expr inc + ; let expr2 = wrapFloats floats expr1 + tickish' = simplTickish env tickish + ; rebuild env (mkTick tickish' expr2) outc + } + +-- Alternative version that wraps outgoing floats with the tick. This +-- results in ticks being duplicated, as we don't make any attempt to +-- eliminate the tick if we re-inline the binding (because the tick +-- semantics allows unrestricted inlining of HNFs), so I'm not doing +-- this any more. FloatOut will catch any real opportunities for +-- floating. +-- +-- wrap_floats = +-- do { let (inc,outc) = splitCont cont +-- ; (env', expr') <- simplExprF (zapFloats env) expr inc +-- ; let tickish' = simplTickish env tickish +-- ; let wrap_float (b,rhs) = (zapIdStrictness (setIdArity b 0), +-- mkTick (mkNoCount tickish') rhs) +-- -- when wrapping a float with mkTick, we better zap the Id's +-- -- strictness info and arity, because it might be wrong now. +-- ; let env'' = addFloats env (mapFloats env' wrap_float) +-- ; rebuild env'' expr' (TickIt tickish' outc) +-- } + + + simplTickish env tickish + | Breakpoint n ids <- tickish + = Breakpoint n (map (getDoneId . 
substId env) ids)
+ | otherwise = tickish
+
+ -- Push type application and coercion inside a tick
+ splitCont :: SimplCont -> (SimplCont, SimplCont)
+ splitCont cont@(ApplyToTy { sc_cont = tail }) = (cont { sc_cont = inc }, outc)
+ where (inc,outc) = splitCont tail
+ splitCont (CastIt co c) = (CastIt co inc, outc)
+ where (inc,outc) = splitCont c
+ splitCont other = (mkBoringStop (contHoleType other), other)
+
+ getDoneId (DoneId id) = id
+ getDoneId (DoneEx e _) = getIdFromTrivialExpr e -- Note [substTickish] in GHC.Core.Subst
+ getDoneId other = pprPanic "getDoneId" (ppr other)
+
+-- Note [case-of-scc-of-case]
+-- It's pretty important to be able to transform case-of-case when
+-- there's an SCC in the way. For example, the following comes up
+-- in nofib/real/compress/Encode.hs:
+--
+-- case scctick<code_string.r1>
+-- case $wcode_string_r13s wild_XC w1_s137 w2_s138 l_aje
+-- of _ { (# ww1_s13f, ww2_s13g, ww3_s13h #) ->
+-- (ww1_s13f, ww2_s13g, ww3_s13h)
+-- }
+-- of _ { (ww_s12Y, ww1_s12Z, ww2_s130) ->
+-- tick<code_string.f1>
+-- (ww_s12Y,
+-- ww1_s12Z,
+-- PTTrees.PT
+-- @ GHC.Types.Char @ GHC.Types.Int wild2_Xj ww2_s130 r_ajf)
+-- }
+--
+-- We really want this case-of-case to fire, because then the 3-tuple
+-- will go away (indeed, the CPR optimisation is relying on this
+-- happening). But the scctick is in the way - we need to push it
+-- inside to expose the case-of-case. So we perform this
+-- transformation on the inner case:
+--
+-- scctick c (case e of { p1 -> e1; ...; pn -> en })
+-- ==>
+-- case (scctick c e) of { p1 -> scc c e1; ...; pn -> scc c en }
+--
+-- So we've moved a constant amount of work out of the scc to expose
+-- the case. We only do this when the continuation is interesting:
+-- for now, it has to be another Case (maybe generalise this later). 
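The scctick-pushing transformation above can be sketched in isolation. The following is a minimal standalone model, not GHC's real Core AST: `Tick` stands in for scctick, and `pushTickInside` performs exactly the rewrite from the Note, moving the tick onto the scrutinee and annotating every alternative so scoping is preserved. Since ticks affect only cost attribution, a tick-ignoring evaluator lets us check that the value of the expression is unchanged.

```haskell
-- Toy model of the case-of-scc-of-case push (not GHC's Core types).
module Main where

data Expr
  = Lit Int
  | Tick String Expr          -- a cost-centre annotation (scctick/scc)
  | Case Expr [(Int, Expr)]   -- scrutinee, (literal pattern, rhs) alts
  deriving (Eq, Show)

-- The rewrite from the Note:
--   scctick c (case e of alts) ==> case (scctick c e) of { p -> scc c rhs }
pushTickInside :: Expr -> Expr
pushTickInside (Tick t (Case scrut alts))
  = Case (Tick t scrut) [ (p, Tick t rhs) | (p, rhs) <- alts ]
pushTickInside e = e

-- Ticks do not change the value, only cost attribution, so evaluation
-- ignores them; this lets us check the rewrite is semantics-preserving.
eval :: Expr -> Int
eval (Lit n)           = n
eval (Tick _ e)        = eval e
eval (Case scrut alts) = case lookup (eval scrut) alts of
                           Just rhs -> eval rhs
                           Nothing  -> error "no matching alternative"

main :: IO ()
main = do
  let before = Tick "f" (Case (Lit 1) [(0, Lit 10), (1, Lit 20)])
      after  = pushTickInside before
  print (eval before, eval after)  -- both evaluate to 20
```

In the real simplifier the payoff is that the outer `Case` continuation can now meet the inner case directly, letting case-of-case fire.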
+ +{- +************************************************************************ +* * +\subsection{The main rebuilder} +* * +************************************************************************ +-} + +rebuild :: SimplEnv -> OutExpr -> SimplCont -> SimplM (SimplFloats, OutExpr) +-- At this point the substitution in the SimplEnv should be irrelevant; +-- only the in-scope set matters +rebuild env expr cont + = case cont of + Stop {} -> return (emptyFloats env, expr) + TickIt t cont -> rebuild env (mkTick t expr) cont + CastIt co cont -> rebuild env (mkCast expr co) cont + -- NB: mkCast implements the (Coercion co |> g) optimisation + + Select { sc_bndr = bndr, sc_alts = alts, sc_env = se, sc_cont = cont } + -> rebuildCase (se `setInScopeFromE` env) expr bndr alts cont + + StrictArg { sc_fun = fun, sc_cont = cont } + -> rebuildCall env (fun `addValArgTo` expr) cont + StrictBind { sc_bndr = b, sc_bndrs = bs, sc_body = body + , sc_env = se, sc_cont = cont } + -> do { (floats1, env') <- simplNonRecX (se `setInScopeFromE` env) b expr + -- expr satisfies let/app since it started life + -- in a call to simplNonRecE + ; (floats2, expr') <- simplLam env' bs body cont + ; return (floats1 `addFloats` floats2, expr') } + + ApplyToTy { sc_arg_ty = ty, sc_cont = cont} + -> rebuild env (App expr (Type ty)) cont + + ApplyToVal { sc_arg = arg, sc_env = se, sc_dup = dup_flag, sc_cont = cont} + -- See Note [Avoid redundant simplification] + -> do { (_, _, arg') <- simplArg env dup_flag se arg + ; rebuild env (App expr arg') cont } + +{- +************************************************************************ +* * +\subsection{Lambdas} +* * +************************************************************************ +-} + +{- Note [Optimising reflexivity] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +It's important (for compiler performance) to get rid of reflexivity as soon +as it appears. See #11735, #14737, and #15019. 
+
+In particular, we want to behave well on
+
+ * e |> co1 |> co2
+ where the two happen to cancel out entirely. That is quite common;
+ e.g. a newtype wrapping and unwrapping cancel.
+
+
+ * (f |> co) @t1 @t2 ... @tn x1 .. xm
+ Here we will use pushCoTyArg and pushCoValArg successively, which
+ build up NthCo stacks. Silly to do that if co is reflexive.
+
+However, we don't want to call isReflexiveCo too much, because it uses
+type equality which is expensive on big types (#14737 comment:7).
+
+A good compromise (determined experimentally) seems to be to call
+isReflexiveCo
+ * when composing casts, and
+ * at the end
+
+In investigating this I saw missed opportunities for on-the-fly
+coercion shrinkage. See #15090.
+-}
+
+
+simplCast :: SimplEnv -> InExpr -> Coercion -> SimplCont
+ -> SimplM (SimplFloats, OutExpr)
+simplCast env body co0 cont0
+ = do { co1 <- {-#SCC "simplCast-simplCoercion" #-} simplCoercion env co0
+ ; cont1 <- {-#SCC "simplCast-addCoerce" #-}
+ if isReflCo co1
+ then return cont0 -- See Note [Optimising reflexivity]
+ else addCoerce co1 cont0
+ ; {-#SCC "simplCast-simplExprF" #-} simplExprF env body cont1 }
+ where
+ -- If the first parameter is MRefl, then simplifying revealed a
+ -- reflexive coercion. Omit.
+ addCoerceM :: MOutCoercion -> SimplCont -> SimplM SimplCont
+ addCoerceM MRefl cont = return cont
+ addCoerceM (MCo co) cont = addCoerce co cont
+
+ addCoerce :: OutCoercion -> SimplCont -> SimplM SimplCont
+ addCoerce co1 (CastIt co2 cont) -- See Note [Optimising reflexivity]
+ | isReflexiveCo co' = return cont
+ | otherwise = addCoerce co' cont
+ where
+ co' = mkTransCo co1 co2
+
+ addCoerce co cont@(ApplyToTy { sc_arg_ty = arg_ty, sc_cont = tail })
+ | Just (arg_ty', m_co') <- pushCoTyArg co arg_ty
+ -- N.B. As mentioned in Note [The hole type in ApplyToTy] this is
+ -- only needed by `sc_hole_ty` which is often not forced. 
+ -- Consequently it is worthwhile using a lazy pattern match here to + -- avoid unnecessary coercionKind evaluations. + , let hole_ty = coercionLKind co + = {-#SCC "addCoerce-pushCoTyArg" #-} + do { tail' <- addCoerceM m_co' tail + ; return (cont { sc_arg_ty = arg_ty' + , sc_hole_ty = hole_ty -- NB! As the cast goes past, the + -- type of the hole changes (#16312) + , sc_cont = tail' }) } + + addCoerce co cont@(ApplyToVal { sc_arg = arg, sc_env = arg_se + , sc_dup = dup, sc_cont = tail }) + | Just (co1, m_co2) <- pushCoValArg co + , let new_ty = coercionRKind co1 + , not (isTypeLevPoly new_ty) -- Without this check, we get a lev-poly arg + -- See Note [Levity polymorphism invariants] in GHC.Core + -- test: typecheck/should_run/EtaExpandLevPoly + = {-#SCC "addCoerce-pushCoValArg" #-} + do { tail' <- addCoerceM m_co2 tail + ; if isReflCo co1 + then return (cont { sc_cont = tail' }) + -- Avoid simplifying if possible; + -- See Note [Avoiding exponential behaviour] + else do + { (dup', arg_se', arg') <- simplArg env dup arg_se arg + -- When we build the ApplyTo we can't mix the OutCoercion + -- 'co' with the InExpr 'arg', so we simplify + -- to make it all consistent. It's a bit messy. + -- But it isn't a common case. 
+ -- Example of use: #995 + ; return (ApplyToVal { sc_arg = mkCast arg' co1 + , sc_env = arg_se' + , sc_dup = dup' + , sc_cont = tail' }) } } + + addCoerce co cont + | isReflexiveCo co = return cont -- Having this at the end makes a huge + -- difference in T12227, for some reason + -- See Note [Optimising reflexivity] + | otherwise = return (CastIt co cont) + +simplArg :: SimplEnv -> DupFlag -> StaticEnv -> CoreExpr + -> SimplM (DupFlag, StaticEnv, OutExpr) +simplArg env dup_flag arg_env arg + | isSimplified dup_flag + = return (dup_flag, arg_env, arg) + | otherwise + = do { arg' <- simplExpr (arg_env `setInScopeFromE` env) arg + ; return (Simplified, zapSubstEnv arg_env, arg') } + +{- +************************************************************************ +* * +\subsection{Lambdas} +* * +************************************************************************ +-} + +simplLam :: SimplEnv -> [InId] -> InExpr -> SimplCont + -> SimplM (SimplFloats, OutExpr) + +simplLam env [] body cont + = simplExprF env body cont + +simplLam env (bndr:bndrs) body (ApplyToTy { sc_arg_ty = arg_ty, sc_cont = cont }) + = do { tick (BetaReduction bndr) + ; simplLam (extendTvSubst env bndr arg_ty) bndrs body cont } + +simplLam env (bndr:bndrs) body (ApplyToVal { sc_arg = arg, sc_env = arg_se + , sc_cont = cont, sc_dup = dup }) + | isSimplified dup -- Don't re-simplify if we've simplified it once + -- See Note [Avoiding exponential behaviour] + = do { tick (BetaReduction bndr) + ; (floats1, env') <- simplNonRecX env zapped_bndr arg + ; (floats2, expr') <- simplLam env' bndrs body cont + ; return (floats1 `addFloats` floats2, expr') } + + | otherwise + = do { tick (BetaReduction bndr) + ; simplNonRecE env zapped_bndr (arg, arg_se) (bndrs, body) cont } + where + zapped_bndr -- See Note [Zap unfolding when beta-reducing] + | isId bndr = zapStableUnfolding bndr + | otherwise = bndr + + -- Discard a non-counting tick on a lambda. 
This may change the + -- cost attribution slightly (moving the allocation of the + -- lambda elsewhere), but we don't care: optimisation changes + -- cost attribution all the time. +simplLam env bndrs body (TickIt tickish cont) + | not (tickishCounts tickish) + = simplLam env bndrs body cont + + -- Not enough args, so there are real lambdas left to put in the result +simplLam env bndrs body cont + = do { (env', bndrs') <- simplLamBndrs env bndrs + ; body' <- simplExpr env' body + ; new_lam <- mkLam env bndrs' body' cont + ; rebuild env' new_lam cont } + +------------- +simplLamBndr :: SimplEnv -> InBndr -> SimplM (SimplEnv, OutBndr) +-- Used for lambda binders. These sometimes have unfoldings added by +-- the worker/wrapper pass that must be preserved, because they can't +-- be reconstructed from context. For example: +-- f x = case x of (a,b) -> fw a b x +-- fw a b x{=(a,b)} = ... +-- The "{=(a,b)}" is an unfolding we can't reconstruct otherwise. +simplLamBndr env bndr + | isId bndr && isFragileUnfolding old_unf -- Special case + = do { (env1, bndr1) <- simplBinder env bndr + ; unf' <- simplStableUnfolding env1 NotTopLevel Nothing bndr + old_unf (idType bndr1) + ; let bndr2 = bndr1 `setIdUnfolding` unf' + ; return (modifyInScope env1 bndr2, bndr2) } + + | otherwise + = simplBinder env bndr -- Normal case + where + old_unf = idUnfolding bndr + +simplLamBndrs :: SimplEnv -> [InBndr] -> SimplM (SimplEnv, [OutBndr]) +simplLamBndrs env bndrs = mapAccumLM simplLamBndr env bndrs + +------------------ +simplNonRecE :: SimplEnv + -> InId -- The binder, always an Id + -- Never a join point + -> (InExpr, SimplEnv) -- Rhs of binding (or arg of lambda) + -> ([InBndr], InExpr) -- Body of the let/lambda + -- \xs.e + -> SimplCont + -> SimplM (SimplFloats, OutExpr) + +-- simplNonRecE is used for +-- * non-top-level non-recursive non-join-point lets in expressions +-- * beta reduction +-- +-- simplNonRec env b (rhs, rhs_se) (bs, body) k +-- = let env in +-- cont< let b = 
rhs_se(rhs) in \bs.body > +-- +-- It deals with strict bindings, via the StrictBind continuation, +-- which may abort the whole process +-- +-- Precondition: rhs satisfies the let/app invariant +-- Note [Core let/app invariant] in GHC.Core +-- +-- The "body" of the binding comes as a pair of ([InId],InExpr) +-- representing a lambda; so we recurse back to simplLam +-- Why? Because of the binder-occ-info-zapping done before +-- the call to simplLam in simplExprF (Lam ...) + +simplNonRecE env bndr (rhs, rhs_se) (bndrs, body) cont + | ASSERT( isId bndr && not (isJoinId bndr) ) True + , Just env' <- preInlineUnconditionally env NotTopLevel bndr rhs rhs_se + = do { tick (PreInlineUnconditionally bndr) + ; -- pprTrace "preInlineUncond" (ppr bndr <+> ppr rhs) $ + simplLam env' bndrs body cont } + + -- Deal with strict bindings + | isStrictId bndr -- Includes coercions + , sm_case_case (getMode env) + = simplExprF (rhs_se `setInScopeFromE` env) rhs + (StrictBind { sc_bndr = bndr, sc_bndrs = bndrs, sc_body = body + , sc_env = env, sc_cont = cont, sc_dup = NoDup }) + + -- Deal with lazy bindings + | otherwise + = ASSERT( not (isTyVar bndr) ) + do { (env1, bndr1) <- simplNonRecBndr env bndr + ; (env2, bndr2) <- addBndrRules env1 bndr bndr1 Nothing + ; (floats1, env3) <- simplLazyBind env2 NotTopLevel NonRecursive bndr bndr2 rhs rhs_se + ; (floats2, expr') <- simplLam env3 bndrs body cont + ; return (floats1 `addFloats` floats2, expr') } + +------------------ +simplRecE :: SimplEnv + -> [(InId, InExpr)] + -> InExpr + -> SimplCont + -> SimplM (SimplFloats, OutExpr) + +-- simplRecE is used for +-- * non-top-level recursive lets in expressions +simplRecE env pairs body cont + = do { let bndrs = map fst pairs + ; MASSERT(all (not . 
isJoinId) bndrs)
+ ; env1 <- simplRecBndrs env bndrs
+ -- NB: bndrs' don't have unfoldings or rules
+ -- We add them as we go down
+ ; (floats1, env2) <- simplRecBind env1 NotTopLevel Nothing pairs
+ ; (floats2, expr') <- simplExprF env2 body cont
+ ; return (floats1 `addFloats` floats2, expr') }
+
+{- Note [Avoiding exponential behaviour]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+One way in which we can get exponential behaviour is if we simplify a
+big expression, and then re-simplify it -- and then this happens in a
+deeply-nested way. So we must be jolly careful about re-simplifying
+an expression. That is why completeNonRecX does not try
+preInlineUnconditionally.
+
+Example:
+ f BIG, where f has a RULE
+Then
+ * We simplify BIG before trying the rule; but the rule does not fire
+ * We inline f = \x. x True
+ * So if we did preInlineUnconditionally we'd re-simplify (BIG True)
+
+However, if BIG has /not/ already been simplified, we'd /like/ to
+simplify BIG True; maybe good things happen. That is why
+
+* simplLam has
+ - a case for (isSimplified dup), which goes via simplNonRecX, and
+ - a case for the un-simplified case, which goes via simplNonRecE
+
+* We go to some efforts to avoid unnecessarily simplifying ApplyToVal,
+ in at least two places
+ - In simplCast/addCoerce, where we check for isReflCo
+ - In rebuildCall we avoid simplifying arguments before we have to
+ (see Note [Trying rewrite rules])
+
+
+Note [Zap unfolding when beta-reducing]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Lambda-bound variables can have stable unfoldings, such as
+ $j = \x. \b{Unf=Just x}. e
+See Note [Case binders and join points] below; the unfolding for b
+lets us optimise e better. However when we beta-reduce it we want to
+revert to using the actual value, otherwise we can end up in the
+stupid situation of
+ let x = blah in
+ let b{Unf=Just x} = y
+ in ...b...
+Here it'd be far better to drop the unfolding and use the actual RHS. 
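The blow-up that Note [Avoiding exponential behaviour] warns about can be made concrete with a toy cost model. This is not GHC code, just a count of "simplifier visits" for a chain of n nested bindings: if every level's body is simplified and then simplified *again* after inlining, the visit count doubles per level; visiting each level once stays linear.

```haskell
-- Toy cost model for re-simplification (an illustration, not GHC's
-- actual accounting).
module Main where

-- Visits when each level re-simplifies its already-simplified body:
-- one visit for this level, plus the body done twice.
resimplify :: Int -> Int
resimplify 0 = 1
resimplify n = 1 + 2 * resimplify (n - 1)

-- Visits when each level is simplified exactly once.
simplifyOnce :: Int -> Int
simplifyOnce 0 = 1
simplifyOnce n = 1 + simplifyOnce (n - 1)

main :: IO ()
main = do
  print (simplifyOnce 10)  -- 11: linear in nesting depth
  print (resimplify 10)    -- 2047: 2^(n+1) - 1, exponential
```

This is why the simplifier tracks `isSimplified` dup-flags and avoids `preInlineUnconditionally` in `completeNonRecX`: already-simplified work must not be redone at each enclosing level.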
+ +************************************************************************ +* * + Join points +* * +********************************************************************* -} + +{- Note [Rules and unfolding for join points] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Suppose we have + + simplExpr (join j x = rhs ) cont + ( {- RULE j (p:ps) = blah -} ) + ( {- StableUnfolding j = blah -} ) + (in blah ) + +Then we will push 'cont' into the rhs of 'j'. But we should *also* push +'cont' into the RHS of + * Any RULEs for j, e.g. generated by SpecConstr + * Any stable unfolding for j, e.g. the result of an INLINE pragma + +Simplifying rules and stable-unfoldings happens a bit after +simplifying the right-hand side, so we remember whether or not it +is a join point, and what 'cont' is, in a value of type MaybeJoinCont + +#13900 was caused by forgetting to push 'cont' into the RHS +of a SpecConstr-generated RULE for a join point. +-} + +type MaybeJoinCont = Maybe SimplCont + -- Nothing => Not a join point + -- Just k => This is a join binding with continuation k + -- See Note [Rules and unfolding for join points] + +simplNonRecJoinPoint :: SimplEnv -> InId -> InExpr + -> InExpr -> SimplCont + -> SimplM (SimplFloats, OutExpr) +simplNonRecJoinPoint env bndr rhs body cont + | ASSERT( isJoinId bndr ) True + , Just env' <- preInlineUnconditionally env NotTopLevel bndr rhs env + = do { tick (PreInlineUnconditionally bndr) + ; simplExprF env' body cont } + + | otherwise + = wrapJoinCont env cont $ \ env cont -> + do { -- We push join_cont into the join RHS and the body; + -- and wrap wrap_cont around the whole thing + ; let res_ty = contResultType cont + ; (env1, bndr1) <- simplNonRecJoinBndr env res_ty bndr + ; (env2, bndr2) <- addBndrRules env1 bndr bndr1 (Just cont) + ; (floats1, env3) <- simplJoinBind env2 cont bndr bndr2 rhs env + ; (floats2, body') <- simplExprF env3 body cont + ; return (floats1 `addFloats` floats2, body') } + + +------------------ +simplRecJoinPoint :: 
SimplEnv -> [(InId, InExpr)] + -> InExpr -> SimplCont + -> SimplM (SimplFloats, OutExpr) +simplRecJoinPoint env pairs body cont + = wrapJoinCont env cont $ \ env cont -> + do { let bndrs = map fst pairs + res_ty = contResultType cont + ; env1 <- simplRecJoinBndrs env res_ty bndrs + -- NB: bndrs' don't have unfoldings or rules + -- We add them as we go down + ; (floats1, env2) <- simplRecBind env1 NotTopLevel (Just cont) pairs + ; (floats2, body') <- simplExprF env2 body cont + ; return (floats1 `addFloats` floats2, body') } + +-------------------- +wrapJoinCont :: SimplEnv -> SimplCont + -> (SimplEnv -> SimplCont -> SimplM (SimplFloats, OutExpr)) + -> SimplM (SimplFloats, OutExpr) +-- Deal with making the continuation duplicable if necessary, +-- and with the no-case-of-case situation. +wrapJoinCont env cont thing_inside + | contIsStop cont -- Common case; no need for fancy footwork + = thing_inside env cont + + | not (sm_case_case (getMode env)) + -- See Note [Join points with -fno-case-of-case] + = do { (floats1, expr1) <- thing_inside env (mkBoringStop (contHoleType cont)) + ; let (floats2, expr2) = wrapJoinFloatsX floats1 expr1 + ; (floats3, expr3) <- rebuild (env `setInScopeFromF` floats2) expr2 cont + ; return (floats2 `addFloats` floats3, expr3) } + + | otherwise + -- Normal case; see Note [Join points and case-of-case] + = do { (floats1, cont') <- mkDupableCont env cont + ; (floats2, result) <- thing_inside (env `setInScopeFromF` floats1) cont' + ; return (floats1 `addFloats` floats2, result) } + + +-------------------- +trimJoinCont :: Id -> Maybe JoinArity -> SimplCont -> SimplCont +-- Drop outer context from join point invocation (jump) +-- See Note [Join points and case-of-case] + +trimJoinCont _ Nothing cont + = cont -- Not a jump +trimJoinCont var (Just arity) cont + = trim arity cont + where + trim 0 cont@(Stop {}) + = cont + trim 0 cont + = mkBoringStop (contResultType cont) + trim n cont@(ApplyToVal { sc_cont = k }) + = cont { sc_cont = trim (n-1) 
k } + trim n cont@(ApplyToTy { sc_cont = k }) + = cont { sc_cont = trim (n-1) k } -- join arity counts types! + trim _ cont + = pprPanic "completeCall" $ ppr var $$ ppr cont + + +{- Note [Join points and case-of-case] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +When we perform the case-of-case transform (or otherwise push continuations +inward), we want to treat join points specially. Since they're always +tail-called and we want to maintain this invariant, we can do this (for any +evaluation context E): + + E[join j = e + in case ... of + A -> jump j 1 + B -> jump j 2 + C -> f 3] + + --> + + join j = E[e] + in case ... of + A -> jump j 1 + B -> jump j 2 + C -> E[f 3] + +As is evident from the example, there are two components to this behavior: + + 1. When entering the RHS of a join point, copy the context inside. + 2. When a join point is invoked, discard the outer context. + +We need to be very careful here to remain consistent---neither part is +optional! + +We need to make the continuation E duplicable (since we are duplicating it) +with mkDupableCont. + + +Note [Join points with -fno-case-of-case] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Suppose case-of-case is switched off, and we are simplifying + + case (join j x = <j-rhs> in + case y of + A -> j 1 + B -> j 2 + C -> e) of <outer-alts> + +Usually, we'd push the outer continuation (case . of <outer-alts>) into +both the RHS and the body of the join point j. But since we aren't doing +case-of-case we may then end up with this totally bogus result + + join j x = case <j-rhs> of <outer-alts> in + case (case y of + A -> j 1 + B -> j 2 + C -> e) of <outer-alts> + +This would be OK in the language of the paper, but not in GHC: j is no longer +a join point. We can only do the "push continuation into the RHS of the +join point j" if we also push the continuation right down to the /jumps/ to +j, so that it can evaporate there.
If we are doing case-of-case, we'll get to + + join j x = case <j-rhs> of <outer-alts> in + case y of + A -> j 1 + B -> j 2 + C -> case e of <outer-alts> + +which is great. + +Bottom line: if case-of-case is off, we must stop pushing the continuation +inwards altogether at any join point. Instead simplify the (join ... in ...) +with a Stop continuation, and wrap the original continuation around the +outside. Surprisingly tricky! + + +************************************************************************ +* * + Variables +* * +************************************************************************ +-} + +simplVar :: SimplEnv -> InVar -> SimplM OutExpr +-- Look up an InVar in the environment +simplVar env var + | isTyVar var = return (Type (substTyVar env var)) + | isCoVar var = return (Coercion (substCoVar env var)) + | otherwise + = case substId env var of + ContEx tvs cvs ids e -> simplExpr (setSubstEnv env tvs cvs ids) e + DoneId var1 -> return (Var var1) + DoneEx e _ -> return e + +simplIdF :: SimplEnv -> InId -> SimplCont -> SimplM (SimplFloats, OutExpr) +simplIdF env var cont + = case substId env var of + ContEx tvs cvs ids e -> simplExprF (setSubstEnv env tvs cvs ids) e cont + -- Don't trim; haven't already simplified e, + -- so the cont is not embodied in e + + DoneId var1 -> completeCall env var1 (trimJoinCont var (isJoinId_maybe var1) cont) + + DoneEx e mb_join -> simplExprF (zapSubstEnv env) e (trimJoinCont var mb_join cont) + -- Note [zapSubstEnv] + -- The template is already simplified, so don't re-substitute. + -- This is VITAL. Consider + -- let x = e in + -- let y = \z -> ...x... in + -- \ x -> ...y... + -- We'll clone the inner \x, adding x->x' in the id_subst + -- Then when we inline y, we must *not* replace x by x' in + -- the inlined copy!!
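To make the two components of Note [Join points and case-of-case] concrete, here is a source-level sketch. It is a hedged illustration with made-up names: an ordinary local function `j` stands in for a real join point, and `length . show` plays the role of the evaluation context E.

```haskell
module Main where

-- Before: E[join j = e in case ... of { A -> jump j 1; ...; C -> f 3 }]
before :: Int -> Int
before n =
  length . show $            -- the evaluation context E
    let j x = x + 10 in      -- stands in for a join point
    case n of
      0 -> j 1
      1 -> j 2
      _ -> n * 3             -- a branch that does not jump

-- After: join j = E[e] in case ... of { A -> jump j 1; ...; C -> E[f 3] }
after :: Int -> Int
after n =
  let j x = length (show (x + 10)) in   -- the context E copied into the RHS
  case n of
    0 -> j 1
    1 -> j 2
    _ -> length (show (n * 3))          -- the context E wrapped around the ordinary branch

main :: IO ()
main = print (map before [0, 1, 7] == map after [0, 1, 7])
```

Both components of the Note are visible: the context is copied into the RHS of `j` (component 1) and disappears at each jump (component 2), which is what wrapJoinCont and trimJoinCont arrange in the surrounding code.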
+ +--------------------------------------------------------- +-- Dealing with a call site + +completeCall :: SimplEnv -> OutId -> SimplCont -> SimplM (SimplFloats, OutExpr) +completeCall env var cont + | Just expr <- callSiteInline dflags var active_unf + lone_variable arg_infos interesting_cont + -- Inline the variable's RHS + = do { checkedTick (UnfoldingDone var) + ; dump_inline expr cont + ; simplExprF (zapSubstEnv env) expr cont } + + | otherwise + -- Don't inline; instead rebuild the call + = do { rule_base <- getSimplRules + ; let info = mkArgInfo env var (getRules rule_base var) + n_val_args call_cont + ; rebuildCall env info cont } + + where + dflags = seDynFlags env + (lone_variable, arg_infos, call_cont) = contArgs cont + n_val_args = length arg_infos + interesting_cont = interestingCallContext env call_cont + active_unf = activeUnfolding (getMode env) var + + log_inlining doc + = liftIO $ dumpAction dflags + (mkUserStyle dflags alwaysQualify AllTheWay) + (dumpOptionsFromFlag Opt_D_dump_inlinings) + "" FormatText doc + + dump_inline unfolding cont + | not (dopt Opt_D_dump_inlinings dflags) = return () + | not (dopt Opt_D_verbose_core2core dflags) + = when (isExternalName (idName var)) $ + log_inlining $ + sep [text "Inlining done:", nest 4 (ppr var)] + | otherwise + = liftIO $ log_inlining $ + sep [text "Inlining done: " <> ppr var, + nest 4 (vcat [text "Inlined fn: " <+> nest 2 (ppr unfolding), + text "Cont: " <+> ppr cont])] + +rebuildCall :: SimplEnv + -> ArgInfo + -> SimplCont + -> SimplM (SimplFloats, OutExpr) +-- We decided not to inline, so +-- - simplify the arguments +-- - try rewrite rules +-- - and rebuild + +---------- Bottoming applications -------------- +rebuildCall env (ArgInfo { ai_fun = fun, ai_args = rev_args, ai_strs = [] }) cont + -- When we run out of strictness args, it means + -- that the call is definitely bottom; see GHC.Core.Op.Simplify.Utils.mkArgInfo + -- Then we want to discard the entire strict continuation. E.g. 
+ -- * case (error "hello") of { ... } + -- * (error "Hello") arg + -- * f (error "Hello") where f is strict + -- etc + -- Then, especially in the first of these cases, we'd like to discard + -- the continuation, leaving just the bottoming expression. But the + -- type might not be right, so we may have to add a coerce. + | not (contIsTrivial cont) -- Only do this if there is a non-trivial + -- continuation to discard, else we do it + -- again and again! + = seqType cont_ty `seq` -- See Note [Avoiding space leaks in OutType] + return (emptyFloats env, castBottomExpr res cont_ty) + where + res = argInfoExpr fun rev_args + cont_ty = contResultType cont + +---------- Try rewrite RULES -------------- +-- See Note [Trying rewrite rules] +rebuildCall env info@(ArgInfo { ai_fun = fun, ai_args = rev_args + , ai_rules = Just (nr_wanted, rules) }) cont + | nr_wanted == 0 || no_more_args + , let info' = info { ai_rules = Nothing } + = -- We've accumulated a simplified call in <fun,rev_args> + -- so try rewrite rules; see Note [RULEs apply to simplified arguments] + -- See also Note [Rules for recursive functions] + do { mb_match <- tryRules env rules fun (reverse rev_args) cont + ; case mb_match of + Just (env', rhs, cont') -> simplExprF env' rhs cont' + Nothing -> rebuildCall env info' cont } + where + no_more_args = case cont of + ApplyToTy {} -> False + ApplyToVal {} -> False + _ -> True + + +---------- Simplify applications and casts -------------- +rebuildCall env info (CastIt co cont) + = rebuildCall env (addCastTo info co) cont + +rebuildCall env info (ApplyToTy { sc_arg_ty = arg_ty, sc_cont = cont }) + = rebuildCall env (addTyArgTo info arg_ty) cont + +rebuildCall env info@(ArgInfo { ai_encl = encl_rules, ai_type = fun_ty + , ai_strs = str:strs, ai_discs = disc:discs }) + (ApplyToVal { sc_arg = arg, sc_env = arg_se + , sc_dup = dup_flag, sc_cont = cont }) + | isSimplified dup_flag -- See Note [Avoid redundant simplification] + = rebuildCall env (addValArgTo info' arg) 
cont + + | str -- Strict argument + , sm_case_case (getMode env) + = -- pprTrace "Strict Arg" (ppr arg $$ ppr (seIdSubst env) $$ ppr (seInScope env)) $ + simplExprF (arg_se `setInScopeFromE` env) arg + (StrictArg { sc_fun = info', sc_cci = cci_strict + , sc_dup = Simplified, sc_cont = cont }) + -- Note [Shadowing] + + | otherwise -- Lazy argument + -- DO NOT float anything outside, hence simplExprC + -- There is no benefit (unlike in a let-binding), and we'd + -- have to be very careful about bogus strictness through + -- floating a demanded let. + = do { arg' <- simplExprC (arg_se `setInScopeFromE` env) arg + (mkLazyArgStop arg_ty cci_lazy) + ; rebuildCall env (addValArgTo info' arg') cont } + where + info' = info { ai_strs = strs, ai_discs = discs } + arg_ty = funArgTy fun_ty + + -- Use this for lazy arguments + cci_lazy | encl_rules = RuleArgCtxt + | disc > 0 = DiscArgCtxt -- Be keener here + | otherwise = BoringCtxt -- Nothing interesting + + -- ..and this for strict arguments + cci_strict | encl_rules = RuleArgCtxt + | disc > 0 = DiscArgCtxt + | otherwise = RhsCtxt + -- Why RhsCtxt? if we see f (g x) (h x), and f is strict, we + -- want to be a bit more eager to inline g, because it may + -- expose an eval (on x perhaps) that can be eliminated or + -- shared. I saw this in nofib 'boyer2', RewriteFuns.onewayunify1 + -- It's worth an 18% improvement in allocation for this + -- particular benchmark; 5% on 'mate' and 1.3% on 'multiplier' + +---------- No further useful info, revert to generic rebuild ------------ +rebuildCall env (ArgInfo { ai_fun = fun, ai_args = rev_args }) cont + = rebuild env (argInfoExpr fun rev_args) cont + +{- Note [Trying rewrite rules] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Consider an application (f e1 e2 e3) where the e1,e2,e3 are not yet +simplified. We want to simplify enough arguments to allow the rules +to apply, but it's more efficient to avoid simplifying e2,e3 if e1 alone +is sufficient. 
Example: class ops + (+) dNumInt e2 e3 +If we rewrite ((+) dNumInt) to plusInt, we can take advantage of the +latter's strictness when simplifying e2, e3. Moreover, suppose we have + RULE f Int = \x. x True + +Then given (f Int e1) we rewrite to + (\x. x True) e1 +without simplifying e1. Now we can inline x into its unique call site, +and absorb the True into it all in the same pass. If we simplified +e1 first, we couldn't do that; see Note [Avoiding exponential behaviour]. + +So we try to apply rules if either + (a) no_more_args: we've run out of arguments that the rules can "see" + (b) nr_wanted: none of the rules wants any more arguments + + +Note [RULES apply to simplified arguments] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +It's very desirable to try RULES once the arguments have been simplified, because +doing so ensures that rule cascades work in one pass. Consider + {-# RULES g (h x) = k x + f (k x) = x #-} + ...f (g (h x))... +Then we want to rewrite (g (h x)) to (k x) and only then try f's rules. If +we match f's rules against the un-simplified RHS, it won't match. This +makes a particularly big difference when superclass selectors are involved: + op ($p1 ($p2 (df d))) +We want all this to unravel in one sweep. + +Note [Avoid redundant simplification] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Because RULES apply to simplified arguments, there's a danger of repeatedly +simplifying already-simplified arguments. An important example is that of + (>>=) d e1 e2 +Here e1, e2 are simplified before the rule is applied, but don't really +participate in the rule firing. So we mark them as Simplified to avoid +re-simplifying them. + +Note [Shadowing] +~~~~~~~~~~~~~~~~ +This part of the simplifier may break the no-shadowing invariant. +Consider + f (...(\a -> e)...) (case y of (a,b) -> e') +where f is strict in its second arg. +If we simplify the innermost one first we get (...(\a -> e)...)
+Simplifying the second arg makes us float the case out, so we end up with + case y of (a,b) -> f (...(\a -> e)...) e' +So the output does not have the no-shadowing invariant. However, there is +no danger of getting name-capture, because when the first arg was simplified +we used an in-scope set that at least mentioned all the variables free in its +static environment, and that is enough. + +We can't just do innermost first, or we'd end up with a dual problem: + case x of (a,b) -> f e (...(\a -> e')...) + +I spent hours trying to recover the no-shadowing invariant, but I just could +not think of an elegant way to do it. The simplifier is already knee-deep in +continuations. We have to keep the right in-scope set around; AND we have +to get the effect that finding (error "foo") in a strict arg position will +discard the entire application and replace it with (error "foo"). Getting +all this at once is TOO HARD! + + +************************************************************************ +* * + Rewrite rules +* * +************************************************************************ +-} + +tryRules :: SimplEnv -> [CoreRule] + -> Id -> [ArgSpec] + -> SimplCont + -> SimplM (Maybe (SimplEnv, CoreExpr, SimplCont)) + +tryRules env rules fn args call_cont + | null rules + = return Nothing + +{- Disabled until we fix #8326 + | fn `hasKey` tagToEnumKey -- See Note [Optimising tagToEnum#] + , [_type_arg, val_arg] <- args + , Select dup bndr ((_,[],rhs1) : rest_alts) se cont <- call_cont + , isDeadBinder bndr + = do { let enum_to_tag :: CoreAlt -> CoreAlt + -- Takes K -> e into tagK# -> e + -- where tagK# is the tag of constructor K + enum_to_tag (DataAlt con, [], rhs) + = ASSERT( isEnumerationTyCon (dataConTyCon con) ) + (LitAlt tag, [], rhs) + where + tag = mkLitInt dflags (toInteger (dataConTag con - fIRST_TAG)) + enum_to_tag alt = pprPanic "tryRules: tagToEnum" (ppr alt) + + new_alts = (DEFAULT, [], rhs1) : map enum_to_tag rest_alts + new_bndr = setIdType bndr 
intPrimTy + -- The binder is dead, but should have the right type + ; return (Just (val_arg, Select dup new_bndr new_alts se cont)) } +-} + + | Just (rule, rule_rhs) <- lookupRule dflags (getUnfoldingInRuleMatch env) + (activeRule (getMode env)) fn + (argInfoAppArgs args) rules + -- Fire a rule for the function + = do { checkedTick (RuleFired (ruleName rule)) + ; let cont' = pushSimplifiedArgs zapped_env + (drop (ruleArity rule) args) + call_cont + -- (ruleArity rule) says how + -- many args the rule consumed + + occ_anald_rhs = occurAnalyseExpr rule_rhs + -- See Note [Occurrence-analyse after rule firing] + ; dump rule rule_rhs + ; return (Just (zapped_env, occ_anald_rhs, cont')) } + -- The occ_anald_rhs and cont' are all Out things + -- hence zapping the environment + + | otherwise -- No rule fires + = do { nodump -- This ensures that an empty file is written + ; return Nothing } + + where + dflags = seDynFlags env + zapped_env = zapSubstEnv env -- See Note [zapSubstEnv] + + printRuleModule rule + = parens (maybe (text "BUILTIN") + (pprModuleName . 
moduleName) + (ruleModule rule)) + + dump rule rule_rhs + | dopt Opt_D_dump_rule_rewrites dflags + = log_rule dflags Opt_D_dump_rule_rewrites "Rule fired" $ vcat + [ text "Rule:" <+> ftext (ruleName rule) + , text "Module:" <+> printRuleModule rule + , text "Before:" <+> hang (ppr fn) 2 (sep (map ppr args)) + , text "After: " <+> pprCoreExpr rule_rhs + , text "Cont: " <+> ppr call_cont ] + + | dopt Opt_D_dump_rule_firings dflags + = log_rule dflags Opt_D_dump_rule_firings "Rule fired:" $ + ftext (ruleName rule) + <+> printRuleModule rule + + | otherwise + = return () + + nodump + | dopt Opt_D_dump_rule_rewrites dflags + = liftIO $ do + touchDumpFile dflags (dumpOptionsFromFlag Opt_D_dump_rule_rewrites) + + | dopt Opt_D_dump_rule_firings dflags + = liftIO $ do + touchDumpFile dflags (dumpOptionsFromFlag Opt_D_dump_rule_firings) + + | otherwise + = return () + + log_rule dflags flag hdr details + = liftIO $ do + let sty = mkDumpStyle dflags alwaysQualify + dumpAction dflags sty (dumpOptionsFromFlag flag) "" FormatText $ + sep [text hdr, nest 4 details] + +trySeqRules :: SimplEnv + -> OutExpr -> InExpr -- Scrutinee and RHS + -> SimplCont + -> SimplM (Maybe (SimplEnv, CoreExpr, SimplCont)) +-- See Note [User-defined RULES for seq] +trySeqRules in_env scrut rhs cont + = do { rule_base <- getSimplRules + ; tryRules in_env (getRules rule_base seqId) seqId out_args rule_cont } + where + no_cast_scrut = drop_casts scrut + scrut_ty = exprType no_cast_scrut + seq_id_ty = idType seqId + res1_ty = piResultTy seq_id_ty rhs_rep + res2_ty = piResultTy res1_ty scrut_ty + rhs_ty = substTy in_env (exprType rhs) + rhs_rep = getRuntimeRep rhs_ty + out_args = [ TyArg { as_arg_ty = rhs_rep + , as_hole_ty = seq_id_ty } + , TyArg { as_arg_ty = scrut_ty + , as_hole_ty = res1_ty } + , TyArg { as_arg_ty = rhs_ty + , as_hole_ty = res2_ty } + , ValArg no_cast_scrut] + rule_cont = ApplyToVal { sc_dup = NoDup, sc_arg = rhs + , sc_env = in_env, sc_cont = cont } + -- Lazily evaluated, so we don't 
do most of this + + drop_casts (Cast e _) = drop_casts e + drop_casts e = e + +{- Note [User-defined RULES for seq] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Given + case (scrut |> co) of _ -> rhs +look for rules that match the expression + seq @t1 @t2 scrut +where scrut :: t1 + rhs :: t2 + +If you find a match, rewrite it, and apply to 'rhs'. + +Notice that we can simply drop casts on the fly here, which +makes it more likely that a rule will match. + +See Note [User-defined RULES for seq] in MkId. + +Note [Occurrence-analyse after rule firing] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +After firing a rule, we occurrence-analyse the instantiated RHS before +simplifying it. Usually this doesn't make much difference, but it can +be huge. Here's an example (simplCore/should_compile/T7785) + + map f (map f (map f xs)) + += -- Use build/fold form of map, twice + map f (build (\cn. foldr (mapFB c f) n + (build (\cn. foldr (mapFB c f) n xs)))) + += -- Apply fold/build rule + map f (build (\cn. (\cn. foldr (mapFB c f) n xs) (mapFB c f) n)) + += -- Beta-reduce + -- Alas we have not occurrence-analysed, so we don't know + -- that c is used exactly once + map f (build (\cn. let c1 = mapFB c f in + foldr (mapFB c1 f) n xs)) + += -- Use mapFB rule: mapFB (mapFB c f) g = mapFB c (f.g) + -- We can do this because (mapFB c n) is a PAP and hence expandable + map f (build (\cn. let c1 = mapFB c n in + foldr (mapFB c (f.f)) n xs)) + +This is not too bad. But now do the same with the outer map, and +we get another use of mapFB, and it can interact with /both/ remaining +mapFB calls in the above expression. This is stupid because actually +that 'c1' binding is dead. The outer map introduces another c2. If +there is a deep stack of maps we get lots of dead bindings, and lots +of redundant work as we repeatedly simplify the result of firing rules. + +The easy thing to do is simply to occurrence analyse the result of +the rule firing.
Note that this occ-anals not only the RHS of the +rule, but also the function arguments, which by now are OutExprs. +E.g. + RULE f (g x) = x+1 + +Call f (g BIG) --> (\x. x+1) BIG + +The rule binders are lambda-bound and applied to the OutExpr arguments +(here BIG) which lack all internal occurrence info. + +Is this inefficient? Not really: we are about to walk over the result +of the rule firing to simplify it, so occurrence analysis is at most +a constant factor. + +Possible improvement: occ-anal the rules when putting them in the +database; and in the simplifier just occ-anal the OutExpr arguments. +But that's more complicated and the rule RHS is usually tiny; so I'm +just doing the simple thing. + +Historical note: previously we did occ-anal the rules in Rule.hs, +but failed to occ-anal the OutExpr arguments, which led to the +nasty performance problem described above. + + +Note [Optimising tagToEnum#] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +If we have an enumeration data type: + + data Foo = A | B | C + +Then we want to transform + + case tagToEnum# x of ==> case x of + A -> e1 DEFAULT -> e1 + B -> e2 1# -> e2 + C -> e3 2# -> e3 + +thereby getting rid of the tagToEnum# altogether. If there was a DEFAULT +alternative, we retain it (remember it comes first). If not, the case must +be exhaustive, and we reflect that in the transformed version by adding +a DEFAULT. Otherwise Lint complains that the new case is not exhaustive. +See #8317. + +Note [Rules for recursive functions] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +You might think that we shouldn't apply rules for a loop breaker: +doing so might give rise to an infinite loop, because a RULE is +rather like an extra equation for the function: + RULE: f (g x) y = x+y + Eqn: f a y = a-y + +But it's too drastic to disable rules for loop breakers.
+Even the foldr/build rule would be disabled, because foldr +is recursive, and hence a loop breaker: + foldr k z (build g) = g k z +So it's up to the programmer: rules can cause divergence. + + +************************************************************************ +* * + Rebuilding a case expression +* * +************************************************************************ + +Note [Case elimination] +~~~~~~~~~~~~~~~~~~~~~~~ +The case-elimination transformation discards redundant case expressions. +Start with a simple situation: + + case x# of ===> let y# = x# in e + y# -> e + +(when x#, y# are of primitive type, of course). We can't (in general) +do this for algebraic cases, because we might turn bottom into +non-bottom! + +The code in GHC.Core.Op.Simplify.Utils.prepareAlts has the effect of generalising +this idea to look for a case where we're scrutinising a variable, and we know +that only the default case can match. For example: + + case x of + 0# -> ... + DEFAULT -> ...(case x of + 0# -> ... + DEFAULT -> ...) ... + +Here the inner case is first trimmed to have only one alternative, the +DEFAULT, after which it's an instance of the previous case. This +really only shows up in eliminating error-checking code. + +Note that GHC.Core.Op.Simplify.Utils.mkCase combines identical RHSs. So + + case e of ===> case e of DEFAULT -> r + True -> r + False -> r + +Now again the case may be eliminated by the CaseElim transformation. +This includes things like (==# a# b#)::Bool so that we simplify + case ==# a# b# of { True -> x; False -> x } +to just + x +This particular example shows up in default methods for +comparison operations (e.g. in (>=) for Int.Int32) + +Note [Case to let transformation] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +If a case over a lifted type has a single alternative, and is being +used as a strict 'let' (all isDeadBinder bndrs), we may want to do +this transformation: + + case e of r ===> let r = e in ...r... + _ -> ...r...
+ +We treat the unlifted and lifted cases separately: + +* Unlifted case: 'e' satisfies exprOkForSpeculation + (ok-for-spec is needed to satisfy the let/app invariant). + This turns case a +# b of r -> ...r... + into let r = a +# b in ...r... + and thence .....(a +# b).... + + However, if we have + case indexArray# a i of r -> ...r... + we might like to do the same, and inline the (indexArray# a i). + But indexArray# is not okForSpeculation, so we don't build a let + in rebuildCase (lest it get floated *out*), so the inlining doesn't + happen either. Annoying. + +* Lifted case: we need to be sure that the expression is already + evaluated (exprIsHNF). If it's not already evaluated + - we risk losing exceptions, divergence or + user-specified thunk-forcing + - even if 'e' is guaranteed to converge, we don't want to + create a thunk (call by need) instead of evaluating it + right away (call by value) + + However, we can turn the case into a /strict/ let if the 'r' is + used strictly in the body. Then we won't lose divergence; and + we won't build a thunk because the let is strict. + See also Note [Case-to-let for strictly-used binders] + + NB: absentError satisfies exprIsHNF: see Note [aBSENT_ERROR_ID] in GHC.Core.Make. + We want to turn + case (absentError "foo") of r -> ...MkT r... + into + let r = absentError "foo" in ...MkT r... + + +Note [Case-to-let for strictly-used binders] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +If we have this: + case <scrut> of r { _ -> ..r.. } + +where 'r' is used strictly in (..r..), we can safely transform to + let r = <scrut> in ...r... + +This is a Good Thing, because 'r' might be dead (if the body just +calls error), or might be used just once (in which case it can be +inlined); or we might be able to float the let-binding up or down. +E.g. #15631 has an example. + +Note that this can change the error behaviour. 
For example, we might +transform + case x of { _ -> error "bad" } + --> error "bad" +which might be puzzling if 'x' is currently lambda-bound, but later gets +let-bound to (error "good"). + +Nevertheless, the paper "A semantics for imprecise exceptions" allows +this transformation. If you want to fix the evaluation order, use +'pseq'. See #8900 for an example where the loss of this +transformation bit us in practice. + +See also Note [Empty case alternatives] in GHC.Core. + +Historical notes + +There have been various earlier versions of this patch: + +* By Sept 18 the code looked like this: + || scrut_is_demanded_var scrut + + scrut_is_demanded_var :: CoreExpr -> Bool + scrut_is_demanded_var (Cast s _) = scrut_is_demanded_var s + scrut_is_demanded_var (Var _) = isStrictDmd (idDemandInfo case_bndr) + scrut_is_demanded_var _ = False + + This only fired if the scrutinee was a /variable/, which seems + an unnecessary restriction. So in #15631 I relaxed it to allow + arbitrary scrutinees. Less code, less to explain -- but the change + had 0.00% effect on nofib. + +* Previously, in Jan 13 the code looked like this: + || case_bndr_evald_next rhs + + case_bndr_evald_next :: CoreExpr -> Bool + -- See Note [Case binder next] + case_bndr_evald_next (Var v) = v == case_bndr + case_bndr_evald_next (Cast e _) = case_bndr_evald_next e + case_bndr_evald_next (App e _) = case_bndr_evald_next e + case_bndr_evald_next (Case e _ _ _) = case_bndr_evald_next e + case_bndr_evald_next _ = False + + This patch was part of fixing #7542. See also + Note [Eta reduction of an eval'd function] in GHC.Core.Utils.
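At the source level, the case-to-let story in Note [Case to let transformation] and Note [Case-to-let for strictly-used binders] can be sketched as follows. This is a minimal, hedged illustration with made-up names (`beforeT`, `afterT`): a bang pattern models Core's always-strict case (a plain source-Haskell variable pattern is lazy, unlike Core), and `seq` models the strict let the simplifier produces.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

import Control.Exception (SomeException, evaluate, try)

-- Before: a single-alternative case used as a strict 'let';
-- the binder r is used strictly in the body.
beforeT :: Int -> Int
beforeT x = case x * 2 of !r -> r + 1

-- After: a let, kept strict with seq, so no divergence is lost
-- and no thunk is built.
afterT :: Int -> Int
afterT x = let r = x * 2 in r `seq` r + 1

main :: IO ()
main = do
  print (beforeT 10 == afterT 10)   -- same value on ordinary input
  -- Both forms still force the scrutinee, so both throw on bottom:
  b <- try (evaluate (beforeT (error "boom"))) :: IO (Either SomeException Int)
  a <- try (evaluate (afterT (error "boom"))) :: IO (Either SomeException Int)
  print (either (const True) (const False) b == either (const True) (const False) a)
```

The second check is the point of the strictness side condition: because the binder is used strictly, the let form diverges exactly when the case form does.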
+ + +Further notes about case elimination +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Consider: test :: Integer -> IO () + test = print + +Turns out that this compiles to: + Print.test + = \ eta :: Integer + eta1 :: Void# -> + case PrelNum.< eta PrelNum.zeroInteger of wild { __DEFAULT -> + case hPutStr stdout + (PrelNum.jtos eta ($w[] @ Char)) + eta1 + of wild1 { (# new_s, a4 #) -> PrelIO.lvl23 new_s }} + +Notice the strange '<' which has no effect at all. This is a funny one. +It started like this: + +f x y = if x < 0 then jtos x + else if y==0 then "" else jtos x + +At a particular call site we have (f v 1). So we inline to get + + if v < 0 then jtos v + else if 1==0 then "" else jtos v + +Now simplify the 1==0 conditional: + + if v<0 then jtos v else jtos v + +Now common-up the two branches of the case: + + case (v<0) of DEFAULT -> jtos v + +Why don't we drop the case? Because it's strict in v. It's technically +wrong to drop even unnecessary evaluations, and in practice they +may be a result of 'seq', so we *definitely* don't want to drop those. +I don't really know how to improve this situation. + + +Note [FloatBinds from constructor wrappers] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +If we have FloatBinds coming from the constructor wrapper +(as in Note [exprIsConApp_maybe on data constructors with wrappers]), +we cannot float past them. We'd need to float the FloatBind +together with the simplifier's floats; unfortunately, the +simplifier doesn't have case-floats. The simplest thing we can +do is to wrap all the floats here. The next iteration of the +simplifier will take care of all these cases and lets. + +Given data T = MkT !Bool, this allows us to simplify +case $WMkT b of { MkT x -> f x } +to +case b of { b' -> f b' }. + +We could try to be more clever (like maybe wfloats only contain +let binders, so we could float them). But the need for the +extra complication is not clear.
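The MkT example above can be sketched in ordinary Haskell. This is a hedged illustration with made-up names (`before`, `after`, `f`): the source-level `MkT b` application plays the role of the wrapper `$WMkT b`, and a bang pattern models Core's strict case on `b`.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

data T = MkT !Bool   -- strict field: constructing MkT forces its argument

f :: Bool -> Int
f True  = 1
f False = 0

-- Before simplification: scrutinise the (wrapper) application
before :: Bool -> Int
before b = case MkT b of MkT x -> f x

-- After: just evaluate b and pass it on, as in case b of { b' -> f b' }
after :: Bool -> Int
after b = case b of !b' -> f b'

main :: IO ()
main = print (map before [True, False] == map after [True, False])
```

Both versions force `b` exactly once (once via the strict field, once via the case), so the simplified form is observationally the same while allocating no constructor.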
+-} + +--------------------------------------------------------- +-- Eliminate the case if possible + +rebuildCase, reallyRebuildCase + :: SimplEnv + -> OutExpr -- Scrutinee + -> InId -- Case binder + -> [InAlt] -- Alternatives (increasing order) + -> SimplCont + -> SimplM (SimplFloats, OutExpr) + +-------------------------------------------------- +-- 1. Eliminate the case if there's a known constructor +-------------------------------------------------- + +rebuildCase env scrut case_bndr alts cont + | Lit lit <- scrut -- No need for same treatment as constructors + -- because literals are inlined more vigorously + , not (litIsLifted lit) + = do { tick (KnownBranch case_bndr) + ; case findAlt (LitAlt lit) alts of + Nothing -> missingAlt env case_bndr alts cont + Just (_, bs, rhs) -> simple_rhs env [] scrut bs rhs } + + | Just (in_scope', wfloats, con, ty_args, other_args) + <- exprIsConApp_maybe (getUnfoldingInRuleMatch env) scrut + -- Works when the scrutinee is a variable with a known unfolding + -- as well as when it's an explicit constructor application + , let env0 = setInScopeSet env in_scope' + = do { tick (KnownBranch case_bndr) + ; case findAlt (DataAlt con) alts of + Nothing -> missingAlt env0 case_bndr alts cont + Just (DEFAULT, bs, rhs) -> let con_app = Var (dataConWorkId con) + `mkTyApps` ty_args + `mkApps` other_args + in simple_rhs env0 wfloats con_app bs rhs + Just (_, bs, rhs) -> knownCon env0 scrut wfloats con ty_args other_args + case_bndr bs rhs cont + } + where + simple_rhs env wfloats scrut' bs rhs = + ASSERT( null bs ) + do { (floats1, env') <- simplNonRecX env case_bndr scrut' + -- scrut is a constructor application, + -- hence satisfies let/app invariant + ; (floats2, expr') <- simplExprF env' rhs cont + ; case wfloats of + [] -> return (floats1 `addFloats` floats2, expr') + _ -> return + -- See Note [FloatBinds from constructor wrappers] + ( emptyFloats env, + GHC.Core.Make.wrapFloats wfloats $ + wrapFloats (floats1 `addFloats` floats2) 
expr' )} + + +-------------------------------------------------- +-- 2. Eliminate the case if scrutinee is evaluated +-------------------------------------------------- + +rebuildCase env scrut case_bndr alts@[(_, bndrs, rhs)] cont + -- See if we can get rid of the case altogether + -- See Note [Case elimination] + -- mkCase made sure that if all the alternatives are equal, + -- then there is now only one (DEFAULT) rhs + + -- 2a. Dropping the case altogether, if + -- a) it binds nothing (so it's really just a 'seq') + -- b) evaluating the scrutinee has no side effects + | is_plain_seq + , exprOkForSideEffects scrut + -- The entire case is dead, so we can drop it + -- if the scrutinee converges without having imperative + -- side effects or raising a Haskell exception + -- See Note [PrimOp can_fail and has_side_effects] in PrimOp + = simplExprF env rhs cont + + -- 2b. Turn the case into a let, if + -- a) it binds only the case-binder + -- b) unlifted case: the scrutinee is ok-for-speculation + -- lifted case: the scrutinee is in HNF (or will later be demanded) + -- See Note [Case to let transformation] + | all_dead_bndrs + , doCaseToLet scrut case_bndr + = do { tick (CaseElim case_bndr) + ; (floats1, env') <- simplNonRecX env case_bndr scrut + ; (floats2, expr') <- simplExprF env' rhs cont + ; return (floats1 `addFloats` floats2, expr') } + + -- 2c. 
Try the seq rules if + -- a) it binds only the case binder + -- b) a rule for seq applies + -- See Note [User-defined RULES for seq] in MkId + | is_plain_seq + = do { mb_rule <- trySeqRules env scrut rhs cont + ; case mb_rule of + Just (env', rule_rhs, cont') -> simplExprF env' rule_rhs cont' + Nothing -> reallyRebuildCase env scrut case_bndr alts cont } + where + all_dead_bndrs = all isDeadBinder bndrs -- bndrs are [InId] + is_plain_seq = all_dead_bndrs && isDeadBinder case_bndr -- Evaluation *only* for effect + +rebuildCase env scrut case_bndr alts cont + = reallyRebuildCase env scrut case_bndr alts cont + + +doCaseToLet :: OutExpr -- Scrutinee + -> InId -- Case binder + -> Bool +-- The situation is case scrut of b { DEFAULT -> body } +-- Can we transform thus? let { b = scrut } in body +doCaseToLet scrut case_bndr + | isTyCoVar case_bndr -- Respect GHC.Core + = isTyCoArg scrut -- Note [Core type and coercion invariant] + + | isUnliftedType (idType case_bndr) + = exprOkForSpeculation scrut + + | otherwise -- Scrut has a lifted type + = exprIsHNF scrut + || isStrictDmd (idDemandInfo case_bndr) + -- See Note [Case-to-let for strictly-used binders] + +-------------------------------------------------- +-- 3. Catch-all case +-------------------------------------------------- + +reallyRebuildCase env scrut case_bndr alts cont + | not (sm_case_case (getMode env)) + = do { case_expr <- simplAlts env scrut case_bndr alts + (mkBoringStop (contHoleType cont)) + ; rebuild env case_expr cont } + + | otherwise + = do { (floats, cont') <- mkDupableCaseCont env alts cont + ; case_expr <- simplAlts (env `setInScopeFromF` floats) + scrut case_bndr alts cont' + ; return (floats, case_expr) } + +{- +simplCaseBinder checks whether the scrutinee is a variable, v. If so, +try to eliminate uses of v in the RHSs in favour of case_bndr; that +way, there's a chance that v will now only be used once, and hence +inlined. 
+
+Historical note: we used to do the "case binder swap" in the Simplifier
+so there were additional complications if the scrutinee was a variable.
+Now the binder-swap stuff is done in the occurrence analyser; see
+OccurAnal Note [Binder swap].
+
+Note [knownCon occ info]
+~~~~~~~~~~~~~~~~~~~~~~~~
+If the case binder is not dead, then neither are the pattern bound
+variables:
+        case <any> of x { (a,b) ->
+        case x of { (p,q) -> p } }
+Here (a,b) both look dead, but come alive after the inner case is eliminated.
+The point is that we bring into the envt a binding
+        let x = (a,b)
+after the outer case, and that makes (a,b) alive.  At least we do unless
+the case binder is guaranteed dead.
+
+Note [Case alternative occ info]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+When we are simply reconstructing a case (the common case), we always
+zap the occurrence info on the binders in the alternatives.  Even
+if the case binder is dead, the scrutinee is usually a variable, and *that*
+can bring the case-alternative binders back to life.
+See Note [Add unfolding for scrutinee]
+
+Note [Improving seq]
+~~~~~~~~~~~~~~~~~~~~
+Consider
+        type family F :: * -> *
+        type instance F Int = Int
+
+We'd like to transform
+        case e of (x :: F Int) { DEFAULT -> rhs }
+===>
+        case e `cast` co of (x'::Int)
+           I# x# -> let x = x' `cast` sym co
+                    in rhs
+
+so that 'rhs' can take advantage of the form of x'.  Notice that Note
+[Case of cast] (in OccurAnal) may then apply to the result.
+
+We'd also like to eliminate empty types (#13468). So if
+
+    data Void
+    type instance F Bool = Void
+
+then we'd like to transform
+        case (x :: F Bool) of { _ -> error "urk" }
+===>
+        case (x |> co) of (x' :: Void) of {}
+
+Nota Bene: we used to have a built-in rule for 'seq' that dropped
+casts, so that
+        case (x |> co) of { _ -> blah }
+dropped the cast; in order to improve the chances of trySeqRules
+firing.  But that works in the /opposite/ direction to Note [Improving
+seq] so there's a danger of flip/flopping. 
Better to make trySeqRules
+insensitive to the cast, which it now is.
+
+The need for [Improving seq] showed up in Roman's experiments.  Example:
+  foo :: F Int -> Int -> Int
+  foo t n = t `seq` bar n
+    where
+      bar 0 = 0
+      bar n = bar (n - case t of TI i -> i)
+Here we'd like to avoid repeatedly evaluating t inside the loop, by
+taking advantage of the `seq`.
+
+At one point I did this transformation in LiberateCase, but it's more
+robust here.  (Otherwise, there's a danger that we'll simply drop the
+'seq' altogether, before LiberateCase gets to see it.)
+-}
+
+simplAlts :: SimplEnv
+          -> OutExpr         -- Scrutinee
+          -> InId            -- Case binder
+          -> [InAlt]         -- Non-empty
+          -> SimplCont
+          -> SimplM OutExpr  -- Returns the complete simplified case expression
+
+simplAlts env0 scrut case_bndr alts cont'
+  = do { traceSmpl "simplAlts" (vcat [ ppr case_bndr
+                                     , text "cont':" <+> ppr cont'
+                                     , text "in_scope" <+> ppr (seInScope env0) ])
+       ; (env1, case_bndr1) <- simplBinder env0 case_bndr
+       ; let case_bndr2 = case_bndr1 `setIdUnfolding` evaldUnfolding
+             env2       = modifyInScope env1 case_bndr2
+             -- See Note [Case binder evaluated-ness]
+
+       ; fam_envs <- getFamEnvs
+       ; (alt_env', scrut', case_bndr') <- improveSeq fam_envs env2 scrut
+                                                      case_bndr case_bndr2 alts
+
+       ; (imposs_deflt_cons, in_alts) <- prepareAlts scrut' case_bndr' alts
+         -- NB: it's possible that the returned in_alts is empty: this is handled
+         -- by the caller (rebuildCase) in the missingAlt function
+
+       ; alts' <- mapM (simplAlt alt_env' (Just scrut') imposs_deflt_cons case_bndr' cont') in_alts
+       ; -- pprTrace "simplAlts" (ppr case_bndr $$ ppr alts_ty $$ ppr alts_ty' $$ ppr alts $$ ppr cont') $
+
+       ; let alts_ty' = contResultType cont'
+         -- See Note [Avoiding space leaks in OutType]
+       ; seqType alts_ty' `seq`
+         mkCase (seDynFlags env0) scrut' case_bndr' alts_ty' alts' }
+
+
+------------------------------------
+improveSeq :: (FamInstEnv, FamInstEnv) -> SimplEnv
+           -> OutExpr -> InId -> OutId -> [InAlt]
+           -> SimplM (SimplEnv, 
OutExpr, OutId) +-- Note [Improving seq] +improveSeq fam_envs env scrut case_bndr case_bndr1 [(DEFAULT,_,_)] + | Just (co, ty2) <- topNormaliseType_maybe fam_envs (idType case_bndr1) + = do { case_bndr2 <- newId (fsLit "nt") ty2 + ; let rhs = DoneEx (Var case_bndr2 `Cast` mkSymCo co) Nothing + env2 = extendIdSubst env case_bndr rhs + ; return (env2, scrut `Cast` co, case_bndr2) } + +improveSeq _ env scrut _ case_bndr1 _ + = return (env, scrut, case_bndr1) + + +------------------------------------ +simplAlt :: SimplEnv + -> Maybe OutExpr -- The scrutinee + -> [AltCon] -- These constructors can't be present when + -- matching the DEFAULT alternative + -> OutId -- The case binder + -> SimplCont + -> InAlt + -> SimplM OutAlt + +simplAlt env _ imposs_deflt_cons case_bndr' cont' (DEFAULT, bndrs, rhs) + = ASSERT( null bndrs ) + do { let env' = addBinderUnfolding env case_bndr' + (mkOtherCon imposs_deflt_cons) + -- Record the constructors that the case-binder *can't* be. + ; rhs' <- simplExprC env' rhs cont' + ; return (DEFAULT, [], rhs') } + +simplAlt env scrut' _ case_bndr' cont' (LitAlt lit, bndrs, rhs) + = ASSERT( null bndrs ) + do { env' <- addAltUnfoldings env scrut' case_bndr' (Lit lit) + ; rhs' <- simplExprC env' rhs cont' + ; return (LitAlt lit, [], rhs') } + +simplAlt env scrut' _ case_bndr' cont' (DataAlt con, vs, rhs) + = do { -- See Note [Adding evaluatedness info to pattern-bound variables] + let vs_with_evals = addEvals scrut' con vs + ; (env', vs') <- simplLamBndrs env vs_with_evals + + -- Bind the case-binder to (con args) + ; let inst_tys' = tyConAppArgs (idType case_bndr') + con_app :: OutExpr + con_app = mkConApp2 con inst_tys' vs' + + ; env'' <- addAltUnfoldings env' scrut' case_bndr' con_app + ; rhs' <- simplExprC env'' rhs cont' + ; return (DataAlt con, vs', rhs') } + +{- Note [Adding evaluatedness info to pattern-bound variables] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +addEvals records the evaluated-ness of the bound 
variables of +a case pattern. This is *important*. Consider + + data T = T !Int !Int + + case x of { T a b -> T (a+1) b } + +We really must record that b is already evaluated so that we don't +go and re-evaluate it when constructing the result. +See Note [Data-con worker strictness] in MkId.hs + +NB: simplLamBndrs preserves this eval info + +In addition to handling data constructor fields with !s, addEvals +also records the fact that the result of seq# is always in WHNF. +See Note [seq# magic] in GHC.Core.Op.ConstantFold. Example (#15226): + + case seq# v s of + (# s', v' #) -> E + +we want the compiler to be aware that v' is in WHNF in E. + +Open problem: we don't record that v itself is in WHNF (and we can't +do it here). The right thing is to do some kind of binder-swap; +see #15226 for discussion. +-} + +addEvals :: Maybe OutExpr -> DataCon -> [Id] -> [Id] +-- See Note [Adding evaluatedness info to pattern-bound variables] +addEvals scrut con vs + -- Deal with seq# applications + | Just scr <- scrut + , isUnboxedTupleCon con + , [s,x] <- vs + -- Use stripNArgs rather than collectArgsTicks to avoid building + -- a list of arguments only to throw it away immediately. 
+ , Just (Var f) <- stripNArgs 4 scr + , Just SeqOp <- isPrimOpId_maybe f + , let x' = zapIdOccInfoAndSetEvald MarkedStrict x + = [s, x'] + + -- Deal with banged datacon fields +addEvals _scrut con vs = go vs the_strs + where + the_strs = dataConRepStrictness con + + go [] [] = [] + go (v:vs') strs | isTyVar v = v : go vs' strs + go (v:vs') (str:strs) = zapIdOccInfoAndSetEvald str v : go vs' strs + go _ _ = pprPanic "Simplify.addEvals" + (ppr con $$ + ppr vs $$ + ppr_with_length (map strdisp the_strs) $$ + ppr_with_length (dataConRepArgTys con) $$ + ppr_with_length (dataConRepStrictness con)) + where + ppr_with_length list + = ppr list <+> parens (text "length =" <+> ppr (length list)) + strdisp MarkedStrict = text "MarkedStrict" + strdisp NotMarkedStrict = text "NotMarkedStrict" + +zapIdOccInfoAndSetEvald :: StrictnessMark -> Id -> Id +zapIdOccInfoAndSetEvald str v = + setCaseBndrEvald str $ -- Add eval'dness info + zapIdOccInfo v -- And kill occ info; + -- see Note [Case alternative occ info] + +addAltUnfoldings :: SimplEnv -> Maybe OutExpr -> OutId -> OutExpr -> SimplM SimplEnv +addAltUnfoldings env scrut case_bndr con_app + = do { let con_app_unf = mk_simple_unf con_app + env1 = addBinderUnfolding env case_bndr con_app_unf + + -- See Note [Add unfolding for scrutinee] + env2 = case scrut of + Just (Var v) -> addBinderUnfolding env1 v con_app_unf + Just (Cast (Var v) co) -> addBinderUnfolding env1 v $ + mk_simple_unf (Cast con_app (mkSymCo co)) + _ -> env1 + + ; traceSmpl "addAltUnf" (vcat [ppr case_bndr <+> ppr scrut, ppr con_app]) + ; return env2 } + where + mk_simple_unf = mkSimpleUnfolding (seDynFlags env) + +addBinderUnfolding :: SimplEnv -> Id -> Unfolding -> SimplEnv +addBinderUnfolding env bndr unf + | debugIsOn, Just tmpl <- maybeUnfoldingTemplate unf + = WARN( not (eqType (idType bndr) (exprType tmpl)), + ppr bndr $$ ppr (idType bndr) $$ ppr tmpl $$ ppr (exprType tmpl) ) + modifyInScope env (bndr `setIdUnfolding` unf) + + | otherwise + = modifyInScope 
env (bndr `setIdUnfolding` unf)
+
+zapBndrOccInfo :: Bool -> Id -> Id
+-- Consider  case e of b { (a,b) -> ... }
+-- Then if we bind b to (a,b) in "...", and b is not dead,
+-- then we must zap the deadness info on a,b
+zapBndrOccInfo keep_occ_info pat_id
+  | keep_occ_info = pat_id
+  | otherwise     = zapIdOccInfo pat_id
+
+{- Note [Case binder evaluated-ness]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We pin on a (OtherCon []) unfolding to the case-binder of a Case,
+even though it'll be over-ridden in every case alternative with a more
+informative unfolding.  Why?  Because suppose a later, less clever, pass
+simply replaces all occurrences of the case binder with the binder itself;
+then Lint may complain about the let/app invariant.  Example
+    case e of b { DEFAULT -> let v = reallyUnsafePtrEq# b y in ....
+                ; K -> blah }
+
+The let/app invariant requires that y is evaluated in the call to
+reallyUnsafePtrEq#, which it is.  But we still want that to be true if we
+propagate binders to occurrences.
+
+This showed up in #13027.
+
+Note [Add unfolding for scrutinee]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+In general it's unlikely that a variable scrutinee will appear
+in the case alternatives   case x of { ...x unlikely to appear... }
+because the binder-swap in OccAnal has got rid of all such occurrences.
+See Note [Binder swap] in OccAnal.
+
+BUT it is still VERY IMPORTANT to add a suitable unfolding for a
+variable scrutinee, in simplAlt.  Here's why
+   case x of y
+     (a,b) -> case b of c
+                I# v -> ...(f y)...
+There is no occurrence of 'b' in the (...(f y)...).  But y gets
+the unfolding (a,b), and *that* mentions b.  If f has a RULE
+    RULE f (p, I# q) = ...
+we want that rule to match, so we must extend the in-scope env with a
+suitable unfolding for 'y'.  It's *essential* for rule matching; but
+it's also good for case-elimination -- suppose that 'f' was inlined
+and did multi-level case analysis, then we'd solve it in one
+simplifier sweep instead of two. 
+ +Exactly the same issue arises in GHC.Core.Op.SpecConstr; +see Note [Add scrutinee to ValueEnv too] in GHC.Core.Op.SpecConstr + +HOWEVER, given + case x of y { Just a -> r1; Nothing -> r2 } +we do not want to add the unfolding x -> y to 'x', which might seem cool, +since 'y' itself has different unfoldings in r1 and r2. Reason: if we +did that, we'd have to zap y's deadness info and that is a very useful +piece of information. + +So instead we add the unfolding x -> Just a, and x -> Nothing in the +respective RHSs. + + +************************************************************************ +* * +\subsection{Known constructor} +* * +************************************************************************ + +We are a bit careful with occurrence info. Here's an example + + (\x* -> case x of (a*, b) -> f a) (h v, e) + +where the * means "occurs once". This effectively becomes + case (h v, e) of (a*, b) -> f a) +and then + let a* = h v; b = e in f a +and then + f (h v) + +All this should happen in one sweep. +-} + +knownCon :: SimplEnv + -> OutExpr -- The scrutinee + -> [FloatBind] -> DataCon -> [OutType] -> [OutExpr] -- The scrutinee (in pieces) + -> InId -> [InBndr] -> InExpr -- The alternative + -> SimplCont + -> SimplM (SimplFloats, OutExpr) + +knownCon env scrut dc_floats dc dc_ty_args dc_args bndr bs rhs cont + = do { (floats1, env1) <- bind_args env bs dc_args + ; (floats2, env2) <- bind_case_bndr env1 + ; (floats3, expr') <- simplExprF env2 rhs cont + ; case dc_floats of + [] -> + return (floats1 `addFloats` floats2 `addFloats` floats3, expr') + _ -> + return ( emptyFloats env + -- See Note [FloatBinds from constructor wrappers] + , GHC.Core.Make.wrapFloats dc_floats $ + wrapFloats (floats1 `addFloats` floats2 `addFloats` floats3) expr') } + where + zap_occ = zapBndrOccInfo (isDeadBinder bndr) -- bndr is an InId + + -- Ugh! 
+    bind_args env' [] _  = return (emptyFloats env', env')
+
+    bind_args env' (b:bs') (Type ty : args)
+      = ASSERT( isTyVar b )
+        bind_args (extendTvSubst env' b ty) bs' args
+
+    bind_args env' (b:bs') (Coercion co : args)
+      = ASSERT( isCoVar b )
+        bind_args (extendCvSubst env' b co) bs' args
+
+    bind_args env' (b:bs') (arg : args)
+      = ASSERT( isId b )
+        do { let b' = zap_occ b
+             -- Note that the binder might be "dead", because it doesn't
+             -- occur in the RHS; and simplNonRecX may therefore discard
+             -- it via postInlineUnconditionally.
+             -- Nevertheless we must keep it if the case-binder is alive,
+             -- because it may be used in the con_app.  See Note [knownCon occ info]
+           ; (floats1, env2) <- simplNonRecX env' b' arg  -- arg satisfies let/app invariant
+           ; (floats2, env3) <- bind_args env2 bs' args
+           ; return (floats1 `addFloats` floats2, env3) }
+
+    bind_args _ _ _ =
+      pprPanic "bind_args" $ ppr dc $$ ppr bs $$ ppr dc_args $$
+                             text "scrut:" <+> ppr scrut
+
+    -- It's useful to bind bndr to scrut, rather than to a fresh
+    -- binding      x = Con arg1 .. argn
+    -- because very often the scrut is a variable, so we avoid
+    -- creating, and then subsequently eliminating, a let-binding
+    -- BUT, if scrut is not a variable, we must be careful
+    -- about duplicating the arg redexes; in that case, make
+    -- a new con-app from the args
+    bind_case_bndr env
+      | isDeadBinder bndr   = return (emptyFloats env, env)
+      | exprIsTrivial scrut = return (emptyFloats env
+                                     , extendIdSubst env bndr (DoneEx scrut Nothing))
+      | otherwise           = do { dc_args <- mapM (simplVar env) bs
+                                   -- dc_ty_args are already OutTypes,
+                                   -- but bs are InBndrs
+                                 ; let con_app = Var (dataConWorkId dc)
+                                                 `mkTyApps` dc_ty_args
+                                                 `mkApps`   dc_args
+                                 ; simplNonRecX env bndr con_app }
+
+-------------------
+missingAlt :: SimplEnv -> Id -> [InAlt] -> SimplCont
+           -> SimplM (SimplFloats, OutExpr)
+   -- This isn't strictly an error, although it is unusual. 
+ -- It's possible that the simplifier might "see" that + -- an inner case has no accessible alternatives before + -- it "sees" that the entire branch of an outer case is + -- inaccessible. So we simply put an error case here instead. +missingAlt env case_bndr _ cont + = WARN( True, text "missingAlt" <+> ppr case_bndr ) + -- See Note [Avoiding space leaks in OutType] + let cont_ty = contResultType cont + in seqType cont_ty `seq` + return (emptyFloats env, mkImpossibleExpr cont_ty) + +{- +************************************************************************ +* * +\subsection{Duplicating continuations} +* * +************************************************************************ + +Consider + let x* = case e of { True -> e1; False -> e2 } + in b +where x* is a strict binding. Then mkDupableCont will be given +the continuation + case [] of { True -> e1; False -> e2 } ; let x* = [] in b ; stop +and will split it into + dupable: case [] of { True -> $j1; False -> $j2 } ; stop + join floats: $j1 = e1, $j2 = e2 + non_dupable: let x* = [] in b; stop + +Putting this back together would give + let x* = let { $j1 = e1; $j2 = e2 } in + case e of { True -> $j1; False -> $j2 } + in b +(Of course we only do this if 'e' wants to duplicate that continuation.) +Note how important it is that the new join points wrap around the +inner expression, and not around the whole thing. + +In contrast, any let-bindings introduced by mkDupableCont can wrap +around the entire thing. + +Note [Bottom alternatives] +~~~~~~~~~~~~~~~~~~~~~~~~~~ +When we have + case (case x of { A -> error .. ; B -> e; C -> error ..) + of alts +then we can just duplicate those alts because the A and C cases +will disappear immediately. This is more direct than creating +join points and inlining them away. See #4930. 
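The altsWouldDup test defined just after this note can be restated as a tiny pure function (an illustrative sketch with simplified types; `altsWouldDupSketch` is not a real GHC function): we need join points exactly when strictly more than one alternative is non-bottom.

```haskell
-- Each alternative is abstracted to a single Bool: True means its
-- right-hand side is provably bottom.  Join points are only needed
-- when more than one alternative can actually return.
altsWouldDupSketch :: [Bool] -> Bool
altsWouldDupSketch alts = length (filter not alts) > 1
```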
+-} + +-------------------- +mkDupableCaseCont :: SimplEnv -> [InAlt] -> SimplCont + -> SimplM (SimplFloats, SimplCont) +mkDupableCaseCont env alts cont + | altsWouldDup alts = mkDupableCont env cont + | otherwise = return (emptyFloats env, cont) + +altsWouldDup :: [InAlt] -> Bool -- True iff strictly > 1 non-bottom alternative +altsWouldDup [] = False -- See Note [Bottom alternatives] +altsWouldDup [_] = False +altsWouldDup (alt:alts) + | is_bot_alt alt = altsWouldDup alts + | otherwise = not (all is_bot_alt alts) + where + is_bot_alt (_,_,rhs) = exprIsBottom rhs + +------------------------- +mkDupableCont :: SimplEnv -> SimplCont + -> SimplM ( SimplFloats -- Incoming SimplEnv augmented with + -- extra let/join-floats and in-scope variables + , SimplCont) -- dup_cont: duplicable continuation + +mkDupableCont env cont + | contIsDupable cont + = return (emptyFloats env, cont) + +mkDupableCont _ (Stop {}) = panic "mkDupableCont" -- Handled by previous eqn + +mkDupableCont env (CastIt ty cont) + = do { (floats, cont') <- mkDupableCont env cont + ; return (floats, CastIt ty cont') } + +-- Duplicating ticks for now, not sure if this is good or not +mkDupableCont env (TickIt t cont) + = do { (floats, cont') <- mkDupableCont env cont + ; return (floats, TickIt t cont') } + +mkDupableCont env (StrictBind { sc_bndr = bndr, sc_bndrs = bndrs + , sc_body = body, sc_env = se, sc_cont = cont}) + -- See Note [Duplicating StrictBind] + = do { let sb_env = se `setInScopeFromE` env + ; (sb_env1, bndr') <- simplBinder sb_env bndr + ; (floats1, join_inner) <- simplLam sb_env1 bndrs body cont + -- No need to use mkDupableCont before simplLam; we + -- use cont once here, and then share the result if necessary + + ; let join_body = wrapFloats floats1 join_inner + res_ty = contResultType cont + + ; (floats2, body2) + <- if exprIsDupable (seDynFlags env) join_body + then return (emptyFloats env, join_body) + else do { join_bndr <- newJoinId [bndr'] res_ty + ; let join_call = App (Var 
join_bndr) (Var bndr') + join_rhs = Lam (setOneShotLambda bndr') join_body + join_bind = NonRec join_bndr join_rhs + floats = emptyFloats env `extendFloats` join_bind + ; return (floats, join_call) } + ; return ( floats2 + , StrictBind { sc_bndr = bndr', sc_bndrs = [] + , sc_body = body2 + , sc_env = zapSubstEnv se `setInScopeFromF` floats2 + -- See Note [StaticEnv invariant] in GHC.Core.Op.Simplify.Utils + , sc_dup = OkToDup + , sc_cont = mkBoringStop res_ty } ) } + +mkDupableCont env (StrictArg { sc_fun = info, sc_cci = cci, sc_cont = cont }) + -- See Note [Duplicating StrictArg] + -- NB: sc_dup /= OkToDup; that is caught earlier by contIsDupable + = do { (floats1, cont') <- mkDupableCont env cont + ; (floats_s, args') <- mapAndUnzipM (makeTrivialArg (getMode env)) + (ai_args info) + ; return ( foldl' addLetFloats floats1 floats_s + , StrictArg { sc_fun = info { ai_args = args' } + , sc_cci = cci + , sc_cont = cont' + , sc_dup = OkToDup} ) } + +mkDupableCont env (ApplyToTy { sc_cont = cont + , sc_arg_ty = arg_ty, sc_hole_ty = hole_ty }) + = do { (floats, cont') <- mkDupableCont env cont + ; return (floats, ApplyToTy { sc_cont = cont' + , sc_arg_ty = arg_ty, sc_hole_ty = hole_ty }) } + +mkDupableCont env (ApplyToVal { sc_arg = arg, sc_dup = dup + , sc_env = se, sc_cont = cont }) + = -- e.g. [...hole...] (...arg...) + -- ==> + -- let a = ...arg... + -- in [...hole...] 
a + -- NB: sc_dup /= OkToDup; that is caught earlier by contIsDupable + do { (floats1, cont') <- mkDupableCont env cont + ; let env' = env `setInScopeFromF` floats1 + ; (_, se', arg') <- simplArg env' dup se arg + ; (let_floats2, arg'') <- makeTrivial (getMode env) NotTopLevel (fsLit "karg") arg' + ; let all_floats = floats1 `addLetFloats` let_floats2 + ; return ( all_floats + , ApplyToVal { sc_arg = arg'' + , sc_env = se' `setInScopeFromF` all_floats + -- Ensure that sc_env includes the free vars of + -- arg'' in its in-scope set, even if makeTrivial + -- has turned arg'' into a fresh variable + -- See Note [StaticEnv invariant] in GHC.Core.Op.Simplify.Utils + , sc_dup = OkToDup, sc_cont = cont' }) } + +mkDupableCont env (Select { sc_bndr = case_bndr, sc_alts = alts + , sc_env = se, sc_cont = cont }) + = -- e.g. (case [...hole...] of { pi -> ei }) + -- ===> + -- let ji = \xij -> ei + -- in case [...hole...] of { pi -> ji xij } + -- NB: sc_dup /= OkToDup; that is caught earlier by contIsDupable + do { tick (CaseOfCase case_bndr) + ; (floats, alt_cont) <- mkDupableCaseCont env alts cont + -- NB: We call mkDupableCaseCont here to make cont duplicable + -- (if necessary, depending on the number of alts) + -- And this is important: see Note [Fusing case continuations] + + ; let alt_env = se `setInScopeFromF` floats + ; (alt_env', case_bndr') <- simplBinder alt_env case_bndr + ; alts' <- mapM (simplAlt alt_env' Nothing [] case_bndr' alt_cont) alts + -- Safe to say that there are no handled-cons for the DEFAULT case + -- NB: simplBinder does not zap deadness occ-info, so + -- a dead case_bndr' will still advertise its deadness + -- This is really important because in + -- case e of b { (# p,q #) -> ... } + -- b is always dead, and indeed we are not allowed to bind b to (# p,q #), + -- which might happen if e was an explicit unboxed pair and b wasn't marked dead. + -- In the new alts we build, we have the new case binder, so it must retain + -- its deadness. 
+ -- NB: we don't use alt_env further; it has the substEnv for + -- the alternatives, and we don't want that + + ; (join_floats, alts'') <- mapAccumLM (mkDupableAlt (seDynFlags env) case_bndr') + emptyJoinFloats alts' + + ; let all_floats = floats `addJoinFloats` join_floats + -- Note [Duplicated env] + ; return (all_floats + , Select { sc_dup = OkToDup + , sc_bndr = case_bndr' + , sc_alts = alts'' + , sc_env = zapSubstEnv se `setInScopeFromF` all_floats + -- See Note [StaticEnv invariant] in GHC.Core.Op.Simplify.Utils + , sc_cont = mkBoringStop (contResultType cont) } ) } + +mkDupableAlt :: DynFlags -> OutId + -> JoinFloats -> OutAlt + -> SimplM (JoinFloats, OutAlt) +mkDupableAlt dflags case_bndr jfloats (con, bndrs', rhs') + | exprIsDupable dflags rhs' -- Note [Small alternative rhs] + = return (jfloats, (con, bndrs', rhs')) + + | otherwise + = do { let rhs_ty' = exprType rhs' + scrut_ty = idType case_bndr + case_bndr_w_unf + = case con of + DEFAULT -> case_bndr + DataAlt dc -> setIdUnfolding case_bndr unf + where + -- See Note [Case binders and join points] + unf = mkInlineUnfolding rhs + rhs = mkConApp2 dc (tyConAppArgs scrut_ty) bndrs' + + LitAlt {} -> WARN( True, text "mkDupableAlt" + <+> ppr case_bndr <+> ppr con ) + case_bndr + -- The case binder is alive but trivial, so why has + -- it not been substituted away? + + final_bndrs' + | isDeadBinder case_bndr = filter abstract_over bndrs' + | otherwise = bndrs' ++ [case_bndr_w_unf] + + abstract_over bndr + | isTyVar bndr = True -- Abstract over all type variables just in case + | otherwise = not (isDeadBinder bndr) + -- The deadness info on the new Ids is preserved by simplBinders + final_args = varsToCoreExprs final_bndrs' + -- Note [Join point abstraction] + + -- We make the lambdas into one-shot-lambdas. 
The
+             -- join point is sure to be applied at most once, and doing so
+             -- prevents the body of the join point being floated out by
+             -- the full laziness pass
+             really_final_bndrs     = map one_shot final_bndrs'
+             one_shot v | isId v    = setOneShotLambda v
+                        | otherwise = v
+             join_rhs   = mkLams really_final_bndrs rhs'
+
+       ; join_bndr <- newJoinId final_bndrs' rhs_ty'
+
+       ; let join_call = mkApps (Var join_bndr) final_args
+             alt'      = (con, bndrs', join_call)
+
+       ; return ( jfloats `addJoinFlts` unitJoinFloat (NonRec join_bndr join_rhs)
+                , alt') }
+                -- See Note [Duplicated env]
+
+{-
+Note [Fusing case continuations]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+It's important to fuse two successive case continuations when the
+first has one alternative.  That's why we call mkDupableCaseCont here.
+Consider this, which arises from thunk splitting (see Note [Thunk
+splitting] in GHC.Core.Op.WorkWrap):
+
+      let
+        x* = case (case v of {pn -> rn}) of
+               I# a -> I# a
+      in body
+
+The simplifier will find
+    (Var v) with continuation
+            Select (pn -> rn) (
+            Select [I# a -> I# a] (
+            StrictBind body Stop
+
+So we'll call mkDupableCont on
+   Select [I# a -> I# a] (StrictBind body Stop)
+There is just one alternative in the first Select, so we want to
+simplify the rhs (I# a) with continuation (StrictBind body Stop)
+Supposing that body is big, we end up with
+          let $j a = <let x = I# a in body>
+          in case v of { pn -> case rn of
+                                 I# a -> $j a }
+This is just what we want because the rn produces a box that
+the case rn cancels with.
+
+See #4957 for a fuller example.
+
+Note [Case binders and join points]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider this
+   case (case .. ) of c {
+     I# c# -> ....c....
+
+If we make a join point with c but not c# we get
+  $j = \c -> ....c....
+
+But if later inlining scrutinises the c, thus
+
+  $j = \c -> ... case c of { I# y -> ... } ...
+
+we won't see that 'c' has already been scrutinised. 
This actually
+happens in the 'tabulate' function in wave4main, and makes a significant
+difference to allocation.
+
+An alternative plan is this:
+
+   $j = \c# -> let c = I# c# in ...c....
+
+but that is bad if 'c' is *not* later scrutinised.
+
+So instead we do both: we pass 'c' and 'c#', and record in c's inlining
+(a stable unfolding) that it's really I# c#, thus
+
+   $j = \c# -> \c[=I# c#] -> ...c....
+
+Absence analysis may later discard 'c'.
+
+NB: take great care when doing strictness analysis;
+    see Note [Lambda-bound unfoldings] in GHC.Core.Op.DmdAnal.
+
+Also note that we can still end up passing stuff that isn't used.  Before
+strictness analysis we have
+   let $j x y c{=(x,y)} = (h c, ...)
+   in ...
+After strictness analysis we see that h is strict, and we end up with
+   let $j x y c{=(x,y)} = ($wh x y, ...)
+and c is unused.
+
+Note [Duplicated env]
+~~~~~~~~~~~~~~~~~~~~~
+Some of the alternatives are simplified, but have not been turned into a join point.
+So they *must* have a zapped subst-env.  So we can't use completeNonRecX to
+bind the join point, because it might do PostInlineUnconditionally, and
+we'd lose that when zapping the subst-env.  We could have a per-alt subst-env,
+but zapping it (as we do in mkDupableCont, the Select case) is safe, and
+at worst delays the join-point inlining.
+
+Note [Small alternative rhs]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+It is worth checking for a small RHS because otherwise we
+get extra let bindings that may cause an extra iteration of the simplifier to
+inline back in place.  Quite often the rhs is just a variable or constructor.
+The Ord instance of Maybe in PrelMaybe.hs, for example, took several extra
+iterations because the version with the let bindings looked big, and so wasn't
+inlined, but after the join points had been inlined it looked smaller, and so
+was inlined.
+
+NB: we have to check the size of rhs', not rhs. 
+Duplicating a small InAlt might invalidate occurrence information.
+However, if it *is* dupable, we return the *un*simplified alternative,
+because otherwise we'd need to pair it up with an empty subst-env....
+but we only have one env shared between all the alts.
+(Remember we must zap the subst-env before re-simplifying something).
+Rather than do this we simply agree to re-simplify the original (small) thing later.
+
+Note [Funky mkLamTypes]
+~~~~~~~~~~~~~~~~~~~~~~~
+Notice the funky mkLamTypes.  If the constructor has existentials
+it's possible that the join point will be abstracted over
+type variables as well as term variables.
+ Example:  Suppose we have
+        data T = forall t.  C [t]
+ Then faced with
+        case (case e of ...) of
+            C t xs::[t] -> rhs
+ We get the join point
+        let j :: forall t. [t] -> ...
+            j = /\t \xs::[t] -> rhs
+        in
+        case (case e of ...) of
+            C t xs::[t] -> j t xs
+
+Note [Duplicating StrictArg]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We make a StrictArg duplicable simply by making all its
+stored-up arguments (in sc_fun) trivial, by let-binding
+them.  Thus:
+        f E [..hole..]
+   ==>  let a = E
+        in f a [..hole..]
+Now if the thing in the hole is a case expression (which is when
+we'll call mkDupableCont), we'll push the function call into the
+branches, which is what we want.  Now RULES for f may fire, and
+call-pattern specialisation.  Here's an example from #3116
+     go (n+1) (case l of
+                 1  -> bs'
+                 _  -> Chunk p fpc (o+1) (l-1) bs')
+If we can push the call for 'go' inside the case, we get
+call-pattern specialisation for 'go', which is *crucial* for
+this program.
+
+Here is the (&&) example:
+       && E (case x of { T -> F; F -> T })
+  ==>  let a = E in
+       case x of { T -> && a F; F -> && a T }
+Much better!
+
+Notice that
+  * Arguments to f *after* the strict one are handled by
+    the ApplyToVal case of mkDupableCont.  Eg
+        f [..hole..] E
+
+  * We can only do the let-binding of E because the function
+    part of a StrictArg continuation is an explicit syntax
+    
In earlier versions we represented it as a function + (CoreExpr -> CoreEpxr) which we couldn't take apart. + +Historical aide: previously we did this (where E is a +big argument: + f E [..hole..] + ==> let $j = \a -> f E a + in $j [..hole..] + +But this is terrible! Here's an example: + && E (case x of { T -> F; F -> T }) +Now, && is strict so we end up simplifying the case with +an ArgOf continuation. If we let-bind it, we get + let $j = \v -> && E v + in simplExpr (case x of { T -> F; F -> T }) + (ArgOf (\r -> $j r) +And after simplifying more we get + let $j = \v -> && E v + in case x of { T -> $j F; F -> $j T } +Which is a Very Bad Thing + + +Note [Duplicating StrictBind] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We make a StrictBind duplicable in a very similar way to +that for case expressions. After all, + let x* = e in b is similar to case e of x -> b + +So we potentially make a join-point for the body, thus: + let x = [] in b ==> join j x = b + in let x = [] in j x + + +Note [Join point abstraction] Historical note +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +NB: This note is now historical, describing how (in the past) we used +to add a void argument to nullary join points. But now that "join +point" is not a fuzzy concept but a formal syntactic construct (as +distinguished by the JoinId constructor of IdDetails), each of these +concerns is handled separately, with no need for a vestigial extra +argument. + +Join points always have at least one value argument, +for several reasons + +* If we try to lift a primitive-typed something out + for let-binding-purposes, we will *caseify* it (!), + with potentially-disastrous strictness results. So + instead we turn it into a function: \v -> e + where v::Void#. The value passed to this function is void, + which generates (almost) no code. + +* CPR. We used to say "&& isUnliftedType rhs_ty'" here, but now + we make the join point into a function whenever used_bndrs' + is empty. This makes the join-point more CPR friendly. 
+ Consider: let j = if .. then I# 3 else I# 4 + in case .. of { A -> j; B -> j; C -> ... } + + Now CPR doesn't w/w j because it's a thunk, so + that means that the enclosing function can't w/w either, + which is a lose. Here's the example that happened in practice: + kgmod :: Int -> Int -> Int + kgmod x y = if x > 0 && y < 0 || x < 0 && y > 0 + then 78 + else 5 + +* Let-no-escape. We want a join point to turn into a let-no-escape + so that it is implemented as a jump, and one of the conditions + for LNE is that it's not updatable. In CoreToStg, see + Note [What is a non-escaping let] + +* Floating. Since a join point will be entered once, no sharing is + gained by floating out, but something might be lost by doing + so because it might be allocated. + +I have seen a case alternative like this: + True -> \v -> ... +It's a bit silly to add the realWorld dummy arg in this case, making + $j = \s v -> ... + True -> $j s +(the \v alone is enough to make CPR happy) but I think it's rare + +There's a slight infelicity here: we pass the overall +case_bndr to all the join points if it's used in *any* RHS, +because we don't know its usage in each RHS separately + + + +************************************************************************ +* * + Unfoldings +* * +************************************************************************ +-} + +simplLetUnfolding :: SimplEnv-> TopLevelFlag + -> MaybeJoinCont + -> InId + -> OutExpr -> OutType + -> Unfolding -> SimplM Unfolding +simplLetUnfolding env top_lvl cont_mb id new_rhs rhs_ty unf + | isStableUnfolding unf + = simplStableUnfolding env top_lvl cont_mb id unf rhs_ty + | isExitJoinId id + = return noUnfolding -- See Note [Do not inline exit join points] in GHC.Core.Op.Exitify + | otherwise + = mkLetUnfolding (seDynFlags env) top_lvl InlineRhs id new_rhs + +------------------- +mkLetUnfolding :: DynFlags -> TopLevelFlag -> UnfoldingSource + -> InId -> OutExpr -> SimplM Unfolding +mkLetUnfolding dflags top_lvl src id new_rhs + = 
is_bottoming `seq` -- See Note [Force bottoming field] + return (mkUnfolding dflags src is_top_lvl is_bottoming new_rhs) + -- We make an unfolding *even for loop-breakers*. + -- Reason: (a) It might be useful to know that they are WHNF + -- (b) In GHC.Iface.Tidy we currently assume that, if we want to + -- expose the unfolding then indeed we *have* an unfolding + -- to expose. (We could instead use the RHS, but currently + -- we don't.) The simple thing is always to have one. + where + is_top_lvl = isTopLevel top_lvl + is_bottoming = isBottomingId id + +------------------- +simplStableUnfolding :: SimplEnv -> TopLevelFlag + -> MaybeJoinCont -- Just k => a join point with continuation k + -> InId + -> Unfolding -> OutType -> SimplM Unfolding +-- Note [Setting the new unfolding] +simplStableUnfolding env top_lvl mb_cont id unf rhs_ty + = case unf of + NoUnfolding -> return unf + BootUnfolding -> return unf + OtherCon {} -> return unf + + DFunUnfolding { df_bndrs = bndrs, df_con = con, df_args = args } + -> do { (env', bndrs') <- simplBinders unf_env bndrs + ; args' <- mapM (simplExpr env') args + ; return (mkDFunUnfolding bndrs' con args') } + + CoreUnfolding { uf_tmpl = expr, uf_src = src, uf_guidance = guide } + | isStableSource src + -> do { expr' <- case mb_cont of -- See Note [Rules and unfolding for join points] + Just cont -> simplJoinRhs unf_env id expr cont + Nothing -> simplExprC unf_env expr (mkBoringStop rhs_ty) + ; case guide of + UnfWhen { ug_arity = arity + , ug_unsat_ok = sat_ok + , ug_boring_ok = boring_ok + } + -- Happens for INLINE things + -> let guide' = + UnfWhen { ug_arity = arity + , ug_unsat_ok = sat_ok + , ug_boring_ok = + boring_ok || inlineBoringOk expr' + } + -- Refresh the boring-ok flag, in case expr' + -- has got small. This happens, notably in the inlinings + -- for dfuns for single-method classes; see + -- Note [Single-method classes] in TcInstDcls. + -- A test case is #4138 + -- But retain a previous boring_ok of True; e.g. 
see + -- the way it is set in calcUnfoldingGuidanceWithArity + in return (mkCoreUnfolding src is_top_lvl expr' guide') + -- See Note [Top-level flag on inline rules] in GHC.Core.Unfold + + _other -- Happens for INLINABLE things + -> mkLetUnfolding dflags top_lvl src id expr' } + -- If the guidance is UnfIfGoodArgs, this is an INLINABLE + -- unfolding, and we need to make sure the guidance is kept up + -- to date with respect to any changes in the unfolding. + + | otherwise -> return noUnfolding -- Discard unstable unfoldings + where + dflags = seDynFlags env + is_top_lvl = isTopLevel top_lvl + act = idInlineActivation id + unf_env = updMode (updModeForStableUnfoldings act) env + -- See Note [Simplifying inside stable unfoldings] in GHC.Core.Op.Simplify.Utils + +{- +Note [Force bottoming field] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We need to force bottoming, or the new unfolding holds +on to the old unfolding (which is part of the id). + +Note [Setting the new unfolding] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +* If there's an INLINE pragma, we simplify the RHS gently. Maybe we + should do nothing at all, but simplifying gently might get rid of + more crap. + +* If not, we make an unfolding from the new RHS. But *only* for + non-loop-breakers. Making loop breakers not have an unfolding at all + means that we can avoid tests in exprIsConApp, for example. This is + important: if exprIsConApp says 'yes' for a recursive thing, then we + can get into an infinite loop + +If there's a stable unfolding on a loop breaker (which happens for +INLINABLE), we hang on to the inlining. It's pretty dodgy, but the +user did say 'INLINE'. May need to revisit this choice. 
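The loop-breaker discussion above can be pictured at the source level. The following is a hypothetical user module (not compiler code; the names are invented for illustration):

```haskell
-- 'ones' is a self-recursive binding, so the occurrence analyser will
-- mark it as a loop breaker; the simplifier then gives it no ordinary
-- unfolding, because if exprIsConApp answered "yes, it is 1 : ones"
-- the simplifier could unroll it forever.
ones :: [Int]
ones = 1 : ones

-- An INLINABLE pragma asks GHC to keep a stable unfolding even though
-- the recursive binding is a loop breaker: the "pretty dodgy, but the
-- user did say INLINE" case described above.
{-# INLINABLE sumTo #-}
sumTo :: Int -> Int
sumTo n
  | n <= 0    = 0
  | otherwise = n + sumTo (n - 1)
```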
+ +************************************************************************ +* * + Rules +* * +************************************************************************ + +Note [Rules in a letrec] +~~~~~~~~~~~~~~~~~~~~~~~~ +After creating fresh binders for the binders of a letrec, we +substitute the RULES and add them back onto the binders; this is done +*before* processing any of the RHSs. This is important. Manuel found +cases where he really, really wanted a RULE for a recursive function +to apply in that function's own right-hand side. + +See Note [Forming Rec groups] in OccurAnal +-} + +addBndrRules :: SimplEnv -> InBndr -> OutBndr + -> MaybeJoinCont -- Just k for a join point binder + -- Nothing otherwise + -> SimplM (SimplEnv, OutBndr) +-- Rules are added back into the bin +addBndrRules env in_id out_id mb_cont + | null old_rules + = return (env, out_id) + | otherwise + = do { new_rules <- simplRules env (Just out_id) old_rules mb_cont + ; let final_id = out_id `setIdSpecialisation` mkRuleInfo new_rules + ; return (modifyInScope env final_id, final_id) } + where + old_rules = ruleInfoRules (idSpecialisation in_id) + +simplRules :: SimplEnv -> Maybe OutId -> [CoreRule] + -> MaybeJoinCont -> SimplM [CoreRule] +simplRules env mb_new_id rules mb_cont + = mapM simpl_rule rules + where + simpl_rule rule@(BuiltinRule {}) + = return rule + + simpl_rule rule@(Rule { ru_bndrs = bndrs, ru_args = args + , ru_fn = fn_name, ru_rhs = rhs }) + = do { (env', bndrs') <- simplBinders env bndrs + ; let rhs_ty = substTy env' (exprType rhs) + rhs_cont = case mb_cont of -- See Note [Rules and unfolding for join points] + Nothing -> mkBoringStop rhs_ty + Just cont -> ASSERT2( join_ok, bad_join_msg ) + cont + rule_env = updMode updModeForRules env' + fn_name' = case mb_new_id of + Just id -> idName id + Nothing -> fn_name + + -- join_ok is an assertion check that the join-arity of the + -- binder matches that of the rule, so that pushing the + -- continuation into the RHS makes sense 
+ join_ok = case mb_new_id of + Just id | Just join_arity <- isJoinId_maybe id + -> length args == join_arity + _ -> False + bad_join_msg = vcat [ ppr mb_new_id, ppr rule + , ppr (fmap isJoinId_maybe mb_new_id) ] + + ; args' <- mapM (simplExpr rule_env) args + ; rhs' <- simplExprC rule_env rhs rhs_cont + ; return (rule { ru_bndrs = bndrs' + , ru_fn = fn_name' + , ru_args = args' + , ru_rhs = rhs' }) } diff --git a/compiler/GHC/Core/Op/Simplify/Driver.hs b/compiler/GHC/Core/Op/Simplify/Driver.hs new file mode 100644 index 0000000000..b6ec392599 --- /dev/null +++ b/compiler/GHC/Core/Op/Simplify/Driver.hs @@ -0,0 +1,1037 @@ +{- +(c) The GRASP/AQUA Project, Glasgow University, 1992-1998 + +\section[SimplCore]{Driver for simplifying @Core@ programs} +-} + +{-# LANGUAGE CPP #-} + +module GHC.Core.Op.Simplify.Driver ( core2core, simplifyExpr ) where + +#include "HsVersions.h" + +import GhcPrelude + +import GHC.Driver.Session +import GHC.Core +import GHC.Driver.Types +import GHC.Core.Op.CSE ( cseProgram ) +import GHC.Core.Rules ( mkRuleBase, unionRuleBase, + extendRuleBaseList, ruleCheckProgram, addRuleInfo, + getRules ) +import GHC.Core.Ppr ( pprCoreBindings, pprCoreExpr ) +import GHC.Core.Op.OccurAnal ( occurAnalysePgm, occurAnalyseExpr ) +import IdInfo +import GHC.Core.Stats ( coreBindsSize, coreBindsStats, exprSize ) +import GHC.Core.Utils ( mkTicks, stripTicksTop ) +import GHC.Core.Lint ( endPass, lintPassResult, dumpPassResult, + lintAnnots ) +import GHC.Core.Op.Simplify ( simplTopBinds, simplExpr, simplRules ) +import GHC.Core.Op.Simplify.Utils ( simplEnvForGHCi, activeRule, activeUnfolding ) +import GHC.Core.Op.Simplify.Env +import GHC.Core.Op.Simplify.Monad +import GHC.Core.Op.Monad +import qualified ErrUtils as Err +import GHC.Core.Op.FloatIn ( floatInwards ) +import GHC.Core.Op.FloatOut ( floatOutwards ) +import GHC.Core.FamInstEnv +import Id +import ErrUtils ( withTiming, withTimingD, DumpFormat (..) 
) +import BasicTypes ( CompilerPhase(..), isDefaultInlinePragma, defaultInlinePragma ) +import VarSet +import VarEnv +import GHC.Core.Op.LiberateCase ( liberateCase ) +import GHC.Core.Op.StaticArgs ( doStaticArgs ) +import GHC.Core.Op.Specialise ( specProgram) +import GHC.Core.Op.SpecConstr ( specConstrProgram) +import GHC.Core.Op.DmdAnal ( dmdAnalProgram ) +import GHC.Core.Op.CprAnal ( cprAnalProgram ) +import GHC.Core.Op.CallArity ( callArityAnalProgram ) +import GHC.Core.Op.Exitify ( exitifyProgram ) +import GHC.Core.Op.WorkWrap ( wwTopBinds ) +import SrcLoc +import Util +import Module +import GHC.Driver.Plugins ( withPlugins, installCoreToDos ) +import GHC.Runtime.Loader -- ( initializePlugins ) + +import UniqSupply ( UniqSupply, mkSplitUniqSupply, splitUniqSupply ) +import UniqFM +import Outputable +import Control.Monad +import qualified GHC.LanguageExtensions as LangExt +{- +************************************************************************ +* * +\subsection{The driver for the simplifier} +* * +************************************************************************ +-} + +core2core :: HscEnv -> ModGuts -> IO ModGuts +core2core hsc_env guts@(ModGuts { mg_module = mod + , mg_loc = loc + , mg_deps = deps + , mg_rdr_env = rdr_env }) + = do { -- make sure all plugins are loaded + + ; let builtin_passes = getCoreToDo dflags + orph_mods = mkModuleSet (mod : dep_orphs deps) + uniq_mask = 's' + ; + ; (guts2, stats) <- runCoreM hsc_env hpt_rule_base uniq_mask mod + orph_mods print_unqual loc $ + do { hsc_env' <- getHscEnv + ; dflags' <- liftIO $ initializePlugins hsc_env' + (hsc_dflags hsc_env') + ; all_passes <- withPlugins dflags' + installCoreToDos + builtin_passes + ; runCorePasses all_passes guts } + + ; Err.dumpIfSet_dyn dflags Opt_D_dump_simpl_stats + "Grand total simplifier statistics" + FormatText + (pprSimplCount stats) + + ; return guts2 } + where + dflags = hsc_dflags hsc_env + home_pkg_rules = hptRules hsc_env (dep_mods deps) + hpt_rule_base = 
mkRuleBase home_pkg_rules
+    print_unqual   = mkPrintUnqualified dflags rdr_env
+    -- mod: get the module out of the current HscEnv so we can retrieve it from the monad.
+    -- This is very convenient for the users of the monad (e.g. plugins do not have to
+    -- consume the ModGuts to find the module) but somewhat ugly because mg_module may
+    -- _theoretically_ be changed during the Core pipeline (it's part of ModGuts), which
+    -- would mean our cached value would go out of date.
+
+{-
+************************************************************************
+*                                                                      *
+           Generating the main optimisation pipeline
+*                                                                      *
+************************************************************************
+-}
+
+getCoreToDo :: DynFlags -> [CoreToDo]
+getCoreToDo dflags
+  = flatten_todos core_todo
+  where
+    opt_level     = optLevel dflags
+    phases        = simplPhases dflags
+    max_iter      = maxSimplIterations dflags
+    rule_check    = ruleCheck dflags
+    call_arity    = gopt Opt_CallArity dflags
+    exitification = gopt Opt_Exitification dflags
+    strictness    = gopt Opt_Strictness dflags
+    full_laziness = gopt Opt_FullLaziness dflags
+    do_specialise = gopt Opt_Specialise dflags
+    do_float_in   = gopt Opt_FloatIn dflags
+    cse           = gopt Opt_CSE dflags
+    spec_constr   = gopt Opt_SpecConstr dflags
+    liberate_case = gopt Opt_LiberateCase dflags
+    late_dmd_anal = gopt Opt_LateDmdAnal dflags
+    late_specialise = gopt Opt_LateSpecialise dflags
+    static_args   = gopt Opt_StaticArgumentTransformation dflags
+    rules_on      = gopt Opt_EnableRewriteRules dflags
+    eta_expand_on = gopt Opt_DoLambdaEtaExpansion dflags
+    ww_on         = gopt Opt_WorkerWrapper dflags
+    static_ptrs   = xopt LangExt.StaticPointers dflags
+
+    maybe_rule_check phase = runMaybe rule_check (CoreDoRuleCheck phase)
+
+    maybe_strictness_before phase
+      = runWhen (phase `elem` strictnessBefore dflags) CoreDoDemand
+
+    base_mode = SimplMode { sm_phase      = panic "base_mode"
+                          , sm_names      = []
+                          , sm_dflags     = dflags
+                          , sm_rules      = rules_on
+                          , sm_eta_expand = eta_expand_on
+                          , sm_inline     = True
+ , sm_case_case = True } + + simpl_phase phase names iter + = CoreDoPasses + $ [ maybe_strictness_before phase + , CoreDoSimplify iter + (base_mode { sm_phase = Phase phase + , sm_names = names }) + + , maybe_rule_check (Phase phase) ] + + simpl_phases = CoreDoPasses [ simpl_phase phase ["main"] max_iter + | phase <- [phases, phases-1 .. 1] ] + + + -- initial simplify: mk specialiser happy: minimum effort please + simpl_gently = CoreDoSimplify max_iter + (base_mode { sm_phase = InitialPhase + , sm_names = ["Gentle"] + , sm_rules = rules_on -- Note [RULEs enabled in InitialPhase] + , sm_inline = True + -- See Note [Inline in InitialPhase] + , sm_case_case = False }) + -- Don't do case-of-case transformations. + -- This makes full laziness work better + + dmd_cpr_ww = if ww_on then [CoreDoDemand,CoreDoCpr,CoreDoWorkerWrapper] + else [CoreDoDemand,CoreDoCpr] + + + demand_analyser = (CoreDoPasses ( + dmd_cpr_ww ++ + [simpl_phase 0 ["post-worker-wrapper"] max_iter] + )) + + -- Static forms are moved to the top level with the FloatOut pass. + -- See Note [Grand plan for static forms] in StaticPtrTable. + static_ptrs_float_outwards = + runWhen static_ptrs $ CoreDoPasses + [ simpl_gently -- Float Out can't handle type lets (sometimes created + -- by simpleOptPgm via mkParallelBindings) + , CoreDoFloatOutwards FloatOutSwitches + { floatOutLambdas = Just 0 + , floatOutConstants = True + , floatOutOverSatApps = False + , floatToTopLevelOnly = True + } + ] + + core_todo = + if opt_level == 0 then + [ static_ptrs_float_outwards, + CoreDoSimplify max_iter + (base_mode { sm_phase = Phase 0 + , sm_names = ["Non-opt simplification"] }) + ] + + else {- opt_level >= 1 -} [ + + -- We want to do the static argument transform before full laziness as it + -- may expose extra opportunities to float things outwards. 
However, to fix + -- up the output of the transformation we need at do at least one simplify + -- after this before anything else + runWhen static_args (CoreDoPasses [ simpl_gently, CoreDoStaticArgs ]), + + -- initial simplify: mk specialiser happy: minimum effort please + simpl_gently, + + -- Specialisation is best done before full laziness + -- so that overloaded functions have all their dictionary lambdas manifest + runWhen do_specialise CoreDoSpecialising, + + if full_laziness then + CoreDoFloatOutwards FloatOutSwitches { + floatOutLambdas = Just 0, + floatOutConstants = True, + floatOutOverSatApps = False, + floatToTopLevelOnly = False } + -- Was: gentleFloatOutSwitches + -- + -- I have no idea why, but not floating constants to + -- top level is very bad in some cases. + -- + -- Notably: p_ident in spectral/rewrite + -- Changing from "gentle" to "constantsOnly" + -- improved rewrite's allocation by 19%, and + -- made 0.0% difference to any other nofib + -- benchmark + -- + -- Not doing floatOutOverSatApps yet, we'll do + -- that later on when we've had a chance to get more + -- accurate arity information. In fact it makes no + -- difference at all to performance if we do it here, + -- but maybe we save some unnecessary to-and-fro in + -- the simplifier. + else + -- Even with full laziness turned off, we still need to float static + -- forms to the top level. See Note [Grand plan for static forms] in + -- StaticPtrTable. + static_ptrs_float_outwards, + + simpl_phases, + + -- Phase 0: allow all Ids to be inlined now + -- This gets foldr inlined before strictness analysis + + -- At least 3 iterations because otherwise we land up with + -- huge dead expressions because of an infelicity in the + -- simplifier. + -- let k = BIG in foldr k z xs + -- ==> let k = BIG in letrec go = \xs -> ...(k x).... in go xs + -- ==> let k = BIG in letrec go = \xs -> ...(BIG x).... in go xs + -- Don't stop now! 
+ simpl_phase 0 ["main"] (max max_iter 3), + + runWhen do_float_in CoreDoFloatInwards, + -- Run float-inwards immediately before the strictness analyser + -- Doing so pushes bindings nearer their use site and hence makes + -- them more likely to be strict. These bindings might only show + -- up after the inlining from simplification. Example in fulsom, + -- Csg.calc, where an arg of timesDouble thereby becomes strict. + + runWhen call_arity $ CoreDoPasses + [ CoreDoCallArity + , simpl_phase 0 ["post-call-arity"] max_iter + ], + + runWhen strictness demand_analyser, + + runWhen exitification CoreDoExitify, + -- See note [Placement of the exitification pass] + + runWhen full_laziness $ + CoreDoFloatOutwards FloatOutSwitches { + floatOutLambdas = floatLamArgs dflags, + floatOutConstants = True, + floatOutOverSatApps = True, + floatToTopLevelOnly = False }, + -- nofib/spectral/hartel/wang doubles in speed if you + -- do full laziness late in the day. It only happens + -- after fusion and other stuff, so the early pass doesn't + -- catch it. For the record, the redex is + -- f_el22 (f_el21 r_midblock) + + + runWhen cse CoreCSE, + -- We want CSE to follow the final full-laziness pass, because it may + -- succeed in commoning up things floated out by full laziness. + -- CSE used to rely on the no-shadowing invariant, but it doesn't any more + + runWhen do_float_in CoreDoFloatInwards, + + maybe_rule_check (Phase 0), + + -- Case-liberation for -O2. This should be after + -- strictness analysis and the simplification which follows it. 
+ runWhen liberate_case (CoreDoPasses [ + CoreLiberateCase, + simpl_phase 0 ["post-liberate-case"] max_iter + ]), -- Run the simplifier after LiberateCase to vastly + -- reduce the possibility of shadowing + -- Reason: see Note [Shadowing] in GHC.Core.Op.SpecConstr + + runWhen spec_constr CoreDoSpecConstr, + + maybe_rule_check (Phase 0), + + runWhen late_specialise + (CoreDoPasses [ CoreDoSpecialising + , simpl_phase 0 ["post-late-spec"] max_iter]), + + -- LiberateCase can yield new CSE opportunities because it peels + -- off one layer of a recursive function (concretely, I saw this + -- in wheel-sieve1), and I'm guessing that SpecConstr can too + -- And CSE is a very cheap pass. So it seems worth doing here. + runWhen ((liberate_case || spec_constr) && cse) CoreCSE, + + -- Final clean-up simplification: + simpl_phase 0 ["final"] max_iter, + + runWhen late_dmd_anal $ CoreDoPasses ( + dmd_cpr_ww ++ + [simpl_phase 0 ["post-late-ww"] max_iter] + ), + + -- Final run of the demand_analyser, ensures that one-shot thunks are + -- really really one-shot thunks. Only needed if the demand analyser + -- has run at all. See Note [Final Demand Analyser run] in GHC.Core.Op.DmdAnal + -- It is EXTREMELY IMPORTANT to run this pass, otherwise execution + -- can become /exponentially/ more expensive. See #11731, #12996. + runWhen (strictness || late_dmd_anal) CoreDoDemand, + + maybe_rule_check (Phase 0) + ] + + -- Remove 'CoreDoNothing' and flatten 'CoreDoPasses' for clarity. + flatten_todos [] = [] + flatten_todos (CoreDoNothing : rest) = flatten_todos rest + flatten_todos (CoreDoPasses passes : rest) = + flatten_todos passes ++ flatten_todos rest + flatten_todos (todo : rest) = todo : flatten_todos rest + +{- Note [Inline in InitialPhase] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +In GHC 8 and earlier we did not inline anything in the InitialPhase. But that is +confusing for users because when they say INLINE they expect the function to inline +right away. 
+
+So now we do inlining immediately, even in the InitialPhase, assuming that the
+Id's Activation allows it.
+
+This is a surprisingly big deal. Compiler performance improved a lot
+when I made this change:
+
+        perf/compiler/T5837.run            T5837 [stat too good] (normal)
+        perf/compiler/parsing001.run       parsing001 [stat too good] (normal)
+        perf/compiler/T12234.run           T12234 [stat too good] (optasm)
+        perf/compiler/T9020.run            T9020 [stat too good] (optasm)
+        perf/compiler/T3064.run            T3064 [stat too good] (normal)
+        perf/compiler/T9961.run            T9961 [stat too good] (normal)
+        perf/compiler/T13056.run           T13056 [stat too good] (optasm)
+        perf/compiler/T9872d.run           T9872d [stat too good] (normal)
+        perf/compiler/T783.run             T783 [stat too good] (normal)
+        perf/compiler/T12227.run           T12227 [stat too good] (normal)
+        perf/should_run/lazy-bs-alloc.run  lazy-bs-alloc [stat too good] (normal)
+        perf/compiler/T1969.run            T1969 [stat too good] (normal)
+        perf/compiler/T9872a.run           T9872a [stat too good] (normal)
+        perf/compiler/T9872c.run           T9872c [stat too good] (normal)
+        perf/compiler/T9872b.run           T9872b [stat too good] (normal)
+
+Note [RULEs enabled in InitialPhase]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+RULES are enabled when doing "gentle" simplification in InitialPhase,
+or with -O0.  Two reasons:
+
+  * We really want the class-op cancellation to happen:
+          op (df d1 d2) --> $cop3 d1 d2
+    because this breaks the mutual recursion between 'op' and 'df'
+
+  * I wanted the RULE
+        lift String ===> ...
+    to work in Template Haskell when simplifying
+    splices, so we get simpler code for literal strings
+
+But watch out: list fusion can prevent floating.  So use phase control
+to switch off those rules until after floating.
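The phase control mentioned at the end of the Note above looks like this in user code. This is a hedged sketch: `myMap` and its rule are invented for illustration, not taken from any library:

```haskell
-- A hypothetical map function, used to demonstrate phase control.
myMap :: (a -> b) -> [a] -> [b]
myMap _ []     = []
myMap f (x:xs) = f x : myMap f xs

-- Delay inlining until phase 1 so that calls to myMap stay visible
-- long enough for the rewrite rule below to fire:
{-# NOINLINE [1] myMap #-}

-- The "[1]" activation makes the rule active only in phases 1 and 0;
-- it is switched off during the gentle InitialPhase (and phase 2),
-- which is the mechanism suggested above for rules that would
-- otherwise interfere with floating or list fusion:
{-# RULES
"myMap/myMap" [1] forall f g xs. myMap f (myMap g xs) = myMap (f . g) xs
  #-}
```

Semantics are unchanged whether or not the rule fires, so the program behaves the same at -O0 (rules off) and -O (rules on).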
+ +************************************************************************ +* * + The CoreToDo interpreter +* * +************************************************************************ +-} + +runCorePasses :: [CoreToDo] -> ModGuts -> CoreM ModGuts +runCorePasses passes guts + = foldM do_pass guts passes + where + do_pass guts CoreDoNothing = return guts + do_pass guts (CoreDoPasses ps) = runCorePasses ps guts + do_pass guts pass = do + withTimingD (ppr pass <+> brackets (ppr mod)) + (const ()) $ do + { guts' <- lintAnnots (ppr pass) (doCorePass pass) guts + ; endPass pass (mg_binds guts') (mg_rules guts') + ; return guts' } + + mod = mg_module guts + +doCorePass :: CoreToDo -> ModGuts -> CoreM ModGuts +doCorePass pass@(CoreDoSimplify {}) = {-# SCC "Simplify" #-} + simplifyPgm pass + +doCorePass CoreCSE = {-# SCC "CommonSubExpr" #-} + doPass cseProgram + +doCorePass CoreLiberateCase = {-# SCC "LiberateCase" #-} + doPassD liberateCase + +doCorePass CoreDoFloatInwards = {-# SCC "FloatInwards" #-} + floatInwards + +doCorePass (CoreDoFloatOutwards f) = {-# SCC "FloatOutwards" #-} + doPassDUM (floatOutwards f) + +doCorePass CoreDoStaticArgs = {-# SCC "StaticArgs" #-} + doPassU doStaticArgs + +doCorePass CoreDoCallArity = {-# SCC "CallArity" #-} + doPassD callArityAnalProgram + +doCorePass CoreDoExitify = {-# SCC "Exitify" #-} + doPass exitifyProgram + +doCorePass CoreDoDemand = {-# SCC "DmdAnal" #-} + doPassDFM dmdAnalProgram + +doCorePass CoreDoCpr = {-# SCC "CprAnal" #-} + doPassDFM cprAnalProgram + +doCorePass CoreDoWorkerWrapper = {-# SCC "WorkWrap" #-} + doPassDFU wwTopBinds + +doCorePass CoreDoSpecialising = {-# SCC "Specialise" #-} + specProgram + +doCorePass CoreDoSpecConstr = {-# SCC "SpecConstr" #-} + specConstrProgram + +doCorePass CoreDoPrintCore = observe printCore +doCorePass (CoreDoRuleCheck phase pat) = ruleCheckPass phase pat +doCorePass CoreDoNothing = return +doCorePass (CoreDoPasses passes) = runCorePasses passes + +doCorePass (CoreDoPluginPass _ 
pass) = {-# SCC "Plugin" #-} pass + +doCorePass pass@CoreDesugar = pprPanic "doCorePass" (ppr pass) +doCorePass pass@CoreDesugarOpt = pprPanic "doCorePass" (ppr pass) +doCorePass pass@CoreTidy = pprPanic "doCorePass" (ppr pass) +doCorePass pass@CorePrep = pprPanic "doCorePass" (ppr pass) +doCorePass pass@CoreOccurAnal = pprPanic "doCorePass" (ppr pass) + +{- +************************************************************************ +* * +\subsection{Core pass combinators} +* * +************************************************************************ +-} + +printCore :: DynFlags -> CoreProgram -> IO () +printCore dflags binds + = Err.dumpIfSet dflags True "Print Core" (pprCoreBindings binds) + +ruleCheckPass :: CompilerPhase -> String -> ModGuts -> CoreM ModGuts +ruleCheckPass current_phase pat guts = + withTimingD (text "RuleCheck"<+>brackets (ppr $ mg_module guts)) + (const ()) $ do + { rb <- getRuleBase + ; dflags <- getDynFlags + ; vis_orphs <- getVisibleOrphanMods + ; let rule_fn fn = getRules (RuleEnv rb vis_orphs) fn + ++ (mg_rules guts) + ; liftIO $ putLogMsg dflags NoReason Err.SevDump noSrcSpan + (defaultDumpStyle dflags) + (ruleCheckProgram current_phase pat + rule_fn (mg_binds guts)) + ; return guts } + +doPassDUM :: (DynFlags -> UniqSupply -> CoreProgram -> IO CoreProgram) -> ModGuts -> CoreM ModGuts +doPassDUM do_pass = doPassM $ \binds -> do + dflags <- getDynFlags + us <- getUniqueSupplyM + liftIO $ do_pass dflags us binds + +doPassDM :: (DynFlags -> CoreProgram -> IO CoreProgram) -> ModGuts -> CoreM ModGuts +doPassDM do_pass = doPassDUM (\dflags -> const (do_pass dflags)) + +doPassD :: (DynFlags -> CoreProgram -> CoreProgram) -> ModGuts -> CoreM ModGuts +doPassD do_pass = doPassDM (\dflags -> return . do_pass dflags) + +doPassDU :: (DynFlags -> UniqSupply -> CoreProgram -> CoreProgram) -> ModGuts -> CoreM ModGuts +doPassDU do_pass = doPassDUM (\dflags us -> return . 
do_pass dflags us) + +doPassU :: (UniqSupply -> CoreProgram -> CoreProgram) -> ModGuts -> CoreM ModGuts +doPassU do_pass = doPassDU (const do_pass) + +doPassDFM :: (DynFlags -> FamInstEnvs -> CoreProgram -> IO CoreProgram) -> ModGuts -> CoreM ModGuts +doPassDFM do_pass guts = do + dflags <- getDynFlags + p_fam_env <- getPackageFamInstEnv + let fam_envs = (p_fam_env, mg_fam_inst_env guts) + doPassM (liftIO . do_pass dflags fam_envs) guts + +doPassDFU :: (DynFlags -> FamInstEnvs -> UniqSupply -> CoreProgram -> CoreProgram) -> ModGuts -> CoreM ModGuts +doPassDFU do_pass guts = do + dflags <- getDynFlags + us <- getUniqueSupplyM + p_fam_env <- getPackageFamInstEnv + let fam_envs = (p_fam_env, mg_fam_inst_env guts) + doPass (do_pass dflags fam_envs us) guts + +-- Most passes return no stats and don't change rules: these combinators +-- let us lift them to the full blown ModGuts+CoreM world +doPassM :: Monad m => (CoreProgram -> m CoreProgram) -> ModGuts -> m ModGuts +doPassM bind_f guts = do + binds' <- bind_f (mg_binds guts) + return (guts { mg_binds = binds' }) + +doPass :: (CoreProgram -> CoreProgram) -> ModGuts -> CoreM ModGuts +doPass bind_f guts = return $ guts { mg_binds = bind_f (mg_binds guts) } + +-- Observer passes just peek; don't modify the bindings at all +observe :: (DynFlags -> CoreProgram -> IO a) -> ModGuts -> CoreM ModGuts +observe do_pass = doPassM $ \binds -> do + dflags <- getDynFlags + _ <- liftIO $ do_pass dflags binds + return binds + +{- +************************************************************************ +* * + Gentle simplification +* * +************************************************************************ +-} + +simplifyExpr :: HscEnv -- includes spec of what core-to-core passes to do + -> CoreExpr + -> IO CoreExpr +-- simplifyExpr is called by the driver to simplify an +-- expression typed in at the interactive prompt +simplifyExpr hsc_env expr + = withTiming dflags (text "Simplify [expr]") (const ()) $ + do { eps <- hscEPS hsc_env 
; + ; let rule_env = mkRuleEnv (eps_rule_base eps) [] + fi_env = ( eps_fam_inst_env eps + , extendFamInstEnvList emptyFamInstEnv $ + snd $ ic_instances $ hsc_IC hsc_env ) + simpl_env = simplEnvForGHCi dflags + + ; us <- mkSplitUniqSupply 's' + ; let sz = exprSize expr + + ; (expr', counts) <- initSmpl dflags rule_env fi_env us sz $ + simplExprGently simpl_env expr + + ; Err.dumpIfSet dflags (dopt Opt_D_dump_simpl_stats dflags) + "Simplifier statistics" (pprSimplCount counts) + + ; Err.dumpIfSet_dyn dflags Opt_D_dump_simpl "Simplified expression" + FormatCore + (pprCoreExpr expr') + + ; return expr' + } + where + dflags = hsc_dflags hsc_env + +simplExprGently :: SimplEnv -> CoreExpr -> SimplM CoreExpr +-- Simplifies an expression +-- does occurrence analysis, then simplification +-- and repeats (twice currently) because one pass +-- alone leaves tons of crud. +-- Used (a) for user expressions typed in at the interactive prompt +-- (b) the LHS and RHS of a RULE +-- (c) Template Haskell splices +-- +-- The name 'Gently' suggests that the SimplMode is InitialPhase, +-- and in fact that is so.... but the 'Gently' in simplExprGently doesn't +-- enforce that; it just simplifies the expression twice + +-- It's important that simplExprGently does eta reduction; see +-- Note [Simplifying the left-hand side of a RULE] above. The +-- simplifier does indeed do eta reduction (it's in GHC.Core.Op.Simplify.completeLam) +-- but only if -O is on. 
+ +simplExprGently env expr = do + expr1 <- simplExpr env (occurAnalyseExpr expr) + simplExpr env (occurAnalyseExpr expr1) + +{- +************************************************************************ +* * +\subsection{The driver for the simplifier} +* * +************************************************************************ +-} + +simplifyPgm :: CoreToDo -> ModGuts -> CoreM ModGuts +simplifyPgm pass guts + = do { hsc_env <- getHscEnv + ; us <- getUniqueSupplyM + ; rb <- getRuleBase + ; liftIOWithCount $ + simplifyPgmIO pass hsc_env us rb guts } + +simplifyPgmIO :: CoreToDo + -> HscEnv + -> UniqSupply + -> RuleBase + -> ModGuts + -> IO (SimplCount, ModGuts) -- New bindings + +simplifyPgmIO pass@(CoreDoSimplify max_iterations mode) + hsc_env us hpt_rule_base + guts@(ModGuts { mg_module = this_mod + , mg_rdr_env = rdr_env + , mg_deps = deps + , mg_binds = binds, mg_rules = rules + , mg_fam_inst_env = fam_inst_env }) + = do { (termination_msg, it_count, counts_out, guts') + <- do_iteration us 1 [] binds rules + + ; Err.dumpIfSet dflags (dopt Opt_D_verbose_core2core dflags && + dopt Opt_D_dump_simpl_stats dflags) + "Simplifier statistics for following pass" + (vcat [text termination_msg <+> text "after" <+> ppr it_count + <+> text "iterations", + blankLine, + pprSimplCount counts_out]) + + ; return (counts_out, guts') + } + where + dflags = hsc_dflags hsc_env + print_unqual = mkPrintUnqualified dflags rdr_env + simpl_env = mkSimplEnv mode + active_rule = activeRule mode + active_unf = activeUnfolding mode + + do_iteration :: UniqSupply + -> Int -- Counts iterations + -> [SimplCount] -- Counts from earlier iterations, reversed + -> CoreProgram -- Bindings in + -> [CoreRule] -- and orphan rules + -> IO (String, Int, SimplCount, ModGuts) + + do_iteration us iteration_no counts_so_far binds rules + -- iteration_no is the number of the iteration we are + -- about to begin, with '1' for the first + | iteration_no > max_iterations -- Stop if we've run out of iterations + 
= WARN( debugIsOn && (max_iterations > 2) + , hang (text "Simplifier bailing out after" <+> int max_iterations + <+> text "iterations" + <+> (brackets $ hsep $ punctuate comma $ + map (int . simplCountN) (reverse counts_so_far))) + 2 (text "Size =" <+> ppr (coreBindsStats binds))) + + -- Subtract 1 from iteration_no to get the + -- number of iterations we actually completed + return ( "Simplifier baled out", iteration_no - 1 + , totalise counts_so_far + , guts { mg_binds = binds, mg_rules = rules } ) + + -- Try and force thunks off the binds; significantly reduces + -- space usage, especially with -O. JRS, 000620. + | let sz = coreBindsSize binds + , () <- sz `seq` () -- Force it + = do { + -- Occurrence analysis + let { tagged_binds = {-# SCC "OccAnal" #-} + occurAnalysePgm this_mod active_unf active_rule rules + binds + } ; + Err.dumpIfSet_dyn dflags Opt_D_dump_occur_anal "Occurrence analysis" + FormatCore + (pprCoreBindings tagged_binds); + + -- Get any new rules, and extend the rule base + -- See Note [Overall plumbing for rules] in GHC.Core.Rules + -- We need to do this regularly, because simplification can + -- poke on IdInfo thunks, which in turn brings in new rules + -- behind the scenes. Otherwise there's a danger we'll simply + -- miss the rules for Ids hidden inside imported inlinings + eps <- hscEPS hsc_env ; + let { rule_base1 = unionRuleBase hpt_rule_base (eps_rule_base eps) + ; rule_base2 = extendRuleBaseList rule_base1 rules + ; fam_envs = (eps_fam_inst_env eps, fam_inst_env) + ; vis_orphs = this_mod : dep_orphs deps } ; + + -- Simplify the program + ((binds1, rules1), counts1) <- + initSmpl dflags (mkRuleEnv rule_base2 vis_orphs) fam_envs us1 sz $ + do { (floats, env1) <- {-# SCC "SimplTopBinds" #-} + simplTopBinds simpl_env tagged_binds + + -- Apply the substitution to rules defined in this module + -- for imported Ids. 
Eg RULE map my_f = blah
+                  -- If we have a substitution my_f :-> other_f, we'd better
+                  -- apply it to the rule too, or it'll never match
+                ; rules1 <- simplRules env1 Nothing rules Nothing
+
+                ; return (getTopFloatBinds floats, rules1) } ;
+
+           -- Stop if nothing happened; don't dump output
+           -- See Note [Which transformations are innocuous] in GHC.Core.Op.Monad
+           if isZeroSimplCount counts1 then
+               return ( "Simplifier reached fixed point", iteration_no
+                      , totalise (counts1 : counts_so_far)  -- Include "free" ticks
+                      , guts { mg_binds = binds1, mg_rules = rules1 } )
+           else do {
+                -- Short out indirections
+                -- We do this *after* at least one run of the simplifier
+                -- because indirection-shorting uses the export flag on *occurrences*
+                -- and that isn't guaranteed to be ok until after the first run propagates
+                -- stuff from the binding site to its occurrences
+                --
+                -- ToDo: alas, this means that indirection-shorting does not happen at all
+                --       if the simplifier does nothing (not common, I know, but unsavoury)
+           let { binds2 = {-# SCC "ZapInd" #-} shortOutIndirections binds1 } ;
+
+           -- Dump the result of this iteration
+           dump_end_iteration dflags print_unqual iteration_no counts1 binds2 rules1 ;
+           lintPassResult hsc_env pass binds2 ;
+
+           -- Loop
+           do_iteration us2 (iteration_no + 1) (counts1:counts_so_far) binds2 rules1
+           } }
+#if __GLASGOW_HASKELL__ <= 810
+      | otherwise = panic "do_iteration"
+#endif
+      where
+        (us1, us2) = splitUniqSupply us
+
+        -- Remember the counts_so_far are reversed
+        totalise :: [SimplCount] -> SimplCount
+        totalise = foldr (\c acc -> acc `plusSimplCount` c)
+                         (zeroSimplCount dflags)
+
+simplifyPgmIO _ _ _ _ _ = panic "simplifyPgmIO"
+
+-------------------
+dump_end_iteration :: DynFlags -> PrintUnqualified -> Int
+                   -> SimplCount -> CoreProgram -> [CoreRule] -> IO ()
+dump_end_iteration dflags print_unqual iteration_no counts binds rules
+  = dumpPassResult dflags print_unqual mb_flag hdr pp_counts binds rules
+  where
+    mb_flag | dopt
Opt_D_dump_simpl_iterations dflags = Just Opt_D_dump_simpl_iterations
+            | otherwise                        = Nothing
+            -- Show details if Opt_D_dump_simpl_iterations is on
+
+    hdr = text "Simplifier iteration=" <> int iteration_no
+    pp_counts = vcat [ text "---- Simplifier counts for" <+> hdr
+                     , pprSimplCount counts
+                     , text "---- End of simplifier counts for" <+> hdr ]
+
+{-
+************************************************************************
+*                                                                      *
+                Shorting out indirections
+*                                                                      *
+************************************************************************
+
+If we have this:
+
+        x_local = <expression>
+        ...bindings...
+        x_exported = x_local
+
+where x_exported is exported, and x_local is not, then we replace it with this:
+
+        x_exported = <expression>
+        x_local = x_exported
+        ...bindings...
+
+Without this we never get rid of the x_exported = x_local thing.  This
+saves a gratuitous jump (from \tr{x_exported} to \tr{x_local}), and
+makes strictness information propagate better.  This used to happen in
+the final phase, but it's tidier to do it here.
+
+Note [Messing up the exported Id's RULES]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We must be careful about discarding (obviously) or even merging the
+RULES on the exported Id. The example that went bad on me at one stage
+was this one:
+
+        iterate :: (a -> a) -> a -> [a]
+                [Exported]
+        iterate = iterateList
+
+        iterateFB c f x = x `c` iterateFB c f (f x)
+        iterateList f x = x : iterateList f (f x)
+                [Not exported]
+
+        {-# RULES
+        "iterate"   forall f x.     iterate f x = build (\c _n -> iterateFB c f x)
+        "iterateFB"                 iterateFB (:) = iterateList
+         #-}
+
+This got shorted out to:
+
+        iterateList :: (a -> a) -> a -> [a]
+        iterateList = iterate
+
+        iterateFB c f x = x `c` iterateFB c f (f x)
+        iterate f x = x : iterate f (f x)
+
+        {-# RULES
+        "iterate"   forall f x.
iterate f x = build (\c _n -> iterateFB c f x)
+        "iterateFB"                 iterateFB (:) = iterate
+         #-}
+
+And now we get an infinite loop in the rule system
+        iterate f x -> build (\c _n -> iterateFB c f x)
+                    -> iterateFB (:) f x
+                    -> iterate f x
+
+Old "solution":
+        use rule switching-off pragmas to get rid
+        of iterateList in the first place
+
+But in principle the user *might* want rules that only apply to the Id
+he names.  And inline pragmas are similar
+        {-# NOINLINE f #-}
+        f = local
+        local = <stuff>
+Then we do not want to get rid of the NOINLINE.
+
+Hence hasShortableIdInfo.
+
+
+Note [Rules and indirection-zapping]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Problem: what if x_exported has a RULE that mentions something in ...bindings...?
+Then the things mentioned can be out of scope!  Solution
+ a) Make sure that in this pass the usage-info from x_exported is
+        available for ...bindings...
+ b) If there are any such RULES, rec-ify the entire top-level.
+    It'll get sorted out next time round
+
+Other remarks
+~~~~~~~~~~~~~
+If more than one exported thing is equal to a local thing (i.e., the
+local thing really is shared), then we do one only:
+\begin{verbatim}
+        x_local = ....
+        x_exported1 = x_local
+        x_exported2 = x_local
+==>
+        x_exported1 = ....
+
+        x_exported2 = x_exported1
+\end{verbatim}
+
+We rely on prior eta reduction to simplify things like
+\begin{verbatim}
+        x_exported = /\ tyvars -> x_local tyvars
+==>
+        x_exported = x_local
+\end{verbatim}
+Hence, there's a possibility of leaving unchanged something like this:
+\begin{verbatim}
+        x_local = ....
+        x_exported1 = x_local Int
+\end{verbatim}
+By the time we've thrown away the types in STG land this
+could be eliminated.  But I don't think it's very common
+and it's dangerous to do this fiddling in STG land
+because we might eliminate a binding that's mentioned in the
+unfolding for something.
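The shorting transformation described in these notes can be modelled on a toy binding list. This is a minimal sketch with hypothetical names and a string-based AST, not the real CoreBind/IdInfo machinery (which must also transfer IdInfo and cope with ticks):

```haskell
-- A toy model of indirection shorting.  A right-hand side is either a
-- bare variable reference or some opaque expression.
data Rhs = VarRef String | Expr String deriving (Eq, Show)

-- For each binding "exported = local" where only 'exported' is exported,
-- move the real definition onto the exported name and redirect the
-- local name to it (a toy analogue of zapPair below).
shortOut :: [String]                 -- exported names
         -> [(String, Rhs)] -> [(String, Rhs)]
shortOut exported binds = concatMap zap binds
  where
    -- local_id -> exported_id, like IndEnv
    indEnv = [ (lcl, ex) | (ex, VarRef lcl) <- binds, ex `elem` exported ]
    zap (name, _)
      | name `elem` exported
      , name `elem` map snd indEnv = []          -- kill the exported-id binding
    zap (name, rhs) = case lookup name indEnv of
      Just ex -> [(ex, rhs), (name, VarRef ex)]  -- ex = rhs; lcl = ex
      Nothing -> [(name, rhs)]
```

For example, `shortOut ["x_exported"] [("x_local", Expr "e"), ("x_exported", VarRef "x_local")]` moves the expression onto `x_exported` and redirects `x_local` to it.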
+
+Note [Indirection zapping and ticks]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Unfortunately this is another place where we need a special case for
+ticks. The following happens quite regularly:
+
+        x_local = <expression>
+        x_exported = tick<x> x_local
+
+Which we want to become:
+
+        x_exported = tick<x> <expression>
+
+As it makes no sense to keep the tick and the expression on separate
+bindings. Note however that this might increase the ticks scoping
+over the execution of x_local, so we can only do this for floatable
+ticks. More often than not, other references will be unfoldings of
+x_exported, and therefore carry the tick anyway.
+-}
+
+type IndEnv = IdEnv (Id, [Tickish Var]) -- Maps local_id -> exported_id, ticks
+
+shortOutIndirections :: CoreProgram -> CoreProgram
+shortOutIndirections binds
+  | isEmptyVarEnv ind_env = binds
+  | no_need_to_flatten    = binds'                      -- See Note [Rules and indirection-zapping]
+  | otherwise             = [Rec (flattenBinds binds')] -- for this no_need_to_flatten stuff
+  where
+    ind_env            = makeIndEnv binds
+    -- These exported Ids are the subjects of the indirection-elimination
+    exp_ids            = map fst $ nonDetEltsUFM ind_env
+    -- It's OK to use nonDetEltsUFM here because we forget the ordering
+    -- by immediately converting to a set or checking whether all the
+    -- elements satisfy a predicate.
+    exp_id_set         = mkVarSet exp_ids
+    no_need_to_flatten = all (null . ruleInfoRules .
idSpecialisation) exp_ids + binds' = concatMap zap binds + + zap (NonRec bndr rhs) = [NonRec b r | (b,r) <- zapPair (bndr,rhs)] + zap (Rec pairs) = [Rec (concatMap zapPair pairs)] + + zapPair (bndr, rhs) + | bndr `elemVarSet` exp_id_set + = [] -- Kill the exported-id binding + + | Just (exp_id, ticks) <- lookupVarEnv ind_env bndr + , (exp_id', lcl_id') <- transferIdInfo exp_id bndr + = -- Turn a local-id binding into two bindings + -- exp_id = rhs; lcl_id = exp_id + [ (exp_id', mkTicks ticks rhs), + (lcl_id', Var exp_id') ] + + | otherwise + = [(bndr,rhs)] + +makeIndEnv :: [CoreBind] -> IndEnv +makeIndEnv binds + = foldl' add_bind emptyVarEnv binds + where + add_bind :: IndEnv -> CoreBind -> IndEnv + add_bind env (NonRec exported_id rhs) = add_pair env (exported_id, rhs) + add_bind env (Rec pairs) = foldl' add_pair env pairs + + add_pair :: IndEnv -> (Id,CoreExpr) -> IndEnv + add_pair env (exported_id, exported) + | (ticks, Var local_id) <- stripTicksTop tickishFloatable exported + , shortMeOut env exported_id local_id + = extendVarEnv env local_id (exported_id, ticks) + add_pair env _ = env + +----------------- +shortMeOut :: IndEnv -> Id -> Id -> Bool +shortMeOut ind_env exported_id local_id +-- The if-then-else stuff is just so I can get a pprTrace to see +-- how often I don't get shorting out because of IdInfo stuff + = if isExportedId exported_id && -- Only if this is exported + + isLocalId local_id && -- Only if this one is defined in this + -- module, so that we *can* change its + -- binding to be the exported thing! 
+
+       not (isExportedId local_id) &&           -- Only if this one is not itself exported,
+                                                --      since the transformation will nuke it
+
+       not (local_id `elemVarEnv` ind_env)      -- Only if not already substituted for
+    then
+        if hasShortableIdInfo exported_id
+        then True       -- See Note [Messing up the exported Id's RULES]
+        else WARN( True, text "Not shorting out:" <+> ppr exported_id )
+             False
+    else
+        False
+
+-----------------
+hasShortableIdInfo :: Id -> Bool
+-- True if there is no user-attached IdInfo on exported_id,
+-- so we can safely discard it
+-- See Note [Messing up the exported Id's RULES]
+hasShortableIdInfo id
+  = isEmptyRuleInfo (ruleInfo info)
+    && isDefaultInlinePragma (inlinePragInfo info)
+    && not (isStableUnfolding (unfoldingInfo info))
+  where
+     info = idInfo id
+
+-----------------
+{- Note [Transferring IdInfo]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+If we have
+     lcl_id = e; exp_id = lcl_id
+
+and lcl_id has useful IdInfo, we don't want to discard it by going
+     gbl_id = e; lcl_id = gbl_id
+
+Instead, transfer IdInfo from lcl_id to exp_id, specifically
+* (Stable) unfolding
+* Strictness
+* Rules
+* Inline pragma
+
+Overwriting, rather than merging, seems to work ok.
+
+We also zap the InlinePragma on the lcl_id. It might originally
+have had a NOINLINE, which we have now transferred; and we really
+want the lcl_id to inline now that its RHS is trivial!
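As a minimal sketch of this overwrite-versus-merge policy (a toy record with hypothetical field names, not the real IdInfo that transferIdInfo below manipulates): every field is overwritten with the local id's version, except the rules, which are merged.

```haskell
-- Toy IdInfo: only the fields relevant to the transfer.
data TInfo = TInfo { tUnfolding :: String
                   , tPragma    :: String
                   , tRules     :: [String] }
  deriving (Eq, Show)

-- Transfer the local id's info onto the exported id:
-- overwrite everything except the rules, which are merged.
transferInfo :: TInfo -> TInfo -> TInfo
transferInfo lcl ex =
  ex { tUnfolding = tUnfolding lcl
     , tPragma    = tPragma lcl
     , tRules     = tRules ex ++ tRules lcl }   -- merge, not overwrite
```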
+-} + +transferIdInfo :: Id -> Id -> (Id, Id) +-- See Note [Transferring IdInfo] +transferIdInfo exported_id local_id + = ( modifyIdInfo transfer exported_id + , local_id `setInlinePragma` defaultInlinePragma ) + where + local_info = idInfo local_id + transfer exp_info = exp_info `setStrictnessInfo` strictnessInfo local_info + `setCprInfo` cprInfo local_info + `setUnfoldingInfo` unfoldingInfo local_info + `setInlinePragInfo` inlinePragInfo local_info + `setRuleInfo` addRuleInfo (ruleInfo exp_info) new_info + new_info = setRuleInfoHead (idName exported_id) + (ruleInfo local_info) + -- Remember to set the function-name field of the + -- rules as we transfer them from one function to another diff --git a/compiler/GHC/Core/Op/Simplify/Env.hs b/compiler/GHC/Core/Op/Simplify/Env.hs new file mode 100644 index 0000000000..0e94f734af --- /dev/null +++ b/compiler/GHC/Core/Op/Simplify/Env.hs @@ -0,0 +1,938 @@ +{- +(c) The AQUA Project, Glasgow University, 1993-1998 + +\section[GHC.Core.Op.Simplify.Monad]{The simplifier Monad} +-} + +{-# LANGUAGE CPP #-} + +module GHC.Core.Op.Simplify.Env ( + -- * The simplifier mode + setMode, getMode, updMode, seDynFlags, + + -- * Environments + SimplEnv(..), pprSimplEnv, -- Temp not abstract + mkSimplEnv, extendIdSubst, + extendTvSubst, extendCvSubst, + zapSubstEnv, setSubstEnv, + getInScope, setInScopeFromE, setInScopeFromF, + setInScopeSet, modifyInScope, addNewInScopeIds, + getSimplRules, + + -- * Substitution results + SimplSR(..), mkContEx, substId, lookupRecBndr, refineFromInScope, + + -- * Simplifying 'Id' binders + simplNonRecBndr, simplNonRecJoinBndr, simplRecBndrs, simplRecJoinBndrs, + simplBinder, simplBinders, + substTy, substTyVar, getTCvSubst, + substCo, substCoVar, + + -- * Floats + SimplFloats(..), emptyFloats, mkRecFloats, + mkFloatBind, addLetFloats, addJoinFloats, addFloats, + extendFloats, wrapFloats, + doFloatFromRhs, getTopFloatBinds, + + -- * LetFloats + LetFloats, letFloatBinds, emptyLetFloats, unitLetFloat, + 
addLetFlts, mapLetFloats, + + -- * JoinFloats + JoinFloat, JoinFloats, emptyJoinFloats, + wrapJoinFloats, wrapJoinFloatsX, unitJoinFloat, addJoinFlts + ) where + +#include "HsVersions.h" + +import GhcPrelude + +import GHC.Core.Op.Simplify.Monad +import GHC.Core.Op.Monad ( SimplMode(..) ) +import GHC.Core +import GHC.Core.Utils +import Var +import VarEnv +import VarSet +import OrdList +import Id +import GHC.Core.Make ( mkWildValBinder ) +import GHC.Driver.Session ( DynFlags ) +import TysWiredIn +import qualified GHC.Core.Type as Type +import GHC.Core.Type hiding ( substTy, substTyVar, substTyVarBndr, extendTvSubst, extendCvSubst ) +import qualified GHC.Core.Coercion as Coercion +import GHC.Core.Coercion hiding ( substCo, substCoVar, substCoVarBndr ) +import BasicTypes +import MonadUtils +import Outputable +import Util +import UniqFM ( pprUniqFM ) + +import Data.List (mapAccumL) + +{- +************************************************************************ +* * +\subsubsection{The @SimplEnv@ type} +* * +************************************************************************ +-} + +data SimplEnv + = SimplEnv { + ----------- Static part of the environment ----------- + -- Static in the sense of lexically scoped, + -- wrt the original expression + + seMode :: SimplMode + + -- The current substitution + , seTvSubst :: TvSubstEnv -- InTyVar |--> OutType + , seCvSubst :: CvSubstEnv -- InCoVar |--> OutCoercion + , seIdSubst :: SimplIdSubst -- InId |--> OutExpr + + ----------- Dynamic part of the environment ----------- + -- Dynamic in the sense of describing the setup where + -- the expression finally ends up + + -- The current set of in-scope variables + -- They are all OutVars, and all bound in this module + , seInScope :: InScopeSet -- OutVars only + } + +data SimplFloats + = SimplFloats + { -- Ordinary let bindings + sfLetFloats :: LetFloats + -- See Note [LetFloats] + + -- Join points + , sfJoinFloats :: JoinFloats + -- Handled separately; they don't go very far + -- 
We consider these to be /inside/ sfLetFloats
+                                 -- because join points can refer to ordinary bindings,
+                                 -- but not vice versa
+
+      -- Includes all variables bound by sfLetFloats and
+      -- sfJoinFloats, plus at least whatever is in scope where
+      -- these bindings land up.
+    , sfInScope :: InScopeSet    -- All OutVars
+    }
+
+instance Outputable SimplFloats where
+  ppr (SimplFloats { sfLetFloats = lf, sfJoinFloats = jf, sfInScope = is })
+    = text "SimplFloats"
+      <+> braces (vcat [ text "lets: " <+> ppr lf
+                       , text "joins:" <+> ppr jf
+                       , text "in_scope:" <+> ppr is ])
+
+emptyFloats :: SimplEnv -> SimplFloats
+emptyFloats env
+  = SimplFloats { sfLetFloats  = emptyLetFloats
+                , sfJoinFloats = emptyJoinFloats
+                , sfInScope    = seInScope env }
+
+pprSimplEnv :: SimplEnv -> SDoc
+-- Used for debugging; selective
+pprSimplEnv env
+  = vcat [text "TvSubst:" <+> ppr (seTvSubst env),
+          text "CvSubst:" <+> ppr (seCvSubst env),
+          text "IdSubst:" <+> id_subst_doc,
+          text "InScope:" <+> in_scope_vars_doc
+    ]
+  where
+   id_subst_doc = pprUniqFM ppr (seIdSubst env)
+   in_scope_vars_doc = pprVarSet (getInScopeVars (seInScope env))
+                                 (vcat . map ppr_one)
+   ppr_one v | isId v    = ppr v <+> ppr (idUnfolding v)
+             | otherwise = ppr v
+
+type SimplIdSubst = IdEnv SimplSR -- InId |--> OutExpr
+        -- See Note [Extending the Subst] in GHC.Core.Subst
+
+-- | A substitution result.
+data SimplSR
+  = DoneEx OutExpr (Maybe JoinArity)
+       -- If  x :-> DoneEx e ja  is in the SimplIdSubst
+       -- then replace occurrences of x by e
+       -- and  ja = Just a <=> x is a join-point of arity a
+       -- See Note [Join arity in SimplIdSubst]
+
+  | DoneId OutId
+       -- If  x :-> DoneId v  is in the SimplIdSubst
+       -- then replace occurrences of x by v
+       -- and  v is a join-point of arity a
+       --      <=> x is a join-point of arity a
+
+  | ContEx TvSubstEnv                 -- A suspended substitution
+           CvSubstEnv
+           SimplIdSubst
+           InExpr
+        -- If  x :-> ContEx tv cv id e  is in the SimplIdSubst
+        -- then replace occurrences of x by (subst (tv,cv,id) e)
+
+instance Outputable SimplSR where
+  ppr (DoneId v)    = text "DoneId" <+> ppr v
+  ppr (DoneEx e mj) = text "DoneEx" <> pp_mj <+> ppr e
+    where
+      pp_mj = case mj of
+                Nothing -> empty
+                Just n  -> parens (int n)
+
+  ppr (ContEx _tv _cv _id e) = vcat [text "ContEx" <+> ppr e {-,
+                                       ppr (filter_env tv), ppr (filter_env id) -}]
+        -- where
+        -- fvs = exprFreeVars e
+        -- filter_env env = filterVarEnv_Directly keep env
+        -- keep uniq _ = uniq `elemUFM_Directly` fvs
+
+{-
+Note [SimplEnv invariants]
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+seInScope:
+       The in-scope part of Subst includes *all* in-scope TyVars and Ids
+       The elements of the set may have better IdInfo than the
+       occurrences of in-scope Ids, and (more important) they will
+       have a correctly-substituted type.  So we use a lookup in this
+       set to replace occurrences
+
+       The Ids in the InScopeSet are replete with their Rules,
+       and as we gather info about the unfolding of an Id, we replace
+       it in the in-scope set.
+
+       The in-scope set is actually a mapping OutVar -> OutVar, and
+       in case expressions we sometimes bind two in-scope Ids to the
+       same target; see the end of this Note.
+
+seIdSubst:
+       The substitution is *apply-once* only, because InIds and OutIds
+       can overlap.
+       For example, we generally omit mappings
+               a77 -> a77
+       from the substitution, when we decide not to clone a77, but it's quite
+       legitimate to put the mapping in the substitution anyway.
+
+       Furthermore, consider
+               let x = case k of I# x77 -> ... in
+               let y = case k of I# x77 -> ... in ...
+       and suppose the body is strict in both x and y.  Then the simplifier
+       will pull the first (case k) to the top; so the second (case k) will
+       cancel out, mapping x77 to, well, x77!  But one is an in-Id and the
+       other is an out-Id.
+
+       Of course, the substitution *must be* applied!  Things in its domain
+       simply aren't necessarily bound in the result.
+
+* substId adds a binding (DoneId new_id) to the substitution if
+  the Id's unique has changed
+
+  Note, though, that the substitution isn't necessarily extended
+  if the type of the Id changes.  Why not?  Because of the next point:
+
+* We *always, always* finish by looking up in the in-scope set
+  any variable that doesn't get a DoneEx or DoneId hit in the substitution.
+  Reason: so that we never finish up with an "old" Id in the result.
+  An old Id might point to an old unfolding and so on... which gives a space
+  leak.
+
+  [The DoneEx and DoneId hits map to "new" stuff.]
+
+* It follows that substExpr must not do a no-op if the substitution is empty.
+  substType is free to do so, however.
+
+* When we come to a let-binding (say) we generate new IdInfo, including an
+  unfolding, attach it to the binder, and add this newly adorned binder to
+  the in-scope set.  So all subsequent occurrences of the binder will get
+  mapped to the fully-adorned binder, which is also the one put in the
+  binding site.
+
+* The in-scope "set" usually maps x->x; we use it simply for its domain.
+  But sometimes we have two in-scope Ids that are synonyms, and should
+  map to the same target:  x->x, y->x.  Notably:
+       case y of x { ...
}
+  That's why the "set" is actually a VarEnv Var
+
+Note [Join arity in SimplIdSubst]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We have to remember which incoming variables are join points: the occurrences
+may not be marked correctly yet, and we're in charge of propagating the change
+if OccurAnal makes something a join point.
+
+Normally the in-scope set is where we keep the latest information, but
+the in-scope set tracks only OutVars; if a binding is unconditionally
+inlined (via DoneEx), it never makes it into the in-scope set, and we
+need to know at the occurrence site that the variable is a join point
+so that we know to drop the context. Thus we remember which join
+points we're substituting. -}
+
+mkSimplEnv :: SimplMode -> SimplEnv
+mkSimplEnv mode
+  = SimplEnv { seMode    = mode
+             , seInScope = init_in_scope
+             , seTvSubst = emptyVarEnv
+             , seCvSubst = emptyVarEnv
+             , seIdSubst = emptyVarEnv }
+        -- The top level "enclosing CC" is "SUBSUMED".
+
+init_in_scope :: InScopeSet
+init_in_scope = mkInScopeSet (unitVarSet (mkWildValBinder unitTy))
+              -- See Note [WildCard binders]
+
+{-
+Note [WildCard binders]
+~~~~~~~~~~~~~~~~~~~~~~~
+The program to be simplified may have wild binders
+           case e of wild { p -> ... }
+We want to *rename* them away, so that there are no
+occurrences of 'wild-id' (with wildCardKey).  The easy
+way to do that is to start off with a representative
+Id in the in-scope set
+
+There can be *occurrences* of wild-id.  For example,
+GHC.Core.Make.mkCoreApp transforms
+   e (a /# b)   -->   case (a /# b) of wild { DEFAULT -> e wild }
+This is ok provided 'wild' isn't free in 'e', and that's the delicate
+thing. Generally, you want to run the simplifier to get rid of the
+wild-ids before doing much else.
+
+It's a very dark corner of GHC.  Maybe it should be cleaned up.
+-} + +getMode :: SimplEnv -> SimplMode +getMode env = seMode env + +seDynFlags :: SimplEnv -> DynFlags +seDynFlags env = sm_dflags (seMode env) + +setMode :: SimplMode -> SimplEnv -> SimplEnv +setMode mode env = env { seMode = mode } + +updMode :: (SimplMode -> SimplMode) -> SimplEnv -> SimplEnv +updMode upd env = env { seMode = upd (seMode env) } + +--------------------- +extendIdSubst :: SimplEnv -> Id -> SimplSR -> SimplEnv +extendIdSubst env@(SimplEnv {seIdSubst = subst}) var res + = ASSERT2( isId var && not (isCoVar var), ppr var ) + env { seIdSubst = extendVarEnv subst var res } + +extendTvSubst :: SimplEnv -> TyVar -> Type -> SimplEnv +extendTvSubst env@(SimplEnv {seTvSubst = tsubst}) var res + = ASSERT2( isTyVar var, ppr var $$ ppr res ) + env {seTvSubst = extendVarEnv tsubst var res} + +extendCvSubst :: SimplEnv -> CoVar -> Coercion -> SimplEnv +extendCvSubst env@(SimplEnv {seCvSubst = csubst}) var co + = ASSERT( isCoVar var ) + env {seCvSubst = extendVarEnv csubst var co} + +--------------------- +getInScope :: SimplEnv -> InScopeSet +getInScope env = seInScope env + +setInScopeSet :: SimplEnv -> InScopeSet -> SimplEnv +setInScopeSet env in_scope = env {seInScope = in_scope} + +setInScopeFromE :: SimplEnv -> SimplEnv -> SimplEnv +-- See Note [Setting the right in-scope set] +setInScopeFromE rhs_env here_env = rhs_env { seInScope = seInScope here_env } + +setInScopeFromF :: SimplEnv -> SimplFloats -> SimplEnv +setInScopeFromF env floats = env { seInScope = sfInScope floats } + +addNewInScopeIds :: SimplEnv -> [CoreBndr] -> SimplEnv + -- The new Ids are guaranteed to be freshly allocated +addNewInScopeIds env@(SimplEnv { seInScope = in_scope, seIdSubst = id_subst }) vs + = env { seInScope = in_scope `extendInScopeSetList` vs, + seIdSubst = id_subst `delVarEnvList` vs } + -- Why delete? 
Consider + -- let x = a*b in (x, \x -> x+3) + -- We add [x |-> a*b] to the substitution, but we must + -- _delete_ it from the substitution when going inside + -- the (\x -> ...)! + +modifyInScope :: SimplEnv -> CoreBndr -> SimplEnv +-- The variable should already be in scope, but +-- replace the existing version with this new one +-- which has more information +modifyInScope env@(SimplEnv {seInScope = in_scope}) v + = env {seInScope = extendInScopeSet in_scope v} + +{- Note [Setting the right in-scope set] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Consider + \x. (let x = e in b) arg[x] +where the let shadows the lambda. Really this means something like + \x1. (let x2 = e in b) arg[x1] + +- When we capture the 'arg' in an ApplyToVal continuation, we capture + the environment, which says what 'x' is bound to, namely x1 + +- Then that continuation gets pushed under the let + +- Finally we simplify 'arg'. We want + - the static, lexical environment binding x :-> x1 + - the in-scopeset from "here", under the 'let' which includes + both x1 and x2 + +It's important to have the right in-scope set, else we may rename a +variable to one that is already in scope. So we must pick up the +in-scope set from "here", but otherwise use the environment we +captured along with 'arg'. This transfer of in-scope set is done by +setInScopeFromE. 
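The capture hazard this note describes is the one uniqAway guards against: when renaming a binder we must consult everything in scope "here", or we may rename onto a variable that is already in use. A toy analogue (hypothetical: the real uniqAway bumps the variable's unique rather than appending digits to its name):

```haskell
import qualified Data.Set as Set

-- Pick a name not already in the in-scope set, in the spirit of uniqAway.
freshen :: Set.Set String -> String -> String
freshen inScope x =
  head [ n | n <- x : [ x ++ show i | i <- [(1 :: Int) ..] ]
           , n `Set.notMember` inScope ]
```

With both x1 and x2 in scope (the situation sketched above), the binder must be renamed past both; with the wrong, smaller in-scope set it would be captured.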
+-} + +--------------------- +zapSubstEnv :: SimplEnv -> SimplEnv +zapSubstEnv env = env {seTvSubst = emptyVarEnv, seCvSubst = emptyVarEnv, seIdSubst = emptyVarEnv} + +setSubstEnv :: SimplEnv -> TvSubstEnv -> CvSubstEnv -> SimplIdSubst -> SimplEnv +setSubstEnv env tvs cvs ids = env { seTvSubst = tvs, seCvSubst = cvs, seIdSubst = ids } + +mkContEx :: SimplEnv -> InExpr -> SimplSR +mkContEx (SimplEnv { seTvSubst = tvs, seCvSubst = cvs, seIdSubst = ids }) e = ContEx tvs cvs ids e + +{- +************************************************************************ +* * +\subsection{LetFloats} +* * +************************************************************************ + +Note [LetFloats] +~~~~~~~~~~~~~~~~ +The LetFloats is a bunch of bindings, classified by a FloatFlag. + +* All of them satisfy the let/app invariant + +Examples + + NonRec x (y:ys) FltLifted + Rec [(x,rhs)] FltLifted + + NonRec x* (p:q) FltOKSpec -- RHS is WHNF. Question: why not FltLifted? + NonRec x# (y +# 3) FltOkSpec -- Unboxed, but ok-for-spec'n + + NonRec x* (f y) FltCareful -- Strict binding; might fail or diverge + +Can't happen: + NonRec x# (a /# b) -- Might fail; does not satisfy let/app + NonRec x# (f y) -- Might diverge; does not satisfy let/app +-} + +data LetFloats = LetFloats (OrdList OutBind) FloatFlag + -- See Note [LetFloats] + +type JoinFloat = OutBind +type JoinFloats = OrdList JoinFloat + +data FloatFlag + = FltLifted -- All bindings are lifted and lazy *or* + -- consist of a single primitive string literal + -- Hence ok to float to top level, or recursive + + | FltOkSpec -- All bindings are FltLifted *or* + -- strict (perhaps because unlifted, + -- perhaps because of a strict binder), + -- *and* ok-for-speculation + -- Hence ok to float out of the RHS + -- of a lazy non-recursive let binding + -- (but not to top level, or into a rec group) + + | FltCareful -- At least one binding is strict (or unlifted) + -- and not guaranteed cheap + -- Do not float these bindings out of a lazy let + 
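One way to read the FloatFlag combination: the three flags form a severity order, FltLifted < FltOkSpec < FltCareful, and combining two groups of floats (andFF, defined below) keeps the more restrictive flag. A standalone sketch using a derived Ord (an assumption for illustration; the real type does not derive Ord):

```haskell
-- Toy FloatFlag, ordered from least to most restrictive.
data FF = Lifted | OkSpec | Careful
  deriving (Eq, Ord, Show, Bounded, Enum)

-- Combining float groups keeps the most restrictive flag: just 'max'
-- under the severity ordering.
andFF' :: FF -> FF -> FF
andFF' = max
```

Reading it as `max` also makes the commutativity and associativity of the combination obvious.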
+instance Outputable LetFloats where + ppr (LetFloats binds ff) = ppr ff $$ ppr (fromOL binds) + +instance Outputable FloatFlag where + ppr FltLifted = text "FltLifted" + ppr FltOkSpec = text "FltOkSpec" + ppr FltCareful = text "FltCareful" + +andFF :: FloatFlag -> FloatFlag -> FloatFlag +andFF FltCareful _ = FltCareful +andFF FltOkSpec FltCareful = FltCareful +andFF FltOkSpec _ = FltOkSpec +andFF FltLifted flt = flt + +doFloatFromRhs :: TopLevelFlag -> RecFlag -> Bool -> SimplFloats -> OutExpr -> Bool +-- If you change this function look also at FloatIn.noFloatFromRhs +doFloatFromRhs lvl rec str (SimplFloats { sfLetFloats = LetFloats fs ff }) rhs + = not (isNilOL fs) && want_to_float && can_float + where + want_to_float = isTopLevel lvl || exprIsCheap rhs || exprIsExpandable rhs + -- See Note [Float when cheap or expandable] + can_float = case ff of + FltLifted -> True + FltOkSpec -> isNotTopLevel lvl && isNonRec rec + FltCareful -> isNotTopLevel lvl && isNonRec rec && str + +{- +Note [Float when cheap or expandable] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We want to float a let from a let if the residual RHS is + a) cheap, such as (\x. blah) + b) expandable, such as (f b) if f is CONLIKE +But there are + - cheap things that are not expandable (eg \x. expensive) + - expandable things that are not cheap (eg (f b) where b is CONLIKE) +so we must take the 'or' of the two. +-} + +emptyLetFloats :: LetFloats +emptyLetFloats = LetFloats nilOL FltLifted + +emptyJoinFloats :: JoinFloats +emptyJoinFloats = nilOL + +unitLetFloat :: OutBind -> LetFloats +-- This key function constructs a singleton float with the right form +unitLetFloat bind = ASSERT(all (not . isJoinId) (bindersOf bind)) + LetFloats (unitOL bind) (flag bind) + where + flag (Rec {}) = FltLifted + flag (NonRec bndr rhs) + | not (isStrictId bndr) = FltLifted + | exprIsTickedString rhs = FltLifted + -- String literals can be floated freely. + -- See Note [Core top-level string literals] in GHC.Core. 
+ | exprOkForSpeculation rhs = FltOkSpec -- Unlifted, and lifted but ok-for-spec (eg HNF) + | otherwise = ASSERT2( not (isUnliftedType (idType bndr)), ppr bndr ) + FltCareful + -- Unlifted binders can only be let-bound if exprOkForSpeculation holds + +unitJoinFloat :: OutBind -> JoinFloats +unitJoinFloat bind = ASSERT(all isJoinId (bindersOf bind)) + unitOL bind + +mkFloatBind :: SimplEnv -> OutBind -> (SimplFloats, SimplEnv) +-- Make a singleton SimplFloats, and +-- extend the incoming SimplEnv's in-scope set with its binders +-- These binders may already be in the in-scope set, +-- but may have by now been augmented with more IdInfo +mkFloatBind env bind + = (floats, env { seInScope = in_scope' }) + where + floats + | isJoinBind bind + = SimplFloats { sfLetFloats = emptyLetFloats + , sfJoinFloats = unitJoinFloat bind + , sfInScope = in_scope' } + | otherwise + = SimplFloats { sfLetFloats = unitLetFloat bind + , sfJoinFloats = emptyJoinFloats + , sfInScope = in_scope' } + + in_scope' = seInScope env `extendInScopeSetBind` bind + +extendFloats :: SimplFloats -> OutBind -> SimplFloats +-- Add this binding to the floats, and extend the in-scope env too +extendFloats (SimplFloats { sfLetFloats = floats + , sfJoinFloats = jfloats + , sfInScope = in_scope }) + bind + | isJoinBind bind + = SimplFloats { sfInScope = in_scope' + , sfLetFloats = floats + , sfJoinFloats = jfloats' } + | otherwise + = SimplFloats { sfInScope = in_scope' + , sfLetFloats = floats' + , sfJoinFloats = jfloats } + where + in_scope' = in_scope `extendInScopeSetBind` bind + floats' = floats `addLetFlts` unitLetFloat bind + jfloats' = jfloats `addJoinFlts` unitJoinFloat bind + +addLetFloats :: SimplFloats -> LetFloats -> SimplFloats +-- Add the let-floats for env2 to env1; +-- *plus* the in-scope set for env2, which is bigger +-- than that for env1 +addLetFloats floats let_floats@(LetFloats binds _) + = floats { sfLetFloats = sfLetFloats floats `addLetFlts` let_floats + , sfInScope = foldlOL 
extendInScopeSetBind + (sfInScope floats) binds } + +addJoinFloats :: SimplFloats -> JoinFloats -> SimplFloats +addJoinFloats floats join_floats + = floats { sfJoinFloats = sfJoinFloats floats `addJoinFlts` join_floats + , sfInScope = foldlOL extendInScopeSetBind + (sfInScope floats) join_floats } + +extendInScopeSetBind :: InScopeSet -> CoreBind -> InScopeSet +extendInScopeSetBind in_scope bind + = extendInScopeSetList in_scope (bindersOf bind) + +addFloats :: SimplFloats -> SimplFloats -> SimplFloats +-- Add both let-floats and join-floats for env2 to env1; +-- *plus* the in-scope set for env2, which is bigger +-- than that for env1 +addFloats (SimplFloats { sfLetFloats = lf1, sfJoinFloats = jf1 }) + (SimplFloats { sfLetFloats = lf2, sfJoinFloats = jf2, sfInScope = in_scope }) + = SimplFloats { sfLetFloats = lf1 `addLetFlts` lf2 + , sfJoinFloats = jf1 `addJoinFlts` jf2 + , sfInScope = in_scope } + +addLetFlts :: LetFloats -> LetFloats -> LetFloats +addLetFlts (LetFloats bs1 l1) (LetFloats bs2 l2) + = LetFloats (bs1 `appOL` bs2) (l1 `andFF` l2) + +letFloatBinds :: LetFloats -> [CoreBind] +letFloatBinds (LetFloats bs _) = fromOL bs + +addJoinFlts :: JoinFloats -> JoinFloats -> JoinFloats +addJoinFlts = appOL + +mkRecFloats :: SimplFloats -> SimplFloats +-- Flattens the floats from env2 into a single Rec group, +-- They must either all be lifted LetFloats or all JoinFloats +mkRecFloats floats@(SimplFloats { sfLetFloats = LetFloats bs ff + , sfJoinFloats = jbs + , sfInScope = in_scope }) + = ASSERT2( case ff of { FltLifted -> True; _ -> False }, ppr (fromOL bs) ) + ASSERT2( isNilOL bs || isNilOL jbs, ppr floats ) + SimplFloats { sfLetFloats = floats' + , sfJoinFloats = jfloats' + , sfInScope = in_scope } + where + floats' | isNilOL bs = emptyLetFloats + | otherwise = unitLetFloat (Rec (flattenBinds (fromOL bs))) + jfloats' | isNilOL jbs = emptyJoinFloats + | otherwise = unitJoinFloat (Rec (flattenBinds (fromOL jbs))) + +wrapFloats :: SimplFloats -> OutExpr -> OutExpr 
+-- Wrap the floats around the expression; they should all +-- satisfy the let/app invariant, so mkLets should do the job just fine +wrapFloats (SimplFloats { sfLetFloats = LetFloats bs _ + , sfJoinFloats = jbs }) body + = foldrOL Let (wrapJoinFloats jbs body) bs + -- Note: Always safe to put the joins on the inside + -- since the values can't refer to them + +wrapJoinFloatsX :: SimplFloats -> OutExpr -> (SimplFloats, OutExpr) +-- Wrap the sfJoinFloats of the env around the expression, +-- and take them out of the SimplEnv +wrapJoinFloatsX floats body + = ( floats { sfJoinFloats = emptyJoinFloats } + , wrapJoinFloats (sfJoinFloats floats) body ) + +wrapJoinFloats :: JoinFloats -> OutExpr -> OutExpr +-- Wrap the sfJoinFloats of the env around the expression, +-- and take them out of the SimplEnv +wrapJoinFloats join_floats body + = foldrOL Let body join_floats + +getTopFloatBinds :: SimplFloats -> [CoreBind] +getTopFloatBinds (SimplFloats { sfLetFloats = lbs + , sfJoinFloats = jbs}) + = ASSERT( isNilOL jbs ) -- Can't be any top-level join bindings + letFloatBinds lbs + +mapLetFloats :: LetFloats -> ((Id,CoreExpr) -> (Id,CoreExpr)) -> LetFloats +mapLetFloats (LetFloats fs ff) fun + = LetFloats (mapOL app fs) ff + where + app (NonRec b e) = case fun (b,e) of (b',e') -> NonRec b' e' + app (Rec bs) = Rec (map fun bs) + +{- +************************************************************************ +* * + Substitution of Vars +* * +************************************************************************ + +Note [Global Ids in the substitution] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We look up even a global (eg imported) Id in the substitution. Consider + case X.g_34 of b { (a,b) -> ... case X.g_34 of { (p,q) -> ...} ... } +The binder-swap in the occurrence analyser will add a binding +for a LocalId version of g (with the same unique though): + case X.g_34 of b { (a,b) -> let g_34 = b in + ... case X.g_34 of { (p,q) -> ...} ... 
} +So we want to look up the inner X.g_34 in the substitution, where we'll +find that it has been substituted by b. (Or conceivably cloned.) +-} + +substId :: SimplEnv -> InId -> SimplSR +-- Returns DoneEx only on a non-Var expression +substId (SimplEnv { seInScope = in_scope, seIdSubst = ids }) v + = case lookupVarEnv ids v of -- Note [Global Ids in the substitution] + Nothing -> DoneId (refineFromInScope in_scope v) + Just (DoneId v) -> DoneId (refineFromInScope in_scope v) + Just res -> res -- DoneEx non-var, or ContEx + + -- Get the most up-to-date thing from the in-scope set + -- Even though it isn't in the substitution, it may be in + -- the in-scope set with better IdInfo. + -- + -- See also Note [In-scope set as a substitution] in GHC.Core.Op.Simplify. + +refineFromInScope :: InScopeSet -> Var -> Var +refineFromInScope in_scope v + | isLocalId v = case lookupInScope in_scope v of + Just v' -> v' + Nothing -> WARN( True, ppr v ) v -- This is an error! + | otherwise = v + +lookupRecBndr :: SimplEnv -> InId -> OutId +-- Look up an Id which has been put into the envt by simplRecBndrs, +-- but where we have not yet done its RHS +lookupRecBndr (SimplEnv { seInScope = in_scope, seIdSubst = ids }) v + = case lookupVarEnv ids v of + Just (DoneId v) -> v + Just _ -> pprPanic "lookupRecBndr" (ppr v) + Nothing -> refineFromInScope in_scope v + +{- +************************************************************************ +* * +\section{Substituting an Id binder} +* * +************************************************************************ + + +These functions are in the monad only so that they can be made strict via seq. + +Note [Return type for join points] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Consider + + (join j :: Char -> Int -> Int) 77 + ( j x = \y. y + ord x ) + (in case v of ) + ( A -> j 'x' ) + ( B -> j 'y' ) + ( C -> <blah> ) + +The simplifier pushes the "apply to 77" continuation inwards to give + + join j :: Char -> Int + j x = (\y. 
y + ord x) 77 + in case v of + A -> j 'x' + B -> j 'y' + C -> <blah> 77 + +Notice that the "apply to 77" continuation went into the RHS of the +join point. And that meant that the return type of the join point +changed!! + +That's why we pass res_ty into simplNonRecJoinBndr, and substIdBndr +takes a (Just res_ty) argument so that it knows to do the type-changing +thing. +-} + +simplBinders :: SimplEnv -> [InBndr] -> SimplM (SimplEnv, [OutBndr]) +simplBinders env bndrs = mapAccumLM simplBinder env bndrs + +------------- +simplBinder :: SimplEnv -> InBndr -> SimplM (SimplEnv, OutBndr) +-- Used for lambda and case-bound variables +-- Clone Id if necessary, substitute type +-- Return with IdInfo already substituted, but (fragile) occurrence info zapped +-- The substitution is extended only if the variable is cloned, because +-- we *don't* need to use it to track occurrence info. +simplBinder env bndr + | isTyVar bndr = do { let (env', tv) = substTyVarBndr env bndr + ; seqTyVar tv `seq` return (env', tv) } + | otherwise = do { let (env', id) = substIdBndr Nothing env bndr + ; seqId id `seq` return (env', id) } + +--------------- +simplNonRecBndr :: SimplEnv -> InBndr -> SimplM (SimplEnv, OutBndr) +-- A non-recursive let binder +simplNonRecBndr env id + = do { let (env1, id1) = substIdBndr Nothing env id + ; seqId id1 `seq` return (env1, id1) } + +--------------- +simplNonRecJoinBndr :: SimplEnv -> OutType -> InBndr + -> SimplM (SimplEnv, OutBndr) +-- A non-recursive let binder for a join point; +-- context being pushed inward may change the type +-- See Note [Return type for join points] +simplNonRecJoinBndr env res_ty id + = do { let (env1, id1) = substIdBndr (Just res_ty) env id + ; seqId id1 `seq` return (env1, id1) } + +--------------- +simplRecBndrs :: SimplEnv -> [InBndr] -> SimplM SimplEnv +-- Recursive let binders +simplRecBndrs env@(SimplEnv {}) ids + = ASSERT(all (not . 
isJoinId) ids)
+    do  { let (env1, ids1) = mapAccumL (substIdBndr Nothing) env ids
+        ; seqIds ids1 `seq` return env1 }
+
+---------------
+simplRecJoinBndrs :: SimplEnv -> OutType -> [InBndr] -> SimplM SimplEnv
+-- Recursive let binders for join points;
+-- context being pushed inward may change types
+-- See Note [Return type for join points]
+simplRecJoinBndrs env@(SimplEnv {}) res_ty ids
+  = ASSERT(all isJoinId ids)
+    do  { let (env1, ids1) = mapAccumL (substIdBndr (Just res_ty)) env ids
+        ; seqIds ids1 `seq` return env1 }
+
+---------------
+substIdBndr :: Maybe OutType -> SimplEnv -> InBndr -> (SimplEnv, OutBndr)
+-- Might be a coercion variable
+substIdBndr new_res_ty env bndr
+  | isCoVar bndr = substCoVarBndr env bndr
+  | otherwise    = substNonCoVarIdBndr new_res_ty env bndr
+
+---------------
+substNonCoVarIdBndr
+   :: Maybe OutType  -- New result type, if a join binder
+                     -- See Note [Return type for join points]
+   -> SimplEnv
+   -> InBndr         -- Env and binder to transform
+   -> (SimplEnv, OutBndr)
+-- Clone Id if necessary, substitute its type
+-- Return an Id with its
+--      * Type substituted
+--      * UnfoldingInfo, Rules, WorkerInfo zapped
+--      * Fragile OccInfo (only) zapped: Note [Robust OccInfo]
+--      * Robust info retained, especially arity and demand info,
+--        so that it is available to occurrences in an
+--        earlier binding of a letrec
+--
+-- For the robust info, see Note [Arity robustness]
+--
+-- Augment the substitution if the unique changed
+-- Extend the in-scope set with the new Id
+--
+-- Similar to GHC.Core.Subst.substIdBndr, except that
+--      the type of id_subst differs
+--      all fragile info is zapped
+substNonCoVarIdBndr new_res_ty
+                    env@(SimplEnv { seInScope = in_scope
+                                  , seIdSubst = id_subst })
+                    old_id
+  = ASSERT2( not (isCoVar old_id), ppr old_id )
+    (env { seInScope = in_scope `extendInScopeSet` new_id,
+           seIdSubst = new_subst }, new_id)
+  where
+    id1 = uniqAway in_scope old_id
+    id2 = substIdType env id1
+
+    id3 | Just res_ty <- new_res_ty
+        = id2 `setIdType` setJoinResTy (idJoinArity id2) res_ty (idType id2)
+          -- See Note [Return type for join points]
+        | otherwise
+        = id2
+
+    new_id = zapFragileIdInfo id3       -- Zaps rules, worker-info, unfolding
+                                        -- and fragile OccInfo
+
+        -- Extend the substitution if the unique has changed,
+        -- or there's some useful occurrence information
+        -- See the notes with substTyVarBndr for the delSubstEnv
+    new_subst | new_id /= old_id
+              = extendVarEnv id_subst old_id (DoneId new_id)
+              | otherwise
+              = delVarEnv id_subst old_id
+
+------------------------------------
+seqTyVar :: TyVar -> ()
+seqTyVar b = b `seq` ()
+
+seqId :: Id -> ()
+seqId id = seqType (idType id)  `seq`
+           idInfo id            `seq`
+           ()
+
+seqIds :: [Id] -> ()
+seqIds []       = ()
+seqIds (id:ids) = seqId id `seq` seqIds ids
+
+{-
+Note [Arity robustness]
+~~~~~~~~~~~~~~~~~~~~~~~
+We *do* transfer the arity from the in_id of a let binding to the
+out_id. This is important, so that the arity of an Id is visible in
+its own RHS. For example:
+    f = \x. ....g (\y. f y)....
+We can eta-reduce the arg to g, because f is a value. But that
+needs to be visible.
+
+This interacts with the 'state hack' too:
+    f :: Bool -> IO Int
+    f = \x. case x of
+              True  -> f y
+              False -> \s -> ...
+Can we eta-expand f? Only if we see that f has arity 1, and then we
+take advantage of the 'state hack' on the result of
+(f y) :: State# -> (State#, Int) to expand the arity one more.
+
+There is a disadvantage though. Making the arity visible in the RHS
+allows us to eta-reduce
+      f = \x -> f x
+to
+      f = f
+which technically is not sound. This is very much a corner case, so
+I'm not worried about it. Another idea is to ensure that f's arity
+never decreases; its arity started as 1, and we should never eta-reduce
+below that.
+
+
+Note [Robust OccInfo]
+~~~~~~~~~~~~~~~~~~~~~
+It's important that we *do* retain the loop-breaker OccInfo, because
+that's what stops the Id getting inlined infinitely, in the body of
+the letrec.
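As a side note, the eta-reduction corner case above can be reproduced with a tiny
standalone reducer. These are toy types and names of ours, nothing to do with
GHC's Core representation:

```haskell
import Data.List (nub)

-- A toy lambda syntax; illustrative only, not GHC's Core.
data E = V String | Lam String E | App E E
  deriving (Eq, Show)

-- Free variables of a toy expression
fv :: E -> [String]
fv (V x)     = [x]
fv (Lam x e) = filter (/= x) (fv e)
fv (App a b) = nub (fv a ++ fv b)

-- Naive eta-reduction:  \x -> e x  ==>  e,  provided x is not free in e
etaReduce :: E -> Maybe E
etaReduce (Lam x (App e (V y)))
  | x == y, x `notElem` fv e = Just e
etaReduce _ = Nothing
```

On (\x -> g x) this yields g; but applied to the RHS of f = \x -> f x it
happily yields f, turning the binding into f = f -- exactly the unsoundness
the note above is prepared to tolerate.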
+-} + + +{- +************************************************************************ +* * + Impedance matching to type substitution +* * +************************************************************************ +-} + +getTCvSubst :: SimplEnv -> TCvSubst +getTCvSubst (SimplEnv { seInScope = in_scope, seTvSubst = tv_env + , seCvSubst = cv_env }) + = mkTCvSubst in_scope (tv_env, cv_env) + +substTy :: SimplEnv -> Type -> Type +substTy env ty = Type.substTy (getTCvSubst env) ty + +substTyVar :: SimplEnv -> TyVar -> Type +substTyVar env tv = Type.substTyVar (getTCvSubst env) tv + +substTyVarBndr :: SimplEnv -> TyVar -> (SimplEnv, TyVar) +substTyVarBndr env tv + = case Type.substTyVarBndr (getTCvSubst env) tv of + (TCvSubst in_scope' tv_env' cv_env', tv') + -> (env { seInScope = in_scope', seTvSubst = tv_env', seCvSubst = cv_env' }, tv') + +substCoVar :: SimplEnv -> CoVar -> Coercion +substCoVar env tv = Coercion.substCoVar (getTCvSubst env) tv + +substCoVarBndr :: SimplEnv -> CoVar -> (SimplEnv, CoVar) +substCoVarBndr env cv + = case Coercion.substCoVarBndr (getTCvSubst env) cv of + (TCvSubst in_scope' tv_env' cv_env', cv') + -> (env { seInScope = in_scope', seTvSubst = tv_env', seCvSubst = cv_env' }, cv') + +substCo :: SimplEnv -> Coercion -> Coercion +substCo env co = Coercion.substCo (getTCvSubst env) co + +------------------ +substIdType :: SimplEnv -> Id -> Id +substIdType (SimplEnv { seInScope = in_scope, seTvSubst = tv_env, seCvSubst = cv_env }) id + | (isEmptyVarEnv tv_env && isEmptyVarEnv cv_env) + || noFreeVarsOfType old_ty + = id + | otherwise = Id.setIdType id (Type.substTy (TCvSubst in_scope tv_env cv_env) old_ty) + -- The tyCoVarsOfType is cheaper than it looks + -- because we cache the free tyvars of the type + -- in a Note in the id's type itself + where + old_ty = idType id diff --git a/compiler/GHC/Core/Op/Simplify/Monad.hs b/compiler/GHC/Core/Op/Simplify/Monad.hs new file mode 100644 index 0000000000..e6b23734c4 --- /dev/null +++ 
b/compiler/GHC/Core/Op/Simplify/Monad.hs @@ -0,0 +1,252 @@ +{- +(c) The AQUA Project, Glasgow University, 1993-1998 + +\section[GHC.Core.Op.Simplify.Monad]{The simplifier Monad} +-} + +{-# LANGUAGE DeriveFunctor #-} +module GHC.Core.Op.Simplify.Monad ( + -- The monad + SimplM, + initSmpl, traceSmpl, + getSimplRules, getFamEnvs, + + -- Unique supply + MonadUnique(..), newId, newJoinId, + + -- Counting + SimplCount, tick, freeTick, checkedTick, + getSimplCount, zeroSimplCount, pprSimplCount, + plusSimplCount, isZeroSimplCount + ) where + +import GhcPrelude + +import Var ( Var, isId, mkLocalVar ) +import Name ( mkSystemVarName ) +import Id ( Id, mkSysLocalOrCoVar ) +import IdInfo ( IdDetails(..), vanillaIdInfo, setArityInfo ) +import GHC.Core.Type ( Type, mkLamTypes ) +import GHC.Core.FamInstEnv ( FamInstEnv ) +import GHC.Core ( RuleEnv(..) ) +import UniqSupply +import GHC.Driver.Session +import GHC.Core.Op.Monad +import Outputable +import FastString +import MonadUtils +import ErrUtils as Err +import Util ( count ) +import Panic (throwGhcExceptionIO, GhcException (..)) +import BasicTypes ( IntWithInf, treatZeroAsInf, mkIntWithInf ) +import Control.Monad ( ap ) + +{- +************************************************************************ +* * +\subsection{Monad plumbing} +* * +************************************************************************ + +For the simplifier monad, we want to {\em thread} a unique supply and a counter. +(Command-line switches move around through the explicitly-passed SimplEnv.) 
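As an illustration only (toy names, not GHC's), the threading described above
looks like this in miniature, with an Int standing in for the unique supply:

```haskell
import Control.Monad (ap)

-- A miniature SimplM: thread a unique supply (here just an Int)
-- and a tick counter through the computation.
newtype M a = M { runM :: Int -> Int -> (a, Int, Int) }

instance Functor M where
  fmap f (M g) = M $ \us sc -> case g us sc of (a, us', sc') -> (f a, us', sc')

instance Applicative M where
  pure a = M $ \us sc -> (a, us, sc)
  (<*>)  = ap

instance Monad M where
  M g >>= k = M $ \us sc -> case g us sc of (a, us', sc') -> runM (k a) us' sc'

fresh :: M Int      -- roughly the role of getUniqueM
fresh = M $ \us sc -> (us, us + 1, sc)

tickM :: M ()       -- roughly the role of tick, minus the Tick argument
tickM = M $ \us sc -> ((), us, sc + 1)
```

For example, runM (fresh >> tickM >> fresh) 0 0 evaluates to (1, 2, 1): the
second unique drawn, the next unused unique, and one tick recorded.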
+-} + +newtype SimplM result + = SM { unSM :: SimplTopEnv -- Envt that does not change much + -> UniqSupply -- We thread the unique supply because + -- constantly splitting it is rather expensive + -> SimplCount + -> IO (result, UniqSupply, SimplCount)} + -- we only need IO here for dump output + deriving (Functor) + +data SimplTopEnv + = STE { st_flags :: DynFlags + , st_max_ticks :: IntWithInf -- Max #ticks in this simplifier run + , st_rules :: RuleEnv + , st_fams :: (FamInstEnv, FamInstEnv) } + +initSmpl :: DynFlags -> RuleEnv -> (FamInstEnv, FamInstEnv) + -> UniqSupply -- No init count; set to 0 + -> Int -- Size of the bindings, used to limit + -- the number of ticks we allow + -> SimplM a + -> IO (a, SimplCount) + +initSmpl dflags rules fam_envs us size m + = do (result, _, count) <- unSM m env us (zeroSimplCount dflags) + return (result, count) + where + env = STE { st_flags = dflags, st_rules = rules + , st_max_ticks = computeMaxTicks dflags size + , st_fams = fam_envs } + +computeMaxTicks :: DynFlags -> Int -> IntWithInf +-- Compute the max simplifier ticks as +-- (base-size + pgm-size) * magic-multiplier * tick-factor/100 +-- where +-- magic-multiplier is a constant that gives reasonable results +-- base-size is a constant to deal with size-zero programs +computeMaxTicks dflags size + = treatZeroAsInf $ + fromInteger ((toInteger (size + base_size) + * toInteger (tick_factor * magic_multiplier)) + `div` 100) + where + tick_factor = simplTickFactor dflags + base_size = 100 + magic_multiplier = 40 + -- MAGIC NUMBER, multiplies the simplTickFactor + -- We can afford to be generous; this is really + -- just checking for loops, and shouldn't usually fire + -- A figure of 20 was too small: see #5539. 
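To make the arithmetic above concrete, here is the same formula as a standalone
sketch (our names, not GHC's; the real computeMaxTicks works over IntWithInf
and treats zero as infinity):

```haskell
-- Standalone sketch of the tick-budget formula above:
--   max_ticks = (size + base_size) * (tick_factor * magic_multiplier) / 100
maxTicks :: Int -> Int -> Integer
maxTicks size tickFactor
  = (toInteger (size + baseSize) * toInteger (tickFactor * magicMultiplier))
      `div` 100
  where
    baseSize        = 100  -- copes with size-zero programs
    magicMultiplier = 40   -- the MAGIC NUMBER discussed above
```

With the default -fsimpl-tick-factor=100 and a program of size 1000, the
budget is (1000 + 100) * 4000 `div` 100 = 44000 ticks.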
+ +{-# INLINE thenSmpl #-} +{-# INLINE thenSmpl_ #-} +{-# INLINE returnSmpl #-} + + +instance Applicative SimplM where + pure = returnSmpl + (<*>) = ap + (*>) = thenSmpl_ + +instance Monad SimplM where + (>>) = (*>) + (>>=) = thenSmpl + +returnSmpl :: a -> SimplM a +returnSmpl e = SM (\_st_env us sc -> return (e, us, sc)) + +thenSmpl :: SimplM a -> (a -> SimplM b) -> SimplM b +thenSmpl_ :: SimplM a -> SimplM b -> SimplM b + +thenSmpl m k + = SM $ \st_env us0 sc0 -> do + (m_result, us1, sc1) <- unSM m st_env us0 sc0 + unSM (k m_result) st_env us1 sc1 + +thenSmpl_ m k + = SM $ \st_env us0 sc0 -> do + (_, us1, sc1) <- unSM m st_env us0 sc0 + unSM k st_env us1 sc1 + +-- TODO: this specializing is not allowed +-- {-# SPECIALIZE mapM :: (a -> SimplM b) -> [a] -> SimplM [b] #-} +-- {-# SPECIALIZE mapAndUnzipM :: (a -> SimplM (b, c)) -> [a] -> SimplM ([b],[c]) #-} +-- {-# SPECIALIZE mapAccumLM :: (acc -> b -> SimplM (acc,c)) -> acc -> [b] -> SimplM (acc, [c]) #-} + +traceSmpl :: String -> SDoc -> SimplM () +traceSmpl herald doc + = do { dflags <- getDynFlags + ; liftIO $ Err.dumpIfSet_dyn dflags Opt_D_dump_simpl_trace "Simpl Trace" + FormatText + (hang (text herald) 2 doc) } + +{- +************************************************************************ +* * +\subsection{The unique supply} +* * +************************************************************************ +-} + +instance MonadUnique SimplM where + getUniqueSupplyM + = SM (\_st_env us sc -> case splitUniqSupply us of + (us1, us2) -> return (us1, us2, sc)) + + getUniqueM + = SM (\_st_env us sc -> case takeUniqFromSupply us of + (u, us') -> return (u, us', sc)) + + getUniquesM + = SM (\_st_env us sc -> case splitUniqSupply us of + (us1, us2) -> return (uniqsFromSupply us1, us2, sc)) + +instance HasDynFlags SimplM where + getDynFlags = SM (\st_env us sc -> return (st_flags st_env, us, sc)) + +instance MonadIO SimplM where + liftIO m = SM $ \_ us sc -> do + x <- m + return (x, us, sc) + +getSimplRules :: SimplM 
RuleEnv +getSimplRules = SM (\st_env us sc -> return (st_rules st_env, us, sc)) + +getFamEnvs :: SimplM (FamInstEnv, FamInstEnv) +getFamEnvs = SM (\st_env us sc -> return (st_fams st_env, us, sc)) + +newId :: FastString -> Type -> SimplM Id +newId fs ty = do uniq <- getUniqueM + return (mkSysLocalOrCoVar fs uniq ty) + +newJoinId :: [Var] -> Type -> SimplM Id +newJoinId bndrs body_ty + = do { uniq <- getUniqueM + ; let name = mkSystemVarName uniq (fsLit "$j") + join_id_ty = mkLamTypes bndrs body_ty -- Note [Funky mkLamTypes] + arity = count isId bndrs + -- arity: See Note [Invariants on join points] invariant 2b, in GHC.Core + join_arity = length bndrs + details = JoinId join_arity + id_info = vanillaIdInfo `setArityInfo` arity +-- `setOccInfo` strongLoopBreaker + + ; return (mkLocalVar details name join_id_ty id_info) } + +{- +************************************************************************ +* * +\subsection{Counting up what we've done} +* * +************************************************************************ +-} + +getSimplCount :: SimplM SimplCount +getSimplCount = SM (\_st_env us sc -> return (sc, us, sc)) + +tick :: Tick -> SimplM () +tick t = SM (\st_env us sc -> let sc' = doSimplTick (st_flags st_env) t sc + in sc' `seq` return ((), us, sc')) + +checkedTick :: Tick -> SimplM () +-- Try to take a tick, but fail if too many +checkedTick t + = SM (\st_env us sc -> + if st_max_ticks st_env <= mkIntWithInf (simplCountN sc) + then throwGhcExceptionIO $ + PprProgramError "Simplifier ticks exhausted" (msg sc) + else let sc' = doSimplTick (st_flags st_env) t sc + in sc' `seq` return ((), us, sc')) + where + msg sc = vcat + [ text "When trying" <+> ppr t + , text "To increase the limit, use -fsimpl-tick-factor=N (default 100)." + , space + , text "If you need to increase the limit substantially, please file a" + , text "bug report and indicate the factor you needed." 
+ , space + , text "If GHC was unable to complete compilation even" + <+> text "with a very large factor" + , text "(a thousand or more), please consult the" + <+> doubleQuotes (text "Known bugs or infelicities") + , text "section in the Users Guide before filing a report. There are a" + , text "few situations unlikely to occur in practical programs for which" + , text "simplifier non-termination has been judged acceptable." + , space + , pp_details sc + , pprSimplCount sc ] + pp_details sc + | hasDetailedCounts sc = empty + | otherwise = text "To see detailed counts use -ddump-simpl-stats" + + +freeTick :: Tick -> SimplM () +-- Record a tick, but don't add to the total tick count, which is +-- used to decide when nothing further has happened +freeTick t + = SM (\_st_env us sc -> let sc' = doFreeSimplTick t sc + in sc' `seq` return ((), us, sc')) diff --git a/compiler/GHC/Core/Op/Simplify/Utils.hs b/compiler/GHC/Core/Op/Simplify/Utils.hs new file mode 100644 index 0000000000..e62c256354 --- /dev/null +++ b/compiler/GHC/Core/Op/Simplify/Utils.hs @@ -0,0 +1,2329 @@ +{- +(c) The AQUA Project, Glasgow University, 1993-1998 + +The simplifier utilities +-} + +{-# LANGUAGE CPP #-} + +module GHC.Core.Op.Simplify.Utils ( + -- Rebuilding + mkLam, mkCase, prepareAlts, tryEtaExpandRhs, + + -- Inlining, + preInlineUnconditionally, postInlineUnconditionally, + activeUnfolding, activeRule, + getUnfoldingInRuleMatch, + simplEnvForGHCi, updModeForStableUnfoldings, updModeForRules, + + -- The continuation type + SimplCont(..), DupFlag(..), StaticEnv, + isSimplified, contIsStop, + contIsDupable, contResultType, contHoleType, + contIsTrivial, contArgs, + countArgs, + mkBoringStop, mkRhsStop, mkLazyArgStop, contIsRhsOrArg, + interestingCallContext, + + -- ArgInfo + ArgInfo(..), ArgSpec(..), mkArgInfo, + addValArgTo, addCastTo, addTyArgTo, + argInfoExpr, argInfoAppArgs, pushSimplifiedArgs, + + abstractFloats, + + -- Utilities + isExitJoinId + ) where + +#include "HsVersions.h" + +import 
GhcPrelude
+
+import GHC.Core.Op.Simplify.Env
+import GHC.Core.Op.Monad ( SimplMode(..), Tick(..) )
+import GHC.Driver.Session
+import GHC.Core
+import qualified GHC.Core.Subst
+import GHC.Core.Ppr
+import GHC.Core.TyCo.Ppr ( pprParendType )
+import GHC.Core.FVs
+import GHC.Core.Utils
+import GHC.Core.Arity
+import GHC.Core.Unfold
+import Name
+import Id
+import IdInfo
+import Var
+import Demand
+import GHC.Core.Op.Simplify.Monad
+import GHC.Core.Type hiding( substTy )
+import GHC.Core.Coercion hiding( substCo )
+import GHC.Core.DataCon ( dataConWorkId, isNullaryRepDataCon )
+import VarSet
+import BasicTypes
+import Util
+import OrdList ( isNilOL )
+import MonadUtils
+import Outputable
+import GHC.Core.Op.ConstantFold
+import FastString ( fsLit )
+
+import Control.Monad ( when )
+import Data.List ( sortBy )
+
+{-
+************************************************************************
+*                                                                      *
+                The SimplCont and DupFlag types
+*                                                                      *
+************************************************************************
+
+A SimplCont allows the simplifier to traverse the expression in a
+zipper-like fashion. The SimplCont represents the rest of the expression,
+"above" the point of interest.
+
+You can also think of a SimplCont as an "evaluation context", using
+that term in the way it is used for operational semantics. This is the
+way I usually think of it. For example, you'll often see the syntax of
+evaluation contexts written like
+    C ::= [] | C e | case C of alts | C `cast` co
+That's the kind of thing we are doing here, and I use that syntax in
+the comments.
+
+
+Key points:
+  * A SimplCont describes a *strict* context (just like
+    evaluation contexts do). E.g. Just [] is not a SimplCont
+
+  * A SimplCont describes a context that *does not* bind
+    any variables. E.g. \x.
[] is not a SimplCont +-} + +data SimplCont + = Stop -- Stop[e] = e + OutType -- Type of the <hole> + CallCtxt -- Tells if there is something interesting about + -- the context, and hence the inliner + -- should be a bit keener (see interestingCallContext) + -- Specifically: + -- This is an argument of a function that has RULES + -- Inlining the call might allow the rule to fire + -- Never ValAppCxt (use ApplyToVal instead) + -- or CaseCtxt (use Select instead) + + | CastIt -- (CastIt co K)[e] = K[ e `cast` co ] + OutCoercion -- The coercion simplified + -- Invariant: never an identity coercion + SimplCont + + | ApplyToVal -- (ApplyToVal arg K)[e] = K[ e arg ] + { sc_dup :: DupFlag -- See Note [DupFlag invariants] + , sc_arg :: InExpr -- The argument, + , sc_env :: StaticEnv -- see Note [StaticEnv invariant] + , sc_cont :: SimplCont } + + | ApplyToTy -- (ApplyToTy ty K)[e] = K[ e ty ] + { sc_arg_ty :: OutType -- Argument type + , sc_hole_ty :: OutType -- Type of the function, presumably (forall a. blah) + -- See Note [The hole type in ApplyToTy] + , sc_cont :: SimplCont } + + | Select -- (Select alts K)[e] = K[ case e of alts ] + { sc_dup :: DupFlag -- See Note [DupFlag invariants] + , sc_bndr :: InId -- case binder + , sc_alts :: [InAlt] -- Alternatives + , sc_env :: StaticEnv -- See Note [StaticEnv invariant] + , sc_cont :: SimplCont } + + -- The two strict forms have no DupFlag, because we never duplicate them + | StrictBind -- (StrictBind x xs b K)[e] = let x = e in K[\xs.b] + -- or, equivalently, = K[ (\x xs.b) e ] + { sc_dup :: DupFlag -- See Note [DupFlag invariants] + , sc_bndr :: InId + , sc_bndrs :: [InBndr] + , sc_body :: InExpr + , sc_env :: StaticEnv -- See Note [StaticEnv invariant] + , sc_cont :: SimplCont } + + | StrictArg -- (StrictArg (f e1 ..en) K)[e] = K[ f e1 .. 
en e ] + { sc_dup :: DupFlag -- Always Simplified or OkToDup + , sc_fun :: ArgInfo -- Specifies f, e1..en, Whether f has rules, etc + -- plus strictness flags for *further* args + , sc_cci :: CallCtxt -- Whether *this* argument position is interesting + , sc_cont :: SimplCont } + + | TickIt -- (TickIt t K)[e] = K[ tick t e ] + (Tickish Id) -- Tick tickish <hole> + SimplCont + +type StaticEnv = SimplEnv -- Just the static part is relevant + +data DupFlag = NoDup -- Unsimplified, might be big + | Simplified -- Simplified + | OkToDup -- Simplified and small + +isSimplified :: DupFlag -> Bool +isSimplified NoDup = False +isSimplified _ = True -- Invariant: the subst-env is empty + +perhapsSubstTy :: DupFlag -> StaticEnv -> Type -> Type +perhapsSubstTy dup env ty + | isSimplified dup = ty + | otherwise = substTy env ty + +{- Note [StaticEnv invariant] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We pair up an InExpr or InAlts with a StaticEnv, which establishes the +lexical scope for that InExpr. When we simplify that InExpr/InAlts, we +use + - Its captured StaticEnv + - Overriding its InScopeSet with the larger one at the + simplification point. + +Why override the InScopeSet? Example: + (let y = ey in f) ex +By the time we simplify ex, 'y' will be in scope. + +However the InScopeSet in the StaticEnv is not irrelevant: it should +include all the free vars of applying the substitution to the InExpr. +Reason: contHoleType uses perhapsSubstTy to apply the substitution to +the expression, and that (rightly) gives ASSERT failures if the InScopeSet +isn't big enough. 
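Returning to the evaluation-context reading of SimplCont introduced above,
here is a toy model of how a context is "plugged" with an expression. The
types are illustrative only, not GHC's:

```haskell
-- Toy model of the evaluation-context reading of SimplCont:
--     C ::= [] | C e | case C of alts
data Expr = Var String | App Expr Expr | Case Expr [(String, Expr)]
  deriving (Eq, Show)

data Ctx
  = Hole                          -- []
  | AppArg Ctx Expr               -- (ApplyToVal arg k)[e] = k[e arg]
  | CaseOf Ctx [(String, Expr)]   -- (Select alts k)[e]    = k[case e of alts]

-- Rebuild the whole expression by filling the hole, innermost first,
-- just as rebuilding an expression with a SimplCont does.
plug :: Ctx -> Expr -> Expr
plug Hole            e = e
plug (AppArg k arg)  e = plug k (App e arg)
plug (CaseOf k alts) e = plug k (Case e alts)
```

For instance, plug (AppArg Hole (Var "x")) (Var "f") rebuilds App (Var "f")
(Var "x"): the context "apply to x" wrapped around the focused expression f.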
+
+Note [DupFlag invariants]
+~~~~~~~~~~~~~~~~~~~~~~~~~
+In both (ApplyToVal dup _ env k)
+   and  (Select dup _ _ env k)
+the following invariants hold
+
+  (a) if dup = OkToDup, then continuation k is also ok-to-dup
+  (b) if dup = OkToDup or Simplified, the subst-env is empty
+      (and hence no need to re-simplify)
+-}
+
+instance Outputable DupFlag where
+  ppr OkToDup    = text "ok"
+  ppr NoDup      = text "nodup"
+  ppr Simplified = text "simpl"
+
+instance Outputable SimplCont where
+  ppr (Stop ty interesting) = text "Stop" <> brackets (ppr interesting) <+> ppr ty
+  ppr (CastIt co cont)      = (text "CastIt" <+> pprOptCo co) $$ ppr cont
+  ppr (TickIt t cont)       = (text "TickIt" <+> ppr t) $$ ppr cont
+  ppr (ApplyToTy { sc_arg_ty = ty, sc_cont = cont })
+    = (text "ApplyToTy" <+> pprParendType ty) $$ ppr cont
+  ppr (ApplyToVal { sc_arg = arg, sc_dup = dup, sc_cont = cont })
+    = (text "ApplyToVal" <+> ppr dup <+> pprParendExpr arg)
+      $$ ppr cont
+  ppr (StrictBind { sc_bndr = b, sc_cont = cont })
+    = (text "StrictBind" <+> ppr b) $$ ppr cont
+  ppr (StrictArg { sc_fun = ai, sc_cont = cont })
+    = (text "StrictArg" <+> ppr (ai_fun ai)) $$ ppr cont
+  ppr (Select { sc_dup = dup, sc_bndr = bndr, sc_alts = alts, sc_env = se, sc_cont = cont })
+    = (text "Select" <+> ppr dup <+> ppr bndr) $$
+      whenPprDebug (nest 2 $ vcat [ppr (seTvSubst se), ppr alts]) $$ ppr cont
+
+
+{- Note [The hole type in ApplyToTy]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The sc_hole_ty field of ApplyToTy records the type of the "hole" in the
+continuation. It is absolutely necessary to compute contHoleType, but it is
+not used for anything else (and hence may not be evaluated).
+
+Why is it necessary for contHoleType? Consider the continuation
+     ApplyToTy Int (Stop Int)
+corresponding to
+     (<hole> @Int) :: Int
+What is the type of <hole>? It could be (forall a. Int) or (forall a. a),
+and there is no way to know which, so we must record it.
+ +In a chain of applications (f @t1 @t2 @t3) we'll lazily compute exprType +for (f @t1) and (f @t1 @t2), which is potentially non-linear; but it probably +doesn't matter because we'll never compute them all. + +************************************************************************ +* * + ArgInfo and ArgSpec +* * +************************************************************************ +-} + +data ArgInfo + = ArgInfo { + ai_fun :: OutId, -- The function + ai_args :: [ArgSpec], -- ...applied to these args (which are in *reverse* order) + + ai_type :: OutType, -- Type of (f a1 ... an) + + ai_rules :: FunRules, -- Rules for this function + + ai_encl :: Bool, -- Flag saying whether this function + -- or an enclosing one has rules (recursively) + -- True => be keener to inline in all args + + ai_strs :: [Bool], -- Strictness of remaining arguments + -- Usually infinite, but if it is finite it guarantees + -- that the function diverges after being given + -- that number of args + ai_discs :: [Int] -- Discounts for remaining arguments; non-zero => be keener to inline + -- Always infinite + } + +data ArgSpec + = ValArg OutExpr -- Apply to this (coercion or value); c.f. ApplyToVal + | TyArg { as_arg_ty :: OutType -- Apply to this type; c.f. ApplyToTy + , as_hole_ty :: OutType } -- Type of the function (presumably forall a. blah) + | CastBy OutCoercion -- Cast by this; c.f. 
CastIt + +instance Outputable ArgSpec where + ppr (ValArg e) = text "ValArg" <+> ppr e + ppr (TyArg { as_arg_ty = ty }) = text "TyArg" <+> ppr ty + ppr (CastBy c) = text "CastBy" <+> ppr c + +addValArgTo :: ArgInfo -> OutExpr -> ArgInfo +addValArgTo ai arg = ai { ai_args = ValArg arg : ai_args ai + , ai_type = applyTypeToArg (ai_type ai) arg + , ai_rules = decRules (ai_rules ai) } + +addTyArgTo :: ArgInfo -> OutType -> ArgInfo +addTyArgTo ai arg_ty = ai { ai_args = arg_spec : ai_args ai + , ai_type = piResultTy poly_fun_ty arg_ty + , ai_rules = decRules (ai_rules ai) } + where + poly_fun_ty = ai_type ai + arg_spec = TyArg { as_arg_ty = arg_ty, as_hole_ty = poly_fun_ty } + +addCastTo :: ArgInfo -> OutCoercion -> ArgInfo +addCastTo ai co = ai { ai_args = CastBy co : ai_args ai + , ai_type = coercionRKind co } + +argInfoAppArgs :: [ArgSpec] -> [OutExpr] +argInfoAppArgs [] = [] +argInfoAppArgs (CastBy {} : _) = [] -- Stop at a cast +argInfoAppArgs (ValArg e : as) = e : argInfoAppArgs as +argInfoAppArgs (TyArg { as_arg_ty = ty } : as) = Type ty : argInfoAppArgs as + +pushSimplifiedArgs :: SimplEnv -> [ArgSpec] -> SimplCont -> SimplCont +pushSimplifiedArgs _env [] k = k +pushSimplifiedArgs env (arg : args) k + = case arg of + TyArg { as_arg_ty = arg_ty, as_hole_ty = hole_ty } + -> ApplyToTy { sc_arg_ty = arg_ty, sc_hole_ty = hole_ty, sc_cont = rest } + ValArg e -> ApplyToVal { sc_arg = e, sc_env = env, sc_dup = Simplified, sc_cont = rest } + CastBy c -> CastIt c rest + where + rest = pushSimplifiedArgs env args k + -- The env has an empty SubstEnv + +argInfoExpr :: OutId -> [ArgSpec] -> OutExpr +-- NB: the [ArgSpec] is reversed so that the first arg +-- in the list is the last one in the application +argInfoExpr fun rev_args + = go rev_args + where + go [] = Var fun + go (ValArg a : as) = go as `App` a + go (TyArg { as_arg_ty = ty } : as) = go as `App` Type ty + go (CastBy co : as) = mkCast (go as) co + + +type FunRules = Maybe (Int, [CoreRule]) -- Remaining rules for 
this function + -- Nothing => No rules + -- Just (n, rules) => some rules, requiring at least n more type/value args + +decRules :: FunRules -> FunRules +decRules (Just (n, rules)) = Just (n-1, rules) +decRules Nothing = Nothing + +mkFunRules :: [CoreRule] -> FunRules +mkFunRules [] = Nothing +mkFunRules rs = Just (n_required, rs) + where + n_required = maximum (map ruleArity rs) + +{- +************************************************************************ +* * + Functions on SimplCont +* * +************************************************************************ +-} + +mkBoringStop :: OutType -> SimplCont +mkBoringStop ty = Stop ty BoringCtxt + +mkRhsStop :: OutType -> SimplCont -- See Note [RHS of lets] in GHC.Core.Unfold +mkRhsStop ty = Stop ty RhsCtxt + +mkLazyArgStop :: OutType -> CallCtxt -> SimplCont +mkLazyArgStop ty cci = Stop ty cci + +------------------- +contIsRhsOrArg :: SimplCont -> Bool +contIsRhsOrArg (Stop {}) = True +contIsRhsOrArg (StrictBind {}) = True +contIsRhsOrArg (StrictArg {}) = True +contIsRhsOrArg _ = False + +contIsRhs :: SimplCont -> Bool +contIsRhs (Stop _ RhsCtxt) = True +contIsRhs _ = False + +------------------- +contIsStop :: SimplCont -> Bool +contIsStop (Stop {}) = True +contIsStop _ = False + +contIsDupable :: SimplCont -> Bool +contIsDupable (Stop {}) = True +contIsDupable (ApplyToTy { sc_cont = k }) = contIsDupable k +contIsDupable (ApplyToVal { sc_dup = OkToDup }) = True -- See Note [DupFlag invariants] +contIsDupable (Select { sc_dup = OkToDup }) = True -- ...ditto... +contIsDupable (StrictArg { sc_dup = OkToDup }) = True -- ...ditto... 
+contIsDupable (CastIt _ k) = contIsDupable k +contIsDupable _ = False + +------------------- +contIsTrivial :: SimplCont -> Bool +contIsTrivial (Stop {}) = True +contIsTrivial (ApplyToTy { sc_cont = k }) = contIsTrivial k +contIsTrivial (ApplyToVal { sc_arg = Coercion _, sc_cont = k }) = contIsTrivial k +contIsTrivial (CastIt _ k) = contIsTrivial k +contIsTrivial _ = False + +------------------- +contResultType :: SimplCont -> OutType +contResultType (Stop ty _) = ty +contResultType (CastIt _ k) = contResultType k +contResultType (StrictBind { sc_cont = k }) = contResultType k +contResultType (StrictArg { sc_cont = k }) = contResultType k +contResultType (Select { sc_cont = k }) = contResultType k +contResultType (ApplyToTy { sc_cont = k }) = contResultType k +contResultType (ApplyToVal { sc_cont = k }) = contResultType k +contResultType (TickIt _ k) = contResultType k + +contHoleType :: SimplCont -> OutType +contHoleType (Stop ty _) = ty +contHoleType (TickIt _ k) = contHoleType k +contHoleType (CastIt co _) = coercionLKind co +contHoleType (StrictBind { sc_bndr = b, sc_dup = dup, sc_env = se }) + = perhapsSubstTy dup se (idType b) +contHoleType (StrictArg { sc_fun = ai }) = funArgTy (ai_type ai) +contHoleType (ApplyToTy { sc_hole_ty = ty }) = ty -- See Note [The hole type in ApplyToTy] +contHoleType (ApplyToVal { sc_arg = e, sc_env = se, sc_dup = dup, sc_cont = k }) + = mkVisFunTy (perhapsSubstTy dup se (exprType e)) + (contHoleType k) +contHoleType (Select { sc_dup = d, sc_bndr = b, sc_env = se }) + = perhapsSubstTy d se (idType b) + +------------------- +countArgs :: SimplCont -> Int +-- Count all arguments, including types, coercions, and other values +countArgs (ApplyToTy { sc_cont = cont }) = 1 + countArgs cont +countArgs (ApplyToVal { sc_cont = cont }) = 1 + countArgs cont +countArgs _ = 0 + +contArgs :: SimplCont -> (Bool, [ArgSummary], SimplCont) +-- Summarises value args, discards type args and coercions +-- The returned continuation of the call is only 
used to +-- answer questions like "are you interesting?" +contArgs cont + | lone cont = (True, [], cont) + | otherwise = go [] cont + where + lone (ApplyToTy {}) = False -- See Note [Lone variables] in GHC.Core.Unfold + lone (ApplyToVal {}) = False + lone (CastIt {}) = False + lone _ = True + + go args (ApplyToVal { sc_arg = arg, sc_env = se, sc_cont = k }) + = go (is_interesting arg se : args) k + go args (ApplyToTy { sc_cont = k }) = go args k + go args (CastIt _ k) = go args k + go args k = (False, reverse args, k) + + is_interesting arg se = interestingArg se arg + -- Do *not* use short-cutting substitution here + -- because we want to get as much IdInfo as possible + + +------------------- +mkArgInfo :: SimplEnv + -> Id + -> [CoreRule] -- Rules for function + -> Int -- Number of value args + -> SimplCont -- Context of the call + -> ArgInfo + +mkArgInfo env fun rules n_val_args call_cont + | n_val_args < idArity fun -- Note [Unsaturated functions] + = ArgInfo { ai_fun = fun, ai_args = [], ai_type = fun_ty + , ai_rules = fun_rules + , ai_encl = False + , ai_strs = vanilla_stricts + , ai_discs = vanilla_discounts } + | otherwise + = ArgInfo { ai_fun = fun, ai_args = [], ai_type = fun_ty + , ai_rules = fun_rules + , ai_encl = interestingArgContext rules call_cont + , ai_strs = arg_stricts + , ai_discs = arg_discounts } + where + fun_ty = idType fun + + fun_rules = mkFunRules rules + + vanilla_discounts, arg_discounts :: [Int] + vanilla_discounts = repeat 0 + arg_discounts = case idUnfolding fun of + CoreUnfolding {uf_guidance = UnfIfGoodArgs {ug_args = discounts}} + -> discounts ++ vanilla_discounts + _ -> vanilla_discounts + + vanilla_stricts, arg_stricts :: [Bool] + vanilla_stricts = repeat False + + arg_stricts + | not (sm_inline (seMode env)) + = vanilla_stricts -- See Note [Do not expose strictness if sm_inline=False] + | otherwise + = add_type_str fun_ty $ + case splitStrictSig (idStrictness fun) of + (demands, result_info) + | not (demands `lengthExceeds` 
n_val_args) + -> -- Enough args, use the strictness given. + -- For bottoming functions we used to pretend that the arg + -- is lazy, so that we don't treat the arg as an + -- interesting context. This avoids substituting + -- top-level bindings for (say) strings into + -- calls to error. But now we are more careful about + -- inlining lone variables, so it's ok + -- (see GHC.Core.Op.Simplify.Utils.analyseCont) + if isBotDiv result_info then + map isStrictDmd demands -- Finite => result is bottom + else + map isStrictDmd demands ++ vanilla_stricts + | otherwise + -> WARN( True, text "More demands than arity" <+> ppr fun <+> ppr (idArity fun) + <+> ppr n_val_args <+> ppr demands ) + vanilla_stricts -- Not enough args, or no strictness + + add_type_str :: Type -> [Bool] -> [Bool] + -- If the function arg types are strict, record that in the 'strictness bits' + -- No need to instantiate because unboxed types (which dominate the strict + -- types) can't instantiate type variables. + -- add_type_str is done repeatedly (for each call); + -- might be better once-for-all in the function + -- But beware primops/datacons with no strictness + + add_type_str _ [] = [] + add_type_str fun_ty all_strs@(str:strs) + | Just (arg_ty, fun_ty') <- splitFunTy_maybe fun_ty -- Add strict-type info + = (str || Just False == isLiftedType_maybe arg_ty) + : add_type_str fun_ty' strs + -- If the type is levity-polymorphic, we can't know whether it's + -- strict. isLiftedType_maybe will return Just False only when + -- we're sure the type is unlifted. + + | Just (_, fun_ty') <- splitForAllTy_maybe fun_ty + = add_type_str fun_ty' all_strs -- Look through foralls + + | otherwise + = all_strs + +{- Note [Unsaturated functions] + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Consider (test eyeball/inline4) + x = a:as + y = f x +where f has arity 2. Then we do not want to inline 'x', because +it'll just be floated out again. 
Even if f has lots of discounts +on its first argument -- it must be saturated for these to kick in + +Note [Do not expose strictness if sm_inline=False] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +#15163 showed a case in which we had + + {-# INLINE [1] zip #-} + zip = undefined + + {-# RULES "foo" forall as bs. stream (zip as bs) = ..blah... #-} + +If we expose zip's bottoming nature when simplifying the LHS of the +RULE we get + {-# RULES "foo" forall as bs. + stream (case zip of {}) = ..blah... #-} +discarding the arguments to zip. Usually this is fine, but on the +LHS of a rule it's not, because 'as' and 'bs' are now not bound on +the LHS. + +This is a pretty pathological example, so I'm not losing sleep over +it, but the simplest solution was to check sm_inline; if it is False, +which it is on the LHS of a rule (see updModeForRules), then don't +make use of the strictness info for the function. +-} + + +{- +************************************************************************ +* * + Interesting arguments +* * +************************************************************************ + +Note [Interesting call context] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We want to avoid inlining an expression where there can't possibly be +any gain, such as in an argument position. Hence, if the continuation +is interesting (eg. a case scrutinee, application etc.) then we +inline, otherwise we don't. + +Previously some_benefit used to return True only if the variable was +applied to some value arguments. This didn't work: + + let x = _coerce_ (T Int) Int (I# 3) in + case _coerce_ Int (T Int) x of + I# y -> .... + +we want to inline x, but can't see that it's a constructor in a case +scrutinee position, and some_benefit is False. + +Another example: + +dMonadST = _/\_ t -> :Monad (g1 _@_ t, g2 _@_ t, g3 _@_ t) + +.... case dMonadST _@_ x0 of (a,b,c) -> .... 
+ +we'd really like to inline dMonadST here, but we *don't* want to +inline if the case expression is just + + case x of y { DEFAULT -> ... } + +since we can just eliminate this case instead (x is in WHNF). Similar +applies when x is bound to a lambda expression. Hence +contIsInteresting looks for case expressions with just a single +default case. + +Note [No case of case is boring] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +If we see + case f x of <alts> + +we'd usually treat the context as interesting, to encourage 'f' to +inline. But if case-of-case is off, it's really not so interesting +after all, because we are unlikely to be able to push the case +expression into the branches of any case in f's unfolding. So, to +reduce unnecessary code expansion, we just make the context look boring. +This made a small compile-time perf improvement in perf/compiler/T6048, +and it looks plausible to me. +-} + +interestingCallContext :: SimplEnv -> SimplCont -> CallCtxt +-- See Note [Interesting call context] +interestingCallContext env cont + = interesting cont + where + interesting (Select {}) + | sm_case_case (getMode env) = CaseCtxt + | otherwise = BoringCtxt + -- See Note [No case of case is boring] + + interesting (ApplyToVal {}) = ValAppCtxt + -- Can happen if we have (f Int |> co) y + -- If f has an INLINE prag we need to give it some + -- motivation to inline. See Note [Cast then apply] + -- in GHC.Core.Unfold + + interesting (StrictArg { sc_cci = cci }) = cci + interesting (StrictBind {}) = BoringCtxt + interesting (Stop _ cci) = cci + interesting (TickIt _ k) = interesting k + interesting (ApplyToTy { sc_cont = k }) = interesting k + interesting (CastIt _ k) = interesting k + -- If this call is the arg of a strict function, the context + -- is a bit interesting. If we inline here, we may get useful + -- evaluation information to avoid repeated evals: e.g. 
+ -- x + (y * z) + -- Here the contIsInteresting makes the '*' keener to inline, + -- which in turn exposes a constructor which makes the '+' inline. + -- Assuming that +,* aren't small enough to inline regardless. + -- + -- It's also very important to inline in a strict context for things + -- like + -- foldr k z (f x) + -- Here, the context of (f x) is strict, and if f's unfolding is + -- a build it's *great* to inline it here. So we must ensure that + -- the context for (f x) is not totally uninteresting. + +interestingArgContext :: [CoreRule] -> SimplCont -> Bool +-- If the argument has form (f x y), where x,y are boring, +-- and f is marked INLINE, then we don't want to inline f. +-- But if the context of the argument is +-- g (f x y) +-- where g has rules, then we *do* want to inline f, in case it +-- exposes a rule that might fire. Similarly, if the context is +-- h (g (f x x)) +-- where h has rules, then we do want to inline f; hence the +-- call_cont argument to interestingArgContext +-- +-- The ai-rules flag makes this happen; if it's +-- set, the inliner gets just enough keener to inline f +-- regardless of how boring f's arguments are, if it's marked INLINE +-- +-- The alternative would be to *always* inline an INLINE function, +-- regardless of how boring its context is; but that seems overkill +-- For example, it'd mean that wrapper functions were always inlined +-- +-- The call_cont passed to interestingArgContext is the context of +-- the call itself, e.g. g <hole> in the example above +interestingArgContext rules call_cont + = notNull rules || enclosing_fn_has_rules + where + enclosing_fn_has_rules = go call_cont + + go (Select {}) = False + go (ApplyToVal {}) = False -- Shouldn't really happen + go (ApplyToTy {}) = False -- Ditto + go (StrictArg { sc_cci = cci }) = interesting cci + go (StrictBind {}) = False -- ?? 
+ go (CastIt _ c) = go c + go (Stop _ cci) = interesting cci + go (TickIt _ c) = go c + + interesting RuleArgCtxt = True + interesting _ = False + + +{- Note [Interesting arguments] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +An argument is interesting if it deserves a discount for unfoldings +with a discount in that argument position. The idea is to avoid +unfolding a function that is applied only to variables that have no +unfolding (i.e. they are probably lambda bound): f x y z. There is +little point in inlining f here. + +Generally, *values* (like (C a b) and (\x.e)) deserve discounts. But +we must look through lets, eg (let x = e in C a b), because the let will +float, exposing the value, if we inline. That makes it different to +exprIsHNF. + +Before 2009 we said it was interesting if the argument had *any* structure +at all; i.e. (hasSomeUnfolding v). But that did too much inlining; see #3016. + +But we don't regard (f x y) as interesting, unless f is unsaturated. +If it's saturated and f hasn't inlined, then it's probably not going +to now! + +Note [Conlike is interesting] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Consider + f d = ...((*) d x y)... + ... f (df d')... +where df is con-like. Then we'd really like to inline 'f' so that the +rule for (*) (df d) can fire. 
To do this + a) we give a discount for being an argument of a class-op (eg (*) d) + b) we say that a con-like argument (eg (df d)) is interesting +-} + +interestingArg :: SimplEnv -> CoreExpr -> ArgSummary +-- See Note [Interesting arguments] +interestingArg env e = go env 0 e + where + -- n is # value args to which the expression is applied + go env n (Var v) + = case substId env v of + DoneId v' -> go_var n v' + DoneEx e _ -> go (zapSubstEnv env) n e + ContEx tvs cvs ids e -> go (setSubstEnv env tvs cvs ids) n e + + go _ _ (Lit {}) = ValueArg + go _ _ (Type _) = TrivArg + go _ _ (Coercion _) = TrivArg + go env n (App fn (Type _)) = go env n fn + go env n (App fn _) = go env (n+1) fn + go env n (Tick _ a) = go env n a + go env n (Cast e _) = go env n e + go env n (Lam v e) + | isTyVar v = go env n e + | n>0 = NonTrivArg -- (\x.b) e is NonTriv + | otherwise = ValueArg + go _ _ (Case {}) = NonTrivArg + go env n (Let b e) = case go env' n e of + ValueArg -> ValueArg + _ -> NonTrivArg + where + env' = env `addNewInScopeIds` bindersOf b + + go_var n v + | isConLikeId v = ValueArg -- Experimenting with 'conlike' rather than + -- data constructors here + | idArity v > n = ValueArg -- Catches (eg) primops with arity but no unfolding + | n > 0 = NonTrivArg -- Saturated or unknown call + | conlike_unfolding = ValueArg -- n==0; look for an interesting unfolding + -- See Note [Conlike is interesting] + | otherwise = TrivArg -- n==0, no useful unfolding + where + conlike_unfolding = isConLikeUnfolding (idUnfolding v) + +{- +************************************************************************ +* * + SimplMode +* * +************************************************************************ + +The SimplMode controls several switches; see its definition in +GHC.Core.Op.Monad + sm_rules :: Bool -- Whether RULES are enabled + sm_inline :: Bool -- Whether inlining is enabled + sm_case_case :: Bool -- Whether case-of-case is enabled + sm_eta_expand :: Bool -- Whether eta-expansion 
is enabled +-} + +simplEnvForGHCi :: DynFlags -> SimplEnv +simplEnvForGHCi dflags + = mkSimplEnv $ SimplMode { sm_names = ["GHCi"] + , sm_phase = InitialPhase + , sm_dflags = dflags + , sm_rules = rules_on + , sm_inline = False + , sm_eta_expand = eta_expand_on + , sm_case_case = True } + where + rules_on = gopt Opt_EnableRewriteRules dflags + eta_expand_on = gopt Opt_DoLambdaEtaExpansion dflags + -- Do not do any inlining, in case we expose some unboxed + -- tuple stuff that confuses the bytecode interpreter + +updModeForStableUnfoldings :: Activation -> SimplMode -> SimplMode +-- See Note [Simplifying inside stable unfoldings] +updModeForStableUnfoldings inline_rule_act current_mode + = current_mode { sm_phase = phaseFromActivation inline_rule_act + , sm_inline = True + , sm_eta_expand = False } + -- sm_eta_expand: see Note [No eta expansion in stable unfoldings] + -- For sm_rules, just inherit; sm_rules might be "off" + -- because of -fno-enable-rewrite-rules + where + phaseFromActivation (ActiveAfter _ n) = Phase n + phaseFromActivation _ = InitialPhase + +updModeForRules :: SimplMode -> SimplMode +-- See Note [Simplifying rules] +updModeForRules current_mode + = current_mode { sm_phase = InitialPhase + , sm_inline = False -- See Note [Do not expose strictness if sm_inline=False] + , sm_rules = False + , sm_eta_expand = False } + +{- Note [Simplifying rules] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +When simplifying a rule LHS, refrain from /any/ inlining or applying +of other RULES. + +Doing anything to the LHS is plain confusing, because it means that what the +rule matches is not what the user wrote. c.f. #10595, and #10528. +Moreover, inlining (or applying rules) on rule LHSs risks introducing +Ticks into the LHS, which makes matching trickier. #10665, #10745. + +Doing this to either side confounds tools like HERMIT, which seek to reason +about and apply the RULES as originally written. See #10829. 
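The wholesale switch-flipping done by updModeForRules above can be sketched with a toy record (hypothetical names, not GHC's actual SimplMode, which carries more fields): simplifying a RULE LHS turns off inlining, rule rewriting, and eta-expansion in a single record update, leaving everything else inherited from the current mode.

```haskell
-- Toy sketch of the record-update idiom behind updModeForRules.
-- The field and function names here are invented for illustration.
data Mode = Mode { inlineOn, rulesOn, etaOn :: Bool }
  deriving (Eq, Show)

-- For a RULE LHS, disable every transformation that could make the
-- LHS differ from what the user wrote (cf. Note [Simplifying rules]).
forRuleLhs :: Mode -> Mode
forRuleLhs m = m { inlineOn = False, rulesOn = False, etaOn = False }

main :: IO ()
main = print (forRuleLhs (Mode True True True))
```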
+ +Note [No eta expansion in stable unfoldings] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +If we have a stable unfolding + + f :: Ord a => a -> IO () + -- Unfolding template + -- = /\a \(d:Ord a) (x:a). bla + +we do not want to eta-expand to + + f :: Ord a => a -> IO () + -- Unfolding template + -- = (/\a \(d:Ord a) (x:a) (eta:State#). bla eta) |> co + +because specialisation of the overloading doesn't work properly +(see Note [Specialisation shape] in GHC.Core.Op.Specialise), #9509. + +So we disable eta-expansion in stable unfoldings. + +Note [Inlining in gentle mode] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Something is inlined if + (i) the sm_inline flag is on, AND + (ii) the thing has an INLINE pragma, AND + (iii) the thing is inlinable in the earliest phase. + +Example of why (iii) is important: + {-# INLINE [~1] g #-} + g = ... + + {-# INLINE f #-} + f x = g (g x) + +If we were to inline g into f's inlining, then an importing module would +never be able to do + f e --> g (g e) ---> RULE fires +because the stable unfolding for f has had g inlined into it. + +On the other hand, it is bad not to do ANY inlining into a +stable unfolding, because then recursive knots in instance declarations +don't get unravelled. + +However, *sometimes* SimplGently must do no call-site inlining at all +(hence sm_inline = False). Before full laziness we must be careful +not to inline wrappers, because doing so inhibits floating + e.g. ...(case f x of ...)... + ==> ...(case (case x of I# x# -> fw x#) of ...)... + ==> ...(case x of I# x# -> case fw x# of ...)... +and now the redex (f x) isn't floatable any more. + +The no-inlining thing is also important for Template Haskell. You might be +compiling in one-shot mode with -O2; but when TH compiles a splice before +running it, we don't want to use -O2. Indeed, we don't want to inline +anything, because the byte-code interpreter might get confused about +unboxed tuples and suchlike. 
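The phase-activation test behind condition (iii) above can be modelled with a small sketch (invented names; GHC's real machinery lives in isActive and the Activation type). Phases count down towards 0, an activation "ActiveAfter n" fires in phase n and every later phase, and nothing pragma-controlled fires in the InitialPhase used by gentle simplification.

```haskell
-- Hedged toy model of INLINE-pragma phase activation.
data Phase = InitialPhase | Phase Int deriving (Eq, Show)

-- An activation [n] (i.e. ActiveAfter n) fires in phases n, n-1, ..., 0,
-- but never in the InitialPhase.
activeAfter :: Int -> Phase -> Bool
activeAfter _ InitialPhase = False
activeAfter n (Phase p)    = p <= n

main :: IO ()
main = print (map (activeAfter 1) [InitialPhase, Phase 2, Phase 1, Phase 0])
```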
+ +Note [Simplifying inside stable unfoldings] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We must take care with simplification inside stable unfoldings (which come from +INLINE pragmas). + +First, consider the following example + let f = \pq -> BIG + in + let g = \y -> f y y + {-# INLINE g #-} + in ...g...g...g...g...g... +Now, if that's the ONLY occurrence of f, it might be inlined inside g, +and thence copied multiple times when g is inlined. HENCE we treat +any occurrence in a stable unfolding as a multiple occurrence, not a single +one; see OccurAnal.addRuleUsage. + +Second, we *do* want to do some modest rules/inlining stuff in stable +unfoldings, partly to eliminate senseless crap, and partly to break +the recursive knots generated by instance declarations. + +However, suppose we have + {-# INLINE <act> f #-} + f = <rhs> +meaning "inline f in phases p where activation <act>(p) holds". +Then what inlinings/rules can we apply to the copy of <rhs> captured in +f's stable unfolding? Our model is that literally <rhs> is substituted for +f when it is inlined. So our conservative plan (implemented by +updModeForStableUnfoldings) is this: + + ------------------------------------------------------------- + When simplifying the RHS of a stable unfolding, set the phase + to the phase in which the stable unfolding first becomes active + ------------------------------------------------------------- + +That ensures that + + a) Rules/inlinings that *cease* being active before p will + not apply to the stable unfolding, consistent with it being + inlined in its *original* form in phase p. + + b) Rules/inlinings that only become active *after* p will + not apply to the stable unfolding, again to be consistent with + inlining the *original* rhs in phase p. + +For example, + {-# INLINE f #-} + f x = ...g... + + {-# NOINLINE [1] g #-} + g y = ... + + {-# RULE h g = ... 
#-} +Here we must not inline g into f's RHS, even when we get to phase 0, +because when f is later inlined into some other module we want the +rule for h to fire. + +Similarly, consider + {-# INLINE f #-} + f x = ...g... + + g y = ... +and suppose that there are auto-generated specialisations and a strictness +wrapper for g. The specialisations get activation AlwaysActive, and the +strictness wrapper gets activation (ActiveAfter 0). So the strictness +wrapper fails the test and won't be inlined into f's stable unfolding. That +means f can inline, expose the specialised call to g, so the specialisation +rules can fire. + +A note about wrappers +~~~~~~~~~~~~~~~~~~~~~ +It's also important not to inline a worker back into a wrapper. +A wrapper looks like + wrapper = inline_me (\x -> ...worker... ) +Normally, the inline_me prevents the worker getting inlined into +the wrapper (initially, the worker's only call site!). But, +if the wrapper is sure to be called, the strictness analyser will +mark it 'demanded', so when the RHS is simplified, it'll get an ArgOf +continuation. +-} + +activeUnfolding :: SimplMode -> Id -> Bool +activeUnfolding mode id + | isCompulsoryUnfolding (realIdUnfolding id) + = True -- Even sm_inline can't override compulsory unfoldings + | otherwise + = isActive (sm_phase mode) (idInlineActivation id) + && sm_inline mode + -- `or` isStableUnfolding (realIdUnfolding id) + -- Inline things when + -- (a) they are active + -- (b) sm_inline says so, except that for stable unfoldings + -- (ie pragmas) we inline anyway + +getUnfoldingInRuleMatch :: SimplEnv -> InScopeEnv +-- When matching in RULE, we want to "look through" an unfolding +-- (to see a constructor) if *rules* are on, even if *inlinings* +-- are not. A notable example is DFuns, which really we want to +-- match in rules like (op dfun) in gentle mode. 
Another example +-- is 'otherwise' which we want exprIsConApp_maybe to be able to +-- see very early on +getUnfoldingInRuleMatch env + = (in_scope, id_unf) + where + in_scope = seInScope env + mode = getMode env + id_unf id | unf_is_active id = idUnfolding id + | otherwise = NoUnfolding + unf_is_active id + | not (sm_rules mode) = -- active_unfolding_minimal id + isStableUnfolding (realIdUnfolding id) + -- Do we even need to test this? I think this InScopeEnv + -- is only consulted if activeRule returns True, which + -- never happens if sm_rules is False + | otherwise = isActive (sm_phase mode) (idInlineActivation id) + +---------------------- +activeRule :: SimplMode -> Activation -> Bool +-- Nothing => No rules at all +activeRule mode + | not (sm_rules mode) = \_ -> False -- Rewriting is off + | otherwise = isActive (sm_phase mode) + +{- +************************************************************************ +* * + preInlineUnconditionally +* * +************************************************************************ + +preInlineUnconditionally +~~~~~~~~~~~~~~~~~~~~~~~~ +@preInlineUnconditionally@ examines a bndr to see if it is used just +once in a completely safe way, so that it is safe to discard the +binding and inline its RHS at the (unique) usage site, REGARDLESS of how +big the RHS might be. If this is the case we don't simplify the RHS +first, but just inline it un-simplified. + +This is much better than first simplifying a perhaps-huge RHS and then +inlining and re-simplifying it. Indeed, it can be at least quadratically +better. Consider + + x1 = e1 + x2 = e2[x1] + x3 = e3[x2] + ...etc... + xN = eN[xN-1] + +We may end up simplifying e1 N times, e2 N-1 times, e3 N-2 times etc. +This can happen with cascades of functions too: + + f1 = \x1.e1 + f2 = \xs.e2[f1] + f3 = \xs.e3[f2] + ...etc... 
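The quadratic cost described above is easy to make concrete (toy arithmetic, not GHC code): in a chain x1..xN where each binding's RHS is simplified, inlined into the next binding, and then re-simplified there, binding i gets processed roughly (N - i + 1) times, so total work grows with the square of the chain length.

```haskell
-- Count the total simplifier passes over a chain of N cascading
-- bindings when each is re-simplified at every later inlining.
reSimplifications :: Int -> Int
reSimplifications n = sum [n - i + 1 | i <- [1 .. n]]

main :: IO ()
main = mapM_ (print . reSimplifications) [1, 2, 4, 8]
-- Doubling the chain length roughly quadruples the work.
```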
+ +THE MAIN INVARIANT is this: + + ---- preInlineUnconditionally invariant ----- + IF preInlineUnconditionally chooses to inline x = <rhs> + THEN doing the inlining should not change the occurrence + info for the free vars of <rhs> + ---------------------------------------------- + +For example, it's tempting to look at trivial binding like + x = y +and inline it unconditionally. But suppose x is used many times, +but this is the unique occurrence of y. Then inlining x would change +y's occurrence info, which breaks the invariant. It matters: y +might have a BIG rhs, which will now be dup'd at every occurrence of x. + + +Even RHSs labelled InlineMe aren't caught here, because there might be +no benefit from inlining at the call site. + +[Sept 01] Don't unconditionally inline a top-level thing, because that +can simply make a static thing into something built dynamically. E.g. + x = (a,b) + main = \s -> h x + +[Remember that we treat \s as a one-shot lambda.] No point in +inlining x unless there is something interesting about the call site. + +But watch out: if you aren't careful, some useful foldr/build fusion +can be lost (most notably in spectral/hartel/parstof) because the +foldr didn't see the build. Doing the dynamic allocation isn't a big +deal, in fact, but losing the fusion can be. But the right thing here +seems to be to do a callSiteInline based on the fact that there is +something interesting about the call site (it's strict). Hmm. That +seems a bit fragile. + +Conclusion: inline top level things gaily until Phase 0 (the last +phase), at which point don't. + +Note [pre/postInlineUnconditionally in gentle mode] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Even in gentle mode we want to do preInlineUnconditionally. The +reason is that too little clean-up happens if you don't inline +use-once things. Also a bit of inlining is *good* for full laziness; +it can expose constant sub-expressions. 
Example in +spectral/mandel/Mandel.hs, where the mandelset function gets a useful +let-float if you inline windowToViewport + +However, as usual for Gentle mode, do not inline things that are +inactive in the initial stages. See Note [Gentle mode]. + +Note [Stable unfoldings and preInlineUnconditionally] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Surprisingly, do not pre-inline-unconditionally Ids with INLINE pragmas! +Example + + {-# INLINE f #-} + f :: Eq a => a -> a + f x = ... + + fInt :: Int -> Int + fInt = f Int dEqInt + + ...fInt...fInt...fInt... + +Here f occurs just once, in the RHS of fInt. But if we inline it there +it might make fInt look big, and we'll lose the opportunity to inline f +at each of fInt's call sites. The INLINE pragma will only inline when +the application is saturated for exactly this reason; and we don't +want PreInlineUnconditionally to second-guess it. A live example is +#3736. + c.f. Note [Stable unfoldings and postInlineUnconditionally] + +NB: if the pragma is INLINEABLE, then we don't want to behave in +this special way -- an INLINEABLE pragma just says to GHC "inline this +if you like". But if there is a unique occurrence, we want to inline +the stable unfolding, not the RHS. + +Note [Top-level bottoming Ids] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Don't inline top-level Ids that are bottoming, even if they are used just +once, because FloatOut has gone to some trouble to extract them out. +Inlining them won't make the program run faster! + +Note [Do not inline CoVars unconditionally] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Coercion variables appear inside coercions, and the RHS of a let-binding +is a term (not a coercion) so we can't necessarily inline the latter in +the former. 
+-} + +preInlineUnconditionally + :: SimplEnv -> TopLevelFlag -> InId + -> InExpr -> StaticEnv -- These two go together + -> Maybe SimplEnv -- Returned env has extended substitution +-- Precondition: rhs satisfies the let/app invariant +-- See Note [Core let/app invariant] in GHC.Core +-- Reason: we don't want to inline single uses, or discard dead bindings, +-- for unlifted, side-effect-ful bindings +preInlineUnconditionally env top_lvl bndr rhs rhs_env + | not pre_inline_unconditionally = Nothing + | not active = Nothing + | isTopLevel top_lvl && isBottomingId bndr = Nothing -- Note [Top-level bottoming Ids] + | isCoVar bndr = Nothing -- Note [Do not inline CoVars unconditionally] + | isExitJoinId bndr = Nothing -- Note [Do not inline exit join points] + -- in module Exitify + | not (one_occ (idOccInfo bndr)) = Nothing + | not (isStableUnfolding unf) = Just (extend_subst_with rhs) + + -- Note [Stable unfoldings and preInlineUnconditionally] + | isInlinablePragma inline_prag + , Just inl <- maybeUnfoldingTemplate unf = Just (extend_subst_with inl) + | otherwise = Nothing + where + unf = idUnfolding bndr + extend_subst_with inl_rhs = extendIdSubst env bndr (mkContEx rhs_env inl_rhs) + + one_occ IAmDead = True -- Happens in ((\x.1) v) + one_occ OneOcc{ occ_one_br = InOneBranch + , occ_in_lam = NotInsideLam } = isNotTopLevel top_lvl || early_phase + one_occ OneOcc{ occ_one_br = InOneBranch + , occ_in_lam = IsInsideLam + , occ_int_cxt = IsInteresting } = canInlineInLam rhs + one_occ _ = False + + pre_inline_unconditionally = gopt Opt_SimplPreInlining (seDynFlags env) + mode = getMode env + active = isActive (sm_phase mode) (inlinePragmaActivation inline_prag) + -- See Note [pre/postInlineUnconditionally in gentle mode] + inline_prag = idInlinePragma bndr + +-- Be very careful before inlining inside a lambda, because (a) we must not +-- invalidate occurrence information, and (b) we want to avoid pushing a +-- single allocation (here) into multiple allocations (inside 
lambda). +-- Inlining a *function* with a single *saturated* call would be ok, mind you. +-- || (if is_cheap && not (canInlineInLam rhs) then pprTrace "preinline" (ppr bndr <+> ppr rhs) ok else ok) +-- where +-- is_cheap = exprIsCheap rhs +-- ok = is_cheap && int_cxt + + -- int_cxt The context isn't totally boring + -- E.g. let f = \ab.BIG in \y. map f xs + -- Don't want to substitute for f, because then we allocate + -- its closure every time the \y is called + -- But: let f = \ab.BIG in \y. map (f y) xs + -- Now we do want to substitute for f, even though it's not + -- saturated, because we're going to allocate a closure for + -- (f y) every time round the loop anyhow. + + -- canInlineInLam => free vars of rhs are (Once in_lam) or Many, + -- so substituting rhs inside a lambda doesn't change the occ info. + -- Sadly, not quite the same as exprIsHNF. + canInlineInLam (Lit _) = True + canInlineInLam (Lam b e) = isRuntimeVar b || canInlineInLam e + canInlineInLam (Tick t e) = not (tickishIsCode t) && canInlineInLam e + canInlineInLam _ = False + -- not ticks. Counting ticks cannot be duplicated, and non-counting + -- ticks around a Lam will disappear anyway. + + early_phase = case sm_phase mode of + Phase 0 -> False + _ -> True +-- If we don't have this early_phase test, consider +-- x = length [1,2,3] +-- The full laziness pass carefully floats all the cons cells to +-- top level, and preInlineUnconditionally floats them all back in. +-- Result is (a) static allocation replaced by dynamic allocation +-- (b) many simplifier iterations because this tickles +-- a related problem; only one inlining per pass +-- +-- On the other hand, I have seen cases where top-level fusion is +-- lost if we don't inline top level thing (e.g. string constants) +-- Hence the test for phase zero (which is the phase for all the final +-- simplifications). Until phase zero we take no special notice of +-- top level things, but then we become more leery about inlining +-- them. 
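The canInlineInLam test above can be sketched over a toy expression type (much simplified: the real version also looks through type lambdas and non-code ticks). Literals and value lambdas are safe to substitute under a lambda, since duplicating them does not duplicate allocation per call; applications and other expressions are not.

```haskell
-- Toy expression type and a simplified canInlineInLam.
data Expr = Lit Int | Lam Expr | App Expr Expr

canInlineInLam :: Expr -> Bool
canInlineInLam (Lit _) = True   -- literals allocate nothing
canInlineInLam (Lam _) = True   -- a value lambda is a single closure
canInlineInLam _       = False  -- anything else risks repeated work

main :: IO ()
main = print (map canInlineInLam
                  [Lit 1, Lam (Lit 2), App (Lam (Lit 3)) (Lit 4)])
```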
+ +{- +************************************************************************ +* * + postInlineUnconditionally +* * +************************************************************************ + +postInlineUnconditionally +~~~~~~~~~~~~~~~~~~~~~~~~~ +@postInlineUnconditionally@ decides whether to unconditionally inline +a thing based on the form of its RHS; in particular if it has a +trivial RHS. If so, we can inline and discard the binding altogether. + +NB: a loop breaker has must_keep_binding = True and non-loop-breakers +only have *forward* references. Hence, it's safe to discard the binding + +NOTE: This isn't our last opportunity to inline. We're at the binding +site right now, and we'll get another opportunity when we get to the +occurrence(s) + +Note that we do this unconditional inlining only for trivial RHSs. +Don't inline even WHNFs inside lambdas; doing so may simply increase +allocation when the function is called. This isn't the last chance; see +NOTE above. + +NB: Even inline pragmas (e.g. IMustBeINLINEd) are ignored here Why? +Because we don't even want to inline them into the RHS of constructor +arguments. See NOTE above + +NB: At one time even NOINLINE was ignored here: if the rhs is trivial +it's best to inline it anyway. We often get a=E; b=a from desugaring, +with both a and b marked NOINLINE. But that seems incompatible with +our new view that inlining is like a RULE, so I'm sticking to the 'active' +story for now. 
+-} + +postInlineUnconditionally + :: SimplEnv -> TopLevelFlag + -> OutId -- The binder (*not* a CoVar), including its unfolding + -> OccInfo -- From the InId + -> OutExpr + -> Bool +-- Precondition: rhs satisfies the let/app invariant +-- See Note [Core let/app invariant] in GHC.Core +-- Reason: we don't want to inline single uses, or discard dead bindings, +-- for unlifted, side-effect-ful bindings +postInlineUnconditionally env top_lvl bndr occ_info rhs + | not active = False + | isWeakLoopBreaker occ_info = False -- If it's a loop-breaker of any kind, don't inline + -- because it might be referred to "earlier" + | isStableUnfolding unfolding = False -- Note [Stable unfoldings and postInlineUnconditionally] + | isTopLevel top_lvl = False -- Note [Top level and postInlineUnconditionally] + | exprIsTrivial rhs = True + | otherwise + = case occ_info of + -- The point of examining occ_info here is that for *non-values* + -- that occur outside a lambda, the call-site inliner won't have + -- a chance (because it doesn't know that the thing + -- only occurs once). The pre-inliner won't have gotten + -- it either, if the thing occurs in more than one branch + -- So the main target is things like + -- let x = f y in + -- case v of + -- True -> case x of ... + -- False -> case x of ... + -- This is very important in practice; e.g. wheel-seive1 doubles + -- in allocation if you miss this out + OneOcc { occ_in_lam = in_lam, occ_int_cxt = int_cxt } + -- OneOcc => no code-duplication issue + -> smallEnoughToInline dflags unfolding -- Small enough to dup + -- ToDo: consider discount on smallEnoughToInline if int_cxt is true + -- + -- NB: Do NOT inline arbitrarily big things, even if one_br is True + -- Reason: doing so risks exponential behaviour. We simplify a big + -- expression, inline it, and simplify it again. But if the + -- very same thing happens in the big expression, we get + -- exponential cost! 
+ -- PRINCIPLE: when we've already simplified an expression once, + -- make sure that we only inline it if it's reasonably small. + + && (in_lam == NotInsideLam || + -- Outside a lambda, we want to be reasonably aggressive + -- about inlining into multiple branches of case + -- e.g. let x = <non-value> + -- in case y of { C1 -> ..x..; C2 -> ..x..; C3 -> ... } + -- Inlining can be a big win if C3 is the hot-spot, even if + -- the uses in C1, C2 are not 'interesting' + -- An example that gets worse if you add int_cxt here is 'clausify' + + (isCheapUnfolding unfolding && int_cxt == IsInteresting)) + -- isCheap => acceptable work duplication; in_lam may be true + -- int_cxt to prevent us inlining inside a lambda without some + -- good reason. See the notes on int_cxt in preInlineUnconditionally + + IAmDead -> True -- This happens; for example, the case_bndr during case of + -- known constructor: case (a,b) of x { (p,q) -> ... } + -- Here x isn't mentioned in the RHS, so we don't want to + -- create the (dead) let-binding let x = (a,b) in ... + + _ -> False + +-- Here's an example that we don't handle well: +-- let f = if b then Left (\x.BIG) else Right (\y.BIG) +-- in \y. ....case f of {...} .... +-- Here f is used just once, and duplicating the case work is fine (exprIsCheap). +-- But +-- - We can't preInlineUnconditionally because that would invalidate +-- the occ info for b. +-- - We can't postInlineUnconditionally because the RHS is big, and +-- that risks exponential behaviour +-- - We can't call-site inline, because the rhs is big +-- Alas! 
+ + where + unfolding = idUnfolding bndr + dflags = seDynFlags env + active = isActive (sm_phase (getMode env)) (idInlineActivation bndr) + -- See Note [pre/postInlineUnconditionally in gentle mode] + +{- +Note [Top level and postInlineUnconditionally] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We don't do postInlineUnconditionally for top-level things (even for +ones that are trivial): + + * Doing so will inline top-level error expressions that have been + carefully floated out by FloatOut. More generally, it might + replace static allocation with dynamic. + + * Even for trivial expressions there's a problem. Consider + {-# RULE "foo" forall (xs::[T]). reverse xs = ruggle xs #-} + blah xs = reverse xs + ruggle = sort + In one simplifier pass we might fire the rule, getting + blah xs = ruggle xs + but in *that* simplifier pass we must not do postInlineUnconditionally + on 'ruggle' because then we'll have an unbound occurrence of 'ruggle' + + If the rhs is trivial it'll be inlined by callSiteInline, and then + the binding will be dead and discarded by the next use of OccurAnal + + * There is less point, because the main goal is to get rid of local + bindings used in multiple case branches. + + * The inliner should inline trivial things at call sites anyway. + + * The Id might be exported. We could check for that separately, + but since we aren't going to postInlineUnconditionally /any/ + top-level bindings, we don't need to test. + +Note [Stable unfoldings and postInlineUnconditionally] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Do not do postInlineUnconditionally if the Id has a stable unfolding, +otherwise we lose the unfolding. Example + + -- f has stable unfolding with rhs (e |> co) + -- where 'e' is big + f = e |> co + +Then there's a danger we'll optimise to + + f' = e + f = f' |> co + +and now postInlineUnconditionally, losing the stable unfolding on f. Now f' +won't inline because 'e' is too big. + + c.f. 
Note [Stable unfoldings and preInlineUnconditionally]
+
+
+************************************************************************
+*                                                                      *
+                Rebuilding a lambda
+*                                                                      *
+************************************************************************
+-}
+
+mkLam :: SimplEnv -> [OutBndr] -> OutExpr -> SimplCont -> SimplM OutExpr
+-- mkLam tries three things
+--      a) eta reduction, if that gives a trivial expression
+--      b) eta expansion [only if there are some value lambdas]
+--      c) floating casts out of the lambda body; see Note [Casts and lambdas]
+
+mkLam _env [] body _cont
+  = return body
+mkLam env bndrs body cont
+  = do { dflags <- getDynFlags
+       ; mkLam' dflags bndrs body }
+  where
+    mkLam' :: DynFlags -> [OutBndr] -> OutExpr -> SimplM OutExpr
+    mkLam' dflags bndrs (Cast body co)
+      | not (any bad bndrs)
+        -- Note [Casts and lambdas]
+      = do { lam <- mkLam' dflags bndrs body
+           ; return (mkCast lam (mkPiCos Representational bndrs co)) }
+      where
+        co_vars  = tyCoVarsOfCo co
+        bad bndr = isCoVar bndr && bndr `elemVarSet` co_vars
+
+    mkLam' dflags bndrs body@(Lam {})
+      = mkLam' dflags (bndrs ++ bndrs1) body1
+      where
+        (bndrs1, body1) = collectBinders body
+
+    mkLam' dflags bndrs (Tick t expr)
+      | tickishFloatable t
+      = mkTick t <$> mkLam' dflags bndrs expr
+
+    mkLam' dflags bndrs body
+      | gopt Opt_DoEtaReduction dflags
+      , Just etad_lam <- tryEtaReduce bndrs body
+      = do { tick (EtaReduction (head bndrs))
+           ; return etad_lam }
+
+      | not (contIsRhs cont)   -- See Note [Eta expanding lambdas]
+      , sm_eta_expand (getMode env)
+      , any isRuntimeVar bndrs
+      , let body_arity = exprEtaExpandArity dflags body
+      , body_arity > 0
+      = do { tick (EtaExpansion (head bndrs))
+           ; let res = mkLams bndrs (etaExpand body_arity body)
+           ; traceSmpl "eta expand" (vcat [text "before" <+> ppr (mkLams bndrs body)
+                                          , text "after"  <+> ppr res])
+           ; return res }
+
+      | otherwise
+      = return (mkLams bndrs body)
+
+{-
+Note [Eta expanding lambdas]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+In general we *do* want to eta-expand lambdas.
Consider
+   f (\x -> case x of (a,b) -> \s -> blah)
+where 's' is a state token, and hence can be eta expanded.  This
+showed up in the code for GHC.IO.Handle.Text.hPutChar, a rather
+important function!
+
+The eta-expansion will never happen unless we do it now.  (Well, it's
+possible that CorePrep will do it, but CorePrep only has a half-baked
+eta-expander that can't deal with casts.  So it's much better to do it
+here.)
+
+However, when the lambda is let-bound, as the RHS of a let, we have a
+better eta-expander (in the form of tryEtaExpandRhs), so we don't
+bother to try expansion in mkLam in that case; hence the contIsRhs
+guard.
+
+NB: We check the SimplEnv (sm_eta_expand), not DynFlags.
+    See Note [No eta expansion in stable unfoldings]
+
+Note [Casts and lambdas]
+~~~~~~~~~~~~~~~~~~~~~~~~
+Consider
+        (\x. (\y. e) `cast` g1) `cast` g2
+There is a danger here that the two lambdas look separated, and the
+full laziness pass might float an expression to between the two.
+
+So this equation in mkLam' floats the g1 out, thus:
+        (\x. e `cast` g1)  -->  (\x.e) `cast` (tx -> g1)
+where x:tx.
+
+In general, this floats casts outside lambdas, where (I hope) they
+might meet and cancel with some other cast:
+        \x. e `cast` co   ===>   (\x. e) `cast` (tx -> co)
+        /\a. e `cast` co  ===>   (/\a. e) `cast` (/\a. co)
+        /\g. e `cast` co  ===>   (/\g. e) `cast` (/\g. co)
+                                 (if not (g `in` co))
+
+Notice that it works regardless of 'e'.  Originally it worked only
+if 'e' was itself a lambda, but in some cases that resulted in
+fruitless iteration in the simplifier.  A good example was when
+compiling Text.ParserCombinators.ReadPrec, where we had a definition
+like    (\x. Get `cast` g)
+where Get is a constructor with nonzero arity.  Then mkLam eta-expanded
+the Get, and the next iteration eta-reduced it, and then eta-expanded
+it again.
+
+Note also the side condition for the case of coercion binders.
+It does not make sense to transform
+        /\g. e `cast` g  ==>  (/\g.e) `cast` (/\g.g)
+because the latter is not well-kinded.
+
+************************************************************************
+*                                                                      *
+              Eta expansion
+*                                                                      *
+************************************************************************
+-}
+
+tryEtaExpandRhs :: SimplMode -> OutId -> OutExpr
+                -> SimplM (Arity, Bool, OutExpr)
+-- See Note [Eta-expanding at let bindings]
+-- If tryEtaExpandRhs rhs = (n, is_bot, rhs') then
+--   (a) rhs' has manifest arity n
+--   (b) if is_bot is True then rhs' applied to n args is guaranteed bottom
+tryEtaExpandRhs mode bndr rhs
+  | Just join_arity <- isJoinId_maybe bndr
+  = do { let (join_bndrs, join_body) = collectNBinders join_arity rhs
+       ; return (count isId join_bndrs, exprIsBottom join_body, rhs) }
+         -- Note [Do not eta-expand join points]
+         -- But do return the correct arity and bottom-ness, because
+         -- these are used to set the bndr's IdInfo (#15517)
+         -- Note [Invariants on join points] invariant 2b, in GHC.Core
+
+  | otherwise
+  = do { (new_arity, is_bot, new_rhs) <- try_expand
+
+       ; WARN( new_arity < old_id_arity,
+               (text "Arity decrease:" <+> (ppr bndr <+> ppr old_id_arity
+                <+> ppr old_arity <+> ppr new_arity) $$ ppr new_rhs) )
+         -- Note [Arity decrease] in GHC.Core.Op.Simplify
+         return (new_arity, is_bot, new_rhs) }
+  where
+    try_expand
+      | exprIsTrivial rhs
+      = return (exprArity rhs, False, rhs)
+
+      | sm_eta_expand mode      -- Provided eta-expansion is on
+      , new_arity > old_arity   -- And the current manifest arity isn't enough
+      = do { tick (EtaExpansion bndr)
+           ; return (new_arity, is_bot, etaExpand new_arity rhs) }
+
+      | otherwise
+      = return (old_arity, is_bot && new_arity == old_arity, rhs)
+
+    dflags       = sm_dflags mode
+    old_arity    = exprArity rhs -- See Note [Do not eta-expand PAPs]
+    old_id_arity = idArity bndr
+
+    (new_arity1, is_bot) = findRhsArity dflags bndr rhs old_arity
+    new_arity2           = idCallArity bndr
+    new_arity            = max new_arity1 new_arity2
+
+{-
+Note [Eta-expanding at let bindings]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We now eta expand at let-bindings, which is where the payoff comes. +The most significant thing is that we can do a simple arity analysis +(in GHC.Core.Arity.findRhsArity), which we can't do for free-floating lambdas + +One useful consequence of not eta-expanding lambdas is this example: + genMap :: C a => ... + {-# INLINE genMap #-} + genMap f xs = ... + + myMap :: D a => ... + {-# INLINE myMap #-} + myMap = genMap + +Notice that 'genMap' should only inline if applied to two arguments. +In the stable unfolding for myMap we'll have the unfolding + (\d -> genMap Int (..d..)) +We do not want to eta-expand to + (\d f xs -> genMap Int (..d..) f xs) +because then 'genMap' will inline, and it really shouldn't: at least +as far as the programmer is concerned, it's not applied to two +arguments! + +Note [Do not eta-expand join points] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Similarly to CPR (see Note [Don't w/w join points for CPR] in +GHC.Core.Op.WorkWrap), a join point stands well to gain from its outer binding's +eta-expansion, and eta-expanding a join point is fraught with issues like how to +deal with a cast: + + let join $j1 :: IO () + $j1 = ... + $j2 :: Int -> IO () + $j2 n = if n > 0 then $j1 + else ... + + => + + let join $j1 :: IO () + $j1 = (\eta -> ...) + `cast` N:IO :: State# RealWorld -> (# State# RealWorld, ()) + ~ IO () + $j2 :: Int -> IO () + $j2 n = (\eta -> if n > 0 then $j1 + else ...) + `cast` N:IO :: State# RealWorld -> (# State# RealWorld, ()) + ~ IO () + +The cast here can't be pushed inside the lambda (since it's not casting to a +function type), so the lambda has to stay, but it can't because it contains a +reference to a join point. In fact, $j2 can't be eta-expanded at all. Rather +than try and detect this situation (and whatever other situations crop up!), we +don't bother; again, any surrounding eta-expansion will improve these join +points anyway, since an outer cast can *always* be pushed inside. 
By the time
+CorePrep comes around, the code is very likely to look more like this:
+
+    let join $j1 :: State# RealWorld -> (# State# RealWorld, ())
+             $j1 = (...) eta
+             $j2 :: Int -> State# RealWorld -> (# State# RealWorld, ())
+             $j2 = if n > 0 then $j1
+                            else (...) eta
+
+Note [Do not eta-expand PAPs]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We used to have old_arity = manifestArity rhs, which meant that we
+would eta-expand even PAPs.  But this gives no particular advantage,
+and can lead to a massive blow-up in code size, exhibited by #9020.
+Suppose we have a PAP
+    foo :: IO ()
+    foo = returnIO ()
+Then we can eta-expand to
+    foo = (\eta. (returnIO () |> sym g) eta) |> g
+where
+    g :: IO () ~ State# RealWorld -> (# State# RealWorld, () #)
+
+But there is really no point in doing this, and it generates masses of
+coercions and whatnot that eventually disappear again.  For T9020, GHC
+allocated 6.6G before, and 0.8G afterwards; and residency dropped from
+1.8G to 45M.
+
+But note that this won't eta-expand, say
+    f = \g -> map g
+Does it matter not eta-expanding such functions?  I'm not sure.  Perhaps
+strictness analysis will have less to bite on?
+
+
+************************************************************************
+*                                                                      *
+\subsection{Floating lets out of big lambdas}
+*                                                                      *
+************************************************************************
+
+Note [Floating and type abstraction]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider this:
+      x = /\a. C e1 e2
+We'd like to float this to
+      y1 = /\a. e1
+      y2 = /\a. e2
+      x  = /\a. C (y1 a) (y2 a)
+for the usual reasons: we want to inline x rather vigorously.
+
+You may think that this kind of thing is rare.  But in some programs it is
+common.  For example, if you do closure conversion you might get:
+
+      data a :-> b = forall e. (e -> a -> b) :$ e
+
+      f_cc :: forall a. a :-> a
+      f_cc = /\a. (\e. id a) :$ ()
+
+Now we really want to inline that f_cc thing so that the
+construction of the closure goes away.
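[Editorial aside: the float just described has a direct source-level analogue. The following is a hand-written illustration with invented names, not compiler output; the pair constructor plays the role of `C`.]

```haskell
-- Source-level analogue of Note [Floating and type abstraction]:
-- 'xBefore' plays the role of x = /\a. C e1 e2.
xBefore :: ([a], Maybe a)
xBefore = ([], Nothing)

-- After the float: each component gets its own polymorphic binding,
-- and the original binder just rebuilds the constructor application,
-- so it is now cheap to inline vigorously.
y1 :: [a]
y1 = []

y2 :: Maybe a
y2 = Nothing

xAfter :: ([a], Maybe a)
xAfter = (y1, y2)
```

The transformation is value-preserving; its point is purely that `xAfter` is a trivial constructor application the inliner is happy to duplicate.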
+ +So I have elaborated simplLazyBind to understand right-hand sides that look +like + /\ a1..an. body + +and treat them specially. The real work is done in +GHC.Core.Op.Simplify.Utils.abstractFloats, but there is quite a bit of plumbing +in simplLazyBind as well. + +The same transformation is good when there are lets in the body: + + /\abc -> let(rec) x = e in b + ==> + let(rec) x' = /\abc -> let x = x' a b c in e + in + /\abc -> let x = x' a b c in b + +This is good because it can turn things like: + + let f = /\a -> letrec g = ... g ... in g +into + letrec g' = /\a -> ... g' a ... + in + let f = /\ a -> g' a + +which is better. In effect, it means that big lambdas don't impede +let-floating. + +This optimisation is CRUCIAL in eliminating the junk introduced by +desugaring mutually recursive definitions. Don't eliminate it lightly! + +[May 1999] If we do this transformation *regardless* then we can +end up with some pretty silly stuff. For example, + + let + st = /\ s -> let { x1=r1 ; x2=r2 } in ... + in .. +becomes + let y1 = /\s -> r1 + y2 = /\s -> r2 + st = /\s -> ...[y1 s/x1, y2 s/x2] + in .. + +Unless the "..." is a WHNF there is really no point in doing this. +Indeed it can make things worse. Suppose x1 is used strictly, +and is of the form + + x1* = case f y of { (a,b) -> e } + +If we abstract this wrt the tyvar we then can't do the case inline +as we would normally do. + +That's why the whole transformation is part of the same process that +floats let-bindings and constructor arguments out of RHSs. In particular, +it is guarded by the doFloatFromRhs call in simplLazyBind. + +Note [Which type variables to abstract over] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Abstract only over the type variables free in the rhs wrt which the +new binding is abstracted. Note that + + * The naive approach of abstracting wrt the + tyvars free in the Id's /type/ fails. Consider: + /\ a b -> let t :: (a,b) = (e1, e2) + x :: a = fst t + in ... 
+ Here, b isn't free in x's type, but we must nevertheless + abstract wrt b as well, because t's type mentions b. + Since t is floated too, we'd end up with the bogus: + poly_t = /\ a b -> (e1, e2) + poly_x = /\ a -> fst (poly_t a *b*) + + * We must do closeOverKinds. Example (#10934): + f = /\k (f:k->*) (a:k). let t = AccFailure @ (f a) in ... + Here we want to float 't', but we must remember to abstract over + 'k' as well, even though it is not explicitly mentioned in the RHS, + otherwise we get + t = /\ (f:k->*) (a:k). AccFailure @ (f a) + which is obviously bogus. +-} + +abstractFloats :: DynFlags -> TopLevelFlag -> [OutTyVar] -> SimplFloats + -> OutExpr -> SimplM ([OutBind], OutExpr) +abstractFloats dflags top_lvl main_tvs floats body + = ASSERT( notNull body_floats ) + ASSERT( isNilOL (sfJoinFloats floats) ) + do { (subst, float_binds) <- mapAccumLM abstract empty_subst body_floats + ; return (float_binds, GHC.Core.Subst.substExpr (text "abstract_floats1") subst body) } + where + is_top_lvl = isTopLevel top_lvl + main_tv_set = mkVarSet main_tvs + body_floats = letFloatBinds (sfLetFloats floats) + empty_subst = GHC.Core.Subst.mkEmptySubst (sfInScope floats) + + abstract :: GHC.Core.Subst.Subst -> OutBind -> SimplM (GHC.Core.Subst.Subst, OutBind) + abstract subst (NonRec id rhs) + = do { (poly_id1, poly_app) <- mk_poly1 tvs_here id + ; let (poly_id2, poly_rhs) = mk_poly2 poly_id1 tvs_here rhs' + subst' = GHC.Core.Subst.extendIdSubst subst id poly_app + ; return (subst', NonRec poly_id2 poly_rhs) } + where + rhs' = GHC.Core.Subst.substExpr (text "abstract_floats2") subst rhs + + -- tvs_here: see Note [Which type variables to abstract over] + tvs_here = scopedSort $ + filter (`elemVarSet` main_tv_set) $ + closeOverKindsList $ + exprSomeFreeVarsList isTyVar rhs' + + abstract subst (Rec prs) + = do { (poly_ids, poly_apps) <- mapAndUnzipM (mk_poly1 tvs_here) ids + ; let subst' = GHC.Core.Subst.extendSubstList subst (ids `zip` poly_apps) + poly_pairs = [ mk_poly2 
poly_id tvs_here rhs' + | (poly_id, rhs) <- poly_ids `zip` rhss + , let rhs' = GHC.Core.Subst.substExpr (text "abstract_floats") + subst' rhs ] + ; return (subst', Rec poly_pairs) } + where + (ids,rhss) = unzip prs + -- For a recursive group, it's a bit of a pain to work out the minimal + -- set of tyvars over which to abstract: + -- /\ a b c. let x = ...a... in + -- letrec { p = ...x...q... + -- q = .....p...b... } in + -- ... + -- Since 'x' is abstracted over 'a', the {p,q} group must be abstracted + -- over 'a' (because x is replaced by (poly_x a)) as well as 'b'. + -- Since it's a pain, we just use the whole set, which is always safe + -- + -- If you ever want to be more selective, remember this bizarre case too: + -- x::a = x + -- Here, we must abstract 'x' over 'a'. + tvs_here = scopedSort main_tvs + + mk_poly1 :: [TyVar] -> Id -> SimplM (Id, CoreExpr) + mk_poly1 tvs_here var + = do { uniq <- getUniqueM + ; let poly_name = setNameUnique (idName var) uniq -- Keep same name + poly_ty = mkInvForAllTys tvs_here (idType var) -- But new type of course + poly_id = transferPolyIdInfo var tvs_here $ -- Note [transferPolyIdInfo] in Id.hs + mkLocalId poly_name poly_ty + ; return (poly_id, mkTyApps (Var poly_id) (mkTyVarTys tvs_here)) } + -- In the olden days, it was crucial to copy the occInfo of the original var, + -- because we were looking at occurrence-analysed but as yet unsimplified code! + -- In particular, we mustn't lose the loop breakers. BUT NOW we are looking + -- at already simplified code, so it doesn't matter + -- + -- It's even right to retain single-occurrence or dead-var info: + -- Suppose we started with /\a -> let x = E in B + -- where x occurs once in B. Then we transform to: + -- let x' = /\a -> E in /\a -> let x* = x' a in B + -- where x* has an INLINE prag on it. Now, once x* is inlined, + -- the occurrences of x' will be just the occurrences originally + -- pinned on x. 
+
+    mk_poly2 :: Id -> [TyVar] -> CoreExpr -> (Id, CoreExpr)
+    mk_poly2 poly_id tvs_here rhs
+      = (poly_id `setIdUnfolding` unf, poly_rhs)
+      where
+        poly_rhs = mkLams tvs_here rhs
+        unf = mkUnfolding dflags InlineRhs is_top_lvl False poly_rhs
+
+        -- We want the unfolding.  Consider
+        --      let
+        --            x = /\a. let y = ... in Just y
+        --      in body
+        -- Then we float the y-binding out (via abstractFloats and addPolyBind)
+        -- but 'x' may well then be inlined in 'body' in which case we'd like the
+        -- opportunity to inline 'y' too.
+
+{-
+Note [Abstract over coercions]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+If a coercion variable (g :: a ~ Int) is free in the RHS, then so is the
+type variable a.  Rather than sort this mess out, we simply bale out and abstract
+wrt all the type variables if any of them are coercion variables.
+
+
+Historical note: if you use let-bindings instead of a substitution, beware of this:
+
+                -- Suppose we start with:
+                --
+                --      x = /\ a -> let g = G in E
+                --
+                -- Then we'll float to get
+                --
+                --      x = let poly_g = /\ a -> G
+                --          in /\ a -> let g = poly_g a in E
+                --
+                -- But now the occurrence analyser will see just one occurrence
+                -- of poly_g, not inside a lambda, so the simplifier will
+                -- PreInlineUnconditionally poly_g back into g!  Back to square 1!
+                -- (I used to think that the "don't inline lone occurrences" stuff
+                -- would stop this happening, but since it's the *only* occurrence,
+                -- PreInlineUnconditionally kicks in first!)
+                --
+                -- Solution: put an INLINE note on g's RHS, so that poly_g seems
+                -- to appear many times.  (NB: mkInlineMe eliminates
+                -- such notes on trivial RHSs, so do it manually.)
+
+************************************************************************
+*                                                                      *
+                prepareAlts
+*                                                                      *
+************************************************************************
+
+prepareAlts tries these things:
+
+1. Eliminate alternatives that cannot match, including the
+   DEFAULT alternative.
+
+2. If the DEFAULT alternative can match only one possible constructor,
+   then make that constructor explicit.
+   e.g.
+        case e of x { DEFAULT -> rhs }
+     ===>
+        case e of x { (a,b) -> rhs }
+   where the type is a single constructor type.  This gives better code
+   when rhs also scrutinises x or e.
+
+3. Returns a list of the constructors that cannot match in the
+   DEFAULT alternative (if there is one)
+
+Here "cannot match" includes knowledge from GADTs
+
+It's a good idea to do this stuff before simplifying the alternatives, to
+avoid simplifying alternatives we know can't happen, and to come up with
+the list of constructors that are handled, to put into the IdInfo of the
+case binder, for use when simplifying the alternatives.
+
+Eliminating the default alternative in (1) isn't so obvious, but it can
+happen:
+
+data Colour = Red | Green | Blue
+
+f x = case x of
+        Red     -> ..
+        Green   -> ..
+        DEFAULT -> h x
+
+h y = case y of
+        Blue    -> ..
+        DEFAULT -> [ case y of ... ]
+
+If we inline h into f, the default case of the inlined h can't happen.
+If we don't notice this, we may end up filtering out *all* the cases
+of the inner case y, which gives us nowhere to go!
+-}
+
+prepareAlts :: OutExpr -> OutId -> [InAlt] -> SimplM ([AltCon], [InAlt])
+-- The returned alternatives can be empty, in which case none are possible
+prepareAlts scrut case_bndr' alts
+  | Just (tc, tys) <- splitTyConApp_maybe (varType case_bndr')
+        -- Case binder is needed just for its type.  Note that as an
+        -- OutId, it has maximum information; this is important.
+        -- Test simpl013 is an example
+  = do { us <- getUniquesM
+       ; let (idcs1, alts1)       = filterAlts tc tys imposs_cons alts
+             (yes2,  alts2)       = refineDefaultAlt us tc tys idcs1 alts1
+             (yes3, idcs3, alts3) = combineIdenticalAlts idcs1 alts2
+             -- "idcs" stands for "impossible default data constructors"
+             -- i.e. the constructors that can't match the default case
+       ; when yes2 $ tick (FillInCaseDefault case_bndr')
+       ; when yes3 $ tick (AltMerge case_bndr')
+       ; return (idcs3, alts3) }
+
+  | otherwise  -- Not a data type, so nothing interesting happens
+  = return ([], alts)
+  where
+    imposs_cons = case scrut of
+                    Var v -> otherCons (idUnfolding v)
+                    _     -> []
+
+
+{-
+************************************************************************
+*                                                                      *
+                mkCase
+*                                                                      *
+************************************************************************
+
+mkCase tries these things
+
+* Note [Merge Nested Cases]
+* Note [Eliminate Identity Case]
+* Note [Scrutinee Constant Folding]
+
+Note [Merge Nested Cases]
+~~~~~~~~~~~~~~~~~~~~~~~~~
+       case e of b {             ==>   case e of b {
+         p1 -> rhs1                      p1 -> rhs1
+         ...                             ...
+         pm -> rhsm                      pm -> rhsm
+         _  -> case b of b' {            pn -> let b'=b in rhsn
+                     pn -> rhsn          ...
+                     ...                 po -> let b'=b in rhso
+                     po -> rhso          _  -> let b'=b in rhsd
+                     _  -> rhsd
+       }
+
+which merges two cases into one when the case in the default
+alternative scrutinises the same variable as the outer case.  This
+transformation is called Case Merging.  It avoids scrutinising the
+same variable multiple times.
+
+Note [Eliminate Identity Case]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+        case e of               ===> e
+                True  -> True;
+                False -> False
+
+and similar friends.
+
+Note [Scrutinee Constant Folding]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+     case x op# k# of _ {  ===>   case x of _ {
+        a1# -> e1                    (a1# inv_op# k#) -> e1
+        a2# -> e2                    (a2# inv_op# k#) -> e2
+        ...                          ...
+        DEFAULT -> ed                DEFAULT -> ed
+
+     where (x op# k#) inv_op# k# == x
+
+And similarly for commuted arguments and for some unary operations.
+
+The purpose of this transformation is not only to avoid an arithmetic
+operation at runtime but to allow other transformations to apply in cascade.
+
+Example with the "Merge Nested Cases" optimization (from #12877):
+
+      main = case t of t0
+         0##     -> ...
+         DEFAULT -> case t0 `minusWord#` 1## of t1
+            0##     -> ...
+ DEFAULT -> case t1 `minusWord#` 1## of t2 + 0## -> ... + DEFAULT -> case t2 `minusWord#` 1## of _ + 0## -> ... + DEFAULT -> ... + + becomes: + + main = case t of _ + 0## -> ... + 1## -> ... + 2## -> ... + 3## -> ... + DEFAULT -> ... + +There are some wrinkles + +* Do not apply caseRules if there is just a single DEFAULT alternative + case e +# 3# of b { DEFAULT -> rhs } + If we applied the transformation here we would (stupidly) get + case a of b' { DEFAULT -> let b = e +# 3# in rhs } + and now the process may repeat, because that let will really + be a case. + +* The type of the scrutinee might change. E.g. + case tagToEnum (x :: Int#) of (b::Bool) + False -> e1 + True -> e2 + ==> + case x of (b'::Int#) + DEFAULT -> e1 + 1# -> e2 + +* The case binder may be used in the right hand sides, so we need + to make a local binding for it, if it is alive. e.g. + case e +# 10# of b + DEFAULT -> blah...b... + 44# -> blah2...b... + ===> + case e of b' + DEFAULT -> let b = b' +# 10# in blah...b... + 34# -> let b = 44# in blah2...b... + + Note that in the non-DEFAULT cases we know what to bind 'b' to, + whereas in the DEFAULT case we must reconstruct the original value. + But NB: we use b'; we do not duplicate 'e'. + +* In dataToTag we might need to make up some fake binders; + see Note [caseRules for dataToTag] in GHC.Core.Op.ConstantFold +-} + +mkCase, mkCase1, mkCase2, mkCase3 + :: DynFlags + -> OutExpr -> OutId + -> OutType -> [OutAlt] -- Alternatives in standard (increasing) order + -> SimplM OutExpr + +-------------------------------------------------- +-- 1. 
Merge Nested Cases +-------------------------------------------------- + +mkCase dflags scrut outer_bndr alts_ty ((DEFAULT, _, deflt_rhs) : outer_alts) + | gopt Opt_CaseMerge dflags + , (ticks, Case (Var inner_scrut_var) inner_bndr _ inner_alts) + <- stripTicksTop tickishFloatable deflt_rhs + , inner_scrut_var == outer_bndr + = do { tick (CaseMerge outer_bndr) + + ; let wrap_alt (con, args, rhs) = ASSERT( outer_bndr `notElem` args ) + (con, args, wrap_rhs rhs) + -- Simplifier's no-shadowing invariant should ensure + -- that outer_bndr is not shadowed by the inner patterns + wrap_rhs rhs = Let (NonRec inner_bndr (Var outer_bndr)) rhs + -- The let is OK even for unboxed binders, + + wrapped_alts | isDeadBinder inner_bndr = inner_alts + | otherwise = map wrap_alt inner_alts + + merged_alts = mergeAlts outer_alts wrapped_alts + -- NB: mergeAlts gives priority to the left + -- case x of + -- A -> e1 + -- DEFAULT -> case x of + -- A -> e2 + -- B -> e3 + -- When we merge, we must ensure that e1 takes + -- precedence over e2 as the value for A! + + ; fmap (mkTicks ticks) $ + mkCase1 dflags scrut outer_bndr alts_ty merged_alts + } + -- Warning: don't call mkCase recursively! + -- Firstly, there's no point, because inner alts have already had + -- mkCase applied to them, so they won't have a case in their default + -- Secondly, if you do, you get an infinite loop, because the bindCaseBndr + -- in munge_rhs may put a case into the DEFAULT branch! + +mkCase dflags scrut bndr alts_ty alts = mkCase1 dflags scrut bndr alts_ty alts + +-------------------------------------------------- +-- 2. Eliminate Identity Case +-------------------------------------------------- + +mkCase1 _dflags scrut case_bndr _ alts@((_,_,rhs1) : _) -- Identity case + | all identity_alt alts + = do { tick (CaseIdentity case_bndr) + ; return (mkTicks ticks $ re_cast scrut rhs1) } + where + ticks = concatMap (stripTicksT tickishFloatable . 
thdOf3) (tail alts)
+    identity_alt (con, args, rhs) = check_eq rhs con args
+
+    check_eq (Cast rhs co) con args        -- See Note [RHS casts]
+      = not (any (`elemVarSet` tyCoVarsOfCo co) args) && check_eq rhs con args
+    check_eq (Tick t e) alt args
+      = tickishFloatable t && check_eq e alt args
+
+    check_eq (Lit lit) (LitAlt lit') _     = lit == lit'
+    check_eq (Var v) _ _ | v == case_bndr  = True
+    check_eq (Var v) (DataAlt con) args
+      | null arg_tys, null args            = v == dataConWorkId con
+                                             -- Optimisation only
+    check_eq rhs (DataAlt con) args        = cheapEqExpr' tickishFloatable rhs $
+                                             mkConApp2 con arg_tys args
+    check_eq _ _ _ = False
+
+    arg_tys = tyConAppArgs (idType case_bndr)
+
+        -- Note [RHS casts]
+        -- ~~~~~~~~~~~~~~~~
+        -- We've seen this:
+        --      case e of x { _ -> x `cast` c }
+        -- And we definitely want to eliminate this case, to give
+        --      e `cast` c
+        -- So we throw away the cast from the RHS, and reconstruct
+        -- it at the other end.  All the RHS casts must be the same
+        -- if (all identity_alt alts) holds.
+        --
+        -- Don't worry about nested casts, because the simplifier combines them
+
+    re_cast scrut (Cast rhs co) = Cast (re_cast scrut rhs) co
+    re_cast scrut _             = scrut
+
+mkCase1 dflags scrut bndr alts_ty alts = mkCase2 dflags scrut bndr alts_ty alts
+
+--------------------------------------------------
+--      3. Scrutinee Constant Folding
+--------------------------------------------------
+
+mkCase2 dflags scrut bndr alts_ty alts
+  | -- See Note [Scrutinee Constant Folding]
+    case alts of  -- Not if there is just a DEFAULT alternative
+      [(DEFAULT,_,_)] -> False
+      _               -> True
+  , gopt Opt_CaseFolding dflags
+  , Just (scrut', tx_con, mk_orig) <- caseRules dflags scrut
+  = do { bndr' <- newId (fsLit "lwild") (exprType scrut')
+
+       ; alts' <- mapMaybeM (tx_alt tx_con mk_orig bndr') alts
+                  -- mapMaybeM: discard unreachable alternatives
+                  -- See Note [Unreachable caseRules alternatives]
+                  -- in GHC.Core.Op.ConstantFold
+
+       ; mkCase3 dflags scrut' bndr' alts_ty $
+         add_default (re_sort alts')
+       }
+
+  | otherwise
+  = mkCase3 dflags scrut bndr alts_ty alts
+  where
+    -- We need to keep the correct association between the scrutinee and its
+    -- binder if the latter isn't dead. Hence we wrap rhs of alternatives with
+    -- "let bndr = ... in":
+    --
+    --     case v + 10 of y        =====>  case v of y
+    --        20      -> e1                   10      -> let y = 20     in e1
+    --        DEFAULT -> e2                   DEFAULT -> let y = v + 10 in e2
+    --
+    -- Other transformations give:  =====>  case v of y'
+    --                                         10      -> let y = 20      in e1
+    --                                         DEFAULT -> let y = y' + 10 in e2
+    --
+    -- This wrapping is done in tx_alt; we use mk_orig, returned by caseRules,
+    -- to construct an expression equivalent to the original one, for use
+    -- in the DEFAULT case
+
+    tx_alt :: (AltCon -> Maybe AltCon) -> (Id -> CoreExpr) -> Id
+           -> CoreAlt -> SimplM (Maybe CoreAlt)
+    tx_alt tx_con mk_orig new_bndr (con, bs, rhs)
+      = case tx_con con of
+          Nothing   -> return Nothing
+          Just con' -> do { bs' <- mk_new_bndrs new_bndr con'
+                          ; return (Just (con', bs', rhs')) }
+      where
+        rhs' | isDeadBinder bndr = rhs
+             | otherwise         = bindNonRec bndr orig_val rhs
+
+        orig_val = case con of
+                      DEFAULT    -> mk_orig new_bndr
+                      LitAlt l   -> Lit l
+                      DataAlt dc -> mkConApp2 dc (tyConAppArgs (idType bndr)) bs
+
+    mk_new_bndrs new_bndr (DataAlt dc)
+      | not (isNullaryRepDataCon dc)
+      = -- For non-nullary data cons we must invent some fake binders
+        -- See Note [caseRules for dataToTag] in GHC.Core.Op.ConstantFold
+        do { us <- getUniquesM
+           ; let (ex_tvs, arg_ids) = dataConRepInstPat us dc
+                                        (tyConAppArgs (idType new_bndr))
+           ; return (ex_tvs ++ arg_ids) }
+    mk_new_bndrs _ _ = return []
+
+    re_sort :: [CoreAlt] -> [CoreAlt]
+    -- Sort the alternatives to re-establish
+    -- GHC.Core Note [Case expression invariants]
+    re_sort alts = sortBy cmpAlt alts
+
+    add_default :: [CoreAlt] -> [CoreAlt]
+    -- See Note [Literal cases]
+    add_default ((LitAlt {}, bs, rhs) : alts) = (DEFAULT, bs, rhs) : alts
+    add_default alts                          = alts
+
+{- Note [Literal cases]
+~~~~~~~~~~~~~~~~~~~~~~~
+If we have
+  case tagToEnum (a ># b) of
+     False -> e1
+     True  -> e2
+
+then caseRules for TagToEnum will turn it into
+  case tagToEnum (a ># b) of
+     0# -> e1
+     1# -> e2
+
+Since the case is exhaustive (all cases are) we can convert it to
+  case tagToEnum (a ># b) of
+     DEFAULT -> e1
+     1#      -> e2
+
+This may generate slightly better code (although it should not, since
+all cases are exhaustive) and/or optimise better.  I'm not certain that
+it's necessary, but currently we do make this change.  We do it here,
+NOT in the TagToEnum rules (see "Beware" in Note [caseRules for tagToEnum]
+in GHC.Core.Op.ConstantFold)
+-}
+
+--------------------------------------------------
+--      Catch-all
+--------------------------------------------------
+mkCase3 _dflags scrut bndr alts_ty alts
+  = return (Case scrut bndr alts_ty alts)
+
+-- See Note [Exitification] and Note [Do not inline exit join points] in
+-- GHC.Core.Op.Exitify
+-- This lives here (and not in Id) because occurrence info is only valid on
+-- InIds, so it's crucial that isExitJoinId is only called on freshly
+-- occ-analysed code. It's not a generic function you can call anywhere.
+isExitJoinId :: Var -> Bool +isExitJoinId id + = isJoinId id + && isOneOcc (idOccInfo id) + && occ_in_lam (idOccInfo id) == IsInsideLam + +{- +Note [Dead binders] +~~~~~~~~~~~~~~~~~~~~ +Note that dead-ness is maintained by the simplifier, so that it is +accurate after simplification as well as before. + + +Note [Cascading case merge] +~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Case merging should cascade in one sweep, because it +happens bottom-up + + case e of a { + DEFAULT -> case a of b + DEFAULT -> case b of c { + DEFAULT -> e + A -> ea + B -> eb + C -> ec +==> + case e of a { + DEFAULT -> case a of b + DEFAULT -> let c = b in e + A -> let c = b in ea + B -> eb + C -> ec +==> + case e of a { + DEFAULT -> let b = a in let c = b in e + A -> let b = a in let c = b in ea + B -> let b = a in eb + C -> ec + + +However here's a tricky case that we still don't catch, and I don't +see how to catch it in one pass: + + case x of c1 { I# a1 -> + case a1 of c2 -> + 0 -> ... + DEFAULT -> case x of c3 { I# a2 -> + case a2 of ... + +After occurrence analysis (and its binder-swap) we get this + + case x of c1 { I# a1 -> + let x = c1 in -- Binder-swap addition + case a1 of c2 -> + 0 -> ... + DEFAULT -> case x of c3 { I# a2 -> + case a2 of ... + +When we simplify the inner case x, we'll see that +x=c1=I# a1. So we'll bind a2 to a1, and get + + case x of c1 { I# a1 -> + case a1 of c2 -> + 0 -> ... + DEFAULT -> case a1 of ... + +This is correct, but we can't do a case merge in this sweep +because c2 /= a1. Reason: the binding c1=I# a1 went inwards +without getting changed to c1=I# c2. + +I don't think this is worth fixing, even if I knew how. It'll +all come out in the next pass anyway. +-} diff --git a/compiler/GHC/Core/Op/SpecConstr.hs b/compiler/GHC/Core/Op/SpecConstr.hs new file mode 100644 index 0000000000..4522e2d23c --- /dev/null +++ b/compiler/GHC/Core/Op/SpecConstr.hs @@ -0,0 +1,2360 @@ +{- +ToDo [Oct 2013] +~~~~~~~~~~~~~~~ +1. 
Nuke ForceSpecConstr for good (it is subsumed by GHC.Types.SPEC in ghc-prim) +2. Nuke NoSpecConstr + + +(c) The GRASP/AQUA Project, Glasgow University, 1992-1998 + +\section[SpecConstr]{Specialise over constructors} +-} + +{-# LANGUAGE CPP #-} + +{-# OPTIONS_GHC -Wno-incomplete-uni-patterns #-} + +module GHC.Core.Op.SpecConstr( + specConstrProgram, + SpecConstrAnnotation(..) + ) where + +#include "HsVersions.h" + +import GhcPrelude + +import GHC.Core +import GHC.Core.Subst +import GHC.Core.Utils +import GHC.Core.Unfold ( couldBeSmallEnoughToInline ) +import GHC.Core.FVs ( exprsFreeVarsList ) +import GHC.Core.Op.Monad +import Literal ( litIsLifted ) +import GHC.Driver.Types ( ModGuts(..) ) +import GHC.Core.Op.WorkWrap.Lib ( isWorkerSmallEnough, mkWorkerArgs ) +import GHC.Core.DataCon +import GHC.Core.Coercion hiding( substCo ) +import GHC.Core.Rules +import GHC.Core.Type hiding ( substTy ) +import GHC.Core.TyCon ( tyConName ) +import Id +import GHC.Core.Ppr ( pprParendExpr ) +import GHC.Core.Make ( mkImpossibleExpr ) +import VarEnv +import VarSet +import Name +import BasicTypes +import GHC.Driver.Session ( DynFlags(..), GeneralFlag( Opt_SpecConstrKeen ) + , gopt, hasPprDebug ) +import Maybes ( orElse, catMaybes, isJust, isNothing ) +import Demand +import Cpr +import GHC.Serialized ( deserializeWithData ) +import Util +import Pair +import UniqSupply +import Outputable +import FastString +import UniqFM +import MonadUtils +import Control.Monad ( zipWithM ) +import Data.List +import PrelNames ( specTyConName ) +import Module +import GHC.Core.TyCon ( TyCon ) +import GHC.Exts( SpecConstrAnnotation(..) ) +import Data.Ord( comparing ) + +{- +----------------------------------------------------- + Game plan +----------------------------------------------------- + +Consider + drop n [] = [] + drop 0 xs = [] + drop n (x:xs) = drop (n-1) xs + +After the first time round, we could pass n unboxed. This happens in +numerical code too. 
Here's what it looks like in Core: + + drop n xs = case xs of + [] -> [] + (y:ys) -> case n of + I# n# -> case n# of + 0 -> [] + _ -> drop (I# (n# -# 1#)) xs + +Notice that the recursive call has an explicit constructor as argument. +Noticing this, we can make a specialised version of drop + + RULE: drop (I# n#) xs ==> drop' n# xs + + drop' n# xs = let n = I# n# in ...orig RHS... + +Now the simplifier will apply the specialisation in the rhs of drop', giving + + drop' n# xs = case xs of + [] -> [] + (y:ys) -> case n# of + 0 -> [] + _ -> drop' (n# -# 1#) xs + +Much better! + +We'd also like to catch cases where a parameter is carried along unchanged, +but evaluated each time round the loop: + + f i n = if i>0 || i>n then i else f (i*2) n + +Here f isn't strict in n, but we'd like to avoid evaluating it each iteration. +In Core, by the time we've w/wd (f is strict in i) we get + + f i# n = case i# ># 0 of + False -> I# i# + True -> case n of { I# n# -> + case i# ># n# of + False -> I# i# + True -> f (i# *# 2#) n + +At the call to f, we see that the argument, n is known to be (I# n#), +and n is evaluated elsewhere in the body of f, so we can play the same +trick as above. + + +Note [Reboxing] +~~~~~~~~~~~~~~~ +We must be careful not to allocate the same constructor twice. Consider + f p = (...(case p of (a,b) -> e)...p..., + ...let t = (r,s) in ...t...(f t)...) +At the recursive call to f, we can see that t is a pair. But we do NOT want +to make a specialised copy: + f' a b = let p = (a,b) in (..., ...) +because now t is allocated by the caller, then r and s are passed to the +recursive call, which allocates the (r,s) pair again. + +This happens if + (a) the argument p is used in other than a case-scrutinisation way. + (b) the argument to the call is not a 'fresh' tuple; you have to + look into its unfolding to see that it's a tuple + +Hence the "OR" part of Note [Good arguments] below. + +ALTERNATIVE 2: pass both boxed and unboxed versions. 
This no longer saves +allocation, but does perhaps save evals. In the RULE we'd have +something like + + f (I# x#) = f' (I# x#) x# + +If at the call site the (I# x) was an unfolding, then we'd have to +rely on CSE to eliminate the duplicate allocation.... This alternative +doesn't look attractive enough to pursue. + +ALTERNATIVE 3: ignore the reboxing problem. The trouble is that +the conservative reboxing story prevents many useful functions from being +specialised. Example: + foo :: Maybe Int -> Int -> Int + foo (Just m) 0 = 0 + foo x@(Just m) n = foo x (n-m) +Here the use of 'x' will clearly not require boxing in the specialised function. + +The strictness analyser has the same problem, in fact. Example: + f p@(a,b) = ... +If we pass just 'a' and 'b' to the worker, it might need to rebox the +pair to create (a,b). A more sophisticated analysis might figure out +precisely the cases in which this could happen, but the strictness +analyser does no such analysis; it just passes 'a' and 'b', and hopes +for the best. + +So my current choice is to make SpecConstr similarly aggressive, and +ignore the bad potential of reboxing. + + +Note [Good arguments] +~~~~~~~~~~~~~~~~~~~~~ +So we look for + +* A self-recursive function. Ignore mutual recursion for now, + because it's less common, and the code is simpler for self-recursion. + +* EITHER + + a) At a recursive call, one or more parameters is an explicit + constructor application + AND + That same parameter is scrutinised by a case somewhere in + the RHS of the function + + OR + + b) At a recursive call, one or more parameters has an unfolding + that is an explicit constructor application + AND + That same parameter is scrutinised by a case somewhere in + the RHS of the function + AND + Those are the only uses of the parameter (see Note [Reboxing]) + + +What to abstract over +~~~~~~~~~~~~~~~~~~~~~ +There's a bit of a complication with type arguments. If the call +site looks like + + f p = ...f ((:) [a] x xs)... 
+
+then our specialised function looks like
+
+  f_spec x xs = let p = (:) [a] x xs in ....as before....
+
+This only makes sense if either
+  a) the type variable 'a' is in scope at the top of f, or
+  b) the type variable 'a' is an argument to f (and hence to f_spec)
+
+Actually, (a) may hold for value arguments too, in which case
+we may not want to pass them.  Suppose 'x' is in scope at f's
+defn, but xs is not.  Then we'd like
+
+  f_spec xs = let p = (:) [a] x xs in ....as before....
+
+Similarly (b) may hold too.  If x is already an argument at the
+call, no need to pass it again.
+
+Finally, if 'a' is not in scope at the call site, we could abstract
+it as we do the term variables:
+
+  f_spec a x xs = let p = (:) [a] x xs in ...as before...
+
+So the grand plan is:
+
+  * abstract the call site to a constructor-only pattern
+    e.g.  C x (D (f p) (g q))  ==>  C s1 (D s2 s3)
+
+  * Find the free variables of the abstracted pattern
+
+  * Pass these variables, less any that are in scope at
+    the fn defn.  But see Note [Shadowing] below.
+
+
+NOTICE that we only abstract over variables that are not in scope,
+so we're in no danger of shadowing variables used "higher up"
+in f_spec's RHS.
+
+
+Note [Shadowing]
+~~~~~~~~~~~~~~~~
+In this pass we gather up usage information that may mention variables
+that are bound between the usage site and the definition site; or (more
+seriously) may be bound to something different at the definition site.
+For example:
+
+  f x = letrec g y v = let x = ...
+                       in ...(g (a,b) x)...
+
+Since 'x' is in scope at the call site, we may make a rewrite rule that
+looks like
+  RULE forall a,b. g (a,b) x = ...
+But this rule will never match, because it's really a different 'x' at
+the call site -- and that difference will be manifest by the time the
+simplifier gets to it.  [A worry: the simplifier doesn't *guarantee*
+no-shadowing, so perhaps it may not be distinct?]
+
+Anyway, the rule isn't actually wrong, it's just not useful.
One possibility
+is to run deShadowBinds before running SpecConstr, but instead we run the
+simplifier.  That gives the simplest possible program for SpecConstr to
+chew on; and it virtually guarantees no shadowing.
+
+Note [Specialising for constant parameters]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+This one is about specialising on a *constant* (but not necessarily
+constructor) argument
+
+  foo :: Int -> (Int -> Int) -> Int
+  foo 0 f = 0
+  foo m f = foo (f m) (+1)
+
+It produces
+
+  lvl_rmV :: GHC.Base.Int -> GHC.Base.Int
+  lvl_rmV =
+    \ (ds_dlk :: GHC.Base.Int) ->
+      case ds_dlk of wild_alH { GHC.Base.I# x_alG ->
+      GHC.Base.I# (GHC.Prim.+# x_alG 1)
+
+  T.$wfoo :: GHC.Prim.Int# -> (GHC.Base.Int -> GHC.Base.Int) ->
+             GHC.Prim.Int#
+  T.$wfoo =
+    \ (ww_sme :: GHC.Prim.Int#) (w_smg :: GHC.Base.Int -> GHC.Base.Int) ->
+      case ww_sme of ds_Xlw {
+        __DEFAULT ->
+          case w_smg (GHC.Base.I# ds_Xlw) of w1_Xmo { GHC.Base.I# ww1_Xmz ->
+          T.$wfoo ww1_Xmz lvl_rmV
+          };
+        0 -> 0
+      }
+
+The recursive call has lvl_rmV as its argument, so we could create a specialised copy
+with that argument baked in; that is, not passed at all.  Now it can perhaps be inlined.
+
+When is this worth it?  Call the constant 'lvl'.
+- If 'lvl' has an unfolding that is a constructor, see if the corresponding
+  parameter is scrutinised anywhere in the body.
+
+- If 'lvl' has an unfolding that is an inlinable function, see if the corresponding
+  parameter is applied (...to enough arguments...?)
+
+  Also do this if the function has RULES?
+ +Also + +Note [Specialising for lambda parameters] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + foo :: Int -> (Int -> Int) -> Int + foo 0 f = 0 + foo m f = foo (f m) (\n -> n-m) + +This is subtly different from the previous one in that we get an +explicit lambda as the argument: + + T.$wfoo :: GHC.Prim.Int# -> (GHC.Base.Int -> GHC.Base.Int) -> + GHC.Prim.Int# + T.$wfoo = + \ (ww_sm8 :: GHC.Prim.Int#) (w_sma :: GHC.Base.Int -> GHC.Base.Int) -> + case ww_sm8 of ds_Xlr { + __DEFAULT -> + case w_sma (GHC.Base.I# ds_Xlr) of w1_Xmf { GHC.Base.I# ww1_Xmq -> + T.$wfoo + ww1_Xmq + (\ (n_ad3 :: GHC.Base.Int) -> + case n_ad3 of wild_alB { GHC.Base.I# x_alA -> + GHC.Base.I# (GHC.Prim.-# x_alA ds_Xlr) + }) + }; + 0 -> 0 + } + +I wonder if SpecConstr couldn't be extended to handle this? After all, +lambda is a sort of constructor for functions and perhaps it already +has most of the necessary machinery? + +Furthermore, there's an immediate win, because you don't need to allocate the lambda +at the call site; and if perchance it's called in the recursive call, then you +may avoid allocating it altogether. Just like for constructors. + +Looks cool, but probably rare...but it might be easy to implement. + + +Note [SpecConstr for casts] +~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Consider + data family T a :: * + data instance T Int = T Int + + foo n = ... + where + go (T 0) = 0 + go (T n) = go (T (n-1)) + +The recursive call ends up looking like + go (T (I# ...) `cast` g) +So we want to spot the constructor application inside the cast. +That's why we have the Cast case in argToPat + +Note [Local recursive groups] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +For a *local* recursive group, we can see all the calls to the +function, so we seed the specialisation loop from the calls in the +body, not from the calls in the RHS. 
Consider: + + bar m n = foo n (n,n) (n,n) (n,n) (n,n) + where + foo n p q r s + | n == 0 = m + | n > 3000 = case p of { (p1,p2) -> foo (n-1) (p2,p1) q r s } + | n > 2000 = case q of { (q1,q2) -> foo (n-1) p (q2,q1) r s } + | n > 1000 = case r of { (r1,r2) -> foo (n-1) p q (r2,r1) s } + | otherwise = case s of { (s1,s2) -> foo (n-1) p q r (s2,s1) } + +If we start with the RHSs of 'foo', we get lots and lots of specialisations, +most of which are not needed. But if we start with the (single) call +in the rhs of 'bar' we get exactly one fully-specialised copy, and all +the recursive calls go to this fully-specialised copy. Indeed, the original +function is later collected as dead code. This is very important in +specialising the loops arising from stream fusion, for example in NDP where +we were getting literally hundreds of (mostly unused) specialisations of +a local function. + +In a case like the above we end up never calling the original un-specialised +function. (Although we still leave its code around just in case.) + +However, if we find any boring calls in the body, including *unsaturated* +ones, such as + letrec foo x y = ....foo... + in map foo xs +then we will end up calling the un-specialised function, so then we *should* +use the calls in the un-specialised RHS as seeds. We call these +"boring call patterns", and callsToPats reports if it finds any of these. + +Note [Seeding top-level recursive groups] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +This seeding is done in the binding for seed_calls in specRec. + +1. If all the bindings in a top-level recursive group are local (not + exported), then all the calls are in the rest of the top-level + bindings. This means we can specialise with those call patterns + ONLY, and NOT with the RHSs of the recursive group (exactly like + Note [Local recursive groups]) + +2. 
But if any of the bindings are exported, the function may be called + with any old arguments, so (for lack of anything better) we specialise + based on + (a) the call patterns in the RHS + (b) the call patterns in the rest of the top-level bindings + NB: before Apr 15 we used (a) only, but Dimitrios had an example + where (b) was crucial, so I added that. + Adding (b) also improved nofib allocation results: + multiplier: 4% better + minimax: 2.8% better + +Actually in case (2), instead of using the calls from the RHS, it +would be better to specialise in the importing module. We'd need to +add an INLINABLE pragma to the function, and then it can be +specialised in the importing scope, just as is done for type classes +in GHC.Core.Op.Specialise.specImports. This remains to be done (#10346). + +Note [Top-level recursive groups] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +To get the call usage information from "the rest of the top level +bindings" (c.f. Note [Seeding top-level recursive groups]), we work +backwards through the top-level bindings so we see the usage before we +get to the binding of the function. Before we can collect the usage +though, we go through all the bindings and add them to the +environment. This is necessary because usage is only tracked for +functions in the environment. These two passes are called + 'go' and 'goEnv' +in specConstrProgram. (Looks a bit revolting to me.) + +Note [Do not specialise diverging functions] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Specialising a function that just diverges is a waste of code. +Furthermore, it broke GHC (simpl014) thus: + {-# STR Sb #-} + f = \x. case x of (a,b) -> f x +If we specialise f we get + f = \x. case x of (a,b) -> fspec a b +But fspec doesn't have decent strictness info. As it happened, +(f x) :: IO t, so the state hack applied and we eta expanded fspec, +and hence f. But now f's strictness is less than its arity, which +breaks an invariant. 
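+
+For a concrete (if hypothetical) picture of the waste, consider a
+function that scrutinises a constructor argument but never terminates:
+
+  spin :: (Int, Int) -> [Int]
+  spin (a, b) = spin (b, a)    -- diverges
+
+The explicit pair at the recursive call would invite a specialisation
+  spin_spec a b = spin_spec b a
+which can never return anything, so the specialised copy (and its RULE)
+is pure code-size cost.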
+ + +Note [Forcing specialisation] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +With stream fusion and in other similar cases, we want to fully +specialise some (but not necessarily all!) loops regardless of their +size and the number of specialisations. + +We allow a library to do this, in one of two ways (one which is +deprecated): + + 1) Add a parameter of type GHC.Types.SPEC (from ghc-prim) to the loop body. + + 2) (Deprecated) Annotate a type with ForceSpecConstr from GHC.Exts, + and then add *that* type as a parameter to the loop body + +The reason #2 is deprecated is because it requires GHCi, which isn't +available for things like a cross compiler using stage1. + +Here's a (simplified) example from the `vector` package. You may bring +the special 'force specialization' type into scope by saying: + + import GHC.Types (SPEC(..)) + +or by defining your own type (again, deprecated): + + data SPEC = SPEC | SPEC2 + {-# ANN type SPEC ForceSpecConstr #-} + +(Note this is the exact same definition of GHC.Types.SPEC, just +without the annotation.) + +After that, you say: + + foldl :: (a -> b -> a) -> a -> Stream b -> a + {-# INLINE foldl #-} + foldl f z (Stream step s _) = foldl_loop SPEC z s + where + foldl_loop !sPEC z s = case step s of + Yield x s' -> foldl_loop sPEC (f z x) s' + Skip -> foldl_loop sPEC z s' + Done -> z + +SpecConstr will spot the SPEC parameter and always fully specialise +foldl_loop. Note that + + * We have to prevent the SPEC argument from being removed by + w/w which is why (a) SPEC is a sum type, and (b) we have to seq on + the SPEC argument. + + * And lastly, the SPEC argument is ultimately eliminated by + SpecConstr itself so there is no runtime overhead. + +This is all quite ugly; we ought to come up with a better design. + +ForceSpecConstr arguments are spotted in scExpr' and scTopBinds which then set +sc_force to True when calling specLoop. 
This flag does four things: + + * Ignore specConstrThreshold, to specialise functions of arbitrary size + (see scTopBind) + * Ignore specConstrCount, to make arbitrary numbers of specialisations + (see specialise) + * Specialise even for arguments that are not scrutinised in the loop + (see argToPat; #4448) + * Only specialise on recursive types a finite number of times + (see is_too_recursive; #5550; Note [Limit recursive specialisation]) + +The flag holds only for specialising a single binding group, and NOT +for nested bindings. (So really it should be passed around explicitly +and not stored in ScEnv.) #14379 turned out to be caused by + f SPEC x = let g1 x = ... + in ... +We force-specialise f (because of the SPEC), but that generates a specialised +copy of g1 (as well as the original). Alas g1 has a nested binding g2; and +in each copy of g1 we get an unspecialised and specialised copy of g2; and so +on. Result, exponential. So the force-spec flag now only applies to one +level of bindings at a time. + +Mechanism for this one-level-only thing: + + - Switch it on at the call to specRec, in scExpr and scTopBinds + - Switch it off when doing the RHSs; + this can be done very conveniently in decreaseSpecCount + +What alternatives did I consider? + +* Annotating the loop itself doesn't work because (a) it is local and + (b) it will be w/w'ed and having w/w propagating annotations somehow + doesn't seem like a good idea. The types of the loop arguments + really seem to be the most persistent thing. + +* Annotating the types that make up the loop state doesn't work, + either, because (a) it would prevent us from using types like Either + or tuples here, (b) we don't want to restrict the set of types that + can be used in Stream states and (c) some types are fixed by the + user (e.g., the accumulator here) but we still want to specialise as + much as possible. 
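+
+A sketch (with invented names) of why the flag holds for one level only:
+in the #14379 shape
+
+  f SPEC x = let g1 y = let g2 z = ... z ...
+                        in ... g2 ...
+             in ... g1 ...
+
+force-specialising f makes a specialised copy of g1 alongside the
+original; if sc_force stayed on inside those copies, each copy of g1
+would likewise duplicate its nested g2, and so on, giving 2^n copies at
+nesting depth n.  Switching sc_force off in decreaseSpecCount confines
+the duplication to one level of bindings at a time.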
+ +Alternatives to ForceSpecConstr +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Instead of giving the loop an extra argument of type SPEC, we +also considered *wrapping* arguments in SPEC, thus + data SPEC a = SPEC a | SPEC2 + + loop = \arg -> case arg of + SPEC state -> + case state of (x,y) -> ... loop (SPEC (x',y')) ... + S2 -> error ... +The idea is that a SPEC argument says "specialise this argument +regardless of whether the function case-analyses it". But this +doesn't work well: + * SPEC must still be a sum type, else the strictness analyser + eliminates it + * But that means that 'loop' won't be strict in its real payload +This loss of strictness in turn screws up specialisation, because +we may end up with calls like + loop (SPEC (case z of (p,q) -> (q,p))) +Without the SPEC, if 'loop' were strict, the case would move out +and we'd see loop applied to a pair. But if 'loop' isn't strict +this doesn't look like a specialisable call. + +Note [Limit recursive specialisation] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +It is possible for ForceSpecConstr to cause an infinite loop of specialisation. +Because there is no limit on the number of specialisations, a recursive call with +a recursive constructor as an argument (for example, list cons) will generate +a specialisation for that constructor. If the resulting specialisation also +contains a recursive call with the constructor, this could proceed indefinitely. + +For example, if ForceSpecConstr is on: + loop :: [Int] -> [Int] -> [Int] + loop z [] = z + loop z (x:xs) = loop (x:z) xs +this example will create a specialisation for the pattern + loop (a:b) c = loop' a b c + + loop' a b [] = (a:b) + loop' a b (x:xs) = loop (x:(a:b)) xs +and a new pattern is found: + loop (a:(b:c)) d = loop'' a b c d +which can continue indefinitely. + +Roman's suggestion to fix this was to stop after a couple of times on recursive types, +but still specialising on non-recursive types as much as possible. 
+
+To implement this, we count the number of times we have gone round the
+"specialise recursively" loop ('go' in 'specRec').  Once we have gone round
+more than N times (controlled by -fspec-constr-recursive=N) we check
+
+  - If sc_force is off, and sc_count is (Just max) then we don't
+    need to do anything: trim_pats will limit the number of specs
+
+  - Otherwise check if any function has now got more than (sc_count env)
+    specialisations.  If sc_count is "no limit" then we arbitrarily
+    choose 10 as the limit (ugh).
+
+See #5550.  Also #13623, where this test had become over-aggressive,
+and we lost a wonderful specialisation that we really wanted!
+
+Note [NoSpecConstr]
+~~~~~~~~~~~~~~~~~~~
+The ignoreDataCon stuff allows you to say
+  {-# ANN type T NoSpecConstr #-}
+to mean "don't specialise on arguments of this type".  It was added
+before we had ForceSpecConstr.  Lacking ForceSpecConstr we specialised
+regardless of size; and then we needed a way to turn that *off*.  Now
+that we have ForceSpecConstr, this NoSpecConstr is probably redundant.
+(Used only for PArray, TODO: remove?)
+
+-----------------------------------------------------
+                Stuff not yet handled
+-----------------------------------------------------
+
+Here are notes arising from Roman's work that I don't want to lose.
+
+Example 1
+~~~~~~~~~
+  data T a = T !a
+
+  foo :: Int -> T Int -> Int
+  foo 0 t = 0
+  foo x t | even x    = case t of { T n -> foo (x-n) t }
+          | otherwise = foo (x-1) t
+
+SpecConstr does no specialisation, because the second recursive call
+looks like a boxed use of the argument.  A pity.
+ + $wfoo_sFw :: GHC.Prim.Int# -> T.T GHC.Base.Int -> GHC.Prim.Int# + $wfoo_sFw = + \ (ww_sFo [Just L] :: GHC.Prim.Int#) (w_sFq [Just L] :: T.T GHC.Base.Int) -> + case ww_sFo of ds_Xw6 [Just L] { + __DEFAULT -> + case GHC.Prim.remInt# ds_Xw6 2 of wild1_aEF [Dead Just A] { + __DEFAULT -> $wfoo_sFw (GHC.Prim.-# ds_Xw6 1) w_sFq; + 0 -> + case w_sFq of wild_Xy [Just L] { T.T n_ad5 [Just U(L)] -> + case n_ad5 of wild1_aET [Just A] { GHC.Base.I# y_aES [Just L] -> + $wfoo_sFw (GHC.Prim.-# ds_Xw6 y_aES) wild_Xy + } } }; + 0 -> 0 + +Example 2 +~~~~~~~~~ + data a :*: b = !a :*: !b + data T a = T !a + + foo :: (Int :*: T Int) -> Int + foo (0 :*: t) = 0 + foo (x :*: t) | even x = case t of { T n -> foo ((x-n) :*: t) } + | otherwise = foo ((x-1) :*: t) + +Very similar to the previous one, except that the parameters are now in +a strict tuple. Before SpecConstr, we have + + $wfoo_sG3 :: GHC.Prim.Int# -> T.T GHC.Base.Int -> GHC.Prim.Int# + $wfoo_sG3 = + \ (ww_sFU [Just L] :: GHC.Prim.Int#) (ww_sFW [Just L] :: T.T + GHC.Base.Int) -> + case ww_sFU of ds_Xws [Just L] { + __DEFAULT -> + case GHC.Prim.remInt# ds_Xws 2 of wild1_aEZ [Dead Just A] { + __DEFAULT -> + case ww_sFW of tpl_B2 [Just L] { T.T a_sFo [Just A] -> + $wfoo_sG3 (GHC.Prim.-# ds_Xws 1) tpl_B2 -- $wfoo1 + }; + 0 -> + case ww_sFW of wild_XB [Just A] { T.T n_ad7 [Just S(L)] -> + case n_ad7 of wild1_aFd [Just L] { GHC.Base.I# y_aFc [Just L] -> + $wfoo_sG3 (GHC.Prim.-# ds_Xws y_aFc) wild_XB -- $wfoo2 + } } }; + 0 -> 0 } + +We get two specialisations: +"SC:$wfoo1" [0] __forall {a_sFB :: GHC.Base.Int sc_sGC :: GHC.Prim.Int#} + Foo.$wfoo sc_sGC (Foo.T @ GHC.Base.Int a_sFB) + = Foo.$s$wfoo1 a_sFB sc_sGC ; +"SC:$wfoo2" [0] __forall {y_aFp :: GHC.Prim.Int# sc_sGC :: GHC.Prim.Int#} + Foo.$wfoo sc_sGC (Foo.T @ GHC.Base.Int (GHC.Base.I# y_aFp)) + = Foo.$s$wfoo y_aFp sc_sGC ; + +But perhaps the first one isn't good. After all, we know that tpl_B2 is +a T (I# x) really, because T is strict and Int has one constructor. 
(We can't +unbox the strict fields, because T is polymorphic!) + +************************************************************************ +* * +\subsection{Top level wrapper stuff} +* * +************************************************************************ +-} + +specConstrProgram :: ModGuts -> CoreM ModGuts +specConstrProgram guts + = do + dflags <- getDynFlags + us <- getUniqueSupplyM + (_, annos) <- getFirstAnnotations deserializeWithData guts + this_mod <- getModule + let binds' = reverse $ fst $ initUs us $ do + -- Note [Top-level recursive groups] + (env, binds) <- goEnv (initScEnv dflags this_mod annos) + (mg_binds guts) + -- binds is identical to (mg_binds guts), except that the + -- binders on the LHS have been replaced by extendBndr + -- (SPJ this seems like overkill; I don't think the binders + -- will change at all; and we don't substitute in the RHSs anyway!!) + go env nullUsage (reverse binds) + + return (guts { mg_binds = binds' }) + where + -- See Note [Top-level recursive groups] + goEnv env [] = return (env, []) + goEnv env (bind:binds) = do (env', bind') <- scTopBindEnv env bind + (env'', binds') <- goEnv env' binds + return (env'', bind' : binds') + + -- Arg list of bindings is in reverse order + go _ _ [] = return [] + go env usg (bind:binds) = do (usg', bind') <- scTopBind env usg bind + binds' <- go env usg' binds + return (bind' : binds') + +{- +************************************************************************ +* * +\subsection{Environment: goes downwards} +* * +************************************************************************ + +Note [Work-free values only in environment] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +The sc_vals field keeps track of in-scope value bindings, so +that if we come across (case x of Just y ->...) we can reduce the +case from knowing that x is bound to a pair. + +But only *work-free* values are ok here. 
For example if the envt had + x -> Just (expensive v) +then we do NOT want to expand to + let y = expensive v in ... +because the x-binding still exists and we've now duplicated (expensive v). + +This seldom happens because let-bound constructor applications are +ANF-ised, but it can happen as a result of on-the-fly transformations in +SpecConstr itself. Here is #7865: + + let { + a'_shr = + case xs_af8 of _ { + [] -> acc_af6; + : ds_dgt [Dmd=<L,A>] ds_dgu [Dmd=<L,A>] -> + (expensive x_af7, x_af7 + } } in + let { + ds_sht = + case a'_shr of _ { (p'_afd, q'_afe) -> + TSpecConstr_DoubleInline.recursive + (GHC.Types.: @ GHC.Types.Int x_af7 wild_X6) (q'_afe, p'_afd) + } } in + +When processed knowing that xs_af8 was bound to a cons, we simplify to + a'_shr = (expensive x_af7, x_af7) +and we do NOT want to inline that at the occurrence of a'_shr in ds_sht. +(There are other occurrences of a'_shr.) No no no. + +It would be possible to do some on-the-fly ANF-ising, so that a'_shr turned +into a work-free value again, thus + a1 = expensive x_af7 + a'_shr = (a1, x_af7) +but that's more work, so until its shown to be important I'm going to +leave it for now. + +Note [Making SpecConstr keener] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Consider this, in (perf/should_run/T9339) + last (filter odd [1..1000]) + +After optimisation, including SpecConstr, we get: + f :: Int# -> Int -> Int + f x y = case case remInt# x 2# of + __DEFAULT -> case x of + __DEFAULT -> f (+# wild_Xp 1#) (I# x) + 1000000# -> ... + 0# -> case x of + __DEFAULT -> f (+# wild_Xp 1#) y + 1000000# -> y + +Not good! We build an (I# x) box every time around the loop. +SpecConstr (as described in the paper) does not specialise f, despite +the call (f ... (I# x)) because 'y' is not scrutinised in the body. +But it is much better to specialise f for the case where the argument +is of form (I# x); then we build the box only when returning y, which +is on the cold path. + +Another example: + + f x = ...(g x).... 
+ +Here 'x' is not scrutinised in f's body; but if we did specialise 'f' +then the call (g x) might allow 'g' to be specialised in turn. + +So sc_keen controls whether or not we take account of whether argument is +scrutinised in the body. True <=> ignore that, and specialise whenever +the function is applied to a data constructor. +-} + +data ScEnv = SCE { sc_dflags :: DynFlags, + sc_module :: !Module, + sc_size :: Maybe Int, -- Size threshold + -- Nothing => no limit + + sc_count :: Maybe Int, -- Max # of specialisations for any one fn + -- Nothing => no limit + -- See Note [Avoiding exponential blowup] + + sc_recursive :: Int, -- Max # of specialisations over recursive type. + -- Stops ForceSpecConstr from diverging. + + sc_keen :: Bool, -- Specialise on arguments that are known + -- constructors, even if they are not + -- scrutinised in the body. See + -- Note [Making SpecConstr keener] + + sc_force :: Bool, -- Force specialisation? + -- See Note [Forcing specialisation] + + sc_subst :: Subst, -- Current substitution + -- Maps InIds to OutExprs + + sc_how_bound :: HowBoundEnv, + -- Binds interesting non-top-level variables + -- Domain is OutVars (*after* applying the substitution) + + sc_vals :: ValueEnv, + -- Domain is OutIds (*after* applying the substitution) + -- Used even for top-level bindings (but not imported ones) + -- The range of the ValueEnv is *work-free* values + -- such as (\x. 
blah), or (Just v) + -- but NOT (Just (expensive v)) + -- See Note [Work-free values only in environment] + + sc_annotations :: UniqFM SpecConstrAnnotation + } + +--------------------- +type HowBoundEnv = VarEnv HowBound -- Domain is OutVars + +--------------------- +type ValueEnv = IdEnv Value -- Domain is OutIds +data Value = ConVal AltCon [CoreArg] -- _Saturated_ constructors + -- The AltCon is never DEFAULT + | LambdaVal -- Inlinable lambdas or PAPs + +instance Outputable Value where + ppr (ConVal con args) = ppr con <+> interpp'SP args + ppr LambdaVal = text "<Lambda>" + +--------------------- +initScEnv :: DynFlags -> Module -> UniqFM SpecConstrAnnotation -> ScEnv +initScEnv dflags this_mod anns + = SCE { sc_dflags = dflags, + sc_module = this_mod, + sc_size = specConstrThreshold dflags, + sc_count = specConstrCount dflags, + sc_recursive = specConstrRecursive dflags, + sc_keen = gopt Opt_SpecConstrKeen dflags, + sc_force = False, + sc_subst = emptySubst, + sc_how_bound = emptyVarEnv, + sc_vals = emptyVarEnv, + sc_annotations = anns } + +data HowBound = RecFun -- These are the recursive functions for which + -- we seek interesting call patterns + + | RecArg -- These are those functions' arguments, or their sub-components; + -- we gather occurrence information for these + +instance Outputable HowBound where + ppr RecFun = text "RecFun" + ppr RecArg = text "RecArg" + +scForce :: ScEnv -> Bool -> ScEnv +scForce env b = env { sc_force = b } + +lookupHowBound :: ScEnv -> Id -> Maybe HowBound +lookupHowBound env id = lookupVarEnv (sc_how_bound env) id + +scSubstId :: ScEnv -> Id -> CoreExpr +scSubstId env v = lookupIdSubst (text "scSubstId") (sc_subst env) v + +scSubstTy :: ScEnv -> Type -> Type +scSubstTy env ty = substTy (sc_subst env) ty + +scSubstCo :: ScEnv -> Coercion -> Coercion +scSubstCo env co = substCo (sc_subst env) co + +zapScSubst :: ScEnv -> ScEnv +zapScSubst env = env { sc_subst = zapSubstEnv (sc_subst env) } + +extendScInScope :: ScEnv -> [Var] -> 
ScEnv + -- Bring the quantified variables into scope +extendScInScope env qvars = env { sc_subst = extendInScopeList (sc_subst env) qvars } + + -- Extend the substitution +extendScSubst :: ScEnv -> Var -> OutExpr -> ScEnv +extendScSubst env var expr = env { sc_subst = extendSubst (sc_subst env) var expr } + +extendScSubstList :: ScEnv -> [(Var,OutExpr)] -> ScEnv +extendScSubstList env prs = env { sc_subst = extendSubstList (sc_subst env) prs } + +extendHowBound :: ScEnv -> [Var] -> HowBound -> ScEnv +extendHowBound env bndrs how_bound + = env { sc_how_bound = extendVarEnvList (sc_how_bound env) + [(bndr,how_bound) | bndr <- bndrs] } + +extendBndrsWith :: HowBound -> ScEnv -> [Var] -> (ScEnv, [Var]) +extendBndrsWith how_bound env bndrs + = (env { sc_subst = subst', sc_how_bound = hb_env' }, bndrs') + where + (subst', bndrs') = substBndrs (sc_subst env) bndrs + hb_env' = sc_how_bound env `extendVarEnvList` + [(bndr,how_bound) | bndr <- bndrs'] + +extendBndrWith :: HowBound -> ScEnv -> Var -> (ScEnv, Var) +extendBndrWith how_bound env bndr + = (env { sc_subst = subst', sc_how_bound = hb_env' }, bndr') + where + (subst', bndr') = substBndr (sc_subst env) bndr + hb_env' = extendVarEnv (sc_how_bound env) bndr' how_bound + +extendRecBndrs :: ScEnv -> [Var] -> (ScEnv, [Var]) +extendRecBndrs env bndrs = (env { sc_subst = subst' }, bndrs') + where + (subst', bndrs') = substRecBndrs (sc_subst env) bndrs + +extendBndr :: ScEnv -> Var -> (ScEnv, Var) +extendBndr env bndr = (env { sc_subst = subst' }, bndr') + where + (subst', bndr') = substBndr (sc_subst env) bndr + +extendValEnv :: ScEnv -> Id -> Maybe Value -> ScEnv +extendValEnv env _ Nothing = env +extendValEnv env id (Just cv) + | valueIsWorkFree cv -- Don't duplicate work!! #7865 + = env { sc_vals = extendVarEnv (sc_vals env) id cv } +extendValEnv env _ _ = env + +extendCaseBndrs :: ScEnv -> OutExpr -> OutId -> AltCon -> [Var] -> (ScEnv, [Var]) +-- When we encounter +-- case scrut of b +-- C x y -> ... 
+-- we want to bind b, to (C x y) +-- NB1: Extends only the sc_vals part of the envt +-- NB2: Kill the dead-ness info on the pattern binders x,y, since +-- they are potentially made alive by the [b -> C x y] binding +extendCaseBndrs env scrut case_bndr con alt_bndrs + = (env2, alt_bndrs') + where + live_case_bndr = not (isDeadBinder case_bndr) + env1 | Var v <- stripTicksTopE (const True) scrut + = extendValEnv env v cval + | otherwise = env -- See Note [Add scrutinee to ValueEnv too] + env2 | live_case_bndr = extendValEnv env1 case_bndr cval + | otherwise = env1 + + alt_bndrs' | case scrut of { Var {} -> True; _ -> live_case_bndr } + = map zap alt_bndrs + | otherwise + = alt_bndrs + + cval = case con of + DEFAULT -> Nothing + LitAlt {} -> Just (ConVal con []) + DataAlt {} -> Just (ConVal con vanilla_args) + where + vanilla_args = map Type (tyConAppArgs (idType case_bndr)) ++ + varsToCoreExprs alt_bndrs + + zap v | isTyVar v = v -- See NB2 above + | otherwise = zapIdOccInfo v + + +decreaseSpecCount :: ScEnv -> Int -> ScEnv +-- See Note [Avoiding exponential blowup] +decreaseSpecCount env n_specs + = env { sc_force = False -- See Note [Forcing specialisation] + , sc_count = case sc_count env of + Nothing -> Nothing + Just n -> Just (n `div` (n_specs + 1)) } + -- The "+1" takes account of the original function; + -- See Note [Avoiding exponential blowup] + +--------------------------------------------------- +-- See Note [Forcing specialisation] +ignoreType :: ScEnv -> Type -> Bool +ignoreDataCon :: ScEnv -> DataCon -> Bool +forceSpecBndr :: ScEnv -> Var -> Bool + +ignoreDataCon env dc = ignoreTyCon env (dataConTyCon dc) + +ignoreType env ty + = case tyConAppTyCon_maybe ty of + Just tycon -> ignoreTyCon env tycon + _ -> False + +ignoreTyCon :: ScEnv -> TyCon -> Bool +ignoreTyCon env tycon + = lookupUFM (sc_annotations env) tycon == Just NoSpecConstr + +forceSpecBndr env var = forceSpecFunTy env . snd . splitForAllTys . 
varType $ var + +forceSpecFunTy :: ScEnv -> Type -> Bool +forceSpecFunTy env = any (forceSpecArgTy env) . fst . splitFunTys + +forceSpecArgTy :: ScEnv -> Type -> Bool +forceSpecArgTy env ty + | Just ty' <- coreView ty = forceSpecArgTy env ty' + +forceSpecArgTy env ty + | Just (tycon, tys) <- splitTyConApp_maybe ty + , tycon /= funTyCon + = tyConName tycon == specTyConName + || lookupUFM (sc_annotations env) tycon == Just ForceSpecConstr + || any (forceSpecArgTy env) tys + +forceSpecArgTy _ _ = False + +{- +Note [Add scrutinee to ValueEnv too] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Consider this: + case x of y + (a,b) -> case b of c + I# v -> ...(f y)... +By the time we get to the call (f y), the ValueEnv +will have a binding for y, and for c + y -> (a,b) + c -> I# v +BUT that's not enough! Looking at the call (f y) we +see that y is pair (a,b), but we also need to know what 'b' is. +So in extendCaseBndrs we must *also* add the binding + b -> I# v +else we lose a useful specialisation for f. This is necessary even +though the simplifier has systematically replaced uses of 'x' with 'y' +and 'b' with 'c' in the code. The use of 'b' in the ValueEnv came +from outside the case. See #4908 for the live example. + +Note [Avoiding exponential blowup] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +The sc_count field of the ScEnv says how many times we are prepared to +duplicate a single function. But we must take care with recursive +specialisations. Consider + + let $j1 = let $j2 = let $j3 = ... + in + ...$j3... + in + ...$j2... + in + ...$j1... + +If we specialise $j1 then in each specialisation (as well as the original) +we can specialise $j2, and similarly $j3. Even if we make just *one* +specialisation of each, because we also have the original we'll get 2^n +copies of $j3, which is not good. + +So when recursively specialising we divide the sc_count by the number of +copies we are making at this level, including the original. 
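The division step in decreaseSpecCount can be illustrated with a tiny standalone sketch (decreaseBudget is a name invented here for illustration only; GHC's real decreaseSpecCount also resets sc_force):

```haskell
-- Standalone sketch of dividing the specialisation budget among the
-- n_specs copies plus the original (hypothetical code, not GHC's).
decreaseBudget :: Maybe Int -> Int -> Maybe Int
decreaseBudget mb_count n_specs = fmap (`div` (n_specs + 1)) mb_count

main :: IO ()
main = do
  print (decreaseBudget (Just 8) 3)  -- a budget of 8, 3 specs plus the original
  print (decreaseBudget Nothing 3)   -- "no limit" stays "no limit"
```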
+ + +************************************************************************ +* * +\subsection{Usage information: flows upwards} +* * +************************************************************************ +-} + +data ScUsage + = SCU { + scu_calls :: CallEnv, -- Calls + -- The functions are a subset of the + -- RecFuns in the ScEnv + + scu_occs :: !(IdEnv ArgOcc) -- Information on argument occurrences + } -- The domain is OutIds + +type CallEnv = IdEnv [Call] +data Call = Call Id [CoreArg] ValueEnv + -- The arguments of the call, together with the + -- env giving the constructor bindings at the call site + -- We keep the function mainly for debug output + +instance Outputable ScUsage where + ppr (SCU { scu_calls = calls, scu_occs = occs }) + = text "SCU" <+> braces (sep [ ptext (sLit "calls =") <+> ppr calls + , text "occs =" <+> ppr occs ]) + +instance Outputable Call where + ppr (Call fn args _) = ppr fn <+> fsep (map pprParendExpr args) + +nullUsage :: ScUsage +nullUsage = SCU { scu_calls = emptyVarEnv, scu_occs = emptyVarEnv } + +combineCalls :: CallEnv -> CallEnv -> CallEnv +combineCalls = plusVarEnv_C (++) + where +-- plus cs ds | length res > 1 +-- = pprTrace "combineCalls" (vcat [ text "cs:" <+> ppr cs +-- , text "ds:" <+> ppr ds]) +-- res +-- | otherwise = res +-- where +-- res = cs ++ ds + +combineUsage :: ScUsage -> ScUsage -> ScUsage +combineUsage u1 u2 = SCU { scu_calls = combineCalls (scu_calls u1) (scu_calls u2), + scu_occs = plusVarEnv_C combineOcc (scu_occs u1) (scu_occs u2) } + +combineUsages :: [ScUsage] -> ScUsage +combineUsages [] = nullUsage +combineUsages us = foldr1 combineUsage us + +lookupOccs :: ScUsage -> [OutVar] -> (ScUsage, [ArgOcc]) +lookupOccs (SCU { scu_calls = sc_calls, scu_occs = sc_occs }) bndrs + = (SCU {scu_calls = sc_calls, scu_occs = delVarEnvList sc_occs bndrs}, + [lookupVarEnv sc_occs b `orElse` NoOcc | b <- bndrs]) + +data ArgOcc = NoOcc -- Doesn't occur at all; or a type argument + | UnkOcc -- Used in some unknown way 
+ + + | ScrutOcc -- See Note [ScrutOcc] + (DataConEnv [ArgOcc]) -- How the sub-components are used + +type DataConEnv a = UniqFM a -- Keyed by DataCon + +{- Note [ScrutOcc] +~~~~~~~~~~~~~~~~~~~ +An occurrence of ScrutOcc indicates that the thing, or a `cast` version of the thing, +is *only* taken apart or applied. + + Functions, literals: ScrutOcc emptyUFM + Data constructors: ScrutOcc subs, + +where (subs :: UniqFM [ArgOcc]) gives usage of the *pattern-bound* components. +The domain of the UniqFM is the Unique of the data constructor. + +The [ArgOcc] gives the occurrences of the *pattern-bound* components +of the data structure. E.g. + data T a = forall b. MkT a b (b->a) +A pattern binds b, x::a, y::b, z::b->a, but not 'a'! + +-} + +instance Outputable ArgOcc where + ppr (ScrutOcc xs) = text "scrut-occ" <> ppr xs + ppr UnkOcc = text "unk-occ" + ppr NoOcc = text "no-occ" + +evalScrutOcc :: ArgOcc +evalScrutOcc = ScrutOcc emptyUFM + +-- Experimentally, this version of combineOcc makes ScrutOcc "win", so +-- that if the thing is scrutinised anywhere then we get to see that +-- in the overall result, even if it's also used in a boxed way. +-- This might be too aggressive; see Note [Reboxing] Alternative 3 +combineOcc :: ArgOcc -> ArgOcc -> ArgOcc +combineOcc NoOcc occ = occ +combineOcc occ NoOcc = occ +combineOcc (ScrutOcc xs) (ScrutOcc ys) = ScrutOcc (plusUFM_C combineOccs xs ys) +combineOcc UnkOcc (ScrutOcc ys) = ScrutOcc ys +combineOcc (ScrutOcc xs) UnkOcc = ScrutOcc xs +combineOcc UnkOcc UnkOcc = UnkOcc + +combineOccs :: [ArgOcc] -> [ArgOcc] -> [ArgOcc] +combineOccs xs ys = zipWithEqual "combineOccs" combineOcc xs ys + +setScrutOcc :: ScEnv -> ScUsage -> OutExpr -> ArgOcc -> ScUsage +-- _Overwrite_ the occurrence info for the scrutinee, if the scrutinee +-- is a variable, and an interesting variable +setScrutOcc env usg (Cast e _) occ = setScrutOcc env usg e occ +setScrutOcc env usg (Tick _ e) occ = setScrutOcc env usg e occ +setScrutOcc env usg (Var v) occ + | Just
RecArg <- lookupHowBound env v = usg { scu_occs = extendVarEnv (scu_occs usg) v occ } + | otherwise = usg +setScrutOcc _env usg _other _occ -- Catch-all + = usg + +{- +************************************************************************ +* * +\subsection{The main recursive function} +* * +************************************************************************ + +The main recursive function gathers up usage information, and +creates specialised versions of functions. +-} + +scExpr, scExpr' :: ScEnv -> CoreExpr -> UniqSM (ScUsage, CoreExpr) + -- The unique supply is needed when we invent + -- a new name for the specialised function and its args + +scExpr env e = scExpr' env e + +scExpr' env (Var v) = case scSubstId env v of + Var v' -> return (mkVarUsage env v' [], Var v') + e' -> scExpr (zapScSubst env) e' + +scExpr' env (Type t) = return (nullUsage, Type (scSubstTy env t)) +scExpr' env (Coercion c) = return (nullUsage, Coercion (scSubstCo env c)) +scExpr' _ e@(Lit {}) = return (nullUsage, e) +scExpr' env (Tick t e) = do (usg, e') <- scExpr env e + return (usg, Tick t e') +scExpr' env (Cast e co) = do (usg, e') <- scExpr env e + return (usg, mkCast e' (scSubstCo env co)) + -- Important to use mkCast here + -- See Note [SpecConstr call patterns] +scExpr' env e@(App _ _) = scApp env (collectArgs e) +scExpr' env (Lam b e) = do let (env', b') = extendBndr env b + (usg, e') <- scExpr env' e + return (usg, Lam b' e') + +scExpr' env (Case scrut b ty alts) + = do { (scrut_usg, scrut') <- scExpr env scrut + ; case isValue (sc_vals env) scrut' of + Just (ConVal con args) -> sc_con_app con args scrut' + _other -> sc_vanilla scrut_usg scrut' + } + where + sc_con_app con args scrut' -- Known constructor; simplify + = do { let (_, bs, rhs) = findAlt con alts + `orElse` (DEFAULT, [], mkImpossibleExpr ty) + alt_env' = extendScSubstList env ((b,scrut') : bs `zip` trimConArgs con args) + ; scExpr alt_env' rhs } + + sc_vanilla scrut_usg scrut' -- Normal case + = do { let 
(alt_env,b') = extendBndrWith RecArg env b + -- Record RecArg for the components + + ; (alt_usgs, alt_occs, alts') + <- mapAndUnzip3M (sc_alt alt_env scrut' b') alts + + ; let scrut_occ = foldr combineOcc NoOcc alt_occs + scrut_usg' = setScrutOcc env scrut_usg scrut' scrut_occ + -- The combined usage of the scrutinee is given + -- by scrut_occ, which is passed to scScrut, which + -- in turn treats a bare-variable scrutinee specially + + ; return (foldr combineUsage scrut_usg' alt_usgs, + Case scrut' b' (scSubstTy env ty) alts') } + + sc_alt env scrut' b' (con,bs,rhs) + = do { let (env1, bs1) = extendBndrsWith RecArg env bs + (env2, bs2) = extendCaseBndrs env1 scrut' b' con bs1 + ; (usg, rhs') <- scExpr env2 rhs + ; let (usg', b_occ:arg_occs) = lookupOccs usg (b':bs2) + scrut_occ = case con of + DataAlt dc -> ScrutOcc (unitUFM dc arg_occs) + _ -> ScrutOcc emptyUFM + ; return (usg', b_occ `combineOcc` scrut_occ, (con, bs2, rhs')) } + +scExpr' env (Let (NonRec bndr rhs) body) + | isTyVar bndr -- Type-lets may be created by doBeta + = scExpr' (extendScSubst env bndr rhs) body + + | otherwise + = do { let (body_env, bndr') = extendBndr env bndr + ; rhs_info <- scRecRhs env (bndr',rhs) + + ; let body_env2 = extendHowBound body_env [bndr'] RecFun + -- Note [Local let bindings] + rhs' = ri_new_rhs rhs_info + body_env3 = extendValEnv body_env2 bndr' (isValue (sc_vals env) rhs') + + ; (body_usg, body') <- scExpr body_env3 body + + -- NB: For non-recursive bindings we inherit sc_force flag from + -- the parent function (see Note [Forcing specialisation]) + ; (spec_usg, specs) <- specNonRec env body_usg rhs_info + + ; return (body_usg { scu_calls = scu_calls body_usg `delVarEnv` bndr' } + `combineUsage` spec_usg, -- Note [spec_usg includes rhs_usg] + mkLets [NonRec b r | (b,r) <- ruleInfoBinds rhs_info specs] body') + } + + +-- A *local* recursive group: see Note [Local recursive groups] +scExpr' env (Let (Rec prs) body) + = do { let (bndrs,rhss) = unzip prs + 
(rhs_env1,bndrs') = extendRecBndrs env bndrs + rhs_env2 = extendHowBound rhs_env1 bndrs' RecFun + force_spec = any (forceSpecBndr env) bndrs' + -- Note [Forcing specialisation] + + ; rhs_infos <- mapM (scRecRhs rhs_env2) (bndrs' `zip` rhss) + ; (body_usg, body') <- scExpr rhs_env2 body + + -- NB: start specLoop from body_usg + ; (spec_usg, specs) <- specRec NotTopLevel (scForce rhs_env2 force_spec) + body_usg rhs_infos + -- Do not unconditionally generate specialisations from rhs_usgs + -- Instead use them only if we find an unspecialised call + -- See Note [Local recursive groups] + + ; let all_usg = spec_usg `combineUsage` body_usg -- Note [spec_usg includes rhs_usg] + bind' = Rec (concat (zipWith ruleInfoBinds rhs_infos specs)) + + ; return (all_usg { scu_calls = scu_calls all_usg `delVarEnvList` bndrs' }, + Let bind' body') } + +{- +Note [Local let bindings] +~~~~~~~~~~~~~~~~~~~~~~~~~ +It is not uncommon to find this + + let $j = \x. <blah> in ...$j True...$j True... + +Here $j is an arbitrary let-bound function, but it often comes up for +join points. We might like to specialise $j for its call patterns. +Notice the difference from a letrec, where we look for call patterns +in the *RHS* of the function. Here we look for call patterns in the +*body* of the let. + +At one point I predicated this on the RHS mentioning the outer +recursive function, but that's not essential and might even be +harmful. I'm not sure. 
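As a hand-written before/after sketch of what this note is after (f, j and the specialised names are invented for illustration; this is the effect we hope the generated RULE achieves, not actual compiler output):

```haskell
-- Before: a local let-bound function always called with a Just pattern.
f :: Int -> Int
f v = let j (Just x) = x * 2
          j Nothing  = 0
      in j (Just v) + j (Just (v + 1))

-- After: a SpecConstr-style specialisation for the (Just x) call
-- pattern; the case analysis on the constructor disappears.
f_spec :: Int -> Int
f_spec v = let j_spec x = x * 2
           in j_spec v + j_spec (v + 1)

main :: IO ()
main = print (f 3 == f_spec 3)
```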
+-} + +scApp :: ScEnv -> (InExpr, [InExpr]) -> UniqSM (ScUsage, CoreExpr) + +scApp env (Var fn, args) -- Function is a variable + = ASSERT( not (null args) ) + do { args_w_usgs <- mapM (scExpr env) args + ; let (arg_usgs, args') = unzip args_w_usgs + arg_usg = combineUsages arg_usgs + ; case scSubstId env fn of + fn'@(Lam {}) -> scExpr (zapScSubst env) (doBeta fn' args') + -- Do beta-reduction and try again + + Var fn' -> return (arg_usg `combineUsage` mkVarUsage env fn' args', + mkApps (Var fn') args') + + other_fn' -> return (arg_usg, mkApps other_fn' args') } + -- NB: doing this ignores any usage info from the substituted + -- function, but I don't think that matters. If it does + -- we can fix it. + where + doBeta :: OutExpr -> [OutExpr] -> OutExpr + -- ToDo: adjust for System IF + doBeta (Lam bndr body) (arg : args) = Let (NonRec bndr arg) (doBeta body args) + doBeta fn args = mkApps fn args + +-- The function is almost always a variable, but not always. +-- In particular, if this pass follows float-in, +-- which it may, we can get +-- (let f = ...f... 
in f) arg1 arg2 +scApp env (other_fn, args) + = do { (fn_usg, fn') <- scExpr env other_fn + ; (arg_usgs, args') <- mapAndUnzipM (scExpr env) args + ; return (combineUsages arg_usgs `combineUsage` fn_usg, mkApps fn' args') } + +---------------------- +mkVarUsage :: ScEnv -> Id -> [CoreExpr] -> ScUsage +mkVarUsage env fn args + = case lookupHowBound env fn of + Just RecFun -> SCU { scu_calls = unitVarEnv fn [Call fn args (sc_vals env)] + , scu_occs = emptyVarEnv } + Just RecArg -> SCU { scu_calls = emptyVarEnv + , scu_occs = unitVarEnv fn arg_occ } + Nothing -> nullUsage + where + -- I rather think we could use UnkOcc all the time + arg_occ | null args = UnkOcc + | otherwise = evalScrutOcc + +---------------------- +scTopBindEnv :: ScEnv -> CoreBind -> UniqSM (ScEnv, CoreBind) +scTopBindEnv env (Rec prs) + = do { let (rhs_env1,bndrs') = extendRecBndrs env bndrs + rhs_env2 = extendHowBound rhs_env1 bndrs RecFun + + prs' = zip bndrs' rhss + ; return (rhs_env2, Rec prs') } + where + (bndrs,rhss) = unzip prs + +scTopBindEnv env (NonRec bndr rhs) + = do { let (env1, bndr') = extendBndr env bndr + env2 = extendValEnv env1 bndr' (isValue (sc_vals env) rhs) + ; return (env2, NonRec bndr' rhs) } + +---------------------- +scTopBind :: ScEnv -> ScUsage -> CoreBind -> UniqSM (ScUsage, CoreBind) + +{- +scTopBind _ usage _ + | pprTrace "scTopBind_usage" (ppr (scu_calls usage)) False + = error "false" +-} + +scTopBind env body_usage (Rec prs) + | Just threshold <- sc_size env + , not force_spec + , not (all (couldBeSmallEnoughToInline (sc_dflags env) threshold) rhss) + -- No specialisation + = -- pprTrace "scTopBind: nospec" (ppr bndrs) $ + do { (rhs_usgs, rhss') <- mapAndUnzipM (scExpr env) rhss + ; return (body_usage `combineUsage` combineUsages rhs_usgs, Rec (bndrs `zip` rhss')) } + + | otherwise -- Do specialisation + = do { rhs_infos <- mapM (scRecRhs env) prs + + ; (spec_usage, specs) <- specRec TopLevel (scForce env force_spec) + body_usage rhs_infos + + ; return 
(body_usage `combineUsage` spec_usage, + Rec (concat (zipWith ruleInfoBinds rhs_infos specs))) } + where + (bndrs,rhss) = unzip prs + force_spec = any (forceSpecBndr env) bndrs + -- Note [Forcing specialisation] + +scTopBind env usage (NonRec bndr rhs) -- Oddly, we don't seem to specialise top-level non-rec functions + = do { (rhs_usg', rhs') <- scExpr env rhs + ; return (usage `combineUsage` rhs_usg', NonRec bndr rhs') } + +---------------------- +scRecRhs :: ScEnv -> (OutId, InExpr) -> UniqSM RhsInfo +scRecRhs env (bndr,rhs) + = do { let (arg_bndrs,body) = collectBinders rhs + (body_env, arg_bndrs') = extendBndrsWith RecArg env arg_bndrs + ; (body_usg, body') <- scExpr body_env body + ; let (rhs_usg, arg_occs) = lookupOccs body_usg arg_bndrs' + ; return (RI { ri_rhs_usg = rhs_usg + , ri_fn = bndr, ri_new_rhs = mkLams arg_bndrs' body' + , ri_lam_bndrs = arg_bndrs, ri_lam_body = body + , ri_arg_occs = arg_occs }) } + -- The arg_occs says how the visible, + -- lambda-bound binders of the RHS are used + -- (including the TyVar binders) + -- Two pats are the same if they match both ways + +---------------------- +ruleInfoBinds :: RhsInfo -> SpecInfo -> [(Id,CoreExpr)] +ruleInfoBinds (RI { ri_fn = fn, ri_new_rhs = new_rhs }) + (SI { si_specs = specs }) + = [(id,rhs) | OS { os_id = id, os_rhs = rhs } <- specs] ++ + -- First the specialised bindings + + [(fn `addIdSpecialisations` rules, new_rhs)] + -- And now the original binding + where + rules = [r | OS { os_rule = r } <- specs] + +{- +************************************************************************ +* * + The specialiser itself +* * +************************************************************************ +-} + +data RhsInfo + = RI { ri_fn :: OutId -- The binder + , ri_new_rhs :: OutExpr -- The specialised RHS (in current envt) + , ri_rhs_usg :: ScUsage -- Usage info from specialising RHS + + , ri_lam_bndrs :: [InVar] -- The *original* RHS (\xs.body) + , ri_lam_body :: InExpr -- Note [Specialise original 
body] + , ri_arg_occs :: [ArgOcc] -- Info on how the xs occur in body + } + +data SpecInfo -- Info about specialisations for a particular Id + = SI { si_specs :: [OneSpec] -- The specialisations we have generated + + , si_n_specs :: Int -- Length of si_specs; used for numbering them + + , si_mb_unspec :: Maybe ScUsage -- Just cs => we have not yet used calls in the + } -- from calls in the *original* RHS as + -- seeds for new specialisations; + -- if you decide to do so, here is the + -- RHS usage (which has not yet been + -- unleashed) + -- Nothing => we have + -- See Note [Local recursive groups] + -- See Note [spec_usg includes rhs_usg] + + -- One specialisation: Rule plus definition +data OneSpec = + OS { os_pat :: CallPat -- Call pattern that generated this specialisation + , os_rule :: CoreRule -- Rule connecting original id with the specialisation + , os_id :: OutId -- Spec id + , os_rhs :: OutExpr } -- Spec rhs + +noSpecInfo :: SpecInfo +noSpecInfo = SI { si_specs = [], si_n_specs = 0, si_mb_unspec = Nothing } + +---------------------- +specNonRec :: ScEnv + -> ScUsage -- Body usage + -> RhsInfo -- Structure info usage info for un-specialised RHS + -> UniqSM (ScUsage, SpecInfo) -- Usage from RHSs (specialised and not) + -- plus details of specialisations + +specNonRec env body_usg rhs_info + = specialise env (scu_calls body_usg) rhs_info + (noSpecInfo { si_mb_unspec = Just (ri_rhs_usg rhs_info) }) + +---------------------- +specRec :: TopLevelFlag -> ScEnv + -> ScUsage -- Body usage + -> [RhsInfo] -- Structure info and usage info for un-specialised RHSs + -> UniqSM (ScUsage, [SpecInfo]) -- Usage from all RHSs (specialised and not) + -- plus details of specialisations + +specRec top_lvl env body_usg rhs_infos + = go 1 seed_calls nullUsage init_spec_infos + where + (seed_calls, init_spec_infos) -- Note [Seeding top-level recursive groups] + | isTopLevel top_lvl + , any (isExportedId . 
body] + , ri_arg_occs :: [ArgOcc] -- Info on how the xs occur in body + } + +data SpecInfo -- Info about specialisations for a particular Id + = SI { si_specs :: [OneSpec] -- The specialisations we have generated + + , si_n_specs :: Int -- Length of si_specs; used for numbering them + + , si_mb_unspec :: Maybe ScUsage -- Just cs => we have not yet used the calls + } -- from the *original* RHS as -- seeds for new specialisations; -- if you decide to do so, here is the -- RHS usage (which has not yet been -- unleashed) -- Nothing => we have already used them -- See Note [Local recursive groups] -- See Note [spec_usg includes rhs_usg] + + -- One specialisation: Rule plus definition +data OneSpec = + OS { os_pat :: CallPat -- Call pattern that generated this specialisation + , os_rule :: CoreRule -- Rule connecting original id with the specialisation + , os_id :: OutId -- Spec id + , os_rhs :: OutExpr } -- Spec rhs + +noSpecInfo :: SpecInfo +noSpecInfo = SI { si_specs = [], si_n_specs = 0, si_mb_unspec = Nothing } + +---------------------- +specNonRec :: ScEnv + -> ScUsage -- Body usage + -> RhsInfo -- Structure info and usage info for un-specialised RHS + -> UniqSM (ScUsage, SpecInfo) -- Usage from RHSs (specialised and not) + -- plus details of specialisations + +specNonRec env body_usg rhs_info + = specialise env (scu_calls body_usg) rhs_info + (noSpecInfo { si_mb_unspec = Just (ri_rhs_usg rhs_info) }) + +---------------------- +specRec :: TopLevelFlag -> ScEnv + -> ScUsage -- Body usage + -> [RhsInfo] -- Structure info and usage info for un-specialised RHSs + -> UniqSM (ScUsage, [SpecInfo]) -- Usage from all RHSs (specialised and not) + -- plus details of specialisations + +specRec top_lvl env body_usg rhs_infos + = go 1 seed_calls nullUsage init_spec_infos + where + (seed_calls, init_spec_infos) -- Note [Seeding top-level recursive groups] + | isTopLevel top_lvl + , any (isExportedId .
ri_fn) rhs_infos -- Seed from body and RHSs + = (all_calls, [noSpecInfo | _ <- rhs_infos]) + | otherwise -- Seed from body only + = (calls_in_body, [noSpecInfo { si_mb_unspec = Just (ri_rhs_usg ri) } + | ri <- rhs_infos]) + + calls_in_body = scu_calls body_usg + calls_in_rhss = foldr (combineCalls . scu_calls . ri_rhs_usg) emptyVarEnv rhs_infos + all_calls = calls_in_rhss `combineCalls` calls_in_body + + -- Loop, specialising, until you get no new specialisations + go :: Int -- Which iteration of the "until no new specialisations" + -- loop we are on; first iteration is 1 + -> CallEnv -- Seed calls + -- Two accumulating parameters: + -> ScUsage -- Usage from earlier specialisations + -> [SpecInfo] -- Details of specialisations so far + -> UniqSM (ScUsage, [SpecInfo]) + go n_iter seed_calls usg_so_far spec_infos + | isEmptyVarEnv seed_calls + = -- pprTrace "specRec1" (vcat [ ppr (map ri_fn rhs_infos) + -- , ppr seed_calls + -- , ppr body_usg ]) $ + return (usg_so_far, spec_infos) + + -- Limit recursive specialisation + -- See Note [Limit recursive specialisation] + | n_iter > sc_recursive env -- Too many iterations of the 'go' loop + , sc_force env || isNothing (sc_count env) + -- If both of these are false, the sc_count + -- threshold will prevent non-termination + , any ((> the_limit) . si_n_specs) spec_infos + = -- pprTrace "specRec2" (ppr (map (map os_pat . si_specs) spec_infos)) $ + return (usg_so_far, spec_infos) + + | otherwise + = -- pprTrace "specRec3" (vcat [ text "bndrs" <+> ppr (map ri_fn rhs_infos) + -- , text "iteration" <+> int n_iter + -- , text "spec_infos" <+> ppr (map (map os_pat . 
si_specs) spec_infos) + -- ]) $ + do { specs_w_usg <- zipWithM (specialise env seed_calls) rhs_infos spec_infos + ; let (extra_usg_s, new_spec_infos) = unzip specs_w_usg + extra_usg = combineUsages extra_usg_s + all_usg = usg_so_far `combineUsage` extra_usg + ; go (n_iter + 1) (scu_calls extra_usg) all_usg new_spec_infos } + + -- See Note [Limit recursive specialisation] + the_limit = case sc_count env of + Nothing -> 10 -- Ugh! + Just max -> max + + +---------------------- +specialise + :: ScEnv + -> CallEnv -- Info on newly-discovered calls to this function + -> RhsInfo + -> SpecInfo -- Original RHS plus patterns dealt with + -> UniqSM (ScUsage, SpecInfo) -- New specialised versions and their usage + +-- See Note [spec_usg includes rhs_usg] + +-- Note: this only generates *specialised* bindings +-- The original binding is added by ruleInfoBinds +-- +-- Note: the rhs here is the optimised version of the original rhs +-- So when we make a specialised copy of the RHS, we're starting +-- from an RHS whose nested functions have been optimised already. 
+ +specialise env bind_calls (RI { ri_fn = fn, ri_lam_bndrs = arg_bndrs + , ri_lam_body = body, ri_arg_occs = arg_occs }) + spec_info@(SI { si_specs = specs, si_n_specs = spec_count + , si_mb_unspec = mb_unspec }) + | isBottomingId fn -- Note [Do not specialise diverging functions] + -- and do not generate specialisation seeds from its RHS + = -- pprTrace "specialise bot" (ppr fn) $ + return (nullUsage, spec_info) + + | isNeverActive (idInlineActivation fn) -- See Note [Transfer activation] + || null arg_bndrs -- Only specialise functions + = -- pprTrace "specialise inactive" (ppr fn) $ + case mb_unspec of -- Behave as if there was a single, boring call + Just rhs_usg -> return (rhs_usg, spec_info { si_mb_unspec = Nothing }) + -- See Note [spec_usg includes rhs_usg] + Nothing -> return (nullUsage, spec_info) + + | Just all_calls <- lookupVarEnv bind_calls fn + = -- pprTrace "specialise entry {" (ppr fn <+> ppr all_calls) $ + do { (boring_call, new_pats) <- callsToNewPats env fn spec_info arg_occs all_calls + + ; let n_pats = length new_pats +-- ; if (not (null new_pats) || isJust mb_unspec) then +-- pprTrace "specialise" (vcat [ ppr fn <+> text "with" <+> int n_pats <+> text "good patterns" +-- , text "mb_unspec" <+> ppr (isJust mb_unspec) +-- , text "arg_occs" <+> ppr arg_occs +-- , text "good pats" <+> ppr new_pats]) $ +-- return () +-- else return () + + ; let spec_env = decreaseSpecCount env n_pats + ; (spec_usgs, new_specs) <- mapAndUnzipM (spec_one spec_env fn arg_bndrs body) + (new_pats `zip` [spec_count..]) + -- See Note [Specialise original body] + + ; let spec_usg = combineUsages spec_usgs + + -- If there were any boring calls among the seeds (= all_calls), then those + -- calls will call the un-specialised function. So we should use the seeds + -- from the _unspecialised_ function's RHS, which are in mb_unspec, by returning + -- then in new_usg. 
+ (new_usg, mb_unspec') + = case mb_unspec of + Just rhs_usg | boring_call -> (spec_usg `combineUsage` rhs_usg, Nothing) + _ -> (spec_usg, mb_unspec) + +-- ; pprTrace "specialise return }" +-- (vcat [ ppr fn +-- , text "boring_call:" <+> ppr boring_call +-- , text "new calls:" <+> ppr (scu_calls new_usg)]) $ +-- return () + + ; return (new_usg, SI { si_specs = new_specs ++ specs + , si_n_specs = spec_count + n_pats + , si_mb_unspec = mb_unspec' }) } + + | otherwise -- No new seeds, so return nullUsage + = return (nullUsage, spec_info) + + + + +--------------------- +spec_one :: ScEnv + -> OutId -- Function + -> [InVar] -- Lambda-binders of RHS; should match patterns + -> InExpr -- Body of the original function + -> (CallPat, Int) + -> UniqSM (ScUsage, OneSpec) -- Rule and binding + +-- spec_one creates a specialised copy of the function, together +-- with a rule for using it. I'm very proud of how short this +-- function is, considering what it does :-). + +{- + Example + + In-scope: a, x::a + f = /\b \y::[(a,b)] -> ....f (b,c) ((:) (a,(b,c)) (x,v) (h w))... + [c::*, v::(b,c) are presumably bound by the (...) part] + ==> + f_spec = /\ b c \ v::(b,c) hw::[(a,(b,c))] -> + (...entire body of f...) [b -> (b,c), + y -> ((:) (a,(b,c)) (x,v) hw)] + + RULE: forall b::* c::*, -- Note, *not* forall a, x + v::(b,c), + hw::[(a,(b,c))] . + + f (b,c) ((:) (a,(b,c)) (x,v) hw) = f_spec b c v hw +-} + +spec_one env fn arg_bndrs body (call_pat@(qvars, pats), rule_number) + = do { spec_uniq <- getUniqueM + ; let spec_env = extendScSubstList (extendScInScope env qvars) + (arg_bndrs `zip` pats) + fn_name = idName fn + fn_loc = nameSrcSpan fn_name + fn_occ = nameOccName fn_name + spec_occ = mkSpecOcc fn_occ + -- We use fn_occ rather than fn in the rule_name string + -- as we don't want the uniq to end up in the rule, and + -- hence in the ABI, as that can cause spurious ABI + -- changes (#4012). 
+ rule_name = mkFastString ("SC:" ++ occNameString fn_occ ++ show rule_number) + spec_name = mkInternalName spec_uniq spec_occ fn_loc +-- ; pprTrace "{spec_one" (ppr (sc_count env) <+> ppr fn +-- <+> ppr pats <+> text "-->" <+> ppr spec_name) $ +-- return () + + -- Specialise the body + ; (spec_usg, spec_body) <- scExpr spec_env body + +-- ; pprTrace "done spec_one}" (ppr fn) $ +-- return () + + -- And build the results + ; let (spec_lam_args, spec_call_args) = mkWorkerArgs (sc_dflags env) + qvars body_ty + -- Usual w/w hack to avoid generating + -- a spec_rhs of unlifted type and no args + + spec_lam_args_str = handOutStrictnessInformation (fst (splitStrictSig spec_str)) spec_lam_args + -- Annotate the variables with the strictness information from + -- the function (see Note [Strictness information in worker binders]) + + spec_join_arity | isJoinId fn = Just (length spec_lam_args) + | otherwise = Nothing + spec_id = mkLocalId spec_name + (mkLamTypes spec_lam_args body_ty) + -- See Note [Transfer strictness] + `setIdStrictness` spec_str + `setIdCprInfo` topCprSig + `setIdArity` count isId spec_lam_args + `asJoinId_maybe` spec_join_arity + spec_str = calcSpecStrictness fn spec_lam_args pats + + + -- Conditionally use result of new worker-wrapper transform + spec_rhs = mkLams spec_lam_args_str spec_body + body_ty = exprType spec_body + rule_rhs = mkVarApps (Var spec_id) spec_call_args + inline_act = idInlineActivation fn + this_mod = sc_module spec_env + rule = mkRule this_mod True {- Auto -} True {- Local -} + rule_name inline_act fn_name qvars pats rule_rhs + -- See Note [Transfer activation] + ; return (spec_usg, OS { os_pat = call_pat, os_rule = rule + , os_id = spec_id + , os_rhs = spec_rhs }) } + + +-- See Note [Strictness information in worker binders] +handOutStrictnessInformation :: [Demand] -> [Var] -> [Var] +handOutStrictnessInformation = go + where + go _ [] = [] + go [] vs = vs + go (d:dmds) (v:vs) | isId v = setIdDemandInfo v d : go dmds vs + go dmds 
(v:vs) = v : go dmds vs + +calcSpecStrictness :: Id -- The original function + -> [Var] -> [CoreExpr] -- Call pattern + -> StrictSig -- Strictness of specialised thing +-- See Note [Transfer strictness] +calcSpecStrictness fn qvars pats + = mkClosedStrictSig spec_dmds topDiv + where + spec_dmds = [ lookupVarEnv dmd_env qv `orElse` topDmd | qv <- qvars, isId qv ] + StrictSig (DmdType _ dmds _) = idStrictness fn + + dmd_env = go emptyVarEnv dmds pats + + go :: DmdEnv -> [Demand] -> [CoreExpr] -> DmdEnv + go env ds (Type {} : pats) = go env ds pats + go env ds (Coercion {} : pats) = go env ds pats + go env (d:ds) (pat : pats) = go (go_one env d pat) ds pats + go env _ _ = env + + go_one :: DmdEnv -> Demand -> CoreExpr -> DmdEnv + go_one env d (Var v) = extendVarEnv_C bothDmd env v d + go_one env d e + | Just ds <- splitProdDmd_maybe d -- NB: d does not have to be strict + , (Var _, args) <- collectArgs e = go env ds args + go_one env _ _ = env + +{- +Note [spec_usg includes rhs_usg] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +In calls to 'specialise', the returned ScUsage must include the rhs_usg in +the passed-in SpecInfo, unless there are no calls at all to the function. + +The caller can, indeed must, assume this. He should not combine in rhs_usg +himself, or he'll get rhs_usg twice -- and that can lead to an exponential +blowup of duplicates in the CallEnv. This is what gave rise to the massive +performance loss in #8852. + +Note [Specialise original body] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +The RhsInfo for a binding keeps the *original* body of the binding. We +must specialise that, *not* the result of applying specExpr to the RHS +(which is also kept in RhsInfo). Otherwise we end up specialising a +specialised RHS, and that can lead directly to exponential behaviour. 
+ +Note [Transfer activation] +~~~~~~~~~~~~~~~~~~~~~~~~~~ + This note is for SpecConstr, but exactly the same thing + happens in the overloading specialiser; see + Note [Auto-specialisation and RULES] in GHC.Core.Op.Specialise. + +In which phase should the specialise-constructor rules be active? +Originally I made them always-active, but Manuel found that this +defeated some clever user-written rules. Then I made them active only +in Phase 0; after all, currently, the specConstr transformation is +only run after the simplifier has reached Phase 0, but that meant +that specialisations didn't fire inside wrappers; see test +simplCore/should_compile/spec-inline. + +So now I just use the inline-activation of the parent Id, as the +activation for the specialisation RULE, just like the main specialiser; + +This in turn means there is no point in specialising NOINLINE things, +so we test for that. + +Note [Transfer strictness] +~~~~~~~~~~~~~~~~~~~~~~~~~~ +We must transfer strictness information from the original function to +the specialised one. Suppose, for example + + f has strictness SS + and a RULE f (a:as) b = f_spec a as b + +Now we want f_spec to have strictness LLS, otherwise we'll use call-by-need +when calling f_spec instead of call-by-value. And that can result in +unbounded worsening in space (cf the classic foldl vs foldl') + +See #3437 for a good example. + +The function calcSpecStrictness performs the calculation. + +Note [Strictness information in worker binders] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +After having calculated the strictness annotation for the worker (see Note +[Transfer strictness] above), we also want to have this information attached to +the worker’s arguments, for the benefit of later passes. The function +handOutStrictnessInformation decomposes the strictness annotation calculated by +calcSpecStrictness and attaches them to the variables. 
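The demand transfer of Note [Transfer strictness] can be pictured with a toy model. Everything below (`Pat`, `specDemands`, the single-character demands) is invented for illustration and is not GHC's real machinery; in particular it omits the product-demand decomposition that the real calcSpecStrictness performs via splitProdDmd_maybe, and simply makes every variable bound inside a constructor pattern lazy:

```haskell
-- Toy model of Note [Transfer strictness]: 'S' = strict, 'L' = lazy.
-- A call-pattern argument is either a quantified variable standing for
-- a whole argument, or a constructor applied to quantified variables.
data Pat = PVar String | PCon String [String]

-- For the rule  f (a:as) b = $sf a as b  with f's strictness "SS":
-- 'b' inherits 'S' from f's second argument, while 'a' and 'as'
-- (components of a deconstructed argument) default to 'L', giving
-- the "LLS" of the Note.
specDemands :: [Char] -> [Pat] -> [String] -> [Char]
specDemands dmds pats qvars = map lookupDmd qvars
  where
    -- a variable that is itself a whole argument keeps that
    -- argument's demand
    env = [(v, d) | (d, PVar v) <- zip dmds pats]
    lookupDmd v = maybe 'L' id (lookup v env)
```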
+ +************************************************************************ +* * +\subsection{Argument analysis} +* * +************************************************************************ + +This code deals with analysing call-site arguments to see whether +they are constructor applications. + +Note [Free type variables of the qvar types] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +In a call (f @a x True) that we want to specialise, what variables should +we quantify over? Clearly over 'a' and 'x', but what about any type variables +free in x's type? In fact we don't need to worry about them because (f @a) +can only be a well-typed application if its type is compatible with x, so any +variables free in x's type must be free in (f @a), and hence either be gathered +via 'a' itself, or be in scope at f's defn. Hence we just take + (exprsFreeVars pats). + +BUT phantom type synonyms can mess this reasoning up, + eg x::T b with type T b = Int +So we apply expandTypeSynonyms to the bound Ids. +See #5458. Yuk. + +Note [SpecConstr call patterns] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +A "call pattern" that we collect is going to become the LHS of a RULE. +It's important that it doesn't have + e |> Refl +or + e |> g1 |> g2 +because both of these will be optimised by Simplify.simplRule. In the +former case such optimisation is benign, because the rule will match more +terms; but in the latter we may lose a binding of 'g1' or 'g2', and +end up with a rule LHS that doesn't bind the template variables +(#10602). + +The simplifier eliminates such things, but SpecConstr itself constructs +new terms by substituting. So the 'mkCast' in the Cast case of scExpr +is very important! + +Note [Choosing patterns] +~~~~~~~~~~~~~~~~~~~~~~~~ +If we get lots of patterns we may not want to make a specialisation +for each of them (code bloat), so we choose as follows, implemented +by trim_pats. + +* The flag -fspec-constr-count=N sets the sc_count field + of the ScEnv to (Just N).
This limits the total number + of specialisations for a given function to N. + +* -fno-spec-constr-count sets the sc_count field to Nothing, + which switches off the limit. + +* The ghastly ForceSpecConstr trick also switches off the limit + for a particular function. + +* Otherwise we sort the patterns to choose the most general + ones first; more general => more widely applicable. + +Note [SpecConstr and casts] +~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Consider (#14270) a call like + + let f = e + in ... f (K @(a |> co)) ... + +where 'co' is a coercion variable not in scope at f's definition site. +If we aren't careful we'll get + + let $sf a co = e (K @(a |> co)) + RULE "SC:f" forall a co. f (K @(a |> co)) = $sf a co + f = e + in ... + +But alas, when we match the call we won't bind 'co', because type-matching +(for good reasons) discards casts. + +I don't know how to solve this, so for now I'm just discarding any +call pattern that mentions, in a type argument, a coercion variable +that is not in scope at the binding of the function. + +I think this is very rare. + +It is important (e.g. #14936) that this /only/ applies to +coercions mentioned in casts. We don't want to be discombobulated +by casts in terms! For example, consider + f ((e1,e2) |> sym co) +where, say, + f :: Foo -> blah + co :: Foo ~R (Int,Int) + +Here we definitely do want to specialise for that pair! We do not +match on the structure of the coercion; instead we just match on a +coercion variable, so the RULE looks like + + forall (x::Int, y::Int, co :: (Int,Int) ~R Foo) + f ((x,y) |> co) = $sf x y co + +Often the body of f looks like + f arg = ...(case arg |> co' of + (x,y) -> blah)... + +so that the specialised f will turn into + $sf x y co = let arg = (x,y) |> co + in ...(case arg |> co' of + (x,y) -> blah).... + +which will simplify to not use 'co' at all. But we can't guarantee +that co will end up unused, so we still pass it. Absence analysis +may remove it later.
+ +Note that this /also/ discards the call pattern if we have a cast in a +/term/, although in fact Rules.match does make a very flaky and +fragile attempt to match coercions. e.g. a call like + f (Maybe Age) (Nothing |> co) blah + where co :: Maybe Int ~ Maybe Age +will be discarded. It's extremely fragile to match on the form of a +coercion, so I think it's better just not to try. A more complicated +alternative would be to discard calls that mention coercion variables +only in kind-casts, but I'm doing the simple thing for now. +-} + +type CallPat = ([Var], [CoreExpr]) -- Quantified variables and arguments + -- See Note [SpecConstr call patterns] + +callsToNewPats :: ScEnv -> Id + -> SpecInfo + -> [ArgOcc] -> [Call] + -> UniqSM (Bool, [CallPat]) + -- Result has no duplicate patterns, + -- nor ones mentioned in done_pats + -- Bool indicates that there was at least one boring pattern +callsToNewPats env fn spec_info@(SI { si_specs = done_specs }) bndr_occs calls + = do { mb_pats <- mapM (callToPats env bndr_occs) calls + + ; let have_boring_call = any isNothing mb_pats + + good_pats :: [CallPat] + good_pats = catMaybes mb_pats + + -- Remove patterns we have already done + new_pats = filterOut is_done good_pats + is_done p = any (samePat p . os_pat) done_specs + + -- Remove duplicates + non_dups = nubBy samePat new_pats + + -- Remove ones that have too many worker variables + small_pats = filterOut too_big non_dups + too_big (vars,_) = not (isWorkerSmallEnough (sc_dflags env) vars) + -- We are about to construct w/w pair in 'spec_one'. + -- Omit specialisation leading to high arity workers. 
+ -- See Note [Limit w/w arity] in GHC.Core.Op.WorkWrap.Lib + + -- Discard specialisations if there are too many of them + trimmed_pats = trim_pats env fn spec_info small_pats + +-- ; pprTrace "callsToPats" (vcat [ text "calls to" <+> ppr fn <> colon <+> ppr calls +-- , text "done_specs:" <+> ppr (map os_pat done_specs) +-- , text "good_pats:" <+> ppr good_pats ]) $ +-- return () + + ; return (have_boring_call, trimmed_pats) } + + +trim_pats :: ScEnv -> Id -> SpecInfo -> [CallPat] -> [CallPat] +-- See Note [Choosing patterns] +trim_pats env fn (SI { si_n_specs = done_spec_count }) pats + | sc_force env + || isNothing mb_scc + || n_remaining >= n_pats + = -- pprTrace "trim_pats: no-trim" (ppr (sc_force env) $$ ppr mb_scc $$ ppr n_remaining $$ ppr n_pats) + pats -- No need to trim + + | otherwise + = emit_trace $ -- Need to trim, so keep the best ones + take n_remaining sorted_pats + + where + n_pats = length pats + spec_count' = n_pats + done_spec_count + n_remaining = max_specs - done_spec_count + mb_scc = sc_count env + Just max_specs = mb_scc + + sorted_pats = map fst $ + sortBy (comparing snd) $ + [(pat, pat_cons pat) | pat <- pats] + -- Sort in order of increasing number of constructors + -- (i.e. decreasing generality) and pick the initial + -- segment of this list + + pat_cons :: CallPat -> Int + -- How many data constructors or literals are in + -- the pattern. More data-cons => less general + pat_cons (qs, ps) = foldr ((+) . n_cons) 0 ps + where + q_set = mkVarSet qs + n_cons (Var v) | v `elemVarSet` q_set = 0 + | otherwise = 1 + n_cons (Cast e _) = n_cons e + n_cons (App e1 e2) = n_cons e1 + n_cons e2 + n_cons (Lit {}) = 1 + n_cons _ = 0 + + emit_trace result + | debugIsOn || hasPprDebug (sc_dflags env) + -- Suppress this scary message for ordinary users!
#5125 + = pprTrace "SpecConstr" msg result + | otherwise + = result + msg = vcat [ sep [ text "Function" <+> quotes (ppr fn) + , nest 2 (text "has" <+> + speakNOf spec_count' (text "call pattern") <> comma <+> + text "but the limit is" <+> int max_specs) ] + , text "Use -fspec-constr-count=n to set the bound" + , text "done_spec_count =" <+> int done_spec_count + , text "Keeping " <+> int n_remaining <> text ", out of" <+> int n_pats + , text "Discarding:" <+> ppr (drop n_remaining sorted_pats) ] + + +callToPats :: ScEnv -> [ArgOcc] -> Call -> UniqSM (Maybe CallPat) + -- The [Var] is the variables to quantify over in the rule + -- Type variables come first, since they may scope + -- over the following term variables + -- The [CoreExpr] are the argument patterns for the rule +callToPats env bndr_occs call@(Call _ args con_env) + | args `ltLength` bndr_occs -- Check saturated + = return Nothing + | otherwise + = do { let in_scope = substInScope (sc_subst env) + ; (interesting, pats) <- argsToPats env in_scope con_env args bndr_occs + ; let pat_fvs = exprsFreeVarsList pats + -- To get determinism we need the list of free variables in + -- deterministic order. Otherwise we end up creating + -- lambdas with different argument orders. See + -- determinism/simplCore/should_compile/spec-inline-determ.hs + -- for an example. For explanation of determinism + -- considerations See Note [Unique Determinism] in Unique. 
+ + in_scope_vars = getInScopeVars in_scope + is_in_scope v = v `elemVarSet` in_scope_vars + qvars = filterOut is_in_scope pat_fvs + -- Quantify over variables that are not in scope + -- at the call site + -- See Note [Free type variables of the qvar types] + -- See Note [Shadowing] at the top + + (ktvs, ids) = partition isTyVar qvars + qvars' = scopedSort ktvs ++ map sanitise ids + -- Order into kind variables, type variables, term variables + -- The kind of a type variable may mention a kind variable + -- and the type of a term variable may mention a type variable + + sanitise id = id `setIdType` expandTypeSynonyms (idType id) + -- See Note [Free type variables of the qvar types] + + -- Bad coercion variables: see Note [SpecConstr and casts] + bad_covars :: CoVarSet + bad_covars = mapUnionVarSet get_bad_covars pats + get_bad_covars :: CoreArg -> CoVarSet + get_bad_covars (Type ty) + = filterVarSet (\v -> isId v && not (is_in_scope v)) $ + tyCoVarsOfType ty + get_bad_covars _ + = emptyVarSet + + ; -- pprTrace "callToPats" (ppr args $$ ppr bndr_occs) $ + WARN( not (isEmptyVarSet bad_covars) + , text "SpecConstr: bad covars:" <+> ppr bad_covars + $$ ppr call ) + if interesting && isEmptyVarSet bad_covars + then return (Just (qvars', pats)) + else return Nothing } + + -- argToPat takes an actual argument, and returns an abstracted + -- version, consisting of just the "constructor skeleton" of the + -- argument, with non-constructor sub-expression replaced by new + -- placeholder variables. For example: + -- C a (D (f x) (g y)) ==> C p1 (D p2 p3) + +argToPat :: ScEnv + -> InScopeSet -- What's in scope at the fn defn site + -> ValueEnv -- ValueEnv at the call site + -> CoreArg -- A call arg (or component thereof) + -> ArgOcc + -> UniqSM (Bool, CoreArg) + +-- Returns (interesting, pat), +-- where pat is the pattern derived from the argument +-- interesting=True if the pattern is non-trivial (not a variable or type) +-- E.g. 
x:xs --> (True, x:xs) +-- f xs --> (False, w) where w is a fresh wildcard +-- (f xs, 'c') --> (True, (w, 'c')) where w is a fresh wildcard +-- \x. x+y --> (True, \x. x+y) +-- lvl7 --> (True, lvl7) if lvl7 is bound +-- somewhere further out + +argToPat _env _in_scope _val_env arg@(Type {}) _arg_occ + = return (False, arg) + +argToPat env in_scope val_env (Tick _ arg) arg_occ + = argToPat env in_scope val_env arg arg_occ + -- Note [Notes in call patterns] + -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + -- Ignore Notes. In particular, we want to ignore any InlineMe notes + -- Perhaps we should not ignore profiling notes, but I'm going to + -- ride roughshod over them all for now. + --- See Note [Notes in RULE matching] in GHC.Core.Rules + +argToPat env in_scope val_env (Let _ arg) arg_occ + = argToPat env in_scope val_env arg arg_occ + -- See Note [Matching lets] in Rule.hs + -- Look through let expressions + -- e.g. f (let v = rhs in (v,w)) + -- Here we can specialise for f (v,w) + -- because the rule-matcher will look through the let. 
+ +{- Disabled; see Note [Matching cases] in Rule.hs +argToPat env in_scope val_env (Case scrut _ _ [(_, _, rhs)]) arg_occ + | exprOkForSpeculation scrut -- See Note [Matching cases] in Rule.hs + = argToPat env in_scope val_env rhs arg_occ +-} + +argToPat env in_scope val_env (Cast arg co) arg_occ + | not (ignoreType env ty2) + = do { (interesting, arg') <- argToPat env in_scope val_env arg arg_occ + ; if not interesting then + wildCardPat ty2 + else do + { -- Make a wild-card pattern for the coercion + uniq <- getUniqueM + ; let co_name = mkSysTvName uniq (fsLit "sg") + co_var = mkCoVar co_name (mkCoercionType Representational ty1 ty2) + ; return (interesting, Cast arg' (mkCoVarCo co_var)) } } + where + Pair ty1 ty2 = coercionKind co + + + +{- Disabling lambda specialisation for now + It's fragile, and the spec_loop can be infinite +argToPat in_scope val_env arg arg_occ + | is_value_lam arg + = return (True, arg) + where + is_value_lam (Lam v e) -- Spot a value lambda, even if + | isId v = True -- it is inside a type lambda + | otherwise = is_value_lam e + is_value_lam other = False +-} + + -- Check for a constructor application + -- NB: this *precedes* the Var case, so that we catch nullary constrs +argToPat env in_scope val_env arg arg_occ + | Just (ConVal (DataAlt dc) args) <- isValue val_env arg + , not (ignoreDataCon env dc) -- See Note [NoSpecConstr] + , Just arg_occs <- mb_scrut dc + = do { let (ty_args, rest_args) = splitAtList (dataConUnivTyVars dc) args + ; (_, args') <- argsToPats env in_scope val_env rest_args arg_occs + ; return (True, + mkConApp dc (ty_args ++ args')) } + where + mb_scrut dc = case arg_occ of + ScrutOcc bs | Just occs <- lookupUFM bs dc + -> Just (occs) -- See Note [Reboxing] + _other | sc_force env || sc_keen env + -> Just (repeat UnkOcc) + | otherwise + -> Nothing + + -- Check if the argument is a variable that + -- (a) is used in an interesting way in the function body + -- (b) we know what its value is + -- In that case it
counts as "interesting" +argToPat env in_scope val_env (Var v) arg_occ + | sc_force env || case arg_occ of { UnkOcc -> False; _other -> True }, -- (a) + is_value, -- (b) + -- Ignoring sc_keen here to avoid gratuitously incurring Note [Reboxing] + -- So sc_keen focused just on f (I# x), where we have freshly-allocated + -- box that we can eliminate in the caller + not (ignoreType env (varType v)) + = return (True, Var v) + where + is_value + | isLocalId v = v `elemInScopeSet` in_scope + && isJust (lookupVarEnv val_env v) + -- Local variables have values in val_env + | otherwise = isValueUnfolding (idUnfolding v) + -- Imports have unfoldings + +-- I'm really not sure what this comment means +-- And by not wild-carding we tend to get forall'd +-- variables that are in scope, which in turn can +-- expose the weakness in let-matching +-- See Note [Matching lets] in GHC.Core.Rules + + -- Check for a variable bound inside the function. + -- Don't make a wild-card, because we may usefully share + -- e.g. f a = let x = ... in f (x,x) + -- NB: this case follows the lambda and con-app cases!! +-- argToPat _in_scope _val_env (Var v) _arg_occ +-- = return (False, Var v) + -- SLPJ : disabling this to avoid proliferation of versions + -- also works badly when thinking about seeding the loop + -- from the body of the let + -- f x y = letrec g z = ... 
in g (x,y) + -- We don't want to specialise for that *particular* x,y + + -- The default case: make a wild-card + -- We use this for coercions too +argToPat _env _in_scope _val_env arg _arg_occ + = wildCardPat (exprType arg) + +wildCardPat :: Type -> UniqSM (Bool, CoreArg) +wildCardPat ty + = do { uniq <- getUniqueM + ; let id = mkSysLocalOrCoVar (fsLit "sc") uniq ty + ; return (False, varToCoreExpr id) } + +argsToPats :: ScEnv -> InScopeSet -> ValueEnv + -> [CoreArg] -> [ArgOcc] -- Should be same length + -> UniqSM (Bool, [CoreArg]) +argsToPats env in_scope val_env args occs + = do { stuff <- zipWithM (argToPat env in_scope val_env) args occs + ; let (interesting_s, args') = unzip stuff + ; return (or interesting_s, args') } + +isValue :: ValueEnv -> CoreExpr -> Maybe Value +isValue _env (Lit lit) + | litIsLifted lit = Nothing + | otherwise = Just (ConVal (LitAlt lit) []) + +isValue env (Var v) + | Just cval <- lookupVarEnv env v + = Just cval -- You might think we could look in the idUnfolding here + -- but that doesn't take account of which branch of a + -- case we are in, which is the whole point + + | not (isLocalId v) && isCheapUnfolding unf + = isValue env (unfoldingTemplate unf) + where + unf = idUnfolding v + -- However we do want to consult the unfolding + -- as well, for let-bound constructors! + +isValue env (Lam b e) + | isTyVar b = case isValue env e of + Just _ -> Just LambdaVal + Nothing -> Nothing + | otherwise = Just LambdaVal + +isValue env (Tick t e) + | not (tickishIsCode t) + = isValue env e + +isValue _env expr -- Maybe it's a constructor application + | (Var fun, args, _) <- collectArgsTicks (not . 
tickishIsCode) expr + = case isDataConWorkId_maybe fun of + + Just con | args `lengthAtLeast` dataConRepArity con + -- Check saturated; might be > because the + -- arity excludes type args + -> Just (ConVal (DataAlt con) args) + + _other | valArgCount args < idArity fun + -- Under-applied function + -> Just LambdaVal -- Partial application + + _other -> Nothing + +isValue _env _expr = Nothing + +valueIsWorkFree :: Value -> Bool +valueIsWorkFree LambdaVal = True +valueIsWorkFree (ConVal _ args) = all exprIsWorkFree args + +samePat :: CallPat -> CallPat -> Bool +samePat (vs1, as1) (vs2, as2) + = all2 same as1 as2 + where + same (Var v1) (Var v2) + | v1 `elem` vs1 = v2 `elem` vs2 + | v2 `elem` vs2 = False + | otherwise = v1 == v2 + + same (Lit l1) (Lit l2) = l1==l2 + same (App f1 a1) (App f2 a2) = same f1 f2 && same a1 a2 + + same (Type {}) (Type {}) = True -- Note [Ignore type differences] + same (Coercion {}) (Coercion {}) = True + same (Tick _ e1) e2 = same e1 e2 -- Ignore casts and notes + same (Cast e1 _) e2 = same e1 e2 + same e1 (Tick _ e2) = same e1 e2 + same e1 (Cast e2 _) = same e1 e2 + + same e1 e2 = WARN( bad e1 || bad e2, ppr e1 $$ ppr e2) + False -- Let, lambda, case should not occur + bad (Case {}) = True + bad (Let {}) = True + bad (Lam {}) = True + bad _other = False + +{- +Note [Ignore type differences] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We do not want to generate specialisations where the call patterns +differ only in their type arguments! Not only is it utterly useless, +but it also means that (with polymorphic recursion) we can generate +an infinite number of specialisations. Example is Data.Sequence.adjustTree, +I think. 
+-} diff --git a/compiler/GHC/Core/Op/Specialise.hs b/compiler/GHC/Core/Op/Specialise.hs new file mode 100644 index 0000000000..250a0f7313 --- /dev/null +++ b/compiler/GHC/Core/Op/Specialise.hs @@ -0,0 +1,2720 @@ +{- +(c) The GRASP/AQUA Project, Glasgow University, 1993-1998 + +\section[Specialise]{Stamping out overloading, and (optionally) polymorphism} +-} + +{-# LANGUAGE CPP #-} +{-# LANGUAGE DeriveFunctor #-} +{-# LANGUAGE ViewPatterns #-} + +{-# OPTIONS_GHC -Wno-incomplete-uni-patterns #-} +module GHC.Core.Op.Specialise ( specProgram, specUnfolding ) where + +#include "HsVersions.h" + +import GhcPrelude + +import Id +import TcType hiding( substTy ) +import GHC.Core.Type hiding( substTy, extendTvSubstList ) +import GHC.Core.Predicate +import Module( Module, HasModule(..) ) +import GHC.Core.Coercion( Coercion ) +import GHC.Core.Op.Monad +import qualified GHC.Core.Subst +import GHC.Core.Unfold +import Var ( isLocalVar ) +import VarSet +import VarEnv +import GHC.Core +import GHC.Core.Rules +import GHC.Core.SimpleOpt ( collectBindersPushingCo ) +import GHC.Core.Utils ( exprIsTrivial, mkCast, exprType ) +import GHC.Core.FVs +import GHC.Core.Arity ( etaExpandToJoinPointRule ) +import UniqSupply +import Name +import MkId ( voidArgId, voidPrimId ) +import Maybes ( mapMaybe, isJust ) +import MonadUtils ( foldlM ) +import BasicTypes +import GHC.Driver.Types +import Bag +import GHC.Driver.Session +import Util +import Outputable +import FastString +import State +import UniqDFM +import GHC.Core.TyCo.Rep (TyCoBinder (..)) + +import Control.Monad +import qualified Control.Monad.Fail as MonadFail + +{- +************************************************************************ +* * +\subsection[notes-Specialise]{Implementation notes [SLPJ, Aug 18 1993]} +* * +************************************************************************ + +These notes describe how we implement specialisation to eliminate +overloading. 
+ +The specialisation pass works on Core +syntax, complete with all the explicit dictionary application, +abstraction and construction as added by the type checker. The +existing type checker remains largely as it is. + +One important thought: the {\em types} passed to an overloaded +function, and the {\em dictionaries} passed are mutually redundant. +If the same function is applied to the same type(s) then it is sure to +be applied to the same dictionary(s)---or rather to the same {\em +values}. (The arguments might look different but they will evaluate +to the same value.) + +Second important thought: we know that we can make progress by +treating dictionary arguments as static and worth specialising on. So +we can do without binding-time analysis, and instead specialise on +dictionary arguments and no others. + +The basic idea +~~~~~~~~~~~~~~ +Suppose we have + + let f = <f_rhs> + in <body> + +and suppose f is overloaded. + +STEP 1: CALL-INSTANCE COLLECTION + +We traverse <body>, accumulating all applications of f to types and +dictionaries. + +(Might there be partial applications, to just some of its types and +dictionaries? In principle yes, but in practice the type checker only +builds applications of f to all its types and dictionaries, so partial +applications could only arise as a result of transformation, and even +then I think it's unlikely. In any case, we simply don't accumulate such +partial applications.) + + +STEP 2: EQUIVALENCES + +So now we have a collection of calls to f: + f t1 t2 d1 d2 + f t3 t4 d3 d4 + ... +Notice that f may take several type arguments. To avoid ambiguity, we +say that f is called at type t1/t2 and t3/t4. + +We take equivalence classes using equality of the *types* (ignoring +the dictionary args, which as mentioned previously are redundant). 
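The equivalence-classing of STEP 2 can be sketched as building a map keyed on the type arguments alone. The names and types here are illustrative only, not the pass's real representation:

```haskell
import qualified Data.Map.Strict as Map

-- Toy model of STEP 2: a call instance is (type args, dict args);
-- two calls are equivalent iff their type arguments are equal, since
-- the dictionary arguments are determined by the types.
type CallInst = ([String], [String])

equivClasses :: [CallInst] -> Map.Map [String] [String]
equivClasses = Map.fromListWith (\_new old -> old)
  -- keep the dictionaries of the first call seen as the
  -- representative of each equivalence class
```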
+ +STEP 3: SPECIALISATION + +For each equivalence class, choose a representative (f t1 t2 d1 d2), +and create a local instance of f, defined thus: + + f@t1/t2 = <f_rhs> t1 t2 d1 d2 + +f_rhs presumably has some big lambdas and dictionary lambdas, so lots +of simplification will now result. However we don't actually *do* that +simplification. Rather, we leave it for the simplifier to do. If we +*did* do it, though, we'd get more call instances from the specialised +RHS. We can work out what they are by instantiating the call-instance +set from f's RHS with the types t1, t2. + +Add this new id to f's IdInfo, to record that f has a specialised version. + +Before doing any of this, check that f's IdInfo doesn't already +tell us about an existing instance of f at the required type/s. +(This might happen if specialisation was applied more than once, or +it might arise from user SPECIALIZE pragmas.) + +Recursion +~~~~~~~~~ +Wait a minute! What if f is recursive? Then we can't just plug in +its right-hand side, can we? + +But it's ok. The type checker *always* creates non-recursive definitions +for overloaded recursive functions. For example: + + f x = f (x+x) -- Yes I know its silly + +becomes + + f a (d::Num a) = let p = +.sel a d + in + letrec fl (y::a) = fl (p y y) + in + fl + +We still have recursion for non-overloaded functions which we +specialise, but the recursive call should get specialised to the +same recursive version. + + +Polymorphism 1 +~~~~~~~~~~~~~~ + +All this is crystal clear when the function is applied to *constant +types*; that is, types which have no type variables inside. But what if +it is applied to non-constant types? Suppose we find a call of f at type +t1/t2. There are two possibilities: + +(a) The free type variables of t1, t2 are in scope at the definition point +of f. In this case there's no problem, we proceed just as before. A common +example is as follows. 
Here's the Haskell: + + g y = let f x = x+x + in f y + f y + +After typechecking we have + + g a (d::Num a) (y::a) = let f b (d'::Num b) (x::b) = +.sel b d' x x + in +.sel a d (f a d y) (f a d y) + +Notice that the call to f is at type "a"; a non-constant type. +Both calls to f are at the same type, so we can specialise to give: + + g a (d::Num a) (y::a) = let f@a (x::a) = +.sel a d x x + in +.sel a d (f@a y) (f@a y) + + +(b) The other case is when the type variables in the instance types +are *not* in scope at the definition point of f. The example we are +working with above is a good case. There are two instances of (+.sel a d), +but "a" is not in scope at the definition of +.sel. Can we do anything? +Yes, we can "common them up", a sort of limited common sub-expression deal. +This would give: + + g a (d::Num a) (y::a) = let +.sel@a = +.sel a d + f@a (x::a) = +.sel@a x x + in +.sel@a (f@a y) (f@a y) + +This can save work, and can't be spotted by the type checker, because +the two instances of +.sel weren't originally at the same type. + +Further notes on (b) + +* There are quite a few variations here. For example, the defn of + +.sel could be floated outside the \y, to attempt to gain laziness. + It certainly mustn't be floated outside the \d because the d has to + be in scope too. + +* We don't want to inline f_rhs in this case, because +that will duplicate code. Just commoning up the call is the point. + +* Nothing gets added to +.sel's IdInfo. + +* Don't bother unless the equivalence class has more than one item! + +Not clear whether this is all worth it. It is of course OK to +simply discard call-instances when passing a big lambda. + +Polymorphism 2 -- Overloading +~~~~~~~~~~~~~~ +Consider a function whose most general type is + + f :: forall a b. Ord a => [a] -> b -> b + +There is really no point in making a version of f at Int/Int and another +at Int/Bool, because it's only instantiating the type variable "a" which +buys us any efficiency.
Since f is completely polymorphic in b there +ain't much point in making separate versions of f for the different +b types. + +That suggests that we should identify which of f's type variables +are constrained (like "a") and which are unconstrained (like "b"). +Then when taking equivalence classes in STEP 2, we ignore the type args +corresponding to unconstrained type variables. In STEP 3 we make +polymorphic versions. Thus: + + f@t1/ = /\b -> <f_rhs> t1 b d1 d2 + +We do this. + + +Dictionary floating +~~~~~~~~~~~~~~~~~~~ +Consider this + + f a (d::Num a) = let g = ... + in + ...(let d1::Ord a = Num.Ord.sel a d in g a d1)... + +Here, g is only called at one type, but the dictionary isn't in scope at the +definition point for g. Usually the type checker would build a +definition for d1 which enclosed g, but the transformation system +might have moved d1's defn inward. Solution: float dictionary bindings +outwards along with call instances. + +Consider + + f x = let g p q = p==q + h r s = (r+s, g r s) + in + h x x + + +Before specialisation, leaving out type abstractions we have + + f df x = let g :: Eq a => a -> a -> Bool + g dg p q = == dg p q + h :: Num a => a -> a -> (a, Bool) + h dh r s = let deq = eqFromNum dh + in (+ dh r s, g deq r s) + in + h df x x + +After specialising h we get a specialised version of h, like this: + + h' r s = let deq = eqFromNum df + in (+ df r s, g deq r s) + +But we can't naively make an instance for g from this, because deq is not in scope +at the defn of g. Instead, we have to float out the (new) defn of deq +to widen its scope. Notice that this floating can't be done in advance -- it only +shows up when specialisation is done.
+ +User SPECIALIZE pragmas +~~~~~~~~~~~~~~~~~~~~~~~ +Specialisation pragmas can be digested by the type checker, and implemented +by adding extra definitions along with that of f, in the same way as before + + f@t1/t2 = <f_rhs> t1 t2 d1 d2 + +Indeed the pragmas *have* to be dealt with by the type checker, because +only it knows how to build the dictionaries d1 and d2! For example + + g :: Ord a => [a] -> [a] + {-# SPECIALIZE g :: [Tree Int] -> [Tree Int] #-} + +Here, the specialised version of g is an application of g's rhs to the +Ord dictionary for (Tree Int), which only the type checker can conjure +up. There might not even *be* one, if (Tree Int) is not an instance of +Ord! (All the other specialisation has suitable dictionaries to hand +from actual calls.) + +Problem. The type checker doesn't have a convenient <f_rhs> to hand, because +it is buried in a complex (as-yet-un-desugared) binding group. +Maybe we should say + + f@t1/t2 = f* t1 t2 d1 d2 + +where f* is the Id f with an IdInfo which says "inline me regardless!". +Indeed all the specialisation could be done in this way. +That in turn means that the simplifier has to be prepared to inline absolutely +any in-scope let-bound thing. + + +Again, the pragma should permit polymorphism in unconstrained variables: + + h :: Ord a => [a] -> b -> b + {-# SPECIALIZE h :: [Int] -> b -> b #-} + +We *insist* that all overloaded type variables are specialised to ground types +(and hence there can be no context inside a SPECIALIZE pragma). +We *permit* unconstrained type variables to be specialised to + - a ground type + - or left as a polymorphic type variable +but nothing in between. So + + {-# SPECIALIZE h :: [Int] -> [c] -> [c] #-} + +is *illegal*. (It can be handled, but it adds complication, and gains the +programmer nothing.) + + +SPECIALISING INSTANCE DECLARATIONS +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Consider + + instance Foo a => Foo [a] where + ...
+ {-# SPECIALIZE instance Foo [Int] #-} + +The original instance decl creates a dictionary-function +definition: + + dfun.Foo.List :: forall a. Foo a -> Foo [a] + +The SPECIALIZE pragma just makes a specialised copy, just as for +ordinary function definitions: + + dfun.Foo.List@Int :: Foo [Int] + dfun.Foo.List@Int = dfun.Foo.List Int dFooInt + +The information about what instances of the dfun exist gets added to +the dfun's IdInfo in the same way as for a user-defined function. + + +Automatic instance decl specialisation? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Can instance decls be specialised automatically? It's tricky. +We could collect call-instance information for each dfun, but +then when we specialised their bodies we'd get new call-instances +for ordinary functions; and when we specialised the bodies of those +ordinary functions, we might get new call-instances of the dfuns, +and so on. This all arises because of +the unrestricted mutual recursion between instance decls and value decls. + +Still, there's no actual problem; it just means that we may not do all +the specialisation we could theoretically do. + +Furthermore, instance decls are usually exported and used non-locally, +so we'll want to compile enough to get those specialisations done. + +Lastly, there's no such thing as a local instance decl, so we can +survive solely by spitting out *usage* information, and then reading that +back in as a pragma when next compiling the file. So for now, +we only specialise instance decls in response to pragmas. + + +SPITTING OUT USAGE INFORMATION +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +To spit out usage information we need to traverse the code collecting +call-instance information for all imported (non-prelude?) functions +and data types. Then we equivalence-class it and spit it out. + +This is done at the top-level when all the call instances which escape +must be for imported functions and data types.
+
+*** Not currently done ***
+
+
+Partial specialisation by pragmas
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+What about partial specialisation:
+
+       k :: (Ord a, Eq b) => [a] -> b -> b -> [a]
+       {-# SPECIALIZE k :: Eq b => [Int] -> b -> b -> [Int] #-}
+
+or even
+
+       {-# SPECIALIZE k :: Eq b => [Int] -> [b] -> [b] -> [Int] #-}
+
+Seems quite reasonable.  Similar things could be done with instance decls:
+
+       instance (Foo a, Foo b) => Foo (a,b) where
+               ...
+       {-# SPECIALIZE instance Foo a => Foo (a,Int) #-}
+       {-# SPECIALIZE instance Foo b => Foo (Int,b) #-}
+
+Ho hum.  Things are complex enough without this.  I pass.
+
+
+Requirements for the simplifier
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The simplifier has to be able to take advantage of the specialisation.
+
+* When the simplifier finds an application of a polymorphic f, it looks in
+f's IdInfo in case there is a suitable instance to call instead.  This converts
+
+       f t1 t2 d1 d2   ===>   f_t1_t2
+
+Note that the dictionaries get eaten up too!
+
+* Dictionary selection operations on constant dictionaries must be
+  short-circuited:
+
+       +.sel Int d     ===>  +Int
+
+The obvious way to do this is in the same way as other specialised
+calls: +.sel has inside it some IdInfo which says that if it's applied
+to the type Int then it should eat a dictionary and transform to +Int.
+
+In short, dictionary selectors need IdInfo inside them for constant
+methods.
+
+* Exactly the same applies if a superclass dictionary is being
+  extracted:
+
+       Eq.sel Int d   ===>   dEqInt
+
+* Something similar applies to dictionary construction too.  Suppose
+dfun.Eq.List is the function taking a dictionary for (Eq a) to
+one for (Eq [a]).  Then we want
+
+       dfun.Eq.List Int d      ===> dEq.List_Int
+
+Where does the Eq [Int] dictionary come from?  It is built in
+response to a SPECIALIZE pragma on the Eq [a] instance decl.
+
+In short, dfun Ids need IdInfo with a specialisation for each
+constant instance of their instance declaration.
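At the source level, all of the rewrites above are driven by SPECIALIZE pragmas. The sketch below is purely illustrative (`insSort` and `$sinsSort` are invented names, not functions in this module); it shows the pragma whose compiled effect is the `f t1 t2 d1 d2 ===> f_t1_t2` rewrite described above:

```haskell
-- A minimal, self-contained illustration; 'insSort' is an invented
-- example function, not part of the specialiser itself.
insSort :: Ord a => [a] -> [a]
insSort = foldr ins []
  where
    ins x [] = [x]
    ins x (y:ys)
      | x <= y    = x : y : ys
      | otherwise = y : ins x ys

-- The pragma asks the type checker to conjure up the Ord Int
-- dictionary, and attaches IdInfo so that the simplifier can rewrite
--     insSort @Int dOrdInt   ===>   $sinsSort
-- eating the dictionary argument, exactly as sketched above.
{-# SPECIALIZE insSort :: [Int] -> [Int] #-}
```

Calls at other types (e.g. `insSort "cab"`) are untouched and still pass a dictionary.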
+ +All this uses a single mechanism: the SpecEnv inside an Id + + +What does the specialisation IdInfo look like? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The SpecEnv of an Id maps a list of types (the template) to an expression + + [Type] |-> Expr + +For example, if f has this RuleInfo: + + [Int, a] -> \d:Ord Int. f' a + +it means that we can replace the call + + f Int t ===> (\d. f' t) + +This chucks one dictionary away and proceeds with the +specialised version of f, namely f'. + + +What can't be done this way? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +There is no way, post-typechecker, to get a dictionary for (say) +Eq a from a dictionary for Eq [a]. So if we find + + ==.sel [t] d + +we can't transform to + + eqList (==.sel t d') + +where + eqList :: (a->a->Bool) -> [a] -> [a] -> Bool + +Of course, we currently have no way to automatically derive +eqList, nor to connect it to the Eq [a] instance decl, but you +can imagine that it might somehow be possible. Taking advantage +of this is permanently ruled out. + +Still, this is no great hardship, because we intend to eliminate +overloading altogether anyway! + +A note about non-tyvar dictionaries +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Some Ids have types like + + forall a,b,c. Eq a -> Ord [a] -> tau + +This seems curious at first, because we usually only have dictionary +args whose types are of the form (C a) where a is a type variable. +But this doesn't hold for the functions arising from instance decls, +which sometimes get arguments with types of form (C (T a)) for some +type constructor T. + +Should we specialise wrt this compound-type dictionary? We used to say +"no", saying: + "This is a heuristic judgement, as indeed is the fact that we + specialise wrt only dictionaries. We choose *not* to specialise + wrt compound dictionaries because at the moment the only place + they show up is in instance decls, where they are simply plugged + into a returned dictionary. So nothing is gained by specialising + wrt them." 
+
+But it is simpler and more uniform to specialise wrt these dicts too;
+and in future GHC is likely to support full-fledged type signatures
+like
+       f :: Eq [(a,b)] => ...
+
+
+************************************************************************
+*                                                                      *
+\subsubsection{The new specialiser}
+*                                                                      *
+************************************************************************
+
+Our basic game plan is this.  For let(rec) bound function
+       f :: (C a, D c) => (a,b,c,d) -> Bool
+
+* Find any specialised calls of f, (f ts ds), where
+  ts are the type arguments t1 .. t4, and
+  ds are the dictionary arguments d1 .. d2.
+
+* Add a new definition for f1 (say):
+
+       f1 = /\ b d -> (..body of f..) t1 b t3 d d1 d2
+
+  Note that we abstract over the unconstrained type arguments.
+
+* Add the mapping
+
+       [t1,b,t3,d]  |->  \d1 d2 -> f1 b d
+
+  to the specialisations of f.  This will be used by the
+  simplifier to replace calls
+               (f t1 t2 t3 t4) da db
+  by
+               (\d1 d2 -> f1 t2 t4) da db
+
+  All the stuff about how many dictionaries to discard, and what types
+  to apply the specialised function to, is handled by the fact that the
+  SpecEnv contains a template for the result of the specialisation.
+
+We don't build *partial* specialisations for f.  For example:
+
+  f :: Eq a => a -> a -> Bool
+  {-# SPECIALISE f :: (Eq b, Eq c) => (b,c) -> (b,c) -> Bool #-}
+
+Here, little is gained by making a specialised copy of f.
+There's a distinct danger that the specialised version would
+first build a dictionary for (Eq b, Eq c), and then select the (==)
+method from it!  Even if it didn't, not a great deal is saved.
+
+We do, however, generate polymorphic, but not overloaded, specialisations:
+
+  f :: Eq a => [a] -> b -> b -> b
+  ... SPECIALISE f :: [Int] -> b -> b -> b ...
+ +Hence, the invariant is this: + + *** no specialised version is overloaded *** + + +************************************************************************ +* * +\subsubsection{The exported function} +* * +************************************************************************ +-} + +-- | Specialise calls to type-class overloaded functions occurring in a program. +specProgram :: ModGuts -> CoreM ModGuts +specProgram guts@(ModGuts { mg_module = this_mod + , mg_rules = local_rules + , mg_binds = binds }) + = do { dflags <- getDynFlags + + -- Specialise the bindings of this module + ; (binds', uds) <- runSpecM dflags this_mod (go binds) + + -- Specialise imported functions + ; hpt_rules <- getRuleBase + ; let rule_base = extendRuleBaseList hpt_rules local_rules + ; (new_rules, spec_binds) <- specImports dflags this_mod top_env emptyVarSet + [] rule_base uds + + ; let final_binds + | null spec_binds = binds' + | otherwise = Rec (flattenBinds spec_binds) : binds' + -- Note [Glom the bindings if imported functions are specialised] + + ; return (guts { mg_binds = final_binds + , mg_rules = new_rules ++ local_rules }) } + where + -- We need to start with a Subst that knows all the things + -- that are in scope, so that the substitution engine doesn't + -- accidentally re-use a unique that's already in use + -- Easiest thing is to do it all at once, as if all the top-level + -- decls were mutually recursive + top_env = SE { se_subst = GHC.Core.Subst.mkEmptySubst $ mkInScopeSet $ mkVarSet $ + bindersOfBinds binds + , se_interesting = emptyVarSet } + + go [] = return ([], emptyUDs) + go (bind:binds) = do (binds', uds) <- go binds + (bind', uds') <- specBind top_env bind uds + return (bind' ++ binds', uds') + +{- +Note [Wrap bindings returned by specImports] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +'specImports' returns a set of specialized bindings. However, these are lacking +necessary floated dictionary bindings, which are returned by +UsageDetails(ud_binds). 
These dictionaries need to be brought into scope with
+'wrapDictBinds' before the bindings returned by 'specImports' can be used. See,
+for instance, the 'specImports' call in 'specProgram'.
+
+
+Note [Disabling cross-module specialisation]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Since GHC 7.10 we have performed specialisation of INLINABLE bindings living
+in modules outside of the current module. This can sometimes uncover user code
+which explodes in size when aggressively optimized. The
+-fno-cross-module-specialise option was introduced to allow users bitten by
+such instances to revert to the pre-7.10 behavior.
+
+See #10491
+-}
+
+-- | An argument that we might want to specialise.
+-- See Note [Specialising Calls] for the nitty-gritty details.
+data SpecArg
+  =
+    -- | Type arguments that should be specialised, due to appearing
+    -- free in the type of a 'SpecDict'.
+    SpecType Type
+    -- | Type arguments that should remain polymorphic.
+  | UnspecType
+    -- | Dictionaries that should be specialised.
+  | SpecDict DictExpr
+    -- | Value arguments that should not be specialised.
+  | UnspecArg
+
+instance Outputable SpecArg where
+  ppr (SpecType t) = text "SpecType" <+> ppr t
+  ppr UnspecType   = text "UnspecType"
+  ppr (SpecDict d) = text "SpecDict" <+> ppr d
+  ppr UnspecArg    = text "UnspecArg"
+
+getSpecDicts :: [SpecArg] -> [DictExpr]
+getSpecDicts = mapMaybe go
+  where
+    go (SpecDict d) = Just d
+    go _            = Nothing
+
+getSpecTypes :: [SpecArg] -> [Type]
+getSpecTypes = mapMaybe go
+  where
+    go (SpecType t) = Just t
+    go _            = Nothing
+
+isUnspecArg :: SpecArg -> Bool
+isUnspecArg UnspecArg  = True
+isUnspecArg UnspecType = True
+isUnspecArg _          = False
+
+isValueArg :: SpecArg -> Bool
+isValueArg UnspecArg    = True
+isValueArg (SpecDict _) = True
+isValueArg _            = False
+
+-- | Given binders from an original function 'f', and the 'SpecArg's
+-- corresponding to its usage, compute everything necessary to build
+-- a specialisation.
+-- +-- We will use a running example. Consider the function +-- +-- foo :: forall a b. Eq a => Int -> blah +-- foo @a @b dEqA i = blah +-- +-- which is called with the 'CallInfo' +-- +-- [SpecType T1, UnspecType, SpecDict dEqT1, UnspecArg] +-- +-- We'd eventually like to build the RULE +-- +-- RULE "SPEC foo @T1 _" +-- forall @a @b (dEqA' :: Eq a). +-- foo @T1 @b dEqA' = $sfoo @b +-- +-- and the specialisation '$sfoo' +-- +-- $sfoo :: forall b. Int -> blah +-- $sfoo @b = \i -> SUBST[a->T1, dEqA->dEqA'] blah +-- +-- The cases for 'specHeader' below are presented in the same order as this +-- running example. The result of 'specHeader' for this example is as follows: +-- +-- ( -- Returned arguments +-- env + [a -> T1, deqA -> dEqA'] +-- , [] +-- +-- -- RULE helpers +-- , [b, dx', i] +-- , [T1, b, dx', i] +-- +-- -- Specialised function helpers +-- , [b, i] +-- , [dx] +-- , [T1, b, dx_spec, i] +-- ) +specHeader + :: SpecEnv + -> [CoreBndr] -- The binders from the original function 'f' + -> [SpecArg] -- From the CallInfo + -> SpecM ( -- Returned arguments + SpecEnv -- Substitution to apply to the body of 'f' + , [CoreBndr] -- All the remaining unspecialised args from the original function 'f' + + -- RULE helpers + , [CoreBndr] -- Binders for the RULE + , [CoreArg] -- Args for the LHS of the rule + + -- Specialised function helpers + , [CoreBndr] -- Binders for $sf + , [DictBind] -- Auxiliary dictionary bindings + , [CoreExpr] -- Specialised arguments for unfolding + ) + +-- We want to specialise on type 'T1', and so we must construct a substitution +-- 'a->T1', as well as a LHS argument for the resulting RULE and unfolding +-- details. 
+specHeader env (bndr : bndrs) (SpecType t : args) + = do { let env' = extendTvSubstList env [(bndr, t)] + ; (env'', unused_bndrs, rule_bs, rule_es, bs', dx, spec_args) + <- specHeader env' bndrs args + ; pure ( env'' + , unused_bndrs + , rule_bs + , Type t : rule_es + , bs' + , dx + , Type t : spec_args + ) + } + +-- Next we have a type that we don't want to specialise. We need to perform +-- a substitution on it (in case the type refers to 'a'). Additionally, we need +-- to produce a binder, LHS argument and RHS argument for the resulting rule, +-- /and/ a binder for the specialised body. +specHeader env (bndr : bndrs) (UnspecType : args) + = do { let (env', bndr') = substBndr env bndr + ; (env'', unused_bndrs, rule_bs, rule_es, bs', dx, spec_args) + <- specHeader env' bndrs args + ; pure ( env'' + , unused_bndrs + , bndr' : rule_bs + , varToCoreExpr bndr' : rule_es + , bndr' : bs' + , dx + , varToCoreExpr bndr' : spec_args + ) + } + +-- Next we want to specialise the 'Eq a' dict away. We need to construct +-- a wildcard binder to match the dictionary (See Note [Specialising Calls] for +-- the nitty-gritty), as a LHS rule and unfolding details. +specHeader env (bndr : bndrs) (SpecDict d : args) + = do { inst_dict_id <- newDictBndr env bndr + ; let (rhs_env2, dx_binds, spec_dict_args') + = bindAuxiliaryDicts env [bndr] [d] [inst_dict_id] + ; (env', unused_bndrs, rule_bs, rule_es, bs', dx, spec_args) + <- specHeader rhs_env2 bndrs args + ; pure ( env' + , unused_bndrs + -- See Note [Evidence foralls] + , exprFreeIdsList (varToCoreExpr inst_dict_id) ++ rule_bs + , varToCoreExpr inst_dict_id : rule_es + , bs' + , dx_binds ++ dx + , spec_dict_args' ++ spec_args + ) + } + +-- Finally, we have the unspecialised argument 'i'. We need to produce +-- a binder, LHS and RHS argument for the RULE, and a binder for the +-- specialised body. +-- +-- NB: Calls to 'specHeader' will trim off any trailing 'UnspecArg's, which is +-- why 'i' doesn't appear in our RULE above. 
But we have no guarantee that
+-- there aren't 'UnspecArg's which come /before/ all of the dictionaries, so
+-- this case must be here.
+specHeader env (bndr : bndrs) (UnspecArg : args)
+  = do { let (env', bndr') = substBndr env bndr
+       ; (env'', unused_bndrs, rule_bs, rule_es, bs', dx, spec_args)
+            <- specHeader env' bndrs args
+       ; pure ( env''
+              , unused_bndrs
+              , bndr' : rule_bs
+              , varToCoreExpr bndr' : rule_es
+              , bndr' : bs'
+              , dx
+              , varToCoreExpr bndr' : spec_args
+              )
+       }
+
+-- Return all remaining binders from the original function. These have the
+-- invariant that they should all correspond to unspecialised arguments, so
+-- it's safe to stop processing at this point.
+specHeader env bndrs [] = pure (env, bndrs, [], [], [], [], [])
+specHeader env [] _ = pure (env, [], [], [], [], [], [])
+
+
+-- | Specialise a set of calls to imported bindings
+specImports :: DynFlags
+            -> Module
+            -> SpecEnv          -- Passed in so that all top-level Ids are in scope
+            -> VarSet           -- Don't specialise these ones
+                                -- See Note [Avoiding recursive specialisation]
+            -> [Id]             -- Stack of imported functions being specialised
+            -> RuleBase         -- Rules from this module and the home package
+                                -- (but not external packages, which can change)
+            -> UsageDetails     -- Calls for imported things, and floating bindings
+            -> CoreM ( [CoreRule]   -- New rules
+                     , [CoreBind] ) -- Specialised bindings
+                                    -- See Note [Wrap bindings returned by specImports]
+specImports dflags this_mod top_env done callers rule_base
+            (MkUD { ud_binds = dict_binds, ud_calls = calls })
+  -- See Note [Disabling cross-module specialisation]
+  | not $ gopt Opt_CrossModuleSpecialise dflags
+  = return ([], [])
+
+  | otherwise
+  = do { let import_calls = dVarEnvElts calls
+       ; (rules, spec_binds) <- go rule_base import_calls
+
+         -- Don't forget to wrap the specialized bindings with
+         -- bindings for the needed dictionaries.
+ -- See Note [Wrap bindings returned by specImports] + ; let spec_binds' = wrapDictBinds dict_binds spec_binds + + ; return (rules, spec_binds') } + where + go :: RuleBase -> [CallInfoSet] -> CoreM ([CoreRule], [CoreBind]) + go _ [] = return ([], []) + go rb (cis@(CIS fn _) : other_calls) + = do { let ok_calls = filterCalls cis dict_binds + -- Drop calls that (directly or indirectly) refer to fn + -- See Note [Avoiding loops] +-- ; debugTraceMsg (text "specImport" <+> vcat [ ppr fn +-- , text "calls" <+> ppr cis +-- , text "ud_binds =" <+> ppr dict_binds +-- , text "dump set =" <+> ppr dump_set +-- , text "filtered calls =" <+> ppr ok_calls ]) + ; (rules1, spec_binds1) <- specImport dflags this_mod top_env + done callers rb fn ok_calls + + ; (rules2, spec_binds2) <- go (extendRuleBaseList rb rules1) other_calls + ; return (rules1 ++ rules2, spec_binds1 ++ spec_binds2) } + +specImport :: DynFlags + -> Module + -> SpecEnv -- Passed in so that all top-level Ids are in scope + -> VarSet -- Don't specialise these + -- See Note [Avoiding recursive specialisation] + -> [Id] -- Stack of imported functions being specialised + -> RuleBase -- Rules from this module + -> Id -> [CallInfo] -- Imported function and calls for it + -> CoreM ( [CoreRule] -- New rules + , [CoreBind] ) -- Specialised bindings +specImport dflags this_mod top_env done callers rb fn calls_for_fn + | fn `elemVarSet` done + = return ([], []) -- No warning. 
This actually happens all the time + -- when specialising a recursive function, because + -- the RHS of the specialised function contains a recursive + -- call to the original function + + | null calls_for_fn -- We filtered out all the calls in deleteCallsMentioning + = return ([], []) + + | wantSpecImport dflags unfolding + , Just rhs <- maybeUnfoldingTemplate unfolding + = do { -- Get rules from the external package state + -- We keep doing this in case we "page-fault in" + -- more rules as we go along + ; hsc_env <- getHscEnv + ; eps <- liftIO $ hscEPS hsc_env + ; vis_orphs <- getVisibleOrphanMods + ; let full_rb = unionRuleBase rb (eps_rule_base eps) + rules_for_fn = getRules (RuleEnv full_rb vis_orphs) fn + + ; (rules1, spec_pairs, uds) + <- -- pprTrace "specImport1" (vcat [ppr fn, ppr calls_for_fn, ppr rhs]) $ + runSpecM dflags this_mod $ + specCalls (Just this_mod) top_env rules_for_fn calls_for_fn fn rhs + ; let spec_binds1 = [NonRec b r | (b,r) <- spec_pairs] + -- After the rules kick in we may get recursion, but + -- we rely on a global GlomBinds to sort that out later + -- See Note [Glom the bindings if imported functions are specialised] + + -- Now specialise any cascaded calls + ; (rules2, spec_binds2) <- -- pprTrace "specImport 2" (ppr fn $$ ppr rules1 $$ ppr spec_binds1) $ + specImports dflags this_mod top_env + (extendVarSet done fn) + (fn:callers) + (extendRuleBaseList rb rules1) + uds + + ; let final_binds = spec_binds2 ++ spec_binds1 + + ; return (rules2 ++ rules1, final_binds) } + + | otherwise = do { tryWarnMissingSpecs dflags callers fn calls_for_fn + ; return ([], [])} + + where + unfolding = realIdUnfolding fn -- We want to see the unfolding even for loop breakers + +-- | Returns whether or not to show a missed-spec warning. +-- If -Wall-missed-specializations is on, show the warning. 
+-- Otherwise, if -Wmissed-specializations is on, only show a warning +-- if there is at least one imported function being specialized, +-- and if all imported functions are marked with an inline pragma +-- Use the most specific warning as the reason. +tryWarnMissingSpecs :: DynFlags -> [Id] -> Id -> [CallInfo] -> CoreM () +-- See Note [Warning about missed specialisations] +tryWarnMissingSpecs dflags callers fn calls_for_fn + | wopt Opt_WarnMissedSpecs dflags + && not (null callers) + && allCallersInlined = doWarn $ Reason Opt_WarnMissedSpecs + | wopt Opt_WarnAllMissedSpecs dflags = doWarn $ Reason Opt_WarnAllMissedSpecs + | otherwise = return () + where + allCallersInlined = all (isAnyInlinePragma . idInlinePragma) callers + doWarn reason = + warnMsg reason + (vcat [ hang (text ("Could not specialise imported function") <+> quotes (ppr fn)) + 2 (vcat [ text "when specialising" <+> quotes (ppr caller) + | caller <- callers]) + , whenPprDebug (text "calls:" <+> vcat (map (pprCallInfo fn) calls_for_fn)) + , text "Probable fix: add INLINABLE pragma on" <+> quotes (ppr fn) ]) + +wantSpecImport :: DynFlags -> Unfolding -> Bool +-- See Note [Specialise imported INLINABLE things] +wantSpecImport dflags unf + = case unf of + NoUnfolding -> False + BootUnfolding -> False + OtherCon {} -> False + DFunUnfolding {} -> True + CoreUnfolding { uf_src = src, uf_guidance = _guidance } + | gopt Opt_SpecialiseAggressively dflags -> True + | isStableSource src -> True + -- Specialise even INLINE things; it hasn't inlined yet, + -- so perhaps it never will. 
Moreover it may have calls
+        -- inside it that we want to specialise
+      | otherwise -> False    -- Stable, not INLINE, hence INLINABLE
+
+{- Note [Warning about missed specialisations]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Suppose
+ * In module Lib, you carefully mark a function 'foo' INLINABLE
+ * Import Lib(foo) into another module M
+ * Call 'foo' at some specialised type in M
+Then you jolly well expect it to be specialised in M.  But what if
+'foo' calls another function 'Lib.bar'.  Then you'd like 'bar' to be
+specialised too.  But if 'bar' is not marked INLINABLE it may well
+not be specialised.  The warning Opt_WarnMissedSpecs warns about this.
+
+It's more noisy to warn about a missed specialisation opportunity
+for /every/ overloaded imported function, but sometimes useful.  That
+is what Opt_WarnAllMissedSpecs does.
+
+ToDo: warn about missed opportunities for local functions.
+
+Note [Specialise imported INLINABLE things]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+What imported functions do we specialise?  The basic set is
+ * DFuns and things with INLINABLE pragmas.
+but with -fspecialise-aggressively we add
+ * Anything with an unfolding template
+
+#8874 has a good example of why we want to auto-specialise DFuns.
+
+We have the -fspecialise-aggressively flag (usually off), because we
+risk lots of orphan modules from over-vigorous specialisation.
+However it's not a big deal: anything non-recursive with an
+unfolding-template will probably have been inlined already.
+
+Note [Glom the bindings if imported functions are specialised]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Suppose we have an imported, *recursive*, INLINABLE function
+   f :: Eq a => a -> a
+   f = /\a \d x. ...(f a d)...
+In the module being compiled we have
+   g x = f (x::Int)
+Now we'll make a specialised function
+   f_spec :: Int -> Int
+   f_spec = \x -> ...(f Int dInt)...
+   {-# RULE f Int _ = f_spec #-}
+   g = \x.
f Int dInt x +Note that f_spec doesn't look recursive +After rewriting with the RULE, we get + f_spec = \x -> ...(f_spec)... +BUT since f_spec was non-recursive before it'll *stay* non-recursive. +The occurrence analyser never turns a NonRec into a Rec. So we must +make sure that f_spec is recursive. Easiest thing is to make all +the specialisations for imported bindings recursive. + + +Note [Avoiding recursive specialisation] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +When we specialise 'f' we may find new overloaded calls to 'g', 'h' in +'f's RHS. So we want to specialise g,h. But we don't want to +specialise f any more! It's possible that f's RHS might have a +recursive yet-more-specialised call, so we'd diverge in that case. +And if the call is to the same type, one specialisation is enough. +Avoiding this recursive specialisation loop is the reason for the +'done' VarSet passed to specImports and specImport. + +************************************************************************ +* * +\subsubsection{@specExpr@: the main function} +* * +************************************************************************ +-} + +data SpecEnv + = SE { se_subst :: GHC.Core.Subst.Subst + -- We carry a substitution down: + -- a) we must clone any binding that might float outwards, + -- to avoid name clashes + -- b) we carry a type substitution to use when analysing + -- the RHS of specialised bindings (no type-let!) 
+ + + , se_interesting :: VarSet + -- Dict Ids that we know something about + -- and hence may be worth specialising against + -- See Note [Interesting dictionary arguments] + } + +specVar :: SpecEnv -> Id -> CoreExpr +specVar env v = GHC.Core.Subst.lookupIdSubst (text "specVar") (se_subst env) v + +specExpr :: SpecEnv -> CoreExpr -> SpecM (CoreExpr, UsageDetails) + +---------------- First the easy cases -------------------- +specExpr env (Type ty) = return (Type (substTy env ty), emptyUDs) +specExpr env (Coercion co) = return (Coercion (substCo env co), emptyUDs) +specExpr env (Var v) = return (specVar env v, emptyUDs) +specExpr _ (Lit lit) = return (Lit lit, emptyUDs) +specExpr env (Cast e co) + = do { (e', uds) <- specExpr env e + ; return ((mkCast e' (substCo env co)), uds) } +specExpr env (Tick tickish body) + = do { (body', uds) <- specExpr env body + ; return (Tick (specTickish env tickish) body', uds) } + +---------------- Applications might generate a call instance -------------------- +specExpr env expr@(App {}) + = go expr [] + where + go (App fun arg) args = do (arg', uds_arg) <- specExpr env arg + (fun', uds_app) <- go fun (arg':args) + return (App fun' arg', uds_arg `plusUDs` uds_app) + + go (Var f) args = case specVar env f of + Var f' -> return (Var f', mkCallUDs env f' args) + e' -> return (e', emptyUDs) -- I don't expect this! 
+ go other _ = specExpr env other + +---------------- Lambda/case require dumping of usage details -------------------- +specExpr env e@(Lam _ _) = do + (body', uds) <- specExpr env' body + let (free_uds, dumped_dbs) = dumpUDs bndrs' uds + return (mkLams bndrs' (wrapDictBindsE dumped_dbs body'), free_uds) + where + (bndrs, body) = collectBinders e + (env', bndrs') = substBndrs env bndrs + -- More efficient to collect a group of binders together all at once + -- and we don't want to split a lambda group with dumped bindings + +specExpr env (Case scrut case_bndr ty alts) + = do { (scrut', scrut_uds) <- specExpr env scrut + ; (scrut'', case_bndr', alts', alts_uds) + <- specCase env scrut' case_bndr alts + ; return (Case scrut'' case_bndr' (substTy env ty) alts' + , scrut_uds `plusUDs` alts_uds) } + +---------------- Finally, let is the interesting case -------------------- +specExpr env (Let bind body) + = do { -- Clone binders + (rhs_env, body_env, bind') <- cloneBindSM env bind + + -- Deal with the body + ; (body', body_uds) <- specExpr body_env body + + -- Deal with the bindings + ; (binds', uds) <- specBind rhs_env bind' body_uds + + -- All done + ; return (foldr Let body' binds', uds) } + +specTickish :: SpecEnv -> Tickish Id -> Tickish Id +specTickish env (Breakpoint ix ids) + = Breakpoint ix [ id' | id <- ids, Var id' <- [specVar env id]] + -- drop vars from the list if they have a non-variable substitution. + -- should never happen, but it's harmless to drop them anyway. 
+specTickish _ other_tickish = other_tickish + +specCase :: SpecEnv + -> CoreExpr -- Scrutinee, already done + -> Id -> [CoreAlt] + -> SpecM ( CoreExpr -- New scrutinee + , Id + , [CoreAlt] + , UsageDetails) +specCase env scrut' case_bndr [(con, args, rhs)] + | isDictId case_bndr -- See Note [Floating dictionaries out of cases] + , interestingDict env scrut' + , not (isDeadBinder case_bndr && null sc_args') + = do { (case_bndr_flt : sc_args_flt) <- mapM clone_me (case_bndr' : sc_args') + + ; let sc_rhss = [ Case (Var case_bndr_flt) case_bndr' (idType sc_arg') + [(con, args', Var sc_arg')] + | sc_arg' <- sc_args' ] + + -- Extend the substitution for RHS to map the *original* binders + -- to their floated versions. + mb_sc_flts :: [Maybe DictId] + mb_sc_flts = map (lookupVarEnv clone_env) args' + clone_env = zipVarEnv sc_args' sc_args_flt + subst_prs = (case_bndr, Var case_bndr_flt) + : [ (arg, Var sc_flt) + | (arg, Just sc_flt) <- args `zip` mb_sc_flts ] + env_rhs' = env_rhs { se_subst = GHC.Core.Subst.extendIdSubstList (se_subst env_rhs) subst_prs + , se_interesting = se_interesting env_rhs `extendVarSetList` + (case_bndr_flt : sc_args_flt) } + + ; (rhs', rhs_uds) <- specExpr env_rhs' rhs + ; let scrut_bind = mkDB (NonRec case_bndr_flt scrut') + case_bndr_set = unitVarSet case_bndr_flt + sc_binds = [(NonRec sc_arg_flt sc_rhs, case_bndr_set) + | (sc_arg_flt, sc_rhs) <- sc_args_flt `zip` sc_rhss ] + flt_binds = scrut_bind : sc_binds + (free_uds, dumped_dbs) = dumpUDs (case_bndr':args') rhs_uds + all_uds = flt_binds `addDictBinds` free_uds + alt' = (con, args', wrapDictBindsE dumped_dbs rhs') + ; return (Var case_bndr_flt, case_bndr', [alt'], all_uds) } + where + (env_rhs, (case_bndr':args')) = substBndrs env (case_bndr:args) + sc_args' = filter is_flt_sc_arg args' + + clone_me bndr = do { uniq <- getUniqueM + ; return (mkUserLocalOrCoVar occ uniq ty loc) } + where + name = idName bndr + ty = idType bndr + occ = nameOccName name + loc = getSrcSpan name + + arg_set = 
mkVarSet args' + is_flt_sc_arg var = isId var + && not (isDeadBinder var) + && isDictTy var_ty + && not (tyCoVarsOfType var_ty `intersectsVarSet` arg_set) + where + var_ty = idType var + + +specCase env scrut case_bndr alts + = do { (alts', uds_alts) <- mapAndCombineSM spec_alt alts + ; return (scrut, case_bndr', alts', uds_alts) } + where + (env_alt, case_bndr') = substBndr env case_bndr + spec_alt (con, args, rhs) = do + (rhs', uds) <- specExpr env_rhs rhs + let (free_uds, dumped_dbs) = dumpUDs (case_bndr' : args') uds + return ((con, args', wrapDictBindsE dumped_dbs rhs'), free_uds) + where + (env_rhs, args') = substBndrs env_alt args + +{- +Note [Floating dictionaries out of cases] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Consider + g = \d. case d of { MkD sc ... -> ...(f sc)... } +Naively we can't float d2's binding out of the case expression, +because 'sc' is bound by the case, and that in turn means we can't +specialise f, which seems a pity. + +So we invert the case, by floating out a binding +for 'sc_flt' thus: + sc_flt = case d of { MkD sc ... -> sc } +Now we can float the call instance for 'f'. Indeed this is just +what'll happen if 'sc' was originally bound with a let binding, +but case is more efficient, and necessary with equalities. So it's +good to work with both. + +You might think that this won't make any difference, because the +call instance will only get nuked by the \d. BUT if 'g' itself is +specialised, then transitively we should be able to specialise f. + +In general, given + case e of cb { MkD sc ... -> ...(f sc)... } +we transform to + let cb_flt = e + sc_flt = case cb_flt of { MkD sc ... -> sc } + in + case cb_flt of bg { MkD sc ... -> ....(f sc_flt)... 
} + +The "_flt" things are the floated binds; we use the current substitution +to substitute sc -> sc_flt in the RHS + +************************************************************************ +* * + Dealing with a binding +* * +************************************************************************ +-} + +specBind :: SpecEnv -- Use this for RHSs + -> CoreBind -- Binders are already cloned by cloneBindSM, + -- but RHSs are un-processed + -> UsageDetails -- Info on how the scope of the binding + -> SpecM ([CoreBind], -- New bindings + UsageDetails) -- And info to pass upstream + +-- Returned UsageDetails: +-- No calls for binders of this bind +specBind rhs_env (NonRec fn rhs) body_uds + = do { (rhs', rhs_uds) <- specExpr rhs_env rhs + ; (fn', spec_defns, body_uds1) <- specDefn rhs_env body_uds fn rhs + + ; let pairs = spec_defns ++ [(fn', rhs')] + -- fn' mentions the spec_defns in its rules, + -- so put the latter first + + combined_uds = body_uds1 `plusUDs` rhs_uds + + (free_uds, dump_dbs, float_all) = dumpBindUDs [fn] combined_uds + + final_binds :: [DictBind] + -- See Note [From non-recursive to recursive] + final_binds + | not (isEmptyBag dump_dbs) + , not (null spec_defns) + = [recWithDumpedDicts pairs dump_dbs] + | otherwise + = [mkDB $ NonRec b r | (b,r) <- pairs] + ++ bagToList dump_dbs + + ; if float_all then + -- Rather than discard the calls mentioning the bound variables + -- we float this (dictionary) binding along with the others + return ([], free_uds `snocDictBinds` final_binds) + else + -- No call in final_uds mentions bound variables, + -- so we can just leave the binding here + return (map fst final_binds, free_uds) } + + +specBind rhs_env (Rec pairs) body_uds + -- Note [Specialising a recursive group] + = do { let (bndrs,rhss) = unzip pairs + ; (rhss', rhs_uds) <- mapAndCombineSM (specExpr rhs_env) rhss + ; let scope_uds = body_uds `plusUDs` rhs_uds + -- Includes binds and calls arising from rhss + + ; (bndrs1, spec_defns1, uds1) <- specDefns 
rhs_env scope_uds pairs + + ; (bndrs3, spec_defns3, uds3) + <- if null spec_defns1 -- Common case: no specialisation + then return (bndrs1, [], uds1) + else do { -- Specialisation occurred; do it again + (bndrs2, spec_defns2, uds2) + <- specDefns rhs_env uds1 (bndrs1 `zip` rhss) + ; return (bndrs2, spec_defns2 ++ spec_defns1, uds2) } + + ; let (final_uds, dumped_dbs, float_all) = dumpBindUDs bndrs uds3 + final_bind = recWithDumpedDicts (spec_defns3 ++ zip bndrs3 rhss') + dumped_dbs + + ; if float_all then + return ([], final_uds `snocDictBind` final_bind) + else + return ([fst final_bind], final_uds) } + + +--------------------------- +specDefns :: SpecEnv + -> UsageDetails -- Info on how it is used in its scope + -> [(OutId,InExpr)] -- The things being bound and their un-processed RHS + -> SpecM ([OutId], -- Original Ids with RULES added + [(OutId,OutExpr)], -- Extra, specialised bindings + UsageDetails) -- Stuff to fling upwards from the specialised versions + +-- Specialise a list of bindings (the contents of a Rec), but flowing usages +-- upwards binding by binding. Example: { f = ...g ...; g = ...f .... } +-- Then if the input CallDetails has a specialised call for 'g', whose specialisation +-- in turn generates a specialised call for 'f', we catch that in this one sweep. +-- But not vice versa (it's a fixpoint problem). 
+ +specDefns _env uds [] + = return ([], [], uds) +specDefns env uds ((bndr,rhs):pairs) + = do { (bndrs1, spec_defns1, uds1) <- specDefns env uds pairs + ; (bndr1, spec_defns2, uds2) <- specDefn env uds1 bndr rhs + ; return (bndr1 : bndrs1, spec_defns1 ++ spec_defns2, uds2) } + +--------------------------- +specDefn :: SpecEnv + -> UsageDetails -- Info on how it is used in its scope + -> OutId -> InExpr -- The thing being bound and its un-processed RHS + -> SpecM (Id, -- Original Id with added RULES + [(Id,CoreExpr)], -- Extra, specialised bindings + UsageDetails) -- Stuff to fling upwards from the specialised versions + +specDefn env body_uds fn rhs + = do { let (body_uds_without_me, calls_for_me) = callsForMe fn body_uds + rules_for_me = idCoreRules fn + ; (rules, spec_defns, spec_uds) <- specCalls Nothing env rules_for_me + calls_for_me fn rhs + ; return ( fn `addIdSpecialisations` rules + , spec_defns + , body_uds_without_me `plusUDs` spec_uds) } + -- It's important that the `plusUDs` is this way + -- round, because body_uds_without_me may bind + -- dictionaries that are used in calls_for_me passed + -- to specDefn. So the dictionary bindings in + -- spec_uds may mention dictionaries bound in + -- body_uds_without_me + +--------------------------- +specCalls :: Maybe Module -- Just this_mod => specialising imported fn + -- Nothing => specialising local fn + -> SpecEnv + -> [CoreRule] -- Existing RULES for the fn + -> [CallInfo] + -> OutId -> InExpr + -> SpecM SpecInfo -- New rules, specialised bindings, and usage details + +-- This function checks existing rules, and does not create +-- duplicate ones. So the caller does not need to do this filtering. 
+-- See 'already_covered' + +type SpecInfo = ( [CoreRule] -- Specialisation rules + , [(Id,CoreExpr)] -- Specialised definition + , UsageDetails ) -- Usage details from specialised RHSs + +specCalls mb_mod env existing_rules calls_for_me fn rhs + -- The first case is the interesting one + | callSpecArity pis <= fn_arity -- See Note [Specialisation Must Preserve Sharing] + && notNull calls_for_me -- And there are some calls to specialise + && not (isNeverActive (idInlineActivation fn)) + -- Don't specialise NOINLINE things + -- See Note [Auto-specialisation and RULES] + +-- && not (certainlyWillInline (idUnfolding fn)) -- And it's not small +-- See Note [Inline specialisation] for why we do not +-- switch off specialisation for inline functions + + = -- pprTrace "specDefn: some" (ppr fn $$ ppr calls_for_me $$ ppr existing_rules) $ + foldlM spec_call ([], [], emptyUDs) calls_for_me + + | otherwise -- No calls or RHS doesn't fit our preconceptions + = WARN( not (exprIsTrivial rhs) && notNull calls_for_me, + text "Missed specialisation opportunity for" + <+> ppr fn $$ _trace_doc ) + -- Note [Specialisation shape] + -- pprTrace "specDefn: none" (ppr fn <+> ppr calls_for_me) $ + return ([], [], emptyUDs) + where + _trace_doc = sep [ ppr rhs_tyvars, ppr rhs_bndrs + , ppr (idInlineActivation fn) ] + + fn_type = idType fn + fn_arity = idArity fn + fn_unf = realIdUnfolding fn -- Ignore loop-breaker-ness here + pis = fst $ splitPiTys fn_type + theta = getTheta pis + n_dicts = length theta + inl_prag = idInlinePragma fn + inl_act = inlinePragmaActivation inl_prag + is_local = isLocalId fn + + -- Figure out whether the function has an INLINE pragma + -- See Note [Inline specialisations] + + (rhs_bndrs, rhs_body) = collectBindersPushingCo rhs + -- See Note [Account for casts in binding] + rhs_tyvars = filter isTyVar rhs_bndrs + + in_scope = GHC.Core.Subst.substInScope (se_subst env) + + already_covered :: DynFlags -> [CoreRule] -> [CoreExpr] -> Bool + already_covered dflags 
new_rules args -- Note [Specialisations already covered] + = isJust (lookupRule dflags (in_scope, realIdUnfolding) + (const True) fn args + (new_rules ++ existing_rules)) + -- NB: we look both in the new_rules (generated by this invocation + -- of specCalls), and in existing_rules (passed in to specCalls) + + ---------------------------------------------------------- + -- Specialise to one particular call pattern + spec_call :: SpecInfo -- Accumulating parameter + -> CallInfo -- Call instance + -> SpecM SpecInfo + spec_call spec_acc@(rules_acc, pairs_acc, uds_acc) + (CI { ci_key = call_args, ci_arity = call_arity }) + = ASSERT(call_arity <= fn_arity) + + -- See Note [Specialising Calls] + do { (rhs_env2, unused_bndrs, rule_bndrs, rule_args, unspec_bndrs, dx_binds, spec_args) + <- specHeader env rhs_bndrs $ dropWhileEndLE isUnspecArg call_args + ; let rhs_body' = mkLams unused_bndrs rhs_body + ; dflags <- getDynFlags + ; if already_covered dflags rules_acc rule_args + then return spec_acc + else -- pprTrace "spec_call" (vcat [ ppr _call_info, ppr fn, ppr rhs_dict_ids + -- , text "rhs_env2" <+> ppr (se_subst rhs_env2) + -- , ppr dx_binds ]) $ + do + { -- Figure out the type of the specialised function + let body = mkLams unspec_bndrs rhs_body' + body_ty = substTy rhs_env2 $ exprType body + (lam_extra_args, app_args) -- See Note [Specialisations Must Be Lifted] + | isUnliftedType body_ty -- C.f. 
GHC.Core.Op.WorkWrap.Lib.mkWorkerArgs + , not (isJoinId fn) + = ([voidArgId], voidPrimId : unspec_bndrs) + | otherwise = ([], unspec_bndrs) + join_arity_change = length app_args - length rule_args + spec_join_arity | Just orig_join_arity <- isJoinId_maybe fn + = Just (orig_join_arity + join_arity_change) + | otherwise + = Nothing + + ; (spec_rhs, rhs_uds) <- specExpr rhs_env2 (mkLams lam_extra_args body) + ; let spec_id_ty = exprType spec_rhs + ; spec_f <- newSpecIdSM fn spec_id_ty spec_join_arity + ; this_mod <- getModule + ; let + -- The rule to put in the function's specialisation is: + -- forall x @b d1' d2'. + -- f x @T1 @b @T2 d1' d2' = f1 x @b + -- See Note [Specialising Calls] + herald = case mb_mod of + Nothing -- Specialising local fn + -> text "SPEC" + Just this_mod -- Specialising imported fn + -> text "SPEC/" <> ppr this_mod + + rule_name = mkFastString $ showSDoc dflags $ + herald <+> ftext (occNameFS (getOccName fn)) + <+> hsep (mapMaybe ppr_call_key_ty call_args) + -- This name ends up in interface files, so use occNameString. 
+ -- Otherwise uniques end up there, making builds + -- less deterministic (See #4012 comment:61 ff) + + rule_wout_eta = mkRule + this_mod + True {- Auto generated -} + is_local + rule_name + inl_act -- Note [Auto-specialisation and RULES] + (idName fn) + rule_bndrs + rule_args + (mkVarApps (Var spec_f) app_args) + + spec_rule + = case isJoinId_maybe fn of + Just join_arity -> etaExpandToJoinPointRule join_arity + rule_wout_eta + Nothing -> rule_wout_eta + + -- Add the { d1' = dx1; d2' = dx2 } usage stuff + -- See Note [Specialising Calls] + spec_uds = foldr consDictBind rhs_uds dx_binds + + -------------------------------------- + -- Add a suitable unfolding if the spec_inl_prag says so + -- See Note [Inline specialisations] + (spec_inl_prag, spec_unf) + | not is_local && isStrongLoopBreaker (idOccInfo fn) + = (neverInlinePragma, noUnfolding) + -- See Note [Specialising imported functions] in OccurAnal + + | InlinePragma { inl_inline = Inlinable } <- inl_prag + = (inl_prag { inl_inline = NoUserInline }, noUnfolding) + + | otherwise + = (inl_prag, specUnfolding dflags unspec_bndrs spec_app n_dicts fn_unf) + + spec_app e = e `mkApps` spec_args + + -------------------------------------- + -- Adding arity information just propagates it a bit faster + -- See Note [Arity decrease] in GHC.Core.Op.Simplify + -- Copy InlinePragma information from the parent Id. 
+ -- So if f has INLINE[1] so does spec_f
+ spec_f_w_arity = spec_f `setIdArity` max 0 (fn_arity - n_dicts)
+ `setInlinePragma` spec_inl_prag
+ `setIdUnfolding` spec_unf
+ `asJoinId_maybe` spec_join_arity
+
+ _rule_trace_doc = vcat [ ppr spec_f, ppr fn_type, ppr spec_id_ty
+ , ppr rhs_bndrs, ppr call_args
+ , ppr spec_rule
+ ]
+
+ ; -- pprTrace "spec_call: rule" _rule_trace_doc
+ return ( spec_rule : rules_acc
+ , (spec_f_w_arity, spec_rhs) : pairs_acc
+ , spec_uds `plusUDs` uds_acc
+ ) } }
+
+{- Note [Specialisation Must Preserve Sharing]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider a function:
+
+ f :: forall a. Eq a => a -> blah
+ f =
+ if expensive
+ then f1
+ else f2
+
+As written, all calls to 'f' will share 'expensive'. But if we specialise 'f'
+at 'Int', eg:
+
+ $sfInt = SUBST[a->Int,dict->dEqInt] (if expensive then f1 else f2)
+
+ RULE "SPEC f"
+ forall (d :: Eq Int).
+ f Int _ = $sfInt
+
+We've now lost sharing between 'f' and '$sfInt' for 'expensive'. Yikes!
+
+To avoid this, we only generate specialisations for functions whose arity is
+enough to bind all of the arguments we need to specialise. This ensures our
+specialised functions don't do any work before receiving all of their dicts,
+and thus avoids the 'f' case above.
+
+Note [Specialisations Must Be Lifted]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider a function 'f':
+
+ f :: forall a. Eq a => Array# a
+
+used like
+
+ case x of
+ True -> ...f @Int dEqInt...
+ False -> 0
+
+Naively, we might generate an (expensive) specialisation
+
+ $sfInt :: Array# Int
+
+even in the case that @x = False@! Instead, we add a dummy 'Void#' argument to
+the specialisation '$sfInt' ($sfInt :: Void# -> Array# Int) in order to
+preserve laziness.
+
+Note [Specialising Calls]
+~~~~~~~~~~~~~~~~~~~~~~~~~
+Suppose we have a function:
+
+ f :: Int -> forall a b c.
(Foo a, Foo c) => Bar -> Qux + f = \x -> /\ a b c -> \d1 d2 bar -> rhs + +and suppose it is called at: + + f 7 @T1 @T2 @T3 dFooT1 dFooT3 bar + +This call is described as a 'CallInfo' whose 'ci_key' is + + [ UnspecArg, SpecType T1, UnspecType, SpecType T3, SpecDict dFooT1 + , SpecDict dFooT3, UnspecArg ] + +Why are 'a' and 'c' identified as 'SpecType', while 'b' is 'UnspecType'? +Because we must specialise the function on type variables that appear +free in its *dictionary* arguments; but not on type variables that do not +appear in any dictionaries, i.e. are fully polymorphic. + +Because this call has dictionaries applied, we'd like to specialise +the call on any type argument that appears free in those dictionaries. +In this case, those are (a ~ T1, c ~ T3). + +As a result, we'd like to generate a function: + + $sf :: Int -> forall b. Bar -> Qux + $sf = SUBST[a->T1, c->T3, d1->d1', d2->d2'] (\x -> /\ b -> \bar -> rhs) + +Note that the substitution is applied to the whole thing. This is +convenient, but just slightly fragile. Notably: + * There had better be no name clashes in a/b/c + +We must construct a rewrite rule: + + RULE "SPEC f @T1 _ @T3" + forall (x :: Int) (@b :: Type) (d1' :: Foo T1) (d2' :: Foo T3). + f x @T1 @b @T3 d1' d2' = $sf x @b + +In the rule, d1' and d2' are just wildcards, not used in the RHS. Note +additionally that 'bar' isn't captured by this rule --- we bind only +enough etas in order to capture all of the *specialised* arguments. + +Finally, we must also construct the usage-details + + { d1' = dx1; d2' = dx2 } + +where d1', d2' are cloned versions of d1,d2, with the type substitution +applied. These auxiliary bindings just avoid duplication of dx1, dx2. + +Note [Account for casts in binding] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Consider + f :: Eq a => a -> IO () + {-# INLINABLE f + StableUnf = (/\a \(d:Eq a) (x:a). blah) |> g + #-} + f = ... 
+ +In f's stable unfolding we have done some modest simplification which +has pushed the cast to the outside. (I wonder if this is the Right +Thing, but it's what happens now; see GHC.Core.Op.Simplify.Utils Note [Casts and +lambdas].) Now that stable unfolding must be specialised, so we want +to push the cast back inside. It would be terrible if the cast +defeated specialisation! Hence the use of collectBindersPushingCo. + +Note [Evidence foralls] +~~~~~~~~~~~~~~~~~~~~~~~~~~ +Suppose (#12212) that we are specialising + f :: forall a b. (Num a, F a ~ F b) => blah +with a=b=Int. Then the RULE will be something like + RULE forall (d:Num Int) (g :: F Int ~ F Int). + f Int Int d g = f_spec +But both varToCoreExpr (when constructing the LHS args), and the +simplifier (when simplifying the LHS args), will transform to + RULE forall (d:Num Int) (g :: F Int ~ F Int). + f Int Int d <F Int> = f_spec +by replacing g with Refl. So now 'g' is unbound, which results in a later +crash. So we use Refl right off the bat, and do not forall-quantify 'g': + * varToCoreExpr generates a Refl + * exprsFreeIdsList returns the Ids bound by the args, + which won't include g + +You might wonder if this will match as often, but the simplifier replaces +complicated Refl coercions with Refl pretty aggressively. + +Note [Orphans and auto-generated rules] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +When we specialise an INLINABLE function, or when we have +-fspecialise-aggressively, we auto-generate RULES that are orphans. +We don't want to warn about these, or we'd generate a lot of warnings. +Thus, we only warn about user-specified orphan rules. + +Indeed, we don't even treat the module as an orphan module if it has +auto-generated *rule* orphans. Orphan modules are read every time we +compile, so they are pretty obtrusive and slow down every compilation, +even non-optimised ones. (Reason: for type class instances it's a +type correctness issue.) 
But specialisation rules are strictly for
+*optimisation* only so it's fine not to read the interface.
+
+What this means is that the SPEC rules from auto-specialisation in
+module M will be used in other modules only if M.hi has been read for
+some other reason, which is actually pretty likely.
+-}
+
+bindAuxiliaryDicts
+ :: SpecEnv
+ -> [DictId] -> [CoreExpr] -- Original dict bndrs, and the witnessing expressions
+ -> [DictId] -- A cloned dict-id for each dict arg
+ -> (SpecEnv, -- Substitute for all orig_dicts
+ [DictBind], -- Auxiliary dict bindings
+ [CoreExpr]) -- Witnessing expressions (all trivial)
+-- Bind any dictionary arguments to fresh names, to preserve sharing
+bindAuxiliaryDicts env@(SE { se_subst = subst, se_interesting = interesting })
+ orig_dict_ids call_ds inst_dict_ids
+ = (env', dx_binds, spec_dict_args)
+ where
+ (dx_binds, spec_dict_args) = go call_ds inst_dict_ids
+ env' = env { se_subst = subst `GHC.Core.Subst.extendSubstList`
+ (orig_dict_ids `zip` spec_dict_args)
+ `GHC.Core.Subst.extendInScopeList` dx_ids
+ , se_interesting = interesting `unionVarSet` interesting_dicts }
+
+ dx_ids = [dx_id | (NonRec dx_id _, _) <- dx_binds]
+ interesting_dicts = mkVarSet [ dx_id | (NonRec dx_id dx, _) <- dx_binds
+ , interestingDict env dx ]
+ -- See Note [Make the new dictionaries interesting]
+
+ go :: [CoreExpr] -> [CoreBndr] -> ([DictBind], [CoreExpr])
+ go [] _ = ([], [])
+ go (dx:dxs) (dx_id:dx_ids)
+ | exprIsTrivial dx = (dx_binds, dx : args)
+ | otherwise = (mkDB (NonRec dx_id dx) : dx_binds, Var dx_id : args)
+ where
+ (dx_binds, args) = go dxs dx_ids
+ -- In the first case extend the substitution but not bindings;
+ -- in the latter extend the bindings but not the substitution.
+ -- For the former, note that we bind the *original* dict in the substitution,
+ -- overriding any d->dx_id binding put there by substBndrs
+ go _ _ = pprPanic "bindAuxiliaryDicts" (ppr orig_dict_ids $$ ppr call_ds $$ ppr inst_dict_ids)
+
+{-
+Note [Make the new dictionaries interesting]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Important! We're going to substitute dx_id1 for d
+and we want it to look "interesting", else we won't gather *any*
+consequential calls. E.g.
+ f d = ...g d....
+If we specialise f for a call (f (dfun dNumInt)), we'll get
+a consequent call (g d') with an auxiliary definition
+ d' = dfun dNumInt
+We want that consequent call to look interesting.
+
+
+Note [From non-recursive to recursive]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Even in the non-recursive case, if any dict-binds depend on 'fn' we might
+have built a recursive knot
+
+ f a d x = <blah>
+ MkUD { ud_binds = NonRec d7 (MkD ..f..)
+ , ud_calls = ...(f T d7)... }
+
+Then we generate
+
+ Rec { fs x = <blah>[T/a, d7/d]
+ f a d x = <blah>
+ RULE f T _ = fs
+ d7 = ...f... }
+
+Here the recursion is only through the RULE.
+
+However we definitely should /not/ make the Rec in this wildly common
+case:
+ d = ...
+ MkUD { ud_binds = NonRec d7 (...d...)
+ , ud_calls = ...(f T d7)... }
+
+Here we want simply to add d to the floats, giving
+ MkUD { ud_binds = NonRec d (...)
+ NonRec d7 (...d...)
+ , ud_calls = ...(f T d7)... }
+
+In general, we need only make this Rec if
+ - there are some specialisations (spec_binds non-empty)
+ - there are some dict_binds that depend on f (dump_dbs non-empty)
+
+Note [Avoiding loops]
+~~~~~~~~~~~~~~~~~~~~~
+When specialising /dictionary functions/ we must be very careful to
+avoid building loops.
Here is an example that bit us badly: #3591 + + class Eq a => C a + instance Eq [a] => C [a] + +This translates to + dfun :: Eq [a] -> C [a] + dfun a d = MkD a d (meth d) + + d4 :: Eq [T] = <blah> + d2 :: C [T] = dfun T d4 + d1 :: Eq [T] = $p1 d2 + d3 :: C [T] = dfun T d1 + +None of these definitions is recursive. What happened was that we +generated a specialisation: + + RULE forall d. dfun T d = dT :: C [T] + dT = (MkD a d (meth d)) [T/a, d1/d] + = MkD T d1 (meth d1) + +But now we use the RULE on the RHS of d2, to get + + d2 = dT = MkD d1 (meth d1) + d1 = $p1 d2 + +and now d1 is bottom! The problem is that when specialising 'dfun' we +should first dump "below" the binding all floated dictionary bindings +that mention 'dfun' itself. So d2 and d3 (and hence d1) must be +placed below 'dfun', and thus unavailable to it when specialising +'dfun'. That in turn means that the call (dfun T d1) must be +discarded. On the other hand, the call (dfun T d4) is fine, assuming +d4 doesn't mention dfun. + +Solution: + Discard all calls that mention dictionaries that depend + (directly or indirectly) on the dfun we are specialising. + This is done by 'filterCalls' + +-------------- +Here's another example, this time for an imported dfun, so the call +to filterCalls is in specImports (#13429). Suppose we have + class Monoid v => C v a where ... 
+ +We start with a call + f @ [Integer] @ Integer $fC[]Integer + +Specialising call to 'f' gives dict bindings + $dMonoid_1 :: Monoid [Integer] + $dMonoid_1 = M.$p1C @ [Integer] $fC[]Integer + + $dC_1 :: C [Integer] (Node [Integer] Integer) + $dC_1 = M.$fCvNode @ [Integer] $dMonoid_1 + +...plus a recursive call to + f @ [Integer] @ (Node [Integer] Integer) $dC_1 + +Specialising that call gives + $dMonoid_2 :: Monoid [Integer] + $dMonoid_2 = M.$p1C @ [Integer] $dC_1 + + $dC_2 :: C [Integer] (Node [Integer] Integer) + $dC_2 = M.$fCvNode @ [Integer] $dMonoid_2 + +Now we have two calls to the imported function + M.$fCvNode :: Monoid v => C v a + M.$fCvNode @v @a m = C m some_fun + +But we must /not/ use the call (M.$fCvNode @ [Integer] $dMonoid_2) +for specialisation, else we get: + + $dC_1 = M.$fCvNode @ [Integer] $dMonoid_1 + $dMonoid_2 = M.$p1C @ [Integer] $dC_1 + $s$fCvNode = C $dMonoid_2 ... + RULE M.$fCvNode [Integer] _ _ = $s$fCvNode + +Now use the rule to rewrite the call in the RHS of $dC_1 +and we get a loop! + +-------------- +Here's yet another example + + class C a where { foo,bar :: [a] -> [a] } + + instance C Int where + foo x = r_bar x + bar xs = reverse xs + + r_bar :: C a => [a] -> [a] + r_bar xs = bar (xs ++ xs) + +That translates to: + + r_bar a (c::C a) (xs::[a]) = bar a d (xs ++ xs) + + Rec { $fCInt :: C Int = MkC foo_help reverse + foo_help (xs::[Int]) = r_bar Int $fCInt xs } + +The call (r_bar $fCInt) mentions $fCInt, + which mentions foo_help, + which mentions r_bar +But we DO want to specialise r_bar at Int: + + Rec { $fCInt :: C Int = MkC foo_help reverse + foo_help (xs::[Int]) = r_bar Int $fCInt xs + + r_bar a (c::C a) (xs::[a]) = bar a d (xs ++ xs) + RULE r_bar Int _ = r_bar_Int + + r_bar_Int xs = bar Int $fCInt (xs ++ xs) + } + +Note that, because of its RULE, r_bar joins the recursive +group. (In this case it'll unravel a short moment later.) 
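The "discard calls that depend on the dfun" rule from Note [Avoiding loops] can be sketched in isolation. Below is a minimal, self-contained model of the idea behind 'filterCalls': compute the set of binders that reach the dfun (directly or transitively) through the floated dict bindings, and drop every call whose free variables touch that set. The names ('Bind', 'Call', 'dependsOnDFun', 'filterCallsFor') are illustrative stand-ins only; the real 'filterCalls' works over UsageDetails and CallInfoSet.

```haskell
import qualified Data.Set as Set

type Var = String

data Bind = Bind { binder :: Var, rhsFvs :: Set.Set Var }
data Call = Call { callee :: Var, callFvs :: Set.Set Var }

-- All binders whose RHS mentions the dfun, directly or via other binders
dependsOnDFun :: Var -> [Bind] -> Set.Set Var
dependsOnDFun dfun binds = go (Set.singleton dfun)
  where
    go seen
      | seen' == seen = seen
      | otherwise     = go seen'
      where
        seen' = Set.union seen $ Set.fromList
                  [ binder b | b <- binds
                  , not (Set.null (rhsFvs b `Set.intersection` seen)) ]

-- Keep only the calls whose free vars do not reach the dfun
filterCallsFor :: Var -> [Bind] -> [Call] -> [Call]
filterCallsFor dfun binds calls =
  [ c | c <- calls, Set.null (callFvs c `Set.intersection` bad) ]
  where bad = dependsOnDFun dfun binds
```

On the #3591 bindings above (d2 = dfun T d4, d1 = $p1 d2, d3 = dfun T d1), the call mentioning d4 survives, while the call mentioning d1 is discarded, because d1 reaches dfun via d2.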
+
+
+Note [Specialising a recursive group]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider
+ let rec { f x = ...g x'...
+ ; g y = ...f y'.... }
+ in f 'a'
+Here we specialise 'f' at Char; but that is very likely to lead to
+a specialisation of 'g' at Char. We must do the latter, else the
+whole point of specialisation is lost.
+
+But we do not want to keep iterating to a fixpoint, because in the
+presence of polymorphic recursion we might generate an infinite number
+of specialisations.
+
+So we use the following heuristic:
+ * Arrange the rec block in dependency order, so far as possible
+ (the occurrence analyser already does this)
+
+ * Specialise it much like a sequence of lets
+
+ * Then go through the block a second time, feeding call-info from
+ the RHSs back in at the bottom, as it were
+
+In effect, the ordering maximises the effectiveness of each sweep,
+and we do just two sweeps. This should catch almost every case of
+monomorphic recursion -- the exception could be a very knotted-up
+recursion with multiple cycles tied up together.
+
+This plan is implemented in the Rec case of specBind.
+
+Note [Specialisations already covered]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We obviously don't want to generate two specialisations for the same
+argument pattern. There are two wrinkles:
+
+1. We do the already-covered test in specDefn, not when we generate
+the CallInfo in mkCallUDs. We used to test in the latter place, but
+we now iterate the specialiser somewhat, and the Id at the call site
+might therefore not have all the RULES that we can see in specDefn.
+
+2. What about two specialisations where the second is an *instance*
+of the first? If the more specific one shows up first, we'll generate
+specialisations for both. If the *less* specific one shows up first,
+we *don't* currently generate a specialisation for the more specific
+one. (See the call to lookupRule in already_covered.)
Reasons: + (a) lookupRule doesn't say which matches are exact (bad reason) + (b) if the earlier specialisation is user-provided, it's + far from clear that we should auto-specialise further + +Note [Auto-specialisation and RULES] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Consider: + g :: Num a => a -> a + g = ... + + f :: (Int -> Int) -> Int + f w = ... + {-# RULE f g = 0 #-} + +Suppose that auto-specialisation makes a specialised version of +g::Int->Int That version won't appear in the LHS of the RULE for f. +So if the specialisation rule fires too early, the rule for f may +never fire. + +It might be possible to add new rules, to "complete" the rewrite system. +Thus when adding + RULE forall d. g Int d = g_spec +also add + RULE f g_spec = 0 + +But that's a bit complicated. For now we ask the programmer's help, +by *copying the INLINE activation pragma* to the auto-specialised +rule. So if g says {-# NOINLINE[2] g #-}, then the auto-spec rule +will also not be active until phase 2. And that's what programmers +should jolly well do anyway, even aside from specialisation, to ensure +that g doesn't inline too early. + +This in turn means that the RULE would never fire for a NOINLINE +thing so not much point in generating a specialisation at all. + +Note [Specialisation shape] +~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We only specialise a function if it has visible top-level lambdas +corresponding to its overloading. E.g. if + f :: forall a. Eq a => .... +then its body must look like + f = /\a. \d. ... + +Reason: when specialising the body for a call (f ty dexp), we want to +substitute dexp for d, and pick up specialised calls in the body of f. + +This doesn't always work. One example I came across was this: + newtype Gen a = MkGen{ unGen :: Int -> a } + + choose :: Eq a => a -> Gen a + choose n = MkGen (\r -> n) + + oneof = choose (1::Int) + +It's a silly example, but we get + choose = /\a. g `cast` co +where choose doesn't have any dict arguments. 
Thus far I have not +tried to fix this (wait till there's a real example). + +Mind you, then 'choose' will be inlined (since RHS is trivial) so +it doesn't matter. This comes up with single-method classes + + class C a where { op :: a -> a } + instance C a => C [a] where .... +==> + $fCList :: C a => C [a] + $fCList = $copList |> (...coercion>...) + ....(uses of $fCList at particular types)... + +So we suppress the WARN if the rhs is trivial. + +Note [Inline specialisations] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Here is what we do with the InlinePragma of the original function + * Activation/RuleMatchInfo: both transferred to the + specialised function + * InlineSpec: + (a) An INLINE pragma is transferred + (b) An INLINABLE pragma is *not* transferred + +Why (a): transfer INLINE pragmas? The point of INLINE was precisely to +specialise the function at its call site, and arguably that's not so +important for the specialised copies. BUT *pragma-directed* +specialisation now takes place in the typechecker/desugarer, with +manually specified INLINEs. The specialisation here is automatic. +It'd be very odd if a function marked INLINE was specialised (because +of some local use), and then forever after (including importing +modules) the specialised version wasn't INLINEd. After all, the +programmer said INLINE! + +You might wonder why we specialise INLINE functions at all. After +all they should be inlined, right? Two reasons: + + * Even INLINE functions are sometimes not inlined, when they aren't + applied to interesting arguments. But perhaps the type arguments + alone are enough to specialise (even though the args are too boring + to trigger inlining), and it's certainly better to call the + specialised version. + + * The RHS of an INLINE function might call another overloaded function, + and we'd like to generate a specialised version of that function too. + This actually happens a lot. 
Consider
+ replicateM_ :: (Monad m) => Int -> m a -> m ()
+ {-# INLINABLE replicateM_ #-}
+ replicateM_ d x ma = ...
+ The strictness analyser may transform to
+ replicateM_ :: (Monad m) => Int -> m a -> m ()
+ {-# INLINE replicateM_ #-}
+ replicateM_ d x ma = case x of I# x' -> $wreplicateM_ d x' ma
+
+ $wreplicateM_ :: (Monad m) => Int# -> m a -> m ()
+ {-# INLINABLE $wreplicateM_ #-}
+ $wreplicateM_ = ...
+ Now an importing module has a specialised call to replicateM_, say
+ (replicateM_ dMonadIO). We certainly want to specialise $wreplicateM_!
+ This particular example had a huge effect on the call to replicateM_
+ in nofib/shootout/n-body.
+
+Why (b): discard INLINABLE pragmas? See #4874 for persuasive examples.
+Suppose we have
+ {-# INLINABLE f #-}
+ f :: Ord a => [a] -> Int
+ f xs = letrec f' = ...f'... in f'
+Then, when f is specialised and optimised we might get
+ wgo :: [Int] -> Int#
+ wgo = ...wgo...
+ f_spec :: [Int] -> Int
+ f_spec xs = case wgo xs of { r -> I# r }
+and we clearly want to inline f_spec at call sites. But if we still
+have the big, un-optimised RHS of f (albeit specialised) captured in an
+INLINABLE pragma for f_spec, we won't get that optimisation.
+
+So we simply drop INLINABLE pragmas when specialising. It's not really
+a complete solution; ignoring specialisation for now, INLINABLE functions
+don't get properly strictness analysed, for example. But it works well
+for examples involving specialisation, which is the dominant use of
+INLINABLE. See #4874.
+
+
+************************************************************************
+* *
+\subsubsection{UsageDetails and suchlike}
+* *
+************************************************************************
+-}
+
+data UsageDetails
+ = MkUD {
+ ud_binds :: !(Bag DictBind),
+ -- See Note [Floated dictionary bindings]
+ -- The order is important;
+ -- in ds1 `union` ds2, bindings in ds2 can depend on those in ds1
+ -- (Remember, Bags preserve order in GHC.)
+ + ud_calls :: !CallDetails + + -- INVARIANT: suppose bs = bindersOf ud_binds + -- Then 'calls' may *mention* 'bs', + -- but there should be no calls *for* bs + } + +-- | A 'DictBind' is a binding along with a cached set containing its free +-- variables (both type variables and dictionaries) +type DictBind = (CoreBind, VarSet) + +{- Note [Floated dictionary bindings] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We float out dictionary bindings for the reasons described under +"Dictionary floating" above. But not /just/ dictionary bindings. +Consider + + f :: Eq a => blah + f a d = rhs + + $c== :: T -> T -> Bool + $c== x y = ... + + $df :: Eq T + $df = Eq $c== ... + + gurgle = ...(f @T $df)... + +We gather the call info for (f @T $df), and we don't want to drop it +when we come across the binding for $df. So we add $df to the floats +and continue. But then we have to add $c== to the floats, and so on. +These all float above the binding for 'f', and now we can +successfully specialise 'f'. + +So the DictBinds in (ud_binds :: Bag DictBind) may contain +non-dictionary bindings too. +-} + +instance Outputable UsageDetails where + ppr (MkUD { ud_binds = dbs, ud_calls = calls }) + = text "MkUD" <+> braces (sep (punctuate comma + [text "binds" <+> equals <+> ppr dbs, + text "calls" <+> equals <+> ppr calls])) + +emptyUDs :: UsageDetails +emptyUDs = MkUD { ud_binds = emptyBag, ud_calls = emptyDVarEnv } + +------------------------------------------------------------ +type CallDetails = DIdEnv CallInfoSet + -- The order of specialized binds and rules depends on how we linearize + -- CallDetails, so to get determinism we must use a deterministic set here. + -- See Note [Deterministic UniqFM] in UniqDFM + +data CallInfoSet = CIS Id (Bag CallInfo) + -- The list of types and dictionaries is guaranteed to + -- match the type of f + -- The Bag may contain duplicate calls (i.e. 
f @T and another f @T) + -- These dups are eliminated by already_covered in specCalls + +data CallInfo + = CI { ci_key :: [SpecArg] -- All arguments + , ci_arity :: Int -- The number of variables necessary to bind + -- all of the specialised arguments + , ci_fvs :: VarSet -- Free vars of the ci_key + -- call (including tyvars) + -- [*not* include the main id itself, of course] + } + +type DictExpr = CoreExpr + +ciSetFilter :: (CallInfo -> Bool) -> CallInfoSet -> CallInfoSet +ciSetFilter p (CIS id a) = CIS id (filterBag p a) + +instance Outputable CallInfoSet where + ppr (CIS fn map) = hang (text "CIS" <+> ppr fn) + 2 (ppr map) + +pprCallInfo :: Id -> CallInfo -> SDoc +pprCallInfo fn (CI { ci_key = key }) + = ppr fn <+> ppr key + +ppr_call_key_ty :: SpecArg -> Maybe SDoc +ppr_call_key_ty (SpecType ty) = Just $ char '@' <> pprParendType ty +ppr_call_key_ty UnspecType = Just $ char '_' +ppr_call_key_ty (SpecDict _) = Nothing +ppr_call_key_ty UnspecArg = Nothing + +instance Outputable CallInfo where + ppr (CI { ci_key = key, ci_fvs = fvs }) + = text "CI" <> braces (hsep [ fsep (mapMaybe ppr_call_key_ty key), ppr fvs ]) + +unionCalls :: CallDetails -> CallDetails -> CallDetails +unionCalls c1 c2 = plusDVarEnv_C unionCallInfoSet c1 c2 + +unionCallInfoSet :: CallInfoSet -> CallInfoSet -> CallInfoSet +unionCallInfoSet (CIS f calls1) (CIS _ calls2) = + CIS f (calls1 `unionBags` calls2) + +callDetailsFVs :: CallDetails -> VarSet +callDetailsFVs calls = + nonDetFoldUDFM (unionVarSet . callInfoFVs) emptyVarSet calls + -- It's OK to use nonDetFoldUDFM here because we forget the ordering + -- immediately by converting to a nondeterministic set. + +callInfoFVs :: CallInfoSet -> VarSet +callInfoFVs (CIS _ call_info) = + foldr (\(CI { ci_fvs = fv }) vs -> unionVarSet fv vs) emptyVarSet call_info + +computeArity :: [SpecArg] -> Int +computeArity = length . filter isValueArg . dropWhileEndLE isUnspecArg + +callSpecArity :: [TyCoBinder] -> Int +callSpecArity = length . filter (not . 
isNamedBinder) . dropWhileEndLE isVisibleBinder + +getTheta :: [TyCoBinder] -> [PredType] +getTheta = fmap tyBinderType . filter isInvisibleBinder . filter (not . isNamedBinder) + + +------------------------------------------------------------ +singleCall :: Id -> [SpecArg] -> UsageDetails +singleCall id args + = MkUD {ud_binds = emptyBag, + ud_calls = unitDVarEnv id $ CIS id $ + unitBag (CI { ci_key = args -- used to be tys + , ci_arity = computeArity args + , ci_fvs = call_fvs }) } + where + tys = getSpecTypes args + dicts = getSpecDicts args + call_fvs = exprsFreeVars dicts `unionVarSet` tys_fvs + tys_fvs = tyCoVarsOfTypes tys + -- The type args (tys) are guaranteed to be part of the dictionary + -- types, because they are just the constrained types, + -- and the dictionary is therefore sure to be bound + -- inside the binding for any type variables free in the type; + -- hence it's safe to neglect tyvars free in tys when making + -- the free-var set for this call + -- BUT I don't trust this reasoning; play safe and include tys_fvs + -- + -- We don't include the 'id' itself. 
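As context for what the call-collection machinery above is for: `singleCall` records, for each overloaded function, the type and dictionary arguments it was applied to, and those records ultimately justify generating a specialised copy. Here is a hand-written illustration in ordinary Haskell (independent of the pass itself; `countEq` and `countEqInt` are invented names, not anything the specialiser emits):

```haskell
-- countEq is overloaded: its dictionary argument is determined
-- entirely by the type at the call site.
countEq :: Eq a => a -> [a] -> Int
countEq x = length . filter (== x)

-- A hand-written image of the specialisation the pass would create
-- for calls at Int: the Eq Int dictionary is baked in, so no
-- dictionary is passed at runtime.
countEqInt :: Int -> [Int] -> Int
countEqInt x = length . filter (== x)

main :: IO ()
main = print (countEq (3 :: Int) [1,2,3,3], countEqInt 3 [1,2,3,3])
-- prints (2,2)
```

Conceptually, the pass also adds a rewrite rule (forall d. countEq @Int d = countEqInt) so existing call sites pick up the specialised copy.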
+ +mkCallUDs, mkCallUDs' :: SpecEnv -> Id -> [CoreExpr] -> UsageDetails +mkCallUDs env f args + = -- pprTrace "mkCallUDs" (vcat [ ppr f, ppr args, ppr res ]) + res + where + res = mkCallUDs' env f args + +mkCallUDs' env f args + | not (want_calls_for f) -- Imported from elsewhere + || null theta -- Not overloaded + = emptyUDs + + | not (all type_determines_value theta) + || not (computeArity ci_key <= idArity f) + || not (length dicts == length theta) + || not (any (interestingDict env) dicts) -- Note [Interesting dictionary arguments] + -- See also Note [Specialisations already covered] + = -- pprTrace "mkCallUDs: discarding" _trace_doc + emptyUDs -- Not overloaded, or no specialisation wanted + + | otherwise + = -- pprTrace "mkCallUDs: keeping" _trace_doc + singleCall f ci_key + where + _trace_doc = vcat [ppr f, ppr args, ppr (map (interestingDict env) dicts)] + pis = fst $ splitPiTys $ idType f + theta = getTheta pis + constrained_tyvars = tyCoVarsOfTypes theta + + ci_key :: [SpecArg] + ci_key = fmap (\(t, a) -> + case t of + Named (binderVar -> tyVar) + | tyVar `elemVarSet` constrained_tyvars + -> case a of + Type ty -> SpecType ty + _ -> pprPanic "ci_key" $ ppr a + | otherwise + -> UnspecType + Anon InvisArg _ -> SpecDict a + Anon VisArg _ -> UnspecArg + ) $ zip pis args + + dicts = getSpecDicts ci_key + + want_calls_for f = isLocalId f || isJust (maybeUnfoldingTemplate (realIdUnfolding f)) + -- For imported things, we gather call instances if + -- there is an unfolding that we could in principle specialise + -- We might still decide not to use it (consulting dflags) + -- in specImports + -- Use 'realIdUnfolding' to ignore the loop-breaker flag! 
+
+    type_determines_value pred    -- See Note [Type determines value]
+      = case classifyPredType pred of
+          ClassPred cls _ -> not (isIPClass cls)  -- Superclasses can't be IPs
+          EqPred {}       -> True
+          IrredPred {}    -> True   -- Things like (D []) where D is a
+                                    -- Constraint-ranged family; #7785
+          ForAllPred {}   -> True
+
+{-
+Note [Type determines value]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Only specialise if all overloading is on non-IP *class* params,
+because these are the ones whose *type* determines their *value*.  In
+particular, with implicit params, the type args *don't* say what the
+value of the implicit param is!  See #7101.
+
+However, consider
+    type family D (v::*->*) :: Constraint
+    type instance D [] = ()
+    f :: D v => v Char -> Int
+If we see a call (f "foo"), we'll pass a "dictionary"
+    () |> (g :: () ~ D [])
+and it's good to specialise f at this dictionary.
+
+So the question is: can an implicit parameter "hide inside" a
+type-family constraint like (D a)?  Well, no.  We don't allow
+    type instance D Maybe = ?x:Int
+Hence the IrredPred case in type_determines_value.
+See #7785.
+
+Note [Interesting dictionary arguments]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider this
+    \a.\d:Eq a.  let f = ... in ...(f d)...
+There really is not much point in specialising f wrt the dictionary d,
+because the code for the specialised f is not improved at all, because
+d is lambda-bound.  We simply get junk specialisations.
+
+What is "interesting"?  Just that it has *some* structure.  But what about
+variables?
+
+ * A variable might be imported, in which case its unfolding
+   will tell us whether it has useful structure
+
+ * Local variables are cloned on the way down (to avoid clashes when
+   we float dictionaries), and cloning drops the unfolding
+   (cloneIdBndr).  Moreover, we make up some new bindings, and it's a
+   nuisance to give them unfoldings.  So we keep track of the
+   "interesting" dictionaries as a VarSet in SpecEnv.
+   We have to take care to put any new interesting dictionary
+   bindings in the set.
+
+We accidentally lost accurate tracking of local variables for a long
+time, because cloned variables don't have unfoldings.  But it makes a
+massive difference in a few cases, eg #5113.  For nofib as a
+whole it's only a small win: 2.2% improvement in allocation for ansi,
+1.2% for bspt, but mostly 0.0!  Average 0.1% increase in binary size.
+-}
+
+interestingDict :: SpecEnv -> CoreExpr -> Bool
+-- A dictionary argument is interesting if it has *some* structure
+-- NB: "dictionary" arguments include constraints of all sorts,
+-- including equality constraints; hence the Coercion case
+interestingDict env (Var v) =  hasSomeUnfolding (idUnfolding v)
+                            || isDataConWorkId v
+                            || v `elemVarSet` se_interesting env
+interestingDict _ (Type _)                = False
+interestingDict _ (Coercion _)            = False
+interestingDict env (App fn (Type _))     = interestingDict env fn
+interestingDict env (App fn (Coercion _)) = interestingDict env fn
+interestingDict env (Tick _ a)            = interestingDict env a
+interestingDict env (Cast e _)            = interestingDict env e
+interestingDict _ _                       = True
+
+plusUDs :: UsageDetails -> UsageDetails -> UsageDetails
+plusUDs (MkUD {ud_binds = db1, ud_calls = calls1})
+        (MkUD {ud_binds = db2, ud_calls = calls2})
+  = MkUD { ud_binds = db1    `unionBags`  db2
+         , ud_calls = calls1 `unionCalls` calls2 }
+
+-----------------------------
+_dictBindBndrs :: Bag DictBind -> [Id]
+_dictBindBndrs dbs = foldr ((++) . bindersOf .
fst) [] dbs + +-- | Construct a 'DictBind' from a 'CoreBind' +mkDB :: CoreBind -> DictBind +mkDB bind = (bind, bind_fvs bind) + +-- | Identify the free variables of a 'CoreBind' +bind_fvs :: CoreBind -> VarSet +bind_fvs (NonRec bndr rhs) = pair_fvs (bndr,rhs) +bind_fvs (Rec prs) = foldl' delVarSet rhs_fvs bndrs + where + bndrs = map fst prs + rhs_fvs = unionVarSets (map pair_fvs prs) + +pair_fvs :: (Id, CoreExpr) -> VarSet +pair_fvs (bndr, rhs) = exprSomeFreeVars interesting rhs + `unionVarSet` idFreeVars bndr + -- idFreeVars: don't forget variables mentioned in + -- the rules of the bndr. C.f. OccAnal.addRuleUsage + -- Also tyvars mentioned in its type; they may not appear + -- in the RHS + -- type T a = Int + -- x :: T a = 3 + where + interesting :: InterestingVarFun + interesting v = isLocalVar v || (isId v && isDFunId v) + -- Very important: include DFunIds /even/ if it is imported + -- Reason: See Note [Avoiding loops], the second example + -- involving an imported dfun. We must know whether + -- a dictionary binding depends on an imported dfun, + -- in case we try to specialise that imported dfun + -- #13429 illustrates + +-- | Flatten a set of "dumped" 'DictBind's, and some other binding +-- pairs, into a single recursive binding. 
+recWithDumpedDicts :: [(Id,CoreExpr)] -> Bag DictBind ->DictBind +recWithDumpedDicts pairs dbs + = (Rec bindings, fvs) + where + (bindings, fvs) = foldr add + ([], emptyVarSet) + (dbs `snocBag` mkDB (Rec pairs)) + add (NonRec b r, fvs') (pairs, fvs) = + ((b,r) : pairs, fvs `unionVarSet` fvs') + add (Rec prs1, fvs') (pairs, fvs) = + (prs1 ++ pairs, fvs `unionVarSet` fvs') + +snocDictBinds :: UsageDetails -> [DictBind] -> UsageDetails +-- Add ud_binds to the tail end of the bindings in uds +snocDictBinds uds dbs + = uds { ud_binds = ud_binds uds `unionBags` listToBag dbs } + +consDictBind :: DictBind -> UsageDetails -> UsageDetails +consDictBind bind uds = uds { ud_binds = bind `consBag` ud_binds uds } + +addDictBinds :: [DictBind] -> UsageDetails -> UsageDetails +addDictBinds binds uds = uds { ud_binds = listToBag binds `unionBags` ud_binds uds } + +snocDictBind :: UsageDetails -> DictBind -> UsageDetails +snocDictBind uds bind = uds { ud_binds = ud_binds uds `snocBag` bind } + +wrapDictBinds :: Bag DictBind -> [CoreBind] -> [CoreBind] +wrapDictBinds dbs binds + = foldr add binds dbs + where + add (bind,_) binds = bind : binds + +wrapDictBindsE :: Bag DictBind -> CoreExpr -> CoreExpr +wrapDictBindsE dbs expr + = foldr add expr dbs + where + add (bind,_) expr = Let bind expr + +---------------------- +dumpUDs :: [CoreBndr] -> UsageDetails -> (UsageDetails, Bag DictBind) +-- Used at a lambda or case binder; just dump anything mentioning the binder +dumpUDs bndrs uds@(MkUD { ud_binds = orig_dbs, ud_calls = orig_calls }) + | null bndrs = (uds, emptyBag) -- Common in case alternatives + | otherwise = -- pprTrace "dumpUDs" (ppr bndrs $$ ppr free_uds $$ ppr dump_dbs) $ + (free_uds, dump_dbs) + where + free_uds = MkUD { ud_binds = free_dbs, ud_calls = free_calls } + bndr_set = mkVarSet bndrs + (free_dbs, dump_dbs, dump_set) = splitDictBinds orig_dbs bndr_set + free_calls = deleteCallsMentioning dump_set $ -- Drop calls mentioning bndr_set on the floor + deleteCallsFor 
bndrs orig_calls -- Discard calls for bndr_set; there should be + -- no calls for any of the dicts in dump_dbs + +dumpBindUDs :: [CoreBndr] -> UsageDetails -> (UsageDetails, Bag DictBind, Bool) +-- Used at a let(rec) binding. +-- We return a boolean indicating whether the binding itself is mentioned, +-- directly or indirectly, by any of the ud_calls; in that case we want to +-- float the binding itself; +-- See Note [Floated dictionary bindings] +dumpBindUDs bndrs (MkUD { ud_binds = orig_dbs, ud_calls = orig_calls }) + = -- pprTrace "dumpBindUDs" (ppr bndrs $$ ppr free_uds $$ ppr dump_dbs) $ + (free_uds, dump_dbs, float_all) + where + free_uds = MkUD { ud_binds = free_dbs, ud_calls = free_calls } + bndr_set = mkVarSet bndrs + (free_dbs, dump_dbs, dump_set) = splitDictBinds orig_dbs bndr_set + free_calls = deleteCallsFor bndrs orig_calls + float_all = dump_set `intersectsVarSet` callDetailsFVs free_calls + +callsForMe :: Id -> UsageDetails -> (UsageDetails, [CallInfo]) +callsForMe fn (MkUD { ud_binds = orig_dbs, ud_calls = orig_calls }) + = -- pprTrace ("callsForMe") + -- (vcat [ppr fn, + -- text "Orig dbs =" <+> ppr (_dictBindBndrs orig_dbs), + -- text "Orig calls =" <+> ppr orig_calls, + -- text "Dep set =" <+> ppr dep_set, + -- text "Calls for me =" <+> ppr calls_for_me]) $ + (uds_without_me, calls_for_me) + where + uds_without_me = MkUD { ud_binds = orig_dbs + , ud_calls = delDVarEnv orig_calls fn } + calls_for_me = case lookupDVarEnv orig_calls fn of + Nothing -> [] + Just cis -> filterCalls cis orig_dbs + -- filterCalls: drop calls that (directly or indirectly) + -- refer to fn. 
See Note [Avoiding loops] + +---------------------- +filterCalls :: CallInfoSet -> Bag DictBind -> [CallInfo] +-- See Note [Avoiding loops] +filterCalls (CIS fn call_bag) dbs + = filter ok_call (bagToList call_bag) + where + dump_set = foldl' go (unitVarSet fn) dbs + -- This dump-set could also be computed by splitDictBinds + -- (_,_,dump_set) = splitDictBinds dbs {fn} + -- But this variant is shorter + + go so_far (db,fvs) | fvs `intersectsVarSet` so_far + = extendVarSetList so_far (bindersOf db) + | otherwise = so_far + + ok_call (CI { ci_fvs = fvs }) = not (fvs `intersectsVarSet` dump_set) + +---------------------- +splitDictBinds :: Bag DictBind -> IdSet -> (Bag DictBind, Bag DictBind, IdSet) +-- splitDictBinds dbs bndrs returns +-- (free_dbs, dump_dbs, dump_set) +-- where +-- * dump_dbs depends, transitively on bndrs +-- * free_dbs does not depend on bndrs +-- * dump_set = bndrs `union` bndrs(dump_dbs) +splitDictBinds dbs bndr_set + = foldl' split_db (emptyBag, emptyBag, bndr_set) dbs + -- Important that it's foldl' not foldr; + -- we're accumulating the set of dumped ids in dump_set + where + split_db (free_dbs, dump_dbs, dump_idset) db@(bind, fvs) + | dump_idset `intersectsVarSet` fvs -- Dump it + = (free_dbs, dump_dbs `snocBag` db, + extendVarSetList dump_idset (bindersOf bind)) + + | otherwise -- Don't dump it + = (free_dbs `snocBag` db, dump_dbs, dump_idset) + + +---------------------- +deleteCallsMentioning :: VarSet -> CallDetails -> CallDetails +-- Remove calls *mentioning* bs in any way +deleteCallsMentioning bs calls + = mapDVarEnv (ciSetFilter keep_call) calls + where + keep_call (CI { ci_fvs = fvs }) = not (fvs `intersectsVarSet` bs) + +deleteCallsFor :: [Id] -> CallDetails -> CallDetails +-- Remove calls *for* bs +deleteCallsFor bs calls = delDVarEnvList calls bs + +{- +************************************************************************ +* * +\subsubsection{Boring helper functions} +* * 
+************************************************************************ +-} + +newtype SpecM a = SpecM (State SpecState a) deriving (Functor) + +data SpecState = SpecState { + spec_uniq_supply :: UniqSupply, + spec_module :: Module, + spec_dflags :: DynFlags + } + +instance Applicative SpecM where + pure x = SpecM $ return x + (<*>) = ap + +instance Monad SpecM where + SpecM x >>= f = SpecM $ do y <- x + case f y of + SpecM z -> + z +#if !MIN_VERSION_base(4,13,0) + fail = MonadFail.fail +#endif + +instance MonadFail.MonadFail SpecM where + fail str = SpecM $ error str + +instance MonadUnique SpecM where + getUniqueSupplyM + = SpecM $ do st <- get + let (us1, us2) = splitUniqSupply $ spec_uniq_supply st + put $ st { spec_uniq_supply = us2 } + return us1 + + getUniqueM + = SpecM $ do st <- get + let (u,us') = takeUniqFromSupply $ spec_uniq_supply st + put $ st { spec_uniq_supply = us' } + return u + +instance HasDynFlags SpecM where + getDynFlags = SpecM $ liftM spec_dflags get + +instance HasModule SpecM where + getModule = SpecM $ liftM spec_module get + +runSpecM :: DynFlags -> Module -> SpecM a -> CoreM a +runSpecM dflags this_mod (SpecM spec) + = do us <- getUniqueSupplyM + let initialState = SpecState { + spec_uniq_supply = us, + spec_module = this_mod, + spec_dflags = dflags + } + return $ evalState spec initialState + +mapAndCombineSM :: (a -> SpecM (b, UsageDetails)) -> [a] -> SpecM ([b], UsageDetails) +mapAndCombineSM _ [] = return ([], emptyUDs) +mapAndCombineSM f (x:xs) = do (y, uds1) <- f x + (ys, uds2) <- mapAndCombineSM f xs + return (y:ys, uds1 `plusUDs` uds2) + +extendTvSubstList :: SpecEnv -> [(TyVar,Type)] -> SpecEnv +extendTvSubstList env tv_binds + = env { se_subst = GHC.Core.Subst.extendTvSubstList (se_subst env) tv_binds } + +substTy :: SpecEnv -> Type -> Type +substTy env ty = GHC.Core.Subst.substTy (se_subst env) ty + +substCo :: SpecEnv -> Coercion -> Coercion +substCo env co = GHC.Core.Subst.substCo (se_subst env) co + +substBndr :: 
SpecEnv -> CoreBndr -> (SpecEnv, CoreBndr) +substBndr env bs = case GHC.Core.Subst.substBndr (se_subst env) bs of + (subst', bs') -> (env { se_subst = subst' }, bs') + +substBndrs :: SpecEnv -> [CoreBndr] -> (SpecEnv, [CoreBndr]) +substBndrs env bs = case GHC.Core.Subst.substBndrs (se_subst env) bs of + (subst', bs') -> (env { se_subst = subst' }, bs') + +cloneBindSM :: SpecEnv -> CoreBind -> SpecM (SpecEnv, SpecEnv, CoreBind) +-- Clone the binders of the bind; return new bind with the cloned binders +-- Return the substitution to use for RHSs, and the one to use for the body +cloneBindSM env@(SE { se_subst = subst, se_interesting = interesting }) (NonRec bndr rhs) + = do { us <- getUniqueSupplyM + ; let (subst', bndr') = GHC.Core.Subst.cloneIdBndr subst us bndr + interesting' | interestingDict env rhs + = interesting `extendVarSet` bndr' + | otherwise = interesting + ; return (env, env { se_subst = subst', se_interesting = interesting' } + , NonRec bndr' rhs) } + +cloneBindSM env@(SE { se_subst = subst, se_interesting = interesting }) (Rec pairs) + = do { us <- getUniqueSupplyM + ; let (subst', bndrs') = GHC.Core.Subst.cloneRecIdBndrs subst us (map fst pairs) + env' = env { se_subst = subst' + , se_interesting = interesting `extendVarSetList` + [ v | (v,r) <- pairs, interestingDict env r ] } + ; return (env', env', Rec (bndrs' `zip` map snd pairs)) } + +newDictBndr :: SpecEnv -> CoreBndr -> SpecM CoreBndr +-- Make up completely fresh binders for the dictionaries +-- Their bindings are going to float outwards +newDictBndr env b = do { uniq <- getUniqueM + ; let n = idName b + ty' = substTy env (idType b) + ; return (mkUserLocal (nameOccName n) uniq ty' (getSrcSpan n)) } + +newSpecIdSM :: Id -> Type -> Maybe JoinArity -> SpecM Id + -- Give the new Id a similar occurrence name to the old one +newSpecIdSM old_id new_ty join_arity_maybe + = do { uniq <- getUniqueM + ; let name = idName old_id + new_occ = mkSpecOcc (nameOccName name) + new_id = mkUserLocal new_occ uniq 
new_ty (getSrcSpan name) + `asJoinId_maybe` join_arity_maybe + ; return new_id } + +{- + Old (but interesting) stuff about unboxed bindings + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +What should we do when a value is specialised to a *strict* unboxed value? + + map_*_* f (x:xs) = let h = f x + t = map f xs + in h:t + +Could convert let to case: + + map_*_Int# f (x:xs) = case f x of h# -> + let t = map f xs + in h#:t + +This may be undesirable since it forces evaluation here, but the value +may not be used in all branches of the body. In the general case this +transformation is impossible since the mutual recursion in a letrec +cannot be expressed as a case. + +There is also a problem with top-level unboxed values, since our +implementation cannot handle unboxed values at the top level. + +Solution: Lift the binding of the unboxed value and extract it when it +is used: + + map_*_Int# f (x:xs) = let h = case (f x) of h# -> _Lift h# + t = map f xs + in case h of + _Lift h# -> h#:t + +Now give it to the simplifier and the _Lifting will be optimised away. + +The benefit is that we have given the specialised "unboxed" values a +very simple lifted semantics and then leave it up to the simplifier to +optimise it --- knowing that the overheads will be removed in nearly +all cases. + +In particular, the value will only be evaluated in the branches of the +program which use it, rather than being forced at the point where the +value is bound. For example: + + filtermap_*_* p f (x:xs) + = let h = f x + t = ... + in case p x of + True -> h:t + False -> t + ==> + filtermap_*_Int# p f (x:xs) + = let h = case (f x) of h# -> _Lift h# + t = ... + in case p x of + True -> case h of _Lift h# + -> h#:t + False -> t + +The binding for h can still be inlined in the one branch and the +_Lifting eliminated. + + +Question: When won't the _Lifting be eliminated? 
+
+Answer: When they are at the top level (where it is necessary) or when
+inlining would duplicate work (or possibly code, depending on
+options).  However, the _Lifting will still be eliminated if the
+strictness analyser deems the lifted binding strict.
+-}
diff --git a/compiler/GHC/Core/Op/StaticArgs.hs b/compiler/GHC/Core/Op/StaticArgs.hs
new file mode 100644
index 0000000000..e550fabfd9
--- /dev/null
+++ b/compiler/GHC/Core/Op/StaticArgs.hs
@@ -0,0 +1,433 @@
+{-
+(c) The GRASP/AQUA Project, Glasgow University, 1992-1998
+
+
+************************************************************************
+
+                Static Argument Transformation pass
+
+************************************************************************
+
+This pass may be seen as removing invariants from loops: arguments of
+recursive functions that do not change across recursive calls are
+removed from the recursion, so that the recursive calls pass only the
+arguments that actually change.
+
+Example:
+map = /\ ab -> \f -> \xs -> case xs of
+               [] -> []
+               (a:b) -> f a : map f b
+
+As map is recursively called with the same argument f (unmodified),
+we transform it to
+
+map = /\ ab -> \f -> \xs -> let map' ys = case ys of
+                       [] -> []
+                       (a:b) -> f a : map' b
+                    in map' xs
+
+Notice that for a compiler that uses lambda lifting this is
+useless, as map' will be transformed back to what map was.
+
+We could possibly do the same for big lambdas, but we don't as
+they will eventually be removed in later stages of the compiler,
+therefore there is no penalty in keeping them.
+
+We only apply the SAT when the number of static args is > 1.  This
+produces few bad cases.  See
+                should_transform
+in saTransformMaybe.
+
+Here are the headline nofib results:
+                  Size    Allocs   Runtime
+Min             +0.0%    -13.7%    -21.4%
+Max             +0.1%     +0.0%     +5.4%
+Geometric Mean  +0.0%     -0.2%     -6.9%
+
+The previous patch, to fix polymorphic floatout demand signatures, is
+essential to make this work well!
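The map example above can be written out as ordinary, runnable Haskell (a hand-applied sketch of the transformation; `myMap` and `myMapSAT` are illustrative names, not output of the pass):

```haskell
-- Direct recursion: the static argument f is passed unchanged
-- on every recursive call.
myMap :: (a -> b) -> [a] -> [b]
myMap _ []     = []
myMap f (x:xs) = f x : myMap f xs

-- After hand-applying the static argument transformation: f is
-- captured once, and the local loop 'go' recurses only on the
-- argument that actually changes.
myMapSAT :: (a -> b) -> [a] -> [b]
myMapSAT f = go
  where
    go []     = []
    go (x:xs) = f x : go xs

main :: IO ()
main = print (myMap (+1) [1,2,3], myMapSAT (+1) [1,2,3])
-- prints ([2,3,4],[2,3,4])
```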
+-} + +{-# LANGUAGE CPP #-} +module GHC.Core.Op.StaticArgs ( doStaticArgs ) where + +import GhcPrelude + +import Var +import GHC.Core +import GHC.Core.Utils +import GHC.Core.Type +import GHC.Core.Coercion +import Id +import Name +import VarEnv +import UniqSupply +import Util +import UniqFM +import VarSet +import Unique +import UniqSet +import Outputable + +import Data.List (mapAccumL) +import FastString + +#include "HsVersions.h" + +doStaticArgs :: UniqSupply -> CoreProgram -> CoreProgram +doStaticArgs us binds = snd $ mapAccumL sat_bind_threaded_us us binds + where + sat_bind_threaded_us us bind = + let (us1, us2) = splitUniqSupply us + in (us1, fst $ runSAT us2 (satBind bind emptyUniqSet)) + +-- We don't bother to SAT recursive groups since it can lead +-- to massive code expansion: see Andre Santos' thesis for details. +-- This means we only apply the actual SAT to Rec groups of one element, +-- but we want to recurse into the others anyway to discover other binds +satBind :: CoreBind -> IdSet -> SatM (CoreBind, IdSATInfo) +satBind (NonRec binder expr) interesting_ids = do + (expr', sat_info_expr, expr_app) <- satExpr expr interesting_ids + return (NonRec binder expr', finalizeApp expr_app sat_info_expr) +satBind (Rec [(binder, rhs)]) interesting_ids = do + let interesting_ids' = interesting_ids `addOneToUniqSet` binder + (rhs_binders, rhs_body) = collectBinders rhs + (rhs_body', sat_info_rhs_body) <- satTopLevelExpr rhs_body interesting_ids' + let sat_info_rhs_from_args = unitVarEnv binder (bindersToSATInfo rhs_binders) + sat_info_rhs' = mergeIdSATInfo sat_info_rhs_from_args sat_info_rhs_body + + shadowing = binder `elementOfUniqSet` interesting_ids + sat_info_rhs'' = if shadowing + then sat_info_rhs' `delFromUFM` binder -- For safety + else sat_info_rhs' + + bind' <- saTransformMaybe binder (lookupUFM sat_info_rhs' binder) + rhs_binders rhs_body' + return (bind', sat_info_rhs'') +satBind (Rec pairs) interesting_ids = do + let (binders, rhss) = unzip pairs + 
rhss_SATed <- mapM (\e -> satTopLevelExpr e interesting_ids) rhss + let (rhss', sat_info_rhss') = unzip rhss_SATed + return (Rec (zipEqual "satBind" binders rhss'), mergeIdSATInfos sat_info_rhss') + +data App = VarApp Id | TypeApp Type | CoApp Coercion +data Staticness a = Static a | NotStatic + +type IdAppInfo = (Id, SATInfo) + +type SATInfo = [Staticness App] +type IdSATInfo = IdEnv SATInfo +emptyIdSATInfo :: IdSATInfo +emptyIdSATInfo = emptyUFM + +{- +pprIdSATInfo id_sat_info = vcat (map pprIdAndSATInfo (Map.toList id_sat_info)) + where pprIdAndSATInfo (v, sat_info) = hang (ppr v <> colon) 4 (pprSATInfo sat_info) +-} + +pprSATInfo :: SATInfo -> SDoc +pprSATInfo staticness = hcat $ map pprStaticness staticness + +pprStaticness :: Staticness App -> SDoc +pprStaticness (Static (VarApp _)) = text "SV" +pprStaticness (Static (TypeApp _)) = text "ST" +pprStaticness (Static (CoApp _)) = text "SC" +pprStaticness NotStatic = text "NS" + + +mergeSATInfo :: SATInfo -> SATInfo -> SATInfo +mergeSATInfo l r = zipWith mergeSA l r + where + mergeSA NotStatic _ = NotStatic + mergeSA _ NotStatic = NotStatic + mergeSA (Static (VarApp v)) (Static (VarApp v')) + | v == v' = Static (VarApp v) + | otherwise = NotStatic + mergeSA (Static (TypeApp t)) (Static (TypeApp t')) + | t `eqType` t' = Static (TypeApp t) + | otherwise = NotStatic + mergeSA (Static (CoApp c)) (Static (CoApp c')) + | c `eqCoercion` c' = Static (CoApp c) + | otherwise = NotStatic + mergeSA _ _ = pprPanic "mergeSATInfo" $ + text "Left:" + <> pprSATInfo l <> text ", " + <> text "Right:" + <> pprSATInfo r + +mergeIdSATInfo :: IdSATInfo -> IdSATInfo -> IdSATInfo +mergeIdSATInfo = plusUFM_C mergeSATInfo + +mergeIdSATInfos :: [IdSATInfo] -> IdSATInfo +mergeIdSATInfos = foldl' mergeIdSATInfo emptyIdSATInfo + +bindersToSATInfo :: [Id] -> SATInfo +bindersToSATInfo vs = map (Static . 
binderToApp) vs + where binderToApp v | isId v = VarApp v + | isTyVar v = TypeApp $ mkTyVarTy v + | otherwise = CoApp $ mkCoVarCo v + +finalizeApp :: Maybe IdAppInfo -> IdSATInfo -> IdSATInfo +finalizeApp Nothing id_sat_info = id_sat_info +finalizeApp (Just (v, sat_info')) id_sat_info = + let sat_info'' = case lookupUFM id_sat_info v of + Nothing -> sat_info' + Just sat_info -> mergeSATInfo sat_info sat_info' + in extendVarEnv id_sat_info v sat_info'' + +satTopLevelExpr :: CoreExpr -> IdSet -> SatM (CoreExpr, IdSATInfo) +satTopLevelExpr expr interesting_ids = do + (expr', sat_info_expr, expr_app) <- satExpr expr interesting_ids + return (expr', finalizeApp expr_app sat_info_expr) + +satExpr :: CoreExpr -> IdSet -> SatM (CoreExpr, IdSATInfo, Maybe IdAppInfo) +satExpr var@(Var v) interesting_ids = do + let app_info = if v `elementOfUniqSet` interesting_ids + then Just (v, []) + else Nothing + return (var, emptyIdSATInfo, app_info) + +satExpr lit@(Lit _) _ = do + return (lit, emptyIdSATInfo, Nothing) + +satExpr (Lam binders body) interesting_ids = do + (body', sat_info, this_app) <- satExpr body interesting_ids + return (Lam binders body', finalizeApp this_app sat_info, Nothing) + +satExpr (App fn arg) interesting_ids = do + (fn', sat_info_fn, fn_app) <- satExpr fn interesting_ids + let satRemainder = boring fn' sat_info_fn + case fn_app of + Nothing -> satRemainder Nothing + Just (fn_id, fn_app_info) -> + -- TODO: remove this use of append somehow (use a data structure with O(1) append but a left-to-right kind of interface) + let satRemainderWithStaticness arg_staticness = satRemainder $ Just (fn_id, fn_app_info ++ [arg_staticness]) + in case arg of + Type t -> satRemainderWithStaticness $ Static (TypeApp t) + Coercion c -> satRemainderWithStaticness $ Static (CoApp c) + Var v -> satRemainderWithStaticness $ Static (VarApp v) + _ -> satRemainderWithStaticness $ NotStatic + where + boring :: CoreExpr -> IdSATInfo -> Maybe IdAppInfo -> SatM (CoreExpr, IdSATInfo, Maybe 
IdAppInfo) + boring fn' sat_info_fn app_info = + do (arg', sat_info_arg, arg_app) <- satExpr arg interesting_ids + let sat_info_arg' = finalizeApp arg_app sat_info_arg + sat_info = mergeIdSATInfo sat_info_fn sat_info_arg' + return (App fn' arg', sat_info, app_info) + +satExpr (Case expr bndr ty alts) interesting_ids = do + (expr', sat_info_expr, expr_app) <- satExpr expr interesting_ids + let sat_info_expr' = finalizeApp expr_app sat_info_expr + + zipped_alts' <- mapM satAlt alts + let (alts', sat_infos_alts) = unzip zipped_alts' + return (Case expr' bndr ty alts', mergeIdSATInfo sat_info_expr' (mergeIdSATInfos sat_infos_alts), Nothing) + where + satAlt (con, bndrs, expr) = do + (expr', sat_info_expr) <- satTopLevelExpr expr interesting_ids + return ((con, bndrs, expr'), sat_info_expr) + +satExpr (Let bind body) interesting_ids = do + (body', sat_info_body, body_app) <- satExpr body interesting_ids + (bind', sat_info_bind) <- satBind bind interesting_ids + return (Let bind' body', mergeIdSATInfo sat_info_body sat_info_bind, body_app) + +satExpr (Tick tickish expr) interesting_ids = do + (expr', sat_info_expr, expr_app) <- satExpr expr interesting_ids + return (Tick tickish expr', sat_info_expr, expr_app) + +satExpr ty@(Type _) _ = do + return (ty, emptyIdSATInfo, Nothing) + +satExpr co@(Coercion _) _ = do + return (co, emptyIdSATInfo, Nothing) + +satExpr (Cast expr coercion) interesting_ids = do + (expr', sat_info_expr, expr_app) <- satExpr expr interesting_ids + return (Cast expr' coercion, sat_info_expr, expr_app) + +{- +************************************************************************ + + Static Argument Transformation Monad + +************************************************************************ +-} + +type SatM result = UniqSM result + +runSAT :: UniqSupply -> SatM a -> a +runSAT = initUs_ + +newUnique :: SatM Unique +newUnique = getUniqueM + +{- +************************************************************************ + + Static Argument 
Transformation
+
+************************************************************************
+
+To do the transformation, the game plan is to:
+
+1. Create a small nonrecursive RHS that takes the
+   original arguments to the function but discards
+   the ones that are static and makes a call to the
+   SATed version with the remainder.  We intend that
+   this will be inlined later, removing the overhead.
+
+2. Bind this nonrecursive RHS over the original body
+   WITH THE SAME UNIQUE as the original binder so that
+   any recursive calls to the original now go via
+   the small wrapper.
+
+3. Rebind the original function to a new one which contains
+   our SATed function and just makes a call to it:
+   we call the thing making this call the local body.
+
+Example: transform this
+
+    map :: forall a b. (a->b) -> [a] -> [b]
+    map = /\ab. \(f:a->b) (as:[a]) -> body[map]
+to
+    map :: forall a b. (a->b) -> [a] -> [b]
+    map = /\ab. \(f:a->b) (as:[a]) ->
+         letrec map' :: [a] -> [b]
+                    -- The "worker" function
+                map' = \(as:[a]) ->
+                         let map :: forall a' b'. (a -> b) -> [a] -> [b]
+                                -- The "shadow" function
+                             map = /\a'b'. \(f':a->b) (as:[a]) ->
+                                   map' as
+                         in body[map]
+         in map' as
+
+Note [Shadow binding]
+~~~~~~~~~~~~~~~~~~~~~
+The calls to the inner map inside body[map] should get inlined
+by the local re-binding of 'map'.  We call this the "shadow binding".
+
+But we can't use the original binder 'map' unchanged, because
+it might be exported, in which case the shadow binding won't be
+discarded as dead code after it is inlined.
+
+So we use a hack: we make a new SysLocal binder with the *same* unique
+as binder.  (Another alternative would be to reset the export flag.)
+
+Note [Binder type capture]
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+Notice that in the inner map (the "shadow function"), the static arguments
+are discarded -- it's as if they were underscores.  Instead, mentions
+of these arguments (notably in the types of dynamic arguments) are bound
+by the *outer* lambdas of the main function.
So we must make up fresh
+names for the static arguments so that they do not capture variables
+mentioned in the types of dynamic args.
+
+In the map example, the shadow function must clone the static type
+arguments a,b, giving a',b', to ensure that in the \(as:[a]), the 'a'
+is bound by the outer forall.  We clone f' too for consistency, but
+that doesn't matter either way because static Id arguments aren't
+mentioned in the shadow binding at all.
+
+If we don't, we get something like this:
+
+[Exported]
+[Arity 3]
+GHC.Base.until =
+  \ (@ a_aiK)
+    (p_a6T :: a_aiK -> GHC.Types.Bool)
+    (f_a6V :: a_aiK -> a_aiK)
+    (x_a6X :: a_aiK) ->
+    letrec {
+      sat_worker_s1aU :: a_aiK -> a_aiK
+      []
+      sat_worker_s1aU =
+        \ (x_a6X :: a_aiK) ->
+          let {
+            sat_shadow_r17 :: forall a_a3O.
+                              (a_a3O -> GHC.Types.Bool) -> (a_a3O -> a_a3O) -> a_a3O -> a_a3O
+            []
+            sat_shadow_r17 =
+              \ (@ a_aiK)
+                (p_a6T :: a_aiK -> GHC.Types.Bool)
+                (f_a6V :: a_aiK -> a_aiK)
+                (x_a6X :: a_aiK) ->
+                sat_worker_s1aU x_a6X } in
+          case p_a6T x_a6X of wild_X3y [ALWAYS Dead Nothing] {
+            GHC.Types.False -> GHC.Base.until @ a_aiK p_a6T f_a6V (f_a6V x_a6X);
+            GHC.Types.True -> x_a6X
+          }; } in
+    sat_worker_s1aU x_a6X
+
+Where sat_shadow has captured the type variables of x_a6X etc., as it has an a_aiK
+type argument.  This is bad because it means the application sat_worker_s1aU x_a6X
+is not well typed.
+-} + +saTransformMaybe :: Id -> Maybe SATInfo -> [Id] -> CoreExpr -> SatM CoreBind +saTransformMaybe binder maybe_arg_staticness rhs_binders rhs_body + | Just arg_staticness <- maybe_arg_staticness + , should_transform arg_staticness + = saTransform binder arg_staticness rhs_binders rhs_body + | otherwise + = return (Rec [(binder, mkLams rhs_binders rhs_body)]) + where + should_transform staticness = n_static_args > 1 -- THIS IS THE DECISION POINT + where + n_static_args = count isStaticValue staticness + +saTransform :: Id -> SATInfo -> [Id] -> CoreExpr -> SatM CoreBind +saTransform binder arg_staticness rhs_binders rhs_body + = do { shadow_lam_bndrs <- mapM clone binders_w_staticness + ; uniq <- newUnique + ; return (NonRec binder (mk_new_rhs uniq shadow_lam_bndrs)) } + where + -- Running example: foldr + -- foldr \alpha \beta c n xs = e, for some e + -- arg_staticness = [Static TypeApp, Static TypeApp, Static VarApp, Static VarApp, NonStatic] + -- rhs_binders = [\alpha, \beta, c, n, xs] + -- rhs_body = e + + binders_w_staticness = rhs_binders `zip` (arg_staticness ++ repeat NotStatic) + -- Any extra args are assumed NotStatic + + non_static_args :: [Var] + -- non_static_args = [xs] + -- rhs_binders_without_type_capture = [\alpha', \beta', c, n, xs] + non_static_args = [v | (v, NotStatic) <- binders_w_staticness] + + clone (bndr, NotStatic) = return bndr + clone (bndr, _ ) = do { uniq <- newUnique + ; return (setVarUnique bndr uniq) } + + -- new_rhs = \alpha beta c n xs -> + -- let sat_worker = \xs -> let sat_shadow = \alpha' beta' c n xs -> + -- sat_worker xs + -- in e + -- in sat_worker xs + mk_new_rhs uniq shadow_lam_bndrs + = mkLams rhs_binders $ + Let (Rec [(rec_body_bndr, rec_body)]) + local_body + where + local_body = mkVarApps (Var rec_body_bndr) non_static_args + + rec_body = mkLams non_static_args $ + Let (NonRec shadow_bndr shadow_rhs) rhs_body + + -- See Note [Binder type capture] + shadow_rhs = mkLams shadow_lam_bndrs local_body + -- nonrec_rhs = 
\alpha' beta' c n xs -> sat_worker xs + + rec_body_bndr = mkSysLocal (fsLit "sat_worker") uniq (exprType rec_body) + -- rec_body_bndr = sat_worker + + -- See Note [Shadow binding]; make a SysLocal + shadow_bndr = mkSysLocal (occNameFS (getOccName binder)) + (idUnique binder) + (exprType shadow_rhs) + +isStaticValue :: Staticness App -> Bool +isStaticValue (Static (VarApp _)) = True +isStaticValue _ = False diff --git a/compiler/GHC/Core/Op/Tidy.hs b/compiler/GHC/Core/Op/Tidy.hs index 60db2c8fea..758c1daf6c 100644 --- a/compiler/GHC/Core/Op/Tidy.hs +++ b/compiler/GHC/Core/Op/Tidy.hs @@ -191,7 +191,7 @@ tidyLetBndr rec_tidy_env env@(tidy_env, var_env) id -- Similarly for the demand info - on a let binder, this tells -- CorePrep to turn the let into a case. -- But: Remove the usage demand here - -- (See Note [Zapping DmdEnv after Demand Analyzer] in WorkWrap) + -- (See Note [Zapping DmdEnv after Demand Analyzer] in GHC.Core.Op.WorkWrap) -- -- Similarly arity info for eta expansion in CorePrep -- Don't attempt to recompute arity here; this is just tidying! 
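To see the intended payoff in source terms, here is the same idea applied by hand in ordinary Haskell rather than Core (hypothetical names; GHC of course performs the transformation on Core, as above): the static argument f is captured once by the worker, and the recursion passes only the dynamic list.

```haskell
-- Hand-applied static-argument transformation (hypothetical names).
-- 'f' is static in every recursive call, so the local worker 'go'
-- closes over it instead of threading it through the recursion.
mapSAT :: (a -> b) -> [a] -> [b]
mapSAT f = go
  where
    go []     = []            -- worker: only the dynamic argument recurses
    go (x:xs) = f x : go xs
```

With f out of the loop, go can be a tight local recursion, while mapSAT itself is a small inlinable wrapper, mirroring the worker/wrapper shape built by saTransform.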
diff --git a/compiler/GHC/Core/Op/WorkWrap.hs b/compiler/GHC/Core/Op/WorkWrap.hs new file mode 100644 index 0000000000..241a295899 --- /dev/null +++ b/compiler/GHC/Core/Op/WorkWrap.hs @@ -0,0 +1,776 @@ +{- +(c) The GRASP/AQUA Project, Glasgow University, 1993-1998 + +\section[WorkWrap]{Worker/wrapper-generating back-end of strictness analyser} +-} + +{-# LANGUAGE CPP #-} +module GHC.Core.Op.WorkWrap ( wwTopBinds ) where + +import GhcPrelude + +import GHC.Core.Arity ( manifestArity ) +import GHC.Core +import GHC.Core.Unfold ( certainlyWillInline, mkWwInlineRule, mkWorkerUnfolding ) +import GHC.Core.Utils ( exprType, exprIsHNF ) +import GHC.Core.FVs ( exprFreeVars ) +import Var +import Id +import IdInfo +import GHC.Core.Type +import UniqSupply +import BasicTypes +import GHC.Driver.Session +import Demand +import Cpr +import GHC.Core.Op.WorkWrap.Lib +import Util +import Outputable +import GHC.Core.FamInstEnv +import MonadUtils + +#include "HsVersions.h" + +{- +We take Core bindings whose binders have: + +\begin{enumerate} + +\item Strictness attached (by the front-end of the strictness +analyser), and / or + +\item Constructed Product Result information attached by the CPR +analysis pass. + +\end{enumerate} + +and we return some ``plain'' bindings which have been +worker/wrapper-ified, meaning: + +\begin{enumerate} + +\item Functions have been split into workers and wrappers where +appropriate. If a function has both strictness and CPR properties +then only one worker/wrapper doing both transformations is produced; + +\item Binders' @IdInfos@ have been updated to reflect the existence of +these workers/wrappers (this is where we get STRICTNESS and CPR pragma +info for exported values). 
+\end{enumerate} +-} + +wwTopBinds :: DynFlags -> FamInstEnvs -> UniqSupply -> CoreProgram -> CoreProgram + +wwTopBinds dflags fam_envs us top_binds + = initUs_ us $ do + top_binds' <- mapM (wwBind dflags fam_envs) top_binds + return (concat top_binds') + +{- +************************************************************************ +* * +\subsection[wwBind-wwExpr]{@wwBind@ and @wwExpr@} +* * +************************************************************************ + +@wwBind@ works on a binding, trying each \tr{(binder, expr)} pair in +turn. Non-recursive case first, then recursive... +-} + +wwBind :: DynFlags + -> FamInstEnvs + -> CoreBind + -> UniqSM [CoreBind] -- returns a WwBinding intermediate form; + -- the caller will convert to Expr/Binding, + -- as appropriate. + +wwBind dflags fam_envs (NonRec binder rhs) = do + new_rhs <- wwExpr dflags fam_envs rhs + new_pairs <- tryWW dflags fam_envs NonRecursive binder new_rhs + return [NonRec b e | (b,e) <- new_pairs] + -- Generated bindings must be non-recursive + -- because the original binding was. + +wwBind dflags fam_envs (Rec pairs) + = return . Rec <$> concatMapM do_one pairs + where + do_one (binder, rhs) = do new_rhs <- wwExpr dflags fam_envs rhs + tryWW dflags fam_envs Recursive binder new_rhs + +{- +@wwExpr@ basically just walks the tree, looking for appropriate +annotations that can be used. Remember it is @wwBind@ that does the +matching by looking for strict arguments of the correct type. +@wwExpr@ is a version that just returns the ``Plain'' Tree. 
+-} + +wwExpr :: DynFlags -> FamInstEnvs -> CoreExpr -> UniqSM CoreExpr + +wwExpr _ _ e@(Type {}) = return e +wwExpr _ _ e@(Coercion {}) = return e +wwExpr _ _ e@(Lit {}) = return e +wwExpr _ _ e@(Var {}) = return e + +wwExpr dflags fam_envs (Lam binder expr) + = Lam new_binder <$> wwExpr dflags fam_envs expr + where new_binder | isId binder = zapIdUsedOnceInfo binder + | otherwise = binder + -- See Note [Zapping Used Once info in WorkWrap] + +wwExpr dflags fam_envs (App f a) + = App <$> wwExpr dflags fam_envs f <*> wwExpr dflags fam_envs a + +wwExpr dflags fam_envs (Tick note expr) + = Tick note <$> wwExpr dflags fam_envs expr + +wwExpr dflags fam_envs (Cast expr co) = do + new_expr <- wwExpr dflags fam_envs expr + return (Cast new_expr co) + +wwExpr dflags fam_envs (Let bind expr) + = mkLets <$> wwBind dflags fam_envs bind <*> wwExpr dflags fam_envs expr + +wwExpr dflags fam_envs (Case expr binder ty alts) = do + new_expr <- wwExpr dflags fam_envs expr + new_alts <- mapM ww_alt alts + let new_binder = zapIdUsedOnceInfo binder + -- See Note [Zapping Used Once info in WorkWrap] + return (Case new_expr new_binder ty new_alts) + where + ww_alt (con, binders, rhs) = do + new_rhs <- wwExpr dflags fam_envs rhs + let new_binders = [ if isId b then zapIdUsedOnceInfo b else b + | b <- binders ] + -- See Note [Zapping Used Once info in WorkWrap] + return (con, new_binders, new_rhs) + +{- +************************************************************************ +* * +\subsection[tryWW]{@tryWW@: attempt a worker/wrapper pair} +* * +************************************************************************ + +@tryWW@ just accumulates arguments, converts strictness info from the +front-end into the proper form, then calls @mkWwBodies@ to do +the business. + +The only reason this is monadised is for the unique supply. 
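The unique supply mentioned here is just a source of fresh names threaded through the traversal. As a rough standalone sketch (GHC's real UniqSM splits a lazily generated supply tree instead of consuming a list, but the interface idea is the same):

```haskell
-- Minimal stand-in for a unique-supply monad (a sketch, not GHC's
-- UniqSM): a computation threading an infinite list of fresh Ints.
newtype UniqSM' a = UniqSM' { runUniqSM' :: [Int] -> (a, [Int]) }

instance Functor UniqSM' where
  fmap f (UniqSM' g) = UniqSM' (\us -> let (a, us') = g us in (f a, us'))

instance Applicative UniqSM' where
  pure a = UniqSM' (\us -> (a, us))
  UniqSM' mf <*> UniqSM' ma =
    UniqSM' (\us -> let (f, us')  = mf us
                        (a, us'') = ma us'
                    in (f a, us''))

instance Monad UniqSM' where
  UniqSM' ma >>= k =
    UniqSM' (\us -> let (a, us') = ma us in runUniqSM' (k a) us')

-- take the next fresh unique from the supply
newUnique' :: UniqSM' Int
newUnique' = UniqSM' (\(u:us) -> (u, us))
```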
+
+Note [Don't w/w INLINE things]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+It's very important to refrain from w/w-ing an INLINE function (i.e. one
+with a stable unfolding) because the wrapper will then overwrite the
+old stable unfolding with the wrapper code.
+
+Furthermore, if the programmer has marked something as INLINE,
+we may lose by w/w'ing it.
+
+If the strictness analyser is run twice, this test also prevents
+wrappers (which are INLINEd) from being re-done. (You can end up with
+several like-named Ids bouncing around at the same time---absolute
+mischief.)
+
+Notice that we refrain from w/w'ing an INLINE function even if it is
+in a recursive group. It might not be the loop breaker. (We could
+test for loop-breaker-hood, but I'm not sure that ever matters.)
+
+Note [Worker-wrapper for INLINABLE functions]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+If we have
+  {-# INLINABLE f #-}
+  f :: Ord a => [a] -> Int -> a
+  f x y = ....f....
+
+where f is strict in y, we might get a more efficient loop by w/w'ing
+f. But that would make a new unfolding which would overwrite the old
+one! So the function would no longer be INLINABLE, and in particular
+will not be specialised at call sites in other modules.
+
+This comes up in practice (#6056).
+
+Solution: do the w/w for strictness analysis, but transfer the Stable
+unfolding to the *worker*. So we will get something like this:
+
+  {-# INLINE[0] f #-}
+  f :: Ord a => [a] -> Int -> a
+  f d x y = case y of I# y' -> fw d x y'
+
+  {-# INLINABLE[0] fw #-}
+  fw :: Ord a => [a] -> Int# -> a
+  fw d x y' = let y = I# y' in ...f...
+
+How do we "transfer the unfolding"? Easy: by using the old one, wrapped
+in work_fn! See GHC.Core.Unfold.mkWorkerUnfolding.
+
+Note [Worker-wrapper for NOINLINE functions]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We used to disable worker/wrapper for NOINLINE things, but it turns out
+this can cause unnecessary reboxing of values.
Consider + + {-# NOINLINE f #-} + f :: Int -> a + f x = error (show x) + + g :: Bool -> Bool -> Int -> Int + g True True p = f p + g False True p = p + 1 + g b False p = g b True p + +the strictness analysis will discover f and g are strict, but because f +has no wrapper, the worker for g will rebox p. So we get + + $wg x y p# = + let p = I# p# in -- Yikes! Reboxing! + case x of + False -> + case y of + False -> $wg False True p# + True -> +# p# 1# + True -> + case y of + False -> $wg True True p# + True -> case f p of { } + + g x y p = case p of (I# p#) -> $wg x y p# + +Now, in this case the reboxing will float into the True branch, and so +the allocation will only happen on the error path. But it won't float +inwards if there are multiple branches that call (f p), so the reboxing +will happen on every call of g. Disaster. + +Solution: do worker/wrapper even on NOINLINE things; but move the +NOINLINE pragma to the worker. + +(See #13143 for a real-world example.) + +It is crucial that we do this for *all* NOINLINE functions. #10069 +demonstrates what happens when we promise to w/w a (NOINLINE) leaf function, but +fail to deliver: + + data C = C Int# Int# + + {-# NOINLINE c1 #-} + c1 :: C -> Int# + c1 (C _ n) = n + + {-# NOINLINE fc #-} + fc :: C -> Int# + fc c = 2 *# c1 c + +Failing to w/w `c1`, but still w/wing `fc` leads to the following code: + + c1 :: C -> Int# + c1 (C _ n) = n + + $wfc :: Int# -> Int# + $wfc n = let c = C 0# n in 2 #* c1 c + + fc :: C -> Int# + fc (C _ n) = $wfc n + +Yikes! The reboxed `C` in `$wfc` can't cancel out, so we are in a bad place. +This generalises to any function that derives its strictness signature from +its callees, so we have to make sure that when a function announces particular +strictness properties, we have to w/w them accordingly, even if it means +splitting a NOINLINE function. 
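For concreteness, here is the shape of a strictness worker/wrapper split written out by hand in source Haskell (a sketch, not compiler output; GHC builds this at the Core level):

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts

-- worker: computes on the raw Int#, so the body sees no boxes
fw :: Int# -> Int#
fw x# = x# +# 1#

-- wrapper: unboxes the argument, calls the worker, reboxes the result;
-- being small and inlinable, its boxes cancel against the caller's
f :: Int -> Int
f (I# x#) = I# (fw x#)
```

Disabling the split for f (as NOINLINE once did) is exactly what forces callers like the $wg above to rebuild the I# box themselves.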
+
+Note [Worker activation]
+~~~~~~~~~~~~~~~~~~~~~~~~
+Follows on from Note [Worker-wrapper for INLINABLE functions]
+
+It is *vital* that if the worker gets an INLINABLE pragma (from the
+original function), then the worker has the same phase activation as
+the wrapper (or later). That is necessary to allow the wrapper to
+inline into the worker's unfolding: see GHC.Core.Op.Simplify.Utils
+Note [Simplifying inside stable unfoldings].
+
+If the original is NOINLINE, it's important that the worker inherits
+the original activation. Consider
+
+  {-# NOINLINE expensive #-}
+  expensive x = x + 1
+
+  f y = let z = expensive y in ...
+
+If expensive's worker inherits the wrapper's activation,
+we'll get this (because of the compromise in point (2) of
+Note [Wrapper activation])
+
+  {-# NOINLINE[0] $wexpensive #-}
+  $wexpensive x = x + 1
+  {-# INLINE[0] expensive #-}
+  expensive x = $wexpensive x
+
+  f y = let z = expensive y in ...
+
+and $wexpensive will be immediately inlined into expensive, followed by
+expensive into f. This effectively removes the original NOINLINE!
+
+Otherwise, nothing is lost by giving the worker the same activation as the
+wrapper, because the worker won't have any chance of inlining until the
+wrapper does; there's no point in giving it an earlier activation.
+
+Note [Don't w/w inline small non-loop-breaker things]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+In general, we refrain from w/w-ing *small* functions, which are not
+loop breakers, because they'll inline anyway. But we must take care:
+it may look small now, but get to be big later after other inlining
+has happened. So we take the precaution of adding an INLINE pragma to
+any such functions.
+
+I made this change when I observed a big function at the end of
+compilation with a useful strictness signature but no w-w. (It was
+small during demand analysis, we refrained from w/w, and then got big
+when something was inlined in its rhs.)
When I measured it on nofib, +it didn't make much difference; just a few percent improved allocation +on one benchmark (bspt/Euclid.space). But nothing got worse. + +There is an infelicity though. We may get something like + f = g val +==> + g x = case gw x of r -> I# r + + f {- InlineStable, Template = g val -} + f = case gw x of r -> I# r + +The code for f duplicates that for g, without any real benefit. It +won't really be executed, because calls to f will go via the inlining. + +Note [Don't w/w join points for CPR] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +There's no point in exploiting CPR info on a join point. If the whole function +is getting CPR'd, then the case expression around the worker function will get +pushed into the join point by the simplifier, which will have the same effect +that w/w'ing for CPR would have - the result will be returned in an unboxed +tuple. + + f z = let join j x y = (x+1, y+1) + in case z of A -> j 1 2 + B -> j 2 3 + + => + + f z = case $wf z of (# a, b #) -> (a, b) + $wf z = case (let join j x y = (x+1, y+1) + in case z of A -> j 1 2 + B -> j 2 3) of (a, b) -> (# a, b #) + + => + + f z = case $wf z of (# a, b #) -> (a, b) + $wf z = let join j x y = (# x+1, y+1 #) + in case z of A -> j 1 2 + B -> j 2 3 + +Note that we still want to give @j@ the CPR property, so that @f@ has it. So +CPR *analyse* join points as regular functions, but don't *transform* them. + +Doing W/W for returned products on a join point would be tricky anyway, as the +worker could not be a join point because it would not be tail-called. However, +doing the *argument* part of W/W still works for join points, since the wrapper +body will make a tail call: + + f z = let join j x y = x + y + in ... + + => + + f z = let join $wj x# y# = x# +# y# + j x y = case x of I# x# -> + case y of I# y# -> + $wj x# y# + in ... + +Note [Wrapper activation] +~~~~~~~~~~~~~~~~~~~~~~~~~ +When should the wrapper inlining be active? + +1. 
It must not be active earlier than the current Activation of the
+   Id
+
+2. It should be active at some point, despite (1) because of
+   Note [Worker-wrapper for NOINLINE functions]
+
+3. For ordinary functions with no pragmas we want to inline the
+   wrapper as early as possible (#15056). Suppose another module
+   defines    f x = g x x
+   and suppose there is some RULE for (g True True). Then if we have
+   a call (f True), we'd expect to inline 'f' and the RULE will fire.
+   But if f is w/w'd (which it might be), we want the inlining to
+   occur just as if it hadn't been.
+
+   (This only matters if f's RHS is big enough to w/w, but small
+   enough to inline given the call site, but that can happen.)
+
+4. We do not want to inline the wrapper before specialisation.
+         module Foo where
+           f :: Num a => a -> Int -> a
+           f n 0 = n              -- Strict in the Int, hence wrapper
+           f n x = f (n+n) (x-1)
+
+           g :: Int -> Int
+           g x = f x x            -- Provokes a specialisation for f
+
+         module Bar where
+           import Foo
+
+           h :: Int -> Int
+           h x = f 3 x
+
+   In module Bar we want to give specialisations a chance to fire
+   before inlining f's wrapper.
+
+Reminder: Note [Don't w/w INLINE things], so we don't need to worry
+          about INLINE things here.
+
+Conclusion:
+  - If the user said NOINLINE[n], respect that
+  - If the user said NOINLINE, inline the wrapper as late as
+    possible (phase 0). This is a compromise driven by (2) above
+  - Otherwise inline wrapper in phase 2. That allows the
+    'gentle' simplification pass to apply specialisation rules
+
+Historical note: At one stage I tried making the wrapper inlining
+always-active, and that had a very bad effect on nofib/imaginary/x2n1;
+a wrapper was inlined before the specialisation fired.
+
+Note [Wrapper NoUserInline]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Using an inl_inline of NoUserInline on the wrapper distinguishes
+this pragma from one that was given by the user.
In particular, CSE +will not happen if there is a user-specified pragma, but should happen +for w/w’ed things (#14186). +-} + +tryWW :: DynFlags + -> FamInstEnvs + -> RecFlag + -> Id -- The fn binder + -> CoreExpr -- The bound rhs; its innards + -- are already ww'd + -> UniqSM [(Id, CoreExpr)] -- either *one* or *two* pairs; + -- if one, then no worker (only + -- the orig "wrapper" lives on); + -- if two, then a worker and a + -- wrapper. +tryWW dflags fam_envs is_rec fn_id rhs + -- See Note [Worker-wrapper for NOINLINE functions] + + | Just stable_unf <- certainlyWillInline dflags fn_info + = return [ (fn_id `setIdUnfolding` stable_unf, rhs) ] + -- See Note [Don't w/w INLINE things] + -- See Note [Don't w/w inline small non-loop-breaker things] + + | is_fun && is_eta_exp + = splitFun dflags fam_envs new_fn_id fn_info wrap_dmds div cpr rhs + + | is_thunk -- See Note [Thunk splitting] + = splitThunk dflags fam_envs is_rec new_fn_id rhs + + | otherwise + = return [ (new_fn_id, rhs) ] + + where + fn_info = idInfo fn_id + (wrap_dmds, div) = splitStrictSig (strictnessInfo fn_info) + + cpr_ty = getCprSig (cprInfo fn_info) + -- Arity of the CPR sig should match idArity when it's not a join point. 
+ -- See Note [Arity trimming for CPR signatures] in GHC.Core.Op.CprAnal + cpr = ASSERT2( isJoinId fn_id || cpr_ty == topCprType || ct_arty cpr_ty == arityInfo fn_info + , ppr fn_id <> colon <+> text "ct_arty:" <+> int (ct_arty cpr_ty) <+> text "arityInfo:" <+> ppr (arityInfo fn_info)) + ct_cpr cpr_ty + + new_fn_id = zapIdUsedOnceInfo (zapIdUsageEnvInfo fn_id) + -- See Note [Zapping DmdEnv after Demand Analyzer] and + -- See Note [Zapping Used Once info WorkWrap] + + is_fun = notNull wrap_dmds || isJoinId fn_id + -- See Note [Don't eta expand in w/w] + is_eta_exp = length wrap_dmds == manifestArity rhs + is_thunk = not is_fun && not (exprIsHNF rhs) && not (isJoinId fn_id) + && not (isUnliftedType (idType fn_id)) + +{- +Note [Zapping DmdEnv after Demand Analyzer] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +In the worker-wrapper pass we zap the DmdEnv. Why? + (a) it is never used again + (b) it wastes space + (c) it becomes incorrect as things are cloned, because + we don't push the substitution into it + +Why here? + * Because we don’t want to do it in the Demand Analyzer, as we never know + there when we are doing the last pass. + * We want them to be still there at the end of DmdAnal, so that + -ddump-str-anal contains them. + * We don’t want a second pass just for that. + * WorkWrap looks at all bindings anyway. + +We also need to do it in TidyCore.tidyLetBndr to clean up after the +final, worker/wrapper-less run of the demand analyser (see +Note [Final Demand Analyser run] in GHC.Core.Op.DmdAnal). + +Note [Zapping Used Once info in WorkWrap] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +In the worker-wrapper pass we zap the used once info in demands and in +strictness signatures. + +Why? + * The simplifier may happen to transform code in a way that invalidates the + data (see #11731 for an example). + * It is not used in later passes, up to code generation. + +So as the data is useless and possibly wrong, we want to remove it. 
The most +convenient place to do that is the worker wrapper phase, as it runs after every +run of the demand analyser besides the very last one (which is the one where we +want to _keep_ the info for the code generator). + +We do not do it in the demand analyser for the same reasons outlined in +Note [Zapping DmdEnv after Demand Analyzer] above. + +Note [Don't eta expand in w/w] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +A binding where the manifestArity of the RHS is less than idArity of the binder +means GHC.Core.Arity didn't eta expand that binding. When this happens, it does so +for a reason (see Note [exprArity invariant] in GHC.Core.Arity) and we probably have +a PAP, cast or trivial expression as RHS. + +Performing the worker/wrapper split will implicitly eta-expand the binding to +idArity, overriding GHC.Core.Arity's decision. Other than playing fast and loose with +divergence, it's also broken for newtypes: + + f = (\xy.blah) |> co + where + co :: (Int -> Int -> Char) ~ T + +Then idArity is 2 (despite the type T), and it can have a StrictSig based on a +threshold of 2. But we can't w/w it without a type error. + +The situation is less grave for PAPs, but the implicit eta expansion caused a +compiler allocation regression in T15164, where huge recursive instance method +groups, mostly consisting of PAPs, got w/w'd. This caused great churn in the +simplifier, when simply waiting for the PAPs to inline arrived at the same +output program. + +Note there is the worry here that such PAPs and trivial RHSs might not *always* +be inlined. That would lead to reboxing, because the analysis tacitly assumes +that we W/W'd for idArity and will propagate analysis information under that +assumption. So far, this doesn't seem to matter in practice. +See https://gitlab.haskell.org/ghc/ghc/merge_requests/312#note_192064. 
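The PAP case is easy to reproduce in source terms (a tiny illustrative sketch with a made-up name):

```haskell
-- A PAP as RHS: after analysis 'addTen' has idArity 1, but its RHS
-- shows no lambda at all (manifestArity 0). W/w'ing it would force the
-- implicit eta-expansion this Note warns about; simply waiting for the
-- PAP to inline arrives at the same program with less churn.
addTen :: Int -> Int
addTen = (+) 10
```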
+-} + + +--------------------- +splitFun :: DynFlags -> FamInstEnvs -> Id -> IdInfo -> [Demand] -> Divergence -> CprResult -> CoreExpr + -> UniqSM [(Id, CoreExpr)] +splitFun dflags fam_envs fn_id fn_info wrap_dmds div cpr rhs + = WARN( not (wrap_dmds `lengthIs` arity), ppr fn_id <+> (ppr arity $$ ppr wrap_dmds $$ ppr cpr) ) do + -- The arity should match the signature + stuff <- mkWwBodies dflags fam_envs rhs_fvs fn_id wrap_dmds use_cpr_info + case stuff of + Just (work_demands, join_arity, wrap_fn, work_fn) -> do + work_uniq <- getUniqueM + let work_rhs = work_fn rhs + work_act = case fn_inline_spec of -- See Note [Worker activation] + NoInline -> fn_act + _ -> wrap_act + + work_prag = InlinePragma { inl_src = SourceText "{-# INLINE" + , inl_inline = fn_inline_spec + , inl_sat = Nothing + , inl_act = work_act + , inl_rule = FunLike } + -- inl_inline: copy from fn_id; see Note [Worker-wrapper for INLINABLE functions] + -- inl_act: see Note [Worker activation] + -- inl_rule: it does not make sense for workers to be constructorlike. 
+ + work_join_arity | isJoinId fn_id = Just join_arity + | otherwise = Nothing + -- worker is join point iff wrapper is join point + -- (see Note [Don't w/w join points for CPR]) + + work_id = mkWorkerId work_uniq fn_id (exprType work_rhs) + `setIdOccInfo` occInfo fn_info + -- Copy over occurrence info from parent + -- Notably whether it's a loop breaker + -- Doesn't matter much, since we will simplify next, but + -- seems right-er to do so + + `setInlinePragma` work_prag + + `setIdUnfolding` mkWorkerUnfolding dflags work_fn fn_unfolding + -- See Note [Worker-wrapper for INLINABLE functions] + + `setIdStrictness` mkClosedStrictSig work_demands div + -- Even though we may not be at top level, + -- it's ok to give it an empty DmdEnv + + `setIdCprInfo` mkCprSig work_arity work_cpr_info + + `setIdDemandInfo` worker_demand + + `setIdArity` work_arity + -- Set the arity so that the Core Lint check that the + -- arity is consistent with the demand type goes + -- through + `asJoinId_maybe` work_join_arity + + work_arity = length work_demands + + -- See Note [Demand on the Worker] + single_call = saturatedByOneShots arity (demandInfo fn_info) + worker_demand | single_call = mkWorkerDemand work_arity + | otherwise = topDmd + + wrap_rhs = wrap_fn work_id + wrap_act = case fn_act of -- See Note [Wrapper activation] + ActiveAfter {} -> fn_act + NeverActive -> activeDuringFinal + _ -> activeAfterInitial + wrap_prag = InlinePragma { inl_src = SourceText "{-# INLINE" + , inl_inline = NoUserInline + , inl_sat = Nothing + , inl_act = wrap_act + , inl_rule = rule_match_info } + -- inl_act: see Note [Wrapper activation] + -- inl_inline: see Note [Wrapper NoUserInline] + -- inl_rule: RuleMatchInfo is (and must be) unaffected + + wrap_id = fn_id `setIdUnfolding` mkWwInlineRule dflags wrap_rhs arity + `setInlinePragma` wrap_prag + `setIdOccInfo` noOccInfo + -- Zap any loop-breaker-ness, to avoid bleating from Lint + -- about a loop breaker with an INLINE rule + + + + return $ [(work_id, 
work_rhs), (wrap_id, wrap_rhs)] + -- Worker first, because wrapper mentions it + + Nothing -> return [(fn_id, rhs)] + where + rhs_fvs = exprFreeVars rhs + fn_inl_prag = inlinePragInfo fn_info + fn_inline_spec = inl_inline fn_inl_prag + fn_act = inl_act fn_inl_prag + rule_match_info = inlinePragmaRuleMatchInfo fn_inl_prag + fn_unfolding = unfoldingInfo fn_info + arity = arityInfo fn_info + -- The arity is set by the simplifier using exprEtaExpandArity + -- So it may be more than the number of top-level-visible lambdas + + -- use_cpr_info is the CPR we w/w for. Note that we kill it for join points, + -- see Note [Don't w/w join points for CPR]. + use_cpr_info | isJoinId fn_id = topCpr + | otherwise = cpr + -- Even if we don't w/w join points for CPR, we might still do so for + -- strictness. In which case a join point worker keeps its original CPR + -- property; see Note [Don't w/w join points for CPR]. Otherwise, the worker + -- doesn't have the CPR property anymore. + work_cpr_info | isJoinId fn_id = cpr + | otherwise = topCpr + + +{- +Note [Demand on the worker] +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +If the original function is called once, according to its demand info, then +so is the worker. This is important so that the occurrence analyser can +attach OneShot annotations to the worker’s lambda binders. + + +Example: + + -- Original function + f [Demand=<L,1*C1(U)>] :: (a,a) -> a + f = \p -> ... + + -- Wrapper + f [Demand=<L,1*C1(U)>] :: a -> a -> a + f = \p -> case p of (a,b) -> $wf a b + + -- Worker + $wf [Demand=<L,1*C1(C1(U))>] :: Int -> Int + $wf = \a b -> ... + +We need to check whether the original function is called once, with +sufficiently many arguments. This is done using saturatedByOneShots, which +takes the arity of the original function (resp. the wrapper) and the demand on +the original function. 
+
+The demand on the worker is then calculated using mkWorkerDemand, and always of
+the form [Demand=<L,1*(C1(...(C1(U))))>]
+
+
+Note [Do not split void functions]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider this rather common form of binding:
+        $j = \x:Void# -> ...no use of x...
+
+Since x is not used it'll be marked as absent. But there is no point
+in w/w-ing because we'll simply add (\y:Void#), see GHC.Core.Op.WorkWrap.Lib.mkWorkerArgs.
+
+If x has a more interesting type (e.g. Int, or Int#), there *is* a point
+in w/w so that we don't pass the argument at all.
+
+Note [Thunk splitting]
+~~~~~~~~~~~~~~~~~~~~~~
+Suppose x is used strictly (never mind whether it has the CPR
+property).
+
+      let
+        x* = x-rhs
+      in body
+
+splitThunk transforms like this:
+
+      let
+        x* = case x-rhs of { I# a -> I# a }
+      in body
+
+Now the simplifier will transform to
+
+      case x-rhs of
+        I# a -> let x* = I# a
+                in body
+
+which is what we want. Now suppose x-rhs is itself a case:
+
+      x-rhs = case e of { T -> I# a; F -> I# b }
+
+The join point will abstract over a, rather than over x-rhs (which is
+what would have happened before), which is fine.
+
+Notice that x certainly has the CPR property now!
+
+In fact, splitThunk uses the function argument w/w splitting
+function, so that if x's demand is deeper (say U(U(L,L),L))
+then the splitting will go deeper too.
+-}
+
+-- See Note [Thunk splitting]
+-- splitThunk converts the *non-recursive* binding
+--      x = e
+-- into
+--      x = let x = e
+--          in case x of
+--               I# y -> let x = I# y in x
+-- See comments above. Is it not beautifully short?
+-- Moreover, it works just as well when there are
+-- several binders, and if the binders are lifted
+-- E.g.
x = e +-- --> x = let x = e in +-- case x of (a,b) -> let x = (a,b) in x + +splitThunk :: DynFlags -> FamInstEnvs -> RecFlag -> Var -> Expr Var -> UniqSM [(Var, Expr Var)] +splitThunk dflags fam_envs is_rec fn_id rhs + = ASSERT(not (isJoinId fn_id)) + do { (useful,_, wrap_fn, work_fn) <- mkWWstr dflags fam_envs False [fn_id] + ; let res = [ (fn_id, Let (NonRec fn_id rhs) (wrap_fn (work_fn (Var fn_id)))) ] + ; if useful then ASSERT2( isNonRec is_rec, ppr fn_id ) -- The thunk must be non-recursive + return res + else return [(fn_id, rhs)] } diff --git a/compiler/GHC/Core/Op/WorkWrap/Lib.hs b/compiler/GHC/Core/Op/WorkWrap/Lib.hs new file mode 100644 index 0000000000..3ce454e7a2 --- /dev/null +++ b/compiler/GHC/Core/Op/WorkWrap/Lib.hs @@ -0,0 +1,1209 @@ +{- +(c) The GRASP/AQUA Project, Glasgow University, 1993-1998 + +A library for the ``worker\/wrapper'' back-end to the strictness analyser +-} + +{-# LANGUAGE CPP #-} + +module GHC.Core.Op.WorkWrap.Lib + ( mkWwBodies, mkWWstr, mkWorkerArgs + , deepSplitProductType_maybe, findTypeShape + , isWorkerSmallEnough + ) +where + +#include "HsVersions.h" + +import GhcPrelude + +import GHC.Core +import GHC.Core.Utils ( exprType, mkCast, mkDefaultCase, mkSingleAltCase ) +import Id +import IdInfo ( JoinArity ) +import GHC.Core.DataCon +import Demand +import Cpr +import GHC.Core.Make ( mkAbsentErrorApp, mkCoreUbxTup + , mkCoreApp, mkCoreLet ) +import MkId ( voidArgId, voidPrimId ) +import TysWiredIn ( tupleDataCon ) +import TysPrim ( voidPrimTy ) +import Literal ( absentLiteralOf, rubbishLit ) +import VarEnv ( mkInScopeSet ) +import VarSet ( VarSet ) +import GHC.Core.Type +import GHC.Core.Predicate ( isClassPred ) +import GHC.Types.RepType ( isVoidTy, typePrimRep ) +import GHC.Core.Coercion +import GHC.Core.FamInstEnv +import BasicTypes ( Boxity(..) 
) +import GHC.Core.TyCon +import UniqSupply +import Unique +import Maybes +import Util +import Outputable +import GHC.Driver.Session +import FastString +import ListSetOps + +{- +************************************************************************ +* * +\subsection[mkWrapperAndWorker]{@mkWrapperAndWorker@} +* * +************************************************************************ + +Here's an example. The original function is: + +\begin{verbatim} +g :: forall a . Int -> [a] -> a + +g = \/\ a -> \ x ys -> + case x of + 0 -> head ys + _ -> head (tail ys) +\end{verbatim} + +From this, we want to produce: +\begin{verbatim} +-- wrapper (an unfolding) +g :: forall a . Int -> [a] -> a + +g = \/\ a -> \ x ys -> + case x of + I# x# -> $wg a x# ys + -- call the worker; don't forget the type args! + +-- worker +$wg :: forall a . Int# -> [a] -> a + +$wg = \/\ a -> \ x# ys -> + let + x = I# x# + in + case x of -- note: body of g moved intact + 0 -> head ys + _ -> head (tail ys) +\end{verbatim} + +Something we have to be careful about: Here's an example: + +\begin{verbatim} +-- "f" strictness: U(P)U(P) +f (I# a) (I# b) = a +# b + +g = f -- "g" strictness same as "f" +\end{verbatim} + +\tr{f} will get a worker all nice and friendly-like; that's good. +{\em But we don't want a worker for \tr{g}}, even though it has the +same strictness as \tr{f}. Doing so could break laziness, at best. + +Consequently, we insist that the number of strictness-info items is +exactly the same as the number of lambda-bound arguments. (This is +probably slightly paranoid, but OK in practice.) If it isn't the +same, we ``revise'' the strictness info, so that we won't propagate +the unusable strictness-info into the interfaces. 
+ + +************************************************************************ +* * +\subsection{The worker wrapper core} +* * +************************************************************************ + +@mkWwBodies@ is called when doing the worker\/wrapper split inside a module. +-} + +type WwResult + = ([Demand], -- Demands for worker (value) args + JoinArity, -- Number of worker (type OR value) args + Id -> CoreExpr, -- Wrapper body, lacking only the worker Id + CoreExpr -> CoreExpr) -- Worker body, lacking the original function rhs + +mkWwBodies :: DynFlags + -> FamInstEnvs + -> VarSet -- Free vars of RHS + -- See Note [Freshen WW arguments] + -> Id -- The original function + -> [Demand] -- Strictness of original function + -> CprResult -- Info about function result + -> UniqSM (Maybe WwResult) + +-- wrap_fn_args E = \x y -> E +-- work_fn_args E = E x y + +-- wrap_fn_str E = case x of { (a,b) -> +-- case a of { (a1,a2) -> +-- E a1 a2 b y }} +-- work_fn_str E = \a1 a2 b y -> +-- let a = (a1,a2) in +-- let x = (a,b) in +-- E + +mkWwBodies dflags fam_envs rhs_fvs fun_id demands cpr_info + = do { let empty_subst = mkEmptyTCvSubst (mkInScopeSet rhs_fvs) + -- See Note [Freshen WW arguments] + + ; (wrap_args, wrap_fn_args, work_fn_args, res_ty) + <- mkWWargs empty_subst fun_ty demands + ; (useful1, work_args, wrap_fn_str, work_fn_str) + <- mkWWstr dflags fam_envs has_inlineable_prag wrap_args + + -- Do CPR w/w. See Note [Always do CPR w/w] + ; (useful2, wrap_fn_cpr, work_fn_cpr, cpr_res_ty) + <- mkWWcpr (gopt Opt_CprAnal dflags) fam_envs res_ty cpr_info + + ; let (work_lam_args, work_call_args) = mkWorkerArgs dflags work_args cpr_res_ty + worker_args_dmds = [idDemandInfo v | v <- work_call_args, isId v] + wrapper_body = wrap_fn_args . wrap_fn_cpr . wrap_fn_str . applyToVars work_call_args . Var + worker_body = mkLams work_lam_args. work_fn_str . work_fn_cpr . 
work_fn_args + + ; if isWorkerSmallEnough dflags work_args + && not (too_many_args_for_join_point wrap_args) + && ((useful1 && not only_one_void_argument) || useful2) + then return (Just (worker_args_dmds, length work_call_args, + wrapper_body, worker_body)) + else return Nothing + } + -- We use an INLINE unconditionally, even if the wrapper turns out to be + -- something trivial like + -- fw = ... + -- f = __inline__ (coerce T fw) + -- The point is to propagate the coerce to f's call sites, so even though + -- f's RHS is now trivial (size 1) we still want the __inline__ to prevent + -- fw from being inlined into f's RHS + where + fun_ty = idType fun_id + mb_join_arity = isJoinId_maybe fun_id + has_inlineable_prag = isStableUnfolding (realIdUnfolding fun_id) + -- See Note [Do not unpack class dictionaries] + + -- Note [Do not split void functions] + only_one_void_argument + | [d] <- demands + , Just (arg_ty1, _) <- splitFunTy_maybe fun_ty + , isAbsDmd d && isVoidTy arg_ty1 + = True + | otherwise + = False + + -- Note [Join points returning functions] + too_many_args_for_join_point wrap_args + | Just join_arity <- mb_join_arity + , wrap_args `lengthExceeds` join_arity + = WARN(True, text "Unable to worker/wrapper join point with arity " <+> + int join_arity <+> text "but" <+> + int (length wrap_args) <+> text "args") + True + | otherwise + = False + +-- See Note [Limit w/w arity] +isWorkerSmallEnough :: DynFlags -> [Var] -> Bool +isWorkerSmallEnough dflags vars = count isId vars <= maxWorkerArgs dflags + -- We count only Free variables (isId) to skip Type, Kind + -- variables which have no runtime representation. + +{- +Note [Always do CPR w/w] +~~~~~~~~~~~~~~~~~~~~~~~~ +At one time we refrained from doing CPR w/w for thunks, on the grounds that +we might duplicate work. But that is already handled by the demand analyser, +which doesn't give the CPR property if w/w might waste work: see +Note [CPR for thunks] in GHC.Core.Op.DmdAnal. 
+ +And if something *has* been given the CPR property and we don't w/w, it's +a disaster, because then the enclosing function might say it has the CPR +property, but now doesn't, and there's a cascade of disasters. A good example +is #5920. + +Note [Limit w/w arity] +~~~~~~~~~~~~~~~~~~~~~~~~ +Guard against high worker arity as it generates a lot of stack traffic. +A simplified example is #11565#comment:6 + +Current strategy is very simple: don't perform the w/w transformation at all +if the result produces a wrapper with arity higher than -fmax-worker-args=. + +It is a bit all-or-nothing; consider + + f (x,y) (a,b,c,d,e ... , z) = rhs + +Currently we will remove all w/w'ness entirely. But actually we could +w/w on the (x,y) pair... it's the huge product that is the problem. + +Could we instead refrain from w/w on an arg-by-arg basis? Yes, that'd +solve f. But we can get a lot of args from deeply-nested products: + + g (a, (b, (c, (d, ...)))) = rhs + +This is harder to spot on an arg-by-arg basis. Previously mkWWstr was +given some "fuel" saying how many arguments it could add; when we ran +out of fuel it would stop w/wing. +Still not very clever, because it had a left-to-right bias. + +************************************************************************ +* * +\subsection{Making wrapper args} +* * +************************************************************************ + +During worker-wrapper stuff we may end up with an unlifted thing +which we want to let-bind without losing laziness. So we +add a void argument. E.g. + + f = /\a -> \x y z -> E::Int# -- E does not mention x,y,z +==> + fw = /\ a -> \void -> E + f = /\ a -> \x y z -> fw realworld + +We use the state-token type which generates no code.
+-} + +mkWorkerArgs :: DynFlags -> [Var] + -> Type -- Type of body + -> ([Var], -- Lambda bound args + [Var]) -- Args at call site +mkWorkerArgs dflags args res_ty + | any isId args || not needsAValueLambda + = (args, args) + | otherwise + = (args ++ [voidArgId], args ++ [voidPrimId]) + where + -- See "Making wrapper args" section above + needsAValueLambda = + lifted + -- We may encounter a levity-polymorphic result, in which case we + -- conservatively assume that we have laziness that needs preservation. + -- See #15186. + || not (gopt Opt_FunToThunk dflags) + -- see Note [Protecting the last value argument] + + -- Might the result be lifted? + lifted = + case isLiftedType_maybe res_ty of + Just lifted -> lifted + Nothing -> True + +{- +Note [Protecting the last value argument] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +If the user writes (\_ -> E), they might be intentionally disallowing +the sharing of E. Since absence analysis and worker-wrapper are keen +to remove such unused arguments, we add in a void argument to prevent +the function from becoming a thunk. + +The user can avoid adding the void argument with the -ffun-to-thunk +flag. However, this can create sharing, which may be bad in two ways. 1) It can +create a space leak. 2) It can prevent inlining *under a lambda*. If w/w +removes the last argument from a function f, then f now looks like a thunk, and +so f can't be inlined *under a lambda*. + +Note [Join points and beta-redexes] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Originally, the worker would invoke the original function by calling it with +arguments, thus producing a beta-redex for the simplifier to munch away: + + \x y z -> e => (\x y z -> e) wx wy wz + +Now that we have special rules about join points, however, this is Not Good if +the original function is itself a join point, as then it may contain invocations +of other join points: + + join j1 x = ... + join j2 y = if y == 0 then 0 else j1 y + + => + + join j1 x = ... 
+ join $wj2 y# = let wy = I# y# in (\y -> if y == 0 then 0 else jump j1 y) wy + join j2 y = case y of I# y# -> jump $wj2 y# + +There can't be an intervening lambda between a join point's declaration and its +occurrences, so $wj2 here is wrong. But of course, this is easy enough to fix: + + ... + let join $wj2 y# = let wy = I# y# in let y = wy in if y == 0 then 0 else j1 y + ... + +Hence we simply do the beta-reduction here. (This would be harder if we had to +worry about hygiene, but luckily wy is freshly generated.) + +Note [Join points returning functions] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +It is crucial that the arity of a join point depends on its *callers,* not its +own syntax. What this means is that a join point can have "extra lambdas": + +f :: Int -> Int -> (Int, Int) -> Int +f x y = join j (z, w) = \(u, v) -> ... + in jump j (x, y) + +Typically this happens with functions that are seen as computing functions, +rather than being curried. (The real-life example was GraphOps.addConflicts.) + +When we create the wrapper, it *must* be in "eta-contracted" form so that the +jump has the right number of arguments: + +f x y = join $wj z' w' = \u' v' -> let {z = z'; w = w'; u = u'; v = v'} in ... + j (z, w) = jump $wj z w + +(See Note [Join points and beta-redexes] for where the lets come from.) If j +were a function, we would instead say + +f x y = let $wj = \z' w' u' v' -> let {z = z'; w = w'; u = u'; v = v'} in ... + j (z, w) (u, v) = $wj z w u v + +Notice that the worker ends up with the same lambdas; it's only the wrapper we +have to be concerned about. + +FIXME Currently the functionality to produce "eta-contracted" wrappers is +unimplemented; we simply give up. + +************************************************************************ +* * +\subsection{Coercion stuff} +* * +************************************************************************ + +We really want to "look through" coerces. 
+Reason: I've seen this situation: + + let f = coerce T (\s -> E) + in \x -> case x of + p -> coerce T' f + q -> \s -> E2 + r -> coerce T' f + +If only we w/w'd f, we'd get + let f = coerce T (\s -> fw s) + fw = \s -> E + in ... + +Now we'll inline f to get + + let fw = \s -> E + in \x -> case x of + p -> fw + q -> \s -> E2 + r -> fw + +Now we'll see that fw has arity 1, and will arity expand +the \x to get what we want. +-} + +-- mkWWargs just does eta expansion +-- is driven off the function type and arity. +-- It chomps bites off foralls, arrows, newtypes +-- and keeps repeating that until it's satisfied the supplied arity + +mkWWargs :: TCvSubst -- Freshening substitution to apply to the type + -- See Note [Freshen WW arguments] + -> Type -- The type of the function + -> [Demand] -- Demands and one-shot info for value arguments + -> UniqSM ([Var], -- Wrapper args + CoreExpr -> CoreExpr, -- Wrapper fn + CoreExpr -> CoreExpr, -- Worker fn + Type) -- Type of wrapper body + +mkWWargs subst fun_ty demands + | null demands + = return ([], id, id, substTy subst fun_ty) + + | (dmd:demands') <- demands + , Just (arg_ty, fun_ty') <- splitFunTy_maybe fun_ty + = do { uniq <- getUniqueM + ; let arg_ty' = substTy subst arg_ty + id = mk_wrap_arg uniq arg_ty' dmd + ; (wrap_args, wrap_fn_args, work_fn_args, res_ty) + <- mkWWargs subst fun_ty' demands' + ; return (id : wrap_args, + Lam id . wrap_fn_args, + apply_or_bind_then work_fn_args (varToCoreExpr id), + res_ty) } + + | Just (tv, fun_ty') <- splitForAllTy_maybe fun_ty + = do { uniq <- getUniqueM + ; let (subst', tv') = cloneTyVarBndr subst tv uniq + -- See Note [Freshen WW arguments] + ; (wrap_args, wrap_fn_args, work_fn_args, res_ty) + <- mkWWargs subst' fun_ty' demands + ; return (tv' : wrap_args, + Lam tv' . 
wrap_fn_args, + apply_or_bind_then work_fn_args (mkTyArg (mkTyVarTy tv')), + res_ty) } + + | Just (co, rep_ty) <- topNormaliseNewType_maybe fun_ty + -- The newtype case is for when the function has + -- a newtype after the arrow (rare) + -- + -- It's also important when we have a function returning (say) a pair + -- wrapped in a newtype, at least if CPR analysis can look + -- through such newtypes, which it probably can since they are + -- simply coerces. + + = do { (wrap_args, wrap_fn_args, work_fn_args, res_ty) + <- mkWWargs subst rep_ty demands + ; let co' = substCo subst co + ; return (wrap_args, + \e -> Cast (wrap_fn_args e) (mkSymCo co'), + \e -> work_fn_args (Cast e co'), + res_ty) } + + | otherwise + = WARN( True, ppr fun_ty ) -- Should not happen: if there is a demand + return ([], id, id, substTy subst fun_ty) -- then there should be a function arrow + where + -- See Note [Join points and beta-redexes] + apply_or_bind_then k arg (Lam bndr body) + = mkCoreLet (NonRec bndr arg) (k body) -- Important that arg is fresh! + apply_or_bind_then k arg fun + = k $ mkCoreApp (text "mkWWargs") fun arg + +applyToVars :: [Var] -> CoreExpr -> CoreExpr +applyToVars vars fn = mkVarApps fn vars + +mk_wrap_arg :: Unique -> Type -> Demand -> Id +mk_wrap_arg uniq ty dmd + = mkSysLocalOrCoVar (fsLit "w") uniq ty + `setIdDemandInfo` dmd + +{- Note [Freshen WW arguments] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +When we do a worker/wrapper split, we must not use in-scope names as the arguments +of the worker, else we'll get name capture. E.g. + + -- y1 is in scope from further out + f x = ..y1.. + +If we accidentally choose y1 as a worker argument, disaster results: + + fww y1 y2 = let x = (y1,y2) in ...y1... + +To avoid this: + + * We use a fresh unique for both type-variable and term-variable binders + Originally we lacked this freshness for type variables, and that led + to the very obscure #12562. (A type variable in the worker shadowed + an outer term-variable binding.)
+ + * Because of this cloning we have to substitute in the type/kind of the + new binders. That's why we carry the TCvSubst through mkWWargs. + + So we need a decent in-scope set, just in case that type/kind + itself has foralls. We get this from the free vars of the RHS of the + function since those are the only variables that might be captured. + It's a lazy thunk, which will only be poked if the type/kind has a forall. + + Another tricky case was when f :: forall a. a -> forall a. a->a + (i.e. with shadowing), and then the worker used the same 'a' twice. + +************************************************************************ +* * +\subsection{Strictness stuff} +* * +************************************************************************ +-} + +mkWWstr :: DynFlags + -> FamInstEnvs + -> Bool -- True <=> INLINEABLE pragma on this function defn + -- See Note [Do not unpack class dictionaries] + -> [Var] -- Wrapper args; have their demand info on them + -- *Includes type variables* + -> UniqSM (Bool, -- Is this useful + [Var], -- Worker args + CoreExpr -> CoreExpr, -- Wrapper body, lacking the worker call + -- and without its lambdas + -- This fn adds the unboxing + + CoreExpr -> CoreExpr) -- Worker body, lacking the original body of the function, + -- and lacking its lambdas. + -- This fn does the reboxing +mkWWstr dflags fam_envs has_inlineable_prag args + = go args + where + go_one arg = mkWWstr_one dflags fam_envs has_inlineable_prag arg + + go [] = return (False, [], nop_fn, nop_fn) + go (arg : args) = do { (useful1, args1, wrap_fn1, work_fn1) <- go_one arg + ; (useful2, args2, wrap_fn2, work_fn2) <- go args + ; return ( useful1 || useful2 + , args1 ++ args2 + , wrap_fn1 . wrap_fn2 + , work_fn1 . work_fn2) } + +{- +Note [Unpacking arguments with product and polymorphic demands] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +The argument is unpacked in a case if it has a product type and has a +strict *and* used demand put on it. 
I.e., arguments with demands such +as the following ones: + + <S,U(U, L)> + <S(L,S),U> + +will be unpacked, but + + <S,U> or <B,U> + +will not, because the pieces aren't used. This is quite important; otherwise +we end up unpacking massive tuples passed to the bottoming function. Example: + + f :: ((Int,Int) -> String) -> (Int,Int) -> a + f g pr = error (g pr) + + main = print (f fst (1, error "no")) + +Does 'main' print "error 1" or "error no"? We don't really want 'f' +to unbox its second argument. This actually happened in GHC's own +source code, in Packages.applyPackageFlag, which ended up un-boxing +the enormous DynFlags tuple, and being strict in the +as-yet-un-filled-in pkgState files. +-} + +---------------------- +-- mkWWstr_one wrap_arg = (useful, work_args, wrap_fn, work_fn) +-- * wrap_fn assumes wrap_arg is in scope, +-- brings into scope work_args (via cases) +-- * work_fn assumes work_args are in scope, and +-- brings into scope wrap_arg (via lets) +-- See Note [How to do the worker/wrapper split] +mkWWstr_one :: DynFlags -> FamInstEnvs + -> Bool -- True <=> INLINEABLE pragma on this function defn + -- See Note [Do not unpack class dictionaries] + -> Var + -> UniqSM (Bool, [Var], CoreExpr -> CoreExpr, CoreExpr -> CoreExpr) +mkWWstr_one dflags fam_envs has_inlineable_prag arg + | isTyVar arg + = return (False, [arg], nop_fn, nop_fn) + + | isAbsDmd dmd + , Just work_fn <- mk_absent_let dflags fam_envs arg + -- Absent case.
We can't always handle absence for arbitrary + -- unlifted types, so we need to choose just the cases we can + -- (that's what mk_absent_let does) + = return (True, [], nop_fn, work_fn) + + | isStrictDmd dmd + , Just cs <- splitProdDmd_maybe dmd + -- See Note [Unpacking arguments with product and polymorphic demands] + , not (has_inlineable_prag && isClassPred arg_ty) + -- See Note [Do not unpack class dictionaries] + , Just stuff@(_, _, inst_con_arg_tys, _) <- deepSplitProductType_maybe fam_envs arg_ty + , cs `equalLength` inst_con_arg_tys + -- See Note [mkWWstr and unsafeCoerce] + = unbox_one dflags fam_envs arg cs stuff + + | isSeqDmd dmd -- For seqDmd, splitProdDmd_maybe will return Nothing, but + -- it should behave like <S, U(AAAA)>, for some suitable arity + , Just stuff@(_, _, inst_con_arg_tys, _) <- deepSplitProductType_maybe fam_envs arg_ty + , let abs_dmds = map (const absDmd) inst_con_arg_tys + = unbox_one dflags fam_envs arg abs_dmds stuff + + | otherwise -- Other cases + = return (False, [arg], nop_fn, nop_fn) + + where + arg_ty = idType arg + dmd = idDemandInfo arg + +unbox_one :: DynFlags -> FamInstEnvs -> Var + -> [Demand] + -> (DataCon, [Type], [(Type, StrictnessMark)], Coercion) + -> UniqSM (Bool, [Var], CoreExpr -> CoreExpr, CoreExpr -> CoreExpr) +unbox_one dflags fam_envs arg cs + (data_con, inst_tys, inst_con_arg_tys, co) + = do { (uniq1:uniqs) <- getUniquesM + ; let -- See Note [Add demands for strict constructors] + cs' = addDataConStrictness data_con cs + unpk_args = zipWith3 mk_ww_arg uniqs inst_con_arg_tys cs' + unbox_fn = mkUnpackCase (Var arg) co uniq1 + data_con unpk_args + arg_no_unf = zapStableUnfolding arg + -- See Note [Zap unfolding when beta-reducing] + -- in GHC.Core.Op.Simplify; and see #13890 + rebox_fn = Let (NonRec arg_no_unf con_app) + con_app = mkConApp2 data_con inst_tys unpk_args `mkCast` mkSymCo co + ; (_, worker_args, wrap_fn, work_fn) <- mkWWstr dflags fam_envs False unpk_args + ; return (True, worker_args, unbox_fn . 
wrap_fn, work_fn . rebox_fn) } + -- Don't pass the arg, rebox instead + where + mk_ww_arg uniq ty sub_dmd = setIdDemandInfo (mk_ww_local uniq ty) sub_dmd + +---------------------- +nop_fn :: CoreExpr -> CoreExpr +nop_fn body = body + +addDataConStrictness :: DataCon -> [Demand] -> [Demand] +-- See Note [Add demands for strict constructors] +addDataConStrictness con ds + = ASSERT2( equalLength strs ds, ppr con $$ ppr strs $$ ppr ds ) + zipWith add ds strs + where + strs = dataConRepStrictness con + add dmd str | isMarkedStrict str = strictifyDmd dmd + | otherwise = dmd + +{- Note [How to do the worker/wrapper split] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +The worker-wrapper transformation, mkWWstr_one, takes into account +several possibilities to decide if the function is worthy for +splitting: + +1. If an argument is absent, it would be silly to pass it to + the worker. Hence the isAbsDmd case. This case must come + first because a demand like <S,A> or <B,A> is possible. + E.g. <B,A> comes from a function like + f x = error "urk" + and <S,A> can come from Note [Add demands for strict constructors] + +2. If the argument is evaluated strictly, and we can split the + product demand (splitProdDmd_maybe), then unbox it and w/w its + pieces. For example + + f :: (Int, Int) -> Int + f p = (case p of (a,b) -> a) + 1 + is split to + f :: (Int, Int) -> Int + f p = case p of (a,b) -> $wf a + + $wf :: Int -> Int + $wf a = a + 1 + + and + g :: Bool -> (Int, Int) -> Int + g c p = case p of (a,b) -> + if c then a else b + is split to + g c p = case p of (a,b) -> $gw c a b + $gw c a b = if c then a else b + +2a But do /not/ split if the components are not used; that is, the + usage is just 'Used' rather than 'UProd'. In this case + splitProdDmd_maybe returns Nothing. Otherwise we risk decomposing + a massive tuple which is barely used. 
Example: + + f :: ((Int,Int) -> String) -> (Int,Int) -> a + f g pr = error (g pr) + + main = print (f fst (1, error "no")) + + Here, f does not take 'pr' apart, and it's stupid to do so. + Imagine that it had millions of fields. This actually happened + in GHC itself, where the tuple was DynFlags. + +3. A plain 'seqDmd', which is head-strict with usage UHead, can't + be split by splitProdDmd_maybe. But we want it to behave just + like U(AAAA) for a suitable number of absent demands. So we have + a special case for it, with arity coming from the data constructor. + +Note [Worker-wrapper for bottoming functions] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We used not to split if the result is bottom. +[Justification: there's no efficiency to be gained.] + +But it's sometimes bad not to make a wrapper. Consider + fw = \x# -> let x = I# x# in case e of + p1 -> error_fn x + p2 -> error_fn x + p3 -> the real stuff +The re-boxing code won't go away unless error_fn gets a wrapper too. +[We don't do reboxing now, but in general it's better to pass an +unboxed thing to f, and have it reboxed in the error cases....] + +Note [Add demands for strict constructors] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Consider this program (due to Roman): + + data X a = X !a + + foo :: X Int -> Int -> Int + foo (X a) n = go 0 + where + go i | i < n = a + go (i+1) + | otherwise = 0 + +We want the worker for 'foo' to look like this: + + $wfoo :: Int# -> Int# -> Int# + +with the first argument unboxed, so that it is not eval'd each time +around the 'go' loop (which would otherwise happen, since 'foo' is not +strict in 'a'). It is sound for the wrapper to pass an unboxed arg +because X is strict, so its argument must be evaluated. And if we +*don't* pass an unboxed argument, we can't even repair it by adding a +`seq` thus: + + foo (X a) n = a `seq` go 0 + +because the seq is discarded (very early) since X is strict! + +So here's what we do: + +* We leave the demand-analysis alone.
The demand on 'a' in the + definition of 'foo' is <L, U(U)>; the strictness info is Lazy + because foo's body may or may not evaluate 'a'; but the usage info + says that 'a' is unpacked and its content is used. + +* During worker/wrapper, if we unpack a strict constructor (as we do + for 'foo'), we use 'addDataConStrictness' to bump up the strictness on + the strict arguments of the data constructor. + +* That in turn means that, if the usage info supports doing so + (i.e. splitProdDmd_maybe returns Just), we will unpack that argument + -- even though the original demand (e.g. on 'a') was lazy. + +* What does "bump up the strictness" mean? Just add a head-strict + demand to the strictness! Even for a demand like <L,A> we can + safely turn it into <S,A>; remember case (1) of + Note [How to do the worker/wrapper split]. + +The net effect is that the w/w transformation is more aggressive about +unpacking the strict arguments of a data constructor, when that +eagerness is supported by the usage info. + +There is the usual danger of reboxing, which as usual we ignore. But +if X is monomorphic, and has an UNPACK pragma, then this optimisation +is even more important. We don't want the wrapper to rebox an unboxed +argument, and pass an Int to $wfoo! + +This works in nested situations like + + data family Bar a + data instance Bar (a, b) = BarPair !(Bar a) !(Bar b) + newtype instance Bar Int = Bar Int + + foo :: Bar ((Int, Int), Int) -> Int -> Int + foo f k = case f of BarPair x y -> + case burble of + True -> case x of + BarPair p q -> ... + False -> ... + +The extra eagerness lets us produce a worker of type: + $wfoo :: Int# -> Int# -> Int# -> Int -> Int + $wfoo p# q# y# = ... + +even though the `case x` is only lazily evaluated. + +--------- Historical note ------------ +We used to add data-con strictness demands when demand analysing case +expression. However, it was noticed in #15696 that this misses some cases. 
For +instance, consider the program (from T10482) + + data family Bar a + data instance Bar (a, b) = BarPair !(Bar a) !(Bar b) + newtype instance Bar Int = Bar Int + + foo :: Bar ((Int, Int), Int) -> Int -> Int + foo f k = + case f of + BarPair x y -> case burble of + True -> case x of + BarPair p q -> ... + False -> ... + +We really should be able to assume that `p` is already evaluated since it came +from a strict field of BarPair. This strictness would allow us to produce a +worker of type: + + $wfoo :: Int# -> Int# -> Int# -> Int -> Int + $wfoo p# q# y# = ... + +even though the `case x` is only lazily evaluated. + +Indeed, before we fixed #15696 this would happen, since we would float the inner +`case x` through the `case burble` to get: + + foo f k = + case f of + BarPair x y -> case x of + BarPair p q -> case burble of + True -> ... + False -> ... + +However, after fixing #15696 this could no longer happen (for the reasons +discussed in ticket:15696#comment:76). This means that the demand placed on `f` +would then be significantly weaker (since the False branch of the case on +`burble` is not strict in `p` or `q`). + +Consequently, we now instead account for data-con strictness in mkWWstr_one, +applying the strictness demands to the final result of DmdAnal. The result is +that we get the strict demand signature we wanted even if we can't float +the case on `x` up through the case on `burble`. + + +Note [mkWWstr and unsafeCoerce] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +By using unsafeCoerce, it is possible to make the number of demands fail to +match the number of constructor arguments; this happened in #8037. +If so, the worker/wrapper split doesn't work right and we get a Core Lint +bug. The fix here is simply to decline to do w/w if that happens. + +Note [Record evaluated-ness in worker/wrapper] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Suppose we have + + data T = MkT !Int Int + + f :: T -> T + f x = e + +and f is strict, and has the CPR property.
Then we are going to generate +this w/w split + + f x = case x of + MkT x1 x2 -> case $wf x1 x2 of + (# r1, r2 #) -> MkT r1 r2 + + $wf x1 x2 = let x = MkT x1 x2 in + case e of + MkT r1 r2 -> (# r1, r2 #) + +Note that + +* In the worker $wf, inside 'e' we can be sure that x1 will be + evaluated (it came from unpacking the argument MkT). But that's not + immediately apparent in $wf. + +* In the wrapper 'f', which we'll inline at call sites, we can be sure + that 'r1' has been evaluated (because it came from unpacking the result + MkT). But that is not immediately apparent from the wrapper code. + +Missing these facts isn't unsound, but it loses possible future +opportunities for optimisation. + +Solution: use setCaseBndrEvald when creating + (A) The arg binders x1,x2 in mkWWstr_one + See #13077, test T13077 + (B) The result binders r1,r2 in mkWWcpr_help + See Trac #13077, test T13077a + And #13027 comment:20, item (4) +to record that the relevant binder is evaluated. + + +************************************************************************ +* * + Type scrutiny that is specific to demand analysis +* * +************************************************************************ + +Note [Do not unpack class dictionaries] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +If we have + f :: Ord a => [a] -> Int -> a + {-# INLINABLE f #-} +and we worker/wrapper f, we'll get a worker with an INLINABLE pragma +(see Note [Worker-wrapper for INLINABLE functions] in GHC.Core.Op.WorkWrap), which +can still be specialised by the type-class specialiser, something like + fw :: Ord a => [a] -> Int# -> a + +BUT if f is strict in the Ord dictionary, we might unpack it, to get + fw :: (a->a->Bool) -> [a] -> Int# -> a +and the type-class specialiser can't specialise that. An example is +#6056. + +But in any other situation a dictionary is just an ordinary value, +and can be unpacked. So we track the INLINABLE pragma, and switch +off the unpacking in mkWWstr_one (see the isClassPred test).
+ +Historical note: #14955 describes how I got this fix wrong +the first time. +-} + +deepSplitProductType_maybe + :: FamInstEnvs -> Type + -> Maybe (DataCon, [Type], [(Type, StrictnessMark)], Coercion) +-- If deepSplitProductType_maybe ty = Just (dc, tys, arg_tys, co) +-- then dc @ tys (args::arg_tys) :: rep_ty +-- co :: ty ~ rep_ty +-- Why do we return the strictness of the data-con arguments? +-- Answer: see Note [Record evaluated-ness in worker/wrapper] +deepSplitProductType_maybe fam_envs ty + | let (co, ty1) = topNormaliseType_maybe fam_envs ty + `orElse` (mkRepReflCo ty, ty) + , Just (tc, tc_args) <- splitTyConApp_maybe ty1 + , Just con <- isDataProductTyCon_maybe tc + , let arg_tys = dataConInstArgTys con tc_args + strict_marks = dataConRepStrictness con + = Just (con, tc_args, zipEqual "dspt" arg_tys strict_marks, co) +deepSplitProductType_maybe _ _ = Nothing + +deepSplitCprType_maybe + :: FamInstEnvs -> ConTag -> Type + -> Maybe (DataCon, [Type], [(Type, StrictnessMark)], Coercion) +-- If deepSplitCprType_maybe n ty = Just (dc, tys, arg_tys, co) +-- then dc @ tys (args::arg_tys) :: rep_ty +-- co :: ty ~ rep_ty +-- Why do we return the strictness of the data-con arguments? 
+-- Answer: see Note [Record evaluated-ness in worker/wrapper] +deepSplitCprType_maybe fam_envs con_tag ty + | let (co, ty1) = topNormaliseType_maybe fam_envs ty + `orElse` (mkRepReflCo ty, ty) + , Just (tc, tc_args) <- splitTyConApp_maybe ty1 + , isDataTyCon tc + , let cons = tyConDataCons tc + , cons `lengthAtLeast` con_tag -- This might not be true if we import the + -- type constructor via a .hs-boot file (#8743) + , let con = cons `getNth` (con_tag - fIRST_TAG) + arg_tys = dataConInstArgTys con tc_args + strict_marks = dataConRepStrictness con + = Just (con, tc_args, zipEqual "dsct" arg_tys strict_marks, co) +deepSplitCprType_maybe _ _ _ = Nothing + +findTypeShape :: FamInstEnvs -> Type -> TypeShape +-- Uncover the arrow and product shape of a type +-- The data type TypeShape is defined in Demand +-- See Note [Trimming a demand to a type] in Demand +findTypeShape fam_envs ty + | Just (tc, tc_args) <- splitTyConApp_maybe ty + , Just con <- isDataProductTyCon_maybe tc + = TsProd (map (findTypeShape fam_envs) $ dataConInstArgTys con tc_args) + + | Just (_, res) <- splitFunTy_maybe ty + = TsFun (findTypeShape fam_envs res) + + | Just (_, ty') <- splitForAllTy_maybe ty + = findTypeShape fam_envs ty' + + | Just (_, ty') <- topNormaliseType_maybe fam_envs ty + = findTypeShape fam_envs ty' + + | otherwise + = TsUnk + +{- +************************************************************************ +* * +\subsection{CPR stuff} +* * +************************************************************************ + + +@mkWWcpr@ takes the worker/wrapper pair produced from the strictness +info and adds in the CPR transformation. The worker returns an +unboxed tuple containing non-CPR components. The wrapper takes this +tuple and re-produces the correct structured output. + +The non-CPR results appear ordered in the unboxed tuple as if by a +left-to-right traversal of the result structure.
+-} + +mkWWcpr :: Bool + -> FamInstEnvs + -> Type -- function body type + -> CprResult -- CPR analysis results + -> UniqSM (Bool, -- Is w/w'ing useful? + CoreExpr -> CoreExpr, -- New wrapper + CoreExpr -> CoreExpr, -- New worker + Type) -- Type of worker's body + +mkWWcpr opt_CprAnal fam_envs body_ty cpr + -- CPR explicitly turned off (or in -O0) + | not opt_CprAnal = return (False, id, id, body_ty) + -- CPR is turned on by default for -O and -O2 + | otherwise + = case asConCpr cpr of + Nothing -> return (False, id, id, body_ty) -- No CPR info + Just con_tag | Just stuff <- deepSplitCprType_maybe fam_envs con_tag body_ty + -> mkWWcpr_help stuff + | otherwise + -- See Note [non-algebraic or open body type warning] + -> WARN( True, text "mkWWcpr: non-algebraic or open body type" <+> ppr body_ty ) + return (False, id, id, body_ty) + +mkWWcpr_help :: (DataCon, [Type], [(Type,StrictnessMark)], Coercion) + -> UniqSM (Bool, CoreExpr -> CoreExpr, CoreExpr -> CoreExpr, Type) + +mkWWcpr_help (data_con, inst_tys, arg_tys, co) + | [arg1@(arg_ty1, _)] <- arg_tys + , isUnliftedType arg_ty1 + -- Special case when there is a single result of unlifted type + -- + -- Wrapper: case (..call worker..) of x -> C x + -- Worker: case ( ..body.. ) of C x -> x + = do { (work_uniq : arg_uniq : _) <- getUniquesM + ; let arg = mk_ww_local arg_uniq arg1 + con_app = mkConApp2 data_con inst_tys [arg] `mkCast` mkSymCo co + + ; return ( True + , \ wkr_call -> mkDefaultCase wkr_call arg con_app + , \ body -> mkUnpackCase body co work_uniq data_con [arg] (varToCoreExpr arg) + -- varToCoreExpr important here: arg can be a coercion + -- Lacking this caused #10658 + , arg_ty1 ) } + + | otherwise -- The general case + -- Wrapper: case (..call worker..) of (# a, b #) -> C a b + -- Worker: case ( ...body...
) of C a b -> (# a, b #) + = do { (work_uniq : wild_uniq : uniqs) <- getUniquesM + ; let wrap_wild = mk_ww_local wild_uniq (ubx_tup_ty,MarkedStrict) + args = zipWith mk_ww_local uniqs arg_tys + ubx_tup_ty = exprType ubx_tup_app + ubx_tup_app = mkCoreUbxTup (map fst arg_tys) (map varToCoreExpr args) + con_app = mkConApp2 data_con inst_tys args `mkCast` mkSymCo co + tup_con = tupleDataCon Unboxed (length arg_tys) + + ; return (True + , \ wkr_call -> mkSingleAltCase wkr_call wrap_wild + (DataAlt tup_con) args con_app + , \ body -> mkUnpackCase body co work_uniq data_con args ubx_tup_app + , ubx_tup_ty ) } + +mkUnpackCase :: CoreExpr -> Coercion -> Unique -> DataCon -> [Id] -> CoreExpr -> CoreExpr +-- (mkUnpackCase e co uniq Con args body) +-- returns +-- case e |> co of bndr { Con args -> body } + +mkUnpackCase (Tick tickish e) co uniq con args body -- See Note [Profiling and unpacking] + = Tick tickish (mkUnpackCase e co uniq con args body) +mkUnpackCase scrut co uniq boxing_con unpk_args body + = mkSingleAltCase casted_scrut bndr + (DataAlt boxing_con) unpk_args body + where + casted_scrut = scrut `mkCast` co + bndr = mk_ww_local uniq (exprType casted_scrut, MarkedStrict) + +{- +Note [non-algebraic or open body type warning] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +There are a few cases where the W/W transformation is told that something +returns a constructor, but the type at hand doesn't really match this. One +real-world example involves unsafeCoerce: + foo :: IO a + foo = unsafeCoerce c_exit + foreign import ccall "c_exit" c_exit :: IO () +Here CPR will tell you that `foo` returns a () constructor for sure, but trying +to create a worker/wrapper for type `a` obviously fails. +(This was a real example until ee8e792 in libraries/base.) + +It does not seem feasible to avoid all such cases already in the analyser (and +after all, the analysis is not really wrong), so we simply do nothing here in +mkWWcpr. 
But we still want to emit a warning with -DDEBUG, to hopefully catch +other cases where something went avoidably wrong. + +This warning also triggers for the stream fusion library within `text`. +We can't easily W/W constructed results like `Stream` because we have no simple +way to express existential types in the worker's type signature. + +Note [Profiling and unpacking] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +If the original function looked like + f = \ x -> {-# SCC "foo" #-} E + +then we want the CPR'd worker to look like + \ x -> {-# SCC "foo" #-} (case E of I# x -> x) +and definitely not + \ x -> case ({-# SCC "foo" #-} E) of I# x -> x + +This transform doesn't move work or allocation +from one cost centre to another. + +Later [SDM]: presumably this is because we want the simplifier to +eliminate the case, and the scc would get in the way? I'm ok with +including the case itself in the cost centre, since it is morally +part of the function (post transformation) anyway. + + +************************************************************************ +* * +\subsection{Utilities} +* * +************************************************************************ + +Note [Absent errors] +~~~~~~~~~~~~~~~~~~~~ +We make a new binding for Ids that are marked absent, thus + let x = absentError "x :: Int" +The idea is that this binding will never be used; but if it +buggily is used we'll get a runtime error message. + +Coping with absence for *unlifted* types is important; see, for +example, #4306 and #15627. In the UnliftedRep case, we can +use LitRubbish, which we need to apply to the required type. +For the unlifted types of singleton kind like Float#, Addr#, etc. we +also find a suitable literal, using Literal.absentLiteralOf. We don't +have literals for every primitive type, so the function is partial. + +Note: I did try the experiment of using an error thunk for unlifted +things too, relying on the simplifier to drop it as dead code. 
+But this is fragile + + - It fails when profiling is on, which disables various optimisations + + - It fails when reboxing happens. E.g. + data T = MkT Int Int# + f p@(MkT a _) = ...g p.... + where g is /lazy/ in 'p', but only uses the first component. Then + 'f' is /strict/ in 'p', and only uses the first component. So we only + pass that component to the worker for 'f', which reconstructs 'p' to + pass it to 'g'. Alas we can't say + ...f (MkT a (absentError Int# "blah"))... + because `MkT` is strict in its Int# argument, so we get an absentError + exception when we shouldn't. Very annoying! + +So absentError is only used for lifted types. +-} + +-- | Tries to find a suitable dummy RHS to bind the given absent identifier to. +-- +-- If @mk_absent_let _ id == Just wrap@, then @wrap e@ will wrap a let binding +-- for @id@ with that RHS around @e@. Otherwise, no suitable RHS could be +-- found (currently this only happens for bindings of 'VecRep' representation). +mk_absent_let :: DynFlags -> FamInstEnvs -> Id -> Maybe (CoreExpr -> CoreExpr) +mk_absent_let dflags fam_envs arg + -- The lifted case: Bind 'absentError' + -- See Note [Absent errors] + | not (isUnliftedType arg_ty) + = Just (Let (NonRec lifted_arg abs_rhs)) + -- The 'UnliftedRep' (because polymorphic) case: Bind @__RUBBISH \@arg_ty@ + -- See Note [Absent errors] + | [UnliftedRep] <- typePrimRep arg_ty + = Just (Let (NonRec arg unlifted_rhs)) + -- The monomorphic unlifted cases: Bind to some literal, if possible + -- See Note [Absent errors] + | Just tc <- tyConAppTyCon_maybe nty + , Just lit <- absentLiteralOf tc + = Just (Let (NonRec arg (Lit lit `mkCast` mkSymCo co))) + | nty `eqType` voidPrimTy + = Just (Let (NonRec arg (Var voidPrimId `mkCast` mkSymCo co))) + | otherwise + = WARN( True, text "No absent value for" <+> ppr arg_ty ) + Nothing -- Can happen for 'State#' and things of 'VecRep' + where + lifted_arg = arg `setIdStrictness` botSig `setIdCprInfo` mkCprSig 0 botCpr + -- Note in strictness 
signature that this is bottoming + -- (for the sake of the "empty case scrutinee not known to + -- diverge for sure lint" warning) + arg_ty = idType arg + + -- Normalise the type to have best chance of finding an absent literal + -- e.g. (#17852) data unlifted N = MkN Int# + -- f :: N -> a -> a + -- f _ x = x + (co, nty) = topNormaliseType_maybe fam_envs arg_ty + `orElse` (mkRepReflCo arg_ty, arg_ty) + + abs_rhs = mkAbsentErrorApp arg_ty msg + msg = showSDoc (gopt_set dflags Opt_SuppressUniques) + (ppr arg <+> ppr (idType arg)) + -- We need to suppress uniques here because otherwise they'd + -- end up in the generated code as strings. This is bad for + -- determinism, because with different uniques the strings + -- will have different lengths and hence different costs for + -- the inliner leading to different inlining. + -- See also Note [Unique Determinism] in Unique + unlifted_rhs = mkTyApps (Lit rubbishLit) [arg_ty] + +mk_ww_local :: Unique -> (Type, StrictnessMark) -> Id +-- The StrictnessMark comes from the data constructor and says +-- whether this field is strict +-- See Note [Record evaluated-ness in worker/wrapper] +mk_ww_local uniq (ty,str) + = setCaseBndrEvald str $ + mkSysLocalOrCoVar (fsLit "ww") uniq ty diff --git a/compiler/GHC/Core/Op/simplifier.tib b/compiler/GHC/Core/Op/simplifier.tib new file mode 100644 index 0000000000..e0f9dc91f2 --- /dev/null +++ b/compiler/GHC/Core/Op/simplifier.tib @@ -0,0 +1,771 @@ +% Andre: +% +% - I'd like the transformation rules to appear clearly-identified in +% a box of some kind, so they can be distinguished from the examples. 
+% + + + +\documentstyle[slpj,11pt]{article} + +\renewcommand{\textfraction}{0.2} +\renewcommand{\floatpagefraction}{0.7} + +\begin{document} + +\title{How to simplify matters} + +\author{Simon Peyton Jones and Andre Santos\\ +Department of Computing Science, University of Glasgow, G12 8QQ \\ + @simonpj@@dcs.gla.ac.uk@ +} + +\maketitle + + +\section{Motivation} + +Quite a few compilers use the {\em compilation by transformation} idiom. +The idea is that as much as possible of the compilation process is +expressed as correctness-preserving transformations, each of which +transforms a program into a semantically-equivalent +program that (hopefully) executes more quickly or in less space. +Functional languages are particularly amenable to this approach because +they have a particularly rich family of possible transformations. +Examples of transformation-based compilers +include the Orbit compiler,[.kranz orbit thesis.] +Kelsey's compilers,[.kelsey thesis, hudak kelsey principles 1989.] +the New Jersey SML compiler,[.appel compiling with continuations.] +and the Glasgow Haskell compiler.[.ghc JFIT.] Of course many, perhaps most, +other compilers also use transformation to some degree. + +Compilation by transformation uses automatic transformations; that is, those +which can safely be applied automatically by a compiler. There +is also a whole approach to programming, which we might call {\em programming by transformation}, +in which the programmer manually transforms an inefficient specification into +an efficient program. This development process might be supported by +a programming environment which does the book-keeping, but the key steps +are guided by the programmer. We focus exclusively on automatic transformations +in this paper. 
+ +Automatic program transformations seem to fall into two broad categories: +\begin{itemize} +\item {\bf Glamorous transformations} are global, sophisticated, +intellectually satisfying transformations, sometimes guided by some +interesting kind of analysis. +Examples include: +lambda lifting,[.johnsson lambda lifting.] +full laziness,[.hughes thesis, lester spe.] +closure conversion,[.appel jim 1989.] +deforestation,[.wadler 1990 deforestation, marlow wadler deforestation Glasgow92, chin phd 1990 march, gill launchbury.] +transformations based on strictness analysis,[.peyton launchbury unboxed.] +and so on. It is easy to write papers about these sorts of transformations. + +\item {\bf Humble transformations} are small, simple, local transformations, +which individually look pretty trivial. Here are two simple examples\footnote{ +The notation @E[]@ stands for an arbitrary expression with zero or more holes. +The notation @E[e]@ denotes @E[]@ with the holes filled in by the expression @e@. +We implicitly assume that no name-capture happens --- it's just +a short-hand, not an algorithm. +}: +@ + let x = y in E[x] ===> E[y] + + case (x:xs) of ===> E1[x,xs] + (y:ys) -> E1[y,ys] + [] -> E2 +@ +Transformations of this kind are almost embarrassingly simple. How could +anyone write a paper about them? +\end{itemize} +This paper is about humble transformations, and how to implement them. +Although each individual +transformation is simple enough, there is a scaling issue: +there are a large number of candidate transformations to consider, and +there are a very large number of opportunities to apply them. + +In the Glasgow Haskell compiler, all humble transformations +are performed by the so-called {\em simplifier}. +Our goal in this paper is to give an overview of how the simplifier works, what +transformations it applies, and what issues arose in its design. + +\section{The language} + +Mutter mutter. Important points: +\begin{itemize} +\item Second order lambda calculus. 
+\item Arguments are variables. +\item Unboxed data types, and unboxed cases. +\end{itemize} +Less important points: +\begin{itemize} +\item Constructors and primitives are saturated. +\item if-then-else desugared to @case@ +\end{itemize} + +Give data type. + +\section{Transformations} + +This section lists all the transformations implemented by the simplifier. +Because it is a complete list, it is a long one. +We content ourselves with a brief statement of each transformation, +augmented with forward references to Section~\ref{sect:composing} +which gives examples of the ways in which the transformations can compose together. + +\subsection{Beta reduction} + +If a lambda abstraction is applied to an argument, we can simply +beta-reduce. This applies equally to ordinary lambda abstractions and +type abstractions: +@ + (\x -> E[x]) arg ===> E[arg] + (/\a -> E[a]) ty ===> E[ty] +@ +There is no danger of duplicating work because the argument is +guaranteed to be a simple variable or literal. + +\subsubsection{Floating applications inward} + +Applications can be floated inside a @let(rec)@ or @case@ expression. 
+This is a good idea, because they might find a lambda abstraction inside +to beta-reduce with: +@ + (let(rec) Bind in E) arg ===> let(rec) Bind in (E arg) + + (case E of {P1 -> E1;...; Pn -> En}) arg + ===> + case E of {P1 -> E1 arg; ...; Pn -> En arg} +@ + + + +\subsection{Transformations concerning @let(rec)@} + +\subsubsection{Floating @let@ out of @let@} + +It is sometimes useful to float a @let(rec)@ out of a @let(rec)@ right-hand +side: +@ + let x = let(rec) Bind in B1 ===> let(rec) Bind in + in B2 let x = B1 + in B2 + + + letrec x = let(rec) Bind in B1 ===> let(rec) Bind + in B2 x = B1 + in B2 +@ + +\subsubsection{Floating @case@ out of @let@} + + +\subsubsection{@let@ to @case@} + + +\subsection{Transformations concerning @case@} + +\subsubsection{Case of known constructor} + +If a @case@ expression scrutinises a constructor, +the @case@ can be eliminated. This transformation is a real +win: it eliminates a whole @case@ expression. +@ + case (C a1 .. an) of ===> E[a1..an] + ... + C b1 .. bn -> E[b1..bn] + ... +@ +If none of the constructors in the alternatives match, then +the default is taken: +@ + case (C a1 .. an) of ===> let y = C a1 .. an + ...[no alt matches C]... in E + y -> E +@ +There is an important variant of this transformation when +the @case@ expression scrutinises a {\em variable} +which is known to be bound to a constructor. +This situation can +arise for two reasons: +\begin{itemize} +\item An enclosing @let(rec)@ binding binds the variable to a constructor. +For example: +@ + let x = C p q in ... (case x of ...) ... +@ +\item An enclosing @case@ expression scrutinises the same variable. +For example: +@ + case x of + ... + C p q -> ... (case x of ...) ... + ... +@ +This situation is particularly common, as we discuss in Section~\ref{sect:repeated-evals}. +\end{itemize} +In each of these examples, @x@ is known to be bound to @C p q@ +at the inner @case@. The general rules are: +@ + case x of {...; C b1 .. 
bn -> E[b1..bn]; ...} +===> {x bound to C a1 .. an} + E[a1..an] + + case x of {...[no alts match C]...; y -> E[y]} +===> {x bound to C a1 .. an} + E[x] +@ + +\subsubsection{Dead alternative elimination} +@ + case x of + C a .. z -> E + ...[other alts]... +===> x *not* bound to C + case x of + ...[other alts]... +@ +We might know that @x@ is not bound to a particular constructor +because of an enclosing case: +@ + case x of + C a .. z -> E1 + other -> E2 +@ +Inside @E1@ we know that @x@ is bound to @C@. +However, if the type has more than two constructors, +inside @E2@ all we know is that @x@ is {\em not} bound to @C@. + +This applies to unboxed cases also, in the obvious way. + +\subsubsection{Case elimination} + +If we can prove that @x@ is not bottom, then this rule applies. +@ + case x of ===> E[x] + y -> E[y] +@ +We might know that @x@ is non-bottom because: +\begin{itemize} +\item @x@ has an unboxed type. +\item There's an enclosing case which scrutinises @x@. +\item It is bound to an expression which provably terminates. +\end{itemize} +Since this transformation can only improve termination, even if we apply it +when @x@ is not provably non-bottom, we provide a compiler flag to +enable it all the time. + +\subsubsection{Case of error} + +@ + case (error ty E) of Alts ===> error ty' E + where + ty' is type of whole case expression +@ + +Mutter about types. Mutter about variables bound to error. +Mutter about disguised forms of error. + +\subsubsection{Floating @let(rec)@ out of @case@} + +A @let(rec)@ binding can be floated out of a @case@ scrutinee: +@ + case (let(rec) Bind in E) of Alts ===> let(rec) Bind in + case E of Alts +@ +This increases the likelihood of a case-of-known-constructor transformation, +because @E@ is not hidden from the @case@ by the @let(rec)@. + +\subsubsection{Floating @case@ out of @case@} + +Analogous to floating a @let(rec)@ from a @case@ scrutinee is +floating a @case@ from a @case@ scrutinee. 
We have to be +careful, though, about code size. If there's only one alternative +in the inner case, things are easy: +@ + case (case E of {P -> R}) of ===> case E of {P -> case R of + Q1 -> S1 Q1 -> S1 + ... ... + Qm -> Sm Qm -> Sm} +@ +If there's more than one alternative there's a danger +that we'll duplicate @S1@...@Sm@, which might be a lot of code. +Our solution is to create a new local definition for each +alternative: +@ + case (case E of {P1 -> R1; ...; Pn -> Rn}) of + Q1 -> S1 + ... + Qm -> Sm +===> + let s1 = \x1 ... z1 -> S1 + ... + sm = \xm ... zm -> Sm + in + case E of + P1 -> case R1 of {Q1 -> s1 x1 ... z1; ...; Qm -> sm xm ... zm} + ... + Pn -> case Rn of {Q1 -> s1 x1 ... z1; ...; Qm -> sm xm ... zm} +@ +Here, @x1 ... z1@ are that subset of +variables bound by the pattern @Q1@ which are free in @S1@, and +similarly for the other @si@. + +Is this transformation a win? After all, we have introduced @m@ new +functions! Section~\ref{sect:join-points} discusses this point. + +\subsubsection{Case merging} + +@ + case x of + ...[some alts]... + other -> case x of + ...[more alts]... +===> + case x of + ...[some alts]... + ...[more alts]... +@ +Any alternatives in @[more alts]@ which are already covered by @[some alts]@ +should first be eliminated by the dead-alternative transformation. + + +\subsection{Constructor reuse} + + +\subsection{Inlining} + +The inlining transformation is simple enough: +@ + let x = R in B[x] ===> B[R] +@ +Inlining is more conventionally used to describe the instantiation of a function +body at its call site, with arguments substituted for formal parameters. We treat +this as a two-stage process: inlining followed by beta reduction. Since we are +working with a higher-order language, not all the arguments may be available at every +call site, so separating inlining from beta reduction allows us to concentrate on +one problem at a time. + +The choice of exactly {\em which} bindings to inline has a major impact on efficiency. 
+Specifically, we need to consider the following factors: +\begin{itemize} +\item +Inlining a function at its call site, followed by some beta reduction, +very often exposes opportunities for further transformations. +We inline many simple arithmetic and boolean operators for this reason. +\item +Inlining can increase code size. +\item +Inlining can duplicate work, for example if a redex is inlined at more than one site. +Duplicating a single expensive redex can ruin a program's efficiency. +\end{itemize} + + +Our inlining strategy depends on the form of @R@: + +Mutter mutter. + + +\subsubsection{Dead code removal} + +If a @let@-bound variable is not used the binding can be dropped: +@ + let x = E in B ===> B + x not free in B +@ +A similar transformation applies for @letrec@-bound variables. +Programmers seldom write dead code, of course, but bindings often become dead when they +are inlined. + + + + +\section{Composing transformations} +\label{sect:composing} + +The really interesting thing about humble transformations is the way in which +they compose together to carry out substantial and useful transformations. +This section gives a collection of motivating examples, all of which have +shown up in real application programs. + +\subsection{Repeated evals} +\label{sect:repeated-evals} + +Example: x+x, as in unboxed paper. + + +\subsection{Lazy pattern matching} + +Lazy pattern matching is pretty inefficient. Consider: +@ + let (x,y) = E in B +@ +which desugars to: +@ + let t = E + x = case t of (x,y) -> x + y = case t of (x,y) -> y + in B +@ +This code allocates three thunks! 
However, if @B@ is strict in {\em either} +@x@ {\em or} @y@, then the strictness analyser will easily spot that +the binding for @t@ is strict, so we can do a @let@-to-@case@ transformation: +@ + case E of + (x,y) -> let t = (x,y) in + let x = case t of (x,y) -> x + y = case t of (x,y) -> y + in B +@ +whereupon the case-of-known-constructor transformation +eliminates the @case@ expressions in the right-hand side of @x@ and @y@, +and @t@ is then spotted as being dead, so we get +@ + case E of + (x,y) -> B +@ + +\subsection{Join points} +\label{sect:join-points} + +One motivating example is this: +@ + if (not x) then E1 else E2 +@ +After desugaring the conditional, and inlining the definition of +@not@, we get +@ + case (case x of {True -> False; False -> True}) of + True -> E1 + False -> E2 +@ +Now, if we apply our case-of-case transformation we get: +@ + let e1 = E1 + e2 = E2 + in + case x of + True -> case False of {True -> e1; False -> e2} + False -> case True of {True -> e1; False -> e2} +@ +Now the case-of-known-constructor transformation applies: +@ + let e1 = E1 + e2 = E2 + in + case x of + True -> e2 + False -> e1 +@ +Since there is now only one occurrence of @e1@ and @e2@ we can +inline them, giving just what we hoped for: +@ + case x of {True -> E2; False -> E1} +@ +The point is that the local definitions will often disappear again. + +\subsubsection{How join points occur} + +But what if they don't disappear? Then the definitions @s1@ ... @sm@ +play the role of ``join points''; they represent the places where +execution joins up again, having forked at the @case x@. The +``calls'' to the @si@ should really be just jumps. To see this more clearly +consider the expression +@ + if (x || y) then E1 else E2 +@ +A C compiler will ``short-circuit'' the +evaluation of the condition if @x@ turns out to be true, +generating code something like this: +@ + if (x) goto l1; + if (y) goto l1; + ...code for E2... + goto l2; + l1: ...code for E1... + l2: ... +@ +In our setting, here's what will happen. 
First we desugar the +conditional, and inline the definition of @||@: +@ + case (case x of {True -> True; False -> y}) of + True -> E1 + False -> E2 +@ +Now apply the case-of-case transformation: +@ + let e1 = E1 + e2 = E2 + in + case x of + True -> case True of {True -> e1; False -> e2} + False -> case y of {True -> e1; False -> e2} +@ +Unlike the @not@ example, only one of the two inner cases +simplifies, and we can therefore only inline @e2@, because +@e1@ is still mentioned twice\footnote{Unless the +inlining strategy decides that @E1@ is small enough to duplicate; +it is used in separate @case@ branches so there's no concern about duplicating +work. Here's another example of the way in which we make one part of the +simplifier (the inlining strategy) help with the work of another (@case@-expression +simplification).}: +@ + let e1 = E1 + in + case x of + True -> e1 + False -> case y of {True -> e1; False -> E2} +@ +The code generator produces essentially the same code as +the C code given above. The binding for @e1@ turns into +just a label, which is jumped to from the two occurrences of @e1@. + +\subsubsection{Case of @error@} + +The case-of-error transformation is often exposed by the case-of-case +transformation. Consider +@ + case (hd xs) of + True -> E1 + False -> E2 +@ +After inlining @hd@, we get +@ + case (case xs of [] -> error "hd"; (x:_) -> x) of + True -> E1 + False -> E2 +@ +(I've omitted the type argument of @error@ to save clutter.) 
+Now doing case-of-case gives +@ + let e1 = E1 + e2 = E2 + in + case xs of + [] -> case (error "hd") of { True -> e1; False -> e2 } + (x:_) -> case x of { True -> e1; False -> e2 } +@ +Now the case-of-error transformation springs to life, after which +we can inline @e1@ and @e2@: +@ + case xs of + [] -> error "hd" + (x:_) -> case x of {True -> E1; False -> E2} +@ + +\subsection{Nested conditionals combined} + +Sometimes programmers write something which should be done +by a single @case@ as a sequence of tests: +@ + if x==0::Int then E0 else + if x==1 then E1 else + E2 +@ +After eliminating some redundant evals and doing the case-of-case +transformation we get +@ + case x of I# x# -> + case x# of + 0# -> E0 + other -> case x# of + 1# -> E1 + other -> E2 +@ +The case-merging transformation puts these together to get +@ + case x of I# x# -> + case x# of + 0# -> E0 + 1# -> E1 + other -> E2 +@ +Sometimes the sequence of tests cannot be eliminated from the source +code because of overloading: +@ + f :: Num a => a -> Bool + f 0 = True + f 3 = True + f n = False +@ +If we specialise @f@ to @Int@ we'll get the previous example again. + +\subsection{Error tests eliminated} + +The elimination of redundant alternatives, and then of redundant cases, +arises when we inline functions which do error checking. A typical +example is this: +@ + if (x `rem` y) == 0 then (x `div` y) else y +@ +Here, both @rem@ and @div@ do an error-check for @y@ being zero. +The second check is eliminated by the transformations. +After transformation the code becomes: +@ + case x of I# x# -> + case y of I# y# -> + case y# of + 0# -> error "rem: zero divisor" + _ -> case x# rem# y# of + 0# -> case x# div# y# of + r# -> I# r# + _ -> y +@ + +\subsection{Atomic arguments} + +At this point it is possible to appreciate the usefulness of +the Core-language syntax requirement that arguments are atomic. +For example, suppose that arguments could be arbitrary expressions. 
+Here is a possible transformation: +@ + f (case x of (p,q) -> p) +===> f strict in its argument + case x of (p,q) -> f p +@ +Doing this transformation would be useful, because now the +argument to @f@ is a simple variable rather than a thunk. +However, if arguments are atomic, this transformation becomes +just a special case of floating a @case@ out of a strict @let@: +@ + let a = case x of (p,q) -> p + in f a +===> (f a) strict in a + case x of (p,q) -> let a=p in f a +===> + case x of (p,q) -> f p +@ +There are many examples of this kind. For almost any transformation +involving @let@ there is a corresponding one involving a function +argument. The same effect is achieved with much less complexity +by restricting function arguments to be atomic. + +\section{Design} + +Dependency analysis +Occurrence analysis + +\subsection{Renaming and cloning} + +Every program-transformation system has to worry about name capture. +For example, here is an erroneous transformation: +@ + let y = E + in + (\x -> \y -> x + y) (y+3) +===> WRONG! + let y = E + in + (\y -> (y+3) + y) +@ +The transformation fails because the originally free occurrence +of @y@ in the argument @y+3@ has been ``captured'' by the @\y@-abstraction. +There are various sophisticated solutions to this difficulty, but +we adopted a very simple one: we uniquely rename every locally-bound identifier +on every pass of the simplifier. +Since we are in any case producing an entirely new program (rather than side-effecting +an existing one) it costs very little extra to rename the identifiers as we go. + +So our example would become +@ + let y = E + in + (\x -> \y -> x + y) (y+3) +===> + let y1 = E + in + (\y2 -> (y1+3) + y2) +@ +The simplifier accepts as input a program which has arbitrary bound +variable names, including ``shadowing'' (where a binding hides an +outer binding for the same identifier), but it produces a program in +which every bound identifier has a distinct name. 
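The rename-as-you-go scheme can be made concrete on a toy lambda calculus. The following is a hedged sketch only: the @Expr@ type, the underscore-and-number naming, and the threaded @Int@ supply are inventions for illustration, standing in for GHC's Core and unique supply.

```haskell
-- Toy sketch of "uniquely rename every locally-bound identifier":
-- an environment maps input names to their clones, and an Int supply
-- stands in for the compiler's unique supply.  Illustrative only.
module Main where

data Expr = Var String | Lam String Expr | App Expr Expr
  deriving (Eq, Show)

-- Free variables are left alone; every binder is cloned, so shadowing
-- in the input cannot capture anything in the output.
rename :: [(String, String)] -> Int -> Expr -> (Expr, Int)
rename env u (Var x)   = (Var (maybe x id (lookup x env)), u)
rename env u (Lam x b) = (Lam x' b', u')
  where
    x'       = x ++ "_" ++ show u              -- clone the binder
    (b', u') = rename ((x, x') : env) (u + 1) b
rename env u (App f a) = (App f' a', u2)
  where
    (f', u1) = rename env u f
    (a', u2) = rename env u1 a

main :: IO ()
main = print (fst (rename [] 1 (Lam "y" (App (Var "y") (Var "free")))))
```

Because the most recent clone is consed onto the front of the environment and @lookup@ returns the first match, an inner binder for the same name correctly shadows an outer one, yet the two end up with distinct output names.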
+ +Both the ``old'' and ``new'' identifiers have type @Id@, but when writing +type signatures for functions in the simplifier we use the types @InId@, for +identifiers from the input program, and @OutId@ for identifiers from the output program: +@ + type InId = Id + type OutId = Id +@ +This nomenclature extends naturally to expressions: a value of type @InExpr@ is an +expression whose identifiers are from the input-program name-space, and similarly +@OutExpr@. + + +\section{The simplifier} + +The basic algorithm followed by the simplifier is: +\begin{enumerate} +\item Analyse: perform occurrence analysis and dependency analysis. +\item Simplify: apply as many transformations as possible. +\item Iterate: perform the above two steps repeatedly until no further transformations are possible. +(A compiler flag allows the programmer to bound the maximum number of iterations.) +\end{enumerate} +We make an effort to apply as many transformations as possible in Step +2. To see why this is a good idea, just consider a sequence of +transformations in which each transformation enables the next. If +each iteration of Step 2 only performs one transformation, then the +entire program will have to be re-analysed by Step 1, and re-traversed by +Step 2, for each transformation of the sequence. Sometimes this is +unavoidable, but it is often possible to perform a sequence of +transformations in a single pass. + +The key function, which simplifies expressions, has the following type: +@ + simplExpr :: SimplEnv + -> InExpr -> [OutArg] + -> SmplM OutExpr +@ +The monad, @SmplM@, can quickly be disposed of. It has only two purposes: +\begin{itemize} +\item It plumbs around a supply of unique names, so that the simplifier can +easily invent new names. +\item It gathers together counts of how many of each kind of transformation +has been applied, for statistical purposes. These counts are also used +in Step 3 to decide when the simplification process has terminated. 
+\end{itemize} + +The signature can be understood like this: +\begin{itemize} +\item The environment, of type @SimplEnv@, provides information about +identifiers bound by the enclosing context. +\item The second and third arguments together specify the expression to be simplified. +\item The result is the simplified expression, wrapped up by the monad. +\end{itemize} +The simplifier's invariant is this: +$$ +@simplExpr@~env~expr~[a_1,\ldots,a_n] = expr[env]~a_1~\ldots~a_n +$$ +That is, the expression returned by $@simplExpr@~env~expr~[a_1,\ldots,a_n]$ +is semantically equal to (although hopefully more efficient than) +$expr$, with the renamings in $env$ applied to it, applied to the arguments +$a_1,\ldots,a_n$. + +\subsection{Application and beta reduction} + +The arguments are carried ``inwards'' by @simplExpr@, as an accumulating parameter. +This is a convenient way of implementing the transformations which float +arguments inside a @let@ and @case@. This list of pending arguments +requires a new data type, @CoreArg@, along with its ``in'' and ``out'' synonyms, +because an argument might be a type or an atom: +@ +data CoreArg bindee = TypeArg UniType + | ValArg (CoreAtom bindee) + +type InArg = CoreArg InId +type OutArg = CoreArg OutId +@ +The equations for applications simply apply +the environment to the argument (to handle renaming) and put the result +on the argument stack, tagged to say whether it is a type argument or value argument: +@ + simplExpr env (CoApp fun arg) args + = simplExpr env fun (ValArg (simplAtom env arg) : args) + simplExpr env (CoTyApp fun ty) args + = simplExpr env fun (TypeArg (simplTy env ty) : args) +@ + + + + + + +\end{document}