summaryrefslogtreecommitdiff
path: root/compiler/stranal/DmdAnal.lhs
diff options
context:
space:
mode:
authorSimon Peyton Jones <simonpj@microsoft.com>2013-05-03 14:50:58 +0100
committerSimon Peyton Jones <simonpj@microsoft.com>2013-06-06 14:29:53 +0100
commit99d4e5b4a0bd32813ff8c74e91d2dcf6b3555176 (patch)
tree62098e1b36c61fe1a978a29d955f57b629c5ec79 /compiler/stranal/DmdAnal.lhs
parentda4ff650ae77930a5a10d4886c8bc7d37f081db7 (diff)
downloadhaskell-99d4e5b4a0bd32813ff8c74e91d2dcf6b3555176.tar.gz
Implement cardinality analysis
This major patch implements the cardinality analysis described in our paper "Higher order cardinality analysis". It is joint work with Ilya Sergey and Dimitrios Vytiniotis. The basic is augment the absence-analysis part of the demand analyser so that it can tell when something is used never at most once some other way The "at most once" information is used a) to enable transformations, and in particular to identify one-shot lambdas b) to allow updates on thunks to be omitted. There are two new flags, mainly there so you can do performance comparisons: -fkill-absence stops GHC doing absence analysis at all -fkill-one-shot stops GHC spotting one-shot lambdas and single-entry thunks The big changes are: * The Demand type is substantially refactored. In particular the UseDmd is factored as follows data UseDmd = UCall Count UseDmd | UProd [MaybeUsed] | UHead | Used data MaybeUsed = Abs | Use Count UseDmd data Count = One | Many Notice that UCall recurses straight to UseDmd, whereas UProd goes via MaybeUsed. The "Count" embodies the "at most once" or "many" idea. * The demand analyser itself was refactored a lot * The previously ad-hoc stuff in the occurrence analyser for foldr and build goes away entirely. Before if we had build (\cn -> ...x... ) then the "\cn" was hackily made one-shot (by spotting 'build' as special. That's essential to allow x to be inlined. Now the occurrence analyser propagates info gotten from 'build's stricness signature (so build isn't special); and that strictness sig is in turn derived entirely automatically. Much nicer! * The ticky stuff is improved to count single-entry thunks separately. One shortcoming is that there is no DEBUG way to spot if an allegedly-single-entry thunk is acually entered more than once. It would not be hard to generate a bit of code to check for this, and it would be reassuring. But it's fiddly and I have not done it. Despite all this fuss, the performance numbers are rather under-whelming. See the paper for more discussion. nucleic2 -0.8% -10.9% 0.10 0.10 +0.0% sphere -0.7% -1.5% 0.08 0.08 +0.0% -------------------------------------------------------------------------------- Min -4.7% -10.9% -9.3% -9.3% -50.0% Max -0.4% +0.5% +2.2% +2.3% +7.4% Geometric Mean -0.8% -0.2% -1.3% -1.3% -1.8% I don't quite know how much credence to place in the runtime changes, but movement seems generally in the right direction.
Diffstat (limited to 'compiler/stranal/DmdAnal.lhs')
-rw-r--r--compiler/stranal/DmdAnal.lhs586
1 files changed, 355 insertions, 231 deletions
diff --git a/compiler/stranal/DmdAnal.lhs b/compiler/stranal/DmdAnal.lhs
index 0eca72fa00..adda04156f 100644
--- a/compiler/stranal/DmdAnal.lhs
+++ b/compiler/stranal/DmdAnal.lhs
@@ -23,17 +23,16 @@ import VarEnv
import BasicTypes
import FastString
import Data.List
-import DataCon ( dataConTyCon, dataConRepStrictness, isMarkedStrict )
+import DataCon
import Id
-import CoreUtils ( exprIsHNF, exprIsTrivial )
+import CoreUtils ( exprIsHNF, exprType, exprIsTrivial )
import PprCore
-import UniqFM ( filterUFM )
import TyCon
import Pair
import Type ( eqType, tyConAppTyCon_maybe )
import Coercion ( coercionKind )
import Util
-import Maybes ( orElse )
+import Maybes ( isJust, orElse )
import TysWiredIn ( unboxedPairDataCon )
import TysPrim ( realWorldStatePrimTy )
\end{code}
@@ -45,7 +44,6 @@ import TysPrim ( realWorldStatePrimTy )
%************************************************************************
\begin{code}
-
dmdAnalProgram :: DynFlags -> CoreProgram -> IO CoreProgram
dmdAnalProgram dflags binds
= do {
@@ -54,29 +52,27 @@ dmdAnalProgram dflags binds
}
where
do_prog :: CoreProgram -> CoreProgram
- do_prog binds = snd $ mapAccumL (dmdAnalTopBind dflags) emptySigEnv binds
+ do_prog binds = snd $ mapAccumL dmdAnalTopBind (emptyAnalEnv dflags) binds
-- Analyse a (group of) top-level binding(s)
-dmdAnalTopBind :: DynFlags
- -> SigEnv
+dmdAnalTopBind :: AnalEnv
-> CoreBind
- -> (SigEnv, CoreBind)
-dmdAnalTopBind dflags sigs (NonRec id rhs)
- = (sigs2, NonRec id2 rhs2)
+ -> (AnalEnv, CoreBind)
+dmdAnalTopBind sigs (NonRec id rhs)
+ = (extendAnalEnv TopLevel sigs id sig, NonRec id2 rhs2)
where
- ( _, _, (_, rhs1)) = dmdAnalRhs dflags TopLevel NonRecursive (virgin sigs) (id, rhs)
- (sigs2, _, (id2, rhs2)) = dmdAnalRhs dflags TopLevel NonRecursive (nonVirgin sigs) (id, rhs1)
+ ( _, _, _, rhs1) = dmdAnalRhs TopLevel Nothing sigs id rhs
+ (sig, _, id2, rhs2) = dmdAnalRhs TopLevel Nothing (nonVirgin sigs) id rhs1
-- Do two passes to improve CPR information
-- See comments with ignore_cpr_info in mk_sig_ty
-- and with extendSigsWithLam
-dmdAnalTopBind dflags sigs (Rec pairs)
+dmdAnalTopBind sigs (Rec pairs)
= (sigs', Rec pairs')
where
- (sigs', _, pairs') = dmdFix dflags TopLevel (virgin sigs) pairs
+ (sigs', _, pairs') = dmdFix TopLevel sigs pairs
-- We get two iterations automatically
-- c.f. the NonRec case above
-
\end{code}
%************************************************************************
@@ -101,61 +97,45 @@ c) The application rule wouldn't be right either
Evaluating (f x) in a L demand does *not* cause
evaluation of f in a C(L) demand!
-Note [Always analyse in virgin pass]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Tricky point: make sure that we analyse in the 'virgin' pass. Consider
- rec { f acc x True = f (...rec { g y = ...g... }...)
- f acc x False = acc }
-In the virgin pass for 'f' we'll give 'f' a very strict (bottom) type.
-That might mean that we analyse the sub-expression containing the
-E = "...rec g..." stuff in a bottom demand. Suppose we *didn't analyse*
-E, but just retuned botType.
-
-Then in the *next* (non-virgin) iteration for 'f', we might analyse E
-in a weaker demand, and that will trigger doing a fixpoint iteration
-for g. But *because it's not the virgin pass* we won't start g's
-iteration at bottom. Disaster. (This happened in $sfibToList' of
-nofib/spectral/fibheaps.)
-
-So in the virgin pass we make sure that we do analyse the expression
-at least once, to initialise its signatures.
-
\begin{code}
-evalDmdAnal :: DynFlags -> AnalEnv -> CoreExpr -> (DmdType, CoreExpr)
--- See Note [Ensure demand is strict]
-evalDmdAnal dflags env e
- | (res_ty, e') <- dmdAnal dflags env evalDmd e
- = (deferType res_ty, e')
-
-simpleDmdAnal :: DynFlags -> AnalEnv -> DmdType -> CoreExpr -> (DmdType, CoreExpr)
-simpleDmdAnal dflags env res_ty e
- | ae_virgin env -- See Note [Always analyse in virgin pass]
- , (_discarded_res_ty, e') <- dmdAnal dflags env evalDmd e
- = (res_ty, e')
- | otherwise
- = (res_ty, e)
-
-dmdAnal :: DynFlags -> AnalEnv -> Demand -> CoreExpr -> (DmdType, CoreExpr)
-dmdAnal dflags env dmd e
- | isBotDmd dmd = simpleDmdAnal dflags env botDmdType e
- | isAbsDmd dmd = simpleDmdAnal dflags env topDmdType e
- | not (isStrictDmd dmd) = evalDmdAnal dflags env e
-
-dmdAnal _ _ _ (Lit lit) = (topDmdType, Lit lit)
-dmdAnal _ _ _ (Type ty) = (topDmdType, Type ty) -- Doesn't happen, in fact
-dmdAnal _ _ _ (Coercion co) = (topDmdType, Coercion co)
-
-dmdAnal _ env dmd (Var var)
+dmdAnalThunk :: AnalEnv
+ -> Demand -- This one takes a *Demand*
+ -> CoreExpr -> (DmdType, CoreExpr)
+dmdAnalThunk env dmd e
+ | exprIsTrivial e = dmdAnalStar env dmd e
+ | otherwise = dmdAnalStar env (oneifyDmd dmd) e
+
+-- Do not process absent demands
+-- Otherwise act like in a normal demand analysis
+-- See |-* relation in the companion paper
+dmdAnalStar :: AnalEnv
+ -> Demand -- This one takes a *Demand*
+ -> CoreExpr -> (DmdType, CoreExpr)
+dmdAnalStar env dmd e = toCleanDmd (dmdAnal env) dmd e
+
+-- Main Demand Analsysis machinery
+dmdAnal :: AnalEnv
+ -> CleanDemand -- The main one takes a *CleanDemand*
+ -> CoreExpr -> (DmdType, CoreExpr)
+
+-- The CleanDemand is always strict and not absent
+-- See Note [Ensure demand is strict]
+
+dmdAnal _ _ (Lit lit) = (topDmdType, Lit lit)
+dmdAnal _ _ (Type ty) = (topDmdType, Type ty) -- Doesn't happen, in fact
+dmdAnal _ _ (Coercion co) = (topDmdType, Coercion co)
+
+dmdAnal env dmd (Var var)
= (dmdTransform env var dmd, Var var)
-dmdAnal dflags env dmd (Cast e co)
+dmdAnal env dmd (Cast e co)
= (dmd_ty, Cast e' co)
where
- (dmd_ty, e') = dmdAnal dflags env dmd' e
+ (dmd_ty, e') = dmdAnal env dmd' e
to_co = pSnd (coercionKind co)
dmd'
| Just tc <- tyConAppTyCon_maybe to_co
- , isRecursiveTyCon tc = evalDmd
+ , isRecursiveTyCon tc = cleanEvalDmd
| otherwise = dmd
-- This coerce usually arises from a recursive
-- newtype, and we don't want to look inside them
@@ -163,28 +143,29 @@ dmdAnal dflags env dmd (Cast e co)
-- inside recursive products -- we might not reach
-- a fixpoint. So revert to a vanilla Eval demand
-dmdAnal dflags env dmd (Tick t e)
+dmdAnal env dmd (Tick t e)
= (dmd_ty, Tick t e')
where
- (dmd_ty, e') = dmdAnal dflags env dmd e
+ (dmd_ty, e') = dmdAnal env dmd e
-dmdAnal dflags env dmd (App fun (Type ty))
+dmdAnal env dmd (App fun (Type ty))
= (fun_ty, App fun' (Type ty))
where
- (fun_ty, fun') = dmdAnal dflags env dmd fun
+ (fun_ty, fun') = dmdAnal env dmd fun
-dmdAnal dflags sigs dmd (App fun (Coercion co))
+dmdAnal sigs dmd (App fun (Coercion co))
= (fun_ty, App fun' (Coercion co))
where
- (fun_ty, fun') = dmdAnal dflags sigs dmd fun
+ (fun_ty, fun') = dmdAnal sigs dmd fun
-- Lots of the other code is there to make this
-- beautiful, compositional, application rule :-)
-dmdAnal dflags env dmd (App fun arg) -- Non-type arguments
- = let -- [Type arg handled above]
- (fun_ty, fun') = dmdAnal dflags env (mkCallDmd dmd) fun
- (arg_ty, arg') = dmdAnal dflags env arg_dmd arg
+dmdAnal env dmd (App fun arg) -- Non-type arguments
+ = let -- [Type arg handled above]
+ call_dmd = mkCallDmd dmd
+ (fun_ty, fun') = dmdAnal env call_dmd fun
(arg_dmd, res_ty) = splitDmdTy fun_ty
+ (arg_ty, arg') = dmdAnalThunk env arg_dmd arg
in
-- pprTrace "dmdAnal:app" (vcat
-- [ text "dmd =" <+> ppr dmd
@@ -196,37 +177,34 @@ dmdAnal dflags env dmd (App fun arg) -- Non-type arguments
-- , text "overall res dmd_ty =" <+> ppr (res_ty `bothDmdType` arg_ty) ])
(res_ty `bothDmdType` arg_ty, App fun' arg')
-dmdAnal dflags env dmd (Lam var body)
+dmdAnal env dmd (Lam var body)
| isTyVar var
- = let
- (body_ty, body') = dmdAnal dflags env dmd body
+ = let
+ (body_ty, body') = dmdAnal env dmd body
in
(body_ty, Lam var body')
- | Just body_dmd <- peelCallDmd dmd -- A call demand: good!
- = let
- env' = extendSigsWithLam env var
- (body_ty, body') = dmdAnal dflags env' body_dmd body
- (lam_ty, var') = annotateLamIdBndr dflags env body_ty var
- in
- (lam_ty, Lam var' body')
+ | otherwise
+ = let (body_dmd, defer_me, one_shot) = peelCallDmd dmd
+ -- body_dmd - a demand to analyze the body
+ -- one_shot - one-shotness of the lambda
+ -- hence, cardinality of its free vars
- | otherwise -- Not enough demand on the lambda; but do the body
- = let -- anyway to annotate it and gather free var info
- (body_ty, body') = dmdAnal dflags env evalDmd body
- (lam_ty, var') = annotateLamIdBndr dflags env body_ty var
+ env' = extendSigsWithLam env var
+ (body_ty, body') = dmdAnal env' body_dmd body
+ (lam_ty, var') = annotateLamIdBndr env body_ty one_shot var
in
- (deferType lam_ty, Lam var' body')
+ (deferAndUse defer_me one_shot lam_ty, Lam var' body')
-dmdAnal dflags env dmd (Case scrut case_bndr ty [alt@(DataAlt dc, _, _)])
+dmdAnal env dmd (Case scrut case_bndr ty [alt@(DataAlt dc, _, _)])
-- Only one alternative with a product constructor
| let tycon = dataConTyCon dc
, isProductTyCon tycon
, not (isRecursiveTyCon tycon)
= let
env_alt = extendAnalEnv NotTopLevel env case_bndr case_bndr_sig
- (alt_ty, alt') = dmdAnalAlt dflags env_alt dmd alt
- (alt_ty1, case_bndr') = annotateBndr alt_ty case_bndr
+ (alt_ty, alt') = dmdAnalAlt env_alt dmd alt
+ (alt_ty1, case_bndr') = annotateBndr env alt_ty case_bndr
(_, bndrs', _) = alt'
case_bndr_sig = cprProdSig
-- Inside the alternative, the case binder has the CPR property.
@@ -258,12 +236,11 @@ dmdAnal dflags env dmd (Case scrut case_bndr ty [alt@(DataAlt dc, _, _)])
-- The above code would compute a Keep for x, since y is not Abs, which is silly
-- The insight is, of course, that a demand on y is a demand on the
-- scrutinee, so we need to `both` it with the scrut demand
+
+ scrut_dmd1 = mkProdDmd [idDemandInfo b | b <- bndrs', isId b]
+ scrut_dmd2 = strictenDmd (idDemandInfo case_bndr')
- alt_dmd = mkProdDmd [idDemandInfo b | b <- bndrs', isId b]
- scrut_dmd = alt_dmd `bothDmd`
- idDemandInfo case_bndr'
-
- (scrut_ty, scrut') = dmdAnal dflags env scrut_dmd scrut
+ (scrut_ty, scrut') = dmdAnal env (scrut_dmd1 `bothCleanDmd` scrut_dmd2) scrut
res_ty = alt_ty1 `bothDmdType` scrut_ty
in
-- pprTrace "dmdAnal:Case1" (vcat [ text "scrut" <+> ppr scrut
@@ -276,12 +253,12 @@ dmdAnal dflags env dmd (Case scrut case_bndr ty [alt@(DataAlt dc, _, _)])
-- , text "res_ty" <+> ppr res_ty ]) $
(res_ty, Case scrut' case_bndr' ty [alt'])
-dmdAnal dflags env dmd (Case scrut case_bndr ty alts)
+dmdAnal env dmd (Case scrut case_bndr ty alts)
= let -- Case expression with multiple alternatives
- (alt_tys, alts') = mapAndUnzip (dmdAnalAlt dflags env dmd) alts
- (scrut_ty, scrut') = dmdAnal dflags env evalDmd scrut
- (alt_ty, case_bndr') = annotateBndr (foldr lubDmdType botDmdType alt_tys) case_bndr
- res_ty = alt_ty `bothDmdType` scrut_ty
+ (alt_tys, alts') = mapAndUnzip (dmdAnalAlt env dmd) alts
+ (scrut_ty, scrut') = dmdAnal env cleanEvalDmd scrut
+ (alt_ty, case_bndr') = annotateBndr env(foldr lubDmdType botDmdType alt_tys) case_bndr
+ res_ty = alt_ty `bothDmdType` scrut_ty
in
-- pprTrace "dmdAnal:Case2" (vcat [ text "scrut" <+> ppr scrut
-- , text "scrut_ty" <+> ppr scrut_ty
@@ -289,13 +266,18 @@ dmdAnal dflags env dmd (Case scrut case_bndr ty alts)
-- , text "res_ty" <+> ppr res_ty ]) $
(res_ty, Case scrut' case_bndr' ty alts')
-dmdAnal dflags env dmd (Let (NonRec id rhs) body)
- = let
- (sigs', lazy_fv, (id1, rhs')) = dmdAnalRhs dflags NotTopLevel NonRecursive env (id, rhs)
- (body_ty, body') = dmdAnal dflags (updSigEnv env sigs') dmd body
- (body_ty1, id2) = annotateBndr body_ty id1
- body_ty2 = addLazyFVs body_ty1 lazy_fv
- in
+dmdAnal env dmd (Let (NonRec id rhs) body)
+ = (body_ty2, Let (NonRec id2 annotated_rhs) body')
+ where
+ (sig, lazy_fv, id1, rhs') = dmdAnalRhs NotTopLevel Nothing env id rhs
+ (body_ty, body') = dmdAnal (extendAnalEnv NotTopLevel env id sig) dmd body
+ (body_ty1, id2) = annotateBndr env body_ty id1
+ body_ty2 = addLazyFVs body_ty1 lazy_fv
+
+ -- Annotate top-level lambdas at RHS basing on the aggregated demand info
+ -- See Note [Annotating lambdas at right-hand side]
+ annotated_rhs = annLamWithShotness (idDemandInfo id2) rhs'
+
-- If the actual demand is better than the vanilla call
-- demand, you might think that we might do better to re-analyse
-- the RHS with the stronger demand.
@@ -308,32 +290,42 @@ dmdAnal dflags env dmd (Let (NonRec id rhs) body)
-- In practice, all the times the actual demand on id2 is more than
-- the vanilla call demand seem to be due to (b). So we don't
-- bother to re-analyse the RHS.
- (body_ty2, Let (NonRec id2 rhs') body')
-dmdAnal dflags env dmd (Let (Rec pairs) body)
+dmdAnal env dmd (Let (Rec pairs) body)
= let
- bndrs = map fst pairs
- (sigs', lazy_fv, pairs') = dmdFix dflags NotTopLevel env pairs
- (body_ty, body') = dmdAnal dflags (updSigEnv env sigs') dmd body
- body_ty1 = addLazyFVs body_ty lazy_fv
- in
- sigs' `seq` body_ty `seq`
- let
- (body_ty2, _) = annotateBndrs body_ty1 bndrs
- -- Don't bother to add demand info to recursive
- -- binders as annotateBndr does;
- -- being recursive, we can't treat them strictly.
- -- But we do need to remove the binders from the result demand env
+ (env', lazy_fv, pairs') = dmdFix NotTopLevel env pairs
+ (body_ty, body') = dmdAnal env' dmd body
+ body_ty1 = deleteFVs body_ty (map fst pairs)
+ body_ty2 = addLazyFVs body_ty1 lazy_fv
in
+ body_ty2 `seq`
(body_ty2, Let (Rec pairs') body')
+annLamWithShotness :: Demand -> CoreExpr -> CoreExpr
+annLamWithShotness d e
+ | Just u <- cleanUseDmd_maybe d
+ = go u e
+ | otherwise = e
+ where
+ go u e
+ | Just (c, u') <- peelUseCall u
+ , Lam bndr body <- e
+ = if isTyVar bndr
+ then Lam bndr (go u body)
+ else Lam (setOneShotness c bndr) (go u' body)
+ | otherwise
+ = e
+
+setOneShotness :: Count -> Id -> Id
+setOneShotness One bndr = setOneShotLambda bndr
+setOneShotness Many bndr = bndr
-dmdAnalAlt :: DynFlags -> AnalEnv -> Demand -> Alt Var -> (DmdType, Alt Var)
-dmdAnalAlt dflags env dmd (con,bndrs,rhs)
+dmdAnalAlt :: AnalEnv -> CleanDemand -> Alt Var -> (DmdType, Alt Var)
+dmdAnalAlt env dmd (con,bndrs,rhs)
= let
- (rhs_ty, rhs') = dmdAnal dflags env dmd rhs
+ (rhs_ty, rhs') = dmdAnal env dmd rhs
rhs_ty' = addDataConPatDmds con bndrs rhs_ty
- (alt_ty, bndrs') = annotateBndrs rhs_ty' bndrs
+ (alt_ty, bndrs') = annotateBndrs env rhs_ty' bndrs
final_alt_ty | io_hack_reqd = alt_ty `lubDmdType` topDmdType
| otherwise = alt_ty
@@ -358,7 +350,66 @@ dmdAnalAlt dflags env dmd (con,bndrs,rhs)
idType (head bndrs) `eqType` realWorldStatePrimTy
in
(final_alt_ty, (con, bndrs', rhs'))
+\end{code}
+
+Note [Aggregated demand for cardinality]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We use different strategies for strictness and usage/cardinality to
+"unleash" demands captured on free variables by bindings. Let us
+consider the example:
+
+f1 y = let {-# NOINLINE h #-}
+ h = y
+ in (h, h)
+
+We are interested in obtaining cardinality demand U1 on |y|, as it is
+used only in a thunk, and, therefore, is not going to be updated any
+more. Therefore, the demand on |y|, captured and unleashed by usage of
+|h| is U1. However, if we unleash this demand every time |h| is used,
+and then sum up the effects, the ultimate demand on |y| will be U1 +
+U1 = U. In order to avoid it, we *first* collect the aggregate demand
+on |h| in the body of let-expression, and only then apply the demand
+transformer:
+
+transf[x](U) = {y |-> U1}
+
+so the resulting demand on |y| is U1.
+
+The situation is, however, different for strictness, where this
+aggregating approach exhibits worse results because of the nature of
+|both| operation for strictness. Consider the example:
+
+f y c =
+ let h x = y |seq| x
+ in case of
+ True -> h True
+ False -> y
+
+It is clear that |f| is strict in |y|, however, the suggested analysis
+will infer from the body of |let| that |h| is used lazily (as it is
+used in one branch only), therefore lazy demand will be put on its
+free variable |y|. Conversely, if the demand on |h| is unleashed right
+on the spot, we will get the desired result, namely, that |f| is
+strict in |y|.
+
+Note [Annotating lambdas at right-hand side]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Let us take a look at the following example:
+
+g f = let x = 100
+ h = \y -> f x y
+ in h 5
+
+One can see that |h| is called just once, therefore the RHS of h can
+be annotated as a one-shot lambda. This is done by the function
+annLamWithShotness *a posteriori*, i.e., basing on the aggregated
+usage demand on |h| from the body of |let|-expression, which is C1(U)
+in this case.
+
+In other words, for locally-bound lambdas we can infer
+one-shotness.
+\begin{code}
addDataConPatDmds :: AltCon -> [Var] -> DmdType -> DmdType
-- See Note [Add demands for strict constructors]
addDataConPatDmds DEFAULT _ dmd_ty = dmd_ty
@@ -366,14 +417,45 @@ addDataConPatDmds (LitAlt _) _ dmd_ty = dmd_ty
addDataConPatDmds (DataAlt con) bndrs dmd_ty
= foldr add dmd_ty str_bndrs
where
- add bndr dmd_ty = addVarDmd dmd_ty bndr absDmd
+ add bndr dmd_ty = addVarDmd dmd_ty bndr seqDmd
str_bndrs = [ b | (b,s) <- zipEqual "addDataConPatBndrs"
(filter isId bndrs)
(dataConRepStrictness con)
, isMarkedStrict s ]
-
\end{code}
+Note [Add demands for strict constructors]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider this program (due to Roman):
+
+ data X a = X !a
+
+ foo :: X Int -> Int -> Int
+ foo (X a) n = go 0
+ where
+ go i | i < n = a + go (i+1)
+ | otherwise = 0
+
+We want the worker for 'foo' too look like this:
+
+ $wfoo :: Int# -> Int# -> Int#
+
+with the first argument unboxed, so that it is not eval'd each time
+around the loop (which would otherwise happen, since 'foo' is not
+strict in 'a'. It is sound for the wrapper to pass an unboxed arg
+because X is strict, so its argument must be evaluated. And if we
+*don't* pass an unboxed argument, we can't even repair it by adding a
+`seq` thus:
+
+ foo (X a) n = a `seq` go 0
+
+because the seq is discarded (very early) since X is strict!
+
+There is the usual danger of reboxing, which as usual we ignore. But
+if X is monomorphic, and has an UNPACK pragma, then this optimisation
+is even more important. We don't want the wrapper to rebox an unboxed
+argument, and pass an Int to $wfoo!
+
%************************************************************************
%* *
Demand transformer
@@ -383,7 +465,7 @@ addDataConPatDmds (DataAlt con) bndrs dmd_ty
\begin{code}
dmdTransform :: AnalEnv -- The strictness environment
-> Id -- The function
- -> Demand -- The demand on the function
+ -> CleanDemand -- The demand on the function
-> DmdType -- The demand type of the function in this context
-- Returned DmdEnv includes the demand on
-- this function plus demand on its free variables
@@ -394,16 +476,18 @@ dmdTransform env var dmd
(idArity var) (idStrictness var) dmd
| isGlobalId var -- Imported function
- = dmdTransformSig (idStrictness var) dmd
+ = let res = dmdTransformSig (idStrictness var) dmd in
+ -- pprTrace "dmdTransform" (vcat [ppr var, ppr dmd, ppr res])
+ res
| Just (sig, top_lvl) <- lookupSigEnv env var -- Local letrec bound thing
, let fn_ty = dmdTransformSig sig dmd
= if isTopLevel top_lvl
- then fn_ty -- Don't record top level things
- else addVarDmd fn_ty var dmd
+ then fn_ty -- Don't record top level things
+ else addVarDmd fn_ty var (mkOnceUsedDmd dmd)
| otherwise -- Local non-letrec-bound thing
- = unitVarDmd var dmd
+ = unitVarDmd var (mkOnceUsedDmd dmd)
\end{code}
%************************************************************************
@@ -415,30 +499,31 @@ dmdTransform env var dmd
\begin{code}
-- Recursive bindings
-dmdFix :: DynFlags
- -> TopLevelFlag
+dmdFix :: TopLevelFlag
-> AnalEnv -- Does not include bindings for this binding
-> [(Id,CoreExpr)]
- -> (SigEnv, DmdEnv,
+ -> (AnalEnv, DmdEnv,
[(Id,CoreExpr)]) -- Binders annotated with stricness info
-dmdFix dflags top_lvl env orig_pairs
- = loop 1 initial_env orig_pairs
+dmdFix top_lvl env orig_pairs
+ = (updSigEnv env (sigEnv final_env), lazy_fv, pairs')
+ -- Return to original virgin state, keeping new signatures
where
bndrs = map fst orig_pairs
initial_env = addInitialSigs top_lvl env bndrs
+ (final_env, lazy_fv, pairs') = loop 1 initial_env orig_pairs
loop :: Int
-> AnalEnv -- Already contains the current sigs
-> [(Id,CoreExpr)]
- -> (SigEnv, DmdEnv, [(Id,CoreExpr)])
+ -> (AnalEnv, DmdEnv, [(Id,CoreExpr)])
loop n env pairs
= -- pprTrace "dmd loop" (ppr n <+> ppr bndrs $$ ppr env) $
loop' n env pairs
loop' n env pairs
| found_fixpoint
- = (sigs', lazy_fv, pairs')
+ = (env', lazy_fv, pairs')
-- Note: return pairs', not pairs. pairs' is the result of
-- processing the RHSs with sigs (= sigs'), whereas pairs
-- is the result of processing the RHSs with the *previous*
@@ -446,11 +531,12 @@ dmdFix dflags top_lvl env orig_pairs
| n >= 10
= pprTrace "dmdFix loop" (ppr n <+> (vcat
- [ text "Sigs:" <+> ppr [ (id,lookupVarEnv sigs id, lookupVarEnv sigs' id)
+ [ text "Sigs:" <+> ppr [ (id,lookupVarEnv (sigEnv env) id,
+ lookupVarEnv (sigEnv env') id)
| (id,_) <- pairs],
text "env:" <+> ppr env,
text "binds:" <+> pprCoreBinding (Rec pairs)]))
- (sigEnv env, lazy_fv, orig_pairs) -- Safe output
+ (env, lazy_fv, orig_pairs) -- Safe output
-- The lazy_fv part is really important! orig_pairs has no strictness
-- info, including nothing about free vars. But if we have
-- letrec f = ....y..... in ...f...
@@ -458,21 +544,21 @@ dmdFix dflags top_lvl env orig_pairs
-- otherwise y will get recorded as absent altogether
| otherwise
- = loop (n+1) (nonVirgin sigs') pairs'
+ = loop (n+1) (nonVirgin env') pairs'
where
- sigs = sigEnv env
- found_fixpoint = all (same_sig sigs sigs') bndrs
+ found_fixpoint = all (same_sig (sigEnv env) (sigEnv env')) bndrs
- ((sigs',lazy_fv), pairs') = mapAccumL my_downRhs (sigs, emptyDmdEnv) pairs
+ ((env',lazy_fv), pairs') = mapAccumL my_downRhs (env, emptyDmdEnv) pairs
-- mapAccumL: Use the new signature to do the next pair
-- The occurrence analyser has arranged them in a good order
-- so this can significantly reduce the number of iterations needed
- my_downRhs (sigs,lazy_fv) (id,rhs)
- = ((sigs', lazy_fv'), pair')
+ my_downRhs (env, lazy_fv) (id,rhs)
+ = ((env', lazy_fv'), (id', rhs'))
where
- (sigs', lazy_fv1, pair') = dmdAnalRhs dflags top_lvl Recursive (updSigEnv env sigs) (id,rhs)
- lazy_fv' = plusVarEnv_C bothDmd lazy_fv lazy_fv1
+ (sig, lazy_fv1, id', rhs') = dmdAnalRhs top_lvl (Just bndrs) env id rhs
+ lazy_fv' = plusVarEnv_C bothDmd lazy_fv lazy_fv1
+ env' = extendAnalEnv top_lvl env id sig
same_sig sigs sigs' var = lookup sigs var == lookup sigs' var
lookup sigs var = case lookupVarEnv sigs var of
@@ -480,26 +566,79 @@ dmdFix dflags top_lvl env orig_pairs
Nothing -> pprPanic "dmdFix" (ppr var)
-- Non-recursive bindings
-dmdAnalRhs :: DynFlags -> TopLevelFlag -> RecFlag
- -> AnalEnv -> (Id, CoreExpr)
- -> (SigEnv, DmdEnv, (Id, CoreExpr))
+dmdAnalRhs :: TopLevelFlag
+ -> Maybe [Id] -- Just bs <=> recursive, Nothing <=> non-recursive
+ -> AnalEnv -> Id -> CoreExpr
+ -> (StrictSig, DmdEnv, Id, CoreExpr)
-- Process the RHS of the binding, add the strictness signature
-- to the Id, and augment the environment with the signature as well.
-dmdAnalRhs dflags top_lvl rec_flag env (id, rhs)
- = (sigs', lazy_fv, (id', rhs'))
- where
- arity = idArity id -- The idArity should be up to date
- -- The simplifier was run just beforehand
- (rhs_dmd_ty, rhs') = dmdAnal dflags env (vanillaCall arity) rhs
- (lazy_fv, sig_ty) = WARN( arity /= dmdTypeDepth rhs_dmd_ty && not (exprIsTrivial rhs), ppr id )
- -- The RHS can be eta-reduced to just a variable,
- -- in which case we should not complain.
- mkSigTy top_lvl rec_flag env id rhs rhs_dmd_ty
- id' = id `setIdStrictness` sig_ty
- sigs' = extendSigEnv top_lvl (sigEnv env) id sig_ty
+dmdAnalRhs top_lvl rec_flag env id rhs
+ = (sig_ty, lazy_fv, id', mkLams bndrs' body')
+ where
+ (bndrs, body) = collectBinders rhs
+ env_body = foldl extendSigsWithLam env bndrs
+ (body_dmd_ty, body') = dmdAnal env_body body_dmd body
+ (rhs_dmd_ty, bndrs') = annotateLamBndrs env body_dmd_ty bndrs
+ id' = set_idStrictness env id sig_ty
+ sig_ty = mkStrictSig (mkDmdType sig_fv rhs_dmds rhs_res')
+ -- See Note [NOINLINE and strictness]
+
+ -- See Note [Product demands for function body]
+ body_dmd = case deepSplitProductType_maybe (exprType body) of
+ Nothing -> cleanEvalDmd
+ Just (dc, _, _, _) -> cleanEvalProdDmd (dataConRepArity dc)
+
+ DmdType rhs_fv rhs_dmds rhs_res = rhs_dmd_ty
+
+ -- See Note [Lazy and unleashable free variables]
+ -- See Note [Aggregated demand for cardinality]
+ rhs_fv1 = case rec_flag of
+ Just bs -> useEnv (delVarEnvList rhs_fv bs)
+ Nothing -> rhs_fv
+
+ (lazy_fv, sig_fv) = splitFVs is_thunk rhs_fv1
+
+ rhs_res' | returnsCPR rhs_res
+ , discard_cpr_info = topRes
+ | otherwise = rhs_res
+
+ discard_cpr_info = nested_sum || (is_thunk && not_strict)
+ nested_sum -- See Note [CPR for sum types ]
+ = not (isTopLevel top_lvl || returnsCPRProd rhs_res)
+ -- See Note [CPR for thunks]
+ is_thunk = not (exprIsHNF rhs)
+ not_strict
+ = isTopLevel top_lvl -- Top level and recursive things don't
+ || isJust rec_flag -- get their demandInfo set at all
+ || not (isStrictDmd (idDemandInfo id) || ae_virgin env)
+ -- See Note [Optimistic CPR in the "virgin" case]
\end{code}
+Note [Product demands for function body]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+This example comes from shootout/binary_trees:
+
+ Main.check' = \ b z ds. case z of z' { I# ip ->
+ case ds_d13s of
+ Main.Nil -> z'
+ Main.Node s14k s14l s14m ->
+ Main.check' (not b)
+ (Main.check' b
+ (case b {
+ False -> I# (-# s14h s14k);
+ True -> I# (+# s14h s14k)
+ })
+ s14l)
+ s14m } } }
+
+Here we *really* want to unbox z, even though it appears to be used boxed in
+the Nil case. Partly the Nil case is not a hot path. But more specifically,
+the whole function gets the CPR property if we do.
+
+So for the demand on the body of a RHS we use a product demand if it's
+a product type.
+
%************************************************************************
%* *
\subsection{Strictness signatures and types}
@@ -516,19 +655,17 @@ addVarDmd (DmdType fv ds res) var dmd
= DmdType (extendVarEnv_C bothDmd fv var dmd) ds res
addLazyFVs :: DmdType -> DmdEnv -> DmdType
-addLazyFVs (DmdType fv ds res) lazy_fvs
- = DmdType both_fv1 ds res
- where
- both_fv = plusVarEnv_C bothDmd fv lazy_fvs
- both_fv1 = modifyEnv (isBotRes res) (`bothDmd` botDmd) lazy_fvs fv both_fv
- -- This modifyEnv is vital. Consider
+addLazyFVs dmd_ty lazy_fvs
+ = dmd_ty `bothDmdType` mkDmdType lazy_fvs [] topRes
+ -- Using bothDmdType (rather than just both'ing the envs)
+ -- is vital. Consider
-- let f = \x -> (x,y)
-- in error (f 3)
-- Here, y is treated as a lazy-fv of f, but we must `bothDmd` that L
-- demand with the bottom coming up from 'error'
--
-- I got a loop in the fixpointer without this, due to an interaction
- -- with the lazy_fv filtering in mkSigTy. Roughly, it was
+ -- with the lazy_fv filtering in dmdAnalRhs. Roughly, it was
-- letrec f n x
-- = letrec g y = x `fatbar`
-- letrec h z = z + ...g...
@@ -543,13 +680,9 @@ addLazyFVs (DmdType fv ds res) lazy_fvs
-- which floats out of the defn for h. Without the modifyEnv, that
-- L demand doesn't get both'd with the Bot coming up from the inner
-- call to f. So we just get an L demand for x for g.
- --
- -- A better way to say this is that the lazy-fv filtering should give the
- -- same answer as putting the lazy fv demands in the function's type.
-
-removeFV :: DmdEnv -> Var -> DmdResult -> (DmdEnv, Demand)
-removeFV fv id res = -- pprTrace "rfv" (ppr id <+> ppr dmd $$ ppr fv)
+peelFV :: DmdEnv -> Var -> DmdResult -> (DmdEnv, Demand)
+peelFV fv id res = -- pprTrace "rfv" (ppr id <+> ppr dmd $$ ppr fv)
(fv', dmd)
where
fv' = fv `delVarEnv` id
@@ -570,73 +703,55 @@ possible to safely ignore non-mentioned variables (their joint demand
is <L,A>).
\begin{code}
-annotateBndr :: DmdType -> Var -> (DmdType, Var)
+annotateBndr :: AnalEnv -> DmdType -> Var -> (DmdType, Var)
-- The returned env has the var deleted
-- The returned var is annotated with demand info
-- according to the result demand of the provided demand type
-- No effect on the argument demands
-annotateBndr dmd_ty@(DmdType fv ds res) var
+annotateBndr env dmd_ty@(DmdType fv ds res) var
| isTyVar var = (dmd_ty, var)
- | otherwise = (DmdType fv' ds res, setIdDemandInfo var dmd)
+ | otherwise = (DmdType fv' ds res, set_idDemandInfo env var dmd)
where
- (fv', dmd) = removeFV fv var res
+ (fv', dmd) = peelFV fv var res
-annotateBndrs :: DmdType -> [Var] -> (DmdType, [Var])
-annotateBndrs = mapAccumR annotateBndr
+annotateBndrs :: AnalEnv -> DmdType -> [Var] -> (DmdType, [Var])
+annotateBndrs env = mapAccumR (annotateBndr env)
-annotateLamIdBndr :: DynFlags
- -> AnalEnv
+annotateLamBndrs :: AnalEnv -> DmdType -> [Var] -> (DmdType, [Var])
+annotateLamBndrs env ty bndrs = mapAccumR annotate ty bndrs
+ where
+ annotate dmd_ty bndr
+ | isId bndr = annotateLamIdBndr env dmd_ty Many bndr
+ | otherwise = (dmd_ty, bndr)
+
+annotateLamIdBndr :: AnalEnv
-> DmdType -- Demand type of body
+ -> Count -- One-shot-ness of the lambda
-> Id -- Lambda binder
-> (DmdType, -- Demand type of lambda
Id) -- and binder annotated with demand
-annotateLamIdBndr dflags env (DmdType fv ds res) id
+annotateLamIdBndr env (DmdType fv ds res) one_shot id
-- For lambdas we add the demand to the argument demands
-- Only called for Ids
= ASSERT( isId id )
- (final_ty, setIdDemandInfo id dmd)
+ -- pprTrace "annLamBndr" (vcat [ppr id, ppr dmd_ty]) $
+ (final_ty, setOneShotness one_shot (set_idDemandInfo env id dmd))
where
-- Watch out! See note [Lambda-bound unfoldings]
final_ty = case maybeUnfoldingTemplate (idUnfolding id) of
Nothing -> main_ty
Just unf -> main_ty `bothDmdType` unf_ty
where
- (unf_ty, _) = dmdAnal dflags env dmd unf
+ (unf_ty, _) = dmdAnalStar env dmd unf
main_ty = DmdType fv' (dmd:ds) res
- (fv', dmd) = removeFV fv id res
+ (fv', dmd) = peelFV fv id res
-mkSigTy :: TopLevelFlag -> RecFlag -> AnalEnv -> Id ->
- CoreExpr -> DmdType -> (DmdEnv, StrictSig)
-mkSigTy top_lvl rec_flag env id rhs (DmdType fv dmds res)
- = (lazy_fv, mkStrictSig dmd_ty)
- -- See Note [NOINLINE and strictness]
- where
- dmd_ty = mkDmdType strict_fv dmds res'
-
- -- See Note [Lazy and strict free variables]
- lazy_fv = filterUFM (not . isStrictDmd) fv
- strict_fv = filterUFM isStrictDmd fv
-
- ignore_cpr_info = not (exprIsHNF rhs || thunk_cpr_ok)
- res' | returnsCPR res
- , not (isTopLevel top_lvl || returnsCPRProd res)
- -- See Note [CPR for sum types ]
- || ignore_cpr_info = topRes
- | otherwise = res
-
- -- Is it okay or not to assign CPR
- -- (not okay in the first pass)
- thunk_cpr_ok -- See Note [CPR for thunks]
- | isTopLevel top_lvl = False -- Top level things don't get
- -- their demandInfo set at all
- | isRec rec_flag = False -- Ditto recursive things
- | ae_virgin env = True -- Optimistic, first time round
- -- See Note [Optimistic CPR in the "virgin" case]
- | isStrictDmd (idDemandInfo id) = True
- | otherwise = False
+deleteFVs :: DmdType -> [Var] -> DmdType
+deleteFVs (DmdType fvs dmds res) bndrs
+ = DmdType (delVarEnvList fvs bndrs) dmds res
\end{code}
Note [CPR for sum types]
@@ -728,7 +843,6 @@ NB: strictly_demanded is never true of a top-level Id, or of a recursive Id.
Note [Optimistic CPR in the "virgin" case]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
Demand and strictness info are initialized by top elements. However,
this prevents from inferring a CPR property in the first pass of the
analyser, so we keep an explicit flag ae_virgin in the AnalEnv
@@ -789,10 +903,9 @@ strictness. For example, if you have a function implemented by an
error stub, but which has RULES, you may want it not to be eliminated
in favour of error!
-Note [Lazy and strict free variables]
+Note [Lazy and unleasheable free variables]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-We put the strict FVs in the DmdType of the Id, so
+We put the strict and once-used FVs in the DmdType of the Id, so
that at its call sites we unleash demands on its strict fvs.
An example is 'roll' in imaginary/wheel-sieve2
Something like this:
@@ -842,7 +955,8 @@ forget that fact, otherwise we might make 'x' absent when it isn't.
\begin{code}
data AnalEnv
- = AE { ae_sigs :: SigEnv
+ = AE { ae_dflags :: DynFlags
+ , ae_sigs :: SigEnv
, ae_virgin :: Bool } -- True on first iteration only
-- See Note [Initialising strictness]
-- We use the se_env to tell us whether to
@@ -860,6 +974,9 @@ instance Outputable AnalEnv where
[ ptext (sLit "ae_virgin =") <+> ppr virgin
, ptext (sLit "ae_sigs =") <+> ppr env ])
+emptyAnalEnv :: DynFlags -> AnalEnv
+emptyAnalEnv dflags = AE { ae_dflags = dflags, ae_sigs = emptySigEnv, ae_virgin = True }
+
emptySigEnv :: SigEnv
emptySigEnv = emptyVarEnv
@@ -888,22 +1005,29 @@ addInitialSigs top_lvl env@(AE { ae_sigs = sigs, ae_virgin = virgin }) ids
init_sig | virgin = \_ -> botSig
| otherwise = idStrictness
-virgin, nonVirgin :: SigEnv -> AnalEnv
-virgin sigs = AE { ae_sigs = sigs, ae_virgin = True }
-nonVirgin sigs = AE { ae_sigs = sigs, ae_virgin = False }
+nonVirgin :: AnalEnv -> AnalEnv
+nonVirgin env = env { ae_virgin = False }
extendSigsWithLam :: AnalEnv -> Id -> AnalEnv
-- Extend the AnalEnv when we meet a lambda binder
extendSigsWithLam env id
- | isStrictDmd dmd_info || ae_virgin env
+ | isId id
+ , isStrictDmd (idDemandInfo id) || ae_virgin env
-- See Note [Optimistic CPR in the "virgin" case]
-- See Note [Initial CPR for strict binders]
, Just {} <- deepSplitProductType_maybe $ idType id
= extendAnalEnv NotTopLevel env id cprProdSig
- | otherwise = env
- where
- dmd_info = idDemandInfo id
+ | otherwise
+ = env
+
+set_idDemandInfo :: AnalEnv -> Id -> Demand -> Id
+set_idDemandInfo env id dmd
+ = setIdDemandInfo id (zapDemand (ae_dflags env) dmd)
+
+set_idStrictness :: AnalEnv -> Id -> StrictSig -> Id
+set_idStrictness env id sig
+ = setIdStrictness id (zapStrictSig (ae_dflags env) sig)
\end{code}
Note [Initial CPR for strict binders]