summaryrefslogtreecommitdiff
path: root/compiler/stranal
diff options
context:
space:
mode:
authorsimonpj@microsoft.com <unknown>2009-10-29 14:30:51 +0000
committersimonpj@microsoft.com <unknown>2009-10-29 14:30:51 +0000
commit72462499b891d5779c19f3bda03f96e24f9554ae (patch)
tree2647ed2599419e2a144e7bfc157b43142b2d3e63 /compiler/stranal
parentad23a496a860063ab01025051d9c9baf45725a61 (diff)
downloadhaskell-72462499b891d5779c19f3bda03f96e24f9554ae.tar.gz
The Big INLINE Patch: totally reorganise way that INLINE pragmas work
This patch has been a long time in gestation and has, as a result, accumulated some extra bits and bobs that are only loosely related. I separated the bits that are easy to split off, but the rest comes as one big patch, I'm afraid. Note that: * It comes together with a patch to the 'base' library * Interface file formats change slightly, so you need to recompile all libraries The patch is mainly giant tidy-up, driven in part by the particular stresses of the Data Parallel Haskell project. I don't expect a big performance win for random programs. Still, here are the nofib results, relative to the state of affairs without the patch Program Size Allocs Runtime Elapsed -------------------------------------------------------------------------------- Min -12.7% -14.5% -17.5% -17.8% Max +4.7% +10.9% +9.1% +8.4% Geometric Mean +0.9% -0.1% -5.6% -7.3% The +10.9% allocation outlier is rewrite, which happens to have a very delicate optimisation opportunity involving an interaction of CSE and inlining (see nofib/Simon-nofib-notes). The fact that the 'before' case found the optimisation is somewhat accidental. Runtimes seem to go down, but I never kno wwhether to really trust this number. Binary sizes wobble a bit, but nothing drastic. The Main Ideas are as follows. InlineRules ~~~~~~~~~~~ When you say {-# INLINE f #-} f x = <rhs> you intend that calls (f e) are replaced by <rhs>[e/x] So we should capture (\x.<rhs>) in the Unfolding of 'f', and never meddle with it. Meanwhile, we can optimise <rhs> to our heart's content, leaving the original unfolding intact in Unfolding of 'f'. So the representation of an Unfolding has changed quite a bit (see CoreSyn). An INLINE pragma gives rise to an InlineRule unfolding. Moreover, it's only used when 'f' is applied to the specified number of arguments; that is, the number of argument on the LHS of the '=' sign in the original source definition. For example, (.) is now defined in the libraries like this {-# INLINE (.) #-} (.) f g = \x -> f (g x) so that it'll inline when applied to two arguments. If 'x' appeared on the left, thus (.) f g x = f (g x) it'd only inline when applied to three arguments. This slightly-experimental change was requested by Roman, but it seems to make sense. Other associated changes * Moving the deck chairs in DsBinds, which processes the INLINE pragmas * In the old system an INLINE pragma made the RHS look like (Note InlineMe <rhs>) The Note switched off optimisation in <rhs>. But it was quite fragile in corner cases. The new system is more robust, I believe. In any case, the InlineMe note has disappeared * The workerInfo of an Id has also been combined into its Unfolding, so it's no longer a separate field of the IdInfo. * Many changes in CoreUnfold, esp in callSiteInline, which is the critical function that decides which function to inline. Lots of comments added! * exprIsConApp_maybe has moved to CoreUnfold, since it's so strongly associated with "does this expression unfold to a constructor application". It can now do some limited beta reduction too, which Roman found was an important. Instance declarations ~~~~~~~~~~~~~~~~~~~~~ It's always been tricky to get the dfuns generated from instance declarations to work out well. This is particularly important in the Data Parallel Haskell project, and I'm now on my fourth attempt, more or less. There is a detailed description in TcInstDcls, particularly in Note [How instance declarations are translated]. Roughly speaking we now generate a top-level helper function for every method definition in an instance declaration, so that the dfun takes a particularly stylised form: dfun a d1 d2 = MkD (op1 a d1 d2) (op2 a d1 d2) ...etc... In fact, it's *so* stylised that we never need to unfold a dfun. Instead ClassOps have a special rewrite rule that allows us to short-cut dictionary selection. Suppose dfun :: Ord a -> Ord [a] d :: Ord a Then compare (dfun a d) --> compare_list a d in one rewrite, without first inlining the 'compare' selector and the body of the dfun. To support this a) ClassOps have a BuiltInRule (see MkId.dictSelRule) b) DFuns have a special form of unfolding (CoreSyn.DFunUnfolding) which is exploited in CoreUnfold.exprIsConApp_maybe Implmenting all this required a root-and-branch rework of TcInstDcls and bits of TcClassDcl. Default methods ~~~~~~~~~~~~~~~ If you give an INLINE pragma to a default method, it should be just as if you'd written out that code in each instance declaration, including the INLINE pragma. I think that it now *is* so. As a result, library code can be simpler; less duplication. The CONLIKE pragma ~~~~~~~~~~~~~~~~~~ In the DPH project, Roman found cases where he had p n k = let x = replicate n k in ...(f x)...(g x).... {-# RULE f (replicate x) = f_rep x #-} Normally the RULE would not fire, because doing so involves (in effect) duplicating the redex (replicate n k). A new experimental modifier to the INLINE pragma, {-# INLINE CONLIKE replicate #-}, allows you to tell GHC to be prepared to duplicate a call of this function if it allows a RULE to fire. See Note [CONLIKE pragma] in BasicTypes Join points ~~~~~~~~~~~ See Note [Case binders and join points] in Simplify Other refactoring ~~~~~~~~~~~~~~~~~ * I moved endPass from CoreLint to CoreMonad, with associated jigglings * Better pretty-printing of Core * The top-level RULES (ones that are not rules for locally-defined things) are now substituted on every simplifier iteration. I'm not sure how we got away without doing this before. This entails a bit more plumbing in SimplCore. * The necessary stuff to serialise and deserialise the new info across interface files. * Something about bottoming floats in SetLevels Note [Bottoming floats] * substUnfolding has moved from SimplEnv to CoreSubs, where it belongs -------------------------------------------------------------------------------- Program Size Allocs Runtime Elapsed -------------------------------------------------------------------------------- anna +2.4% -0.5% 0.16 0.17 ansi +2.6% -0.1% 0.00 0.00 atom -3.8% -0.0% -1.0% -2.5% awards +3.0% +0.7% 0.00 0.00 banner +3.3% -0.0% 0.00 0.00 bernouilli +2.7% +0.0% -4.6% -6.9% boyer +2.6% +0.0% 0.06 0.07 boyer2 +4.4% +0.2% 0.01 0.01 bspt +3.2% +9.6% 0.02 0.02 cacheprof +1.4% -1.0% -12.2% -13.6% calendar +2.7% -1.7% 0.00 0.00 cichelli +3.7% -0.0% 0.13 0.14 circsim +3.3% +0.0% -2.3% -9.9% clausify +2.7% +0.0% 0.05 0.06 comp_lab_zift +2.6% -0.3% -7.2% -7.9% compress +3.3% +0.0% -8.5% -9.6% compress2 +3.6% +0.0% -15.1% -17.8% constraints +2.7% -0.6% -10.0% -10.7% cryptarithm1 +4.5% +0.0% -4.7% -5.7% cryptarithm2 +4.3% -14.5% 0.02 0.02 cse +4.4% -0.0% 0.00 0.00 eliza +2.8% -0.1% 0.00 0.00 event +2.6% -0.0% -4.9% -4.4% exp3_8 +2.8% +0.0% -4.5% -9.5% expert +2.7% +0.3% 0.00 0.00 fem -2.0% +0.6% 0.04 0.04 fft -6.0% +1.8% 0.05 0.06 fft2 -4.8% +2.7% 0.13 0.14 fibheaps +2.6% -0.6% 0.05 0.05 fish +4.1% +0.0% 0.03 0.04 fluid -2.1% -0.2% 0.01 0.01 fulsom -4.8% +9.2% +9.1% +8.4% gamteb -7.1% -1.3% 0.10 0.11 gcd +2.7% +0.0% 0.05 0.05 gen_regexps +3.9% -0.0% 0.00 0.00 genfft +2.7% -0.1% 0.05 0.06 gg -2.7% -0.1% 0.02 0.02 grep +3.2% -0.0% 0.00 0.00 hidden -0.5% +0.0% -11.9% -13.3% hpg -3.0% -1.8% +0.0% -2.4% ida +2.6% -1.2% 0.17 -9.0% infer +1.7% -0.8% 0.08 0.09 integer +2.5% -0.0% -2.6% -2.2% integrate -5.0% +0.0% -1.3% -2.9% knights +4.3% -1.5% 0.01 0.01 lcss +2.5% -0.1% -7.5% -9.4% life +4.2% +0.0% -3.1% -3.3% lift +2.4% -3.2% 0.00 0.00 listcompr +4.0% -1.6% 0.16 0.17 listcopy +4.0% -1.4% 0.17 0.18 maillist +4.1% +0.1% 0.09 0.14 mandel +2.9% +0.0% 0.11 0.12 mandel2 +4.7% +0.0% 0.01 0.01 minimax +3.8% -0.0% 0.00 0.00 mkhprog +3.2% -4.2% 0.00 0.00 multiplier +2.5% -0.4% +0.7% -1.3% nucleic2 -9.3% +0.0% 0.10 0.10 para +2.9% +0.1% -0.7% -1.2% paraffins -10.4% +0.0% 0.20 -1.9% parser +3.1% -0.0% 0.05 0.05 parstof +1.9% -0.0% 0.00 0.01 pic -2.8% -0.8% 0.01 0.02 power +2.1% +0.1% -8.5% -9.0% pretty -12.7% +0.1% 0.00 0.00 primes +2.8% +0.0% 0.11 0.11 primetest +2.5% -0.0% -2.1% -3.1% prolog +3.2% -7.2% 0.00 0.00 puzzle +4.1% +0.0% -3.5% -8.0% queens +2.8% +0.0% 0.03 0.03 reptile +2.2% -2.2% 0.02 0.02 rewrite +3.1% +10.9% 0.03 0.03 rfib -5.2% +0.2% 0.03 0.03 rsa +2.6% +0.0% 0.05 0.06 scc +4.6% +0.4% 0.00 0.00 sched +2.7% +0.1% 0.03 0.03 scs -2.6% -0.9% -9.6% -11.6% simple -4.0% +0.4% -14.6% -14.9% solid -5.6% -0.6% -9.3% -14.3% sorting +3.8% +0.0% 0.00 0.00 sphere -3.6% +8.5% 0.15 0.16 symalg -1.3% +0.2% 0.03 0.03 tak +2.7% +0.0% 0.02 0.02 transform +2.0% -2.9% -8.0% -8.8% treejoin +3.1% +0.0% -17.5% -17.8% typecheck +2.9% -0.3% -4.6% -6.6% veritas +3.9% -0.3% 0.00 0.00 wang -6.2% +0.0% 0.18 -9.8% wave4main -10.3% +2.6% -2.1% -2.3% wheel-sieve1 +2.7% -0.0% +0.3% -0.6% wheel-sieve2 +2.7% +0.0% -3.7% -7.5% x2n1 -4.1% +0.1% 0.03 0.04 -------------------------------------------------------------------------------- Min -12.7% -14.5% -17.5% -17.8% Max +4.7% +10.9% +9.1% +8.4% Geometric Mean +0.9% -0.1% -5.6% -7.3%
Diffstat (limited to 'compiler/stranal')
-rw-r--r--compiler/stranal/DmdAnal.lhs2
-rw-r--r--compiler/stranal/StrictAnal.lhs1
-rw-r--r--compiler/stranal/WorkWrap.lhs36
-rw-r--r--compiler/stranal/WwLib.lhs2
4 files changed, 26 insertions, 15 deletions
diff --git a/compiler/stranal/DmdAnal.lhs b/compiler/stranal/DmdAnal.lhs
index 6dc0fb7118..789e77aa5d 100644
--- a/compiler/stranal/DmdAnal.lhs
+++ b/compiler/stranal/DmdAnal.lhs
@@ -50,11 +50,11 @@ import UniqFM ( plusUFM_C, addToUFM_Directly, lookupUFM_Directly,
keysUFM, minusUFM, ufmToList, filterUFM )
import Type ( isUnLiftedType, coreEqType, splitTyConApp_maybe )
import Coercion ( coercionKind )
-import CoreLint ( showPass, endPass )
import Util ( mapAndUnzip, lengthIs )
import BasicTypes ( Arity, TopLevelFlag(..), isTopLevel, isNeverActive,
RecFlag(..), isRec )
import Maybes ( orElse, expectJust )
+import ErrUtils ( showPass )
import Outputable
import Data.List
diff --git a/compiler/stranal/StrictAnal.lhs b/compiler/stranal/StrictAnal.lhs
index a5efe30d39..920f8415ef 100644
--- a/compiler/stranal/StrictAnal.lhs
+++ b/compiler/stranal/StrictAnal.lhs
@@ -29,7 +29,6 @@ import Id ( setIdStrictness, setInlinePragma,
idDemandInfo, setIdDemandInfo, isBottomingId,
Id
)
-import CoreLint ( showPass, endPass )
import ErrUtils ( dumpIfSet_dyn )
import SaAbsInt
import SaLib
diff --git a/compiler/stranal/WorkWrap.lhs b/compiler/stranal/WorkWrap.lhs
index 7b124f303f..d23e83ece2 100644
--- a/compiler/stranal/WorkWrap.lhs
+++ b/compiler/stranal/WorkWrap.lhs
@@ -7,11 +7,14 @@
module WorkWrap ( wwTopBinds, mkWrapper ) where
import CoreSyn
-import CoreUnfold ( certainlyWillInline )
-import CoreUtils ( exprType, exprIsHNF, mkInlineMe )
+import CoreUnfold ( certainlyWillInline, mkInlineRule, mkWwInlineRule )
+import CoreUtils ( exprType, exprIsHNF )
import CoreArity ( exprArity )
import Var
-import Id
+import Id ( idType, isOneShotLambda, idUnfolding,
+ setIdNewStrictness, mkWorkerId,
+ setInlineActivation, setIdUnfolding,
+ setIdArity )
import Type ( Type )
import IdInfo
import NewDemand ( Demand(..), StrictSig(..), DmdType(..), DmdResult(..),
@@ -102,11 +105,9 @@ matching by looking for strict arguments of the correct type.
\begin{code}
wwExpr :: CoreExpr -> UniqSM CoreExpr
-wwExpr e@(Type {}) = return e
-wwExpr e@(Lit {}) = return e
-wwExpr e@(Var {}) = return e
-wwExpr e@(Note InlineMe _) = return e
- -- Don't w/w inside InlineMe's
+wwExpr e@(Type {}) = return e
+wwExpr e@(Lit {}) = return e
+wwExpr e@(Var {}) = return e
wwExpr (Lam binder expr)
= Lam binder <$> wwExpr expr
@@ -155,7 +156,10 @@ The only reason this is monadised is for the unique supply.
Note [Don't w/w inline things (a)]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It's very important to refrain from w/w-ing an INLINE function
-If we do so by mistake we transform
+because the wrapepr will then overwrite the InlineRule unfolding.
+
+It was wrong with the old InlineMe Note too: if we do so by mistake
+we transform
f = __inline (\x -> E)
into
f = __inline (\x -> case x of (a,b) -> fw E)
@@ -242,14 +246,22 @@ tryWW is_rec fn_id rhs
is_thunk = not is_fun && not (exprIsHNF rhs)
---------------------
-checkSize :: Id -> CoreExpr -> UniqSM [(Id,CoreExpr)] -> UniqSM [(Id,CoreExpr)]
+checkSize :: Id -> CoreExpr
+ -> UniqSM [(Id,CoreExpr)] -> UniqSM [(Id,CoreExpr)]
-- See Note [Don't w/w inline things (a) and (b)]
checkSize fn_id rhs thing_inside
- | certainlyWillInline unfolding = return [ (fn_id, mkInlineMe rhs) ]
+ | isStableUnfolding unfolding -- For DFuns and INLINE things, leave their
+ = return [ (fn_id, rhs) ] -- unfolding unchanged; but still attach
+ -- strictness info to the Id
+
+ | certainlyWillInline unfolding
+ = return [ (fn_id `setIdUnfolding` inline_rule, rhs) ]
-- Note [Don't w/w inline things (b)]
+
| otherwise = thing_inside
where
unfolding = idUnfolding fn_id
+ inline_rule = mkInlineRule InlUnSat rhs (unfoldingArity unfolding)
---------------------
splitFun :: Id -> IdInfo -> [Demand] -> DmdResult -> Activation -> Expr Var
@@ -279,7 +291,7 @@ splitFun fn_id fn_info wrap_dmds res_info inline_act rhs
-- arity is consistent with the demand type goes through
wrap_rhs = wrap_fn work_id
- wrap_id = fn_id `setIdWorkerInfo` HasWorker work_id arity
+ wrap_id = fn_id `setIdUnfolding` mkWwInlineRule work_id wrap_rhs arity
; return ([(work_id, work_rhs), (wrap_id, wrap_rhs)]) })
-- Worker first, because wrapper mentions it
diff --git a/compiler/stranal/WwLib.lhs b/compiler/stranal/WwLib.lhs
index bceb453487..2c3581c64c 100644
--- a/compiler/stranal/WwLib.lhs
+++ b/compiler/stranal/WwLib.lhs
@@ -134,7 +134,7 @@ mkWwBodies fun_ty demands res_info one_shots
; let (work_lam_args, work_call_args) = mkWorkerArgs work_args res_ty
; return ([idNewDemandInfo v | v <- work_call_args, isId v],
- Note InlineMe . wrap_fn_args . wrap_fn_cpr . wrap_fn_str . applyToVars work_call_args . Var,
+ wrap_fn_args . wrap_fn_cpr . wrap_fn_str . applyToVars work_call_args . Var,
mkLams work_lam_args. work_fn_str . work_fn_cpr . work_fn_args) }
-- We use an INLINE unconditionally, even if the wrapper turns out to be
-- something trivial like