Eta reduction based on evaluation context (#21261)

I completely rewrote our Notes surrounding eta-reduction. The new entry point is `Note [Eta reduction makes sense]`. Then I went on to extend the Simplifier to maintain an evaluation context in the form of a `SubDemand` inside a `SimplCont`. That `SubDemand` is useful for doing eta reduction according to `Note [Eta reduction based on evaluation context]`, which describes how Demand analysis, Simplifier and `tryEtaReduce` interact to facilitate eta reduction in more scenarios. Thus we fix #21261. ghc/alloc perf marginally improves (-0.0%). A medium-sized win is when compiling T3064 (-3%). It seems that haddock improves by 0.6% to 1.0%, too. Metric Decrease: T3064
author: Sebastian Graf <sebastian.graf@kit.edu> 2022-02-18 10:57:14 +0100
committer: Marge Bot <ben+marge-bot@smart-cactus.org> 2022-04-12 11:11:42 -0400
commit: 0090ad7b8b436961fe1e225aae214d0ea1381c07 (patch)
tree: d56a67bbaa3c9cbd54de2f0d1f21f6d2b01d0e3b
parent: 5440f63ec4a584b8805a8ff49ba1bd26bc2c032d (diff)
download: haskell-0090ad7b8b436961fe1e225aae214d0ea1381c07.tar.gz
11 files changed, 478 insertions, 135 deletions
diff --git a/compiler/GHC/Core/Opt/Simplify.hs b/compiler/GHC/Core/Opt/Simplify.hs
index cc7529179b..c5fd3dfef1 100644
--- a/compiler/GHC/Core/Opt/Simplify.hs
+++ b/compiler/GHC/Core/Opt/Simplify.hs
@@ -352,7 +352,7 @@ simplLazyBind env top_lvl is_rec bndr bndr1 rhs rhs_se
                 -- See Note [Floating and type abstraction] in GHC.Core.Opt.Simplify.Utils
 
         -- Simplify the RHS
-        ; let rhs_cont = mkRhsStop (substTy body_env (exprType body))
+        ; let rhs_cont = mkRhsStop (substTy body_env (exprType body)) (idDemandInfo bndr)
         ; (body_floats0, body0) <- {-#SCC "simplExprF" #-} simplExprF body_env body rhs_cont
 
         -- ANF-ise a constructor or PAP rhs
@@ -2205,7 +2205,7 @@ rebuildCall env fun_info
         -- have to be very careful about bogus strictness through
         -- floating a demanded let.
   = do  { arg' <- simplExprC (arg_se `setInScopeFromE` env) arg
-                             (mkLazyArgStop arg_ty (lazyArgContext fun_info))
+                             (mkLazyArgStop arg_ty fun_info)
         ; rebuildCall env (addValArgTo fun_info  arg' fun_ty) cont }
   where
     arg_ty = funArgTy fun_ty
@@ -2671,7 +2671,7 @@ There have been various earlier versions of this patch:
     case_bndr_evald_next _               = False
 
   This patch was part of fixing #7542. See also
-  Note [Eta reduction of an eval'd function] in GHC.Core.Utils.)
+  Note [Eta reduction soundness], criterion (E) in GHC.Core.Utils.)
 
 
 Further notes about case elimination
@@ -4281,5 +4281,3 @@ for the RHS as well as the LHS, but that seems more conservative
 than necesary.  Allowing some inlining might, for example, eliminate
 a binding.
 -}
-
-
diff --git a/compiler/GHC/Core/Opt/Simplify/Env.hs b/compiler/GHC/Core/Opt/Simplify/Env.hs
index 3873bfddb7..fa6599b6bc 100644
--- a/compiler/GHC/Core/Opt/Simplify/Env.hs
+++ b/compiler/GHC/Core/Opt/Simplify/Env.hs
@@ -906,10 +906,10 @@ allows us to eta-reduce
         f = \x -> f x
 to
         f = f
-which technically is not sound.   This is very much a corner case, so
-I'm not worried about it.  Another idea is to ensure that f's arity
-never decreases; its arity started as 1, and we should never eta-reduce
-below that.
+which technically is not sound.  We take care of that in point (3) of
+Note [Eta reduction makes sense].
+Another idea is to ensure that f's arity never decreases; its arity started as
+1, and we should never eta-reduce below that.
 
 
 Note [Robust OccInfo]
diff --git a/compiler/GHC/Core/Opt/Simplify/Utils.hs b/compiler/GHC/Core/Opt/Simplify/Utils.hs
index b8d5d9ab43..1c0e228e79 100644
--- a/compiler/GHC/Core/Opt/Simplify/Utils.hs
+++ b/compiler/GHC/Core/Opt/Simplify/Utils.hs
@@ -133,16 +133,24 @@ Key points:
 -}
 
 data SimplCont
-  = Stop                -- Stop[e] = e
-        OutType         -- Type of the <hole>
-        CallCtxt        -- Tells if there is something interesting about
-                        --          the context, and hence the inliner
+  = Stop                -- ^ Stop[e] = e
+        OutType         -- ^ Type of the <hole>
+        CallCtxt        -- ^ Tells if there is something interesting about
+                        --          the syntactic context, and hence the inliner
                         --          should be a bit keener (see interestingCallContext)
                         -- Specifically:
                         --     This is an argument of a function that has RULES
                         --     Inlining the call might allow the rule to fire
                         -- Never ValAppCxt (use ApplyToVal instead)
                         -- or CaseCtxt (use Select instead)
+        SubDemand       -- ^ The evaluation context of e. Tells how e is evaluated.
+                        -- This fuels eta-expansion or eta-reduction without looking
+                        -- at lambda bodies, for example.
+                        --
+                        -- See Note [Eta reduction based on evaluation context]
+                        -- The evaluation context for other SimplConts can be
+                        -- reconstructed with 'contEvalContext'
+
 
   | CastIt              -- (CastIt co K)[e] = K[ e `cast` co ]
         OutCoercion             -- The coercion simplified
@@ -245,7 +253,10 @@ instance Outputable DupFlag where
   ppr Simplified = text "simpl"
 
 instance Outputable SimplCont where
-  ppr (Stop ty interesting) = text "Stop" <> brackets (ppr interesting) <+> ppr ty
+  ppr (Stop ty interesting eval_sd)
+    = text "Stop" <> brackets (sep $ punctuate comma pps) <+> ppr ty
+    where
+      pps = [ppr interesting] ++ [ppr eval_sd | eval_sd /= topSubDmd]
   ppr (CastIt co cont  )    = (text "CastIt" <+> pprOptCo co) $$ ppr cont
   ppr (TickIt t cont)       = (text "TickIt" <+> ppr t) $$ ppr cont
   ppr (ApplyToTy  { sc_arg_ty = ty, sc_cont = cont })
@@ -413,13 +424,15 @@ mkFunRules rs = Just (n_required, rs)
 -}
 
 mkBoringStop :: OutType -> SimplCont
-mkBoringStop ty = Stop ty BoringCtxt
+mkBoringStop ty = Stop ty BoringCtxt topSubDmd
 
-mkRhsStop :: OutType -> SimplCont       -- See Note [RHS of lets] in GHC.Core.Unfold
-mkRhsStop ty = Stop ty RhsCtxt
+mkRhsStop :: OutType -> Demand -> SimplCont       -- See Note [RHS of lets] in GHC.Core.Unfold
+mkRhsStop ty bndr_dmd = Stop ty RhsCtxt (subDemandIfEvaluated bndr_dmd)
 
-mkLazyArgStop :: OutType -> CallCtxt -> SimplCont
-mkLazyArgStop ty cci = Stop ty cci
+mkLazyArgStop :: OutType -> ArgInfo -> SimplCont
+mkLazyArgStop ty fun_info = Stop ty (lazyArgContext fun_info) arg_sd
+  where
+    arg_sd = subDemandIfEvaluated (head (ai_dmds fun_info))
 
 -------------------
 contIsRhsOrArg :: SimplCont -> Bool
@@ -429,9 +442,9 @@ contIsRhsOrArg (StrictArg {})  = True
 contIsRhsOrArg _               = False
 
 contIsRhs :: SimplCont -> Bool
-contIsRhs (Stop _ RhsCtxt) = True
-contIsRhs (CastIt _ k)     = contIsRhs k   -- For f = e |> co, treat e as Rhs context
-contIsRhs _                = False
+contIsRhs (Stop _ RhsCtxt _) = True
+contIsRhs (CastIt _ k)       = contIsRhs k   -- For f = e |> co, treat e as Rhs context
+contIsRhs _                  = False
 
 -------------------
 contIsStop :: SimplCont -> Bool
@@ -458,7 +471,7 @@ contIsTrivial _                                                 = False
 
 -------------------
 contResultType :: SimplCont -> OutType
-contResultType (Stop ty _)                  = ty
+contResultType (Stop ty _ _)                = ty
 contResultType (CastIt _ k)                 = contResultType k
 contResultType (StrictBind { sc_cont = k }) = contResultType k
 contResultType (StrictArg { sc_cont = k })  = contResultType k
@@ -468,7 +481,7 @@ contResultType (ApplyToVal { sc_cont = k }) = contResultType k
 contResultType (TickIt _ k)                 = contResultType k
 
 contHoleType :: SimplCont -> OutType
-contHoleType (Stop ty _)                      = ty
+contHoleType (Stop ty _ _)                    = ty
 contHoleType (TickIt _ k)                     = contHoleType k
 contHoleType (CastIt co _)                    = coercionLKind co
 contHoleType (StrictBind { sc_bndr = b, sc_dup = dup, sc_env = se })
@@ -489,7 +502,7 @@ contHoleType (Select { sc_dup = d, sc_bndr =  b, sc_env = se })
 -- should be scaled if it commutes with E. This appears, in particular, in the
 -- case-of-case transformation.
 contHoleScaling :: SimplCont -> Mult
-contHoleScaling (Stop _ _) = One
+contHoleScaling (Stop _ _ _) = One
 contHoleScaling (CastIt _ k) = contHoleScaling k
 contHoleScaling (StrictBind { sc_bndr = id, sc_cont = k })
   = idMult id `mkMultMul` contHoleScaling k
@@ -534,6 +547,35 @@ contArgs cont
                    -- Do *not* use short-cutting substitution here
                    -- because we want to get as much IdInfo as possible
 
+-- | Describes how the 'SimplCont' will evaluate the hole as a 'SubDemand'.
+-- This can be more insightful than the limited syntactic context that
+-- 'SimplCont' provides, because the 'Stop' constructor might carry a useful
+-- 'SubDemand'.
+-- For example, when simplifying the argument `e` in `f e` and `f` has the
+-- demand signature `<MP(S,A)>`, this function will give you back `P(S,A)` when
+-- simplifying `e`.
+--
+-- PRECONDITION: Don't call with 'ApplyToVal'. We haven't thoroughly thought
+-- about what to do then and no call sites so far seem to care.
+contEvalContext :: SimplCont -> SubDemand
+contEvalContext k = case k of
+  (Stop _ _ sd)              -> sd
+  (TickIt _ k)               -> contEvalContext k
+  (CastIt _ k)               -> contEvalContext k
+  ApplyToTy{sc_cont=k}       -> contEvalContext k
+    --  ApplyToVal{sc_cont=k}      -> mkCalledOnceDmd $ contEvalContext k
+    -- Not 100% sure that's correct, . Here's an example:
+    --   f (e x) and f :: <SCS(C1(L))>
+    -- then what is the evaluation context of 'e' when we simplify it? E.g.,
+    --   simpl e (ApplyToVal x $ Stop "CS(C1(L))")
+    -- then it *should* be "C1(CS(C1(L))", so perhaps correct after all.
+    -- But for now we just panic:
+  ApplyToVal{}               -> pprPanic "contEvalContext" (ppr k)
+  StrictArg{sc_fun=fun_info} -> subDemandIfEvaluated (head (ai_dmds fun_info))
+  StrictBind{sc_bndr=bndr}   -> subDemandIfEvaluated (idDemandInfo bndr)
+  Select{}                   -> topSubDmd
+    -- Perhaps reconstruct the demand on the scrutinee by looking at field
+    -- and case binder dmds, see addCaseBndrDmd. No priority right now.
 
 -------------------
 mkArgInfo :: SimplEnv
@@ -552,7 +594,7 @@ mkArgInfo env fun rules n_val_args call_cont
             , ai_discs = vanilla_discounts }
   | otherwise
   = ArgInfo { ai_fun   = fun
-            , ai_args = []
+            , ai_args  = []
             , ai_rules = fun_rules
             , ai_encl  = interestingArgContext rules call_cont
             , ai_dmds  = add_type_strictness (idType fun) arg_dmds
@@ -749,7 +791,7 @@ interestingCallContext env cont
 
     interesting (StrictArg { sc_fun = fun }) = strictArgContext fun
     interesting (StrictBind {})              = BoringCtxt
-    interesting (Stop _ cci)                 = cci
+    interesting (Stop _ cci _)               = cci
     interesting (TickIt _ k)                 = interesting k
     interesting (ApplyToTy { sc_cont = k })  = interesting k
     interesting (CastIt _ k)                 = interesting k
@@ -800,8 +842,8 @@ interestingArgContext rules call_cont
     go (StrictArg { sc_fun = fun }) = ai_encl fun
     go (StrictBind {})              = False      -- ??
     go (CastIt _ c)                 = go c
-    go (Stop _ RuleArgCtxt)         = True
-    go (Stop _ _)                   = False
+    go (Stop _ RuleArgCtxt _)       = True
+    go (Stop _ _ _)                 = False
     go (TickIt _ c)                 = go c
 
 {- Note [Interesting arguments]
@@ -1611,6 +1653,10 @@ mkLam env bndrs body cont
   where
     mode = getMode env
 
+    -- See Note [Eta reduction based on evaluation context]
+    -- NB: cont is never ApplyToVal, otherwise contEvalContext panics
+    eval_sd = contEvalContext cont
+
     mkLam' :: DynFlags -> [OutBndr] -> OutExpr -> SimplM OutExpr
     mkLam' dflags bndrs body@(Lam {})
       = mkLam' dflags (bndrs ++ bndrs1) body1
@@ -1633,7 +1679,8 @@ mkLam env bndrs body cont
 
     mkLam' dflags bndrs body
       | gopt Opt_DoEtaReduction dflags
-      , Just etad_lam <- {-# SCC "tryee" #-} tryEtaReduce bndrs body
+      -- , pprTrace "try eta" (ppr bndrs $$ ppr body $$ ppr cont $$ ppr eval_sd) True
+      , Just etad_lam <- {-# SCC "tryee" #-} tryEtaReduce bndrs body eval_sd
       = do { tick (EtaReduction (head bndrs))
            ; return etad_lam }
 
diff --git a/compiler/GHC/Core/SimpleOpt.hs b/compiler/GHC/Core/SimpleOpt.hs
index fa6fcda351..925eaf5841 100644
--- a/compiler/GHC/Core/SimpleOpt.hs
+++ b/compiler/GHC/Core/SimpleOpt.hs
@@ -36,7 +36,7 @@ import GHC.Types.Var      ( isNonCoVarId )
 import GHC.Types.Var.Set
 import GHC.Types.Var.Env
 import GHC.Core.DataCon
-import GHC.Types.Demand( etaConvertDmdSig )
+import GHC.Types.Demand( etaConvertDmdSig, topSubDmd )
 import GHC.Types.Tickish
 import GHC.Core.Coercion.Opt ( optCoercion, OptCoercionOpts (..) )
 import GHC.Core.Type hiding ( substTy, extendTvSubst, extendCvSubst, extendTvSubstList
@@ -295,8 +295,8 @@ simple_opt_expr env expr
        where
          (env', b') = subst_opt_bndr env b
     go_lam env bs' e
-       | Just etad_e <- tryEtaReduce bs e' = etad_e
-       | otherwise                         = mkLams bs e'
+       | Just etad_e <- tryEtaReduce bs e' topSubDmd = etad_e
+       | otherwise                                   = mkLams bs e'
        where
          bs = reverse bs'
          e' = simple_opt_expr env e
diff --git a/compiler/GHC/Core/Utils.hs b/compiler/GHC/Core/Utils.hs
index ed1dd6f246..43dd5169e1 100644
--- a/compiler/GHC/Core/Utils.hs
+++ b/compiler/GHC/Core/Utils.hs
@@ -96,8 +96,8 @@ import GHC.Types.Tickish
 import GHC.Types.Id
 import GHC.Types.Id.Info
 import GHC.Types.Unique
-import GHC.Types.Basic     ( Arity, CbvMark(..), Levity(..)
-                           , isMarkedCbv )
+import GHC.Types.Basic
+import GHC.Types.Demand
 import GHC.Types.Unique.Set
 
 import GHC.Data.FastString
@@ -2322,76 +2322,157 @@ locBind loc b1 b2 diffs = map addLoc diffs
 *                                                                      *
 ************************************************************************
 
-Note [Eta reduction conditions]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-We try for eta reduction here, but *only* if we get all the way to an
-trivial expression.  We don't want to remove extra lambdas unless we
-are going to avoid allocating this thing altogether.
-
-There are some particularly delicate points here:
-
-* We want to eta-reduce if doing so leaves a trivial expression,
-  *including* a cast.  For example
-       \x. f |> co  -->  f |> co
-  (provided co doesn't mention x)
-
-* Eta reduction is not valid in general:
-        \x. bot  /=  bot
-  This matters, partly for old-fashioned correctness reasons but,
-  worse, getting it wrong can yield a seg fault. Consider
+Note [Eta reduction makes sense]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Traditionally, eta reduction transforms `\x. e x` to `e`, where `e` is an
+arbitrary expression in which `x` doesn't occur free.
+It is the inverse of eta expansion, which generally transforms the program into
+a form that executes faster. So why and when will GHC attempt to eta *reduce*?
+
+1. We want to eta-reduce only if we get all the way to a
+   trivial expression; we don't want to remove extra lambdas unless we are going
+   to avoid allocating this thing altogether.
+   Trivial means *including* casts and type lambdas:
+     * `\x. f x |> co  -->  f |> (ty(x) -> co)` (provided `co` doesn't mention `x`)
+     * `/\a. \x. f @(Maybe a) x -->  /\a. f @(Maybe a)`
+
+2. It's always sound to eta-reduce *type* lambdas and we always want to, because
+   it makes RULEs apply more often:
+      This RULE:    `forall g. foldr (build (/\a -> g a))`
+      should match  `foldr (build (/\b -> ...something complex...))`
+   and the simplest way to do so is eta-reduce `/\a -> g a` in the RULE to `g`.
+   The type checker can insert these eta-expanded versions of the RULE.
+   [SG: This is implied by (1), isn't it? Perhaps we want to eta-reduce type
+    lambdas even if the resulting expression is non-trivial?]
+
+3. We have to hide `f`'s `idArity` in its own RHS, lest we suffer from the last
+   point of Note [Arity robustness]. There we have `f = \x. f x` and we should
+   not eta-reduce to `f=f`. Which might change a terminating program
+   (think @f `seq` e@) to a non-terminating one.
+   So we check for being a loop breaker first. However for GlobalIds we can look
+   at the arity; and for primops we must, since they have no unfolding.
+   [SG: Perhaps this is rather a soundness subtlety?]
+
+Of course, eta reduction is not always sound. See Note [Eta reduction soundness]
+for when it is.
+
+When there are multiple arguments, we might get multiple eta-redexes. Example:
+   \x y. e x y
+   ==> { reduce \y. (e x) y in context \x._ }
+   \x. e x
+   ==> { reduce \x. e x in context _ }
+   e
+And (1) implies that we never want to stop with `\x. e x`, because that is not a
+trivial expression. So in practice, the implementation works by considering a
+whole group of leading lambdas to reduce.
+
+These delicacies are why we don't simply use 'exprIsTrivial' and 'exprIsHNF'
+in 'tryEtaReduce'. Alas.
+
+Note [Eta reduction soundness]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+As Note [Eta reduction makes sense] explains, GHC's eta reduction transforms
+`\x y. e x y` to `e`, where `e` is an arbitrary expression in which `x` and `y`
+don't occur free.
+
+Eta reduction is *not* a sound transformation in general, because it
+may change termination behavior if *value* lambdas are involved:
+  `bot`  /=  `\x. bot x`   (as can be observed by a simple `seq`)
+The past has shown that oversight of this fact can not only lead to endless
+loops or exceptions, but also straight out *segfaults*.
+
+Nevertheless, we can give the following criteria for when it is sound to
+perform eta reduction on an expression with n leading lambdas `\xs. e xs`
+(checked in 'is_eta_reduction_sound' in 'tryEtaReduce', which focuses on the
+case where `e` is trivial):
+
+ A. It is sound to eta-reduce n arguments as long as n does not exceed the
+    `exprArity` of `e`. (Needs Arity analysis.)
+    This criterion exploits information about how `e` is *defined*.
+
+    Example: If `e = \x. bot` then we know it won't diverge until it is called
+    with one argument. Hence it is safe to eta-reduce `\x. e x` to `e`.
+    By contrast, it would be *unsound* to eta-reduce 2 args, `\x y. e x y` to `e`:
+    `e 42` diverges when `(\x y. e x y) 42` does not.
+
+ S. It is sound to eta-reduce n arguments in an evaluation context in which all
+    calls happen with at least n arguments. (Needs Strictness analysis.)
+    NB: This treats evaluations like a call with 0 args.
+    NB: This criterion exploits information about how `e` is *used*.
+
+    Example: Given a function `g` like
+      `g c = Just (c 1 2 + c 2 3)`
+    it is safe to eta-reduce the arg in `g (\x y. e x y)` to `g e` without
+    knowing *anything* about `e` (perhaps it's a parameter occ itself), simply
+    because `g` always calls its parameter with 2 arguments.
+    It is also safe to eta-reduce just one arg, e.g., `g (\x. e x)` to `g e`.
+    By contrast, it would *unsound* to eta-reduce 3 args in a call site
+    like `g (\x y z. e x y z)` to `g e`, because that diverges when
+    `e = \x y. bot`.
+
+    Could we relax to "At least *one call in the same trace* is with n args"?
+    Consider what happens for
+      ``g2 c = c True `seq` c False 42``
+    Here, `g2` will call `c` with 2 two arguments (if there is a call at all).
+    But it is unsafe to eta-reduce the arg in `g2 (\x y. e x y)` to `g2 e`
+    when `e = \x. if x then bot else id`, because the latter will diverge when
+    the former would not.
+
+    See Note [Eta reduction based on evaluation context] for the implementation
+    details. This criterion is tested extensively in T21261.
+
+ E. As a perhaps special case on the boundary of (A) and (S), when we know that
+    a fun binder `f` is in WHNF, we simply assume it has arity 1 and apply (A).
+    Example:
+      ``g f = f `seq` \x. f x``
+    Here it's sound eta-reduce `\x. f x` to `f`, because `f` can't be bottom
+    after the `seq`. This turned up in #7542.
+
+And here are a few more technical criteria for when it is *not* sound to
+eta-reduce that are specific to Core and GHC:
+
+ L. With linear types, eta-reduction can break type-checking:
+      f :: A ⊸ B
+      g :: A -> B
+      g = \x. f x
+    The above is correct, but eta-reducing g would yield g=f, the linter will
+    complain that g and f don't have the same type. NB: Not unsound in the
+    dynamic semantics, but unsound according to the static semantics of Core.
+
+ J. We may not undersaturate join points.
+    See Note [Invariants on join points] in GHC.Core, and #20599.
+
+ B. We may not undersaturate functions with no binding.
+    See Note [Eta expanding primops].
+
+ W. We may not undersaturate StrictWorkerIds.
+    See Note [Strict Worker Ids] in GHC.CoreToStg.Prep.
+
+Here is a list of historic accidents surrounding unsound eta-reduction:
+
+* Consider
         f = \x.f x
         h y = case (case y of { True -> f `seq` True; False -> False }) of
                 True -> ...; False -> ...
-
   If we (unsoundly) eta-reduce f to get f=f, the strictness analyser
   says f=bottom, and replaces the (f `seq` True) with just
-  (f `cast` unsafe-co).  BUT, as thing stand, 'f' got arity 1, and it
-  *keeps* arity 1 (perhaps also wrongly).  So CorePrep eta-expands
-  the definition again, so that it does not terminate after all.
+  (f `cast` unsafe-co).
+  [SG in 2022: I don't think worker/wrapper would do this today.]
+  BUT, as things stand, 'f' got arity 1, and it *keeps* arity 1 (perhaps also
+  wrongly). So CorePrep eta-expands the definition again, so that it does not
+  terminate after all.
   Result: seg-fault because the boolean case actually gets a function value.
   See #1947.
 
-  So it's important to do the right thing.
-
-* With linear types, eta-reduction can break type-checking:
-        f :: A ⊸ B
-        g :: A -> B
-        g = \x. f x
-
-  The above is correct, but eta-reducing g would yield g=f, the linter will
-  complain that g and f don't have the same type.
-
-* Note [Arity care]
-  ~~~~~~~~~~~~~~~~~
-  We need to be careful if we just look at f's
-  arity. Currently (Dec07), f's arity is visible in its own RHS (see
-  Note [Arity robustness] in GHC.Core.Opt.Simplify.Env) so we must *not* trust the
-  arity when checking that 'f' is a value.  Otherwise we will
-  eta-reduce
-      f = \x. f x
-  to
-      f = f
-  Which might change a terminating program (think (f `seq` e)) to a
-  non-terminating one.  So we check for being a loop breaker first.
-
-  However for GlobalIds we can look at the arity; and for primops we
-  must, since they have no unfolding.
-
-* Regardless of whether 'f' is a value, we always want to
-  reduce (/\a -> f a) to f
-  This came up in a RULE: foldr (build (/\a -> g a))
-  did not match           foldr (build (/\b -> ...something complex...))
-  The type checker can insert these eta-expanded versions,
-  with both type and dictionary lambdas; hence the slightly
-  ad-hoc isDictId
-
 * Never *reduce* arity. For example
       f = \xy. g x y
   Then if h has arity 1 we don't want to eta-reduce because then
   f's arity would decrease, and that is bad
-
-These delicacies are why we don't use exprIsTrivial and exprIsHNF here.
-Alas.
+  [SG in 2022: I don't understand this point. There is no `h`, perhaps that
+   should have been `g`. Even then, this proposed eta-reduction is invalid by
+   criterion (A), which might actually be the point this anecdote is trying to
+   make. Perhaps the "no arity decrease" idea is also related to
+   Note [Arity robustness]?]
 
 Note [Eta reduction with casted arguments]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -2417,15 +2498,48 @@ It's true that we could also hope to eta reduce these:
     (\xy. (f x y) |> g)
 But the simplifier pushes those casts outwards, so we don't
 need to address that here.
+
+Note [Eta reduction based on evaluation context]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Note [Eta reduction soundness], criterion (S) allows us to eta-reduce
+`g (\x y. e x y)` to `g e` when we know that `g` always calls its parameter with
+at least 2 arguments. So how do we read that off `g`'s demand signature?
+
+Let's take the simple example of #21261, where `g` (actually, `f`) is defined as
+  g c = c 1 2 + c 3 4
+Then this is how the pieces are put together:
+
+  * Demand analysis infers `<SCS(C1(L))>` for `g`'s demand signature
+
+  * When the Simplifier next simplifies the argument in `g (\x y. e x y)`, it
+    looks up the *evaluation context* of the argument in the form of the
+    sub-demand `CS(C1(L))` and stores it in the 'SimplCont'.
+    (Why does it drop the outer evaluation cardinality of the demand, `S`?
+    Because it's irrelevant! When we simplify an expression, we do so under the
+    assumption that it is currently under evaluation.)
+    This sub-demand literally says "Whenever this expression is evaluated, it
+    is also called with two arguments, potentially multiple times".
+
+  * Then the simplifier takes apart the lambda and simplifies the lambda group
+    and then calls 'tryEtaReduce' when rebuilding the lambda, passing the
+    evaluation context `CS(C1(L))` along. Then we simply peel off 2 call
+    sub-demands `Cn` and see whether all of the n's (here: `S=C_1N` and
+    `1=C_11`) were strict. And strict they are! Thus, it will eta-reduce
+    `\x y. e x y` to `e`.
+
 -}
 
+-- | `tryEtaReduce [x,y,z] e sd` returns `Just e'` if `\x y z -> e` is evaluated
+-- according to `sd` and can soundly and gainfully be eta-reduced to `e'`.
+-- See Note [Eta reduction soundness]
+-- and Note [Eta reduction makes sense] when that is the case.
+tryEtaReduce :: [Var] -> CoreExpr -> SubDemand -> Maybe CoreExpr
 -- When updating this function, make sure to update
 -- CorePrep.tryEtaReducePrep as well!
-tryEtaReduce :: [Var] -> CoreExpr -> Maybe CoreExpr
-tryEtaReduce bndrs body
+tryEtaReduce bndrs body eval_sd
   = go (reverse bndrs) body (mkRepReflCo (exprType body))
   where
-    incoming_arity = count isId bndrs
+    incoming_arity = count isId bndrs -- See Note [Eta reduction makes sense], point (2)
 
     go :: [Var]            -- Binders, innermost first, types [a3,a2,a1]
        -> CoreExpr         -- Of type tr
@@ -2433,13 +2547,6 @@ tryEtaReduce bndrs body
        -> Maybe CoreExpr   -- Of type a1 -> a2 -> a3 -> ts
     -- See Note [Eta reduction with casted arguments]
     -- for why we have an accumulating coercion
-    go [] fun co
-      | ok_fun fun
-      , let used_vars = exprFreeVars fun `unionVarSet` tyCoVarsOfCo co
-      , not (any (`elemVarSet` used_vars) bndrs)
-      = Just (mkCast fun co)   -- Check for any of the binders free in the result
-                               -- including the accumulated coercion
-
     go bs (Tick t e) co
       | tickishFloatable t
       = fmap (Tick t) $ go bs e co
@@ -2450,28 +2557,49 @@ tryEtaReduce bndrs body
       = fmap (flip (foldr mkTick) ticks) $ go bs fun co'
             -- Float arg ticks: \x -> e (Tick t x) ==> Tick t e
 
-    go _ _ _  = Nothing         -- Failure!
+    go remaining_bndrs fun co
+      | all isTyVar remaining_bndrs -- See Note [Eta reduction makes sense], point (1)
+      , remaining_bndrs `ltLength` bndrs
+      , ok_fun fun
+      , let used_vars = exprFreeVars fun `unionVarSet` tyCoVarsOfCo co
+            reduced_bndrs = mkVarSet (dropList remaining_bndrs bndrs)
+      , used_vars `disjointVarSet` reduced_bndrs
+          -- Check for any of the binders free in the result including the
+          -- accumulated coercion
+          -- See Note [Eta reduction makes sense], intro and point (1)
+      = Just $ mkLams (reverse remaining_bndrs) (mkCast fun co)
+
+    go _remaining_bndrs _fun  _  = -- pprTrace "tER fail" (ppr _fun $$ ppr _remaining_bndrs) $
+                                   Nothing
+
 
     ---------------
-    -- Note [Eta reduction conditions]
+    -- See Note [Eta reduction makes sense], point (1)
     ok_fun (App fun (Type {})) = ok_fun fun
     ok_fun (Cast fun _)        = ok_fun fun
     ok_fun (Tick _ expr)       = ok_fun expr
-    ok_fun (Var fun_id)        = ok_fun_id fun_id || all ok_lam bndrs
+    ok_fun (Var fun_id)        = is_eta_reduction_sound fun_id || all ok_lam bndrs
     ok_fun _fun                = False
 
     ---------------
-    ok_fun_id fun = -- There are arguments to reduce...
-                    fun_arity fun >= incoming_arity &&
-                    -- ... and the function can be eta reduced to arity 0
-                    canEtaReduceToArity fun 0 0
+    -- See Note [Eta reduction soundness], this is THE place to check soundness!
+    is_eta_reduction_sound fun =
+      -- Check that eta-reduction won't make the program stricter...
+      (fun_arity fun >= incoming_arity            -- criterion (A) and (E)
+        || all_calls_with_arity incoming_arity)   -- criterion (S)
+      -- ... and that the function can be eta reduced to arity 0
+      -- without violating invariants of Core and GHC
+      && canEtaReduceToArity fun 0 0              -- criteria (L), (J), (W), (B)
+    all_calls_with_arity n = isStrict (peelManyCalls n eval_sd)
+       -- See Note [Eta reduction based on evaluation context]
+
     ---------------
-    fun_arity fun             -- See Note [Arity care]
+    fun_arity fun -- See Note [Eta reduction makes sense], point (3)
        | isLocalId fun
        , isStrongLoopBreaker (idOccInfo fun) = 0
        | arity > 0                           = arity
        | isEvaldUnfolding (idUnfolding fun)  = 1
-            -- See Note [Eta reduction of an eval'd function]
+           -- See Note [Eta reduction soundness], criterion (E)
        | otherwise                           = 0
        where
          arity = idArity fun
@@ -2512,42 +2640,25 @@ tryEtaReduce bndrs body
     ok_arg _ _ _ _ = Nothing
 
 -- | Can we eta-reduce the given function to the specified arity?
--- See Note [Eta reduction conditions].
+-- See Note [Eta reduction soundness], criteria (B), (J), (W) and (L).
 canEtaReduceToArity :: Id -> JoinArity -> Arity -> Bool
 canEtaReduceToArity fun dest_join_arity dest_arity =
   not $
-        hasNoBinding fun
+        hasNoBinding fun -- (B)
        -- Don't undersaturate functions with no binding.
 
-    ||  ( isJoinId fun && dest_join_arity < idJoinArity fun )
+    ||  ( isJoinId fun && dest_join_arity < idJoinArity fun ) -- (J)
        -- Don't undersaturate join points.
        -- See Note [Invariants on join points] in GHC.Core, and #20599
 
-    || ( dest_arity < idCbvMarkArity fun )
+    || ( dest_arity < idCbvMarkArity fun ) -- (W)
        -- Don't undersaturate StrictWorkerIds.
        -- See Note [Strict Worker Ids] in GHC.CoreToStg.Prep.
 
-    ||  isLinearType (idType fun)
+    ||  isLinearType (idType fun) -- (L)
        -- Don't perform eta reduction on linear types.
        -- If `f :: A %1-> B` and `g :: A -> B`,
        -- then `g x = f x` is OK but `g = f` is not.
-       -- See Note [Eta reduction conditions].
-
-{-
-Note [Eta reduction of an eval'd function]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-In Haskell it is not true that    f = \x. f x
-because f might be bottom, and 'seq' can distinguish them.
-
-But it *is* true that   f = f `seq` \x. f x
-and we'd like to simplify the latter to the former.  This amounts
-to the rule that
-  * when there is just *one* value argument,
-  * f is not bottom
-we can eta-reduce    \x. f x  ===>  f
-
-This turned up in #7542.
--}
 
 {- *********************************************************************
 *                                                                      *
diff --git a/compiler/GHC/Types/Demand.hs b/compiler/GHC/Types/Demand.hs
index c5a763a93a..733ca6819d 100644
--- a/compiler/GHC/Types/Demand.hs
+++ b/compiler/GHC/Types/Demand.hs
@@ -39,7 +39,7 @@ module GHC.Types.Demand (
     -- ** Other @Demand@ operations
     oneifyCard, oneifyDmd, strictifyDmd, strictifyDictDmd, lazifyDmd,
     peelCallDmd, peelManyCalls, mkCalledOnceDmd, mkCalledOnceDmds,
-    mkWorkerDemand,
+    mkWorkerDemand, subDemandIfEvaluated,
     -- ** Extracting one-shot information
     argOneShots, argsOneShots, saturatedByOneShots,
     -- ** Manipulating Boxity of a Demand
@@ -1026,6 +1026,12 @@ peelManyCalls 0 _                          = C_11
 peelManyCalls n (viewCall -> Just (m, sd)) = m `multCard` peelManyCalls (n-1) sd
 peelManyCalls _ _                          = C_0N
 
+-- | Extract the 'SubDemand' of a 'Demand'.
+-- PRECONDITION: The SubDemand must be used in a context where the expression
+-- denoted by the Demand is under evaluation.
+subDemandIfEvaluated :: Demand -> SubDemand
+subDemandIfEvaluated (_ :* sd) = sd
+
 -- See Note [Demand on the worker] in GHC.Core.Opt.WorkWrap
 mkWorkerDemand :: Int -> Demand
 mkWorkerDemand n = C_01 :* go n
diff --git a/testsuite/tests/simplCore/should_compile/T20040.hs b/testsuite/tests/simplCore/should_compile/T20040.hs
new file mode 100644
index 0000000000..a50323175f
--- /dev/null
+++ b/testsuite/tests/simplCore/should_compile/T20040.hs
@@ -0,0 +1,22 @@
+{-# LANGUAGE DataKinds           #-}
+{-# LANGUAGE GADTs               #-}
+{-# LANGUAGE RankNTypes          #-}
+{-# LANGUAGE ScopedTypeVariables #-}
+{-# LANGUAGE TypeApplications    #-}
+{-# LANGUAGE BangPatterns        #-}
+
+module T20040 where
+
+import Data.Coerce
+
+data Nat = Z | S Nat
+
+data Vec n a where
+  Nil  :: Vec 'Z a
+  Cons :: a -> Vec n a -> Vec ('S n) a
+
+newtype Succ b n = Succ { unSucc :: b (S n) }
+
+ifoldl' :: forall b n a. (forall m. b m -> a -> b ('S m)) -> b 'Z -> Vec n a -> b n
+ifoldl' f  z Nil = z
+ifoldl' f !z (Cons x xs) = unSucc $ ifoldl' (\(Succ m) a -> Succ (f m a)) (Succ $ f z x) xs
diff --git a/testsuite/tests/simplCore/should_compile/T20040.stderr b/testsuite/tests/simplCore/should_compile/T20040.stderr
new file mode 100644
index 0000000000..3d4e827cbd
--- /dev/null
+++ b/testsuite/tests/simplCore/should_compile/T20040.stderr
@@ -0,0 +1,32 @@
+
+==================== Final STG: ====================
+$WNil = CCS_DONT_CARE Nil! [];
+
+$WCons = \r [conrep conrep] Cons [conrep conrep];
+
+unSucc1 = \r [ds] ds;
+
+unSucc = \r [eta] unSucc1 eta;
+
+Rec {
+ifoldl' =
+    \r [f z ds]
+        case ds of {
+          Nil -> z;
+          Cons ipv2 ipv3 ->
+              case z of z1 {
+              __DEFAULT ->
+              case f z1 ipv2 of sat { __DEFAULT -> ifoldl' f sat ipv3; };
+              };
+        };
+end Rec }
+
+Nil = \r [void] Nil [];
+
+Cons = \r [void eta eta] Cons [eta eta];
+
+Z = CCS_DONT_CARE Z! [];
+
+S = \r [eta] S [eta];
+
+
diff --git a/testsuite/tests/simplCore/should_compile/T21261.hs b/testsuite/tests/simplCore/should_compile/T21261.hs
new file mode 100644
index 0000000000..ae39c4b7d4
--- /dev/null
+++ b/testsuite/tests/simplCore/should_compile/T21261.hs
@@ -0,0 +1,47 @@
+module T21261 where
+
+-- README: The convention here is that bindings starting with 'yes' should be
+-- eta-reduced and become trivial, while bindings starting with 'no' should not
+-- be eta-reduced.
+
+f1 :: (Int -> Int -> Int) -> Int
+f1 c = c 1 2 + c 3 4
+{-# NOINLINE f1 #-}
+yes1 :: (Int -> Int -> Int) -> Int
+yes1 c = f1 (\x -> c x)
+
+f2 :: (Int -> Int -> Int) -> Int
+f2 c = c 1 `seq` c 3 4
+{-# NOINLINE f2 #-}
+yes1or2 :: (Int -> Int -> Int) -> Int
+yes1or2 c = f2 c
+
+f3 :: (Int -> Int -> Int) -> Int
+f3 c = c 1 2 + c 3 4
+{-# NOINLINE f3 #-}
+yes2 :: (Int -> Int -> Int) -> Int
+yes2 c = f3 (\x y -> c x y)
+
+f4 :: (Int -> Int -> Int -> Int) -> Int
+f4 c = c 1 2 `seq` c 3 4 `seq` 42
+{-# NOINLINE f4 #-}
+no3 :: (Int -> Int -> Int -> Int) -> Int
+no3 c = f4 (\x y z -> c x y z)
+
+f5 :: (Int -> Int -> Int) -> Maybe Int
+f5 c = Just (c 1 2 + c 3 4)
+{-# NOINLINE f5 #-}
+yes2_lazy :: (Int -> Int -> Int) -> Maybe Int
+yes2_lazy c = f5 (\x y -> c x y)
+
+f6 :: (Int -> Int -> Int) -> Maybe Int
+f6 c = Just (c 1 `seq` c 3 4)
+{-# NOINLINE f6 #-}
+no2_lazy :: (Int -> Int -> Int) -> Maybe Int
+no2_lazy c = f6 (\x y -> c x y)
+
+f7 :: (Int -> Int -> Int) -> Int
+f7 c = c 1 `seq` c 2 3
+{-# NOINLINE f7 #-}
+not_quite_eta :: (Int -> Int -> Int) -> Int
+not_quite_eta c = f7 (\x y -> c x y)
diff --git a/testsuite/tests/simplCore/should_compile/T21261.stderr b/testsuite/tests/simplCore/should_compile/T21261.stderr
new file mode 100644
index 0000000000..779f769e43
--- /dev/null
+++ b/testsuite/tests/simplCore/should_compile/T21261.stderr
@@ -0,0 +1,73 @@
+
+==================== Tidy Core ====================
+Result size of Tidy Core
+  = {terms: 192, types: 201, coercions: 0, joins: 0/0}
+
+lvl = I# 3#
+
+lvl1 = I# 4#
+
+lvl2 = I# 1#
+
+f2 = \ c -> case c lvl2 of { __DEFAULT -> c lvl lvl1 }
+
+yes1or2 = f2
+
+lvl3 = I# 2#
+
+$wf4
+  = \ c ->
+      case c lvl2 lvl3 of { __DEFAULT ->
+      case c lvl lvl1 of { __DEFAULT -> 42# }
+      }
+
+f4 = \ c -> case $wf4 c of ww { __DEFAULT -> I# ww }
+
+no3
+  = \ c ->
+      case $wf4 (\ x y z -> c x y z) of ww { __DEFAULT -> I# ww }
+
+$wf6 = \ c -> (# case c lvl2 of { __DEFAULT -> c lvl lvl1 } #)
+
+f6 = \ c -> case $wf6 c of { (# ww #) -> Just ww }
+
+no2_lazy
+  = \ c -> case $wf6 (\ x y -> c x y) of { (# ww #) -> Just ww }
+
+f7 = \ c -> case c lvl2 of { __DEFAULT -> c lvl3 lvl }
+
+not_quite_eta = \ c -> f7 (\ x y -> c x y)
+
+$wf5
+  = \ c ->
+      (# case c lvl2 lvl3 of { I# x ->
+         case c lvl lvl1 of { I# y -> I# (+# x y) }
+         } #)
+
+f5 = \ c -> case $wf5 c of { (# ww #) -> Just ww }
+
+yes2_lazy
+  = \ c -> case $wf5 (\ x y -> c x y) of { (# ww #) -> Just ww }
+
+$wf3
+  = \ c ->
+      case c lvl2 lvl3 of { I# x ->
+      case c lvl lvl1 of { I# y -> +# x y }
+      }
+
+f3 = \ c -> case $wf3 c of ww { __DEFAULT -> I# ww }
+
+yes2 = f3
+
+$wf1
+  = \ c ->
+      case c lvl2 lvl3 of { I# x ->
+      case c lvl lvl1 of { I# y -> +# x y }
+      }
+
+f1 = \ c -> case $wf1 c of ww { __DEFAULT -> I# ww }
+
+yes1 = f1
+
+
+
diff --git a/testsuite/tests/simplCore/should_compile/all.T b/testsuite/tests/simplCore/should_compile/all.T
index 5a23e84c75..8cdf5a5417 100644
--- a/testsuite/tests/simplCore/should_compile/all.T
+++ b/testsuite/tests/simplCore/should_compile/all.T
@@ -392,3 +392,10 @@ test('OpaqueNoWW', normal, compile, ['-O -ddump-simpl -dsuppress-uniques'])
 
 test('T21144',  normal, compile, ['-O'])
 
+# Key here is that the argument to ifoldl' is eta-reduced in Core to
+#   `/\m. f @(S m)`
+# which will erase completely in STG
+test('T20040', [ grep_errmsg(r'ifoldl\''), expect_broken(20040) ], compile, ['-O -ddump-stg-final -dno-typeable-binds -dsuppress-all -dsuppress-uniques'])
+
+# Key here is that yes* become visibly trivial due to eta-reduction, while no* are not eta-reduced.
+test('T21261', [ grep_errmsg(r'^(yes|no)') ], compile, ['-O -ddump-simpl -dno-typeable-binds -dsuppress-all -dsuppress-uniques'])
author	Sebastian Graf <sebastian.graf@kit.edu>	2022-02-18 10:57:14 +0100
committer	Marge Bot <ben+marge-bot@smart-cactus.org>	2022-04-12 11:11:42 -0400
commit	0090ad7b8b436961fe1e225aae214d0ea1381c07 (patch)
tree	d56a67bbaa3c9cbd54de2f0d1f21f6d2b01d0e3b
parent	5440f63ec4a584b8805a8ff49ba1bd26bc2c032d (diff)
download	haskell-0090ad7b8b436961fe1e225aae214d0ea1381c07.tar.gz