Improve aggressive specialisation

This patch fixes #21286, by not unboxing dictionaries in worker/wrapper (ever). The main payload is tiny: * In `GHC.Core.Opt.DmdAnal.finaliseArgBoxities`, do not unbox dictionaries in `get_dmd`. See Note [Do not unbox class dictionaries] in that module * I also found that imported wrappers were being fruitlessly specialised, so I fixed that too, in canSpecImport. See Note [Specialising imported functions] point (2). In doing due diligence in the testsuite I fixed a number of other things: * Improve Note [Specialising unfoldings] in GHC.Core.Unfold.Make, and Note [Inline specialisations] in GHC.Core.Opt.Specialise, and remove duplication between the two. The new Note describes how we specialise functions with an INLINABLE pragma. And simplify the defn of `spec_unf` in `GHC.Core.Opt.Specialise.specCalls`. * Improve Note [Worker/wrapper for INLINABLE functions] in GHC.Core.Opt.WorkWrap. And (critially) make an actual change which is to propagate the user-written pragma from the original function to the wrapper; see `mkStrWrapperInlinePrag`. * Write new Note [Specialising imported functions] in GHC.Core.Opt.Specialise All this has a big effect on some compile times. This is compiler/perf, showing only changes over 1%: Metrics: compile_time/bytes allocated ------------------------------------- LargeRecord(normal) -50.2% GOOD ManyConstructors(normal) +1.0% MultiLayerModulesTH_OneShot(normal) +2.6% PmSeriesG(normal) -1.1% T10547(normal) -1.2% T11195(normal) -1.2% T11276(normal) -1.0% T11303b(normal) -1.6% T11545(normal) -1.4% T11822(normal) -1.3% T12150(optasm) -1.0% T12234(optasm) -1.2% T13056(optasm) -9.3% GOOD T13253(normal) -3.8% GOOD T15164(normal) -3.6% GOOD T16190(normal) -2.1% T16577(normal) -2.8% GOOD T16875(normal) -1.6% T17836(normal) +2.2% T17977b(normal) -1.0% T18223(normal) -33.3% GOOD T18282(normal) -3.4% GOOD T18304(normal) -1.4% T18698a(normal) -1.4% GOOD T18698b(normal) -1.3% GOOD T19695(normal) -2.5% GOOD T5837(normal) -2.3% T9630(normal) -33.0% GOOD WWRec(normal) -9.7% GOOD hard_hole_fits(normal) -2.1% GOOD hie002(normal) +1.6% geo. mean -2.2% minimum -50.2% maximum +2.6% I diligently investigated some of the big drops. * Caused by not doing w/w for dictionaries: T13056, T15164, WWRec, T18223 * Caused by not fruitlessly specialising wrappers LargeRecord, T9630 For runtimes, here is perf/should+_run: Metrics: runtime/bytes allocated -------------------------------- T12990(normal) -3.8% T5205(normal) -1.3% T9203(normal) -10.7% GOOD haddock.Cabal(normal) +0.1% haddock.base(normal) -1.1% haddock.compiler(normal) -0.3% lazy-bs-alloc(normal) -0.2% ------------------------------------------ geo. mean -0.3% minimum -10.7% maximum +0.1% I did not investigate exactly what happens in T9203. Nofib is a wash: +-------------------------------++--+-----------+-----------+ | || | tsv (rel) | std. err. | +===============================++==+===========+===========+ | real/anna || | -0.13% | 0.0% | | real/fem || | +0.13% | 0.0% | | real/fulsom || | -0.16% | 0.0% | | real/lift || | -1.55% | 0.0% | | real/reptile || | -0.11% | 0.0% | | real/smallpt || | +0.51% | 0.0% | | spectral/constraints || | +0.20% | 0.0% | | spectral/dom-lt || | +1.80% | 0.0% | | spectral/expert || | +0.33% | 0.0% | +===============================++==+===========+===========+ | geom mean || | | | +-------------------------------++--+-----------+-----------+ I spent quite some time investigating dom-lt, but it's pretty complicated. See my note on !7847. Conclusion: it's just a delicate inlining interaction, and we have plenty of those. Metric Decrease: LargeRecord T13056 T13253 T15164 T16577 T18223 T18282 T18698a T18698b T19695 T9630 WWRec hard_hole_fits T9203
author: Simon Peyton Jones <simonpj@microsoft.com> 2022-07-27 14:59:55 +0100
committer: Marge Bot <ben+marge-bot@smart-cactus.org> 2022-09-28 17:49:09 -0400
commit: 2a53ac1877bbd29de432c0aca442904e9da96c4e (patch)
tree: e5b0ef5342d20003905933ca67cd59737a143d64
parent: b74b6191d7c442dffdfc9a9e2a6d476d7b3a28f2 (diff)
download: haskell-2a53ac1877bbd29de432c0aca442904e9da96c4e.tar.gz
17 files changed, 285 insertions, 226 deletions
diff --git a/compiler/GHC/Core/Opt/DmdAnal.hs b/compiler/GHC/Core/Opt/DmdAnal.hs
index fe98075487..86775592bb 100644
--- a/compiler/GHC/Core/Opt/DmdAnal.hs
+++ b/compiler/GHC/Core/Opt/DmdAnal.hs
@@ -1636,30 +1636,32 @@ the case on `x` up through the case on `burble`.
 
 Note [Do not unbox class dictionaries]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-If we have
-   f :: Ord a => [a] -> Int -> a
-   {-# INLINABLE f #-}
-and we worker/wrapper f, we'll get a worker with an INLINABLE pragma
-(see Note [Worker/wrapper for INLINABLE functions] in GHC.Core.Opt.WorkWrap),
-which can still be specialised by the type-class specialiser, something like
-   fw :: Ord a => [a] -> Int# -> a
-
-BUT if f is strict in the Ord dictionary, we might unpack it, to get
-   fw :: (a->a->Bool) -> [a] -> Int# -> a
-and the type-class specialiser can't specialise that. An example is #6056.
-
-But in any other situation, a dictionary is just an ordinary value,
-and can be unpacked.  So we track the INLINABLE pragma, and discard the boxity
-flag in finaliseArgBoxities (see the isClassPred test).
-
-Historical note: #14955 describes how I got this fix wrong the first time.
-
-Note that the simplicity of this fix implies that INLINE functions (such as
-wrapper functions after the WW run) will never say that they unbox class
-dictionaries. That's not ideal, but not worth losing sleep over, as INLINE
-functions will have been inlined by the time we run demand analysis so we'll
-see the unboxing around the worker in client modules. I got aware of the issue
-in T5075 by the change in boxity of loop between demand analysis runs.
+We never unbox class dictionaries in worker/wrapper.
+
+1. INLINABLE functions
+   If we have
+      f :: Ord a => [a] -> Int -> a
+      {-# INLINABLE f #-}
+   and we worker/wrapper f, we'll get a worker with an INLINABLE pragma
+   (see Note [Worker/wrapper for INLINABLE functions] in GHC.Core.Opt.WorkWrap),
+   which can still be specialised by the type-class specialiser, something like
+      fw :: Ord a => [a] -> Int# -> a
+
+   BUT if f is strict in the Ord dictionary, we might unpack it, to get
+      fw :: (a->a->Bool) -> [a] -> Int# -> a
+   and the type-class specialiser can't specialise that. An example is #6056.
+
+   Historical note: #14955 describes how I got this fix wrong the first time.
+   I got aware of the issue in T5075 by the change in boxity of loop between
+   demand analysis runs.
+
+2. -fspecialise-aggressively.  As #21286 shows, the same phenomenon can occur
+   occur without INLINABLE, when we use -fexpose-all-unfoldings and
+   -fspecialise-aggressively to do vigorous cross-module specialisation.
+
+TL;DR we /never/ unbox class dictionaries. Unboxing the dictionary, and passing
+a raft of higher-order functions isn't a huge win anyway -- you really want to
+specialise the function.
 
 Note [Worker argument budget]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1725,7 +1727,6 @@ finaliseArgBoxities env fn arity rhs div
     -- uses the info on the binders directly.
   where
     opts            = ae_opts env
-    is_inlinable_fn = isStableUnfolding (realIdUnfolding fn)
     (bndrs, _body)  = collectBinders rhs
     max_wkr_args    = dmd_max_worker_args opts `max` arity
                       -- See Note [Worker argument budget]
@@ -1743,8 +1744,7 @@ finaliseArgBoxities env fn arity rhs div
 
     get_dmd :: Id -> Type -> Demand
     get_dmd bndr bndr_ty
-      | isClassPred bndr_ty
-      , is_inlinable_fn = trimBoxity dmd
+      | isClassPred bndr_ty = trimBoxity dmd
         -- See Note [Do not unbox class dictionaries]
         -- NB: 'ty' has not been normalised, so this will (rightly)
         --     catch newtype dictionaries too.
diff --git a/compiler/GHC/Core/Opt/Simplify/Iteration.hs b/compiler/GHC/Core/Opt/Simplify/Iteration.hs
index cfee2e5e56..afee252a40 100644
--- a/compiler/GHC/Core/Opt/Simplify/Iteration.hs
+++ b/compiler/GHC/Core/Opt/Simplify/Iteration.hs
@@ -668,9 +668,9 @@ tryCastWorkerWrapper env _ _ _ bndr rhs  -- All other bindings
 
 mkCastWrapperInlinePrag :: InlinePragma -> InlinePragma
 -- See Note [Cast worker/wrapper]
-mkCastWrapperInlinePrag (InlinePragma { inl_act = act, inl_rule = rule_info })
+mkCastWrapperInlinePrag (InlinePragma { inl_inline = fn_inl, inl_act = fn_act, inl_rule = rule_info })
   = InlinePragma { inl_src    = SourceText "{-# INLINE"
-                 , inl_inline = NoUserInlinePrag -- See Note [Wrapper NoUserInlinePrag]
+                 , inl_inline = fn_inl       -- See Note [Worker/wrapper for INLINABLE functions]
                  , inl_sat    = Nothing      --     in GHC.Core.Opt.WorkWrap
                  , inl_act    = wrap_act     -- See Note [Wrapper activation]
                  , inl_rule   = rule_info }  --     in GHC.Core.Opt.WorkWrap
@@ -678,8 +678,8 @@ mkCastWrapperInlinePrag (InlinePragma { inl_act = act, inl_rule = rule_info })
   where
     -- See Note [Wrapper activation] in GHC.Core.Opt.WorkWrap
     -- But simpler, because we don't need to disable during InitialPhase
-    wrap_act | isNeverActive act = activateDuringFinal
-             | otherwise         = act
+    wrap_act | isNeverActive fn_act = activateDuringFinal
+             | otherwise            = fn_act
 
 
 {- *********************************************************************
@@ -4161,23 +4161,14 @@ simplStableUnfolding env bind_cxt id rhs_ty id_arity unf
                                         ; expr' <- simplExprC unf_env expr cont
                                         ; return (eta_expand expr') }
               ; case guide of
-                  UnfWhen { ug_arity = arity
-                          , ug_unsat_ok = sat_ok
-                          , ug_boring_ok = boring_ok
-                          }
-                          -- Happens for INLINE things
-                        -- Really important to force new_boring_ok as otherwise
-                        -- `ug_boring_ok` is a thunk chain of
-                        -- inlineBoringExprOk expr0
-                        --  || inlineBoringExprOk expr1 || ...
-                        --  See #20134
+                  UnfWhen { ug_boring_ok = boring_ok }
+                     -- Happens for INLINE things
+                     -- Really important to force new_boring_ok since otherwise
+                     --   `ug_boring_ok` is a thunk chain of
+                     --   inlineBoringExprOk expr0 || inlineBoringExprOk expr1 || ...
+                     -- See #20134
                      -> let !new_boring_ok = boring_ok || inlineBoringOk expr'
-                            guide' =
-                              UnfWhen { ug_arity = arity
-                                      , ug_unsat_ok = sat_ok
-                                      , ug_boring_ok = new_boring_ok
-
-                                      }
+                            guide' = guide { ug_boring_ok = new_boring_ok }
                         -- Refresh the boring-ok flag, in case expr'
                         -- has got small. This happens, notably in the inlinings
                         -- for dfuns for single-method classes; see
diff --git a/compiler/GHC/Core/Opt/Specialise.hs b/compiler/GHC/Core/Opt/Specialise.hs
index 4578f2681d..6708922495 100644
--- a/compiler/GHC/Core/Opt/Specialise.hs
+++ b/compiler/GHC/Core/Opt/Specialise.hs
@@ -647,6 +647,37 @@ See #10491
 *                                                                      *
 ********************************************************************* -}
 
+{- Note [Specialising imported functions]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+specImports specialises imported functions, based on calls in this module.
+
+When -fspecialise-aggressively is on, we specialise any imported
+function for which we have an unfolding.  The
+-fspecialise-aggressively flag is usually off, because we risk lots of
+orphan modules from over-vigorous specialisation.  (See Note [Orphans]
+in GHC.Core.) However it's not a big deal: anything non-recursive with
+an unfolding-template will probably have been inlined already.
+
+When -fspecialise-aggressively is off, we are more selective about
+specialisation (see canSpecImport):
+
+(1) Without -fspecialise-aggressively, do not specialise
+    DFunUnfoldings. Note [Do not specialise imported DFuns].
+
+(2) Without -fspecialise-aggressively, specialise only imported things
+    that have a /user-supplied/ INLINE or INLINABLE pragma (hence
+    isAnyInlinePragma rather than isStableSource).
+
+    In particular, we don't want to specialise workers created by
+    worker/wrapper (for functions with no pragma) because they won't
+    specialise usefully, and they generate quite a bit of useless code
+    bloat.
+
+    Specialise even INLINE things; it hasn't inlined yet, so perhaps
+    it never will.  Moreover it may have calls inside it that we want
+    to specialise
+-}
+
 specImports :: SpecEnv
             -> [CoreRule]
             -> UsageDetails
@@ -768,23 +799,17 @@ spec_import top_env callers rb dict_binds cis@(CIS fn _)
        -- See Note [Avoiding loops in specImports]
 
 canSpecImport :: DynFlags -> Id -> Maybe CoreExpr
--- See Note [Specialise imported INLINABLE things]
 canSpecImport dflags fn
-  | CoreUnfolding { uf_src = src, uf_tmpl = rhs } <- unf
-  , isStableSource src
-  = Just rhs   -- By default, specialise only imported things that have a stable
-               -- unfolding; that is, have an INLINE or INLINABLE pragma
-               -- Specialise even INLINE things; it hasn't inlined yet,
-               -- so perhaps it never will.  Moreover it may have calls
-               -- inside it that we want to specialise
-
-    -- CoreUnfolding case does /not/ include DFunUnfoldings;
-    -- We only specialise DFunUnfoldings with -fspecialise-aggressively
-    -- See Note [Do not specialise imported DFuns]
+  | CoreUnfolding { uf_tmpl = rhs } <- unf
+    -- See Note [Specialising imported functions] point (1).
+  , isAnyInlinePragma (idInlinePragma fn)
+    -- See Note [Specialising imported functions] point (2).
+  = Just rhs
 
   | gopt Opt_SpecialiseAggressively dflags
-  = maybeUnfoldingTemplate unf  -- With -fspecialise-aggressively, specialise anything
-                                -- with an unfolding, stable or not, DFun or not
+  = maybeUnfoldingTemplate unf
+    -- With -fspecialise-aggressively, specialise anything
+    -- with an unfolding, stable or not, DFun or not
 
   | otherwise = Nothing
   where
@@ -1021,20 +1046,6 @@ And if the call is to the same type, one specialisation is enough.
 Avoiding this recursive specialisation loop is one reason for the
 'callers' stack passed to specImports and specImport.
 
-Note [Specialise imported INLINABLE things]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-What imported functions do we specialise?  The basic set is
- * DFuns and things with INLINABLE pragmas.
-but with -fspecialise-aggressively we add
- * Anything with an unfolding template
-
-#8874 has a good example of why we want to auto-specialise DFuns.
-
-We have the -fspecialise-aggressively flag (usually off), because we
-risk lots of orphan modules from over-vigorous specialisation.
-However it's not a big deal: anything non-recursive with an
-unfolding-template will probably have been inlined already.
-
 Note [Glom the bindings if imported functions are specialised]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Suppose we have an imported, *recursive*, INLINABLE function
@@ -1604,19 +1615,16 @@ specCalls spec_imp env dict_binds existing_rules calls_for_me fn rhs
                 simpl_opts = initSimpleOpts dflags
 
                 --------------------------------------
-                -- Add a suitable unfolding if the spec_inl_prag says so
-                -- See Note [Inline specialisations]
-                (spec_inl_prag, spec_unf)
-                  | not is_local && isStrongLoopBreaker (idOccInfo fn)
-                  = (neverInlinePragma, noUnfolding)
-                        -- See Note [Specialising imported functions] in "GHC.Core.Opt.OccurAnal"
-
-                  | isInlinablePragma inl_prag
-                  = (inl_prag { inl_inline = NoUserInlinePrag }, noUnfolding)
-
+                -- Add a suitable unfolding; see Note [Inline specialisations]
+                spec_unf = specUnfolding simpl_opts spec_bndrs (`mkApps` spec_args)
+                                         rule_lhs_args fn_unf
+
+                spec_inl_prag
+                  | not is_local     -- See Note [Specialising imported functions]
+                  , isStrongLoopBreaker (idOccInfo fn) -- in GHC.Core.Opt.OccurAnal
+                  = neverInlinePragma
                   | otherwise
-                  = (inl_prag, specUnfolding simpl_opts spec_bndrs (`mkApps` spec_args)
-                                             rule_lhs_args fn_unf)
+                  = inl_prag
 
                 --------------------------------------
                 -- Adding arity information just propagates it a bit faster
@@ -2172,24 +2180,17 @@ So we suppress the WARN if the rhs is trivial.
 Note [Inline specialisations]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Here is what we do with the InlinePragma of the original function
-  * Activation/RuleMatchInfo: both transferred to the
-                              specialised function
-  * InlineSpec:
-       (a) An INLINE pragma is transferred
-       (b) An INLINABLE pragma is *not* transferred
-
-Why (a): transfer INLINE pragmas? The point of INLINE was precisely to
-specialise the function at its call site, and arguably that's not so
-important for the specialised copies.  BUT *pragma-directed*
-specialisation now takes place in the typechecker/desugarer, with
-manually specified INLINEs.  The specialisation here is automatic.
-It'd be very odd if a function marked INLINE was specialised (because
-of some local use), and then forever after (including importing
-modules) the specialised version wasn't INLINEd.  After all, the
-programmer said INLINE!
-
-You might wonder why we specialise INLINE functions at all.  After
-all they should be inlined, right?  Two reasons:
+
+  * Activation/RuleMatchInfo: both inherited from the original function
+
+  * InlineSpec: inherit from original function
+
+  * Unfolding: transfer a StableUnfolding iff it is UnfWhen
+               See GHC.Core.Unfold.Make.specUnfolding
+               and its Note [Specialising unfoldings]
+
+InlineSpec: you might wonder why we specialise INLINE functions at all.
+After all they should be inlined, right?  Two reasons:
 
  * Even INLINE functions are sometimes not inlined, when they aren't
    applied to interesting arguments.  But perhaps the type arguments
@@ -2215,26 +2216,6 @@ all they should be inlined, right?  Two reasons:
    (replicateM_ dMonadIO).  We certainly want to specialise $wreplicateM_!
    This particular example had a huge effect on the call to replicateM_
    in nofib/shootout/n-body.
-
-Why (b): discard INLINABLE pragmas? See #4874 for persuasive examples.
-Suppose we have
-    {-# INLINABLE f #-}
-    f :: Ord a => [a] -> Int
-    f xs = letrec f' = ...f'... in f'
-Then, when f is specialised and optimised we might get
-    wgo :: [Int] -> Int#
-    wgo = ...wgo...
-    f_spec :: [Int] -> Int
-    f_spec xs = case wgo xs of { r -> I# r }
-and we clearly want to inline f_spec at call sites.  But if we still
-have the big, un-optimised of f (albeit specialised) captured in an
-INLINABLE pragma for f_spec, we won't get that optimisation.
-
-So we simply drop INLINABLE pragmas when specialising. It's not really
-a complete solution; ignoring specialisation for now, INLINABLE functions
-don't get properly strictness analysed, for example. But it works well
-for examples involving specialisation, which is the dominant use of
-INLINABLE.  See #4874.
 -}
 
 {- *********************************************************************
diff --git a/compiler/GHC/Core/Opt/WorkWrap.hs b/compiler/GHC/Core/Opt/WorkWrap.hs
index 27d85d0545..fc1d9e2785 100644
--- a/compiler/GHC/Core/Opt/WorkWrap.hs
+++ b/compiler/GHC/Core/Opt/WorkWrap.hs
@@ -195,19 +195,47 @@ will not be specialised at call sites in other modules.
 
 This comes up in practice (#6056).
 
-Solution: do the w/w for strictness analysis, but transfer the Stable
-unfolding to the *worker*.  So we will get something like this:
+Solution:
 
-  {-# INLINE[2] f #-}
+* Do the w/w for strictness analysis, even for INLINABLE functions
+
+* Transfer the Stable unfolding to the *worker*.  How do we "transfer
+  the unfolding"? Easy: by using the old one, wrapped in work_fn! See
+  GHC.Core.Unfold.Make.mkWorkerUnfolding.
+
+* We use the /original, user-specified/ function's InlineSpec pragma
+  for both the wrapper and the worker (see `mkStrWrapperInlinePrag`).
+  So if f is INLINEABLE, both worker and wrapper will get an InlineSpec
+  of (Inlinable "blah").
+
+  It's important that both get this, because the specialiser uses
+  the existence of a /user-specified/ INLINE/INLINABLE pragma to
+  drive specialiation of imported functions.  See  GHC.Core.Opt.Specialise
+  Note [Specialising imported functions]
+
+* Remember, the subsequent inlining behaviour of the wrapper is expressed by
+  (a) the stable unfolding
+  (b) the unfolding guidance of UnfWhen
+  (c) the inl_act activation (see Note [Wrapper activation]
+
+For our {-# INLINEABLE f #-} example above, we will get something a
+bit like like this:
+
+  {-# Has stable unfolding, active in phase 2;
+      plus InlineSpec = INLINEABLE #-}
   f :: Ord a => [a] -> Int -> a
   f d x y = case y of I# y' -> fw d x y'
 
-  {-# INLINABLE[2] fw #-}
+  {-# Has stable unfolding, plus InlineSpec = INLINEABLE #-}
   fw :: Ord a => [a] -> Int# -> a
   fw d x y' = let y = I# y' in ...f...
 
-How do we "transfer the unfolding"? Easy: by using the old one, wrapped
-in work_fn! See GHC.Core.Unfold.Make.mkWorkerUnfolding.
+
+(Historical note: we used to always give the wrapper an INLINE pragma,
+but CSE will not happen if there is a user-specified pragma, but
+should happen for w/w’ed things (#14186).  But now we simply propagate
+any user-defined pragma info, so we'll defeat CSE (rightly) only when
+there is a user-supplied INLINE/INLINEABLE pragma.)
 
 Note [No worker/wrapper for record selectors]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -496,19 +524,6 @@ fine.  (To work reliably, `foo` would need an INLINABLE pragma,
 in which case we don't unpack dictionaries for the worker; see
 see Note [Do not unbox class dictionaries].)
 
-Note [Wrapper NoUserInlinePrag]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-We use NoUserInlinePrag on the wrapper, to say that there is no
-user-specified inline pragma. (The worker inherits that; see Note
-[Worker/wrapper for INLINABLE functions].)  The wrapper has no pragma
-given by the user.
-
-(Historical note: we used to give the wrapper an INLINE pragma, but
-CSE will not happen if there is a user-specified pragma, but should
-happen for w/w’ed things (#14186).  We don't need a pragma, because
-everything we needs is expressed by (a) the stable unfolding and (b)
-the inl_act activation.)
-
 Note [Drop absent bindings]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Consider (#19824):
@@ -872,17 +887,19 @@ mkWWBindPair ww_opts fn_id fn_info fn_args fn_body work_uniq div
     fn_rules        = ruleInfoRules (ruleInfo fn_info)
 
 mkStrWrapperInlinePrag :: InlinePragma -> [CoreRule] -> InlinePragma
--- See Note [Wrapper activation]
-mkStrWrapperInlinePrag (InlinePragma { inl_act = act, inl_rule = rule_info }) rules
+mkStrWrapperInlinePrag (InlinePragma { inl_inline = fn_inl
+                                     , inl_act    = fn_act
+                                     , inl_rule   = rule_info }) rules
   = InlinePragma { inl_src    = SourceText "{-# INLINE"
-                 , inl_inline = NoUserInlinePrag -- See Note [Wrapper NoUserInlinePrag]
                  , inl_sat    = Nothing
+                 , inl_inline = fn_inl -- See Note [Worker/wrapper for INLINABLE functions]
                  , inl_act    = activeAfter wrapper_phase
+                                -- See Note [Wrapper activation]
                  , inl_rule   = rule_info }  -- RuleMatchInfo is (and must be) unaffected
   where
     -- See Note [Wrapper activation]
     wrapper_phase = foldr (laterPhase . get_rule_phase) earliest_inline_phase rules
-    earliest_inline_phase = beginPhase act `laterPhase` nextPhase InitialPhase
+    earliest_inline_phase = beginPhase fn_act `laterPhase` nextPhase InitialPhase
           -- laterPhase (nextPhase InitialPhase) is a temporary hack
           -- to inline no earlier than phase 2.  I got regressions in
           -- 'mate', due to changes in full laziness due to Note [Case
diff --git a/compiler/GHC/Core/Unfold/Make.hs b/compiler/GHC/Core/Unfold/Make.hs
index ae20cb8a26..538af3db3d 100644
--- a/compiler/GHC/Core/Unfold/Make.hs
+++ b/compiler/GHC/Core/Unfold/Make.hs
@@ -193,25 +193,40 @@ specUnfolding to specialise its unfolding.  Some important points:
   must do so too!  Otherwise we lose the magic rules that make it
   interact with ClassOps
 
-* There is a bit of hack for INLINABLE functions:
-     f :: Ord a => ....
-     f = <big-rhs>
-     {- INLINABLE f #-}
-  Now if we specialise f, should the specialised version still have
-  an INLINABLE pragma?  If it does, we'll capture a specialised copy
-  of <big-rhs> as its unfolding, and that probably won't inline.  But
-  if we don't, the specialised version of <big-rhs> might be small
-  enough to inline at a call site. This happens with Control.Monad.liftM3,
-  and can cause a lot more allocation as a result (nofib n-body shows this).
-
-  Moreover, keeping the INLINABLE thing isn't much help, because
+* For a /stable/ CoreUnfolding, we specialise the unfolding, no matter
+  how big, iff it has UnfWhen guidance.  This happens for INLINE
+  functions, and for wrappers.  For these, it would be very odd if a
+  function marked INLINE was specialised (because of some local use),
+  and then forever after (including importing modules) the specialised
+  version wasn't INLINEd!  After all, the programmer said INLINE.
+
+* However, for a stable CoreUnfolding with guidance UnfoldIfGoodArgs,
+  which arises from INLINABLE functions, we drop the unfolding.
+  See #4874 for persuasive examples.  Suppose we have
+    {-# INLINABLE f #-}
+    f :: Ord a => [a] -> Int f xs = letrec f' = ...f'... in f'
+
+  Then, when f is specialised and optimised we might get
+    wgo :: [Int] -> Int#
+    wgo = ...wgo...
+    f_spec :: [Int] -> Int
+    f_spec xs = case wgo xs of { r -> I# r }
+
+  and we clearly want to inline f_spec at call sites.  But if we still
+  have the big, un-optimised of f (albeit specialised) captured in the
+  stable unfolding for f_spec, we won't get that optimisation.
+
+  This happens with Control.Monad.liftM3, and can cause a lot more
+  allocation as a result (nofib n-body shows this).
+
+  Moreover, keeping the stable unfoldign isn't much help, because
   the specialised function (probably) isn't overloaded any more.
 
-  Conclusion: drop the INLINEABLE pragma.  In practice what this means is:
-     if a stable unfolding has UnfoldingGuidance of UnfWhen,
-        we keep it (so the specialised thing too will always inline)
-     if a stable unfolding has UnfoldingGuidance of UnfIfGoodArgs
-        (which arises from INLINABLE), we discard it
+  TL;DR: we simply drop the stable unfolding when specialising. It's
+  not really a complete solution; ignoring specialisation for now,
+  INLINABLE functions don't get properly strictness analysed, for
+  example. But it works well for examples involving specialisation,
+  which is the dominant use of INLINABLE.
 
 Note [Honour INLINE on 0-ary bindings]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/testsuite/tests/arityanal/should_compile/Arity05.stderr b/testsuite/tests/arityanal/should_compile/Arity05.stderr
index 91c909ecc6..17a0fb668a 100644
--- a/testsuite/tests/arityanal/should_compile/Arity05.stderr
+++ b/testsuite/tests/arityanal/should_compile/Arity05.stderr
@@ -1,6 +1,6 @@
 
 ==================== Tidy Core ====================
-Result size of Tidy Core = {terms: 56, types: 87, coercions: 0, joins: 0/0}
+Result size of Tidy Core = {terms: 42, types: 44, coercions: 0, joins: 0/0}
 
 -- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
 F5.f5g1 :: Integer
@@ -9,26 +9,13 @@ F5.f5g1 = GHC.Num.Integer.IS 1#
 
 -- RHS size: {terms: 12, types: 9, coercions: 0, joins: 0/0}
 f5g :: forall {a} {t}. Num a => (t -> a) -> t -> a
-[GblId,
- Arity=3,
- Str=<SP(1C1(C1(L)),A,A,A,A,A,MC1(L))><MC1(L)><L>,
- Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=ALWAYS_IF(arity=3,unsat_ok=True,boring_ok=False)
-         Tmpl= \ (@a) (@t) ($dNum :: Num a) (h [Occ=Once1!] :: t -> a) (z [Occ=Once1] :: t) -> + @a $dNum (h z) (fromInteger @a $dNum F5.f5g1)}]
+[GblId, Arity=3, Str=<SP(1C1(C1(L)),A,A,A,A,A,MC1(L))><MC1(L)><L>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [60 60 0] 90 0}]
 f5g = \ (@a) (@t) ($dNum :: Num a) (h :: t -> a) (z :: t) -> + @a $dNum (h z) (fromInteger @a $dNum F5.f5g1)
 
--- RHS size: {terms: 15, types: 12, coercions: 0, joins: 0/0}
-F5.$wf5h [InlPrag=[2]] :: forall {a} {t}. (a -> a -> a) -> (Integer -> a) -> (t -> a) -> t -> (t -> a) -> a
-[GblId, Arity=5, Str=<SCS(C1(L))><MC1(L)><MC1(L)><L><MC1(L)>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [60 60 60 0 60] 120 0}]
-F5.$wf5h = \ (@a) (@t) (ww :: a -> a -> a) (ww1 :: Integer -> a) (f :: t -> a) (x :: t) (g :: t -> a) -> ww (f x) (ww (g x) (ww1 F5.f5g1))
-
--- RHS size: {terms: 15, types: 30, coercions: 0, joins: 0/0}
-f5h [InlPrag=[2]] :: forall {a} {t}. Num a => (t -> a) -> t -> (t -> a) -> a
-[GblId,
- Arity=4,
- Str=<1P(SCS(C1(L)),A,A,A,A,A,MC1(L))><MC1(L)><L><MC1(L)>,
- Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=ALWAYS_IF(arity=4,unsat_ok=True,boring_ok=False)
-         Tmpl= \ (@a) (@t) ($dNum [Occ=Once1!] :: Num a) (f [Occ=Once1] :: t -> a) (x [Occ=Once1] :: t) (g [Occ=Once1] :: t -> a) -> case $dNum of { GHC.Num.C:Num ww [Occ=Once1] _ [Occ=Dead] _ [Occ=Dead] _ [Occ=Dead] _ [Occ=Dead] _ [Occ=Dead] ww6 [Occ=Once1] -> F5.$wf5h @a @t ww ww6 f x g }}]
-f5h = \ (@a) (@t) ($dNum :: Num a) (f :: t -> a) (x :: t) (g :: t -> a) -> case $dNum of { GHC.Num.C:Num ww ww1 ww2 ww3 ww4 ww5 ww6 -> F5.$wf5h @a @t ww ww6 f x g }
+-- RHS size: {terms: 17, types: 12, coercions: 0, joins: 0/0}
+f5h :: forall {a} {t}. Num a => (t -> a) -> t -> (t -> a) -> a
+[GblId, Arity=4, Str=<SP(SCS(C1(L)),A,A,A,A,A,MC1(L))><MC1(L)><L><MC1(L)>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [90 60 0 60] 150 0}]
+f5h = \ (@a) (@t) ($dNum :: Num a) (f :: t -> a) (x :: t) (g :: t -> a) -> + @a $dNum (f x) (+ @a $dNum (g x) (fromInteger @a $dNum F5.f5g1))
 
 -- RHS size: {terms: 4, types: 1, coercions: 0, joins: 0/0}
 f5y :: Integer -> Integer
diff --git a/testsuite/tests/arityanal/should_compile/Arity11.stderr b/testsuite/tests/arityanal/should_compile/Arity11.stderr
index 6144198ac4..82b162e531 100644
--- a/testsuite/tests/arityanal/should_compile/Arity11.stderr
+++ b/testsuite/tests/arityanal/should_compile/Arity11.stderr
@@ -1,6 +1,6 @@
 
 ==================== Tidy Core ====================
-Result size of Tidy Core = {terms: 149, types: 104, coercions: 0, joins: 2/7}
+Result size of Tidy Core = {terms: 136, types: 75, coercions: 0, joins: 2/7}
 
 -- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
 F11.fib3 :: Integer
@@ -51,11 +51,11 @@ F11.fib1 :: Integer
 [GblId, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
 F11.fib1 = GHC.Num.Integer.IS 0#
 
--- RHS size: {terms: 52, types: 26, coercions: 0, joins: 0/5}
-F11.$wfib [InlPrag=[2]] :: forall {t} {a}. (t -> t -> Bool) -> (Num t, Num a) => t -> a
-[GblId, Arity=4, Str=<SCS(C1(L))><LP(A,LCL(C1(L)),A,A,A,A,L)><LP(LCS(C1(L)),A,A,A,A,A,MC1(L))><L>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [60 150 60 0] 460 0}]
-F11.$wfib
-  = \ (@t) (@a) (ww :: t -> t -> Bool) ($dNum :: Num t) ($dNum1 :: Num a) (eta :: t) ->
+-- RHS size: {terms: 54, types: 27, coercions: 0, joins: 0/5}
+fib :: forall {t} {a}. (Eq t, Num t, Num a) => t -> a
+[GblId, Arity=4, Str=<SP(SCS(C1(L)),A)><LP(A,LCL(C1(L)),A,A,A,A,L)><LP(LCS(C1(L)),A,A,A,A,A,MC1(L))><L>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [60 150 60 0] 480 0}]
+fib
+  = \ (@t) (@a) ($dEq :: Eq t) ($dNum :: Num t) ($dNum1 :: Num a) (eta :: t) ->
       let {
         lvl :: t
         [LclId]
@@ -77,9 +77,9 @@ F11.$wfib
         [LclId, Arity=1, Str=<L>, Unf=OtherCon []]
         fib4
           = \ (ds :: t) ->
-              case ww ds lvl3 of {
+              case == @t $dEq ds lvl3 of {
                 False ->
-                  case ww ds lvl of {
+                  case == @t $dEq ds lvl of {
                     False -> + @a $dNum1 (fib4 (- @t $dNum ds lvl)) (fib4 (- @t $dNum ds lvl1));
                     True -> lvl2
                   };
@@ -87,15 +87,6 @@ F11.$wfib
               }; } in
       fib4 eta
 
--- RHS size: {terms: 14, types: 19, coercions: 0, joins: 0/0}
-fib [InlPrag=[2]] :: forall {t} {a}. (Eq t, Num t, Num a) => t -> a
-[GblId,
- Arity=4,
- Str=<1P(SCS(C1(L)),A)><LP(A,LCL(C1(L)),A,A,A,A,L)><LP(LCS(C1(L)),A,A,A,A,A,LCS(L))><L>,
- Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=ALWAYS_IF(arity=4,unsat_ok=True,boring_ok=False)
-         Tmpl= \ (@t) (@a) ($dEq [Occ=Once1!] :: Eq t) ($dNum [Occ=Once1] :: Num t) ($dNum1 [Occ=Once1] :: Num a) (eta [Occ=Once1] :: t) -> case $dEq of { GHC.Classes.C:Eq ww [Occ=Once1] _ [Occ=Dead] -> F11.$wfib @t @a ww $dNum $dNum1 eta }}]
-fib = \ (@t) (@a) ($dEq :: Eq t) ($dNum :: Num t) ($dNum1 :: Num a) (eta :: t) -> case $dEq of { GHC.Classes.C:Eq ww ww1 -> F11.$wfib @t @a ww $dNum $dNum1 eta }
-
 -- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
 F11.f3 :: Integer
 [GblId, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
diff --git a/testsuite/tests/arityanal/should_compile/Arity14.stderr b/testsuite/tests/arityanal/should_compile/Arity14.stderr
index 966106bdbc..6fccde58a1 100644
--- a/testsuite/tests/arityanal/should_compile/Arity14.stderr
+++ b/testsuite/tests/arityanal/should_compile/Arity14.stderr
@@ -1,6 +1,6 @@
 
 ==================== Tidy Core ====================
-Result size of Tidy Core = {terms: 57, types: 81, coercions: 0, joins: 0/3}
+Result size of Tidy Core = {terms: 44, types: 38, coercions: 0, joins: 0/3}
 
 -- RHS size: {terms: 3, types: 2, coercions: 0, joins: 0/0}
 F14.f1 :: forall {t}. t -> t
@@ -12,11 +12,11 @@ F14.f2 :: Integer
 [GblId, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
 F14.f2 = GHC.Num.Integer.IS 1#
 
--- RHS size: {terms: 35, types: 23, coercions: 0, joins: 0/3}
-F14.$wf14 [InlPrag=[2]] :: forall {t}. (t -> t -> Bool) -> Num t => t -> t -> t -> t
-[GblId, Arity=4, Str=<SCS(C1(L))><LP(LCL(C1(L)),A,A,A,A,A,MC1(L))><L><L>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [60 90 0 0] 300 0}]
-F14.$wf14
-  = \ (@t) (ww :: t -> t -> Bool) ($dNum :: Num t) (eta :: t) (eta1 :: t) ->
+-- RHS size: {terms: 36, types: 23, coercions: 0, joins: 0/3}
+f14 :: forall {t}. (Ord t, Num t) => t -> t -> t -> t
+[GblId, Arity=4, Str=<SP(A,A,SCS(C1(L)),A,A,A,A,A)><LP(LCL(C1(L)),A,A,A,A,A,MC1(L))><L><L>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [30 90 0 0] 310 0}]
+f14
+  = \ (@t) ($dOrd :: Ord t) ($dNum :: Num t) (eta :: t) (eta1 :: t) ->
       let {
         lvl :: t
         [LclId]
@@ -26,7 +26,7 @@ F14.$wf14
         [LclId, Arity=2, Str=<L><L>, Unf=OtherCon []]
         f3
           = \ (n :: t) (x :: t) ->
-              case ww x n of {
+              case < @t $dOrd x n of {
                 False -> F14.f1 @t;
                 True ->
                   let {
@@ -37,14 +37,5 @@ F14.$wf14
               }; } in
       f3 eta eta1
 
--- RHS size: {terms: 13, types: 33, coercions: 0, joins: 0/0}
-f14 [InlPrag=[2]] :: forall {t}. (Ord t, Num t) => t -> t -> t -> t
-[GblId,
- Arity=4,
- Str=<1P(A,A,SCS(C1(L)),A,A,A,A,A)><LP(LCL(C1(L)),A,A,A,A,A,LCS(L))><L><L>,
- Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=ALWAYS_IF(arity=4,unsat_ok=True,boring_ok=False)
-         Tmpl= \ (@t) ($dOrd [Occ=Once1!] :: Ord t) ($dNum [Occ=Once1] :: Num t) (eta [Occ=Once1] :: t) (eta1 [Occ=Once1] :: t) -> case $dOrd of { GHC.Classes.C:Ord _ [Occ=Dead] _ [Occ=Dead] ww2 [Occ=Once1] _ [Occ=Dead] _ [Occ=Dead] _ [Occ=Dead] _ [Occ=Dead] _ [Occ=Dead] -> F14.$wf14 @t ww2 $dNum eta eta1 }}]
-f14 = \ (@t) ($dOrd :: Ord t) ($dNum :: Num t) (eta :: t) (eta1 :: t) -> case $dOrd of { GHC.Classes.C:Ord ww ww1 ww2 ww3 ww4 ww5 ww6 ww7 -> F14.$wf14 @t ww2 $dNum eta eta1 }
-
 
 
diff --git a/testsuite/tests/indexed-types/should_compile/T7837.stderr b/testsuite/tests/indexed-types/should_compile/T7837.stderr
index dec839f72b..e2fcedcd2b 100644
--- a/testsuite/tests/indexed-types/should_compile/T7837.stderr
+++ b/testsuite/tests/indexed-types/should_compile/T7837.stderr
@@ -1,7 +1,3 @@
 Rule fired: Class op signum (BUILTIN)
 Rule fired: Class op abs (BUILTIN)
 Rule fired: normalize/Double (T7837)
-Rule fired: Class op eq_sel (BUILTIN)
-Rule fired: Class op $p1Norm (BUILTIN)
-Rule fired: Class op / (BUILTIN)
-Rule fired: Class op norm (BUILTIN)
diff --git a/testsuite/tests/perf/compiler/LargeRecord.hs b/testsuite/tests/perf/compiler/LargeRecord.hs
index 11a35593ab..c31ef0ed08 100644
--- a/testsuite/tests/perf/compiler/LargeRecord.hs
+++ b/testsuite/tests/perf/compiler/LargeRecord.hs
@@ -9,6 +9,49 @@
 
 {-# OPTIONS_GHC -freduction-depth=0 #-}
 
+{- Notes on LargeRecord
+~~~~~~~~~~~~~~~~~~~~~~~
+I noticed that in GHC of July 2022, when compiling this
+module I got lots of "SPEC" rules
+
+      SuperRecord.$fRecCopy:ltsrts_$crecCopyInto @"f2"
+                                                 @'["f1" := Int, "f2" := Int, "f3" := Int,
+                                                    "f4" := Int]
+                                                 @Int
+                                                 @'["f2" := Int, "f3" := Int, "f4" := Int]
+                                                 @'["f3" := Int, "f4" := Int]
+                                                 $d(%,,%)_X1 $d(%,,%)1_X2 $dRecCopy_X3
+      SuperRecord.$fRecCopy:ltsrts_$crecCopyInto @"f3"
+                                                 @'["f1" := Int, "f2" := Int, "f3" := Int,
+                                                    "f4" := Int]
+                                                 @Int
+                                                 @'["f2" := Int, "f3" := Int, "f4" := Int]
+                                                 @'["f4" := Int]
+                                                 $d(%,,%)_X1 $d(%,,%)1_s6yK $dRecCopy_X2
+
+      SuperRecord.$fRecCopy:ltsrts_$crecCopyInto @"f3"
+                                                 @'["f2" := Int, "f3" := Int, "f4" := Int]
+                                                 @Int
+                                                 @'["f3" := Int, "f4" := Int]
+                                                 @'["f4" := Int]
+                                                 $d(%,,%)_s6yr $d(%,,%)1_X1 $dRecCopy_X2
+      SuperRecord.$fRecCopy:ltsrts_$crecCopyInto @"f4"
+                                                 @(SortInsert'
+                                                     (GHC.TypeLits.Internal.CmpSymbol "f3" "f4")
+                                                     ("f3" := Int)
+                                                     ("f4" := Int)
+                                                     '[])
+                                                 @Int
+                                                 @'["f4" := Int]
+                                                 @'[]
+                                                 $d(%,,%)_X1 $d(%,,%)1_X2 $dRecCopy_s6yb
+
+(This was with BigFieldList having only four elements.)
+
+The relevant function SuperRecord.$fRecCopy:ltsrts_$crecCopyInto is
+only a wrapper that we were specialising -- little or no benefit.  We
+don't want to specialise wrappers!  -}
+
 module DCo_Record where
 
 import SuperRecord
diff --git a/testsuite/tests/perf/compiler/T18223.hs b/testsuite/tests/perf/compiler/T18223.hs
index 3e160cc957..cb7374bc78 100644
--- a/testsuite/tests/perf/compiler/T18223.hs
+++ b/testsuite/tests/perf/compiler/T18223.hs
@@ -1,6 +1,20 @@
 {-# LANGUAGE PartialTypeSignatures #-}
 {-# LANGUAGE Strict #-}
 
+{- Notes on T18223
+~~~~~~~~~~~~~~~~~~
+If we inline
+      modify' :: MonadState s m => (s -> s) -> m ()
+early, before the specialiser, the casts all collapse immediately.
+It turns out that fixing #21286 causes this to happen, because
+we no longer w/w modify'.
+
+If we don't inline it before the specialiser we generate
+a specialised version of it.  Then it gets inlined and all
+the casts collapse, but we end up keeping the code for the
+specialised version right through the pipeline.
+-}
+
 import Control.Monad.State
 
 tester :: MonadState a m => m ()
diff --git a/testsuite/tests/simplCore/should_compile/T21261.stderr b/testsuite/tests/simplCore/should_compile/T21261.stderr
index 2af8b37a5d..fadd73c219 100644
--- a/testsuite/tests/simplCore/should_compile/T21261.stderr
+++ b/testsuite/tests/simplCore/should_compile/T21261.stderr
@@ -1,7 +1,7 @@
 
 ==================== Tidy Core ====================
 Result size of Tidy Core
-  = {terms: 182, types: 191, coercions: 0, joins: 0/0}
+  = {terms: 139, types: 130, coercions: 0, joins: 0/0}
 
 lvl = I# 3#
 
@@ -35,8 +35,7 @@ $wf7 = \ c -> (# case c lvl2 of { __DEFAULT -> c lvl lvl1 } #)
 
 f7 = \ c -> case $wf7 c of { (# ww #) -> Just ww }
 
-no_tricky_lazy
-  = \ c -> case $wf7 (\ x y -> c x y) of { (# ww #) -> Just ww }
+no_tricky_lazy = \ c -> f7 (\ x y -> c x y)
 
 $wf5
   = \ c ->
diff --git a/testsuite/tests/simplCore/should_compile/T21286.hs b/testsuite/tests/simplCore/should_compile/T21286.hs
new file mode 100644
index 0000000000..79b9db76d9
--- /dev/null
+++ b/testsuite/tests/simplCore/should_compile/T21286.hs
@@ -0,0 +1,6 @@
+{-# OPTIONS_GHC -fspecialise-aggressively #-}
+
+module T21286 where
+import T21286a
+
+x = f (3::Int)
diff --git a/testsuite/tests/simplCore/should_compile/T21286.stderr b/testsuite/tests/simplCore/should_compile/T21286.stderr
new file mode 100644
index 0000000000..f2901c5b5e
--- /dev/null
+++ b/testsuite/tests/simplCore/should_compile/T21286.stderr
@@ -0,0 +1,16 @@
+[1 of 2] Compiling T21286a          ( T21286a.hs, T21286a.o )
+[2 of 2] Compiling T21286           ( T21286.hs, T21286.o )
+Rule fired: Class op + (BUILTIN)
+Rule fired: Class op fromInteger (BUILTIN)
+Rule fired: Int# -> Integer -> Int# (GHC.Num.Integer)
+Rule fired: Class op fromInteger (BUILTIN)
+Rule fired: Int# -> Integer -> Int# (GHC.Num.Integer)
+Rule fired: Int# -> Integer -> Int# (GHC.Num.Integer)
+Rule fired: Int# -> Integer -> Int# (GHC.Num.Integer)
+Rule fired: Int# -> Integer -> Int# (GHC.Num.Integer)
+Rule fired: Int# -> Integer -> Int# (GHC.Num.Integer)
+Rule fired: SPEC/T21286 g @Int (T21286)
+Rule fired: SPEC/T21286 g @Int (T21286)
+Rule fired: ==# (BUILTIN)
+Rule fired: tagToEnum# (BUILTIN)
+Rule fired: tagToEnum# (BUILTIN)
diff --git a/testsuite/tests/simplCore/should_compile/T21286a.hs b/testsuite/tests/simplCore/should_compile/T21286a.hs
new file mode 100644
index 0000000000..58f6db9b94
--- /dev/null
+++ b/testsuite/tests/simplCore/should_compile/T21286a.hs
@@ -0,0 +1,10 @@
+{-# OPTIONS_GHC -fexpose-all-unfoldings #-}
+
+module T21286a( f ) where
+
+f :: (Eq a, Num a) => a -> a
+f x = g 20 x + 1
+
+g :: (Eq a, Num a) => a -> a -> a
+g n x | n + 1 == 0 = 0
+      | otherwise  = x + g (n-1) x
diff --git a/testsuite/tests/simplCore/should_compile/all.T b/testsuite/tests/simplCore/should_compile/all.T
index 5091485681..2da9a99ca1 100644
--- a/testsuite/tests/simplCore/should_compile/all.T
+++ b/testsuite/tests/simplCore/should_compile/all.T
@@ -428,3 +428,5 @@ test('T21763', only_ways(['optasm']), compile, ['-O2 -ddump-rules'])
 test('T21763a', only_ways(['optasm']), compile, ['-O2 -ddump-rules'])
 test('T22028', normal, compile, ['-O -ddump-rule-firings'])
 test('T22114', normal, compile, ['-O'])
+test('T21286',  normal, multimod_compile, ['T21286', '-O -ddump-rule-firings'])
+
diff --git a/testsuite/tests/stranal/sigs/T5075.stderr b/testsuite/tests/stranal/sigs/T5075.stderr
index 5b3701ef56..4cebcf85ae 100644
--- a/testsuite/tests/stranal/sigs/T5075.stderr
+++ b/testsuite/tests/stranal/sigs/T5075.stderr
@@ -1,6 +1,6 @@
 
 ==================== Strictness signatures ====================
-T5075.f: <S!P(A,A,SCS(C1(L)),A,A,A,A,A)><LP(A,A,LCS(C1(L)),A,A,A,L)><L>
+T5075.f: <SP(A,A,SCS(C1(L)),A,A,A,A,A)><LP(A,A,LCS(C1(L)),A,A,A,L)><L>
 T5075.g: <1L><S!P(L)>
 T5075.h: <S!P(L)>
 
@@ -14,7 +14,7 @@ T5075.h:
 
 
 ==================== Strictness signatures ====================
-T5075.f: <1P(A,A,SCS(C1(L)),A,A,A,A,A)><LP(A,A,LCS(C1(L)),A,A,A,L)><L>
+T5075.f: <SP(A,A,SCS(C1(L)),A,A,A,A,A)><LP(A,A,LCS(C1(L)),A,A,A,L)><L>
 T5075.g: <1L><S!P(L)>
 T5075.h: <1!P(L)>
author	Simon Peyton Jones <simonpj@microsoft.com>	2022-07-27 14:59:55 +0100
committer	Marge Bot <ben+marge-bot@smart-cactus.org>	2022-09-28 17:49:09 -0400
commit	2a53ac1877bbd29de432c0aca442904e9da96c4e (patch)
tree	e5b0ef5342d20003905933ca67cd59737a143d64
parent	b74b6191d7c442dffdfc9a9e2a6d476d7b3a28f2 (diff)
download	haskell-2a53ac1877bbd29de432c0aca442904e9da96c4e.tar.gz