summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorSimon Peyton Jones <simonpj@microsoft.com>2022-07-27 14:59:55 +0100
committerMarge Bot <ben+marge-bot@smart-cactus.org>2022-09-28 17:49:09 -0400
commit2a53ac1877bbd29de432c0aca442904e9da96c4e (patch)
treee5b0ef5342d20003905933ca67cd59737a143d64
parentb74b6191d7c442dffdfc9a9e2a6d476d7b3a28f2 (diff)
downloadhaskell-2a53ac1877bbd29de432c0aca442904e9da96c4e.tar.gz
Improve aggressive specialisation
This patch fixes #21286, by not unboxing dictionaries in worker/wrapper (ever). The main payload is tiny: * In `GHC.Core.Opt.DmdAnal.finaliseArgBoxities`, do not unbox dictionaries in `get_dmd`. See Note [Do not unbox class dictionaries] in that module * I also found that imported wrappers were being fruitlessly specialised, so I fixed that too, in canSpecImport. See Note [Specialising imported functions] point (2). In doing due diligence in the testsuite I fixed a number of other things: * Improve Note [Specialising unfoldings] in GHC.Core.Unfold.Make, and Note [Inline specialisations] in GHC.Core.Opt.Specialise, and remove duplication between the two. The new Note describes how we specialise functions with an INLINABLE pragma. And simplify the defn of `spec_unf` in `GHC.Core.Opt.Specialise.specCalls`. * Improve Note [Worker/wrapper for INLINABLE functions] in GHC.Core.Opt.WorkWrap. And (critially) make an actual change which is to propagate the user-written pragma from the original function to the wrapper; see `mkStrWrapperInlinePrag`. * Write new Note [Specialising imported functions] in GHC.Core.Opt.Specialise All this has a big effect on some compile times. This is compiler/perf, showing only changes over 1%: Metrics: compile_time/bytes allocated ------------------------------------- LargeRecord(normal) -50.2% GOOD ManyConstructors(normal) +1.0% MultiLayerModulesTH_OneShot(normal) +2.6% PmSeriesG(normal) -1.1% T10547(normal) -1.2% T11195(normal) -1.2% T11276(normal) -1.0% T11303b(normal) -1.6% T11545(normal) -1.4% T11822(normal) -1.3% T12150(optasm) -1.0% T12234(optasm) -1.2% T13056(optasm) -9.3% GOOD T13253(normal) -3.8% GOOD T15164(normal) -3.6% GOOD T16190(normal) -2.1% T16577(normal) -2.8% GOOD T16875(normal) -1.6% T17836(normal) +2.2% T17977b(normal) -1.0% T18223(normal) -33.3% GOOD T18282(normal) -3.4% GOOD T18304(normal) -1.4% T18698a(normal) -1.4% GOOD T18698b(normal) -1.3% GOOD T19695(normal) -2.5% GOOD T5837(normal) -2.3% T9630(normal) -33.0% GOOD WWRec(normal) -9.7% GOOD hard_hole_fits(normal) -2.1% GOOD hie002(normal) +1.6% geo. mean -2.2% minimum -50.2% maximum +2.6% I diligently investigated some of the big drops. * Caused by not doing w/w for dictionaries: T13056, T15164, WWRec, T18223 * Caused by not fruitlessly specialising wrappers LargeRecord, T9630 For runtimes, here is perf/should+_run: Metrics: runtime/bytes allocated -------------------------------- T12990(normal) -3.8% T5205(normal) -1.3% T9203(normal) -10.7% GOOD haddock.Cabal(normal) +0.1% haddock.base(normal) -1.1% haddock.compiler(normal) -0.3% lazy-bs-alloc(normal) -0.2% ------------------------------------------ geo. mean -0.3% minimum -10.7% maximum +0.1% I did not investigate exactly what happens in T9203. Nofib is a wash: +-------------------------------++--+-----------+-----------+ | || | tsv (rel) | std. err. | +===============================++==+===========+===========+ | real/anna || | -0.13% | 0.0% | | real/fem || | +0.13% | 0.0% | | real/fulsom || | -0.16% | 0.0% | | real/lift || | -1.55% | 0.0% | | real/reptile || | -0.11% | 0.0% | | real/smallpt || | +0.51% | 0.0% | | spectral/constraints || | +0.20% | 0.0% | | spectral/dom-lt || | +1.80% | 0.0% | | spectral/expert || | +0.33% | 0.0% | +===============================++==+===========+===========+ | geom mean || | | | +-------------------------------++--+-----------+-----------+ I spent quite some time investigating dom-lt, but it's pretty complicated. See my note on !7847. Conclusion: it's just a delicate inlining interaction, and we have plenty of those. Metric Decrease: LargeRecord T13056 T13253 T15164 T16577 T18223 T18282 T18698a T18698b T19695 T9630 WWRec hard_hole_fits T9203
-rw-r--r--compiler/GHC/Core/Opt/DmdAnal.hs54
-rw-r--r--compiler/GHC/Core/Opt/Simplify/Iteration.hs31
-rw-r--r--compiler/GHC/Core/Opt/Specialise.hs137
-rw-r--r--compiler/GHC/Core/Opt/WorkWrap.hs63
-rw-r--r--compiler/GHC/Core/Unfold/Make.hs49
-rw-r--r--testsuite/tests/arityanal/should_compile/Arity05.stderr25
-rw-r--r--testsuite/tests/arityanal/should_compile/Arity11.stderr25
-rw-r--r--testsuite/tests/arityanal/should_compile/Arity14.stderr23
-rw-r--r--testsuite/tests/indexed-types/should_compile/T7837.stderr4
-rw-r--r--testsuite/tests/perf/compiler/LargeRecord.hs43
-rw-r--r--testsuite/tests/perf/compiler/T18223.hs14
-rw-r--r--testsuite/tests/simplCore/should_compile/T21261.stderr5
-rw-r--r--testsuite/tests/simplCore/should_compile/T21286.hs6
-rw-r--r--testsuite/tests/simplCore/should_compile/T21286.stderr16
-rw-r--r--testsuite/tests/simplCore/should_compile/T21286a.hs10
-rw-r--r--testsuite/tests/simplCore/should_compile/all.T2
-rw-r--r--testsuite/tests/stranal/sigs/T5075.stderr4
17 files changed, 285 insertions, 226 deletions
diff --git a/compiler/GHC/Core/Opt/DmdAnal.hs b/compiler/GHC/Core/Opt/DmdAnal.hs
index fe98075487..86775592bb 100644
--- a/compiler/GHC/Core/Opt/DmdAnal.hs
+++ b/compiler/GHC/Core/Opt/DmdAnal.hs
@@ -1636,30 +1636,32 @@ the case on `x` up through the case on `burble`.
Note [Do not unbox class dictionaries]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-If we have
- f :: Ord a => [a] -> Int -> a
- {-# INLINABLE f #-}
-and we worker/wrapper f, we'll get a worker with an INLINABLE pragma
-(see Note [Worker/wrapper for INLINABLE functions] in GHC.Core.Opt.WorkWrap),
-which can still be specialised by the type-class specialiser, something like
- fw :: Ord a => [a] -> Int# -> a
-
-BUT if f is strict in the Ord dictionary, we might unpack it, to get
- fw :: (a->a->Bool) -> [a] -> Int# -> a
-and the type-class specialiser can't specialise that. An example is #6056.
-
-But in any other situation, a dictionary is just an ordinary value,
-and can be unpacked. So we track the INLINABLE pragma, and discard the boxity
-flag in finaliseArgBoxities (see the isClassPred test).
-
-Historical note: #14955 describes how I got this fix wrong the first time.
-
-Note that the simplicity of this fix implies that INLINE functions (such as
-wrapper functions after the WW run) will never say that they unbox class
-dictionaries. That's not ideal, but not worth losing sleep over, as INLINE
-functions will have been inlined by the time we run demand analysis so we'll
-see the unboxing around the worker in client modules. I got aware of the issue
-in T5075 by the change in boxity of loop between demand analysis runs.
+We never unbox class dictionaries in worker/wrapper.
+
+1. INLINABLE functions
+ If we have
+ f :: Ord a => [a] -> Int -> a
+ {-# INLINABLE f #-}
+ and we worker/wrapper f, we'll get a worker with an INLINABLE pragma
+ (see Note [Worker/wrapper for INLINABLE functions] in GHC.Core.Opt.WorkWrap),
+ which can still be specialised by the type-class specialiser, something like
+ fw :: Ord a => [a] -> Int# -> a
+
+ BUT if f is strict in the Ord dictionary, we might unpack it, to get
+ fw :: (a->a->Bool) -> [a] -> Int# -> a
+ and the type-class specialiser can't specialise that. An example is #6056.
+
+ Historical note: #14955 describes how I got this fix wrong the first time.
+ I got aware of the issue in T5075 by the change in boxity of loop between
+ demand analysis runs.
+
+2. -fspecialise-aggressively. As #21286 shows, the same phenomenon can occur
+ occur without INLINABLE, when we use -fexpose-all-unfoldings and
+ -fspecialise-aggressively to do vigorous cross-module specialisation.
+
+TL;DR we /never/ unbox class dictionaries. Unboxing the dictionary, and passing
+a raft of higher-order functions isn't a huge win anyway -- you really want to
+specialise the function.
Note [Worker argument budget]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1725,7 +1727,6 @@ finaliseArgBoxities env fn arity rhs div
-- uses the info on the binders directly.
where
opts = ae_opts env
- is_inlinable_fn = isStableUnfolding (realIdUnfolding fn)
(bndrs, _body) = collectBinders rhs
max_wkr_args = dmd_max_worker_args opts `max` arity
-- See Note [Worker argument budget]
@@ -1743,8 +1744,7 @@ finaliseArgBoxities env fn arity rhs div
get_dmd :: Id -> Type -> Demand
get_dmd bndr bndr_ty
- | isClassPred bndr_ty
- , is_inlinable_fn = trimBoxity dmd
+ | isClassPred bndr_ty = trimBoxity dmd
-- See Note [Do not unbox class dictionaries]
-- NB: 'ty' has not been normalised, so this will (rightly)
-- catch newtype dictionaries too.
diff --git a/compiler/GHC/Core/Opt/Simplify/Iteration.hs b/compiler/GHC/Core/Opt/Simplify/Iteration.hs
index cfee2e5e56..afee252a40 100644
--- a/compiler/GHC/Core/Opt/Simplify/Iteration.hs
+++ b/compiler/GHC/Core/Opt/Simplify/Iteration.hs
@@ -668,9 +668,9 @@ tryCastWorkerWrapper env _ _ _ bndr rhs -- All other bindings
mkCastWrapperInlinePrag :: InlinePragma -> InlinePragma
-- See Note [Cast worker/wrapper]
-mkCastWrapperInlinePrag (InlinePragma { inl_act = act, inl_rule = rule_info })
+mkCastWrapperInlinePrag (InlinePragma { inl_inline = fn_inl, inl_act = fn_act, inl_rule = rule_info })
= InlinePragma { inl_src = SourceText "{-# INLINE"
- , inl_inline = NoUserInlinePrag -- See Note [Wrapper NoUserInlinePrag]
+ , inl_inline = fn_inl -- See Note [Worker/wrapper for INLINABLE functions]
, inl_sat = Nothing -- in GHC.Core.Opt.WorkWrap
, inl_act = wrap_act -- See Note [Wrapper activation]
, inl_rule = rule_info } -- in GHC.Core.Opt.WorkWrap
@@ -678,8 +678,8 @@ mkCastWrapperInlinePrag (InlinePragma { inl_act = act, inl_rule = rule_info })
where
-- See Note [Wrapper activation] in GHC.Core.Opt.WorkWrap
-- But simpler, because we don't need to disable during InitialPhase
- wrap_act | isNeverActive act = activateDuringFinal
- | otherwise = act
+ wrap_act | isNeverActive fn_act = activateDuringFinal
+ | otherwise = fn_act
{- *********************************************************************
@@ -4161,23 +4161,14 @@ simplStableUnfolding env bind_cxt id rhs_ty id_arity unf
; expr' <- simplExprC unf_env expr cont
; return (eta_expand expr') }
; case guide of
- UnfWhen { ug_arity = arity
- , ug_unsat_ok = sat_ok
- , ug_boring_ok = boring_ok
- }
- -- Happens for INLINE things
- -- Really important to force new_boring_ok as otherwise
- -- `ug_boring_ok` is a thunk chain of
- -- inlineBoringExprOk expr0
- -- || inlineBoringExprOk expr1 || ...
- -- See #20134
+ UnfWhen { ug_boring_ok = boring_ok }
+ -- Happens for INLINE things
+ -- Really important to force new_boring_ok since otherwise
+ -- `ug_boring_ok` is a thunk chain of
+ -- inlineBoringExprOk expr0 || inlineBoringExprOk expr1 || ...
+ -- See #20134
-> let !new_boring_ok = boring_ok || inlineBoringOk expr'
- guide' =
- UnfWhen { ug_arity = arity
- , ug_unsat_ok = sat_ok
- , ug_boring_ok = new_boring_ok
-
- }
+ guide' = guide { ug_boring_ok = new_boring_ok }
-- Refresh the boring-ok flag, in case expr'
-- has got small. This happens, notably in the inlinings
-- for dfuns for single-method classes; see
diff --git a/compiler/GHC/Core/Opt/Specialise.hs b/compiler/GHC/Core/Opt/Specialise.hs
index 4578f2681d..6708922495 100644
--- a/compiler/GHC/Core/Opt/Specialise.hs
+++ b/compiler/GHC/Core/Opt/Specialise.hs
@@ -647,6 +647,37 @@ See #10491
* *
********************************************************************* -}
+{- Note [Specialising imported functions]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+specImports specialises imported functions, based on calls in this module.
+
+When -fspecialise-aggressively is on, we specialise any imported
+function for which we have an unfolding. The
+-fspecialise-aggressively flag is usually off, because we risk lots of
+orphan modules from over-vigorous specialisation. (See Note [Orphans]
+in GHC.Core.) However it's not a big deal: anything non-recursive with
+an unfolding-template will probably have been inlined already.
+
+When -fspecialise-aggressively is off, we are more selective about
+specialisation (see canSpecImport):
+
+(1) Without -fspecialise-aggressively, do not specialise
+ DFunUnfoldings. Note [Do not specialise imported DFuns].
+
+(2) Without -fspecialise-aggressively, specialise only imported things
+ that have a /user-supplied/ INLINE or INLINABLE pragma (hence
+ isAnyInlinePragma rather than isStableSource).
+
+ In particular, we don't want to specialise workers created by
+ worker/wrapper (for functions with no pragma) because they won't
+ specialise usefully, and they generate quite a bit of useless code
+ bloat.
+
+ Specialise even INLINE things; it hasn't inlined yet, so perhaps
+ it never will. Moreover it may have calls inside it that we want
+ to specialise
+-}
+
specImports :: SpecEnv
-> [CoreRule]
-> UsageDetails
@@ -768,23 +799,17 @@ spec_import top_env callers rb dict_binds cis@(CIS fn _)
-- See Note [Avoiding loops in specImports]
canSpecImport :: DynFlags -> Id -> Maybe CoreExpr
--- See Note [Specialise imported INLINABLE things]
canSpecImport dflags fn
- | CoreUnfolding { uf_src = src, uf_tmpl = rhs } <- unf
- , isStableSource src
- = Just rhs -- By default, specialise only imported things that have a stable
- -- unfolding; that is, have an INLINE or INLINABLE pragma
- -- Specialise even INLINE things; it hasn't inlined yet,
- -- so perhaps it never will. Moreover it may have calls
- -- inside it that we want to specialise
-
- -- CoreUnfolding case does /not/ include DFunUnfoldings;
- -- We only specialise DFunUnfoldings with -fspecialise-aggressively
- -- See Note [Do not specialise imported DFuns]
+ | CoreUnfolding { uf_tmpl = rhs } <- unf
+ -- See Note [Specialising imported functions] point (1).
+ , isAnyInlinePragma (idInlinePragma fn)
+ -- See Note [Specialising imported functions] point (2).
+ = Just rhs
| gopt Opt_SpecialiseAggressively dflags
- = maybeUnfoldingTemplate unf -- With -fspecialise-aggressively, specialise anything
- -- with an unfolding, stable or not, DFun or not
+ = maybeUnfoldingTemplate unf
+ -- With -fspecialise-aggressively, specialise anything
+ -- with an unfolding, stable or not, DFun or not
| otherwise = Nothing
where
@@ -1021,20 +1046,6 @@ And if the call is to the same type, one specialisation is enough.
Avoiding this recursive specialisation loop is one reason for the
'callers' stack passed to specImports and specImport.
-Note [Specialise imported INLINABLE things]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-What imported functions do we specialise? The basic set is
- * DFuns and things with INLINABLE pragmas.
-but with -fspecialise-aggressively we add
- * Anything with an unfolding template
-
-#8874 has a good example of why we want to auto-specialise DFuns.
-
-We have the -fspecialise-aggressively flag (usually off), because we
-risk lots of orphan modules from over-vigorous specialisation.
-However it's not a big deal: anything non-recursive with an
-unfolding-template will probably have been inlined already.
-
Note [Glom the bindings if imported functions are specialised]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Suppose we have an imported, *recursive*, INLINABLE function
@@ -1604,19 +1615,16 @@ specCalls spec_imp env dict_binds existing_rules calls_for_me fn rhs
simpl_opts = initSimpleOpts dflags
--------------------------------------
- -- Add a suitable unfolding if the spec_inl_prag says so
- -- See Note [Inline specialisations]
- (spec_inl_prag, spec_unf)
- | not is_local && isStrongLoopBreaker (idOccInfo fn)
- = (neverInlinePragma, noUnfolding)
- -- See Note [Specialising imported functions] in "GHC.Core.Opt.OccurAnal"
-
- | isInlinablePragma inl_prag
- = (inl_prag { inl_inline = NoUserInlinePrag }, noUnfolding)
-
+ -- Add a suitable unfolding; see Note [Inline specialisations]
+ spec_unf = specUnfolding simpl_opts spec_bndrs (`mkApps` spec_args)
+ rule_lhs_args fn_unf
+
+ spec_inl_prag
+ | not is_local -- See Note [Specialising imported functions]
+ , isStrongLoopBreaker (idOccInfo fn) -- in GHC.Core.Opt.OccurAnal
+ = neverInlinePragma
| otherwise
- = (inl_prag, specUnfolding simpl_opts spec_bndrs (`mkApps` spec_args)
- rule_lhs_args fn_unf)
+ = inl_prag
--------------------------------------
-- Adding arity information just propagates it a bit faster
@@ -2172,24 +2180,17 @@ So we suppress the WARN if the rhs is trivial.
Note [Inline specialisations]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Here is what we do with the InlinePragma of the original function
- * Activation/RuleMatchInfo: both transferred to the
- specialised function
- * InlineSpec:
- (a) An INLINE pragma is transferred
- (b) An INLINABLE pragma is *not* transferred
-
-Why (a): transfer INLINE pragmas? The point of INLINE was precisely to
-specialise the function at its call site, and arguably that's not so
-important for the specialised copies. BUT *pragma-directed*
-specialisation now takes place in the typechecker/desugarer, with
-manually specified INLINEs. The specialisation here is automatic.
-It'd be very odd if a function marked INLINE was specialised (because
-of some local use), and then forever after (including importing
-modules) the specialised version wasn't INLINEd. After all, the
-programmer said INLINE!
-
-You might wonder why we specialise INLINE functions at all. After
-all they should be inlined, right? Two reasons:
+
+ * Activation/RuleMatchInfo: both inherited from the original function
+
+ * InlineSpec: inherit from original function
+
+ * Unfolding: transfer a StableUnfolding iff it is UnfWhen
+ See GHC.Core.Unfold.Make.specUnfolding
+ and its Note [Specialising unfoldings]
+
+InlineSpec: you might wonder why we specialise INLINE functions at all.
+After all they should be inlined, right? Two reasons:
* Even INLINE functions are sometimes not inlined, when they aren't
applied to interesting arguments. But perhaps the type arguments
@@ -2215,26 +2216,6 @@ all they should be inlined, right? Two reasons:
(replicateM_ dMonadIO). We certainly want to specialise $wreplicateM_!
This particular example had a huge effect on the call to replicateM_
in nofib/shootout/n-body.
-
-Why (b): discard INLINABLE pragmas? See #4874 for persuasive examples.
-Suppose we have
- {-# INLINABLE f #-}
- f :: Ord a => [a] -> Int
- f xs = letrec f' = ...f'... in f'
-Then, when f is specialised and optimised we might get
- wgo :: [Int] -> Int#
- wgo = ...wgo...
- f_spec :: [Int] -> Int
- f_spec xs = case wgo xs of { r -> I# r }
-and we clearly want to inline f_spec at call sites. But if we still
-have the big, un-optimised of f (albeit specialised) captured in an
-INLINABLE pragma for f_spec, we won't get that optimisation.
-
-So we simply drop INLINABLE pragmas when specialising. It's not really
-a complete solution; ignoring specialisation for now, INLINABLE functions
-don't get properly strictness analysed, for example. But it works well
-for examples involving specialisation, which is the dominant use of
-INLINABLE. See #4874.
-}
{- *********************************************************************
diff --git a/compiler/GHC/Core/Opt/WorkWrap.hs b/compiler/GHC/Core/Opt/WorkWrap.hs
index 27d85d0545..fc1d9e2785 100644
--- a/compiler/GHC/Core/Opt/WorkWrap.hs
+++ b/compiler/GHC/Core/Opt/WorkWrap.hs
@@ -195,19 +195,47 @@ will not be specialised at call sites in other modules.
This comes up in practice (#6056).
-Solution: do the w/w for strictness analysis, but transfer the Stable
-unfolding to the *worker*. So we will get something like this:
+Solution:
- {-# INLINE[2] f #-}
+* Do the w/w for strictness analysis, even for INLINABLE functions
+
+* Transfer the Stable unfolding to the *worker*. How do we "transfer
+ the unfolding"? Easy: by using the old one, wrapped in work_fn! See
+ GHC.Core.Unfold.Make.mkWorkerUnfolding.
+
+* We use the /original, user-specified/ function's InlineSpec pragma
+ for both the wrapper and the worker (see `mkStrWrapperInlinePrag`).
+ So if f is INLINEABLE, both worker and wrapper will get an InlineSpec
+ of (Inlinable "blah").
+
+ It's important that both get this, because the specialiser uses
+ the existence of a /user-specified/ INLINE/INLINABLE pragma to
+ drive specialiation of imported functions. See GHC.Core.Opt.Specialise
+ Note [Specialising imported functions]
+
+* Remember, the subsequent inlining behaviour of the wrapper is expressed by
+ (a) the stable unfolding
+ (b) the unfolding guidance of UnfWhen
+ (c) the inl_act activation (see Note [Wrapper activation]
+
+For our {-# INLINEABLE f #-} example above, we will get something a
+bit like like this:
+
+ {-# Has stable unfolding, active in phase 2;
+ plus InlineSpec = INLINEABLE #-}
f :: Ord a => [a] -> Int -> a
f d x y = case y of I# y' -> fw d x y'
- {-# INLINABLE[2] fw #-}
+ {-# Has stable unfolding, plus InlineSpec = INLINEABLE #-}
fw :: Ord a => [a] -> Int# -> a
fw d x y' = let y = I# y' in ...f...
-How do we "transfer the unfolding"? Easy: by using the old one, wrapped
-in work_fn! See GHC.Core.Unfold.Make.mkWorkerUnfolding.
+
+(Historical note: we used to always give the wrapper an INLINE pragma,
+but CSE will not happen if there is a user-specified pragma, but
+should happen for w/w’ed things (#14186). But now we simply propagate
+any user-defined pragma info, so we'll defeat CSE (rightly) only when
+there is a user-supplied INLINE/INLINEABLE pragma.)
Note [No worker/wrapper for record selectors]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -496,19 +524,6 @@ fine. (To work reliably, `foo` would need an INLINABLE pragma,
in which case we don't unpack dictionaries for the worker; see
see Note [Do not unbox class dictionaries].)
-Note [Wrapper NoUserInlinePrag]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-We use NoUserInlinePrag on the wrapper, to say that there is no
-user-specified inline pragma. (The worker inherits that; see Note
-[Worker/wrapper for INLINABLE functions].) The wrapper has no pragma
-given by the user.
-
-(Historical note: we used to give the wrapper an INLINE pragma, but
-CSE will not happen if there is a user-specified pragma, but should
-happen for w/w’ed things (#14186). We don't need a pragma, because
-everything we needs is expressed by (a) the stable unfolding and (b)
-the inl_act activation.)
-
Note [Drop absent bindings]
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Consider (#19824):
@@ -872,17 +887,19 @@ mkWWBindPair ww_opts fn_id fn_info fn_args fn_body work_uniq div
fn_rules = ruleInfoRules (ruleInfo fn_info)
mkStrWrapperInlinePrag :: InlinePragma -> [CoreRule] -> InlinePragma
--- See Note [Wrapper activation]
-mkStrWrapperInlinePrag (InlinePragma { inl_act = act, inl_rule = rule_info }) rules
+mkStrWrapperInlinePrag (InlinePragma { inl_inline = fn_inl
+ , inl_act = fn_act
+ , inl_rule = rule_info }) rules
= InlinePragma { inl_src = SourceText "{-# INLINE"
- , inl_inline = NoUserInlinePrag -- See Note [Wrapper NoUserInlinePrag]
, inl_sat = Nothing
+ , inl_inline = fn_inl -- See Note [Worker/wrapper for INLINABLE functions]
, inl_act = activeAfter wrapper_phase
+ -- See Note [Wrapper activation]
, inl_rule = rule_info } -- RuleMatchInfo is (and must be) unaffected
where
-- See Note [Wrapper activation]
wrapper_phase = foldr (laterPhase . get_rule_phase) earliest_inline_phase rules
- earliest_inline_phase = beginPhase act `laterPhase` nextPhase InitialPhase
+ earliest_inline_phase = beginPhase fn_act `laterPhase` nextPhase InitialPhase
-- laterPhase (nextPhase InitialPhase) is a temporary hack
-- to inline no earlier than phase 2. I got regressions in
-- 'mate', due to changes in full laziness due to Note [Case
diff --git a/compiler/GHC/Core/Unfold/Make.hs b/compiler/GHC/Core/Unfold/Make.hs
index ae20cb8a26..538af3db3d 100644
--- a/compiler/GHC/Core/Unfold/Make.hs
+++ b/compiler/GHC/Core/Unfold/Make.hs
@@ -193,25 +193,40 @@ specUnfolding to specialise its unfolding. Some important points:
must do so too! Otherwise we lose the magic rules that make it
interact with ClassOps
-* There is a bit of hack for INLINABLE functions:
- f :: Ord a => ....
- f = <big-rhs>
- {- INLINABLE f #-}
- Now if we specialise f, should the specialised version still have
- an INLINABLE pragma? If it does, we'll capture a specialised copy
- of <big-rhs> as its unfolding, and that probably won't inline. But
- if we don't, the specialised version of <big-rhs> might be small
- enough to inline at a call site. This happens with Control.Monad.liftM3,
- and can cause a lot more allocation as a result (nofib n-body shows this).
-
- Moreover, keeping the INLINABLE thing isn't much help, because
+* For a /stable/ CoreUnfolding, we specialise the unfolding, no matter
+ how big, iff it has UnfWhen guidance. This happens for INLINE
+ functions, and for wrappers. For these, it would be very odd if a
+ function marked INLINE was specialised (because of some local use),
+ and then forever after (including importing modules) the specialised
+ version wasn't INLINEd! After all, the programmer said INLINE.
+
+* However, for a stable CoreUnfolding with guidance UnfoldIfGoodArgs,
+ which arises from INLINABLE functions, we drop the unfolding.
+ See #4874 for persuasive examples. Suppose we have
+ {-# INLINABLE f #-}
+ f :: Ord a => [a] -> Int f xs = letrec f' = ...f'... in f'
+
+ Then, when f is specialised and optimised we might get
+ wgo :: [Int] -> Int#
+ wgo = ...wgo...
+ f_spec :: [Int] -> Int
+ f_spec xs = case wgo xs of { r -> I# r }
+
+ and we clearly want to inline f_spec at call sites. But if we still
+ have the big, un-optimised of f (albeit specialised) captured in the
+ stable unfolding for f_spec, we won't get that optimisation.
+
+ This happens with Control.Monad.liftM3, and can cause a lot more
+ allocation as a result (nofib n-body shows this).
+
+ Moreover, keeping the stable unfoldign isn't much help, because
the specialised function (probably) isn't overloaded any more.
- Conclusion: drop the INLINEABLE pragma. In practice what this means is:
- if a stable unfolding has UnfoldingGuidance of UnfWhen,
- we keep it (so the specialised thing too will always inline)
- if a stable unfolding has UnfoldingGuidance of UnfIfGoodArgs
- (which arises from INLINABLE), we discard it
+ TL;DR: we simply drop the stable unfolding when specialising. It's
+ not really a complete solution; ignoring specialisation for now,
+ INLINABLE functions don't get properly strictness analysed, for
+ example. But it works well for examples involving specialisation,
+ which is the dominant use of INLINABLE.
Note [Honour INLINE on 0-ary bindings]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/testsuite/tests/arityanal/should_compile/Arity05.stderr b/testsuite/tests/arityanal/should_compile/Arity05.stderr
index 91c909ecc6..17a0fb668a 100644
--- a/testsuite/tests/arityanal/should_compile/Arity05.stderr
+++ b/testsuite/tests/arityanal/should_compile/Arity05.stderr
@@ -1,6 +1,6 @@
==================== Tidy Core ====================
-Result size of Tidy Core = {terms: 56, types: 87, coercions: 0, joins: 0/0}
+Result size of Tidy Core = {terms: 42, types: 44, coercions: 0, joins: 0/0}
-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
F5.f5g1 :: Integer
@@ -9,26 +9,13 @@ F5.f5g1 = GHC.Num.Integer.IS 1#
-- RHS size: {terms: 12, types: 9, coercions: 0, joins: 0/0}
f5g :: forall {a} {t}. Num a => (t -> a) -> t -> a
-[GblId,
- Arity=3,
- Str=<SP(1C1(C1(L)),A,A,A,A,A,MC1(L))><MC1(L)><L>,
- Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=ALWAYS_IF(arity=3,unsat_ok=True,boring_ok=False)
- Tmpl= \ (@a) (@t) ($dNum :: Num a) (h [Occ=Once1!] :: t -> a) (z [Occ=Once1] :: t) -> + @a $dNum (h z) (fromInteger @a $dNum F5.f5g1)}]
+[GblId, Arity=3, Str=<SP(1C1(C1(L)),A,A,A,A,A,MC1(L))><MC1(L)><L>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [60 60 0] 90 0}]
f5g = \ (@a) (@t) ($dNum :: Num a) (h :: t -> a) (z :: t) -> + @a $dNum (h z) (fromInteger @a $dNum F5.f5g1)
--- RHS size: {terms: 15, types: 12, coercions: 0, joins: 0/0}
-F5.$wf5h [InlPrag=[2]] :: forall {a} {t}. (a -> a -> a) -> (Integer -> a) -> (t -> a) -> t -> (t -> a) -> a
-[GblId, Arity=5, Str=<SCS(C1(L))><MC1(L)><MC1(L)><L><MC1(L)>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [60 60 60 0 60] 120 0}]
-F5.$wf5h = \ (@a) (@t) (ww :: a -> a -> a) (ww1 :: Integer -> a) (f :: t -> a) (x :: t) (g :: t -> a) -> ww (f x) (ww (g x) (ww1 F5.f5g1))
-
--- RHS size: {terms: 15, types: 30, coercions: 0, joins: 0/0}
-f5h [InlPrag=[2]] :: forall {a} {t}. Num a => (t -> a) -> t -> (t -> a) -> a
-[GblId,
- Arity=4,
- Str=<1P(SCS(C1(L)),A,A,A,A,A,MC1(L))><MC1(L)><L><MC1(L)>,
- Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=ALWAYS_IF(arity=4,unsat_ok=True,boring_ok=False)
- Tmpl= \ (@a) (@t) ($dNum [Occ=Once1!] :: Num a) (f [Occ=Once1] :: t -> a) (x [Occ=Once1] :: t) (g [Occ=Once1] :: t -> a) -> case $dNum of { GHC.Num.C:Num ww [Occ=Once1] _ [Occ=Dead] _ [Occ=Dead] _ [Occ=Dead] _ [Occ=Dead] _ [Occ=Dead] ww6 [Occ=Once1] -> F5.$wf5h @a @t ww ww6 f x g }}]
-f5h = \ (@a) (@t) ($dNum :: Num a) (f :: t -> a) (x :: t) (g :: t -> a) -> case $dNum of { GHC.Num.C:Num ww ww1 ww2 ww3 ww4 ww5 ww6 -> F5.$wf5h @a @t ww ww6 f x g }
+-- RHS size: {terms: 17, types: 12, coercions: 0, joins: 0/0}
+f5h :: forall {a} {t}. Num a => (t -> a) -> t -> (t -> a) -> a
+[GblId, Arity=4, Str=<SP(SCS(C1(L)),A,A,A,A,A,MC1(L))><MC1(L)><L><MC1(L)>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [90 60 0 60] 150 0}]
+f5h = \ (@a) (@t) ($dNum :: Num a) (f :: t -> a) (x :: t) (g :: t -> a) -> + @a $dNum (f x) (+ @a $dNum (g x) (fromInteger @a $dNum F5.f5g1))
-- RHS size: {terms: 4, types: 1, coercions: 0, joins: 0/0}
f5y :: Integer -> Integer
diff --git a/testsuite/tests/arityanal/should_compile/Arity11.stderr b/testsuite/tests/arityanal/should_compile/Arity11.stderr
index 6144198ac4..82b162e531 100644
--- a/testsuite/tests/arityanal/should_compile/Arity11.stderr
+++ b/testsuite/tests/arityanal/should_compile/Arity11.stderr
@@ -1,6 +1,6 @@
==================== Tidy Core ====================
-Result size of Tidy Core = {terms: 149, types: 104, coercions: 0, joins: 2/7}
+Result size of Tidy Core = {terms: 136, types: 75, coercions: 0, joins: 2/7}
-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
F11.fib3 :: Integer
@@ -51,11 +51,11 @@ F11.fib1 :: Integer
[GblId, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
F11.fib1 = GHC.Num.Integer.IS 0#
--- RHS size: {terms: 52, types: 26, coercions: 0, joins: 0/5}
-F11.$wfib [InlPrag=[2]] :: forall {t} {a}. (t -> t -> Bool) -> (Num t, Num a) => t -> a
-[GblId, Arity=4, Str=<SCS(C1(L))><LP(A,LCL(C1(L)),A,A,A,A,L)><LP(LCS(C1(L)),A,A,A,A,A,MC1(L))><L>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [60 150 60 0] 460 0}]
-F11.$wfib
- = \ (@t) (@a) (ww :: t -> t -> Bool) ($dNum :: Num t) ($dNum1 :: Num a) (eta :: t) ->
+-- RHS size: {terms: 54, types: 27, coercions: 0, joins: 0/5}
+fib :: forall {t} {a}. (Eq t, Num t, Num a) => t -> a
+[GblId, Arity=4, Str=<SP(SCS(C1(L)),A)><LP(A,LCL(C1(L)),A,A,A,A,L)><LP(LCS(C1(L)),A,A,A,A,A,MC1(L))><L>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [60 150 60 0] 480 0}]
+fib
+ = \ (@t) (@a) ($dEq :: Eq t) ($dNum :: Num t) ($dNum1 :: Num a) (eta :: t) ->
let {
lvl :: t
[LclId]
@@ -77,9 +77,9 @@ F11.$wfib
[LclId, Arity=1, Str=<L>, Unf=OtherCon []]
fib4
= \ (ds :: t) ->
- case ww ds lvl3 of {
+ case == @t $dEq ds lvl3 of {
False ->
- case ww ds lvl of {
+ case == @t $dEq ds lvl of {
False -> + @a $dNum1 (fib4 (- @t $dNum ds lvl)) (fib4 (- @t $dNum ds lvl1));
True -> lvl2
};
@@ -87,15 +87,6 @@ F11.$wfib
}; } in
fib4 eta
--- RHS size: {terms: 14, types: 19, coercions: 0, joins: 0/0}
-fib [InlPrag=[2]] :: forall {t} {a}. (Eq t, Num t, Num a) => t -> a
-[GblId,
- Arity=4,
- Str=<1P(SCS(C1(L)),A)><LP(A,LCL(C1(L)),A,A,A,A,L)><LP(LCS(C1(L)),A,A,A,A,A,LCS(L))><L>,
- Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=ALWAYS_IF(arity=4,unsat_ok=True,boring_ok=False)
- Tmpl= \ (@t) (@a) ($dEq [Occ=Once1!] :: Eq t) ($dNum [Occ=Once1] :: Num t) ($dNum1 [Occ=Once1] :: Num a) (eta [Occ=Once1] :: t) -> case $dEq of { GHC.Classes.C:Eq ww [Occ=Once1] _ [Occ=Dead] -> F11.$wfib @t @a ww $dNum $dNum1 eta }}]
-fib = \ (@t) (@a) ($dEq :: Eq t) ($dNum :: Num t) ($dNum1 :: Num a) (eta :: t) -> case $dEq of { GHC.Classes.C:Eq ww ww1 -> F11.$wfib @t @a ww $dNum $dNum1 eta }
-
-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
F11.f3 :: Integer
[GblId, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
diff --git a/testsuite/tests/arityanal/should_compile/Arity14.stderr b/testsuite/tests/arityanal/should_compile/Arity14.stderr
index 966106bdbc..6fccde58a1 100644
--- a/testsuite/tests/arityanal/should_compile/Arity14.stderr
+++ b/testsuite/tests/arityanal/should_compile/Arity14.stderr
@@ -1,6 +1,6 @@
==================== Tidy Core ====================
-Result size of Tidy Core = {terms: 57, types: 81, coercions: 0, joins: 0/3}
+Result size of Tidy Core = {terms: 44, types: 38, coercions: 0, joins: 0/3}
-- RHS size: {terms: 3, types: 2, coercions: 0, joins: 0/0}
F14.f1 :: forall {t}. t -> t
@@ -12,11 +12,11 @@ F14.f2 :: Integer
[GblId, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
F14.f2 = GHC.Num.Integer.IS 1#
--- RHS size: {terms: 35, types: 23, coercions: 0, joins: 0/3}
-F14.$wf14 [InlPrag=[2]] :: forall {t}. (t -> t -> Bool) -> Num t => t -> t -> t -> t
-[GblId, Arity=4, Str=<SCS(C1(L))><LP(LCL(C1(L)),A,A,A,A,A,MC1(L))><L><L>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [60 90 0 0] 300 0}]
-F14.$wf14
- = \ (@t) (ww :: t -> t -> Bool) ($dNum :: Num t) (eta :: t) (eta1 :: t) ->
+-- RHS size: {terms: 36, types: 23, coercions: 0, joins: 0/3}
+f14 :: forall {t}. (Ord t, Num t) => t -> t -> t -> t
+[GblId, Arity=4, Str=<SP(A,A,SCS(C1(L)),A,A,A,A,A)><LP(LCL(C1(L)),A,A,A,A,A,MC1(L))><L><L>, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [30 90 0 0] 310 0}]
+f14
+ = \ (@t) ($dOrd :: Ord t) ($dNum :: Num t) (eta :: t) (eta1 :: t) ->
let {
lvl :: t
[LclId]
@@ -26,7 +26,7 @@ F14.$wf14
[LclId, Arity=2, Str=<L><L>, Unf=OtherCon []]
f3
= \ (n :: t) (x :: t) ->
- case ww x n of {
+ case < @t $dOrd x n of {
False -> F14.f1 @t;
True ->
let {
@@ -37,14 +37,5 @@ F14.$wf14
}; } in
f3 eta eta1
--- RHS size: {terms: 13, types: 33, coercions: 0, joins: 0/0}
-f14 [InlPrag=[2]] :: forall {t}. (Ord t, Num t) => t -> t -> t -> t
-[GblId,
- Arity=4,
- Str=<1P(A,A,SCS(C1(L)),A,A,A,A,A)><LP(LCL(C1(L)),A,A,A,A,A,LCS(L))><L><L>,
- Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=ALWAYS_IF(arity=4,unsat_ok=True,boring_ok=False)
- Tmpl= \ (@t) ($dOrd [Occ=Once1!] :: Ord t) ($dNum [Occ=Once1] :: Num t) (eta [Occ=Once1] :: t) (eta1 [Occ=Once1] :: t) -> case $dOrd of { GHC.Classes.C:Ord _ [Occ=Dead] _ [Occ=Dead] ww2 [Occ=Once1] _ [Occ=Dead] _ [Occ=Dead] _ [Occ=Dead] _ [Occ=Dead] _ [Occ=Dead] -> F14.$wf14 @t ww2 $dNum eta eta1 }}]
-f14 = \ (@t) ($dOrd :: Ord t) ($dNum :: Num t) (eta :: t) (eta1 :: t) -> case $dOrd of { GHC.Classes.C:Ord ww ww1 ww2 ww3 ww4 ww5 ww6 ww7 -> F14.$wf14 @t ww2 $dNum eta eta1 }
-
diff --git a/testsuite/tests/indexed-types/should_compile/T7837.stderr b/testsuite/tests/indexed-types/should_compile/T7837.stderr
index dec839f72b..e2fcedcd2b 100644
--- a/testsuite/tests/indexed-types/should_compile/T7837.stderr
+++ b/testsuite/tests/indexed-types/should_compile/T7837.stderr
@@ -1,7 +1,3 @@
Rule fired: Class op signum (BUILTIN)
Rule fired: Class op abs (BUILTIN)
Rule fired: normalize/Double (T7837)
-Rule fired: Class op eq_sel (BUILTIN)
-Rule fired: Class op $p1Norm (BUILTIN)
-Rule fired: Class op / (BUILTIN)
-Rule fired: Class op norm (BUILTIN)
diff --git a/testsuite/tests/perf/compiler/LargeRecord.hs b/testsuite/tests/perf/compiler/LargeRecord.hs
index 11a35593ab..c31ef0ed08 100644
--- a/testsuite/tests/perf/compiler/LargeRecord.hs
+++ b/testsuite/tests/perf/compiler/LargeRecord.hs
@@ -9,6 +9,49 @@
{-# OPTIONS_GHC -freduction-depth=0 #-}
+{- Notes on LargeRecord
+~~~~~~~~~~~~~~~~~~~~~~~
+I noticed that in GHC of July 2022, when compiling this
+module I got lots of "SPEC" rules
+
+ SuperRecord.$fRecCopy:ltsrts_$crecCopyInto @"f2"
+ @'["f1" := Int, "f2" := Int, "f3" := Int,
+ "f4" := Int]
+ @Int
+ @'["f2" := Int, "f3" := Int, "f4" := Int]
+ @'["f3" := Int, "f4" := Int]
+ $d(%,,%)_X1 $d(%,,%)1_X2 $dRecCopy_X3
+ SuperRecord.$fRecCopy:ltsrts_$crecCopyInto @"f3"
+ @'["f1" := Int, "f2" := Int, "f3" := Int,
+ "f4" := Int]
+ @Int
+ @'["f2" := Int, "f3" := Int, "f4" := Int]
+ @'["f4" := Int]
+ $d(%,,%)_X1 $d(%,,%)1_s6yK $dRecCopy_X2
+
+ SuperRecord.$fRecCopy:ltsrts_$crecCopyInto @"f3"
+ @'["f2" := Int, "f3" := Int, "f4" := Int]
+ @Int
+ @'["f3" := Int, "f4" := Int]
+ @'["f4" := Int]
+ $d(%,,%)_s6yr $d(%,,%)1_X1 $dRecCopy_X2
+ SuperRecord.$fRecCopy:ltsrts_$crecCopyInto @"f4"
+ @(SortInsert'
+ (GHC.TypeLits.Internal.CmpSymbol "f3" "f4")
+ ("f3" := Int)
+ ("f4" := Int)
+ '[])
+ @Int
+ @'["f4" := Int]
+ @'[]
+ $d(%,,%)_X1 $d(%,,%)1_X2 $dRecCopy_s6yb
+
+(This was with BigFieldList having only four elements.)
+
+The relevant function SuperRecord.$fRecCopy:ltsrts_$crecCopyInto is
+only a wrapper that we were specialising -- little or no benefit. We
+don't want to specialise wrappers! -}
+
module DCo_Record where
import SuperRecord
diff --git a/testsuite/tests/perf/compiler/T18223.hs b/testsuite/tests/perf/compiler/T18223.hs
index 3e160cc957..cb7374bc78 100644
--- a/testsuite/tests/perf/compiler/T18223.hs
+++ b/testsuite/tests/perf/compiler/T18223.hs
@@ -1,6 +1,20 @@
{-# LANGUAGE PartialTypeSignatures #-}
{-# LANGUAGE Strict #-}
+{- Notes on T18223
+~~~~~~~~~~~~~~~~~~
+If we inline
+ modify' :: MonadState s m => (s -> s) -> m ()
+early, before the specialiser, the casts all collapse immediately.
+It turns out that fixing #21286 causes this to happen, because
+we no longer w/w modify'.
+
+If we don't inline it before the specialiser we generate
+a specialised version of it. Then it gets inlined and all
+the casts collapse, but we end up keeping the code for the
+specialised version right through the pipeline.
+-}
+
import Control.Monad.State
tester :: MonadState a m => m ()
diff --git a/testsuite/tests/simplCore/should_compile/T21261.stderr b/testsuite/tests/simplCore/should_compile/T21261.stderr
index 2af8b37a5d..fadd73c219 100644
--- a/testsuite/tests/simplCore/should_compile/T21261.stderr
+++ b/testsuite/tests/simplCore/should_compile/T21261.stderr
@@ -1,7 +1,7 @@
==================== Tidy Core ====================
Result size of Tidy Core
- = {terms: 182, types: 191, coercions: 0, joins: 0/0}
+ = {terms: 139, types: 130, coercions: 0, joins: 0/0}
lvl = I# 3#
@@ -35,8 +35,7 @@ $wf7 = \ c -> (# case c lvl2 of { __DEFAULT -> c lvl lvl1 } #)
f7 = \ c -> case $wf7 c of { (# ww #) -> Just ww }
-no_tricky_lazy
- = \ c -> case $wf7 (\ x y -> c x y) of { (# ww #) -> Just ww }
+no_tricky_lazy = \ c -> f7 (\ x y -> c x y)
$wf5
= \ c ->
diff --git a/testsuite/tests/simplCore/should_compile/T21286.hs b/testsuite/tests/simplCore/should_compile/T21286.hs
new file mode 100644
index 0000000000..79b9db76d9
--- /dev/null
+++ b/testsuite/tests/simplCore/should_compile/T21286.hs
@@ -0,0 +1,6 @@
+{-# OPTIONS_GHC -fspecialise-aggressively #-}
+
+module T21286 where
+import T21286a
+
+x = f (3::Int)
diff --git a/testsuite/tests/simplCore/should_compile/T21286.stderr b/testsuite/tests/simplCore/should_compile/T21286.stderr
new file mode 100644
index 0000000000..f2901c5b5e
--- /dev/null
+++ b/testsuite/tests/simplCore/should_compile/T21286.stderr
@@ -0,0 +1,16 @@
+[1 of 2] Compiling T21286a ( T21286a.hs, T21286a.o )
+[2 of 2] Compiling T21286 ( T21286.hs, T21286.o )
+Rule fired: Class op + (BUILTIN)
+Rule fired: Class op fromInteger (BUILTIN)
+Rule fired: Int# -> Integer -> Int# (GHC.Num.Integer)
+Rule fired: Class op fromInteger (BUILTIN)
+Rule fired: Int# -> Integer -> Int# (GHC.Num.Integer)
+Rule fired: Int# -> Integer -> Int# (GHC.Num.Integer)
+Rule fired: Int# -> Integer -> Int# (GHC.Num.Integer)
+Rule fired: Int# -> Integer -> Int# (GHC.Num.Integer)
+Rule fired: Int# -> Integer -> Int# (GHC.Num.Integer)
+Rule fired: SPEC/T21286 g @Int (T21286)
+Rule fired: SPEC/T21286 g @Int (T21286)
+Rule fired: ==# (BUILTIN)
+Rule fired: tagToEnum# (BUILTIN)
+Rule fired: tagToEnum# (BUILTIN)
diff --git a/testsuite/tests/simplCore/should_compile/T21286a.hs b/testsuite/tests/simplCore/should_compile/T21286a.hs
new file mode 100644
index 0000000000..58f6db9b94
--- /dev/null
+++ b/testsuite/tests/simplCore/should_compile/T21286a.hs
@@ -0,0 +1,10 @@
+{-# OPTIONS_GHC -fexpose-all-unfoldings #-}
+
+module T21286a( f ) where
+
+f :: (Eq a, Num a) => a -> a
+f x = g 20 x + 1
+
+g :: (Eq a, Num a) => a -> a -> a
+g n x | n + 1 == 0 = 0
+ | otherwise = x + g (n-1) x
diff --git a/testsuite/tests/simplCore/should_compile/all.T b/testsuite/tests/simplCore/should_compile/all.T
index 5091485681..2da9a99ca1 100644
--- a/testsuite/tests/simplCore/should_compile/all.T
+++ b/testsuite/tests/simplCore/should_compile/all.T
@@ -428,3 +428,5 @@ test('T21763', only_ways(['optasm']), compile, ['-O2 -ddump-rules'])
test('T21763a', only_ways(['optasm']), compile, ['-O2 -ddump-rules'])
test('T22028', normal, compile, ['-O -ddump-rule-firings'])
test('T22114', normal, compile, ['-O'])
+test('T21286', normal, multimod_compile, ['T21286', '-O -ddump-rule-firings'])
+
diff --git a/testsuite/tests/stranal/sigs/T5075.stderr b/testsuite/tests/stranal/sigs/T5075.stderr
index 5b3701ef56..4cebcf85ae 100644
--- a/testsuite/tests/stranal/sigs/T5075.stderr
+++ b/testsuite/tests/stranal/sigs/T5075.stderr
@@ -1,6 +1,6 @@
==================== Strictness signatures ====================
-T5075.f: <S!P(A,A,SCS(C1(L)),A,A,A,A,A)><LP(A,A,LCS(C1(L)),A,A,A,L)><L>
+T5075.f: <SP(A,A,SCS(C1(L)),A,A,A,A,A)><LP(A,A,LCS(C1(L)),A,A,A,L)><L>
T5075.g: <1L><S!P(L)>
T5075.h: <S!P(L)>
@@ -14,7 +14,7 @@ T5075.h:
==================== Strictness signatures ====================
-T5075.f: <1P(A,A,SCS(C1(L)),A,A,A,A,A)><LP(A,A,LCS(C1(L)),A,A,A,L)><L>
+T5075.f: <SP(A,A,SCS(C1(L)),A,A,A,A,A)><LP(A,A,LCS(C1(L)),A,A,A,L)><L>
T5075.g: <1L><S!P(L)>
T5075.h: <1!P(L)>