delta/haskell.git - gitlab.haskell.org: ghc/ghc.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	Boxity: Don't update Boxity unless worker/wrapper follows (#21754)wip/T21754	Sebastian Graf	2022-09-30	3	-0/+18
\| \| \| \| \| \| \| \|	A small refactoring in our Core Opt pipeline and some new functions for transfering argument boxities from one signature to another to facilitate `Note [Don't change boxity without worker/wrapper]`. Fixes #21754.
*	Demand: Format Call SubDemands `Cn(sd)` as `C(n,sd)` (#22231)wip/T22231	Sebastian Graf	2022-09-29	17	-52/+52
\| \| \| \| \| \| \| \|	Justification in #22231. Short form: In a demand like `1C1(C1(L))` it was too easy to confuse which `1` belongs to which `C`. Now that should be more obvious. Fixes #22231
*	Improve aggressive specialisation	Simon Peyton Jones	2022-09-28	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes #21286, by not unboxing dictionaries in worker/wrapper (ever). The main payload is tiny: * In `GHC.Core.Opt.DmdAnal.finaliseArgBoxities`, do not unbox dictionaries in `get_dmd`. See Note [Do not unbox class dictionaries] in that module * I also found that imported wrappers were being fruitlessly specialised, so I fixed that too, in canSpecImport. See Note [Specialising imported functions] point (2). In doing due diligence in the testsuite I fixed a number of other things: * Improve Note [Specialising unfoldings] in GHC.Core.Unfold.Make, and Note [Inline specialisations] in GHC.Core.Opt.Specialise, and remove duplication between the two. The new Note describes how we specialise functions with an INLINABLE pragma. And simplify the defn of `spec_unf` in `GHC.Core.Opt.Specialise.specCalls`. * Improve Note [Worker/wrapper for INLINABLE functions] in GHC.Core.Opt.WorkWrap. And (critially) make an actual change which is to propagate the user-written pragma from the original function to the wrapper; see `mkStrWrapperInlinePrag`. * Write new Note [Specialising imported functions] in GHC.Core.Opt.Specialise All this has a big effect on some compile times. This is compiler/perf, showing only changes over 1%: Metrics: compile_time/bytes allocated ------------------------------------- LargeRecord(normal) -50.2% GOOD ManyConstructors(normal) +1.0% MultiLayerModulesTH_OneShot(normal) +2.6% PmSeriesG(normal) -1.1% T10547(normal) -1.2% T11195(normal) -1.2% T11276(normal) -1.0% T11303b(normal) -1.6% T11545(normal) -1.4% T11822(normal) -1.3% T12150(optasm) -1.0% T12234(optasm) -1.2% T13056(optasm) -9.3% GOOD T13253(normal) -3.8% GOOD T15164(normal) -3.6% GOOD T16190(normal) -2.1% T16577(normal) -2.8% GOOD T16875(normal) -1.6% T17836(normal) +2.2% T17977b(normal) -1.0% T18223(normal) -33.3% GOOD T18282(normal) -3.4% GOOD T18304(normal) -1.4% T18698a(normal) -1.4% GOOD T18698b(normal) -1.3% GOOD T19695(normal) -2.5% GOOD T5837(normal) -2.3% T9630(normal) -33.0% GOOD WWRec(normal) -9.7% GOOD hard_hole_fits(normal) -2.1% GOOD hie002(normal) +1.6% geo. mean -2.2% minimum -50.2% maximum +2.6% I diligently investigated some of the big drops. * Caused by not doing w/w for dictionaries: T13056, T15164, WWRec, T18223 * Caused by not fruitlessly specialising wrappers LargeRecord, T9630 For runtimes, here is perf/should+_run: Metrics: runtime/bytes allocated -------------------------------- T12990(normal) -3.8% T5205(normal) -1.3% T9203(normal) -10.7% GOOD haddock.Cabal(normal) +0.1% haddock.base(normal) -1.1% haddock.compiler(normal) -0.3% lazy-bs-alloc(normal) -0.2% ------------------------------------------ geo. mean -0.3% minimum -10.7% maximum +0.1% I did not investigate exactly what happens in T9203. Nofib is a wash: +-------------------------------++--+-----------+-----------+ \| \|\| \| tsv (rel) \| std. err. \| +===============================++==+===========+===========+ \| real/anna \|\| \| -0.13% \| 0.0% \| \| real/fem \|\| \| +0.13% \| 0.0% \| \| real/fulsom \|\| \| -0.16% \| 0.0% \| \| real/lift \|\| \| -1.55% \| 0.0% \| \| real/reptile \|\| \| -0.11% \| 0.0% \| \| real/smallpt \|\| \| +0.51% \| 0.0% \| \| spectral/constraints \|\| \| +0.20% \| 0.0% \| \| spectral/dom-lt \|\| \| +1.80% \| 0.0% \| \| spectral/expert \|\| \| +0.33% \| 0.0% \| +===============================++==+===========+===========+ \| geom mean \|\| \| \| \| +-------------------------------++--+-----------+-----------+ I spent quite some time investigating dom-lt, but it's pretty complicated. See my note on !7847. Conclusion: it's just a delicate inlining interaction, and we have plenty of those. Metric Decrease: LargeRecord T13056 T13253 T15164 T16577 T18223 T18282 T18698a T18698b T19695 T9630 WWRec hard_hole_fits T9203
*	Demand: Clear distinction between Call SubDmd and eval Dmd (#21717)wip/T21717	Sebastian Graf	2022-09-27	11	-12/+61
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In #21717 we saw a reportedly unsound strictness signature due to an unsound definition of plusSubDmd on Calls. This patch contains a description and the fix to the unsoundness as outlined in `Note [Call SubDemand vs. evaluation Demand]`. This fix means we also get rid of the special handling of `-fpedantic-bottoms` in eta-reduction. Thanks to less strict and actually sound strictness results, we will no longer eta-reduce the problematic cases in the first place, even without `-fpedantic-bottoms`. So fixing the unsoundness also makes our eta-reduction code simpler with less hacks to explain. But there is another, more unfortunate side-effect: We unfix #21085, but fortunately we have a new fix ready: See `Note [mkCall and plusSubDmd]`. There's another change: I decided to make `Note [SubDemand denotes at least one evaluation]` a lot simpler by using `plusSubDmd` (instead of `lubPlusSubDmd`) even if both argument demands are lazy. That leads to less precise results, but in turn rids ourselves from the need for 4 different `OpMode`s and the complication of `Note [Manual specialisation of lubDmd/plusDmd]`. The result is simpler code that is in line with the paper draft on Demand Analysis. I left the abandoned idea in `Note [Unrealised opportunity in plusDmd]` for posterity. The fallout in terms of regressions is negligible, as the testsuite and NoFib shows. ``` Program Allocs Instrs -------------------------------------------------------------------------------- hidden +0.2% -0.2% linear -0.0% -0.7% -------------------------------------------------------------------------------- Min -0.0% -0.7% Max +0.2% +0.0% Geometric Mean +0.0% -0.0% ``` Fixes #21717.
*	DmdAnal: Don't panic in addCaseBndrDmd (#22039)	Sebastian Graf	2022-09-05	2	-0/+60
\| \| \| \| \| \| \| \| \| \|	Rather conservatively return Top. See Note [Untyped demand on case-alternative binders]. I also factored `addCaseBndrDmd` into two separate functions `scrutSubDmd` and `fieldBndrDmds`. Fixes #22039.
*	More improvements to worker/wrapper	Simon Peyton Jones	2022-07-25	8	-8/+143
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes #21888, and simplifies finaliseArgBoxities by eliminating the (recently introduced) data type FinalDecision. A delicate interaction meant that this patch commit d1c25a48154236861a413e058ea38d1b8320273f Date: Tue Jul 12 16:33:46 2022 +0100 Refactor wantToUnboxArg a bit make worker/wrapper go into an infinite loop. This patch fixes it by narrowing the handling of case (B) of Note [Boxity for bottoming functions], to deal only the arguemnts that are type variables. Only then do we drop the trimBoxity call, which is what caused the bug. I also * Added documentation of case (B), which was previously completely un-mentioned. And a regression test, T21888a, to test it. * Made unboxDeeplyDmd stop at lazy demands. It's rare anyway for a bottoming function to have a lazy argument (mainly when the data type is recursive and then we don't want to unbox deeply). Plus there is Note [No lazy, Unboxed demands in demand signature] * Refactored the Case equation for dmdAnal a bit, to do less redundant pattern matching.
*	Don't mark lambda binders as OtherCon	Andreas Klebinger	2022-06-27	5	-12/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We used to put OtherCon unfoldings on lambda binders of workers and sometimes also join points/specializations with with the assumption that since the wrapper would force these arguments once we execute the RHS they would indeed be in WHNF. This was wrong for reasons detailed in #21472. So now we purge evaluated unfoldings from all lambda binders. This fixes #21472, but at the cost of sometimes not using as efficient a calling convention. It can also change inlining behaviour as some occurances will no longer look like value arguments when they did before. As consequence we also change how we compute CBV information for arguments slightly. We now always determine the CBV convention for arguments during tidy. Earlier in the pipeline we merely mark functions as candidates for having their arguments treated as CBV. As before the process is described in the relevant notes: Note [CBV Function Ids] Note [Attaching CBV Marks to ids] Note [Never put `OtherCon` unfoldigns on lambda binders] ------------------------- Metric Decrease: T12425 T13035 T18223 T18223 T18923 MultiLayerModulesTH_OneShot Metric Increase: WWRec -------------------------
*	A bunch of changes related to eta reduction	Simon Peyton Jones	2022-05-30	5	-5/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a large collection of changes all relating to eta reduction, originally triggered by #18993, but there followed a long saga. Specifics: * Move state-hack stuff from GHC.Types.Id (where it never belonged) to GHC.Core.Opt.Arity (which seems much more appropriate). * Add a crucial mkCast in the Cast case of GHC.Core.Opt.Arity.eta_expand; helps with T18223 * Add clarifying notes about eta-reducing to PAPs. See Note [Do not eta reduce PAPs] * I moved tryEtaReduce from GHC.Core.Utils to GHC.Core.Opt.Arity, where it properly belongs. See Note [Eta reduce PAPs] * In GHC.Core.Opt.Simplify.Utils.tryEtaExpandRhs, pull out the code for when eta-expansion is wanted, to make wantEtaExpansion, and all that same function in GHC.Core.Opt.Simplify.simplStableUnfolding. It was previously inconsistent, but it's doing the same thing. * I did a substantial refactor of ArityType; see Note [ArityType]. This allowed me to do away with the somewhat mysterious takeOneShots; more generally it allows arityType to describe the function, leaving its clients to decide how to use that information. I made ArityType abstract, so that clients have to use functions to access it. * Make GHC.Core.Opt.Simplify.Utils.rebuildLam (was stupidly called mkLam before) aware of the floats that the simplifier builds up, so that it can still do eta-reduction even if there are some floats. (Previously that would not happen.) That means passing the floats to rebuildLam, and an extra check when eta-reducting (etaFloatOk). * In GHC.Core.Opt.Simplify.Utils.tryEtaExpandRhs, make use of call-info in the idDemandInfo of the binder, as well as the CallArity info. The occurrence analyser did this but we were failing to take advantage here. In the end I moved the heavy lifting to GHC.Core.Opt.Arity.findRhsArity; see Note [Combining arityType with demand info], and functions idDemandOneShots and combineWithDemandOneShots. (These changes partly drove my refactoring of ArityType.) * In GHC.Core.Opt.Arity.findRhsArity * I'm now taking account of the demand on the binder to give extra one-shot info. E.g. if the fn is always called with two args, we can give better one-shot info on the binders than if we just look at the RHS. * Don't do any fixpointing in the non-recursive case -- simple short cut. * Trim arity inside the loop. See Note [Trim arity inside the loop] * Make SimpleOpt respect the eta-reduction flag (Some associated refactoring here.) * I made the CallCtxt which the Simplifier uses distinguish between recursive and non-recursive right-hand sides. data CallCtxt = ... \| RhsCtxt RecFlag \| ... It affects only one thing: - We call an RHS context interesting only if it is non-recursive see Note [RHS of lets] in GHC.Core.Unfold * Remove eta-reduction in GHC.CoreToStg.Prep, a welcome simplification. See Note [No eta reduction needed in rhsToBody] in GHC.CoreToStg.Prep. Other incidental changes * Fix a fairly long-standing outright bug in the ApplyToVal case of GHC.Core.Opt.Simplify.mkDupableContWithDmds. I was failing to take the tail of 'dmds' in the recursive call, which meant the demands were All Wrong. I have no idea why this has not caused problems before now. * Delete dead function GHC.Core.Opt.Simplify.Utils.contIsRhsOrArg Metrics: compile_time/bytes allocated Test Metric Baseline New value Change --------------------------------------------------------------------------------------- MultiLayerModulesTH_OneShot(normal) ghc/alloc 2,743,297,692 2,619,762,992 -4.5% GOOD T18223(normal) ghc/alloc 1,103,161,360 972,415,992 -11.9% GOOD T3064(normal) ghc/alloc 201,222,500 184,085,360 -8.5% GOOD T8095(normal) ghc/alloc 3,216,292,528 3,254,416,960 +1.2% T9630(normal) ghc/alloc 1,514,131,032 1,557,719,312 +2.9% BAD parsing001(normal) ghc/alloc 530,409,812 525,077,696 -1.0% geo. mean -0.1% Nofib: Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- banner +0.0% +0.4% -8.9% -8.7% 0.0% exact-reals +0.0% -7.4% -36.3% -37.4% 0.0% fannkuch-redux +0.0% -0.1% -1.0% -1.0% 0.0% fft2 -0.1% -0.2% -17.8% -19.2% 0.0% fluid +0.0% -1.3% -2.1% -2.1% 0.0% gg -0.0% +2.2% -0.2% -0.1% 0.0% spectral-norm +0.1% -0.2% 0.0% 0.0% 0.0% tak +0.0% -0.3% -9.8% -9.8% 0.0% x2n1 +0.0% -0.2% -3.2% -3.2% 0.0% -------------------------------------------------------------------------------- Min -3.5% -7.4% -58.7% -59.9% 0.0% Max +0.1% +2.2% +32.9% +32.9% 0.0% Geometric Mean -0.0% -0.1% -14.2% -14.8% -0.0% Metric Decrease: MultiLayerModulesTH_OneShot T18223 T3064 T15185 T14766 Metric Increase: T9630
*	Assume at least one evaluation for nested SubDemands (#21081, #21133)wip/T21081	Sebastian Graf	2022-05-03	29	-200/+223
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	See the new `Note [SubDemand denotes at least one evaluation]`. A demand `n :* sd` on a let binder `x=e` now means > "`x` was evaluated `n` times and in any program trace it is evaluated, `e` is > evaluated deeply in sub-demand `sd`." The "any time it is evaluated" premise is what this patch adds. As a result, we get better nested strictness. For example (T21081) ```hs f :: (Bool, Bool) -> (Bool, Bool) f pr = (case pr of (a,b) -> a /= b, True) -- before: <MP(L,L)> -- after: <MP(SL,SL)> g :: Int -> (Bool, Bool) g x = let y = let z = odd x in (z,z) in f y ``` The change in demand signature "before" to "after" allows us to case-bind `z` here. Similarly good things happen for the `sd` in call sub-demands `Cn(sd)`, which allows for more eta-reduction (which is only sound with `-fno-pedantic-bottoms`, albeit). We also fix #21085, a surprising inconsistency with `Poly` to `Call` sub-demand expansion. In an attempt to fix a regression caused by less inlining due to eta-reduction in T15426, I eta-expanded the definition of `elemIndex` and `elemIndices`, thus fixing #21345 on the go. The main point of this patch is that it fixes #21081 and #21133. Annoyingly, I discovered that more precise demand signatures for join points can transform a program into a lazier program if that join point gets floated to the top-level, see #21392. There is no simple fix at the moment, but !5349 might. Thus, we accept a ~5% regression in `MultiLayerModulesTH_OneShot`, where #21392 bites us in `addListToUniqDSet`. T21392 reliably reproduces the issue. Surprisingly, ghc/alloc perf on Windows improves much more than on other jobs, by 0.4% in the geometric mean and by 2% in T16875. Metric Increase: MultiLayerModulesTH_OneShot Metric Decrease: T16875
*	StgLint: Lint constructor applications and strict workers for arity.	Andreas Klebinger	2022-04-14	1	-2/+2
\| \| \| \| \| \| \|	This will mean T9208 when run with lint will return a lint error instead of resulting in a panic. Fixes #21117
*	Drop the app invariantghc-9.5-start	Joachim Breitner	2022-04-09	2	-10/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	previously, GHC had the "let/app-invariant" which said that the RHS of a let or the argument of an application must be of lifted type or ok for speculation. We want this on let to freely float them around, and we wanted that on app to freely convert between the two (e.g. in beta-reduction or inlining). However, the app invariant meant that simple code didn't stay simple and this got in the way of rules matching. By removing the app invariant, this thus fixes #20554. The new invariant is now called "let-can-float invariant", which is hopefully easier to guess its meaning correctly. Dropping the app invariant means that everywhere where we effectively do beta-reduction (in the two simplifiers, but also in `exprIsConApp_maybe` and other innocent looking places) we now have to check if the argument must be evaluated (unlifted and side-effecting), and analyses have to be adjusted to the new semantics of `App`. Also, `LetFloats` in the simplifier can now also carry such non-floating bindings. The fix for DmdAnal, refine by Sebastian, makes functions with unlifted arguments strict in these arguments, which changes some signatures. This causes some extra calls to `exprType` and `exprOkForSpeculation`, so some perf benchmarks regress a bit (while others improve). Metric Decrease: T9020 Metric Increase: LargeRecord T12545 T15164 T16577 T18223 T5642 T9961 Co-authored-by: Sebastian Graf <sebastian.graf@kit.edu>
*	Add warnings for file header pragmas that appear in the body of a module ↵	Zubin Duggal	2022-04-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(#20385) Once we are done parsing the header of a module to obtain the options, we look through the rest of the tokens in order to determine if they contain any misplaced file header pragmas that would usually be ignored, potentially resulting in bad error messages. The warnings are reported immediately so that later errors don't shadow over potentially helpful warnings. Metric Increase: T13719
*	Change GHC.Prim to GHC.Exts in docs and tests	Krzysztof Gogolewski	2022-04-01	1	-1/+1
\| \| \| \| \|	Users are supposed to import GHC.Exts rather than GHC.Prim. Part of #18749.
*	testsuite: Remove library dependenices from T21119	Matthew Pickering	2022-03-24	3	-13/+45
\| \| \| \| \| \| \|	These dependencies would affect the demand signature depending on various rules and so on. Fixes #21271
*	WorkWrap: Handle partial FUN apps in `isRecDataCon` (#21265)	Sebastian Graf	2022-03-24	2	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \|	Partial FUN apps like `(->) Bool` aren't detected by `splitFunTy_maybe`. A silly oversight that is easily fixed by replacing `splitFunTy_maybe` with a guard in the `splitTyConApp_maybe` case. But fortunately, Simon nudged me into rewriting the whole `isRecDataCon` function in a way that makes it much shorter and hence clearer which DataCons are actually considered as recursive. Fixes #21265.
*	Demand: Let `Boxed` win in `lubBoxity` (#21119)	Sebastian Graf	2022-03-16	29	-450/+351
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, we let `Unboxed` win in `lubBoxity`, which is unsoundly optimistic in terms ob Boxity analysis. "Unsoundly" in the sense that we sometimes unbox parameters that we better shouldn't unbox. Examples are #18907 and T19871.absent. Until now, we thought that this hack pulled its weight becuase it worked around some shortcomings of the phase separation between Boxity analysis and CPR analysis. But it is a gross hack which caused regressions itself that needed all kinds of fixes and workarounds. See for example #20767. It became impossible to work with in !7599, so I want to remove it. For example, at the moment, `lubDmd B dmd` will not unbox `dmd`, but `lubDmd A dmd` will. Given that `B` is supposed to be the bottom element of the lattice, it's hardly justifiable to get a better demand when `lub`bing with `A`. The consequence of letting `Boxed` win in `lubBoxity` is that we would regress #2387, #16040 and parts of #5075 and T19871.sumIO, until Boxity and CPR are able to communicate better. Fortunately, that is not the case since I could tweak the other source of optimism in Boxity analysis that is described in `Note [Unboxed demand on function bodies returning small products]` so that we recursively assume unboxed demands on function bodies returning small products. See the updated Note. `Note [Boxity for bottoming functions]` describes why we need bottoming functions to have signatures that say that they deeply unbox their arguments. In so doing, I had to tweak `finaliseArgBoxities` so that it will never unbox recursive data constructors. This is in line with our handling of them in CPR. I updated `Note [Which types are unboxed?]` to reflect that. In turn we fix #21119, #20767, #18907, T19871.absent and get a much simpler implementation (at least to think about). We can also drop the very ad-hoc definition of `deferAfterPreciseException` and its Note in favor of the simple, intuitive definition we used to have. Metric Decrease: T16875 T18223 T18698a T18698b hard_hole_fits Metric Increase: LargeRecord MultiComponentModulesRecomp T15703 T8095 T9872d Out of all the regresions, only the one in T9872d doesn't vanish in a perf build, where the compiler is bootstrapped with -O2 and thus SpecConstr. Reason for regressions: * T9872d is due to `ty_co_subst` taking its `LiftingContext` boxed. That is because the context is passed to a function argument, for example in `liftCoSubstTyVarBndrUsing`. * In T15703, LargeRecord and T8095, we get a bit more allocations in `expand_syn` and `piResultTys`, because a `TCvSubst` isn't unboxed. In both cases that guards against reboxing in some code paths. * The same is true for MultiComponentModulesRecomp, where we get less unboxing in `GHC.Unit.Finder.$wfindInstalledHomeModule`. In a perf build, allocations actually improve by over 4%! Results on NoFib: -------------------------------------------------------------------------------- Program Allocs Instrs -------------------------------------------------------------------------------- awards -0.4% +0.3% cacheprof -0.3% +2.4% fft -1.5% -5.1% fibheaps +1.2% +0.8% fluid -0.3% -0.1% ida +0.4% +0.9% k-nucleotide +0.4% -0.1% last-piece +10.5% +13.9% lift -4.4% +3.5% mandel2 -99.7% -99.8% mate -0.4% +3.6% parser -1.0% +0.1% puzzle -11.6% +6.5% reverse-complem -3.0% +2.0% scs -0.5% +0.1% sphere -0.4% -0.2% wave4main -8.2% -0.3% -------------------------------------------------------------------------------- Summary excludes mandel2 because of excessive bias Min -11.6% -5.1% Max +10.5% +13.9% Geometric Mean -0.2% +0.3% -------------------------------------------------------------------------------- Not bad for a bug fix. The regression in `last-piece` could become a win if SpecConstr would work on non-recursive functions. The regression in `fibheaps` is due to `Note [Reboxed crud for bottoming calls]`, e.g., #21128.
*	Worker/wrapper: Preserve float barriers (#21150)	Sebastian Graf	2022-03-13	3	-1/+276
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Issue #21150 shows that worker/wrapper allocated a worker function for a function with multiple calls that said "called at most once" when the first argument was absent. That's bad! This patch makes it so that WW preserves at least one non-one-shot value lambda (see `Note [Preserving float barriers]`) by passing around `void#` in place of absent arguments. Fixes #21150. Since the fix is pretty similar to `Note [Protecting the last value argument]`, I put the logic in `mkWorkerArgs`. There I realised (#21204) that `-ffun-to-thunk` is basically useless with `-ffull-laziness`, so I deprecated the flag, simplified and split into `needsVoidWorkerArg`/`addVoidWorkerArg`. SpecConstr is another client of that API. Fixes #21204. Metric Decrease: T14683
*	Improve out-of-order inferred type variables	sheaf	2022-03-02	1	-20/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Don't instantiate type variables for :type in `GHC.Tc.Gen.App.tcInstFun`, to avoid inconsistently instantianting `r1` but not `r2` in the type forall {r1} (a :: TYPE r1) {r2} (b :: TYPE r2). ... This fixes #21088. This patch also changes the primop pretty-printer to ensure that we put all the inferred type variables first. For example, the type of reallyUnsafePtrEquality# is now forall {l :: Levity} {k :: Levity} (a :: TYPE (BoxedRep l)) (b :: TYPE (BoxedRep k)). a -> b -> Int# This means we avoid running into issue #21088 entirely with the types of primops. Users can still write a type signature where the inferred type variables don't come first, however. This change to primops had a knock-on consequence, revealing that we were sometimes performing eta reduction on keepAlive#. This patch updates tryEtaReduce to avoid eta reducing functions with no binding, bringing it in line with tryEtaReducePrep, and thus fixing #21090.
*	Tag inference work.	Andreas Klebinger	2022-02-12	2	-5/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This does three major things: * Enforce the invariant that all strict fields must contain tagged pointers. * Try to predict the tag on bindings in order to omit tag checks. * Allows functions to pass arguments unlifted (call-by-value). The former is "simply" achieved by wrapping any constructor allocations with a case which will evaluate the respective strict bindings. The prediction is done by a new data flow analysis based on the STG representation of a program. This also helps us to avoid generating redudant cases for the above invariant. StrictWorkers are created by W/W directly and SpecConstr indirectly. See the Note [Strict Worker Ids] Other minor changes: * Add StgUtil module containing a few functions needed by, but not specific to the tag analysis. ------------------------- Metric Decrease: T12545 T18698b T18140 T18923 LargeRecord Metric Increase: LargeRecord ManyAlternatives ManyConstructors T10421 T12425 T12707 T13035 T13056 T13253 T13253-spj T13379 T15164 T18282 T18304 T18698a T1969 T20049 T3294 T4801 T5321FD T5321Fun T783 T9233 T9675 T9961 T19695 WWRec -------------------------
*	More accurate unboxing	Simon Peyton Jones	2022-02-02	4	-5/+372
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch implements a fix for #20817. It ensures that * The final strictness signature for a function accurately reflects the unboxing done by the wrapper See Note [Finalising boxity for demand signatures] and Note [Finalising boxity for let-bound Ids] * A much better "layer-at-a-time" implementation of the budget for how many worker arguments we can have See Note [Worker argument budget] Generally this leads to a bit more worker/wrapper generation, because instead of aborting entirely if the budget is exceeded (and then lying about boxity), we unbox a bit. Binary sizes in increase slightly (around 1.8%) because of the increase in worker/wrapper generation. The big effects are to GHC.Ix, GHC.Show, GHC.IO.Handle.Internals. If we did a better job of dropping dead code, this effect might go away. Some nofib perf improvements: Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- VSD +1.8% -0.5% 0.017 0.017 0.0% awards +1.8% -0.1% +2.3% +2.3% 0.0% banner +1.7% -0.2% +0.3% +0.3% 0.0% bspt +1.8% -0.1% +3.1% +3.1% 0.0% eliza +1.8% -0.1% +1.2% +1.2% 0.0% expert +1.7% -0.1% +9.6% +9.6% 0.0% fannkuch-redux +1.8% -0.4% -9.3% -9.3% 0.0% kahan +1.8% -0.1% +22.7% +22.7% 0.0% maillist +1.8% -0.9% +21.2% +21.6% 0.0% nucleic2 +1.7% -5.1% +7.5% +7.6% 0.0% pretty +1.8% -0.2% 0.000 0.000 0.0% reverse-complem +1.8% -2.5% +12.2% +12.2% 0.0% rfib +1.8% -0.2% +2.5% +2.5% 0.0% scc +1.8% -0.4% 0.000 0.000 0.0% simple +1.7% -1.3% +17.0% +17.0% +7.4% spectral-norm +1.8% -0.1% +6.8% +6.7% 0.0% sphere +1.7% -2.0% +13.3% +13.3% 0.0% tak +1.8% -0.2% +3.3% +3.3% 0.0% x2n1 +1.8% -0.4% +8.1% +8.1% 0.0% -------------------------------------------------------------------------------- Min +1.1% -5.1% -23.6% -23.6% 0.0% Max +1.8% +0.0% +36.2% +36.2% +7.4% Geometric Mean +1.7% -0.1% +6.8% +6.8% +0.1% Compiler allocations in CI have a geometric mean of +0.1%; many small decreases but there are three bigger increases (7%), all because we do more worker/wrapper than before, so there is simply more code to compile. That's OK. Perf benchmarks in perf/should_run improve in allocation by a geo mean of -0.2%, which is good. None get worse. T12996 improves by -5.8% Metric Decrease: T12996 Metric Increase: T18282 T18923 T9630
*	Fix a few Note inconsistencies	Ben Gamari	2022-02-01	1	-1/+1
\|
*	Improve boxity in deferAfterPreciseException	Simon Peyton Jones	2022-01-28	6	-0/+370
\| \| \| \| \| \| \| \| \| \| \| \| \|	As #20746 showed, the demand analyser behaved badly in a key I/O library (`GHC.IO.Handle.Text`), by unnessarily boxing and reboxing. This patch adjusts the subtle function deferAfterPreciseException; it's quite easy, just a bit subtle. See the new Note [deferAfterPreciseException] And this MR deals only with Problem 2 in #20746. Problem 1 is still open.
*	Add regression test for #20663	Sebastian Graf	2021-11-12	3	-0/+156
\|
*	WorkWrap: Update Unfolding with WW'd body prior to `tryWW` (#20510)	Sebastian Graf	2021-10-29	3	-0/+135
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We have a function in #20510 that is small enough to get a stable unfolding in WW: ```hs small :: Int -> Int small x = go 0 x where go z 0 = z * x go z y = go (z+y) (y-1) ``` But it appears we failed to use the WW'd RHS as the stable unfolding. As a result, inlining `small` would expose the non-WW'd version of `go`. That appears to regress badly in #19727 which is a bit too large to extract a reproducer from that is guaranteed to reproduce across GHC versions. The solution is to simply update the unfolding in `certainlyWillInline` with the WW'd RHS. Fixes #20510.
*	DmdAnal: Implement Boxity Analysis (#19871)	Sebastian Graf	2021-10-24	32	-363/+827
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes some abundant reboxing of `DynFlags` in `GHC.HsToCore.Match.Literal.warnAboutOverflowedLit` (which was the topic of #19407) by introducing a Boxity analysis to GHC, done as part of demand analysis. This allows to accurately capture ad-hoc unboxing decisions previously made in worker/wrapper in demand analysis now, where the boxity info can propagate through demand signatures. See the new `Note [Boxity analysis]`. The actual fix for #19407 is described in `Note [No lazy, Unboxed demand in demand signature]`, but `Note [Finalising boxity for demand signature]` is probably a better entry-point. To support the fix for #19407, I had to change (what was) `Note [Add demands for strict constructors]` a bit (now `Note [Unboxing evaluated arguments]`). In particular, we now take care of it in `finaliseBoxity` (which is only called from demand analaysis) instead of `wantToUnboxArg`. I also had to resurrect `Note [Product demands for function body]` and rename it to `Note [Unboxed demand on function bodies returning small products]` to avoid huge regressions in `join004` and `join007`, thereby fixing #4267 again. See the updated Note for details. A nice side-effect is that the worker/wrapper transformation no longer needs to look at strictness info and other bits such as `InsideInlineableFun` flags (needed for `Note [Do not unbox class dictionaries]`) at all. It simply collects boxity info from argument demands and interprets them with a severely simplified `wantToUnboxArg`. All the smartness is in `finaliseBoxity`, which could be moved to DmdAnal completely, if it wasn't for the call to `dubiousDataConInstArgTys` which would be awkward to export. I spent some time figuring out the reason for why `T16197` failed prior to my amendments to `Note [Unboxing evaluated arguments]`. After having it figured out, I minimised it a bit and added `T16197b`, which simply compares computed strictness signatures and thus should be far simpler to eyeball. The 12% ghc/alloc regression in T11545 is because of the additional `Boxity` field in `Poly` and `Prod` that results in more allocation during `lubSubDmd` and `plusSubDmd`. I made sure in the ticky profiles that the number of calls to those functions stayed the same. We can bear such an increase here, as we recently improved it by -68% (in b760c1f). T18698* regress slightly because there is more unboxing of dictionaries happening and that causes Lint (mostly) to allocate more. Fixes #19871, #19407, #4267, #16859, #18907 and #13331. Metric Increase: T11545 T18698a T18698b Metric Decrease: T12425 T16577 T18223 T18282 T4267 T9961
*	CprAnal: Activate Sum CPR for local bindings	Sebastian Graf	2021-10-05	2	-10/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We've had Sum CPR (#5075) for top-level bindings for a couple of years now. That begs the question why we didn't also activate it for local bindings, and the reasons for that are described in `Note [CPR for sum types]`. Only that it didn't make sense! The Note said that Sum CPR would destroy let-no-escapes, but that should be a non-issue since we have syntactic join points in Core now and we don't WW for them (`Note [Don't w/w join points for CPR]`). So I simply activated CPR for all bindings of sum type, thus fixing #5075 and \#16570. NoFib approves: ``` -------------------------------------------------------------------------------- Program Allocs Instrs -------------------------------------------------------------------------------- comp_lab_zift -0.0% +0.7% fluid +1.7% +0.7% reptile +0.1% +0.1% -------------------------------------------------------------------------------- Min -0.0% -0.2% Max +1.7% +0.7% Geometric Mean +0.0% +0.0% ``` There were quite a few metric decreases on the order of 1-4%, but T6048 seems to regress significantly, by 26.1%. WW'ing for a `Just` constructor and the nested data type meant additional Simplifier iterations and a 30% increase in term sizes as well as a 200-300% in type sizes due to unboxed 9-tuples. There's not much we can do about it, I'm afraid: We're just doing much more work there. Metric Decrease: T12425 T18698a T18698b T20049 T9020 WWRec Metric Increase: T6048
*	Nested CPR light unleashed (#18174)	Sebastian Graf	2021-09-30	3	-70/+91
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch enables worker/wrapper for nested constructed products, as described in `Note [Nested CPR]`. The machinery for expressing Nested CPR was already there, since !5054. Worker/wrapper is equipped to exploit Nested CPR annotations since !5338. CPR analysis already handles applications in batches since !5753. This patch just needs to flip a few more switches: 1. In `cprTransformDataConWork`, we need to look at the field expressions and their `CprType`s to see whether the evaluation of the expressions terminates quickly (= is in HNF) or if they are put in strict fields. If that is the case, then we retain their CPR info and may unbox nestedly later on. More details in `Note [Nested CPR]`. 2. Enable nested `ConCPR` signatures in `GHC.Types.Cpr`. 3. In the `asConCpr` call in `GHC.Core.Opt.WorkWrap.Utils`, pass CPR info of fields to the `Unbox`. 4. Instead of giving CPR signatures to DataCon workers and wrappers, we now have `cprTransformDataConWork` for workers and treat wrappers by analysing their unfolding. As a result, the code from GHC.Types.Id.Make went away completely. 5. I deactivated worker/wrappering for recursive DataCons and wrote a function `isRecDataCon` to detect them. We really don't want to give `repeat` or `replicate` the Nested CPR property. See Note [CPR for recursive data structures] for which kind of recursive DataCons we target. 6. Fix a couple of tests and their outputs. I also documented that CPR can destroy sharing and lead to asymptotic increase in allocations (which is tracked by #13331/#19326) in `Note [CPR for data structures can destroy sharing]`. Nofib results: ``` -------------------------------------------------------------------------------- Program Allocs Instrs -------------------------------------------------------------------------------- ben-raytrace -3.1% -0.4% binary-trees +0.8% -2.9% digits-of-e2 +5.8% +1.2% event +0.8% -2.1% fannkuch-redux +0.0% -1.4% fish 0.0% -1.5% gamteb -1.4% -0.3% mkhprog +1.4% +0.8% multiplier +0.0% -1.9% pic -0.6% -0.1% reptile -20.9% -17.8% wave4main +4.8% +0.4% x2n1 -100.0% -7.6% -------------------------------------------------------------------------------- Min -95.0% -17.8% Max +5.8% +1.2% Geometric Mean -2.9% -0.4% ``` The huge wins in x2n1 (loopy list) and reptile (see #19970) are due to refraining from unboxing (:). Other benchmarks like digits-of-e2 or wave4main regress because of that. Ultimately there are no great improvements due to Nested CPR alone, but at least it's a win. Binary sizes decrease by 0.6%. There are a significant number of metric decreases. The most notable ones (>1%): ``` ManyAlternatives(normal) ghc/alloc 771656002.7 762187472.0 -1.2% ManyConstructors(normal) ghc/alloc 4191073418.7 4114369216.0 -1.8% MultiLayerModules(normal) ghc/alloc 3095678333.3 3128720704.0 +1.1% PmSeriesG(normal) ghc/alloc 50096429.3 51495664.0 +2.8% PmSeriesS(normal) ghc/alloc 63512989.3 64681600.0 +1.8% PmSeriesV(normal) ghc/alloc 62575424.0 63767208.0 +1.9% T10547(normal) ghc/alloc 29347469.3 29944240.0 +2.0% T11303b(normal) ghc/alloc 46018752.0 47367576.0 +2.9% T12150(optasm) ghc/alloc 81660890.7 82547696.0 +1.1% T12234(optasm) ghc/alloc 59451253.3 60357952.0 +1.5% T12545(normal) ghc/alloc 1705216250.7 1751278952.0 +2.7% T12707(normal) ghc/alloc 981000472.0 968489800.0 -1.3% GOOD T13056(optasm) ghc/alloc 389322664.0 372495160.0 -4.3% GOOD T13253(normal) ghc/alloc 337174229.3 341954576.0 +1.4% T13701(normal) ghc/alloc 2381455173.3 2439790328.0 +2.4% BAD T14052(ghci) ghc/alloc 2162530642.7 2139108784.0 -1.1% T14683(normal) ghc/alloc 3049744728.0 2977535064.0 -2.4% GOOD T14697(normal) ghc/alloc 362980213.3 369304512.0 +1.7% T15164(normal) ghc/alloc 1323102752.0 1307480600.0 -1.2% T15304(normal) ghc/alloc 1304607429.3 1291024568.0 -1.0% T16190(normal) ghc/alloc 281450410.7 284878048.0 +1.2% T16577(normal) ghc/alloc 7984960789.3 7811668768.0 -2.2% GOOD T17516(normal) ghc/alloc 1171051192.0 1153649664.0 -1.5% T17836(normal) ghc/alloc 1115569746.7 1098197592.0 -1.6% T17836b(normal) ghc/alloc 54322597.3 55518216.0 +2.2% T17977(normal) ghc/alloc 47071754.7 48403408.0 +2.8% T17977b(normal) ghc/alloc 42579133.3 43977392.0 +3.3% T18923(normal) ghc/alloc 71764237.3 72566240.0 +1.1% T1969(normal) ghc/alloc 784821002.7 773971776.0 -1.4% GOOD T3294(normal) ghc/alloc 1634913973.3 1614323584.0 -1.3% GOOD T4801(normal) ghc/alloc 295619648.0 292776440.0 -1.0% T5321FD(normal) ghc/alloc 278827858.7 276067280.0 -1.0% T5631(normal) ghc/alloc 586618202.7 577579960.0 -1.5% T5642(normal) ghc/alloc 494923048.0 487927208.0 -1.4% T5837(normal) ghc/alloc 37758061.3 39261608.0 +4.0% T9020(optasm) ghc/alloc 257362077.3 254672416.0 -1.0% T9198(normal) ghc/alloc 49313365.3 50603936.0 +2.6% BAD T9233(normal) ghc/alloc 704944258.7 685692712.0 -2.7% GOOD T9630(normal) ghc/alloc 1476621560.0 1455192784.0 -1.5% T9675(optasm) ghc/alloc 443183173.3 433859696.0 -2.1% GOOD T9872a(normal) ghc/alloc 1720926653.3 1693190072.0 -1.6% GOOD T9872b(normal) ghc/alloc 2185618061.3 2162277568.0 -1.1% GOOD T9872c(normal) ghc/alloc 1765842405.3 1733618088.0 -1.8% GOOD TcPlugin_RewritePerf(normal) ghc/alloc 2388882730.7 2365504696.0 -1.0% WWRec(normal) ghc/alloc 607073186.7 597512216.0 -1.6% T9203(normal) run/alloc 107284064.0 102881832.0 -4.1% haddock.Cabal(normal) run/alloc 24025329589.3 23768382560.0 -1.1% haddock.base(normal) run/alloc 25660521653.3 25370321824.0 -1.1% haddock.compiler(normal) run/alloc 74064171706.7 73358712280.0 -1.0% ``` The biggest exception to the rule is T13701 which seems to fluctuate as usual (not unlike T12545). T14697 has a similar quality, being a generated multi-module test. T5837 is small enough that it similarly doesn't measure anything significant besides module loading overhead. T13253 simply does one additional round of Simplification due to Nested CPR. There are also some apparent regressions in T9198, T12234 and PmSeriesG that we (@mpickering and I) were simply unable to reproduce locally. @mpickering tried to run the CI script in a local Docker container and actually found that T9198 and PmSeriesG improved. In MRs that were rebased on top this one, like !4229, I did not experience such increases. Let's not get hung up on these regression tests, they were meant to test for asymptotic regressions. The build-cabal test improves by 1.2% in -O0. Metric Increase: T10421 T12234 T12545 T13035 T13056 T13701 T14697 T18923 T5837 T9198 Metric Decrease: ManyConstructors T12545 T12707 T13056 T14683 T16577 T18223 T1969 T3294 T9203 T9233 T9675 T9872a T9872b T9872c T9961 TcPlugin_RewritePerf
*	compiler: Name generated locals more descriptively	Ben Gamari	2021-07-29	1	-2/+3
\| \| \| \| \| \|	Previously `GHC.Types.Id.Make.newLocal` would name all locals `dt`, making it unnecessarily difficult to determine their origin. Noticed while looking at #19557.
*	Add regression test for #17819	Sebastian Graf	2021-06-27	3	-0/+331
\| \| \| \|	The only item left in #17819. Fixes #17819.
*	WorkWrap: Make mkWWstr and mkWWcpr generate fewer let bindings	Sebastian Graf	2021-06-27	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In https://gitlab.haskell.org/ghc/ghc/-/merge_requests/5814#note_355144, Simon noted that `mkWWstr` and `mkWWcpr` could generate fewer let bindings and be implemented less indirectly by returning the rebuilt expressions directly, e.g. instead of ``` f :: (Int, Int) -> Int f (x, y) = x+y ==> f :: (Int, Int) -> Int f p = case p of (x, y) -> case x of I# x' -> case y of I# y' -> case $wf x' y' of r' -> let r = I# r' -- immediately returned in r f :: Int# -> Int# -> Int# $wf x' y' = let x = I# x' in -- only used in p let y = I# y' in -- only used in p let p = (x, y) in -- only used in the App below case (\(x,y) -> x+y) p of I# r' -> r' ``` we know generate ``` f :: (Int, Int) -> Int f p = case p of (x, y) -> case x of I# x' -> case y of I# y' -> case $wf x' y' of r' -> I# r' -- 1 fewer let f :: Int# -> Int# -> Int# $wf x' y' = case (\(x,y) -> x+y) (I# x, I# y) of I# r' -> -- 3 fewer lets r' ``` Which is much nicer and makes it easier to comprehend the output of worker-wrapper pre-Simplification as well as puts less strain on the Simplifier. I had to drop support for #18983, but we found that it's broken anyway. Simon is working on a patch that provides a bit more justification.
*	WorkWrap: Remove mkWWargs (#19874)	Sebastian Graf	2021-06-27	3	-4/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	`mkWWargs`'s job was pushing casts inwards and doing eta expansion to match the arity with the number of argument demands we w/w for. Nowadays, we use the Simplifier to eta expand to arity. In fact, in recent years we have even seen the eta expansion done by w/w as harmful, see Note [Don't eta expand in w/w]. If a function hasn't enough manifest lambdas, don't w/w it! What purpose does `mkWWargs` serve in this world? Not a great one, it turns out! I could remove it by pulling some important bits, notably Note [Freshen WW arguments] and Note [Join points and beta-redexes]. Result: We reuse the freshened binder names of the wrapper in the worker where possible (see testuite changes), much nicer! In order to avoid scoping errors due to lambda-bound unfoldings in worker arguments, we zap those unfoldings now. In doing so, we fix #19766. Fixes #19874.
*	Re-do rubbish literals	Simon Peyton Jones	2021-06-05	4	-13/+34
\| \| \| \| \| \| \| \| \| \|	As #19882 pointed out, we were simply doing rubbish literals wrong. (I'll refrain from explaining the wrong-ness here -- see the ticket.) This patch fixes it by adding a Type (of kind RuntimeRep) as field of LitRubbish, rather than [PrimRep]. The Note [Rubbish literals] in GHC.Types.Literal explains the details.
*	Make setBndrsDemandInfo work with only type variables	Matthew Pickering	2021-05-19	2	-0/+8
\| \| \| \| \| \|	Fixes #19849 Co-authored-by: Krzysztof Gogolewski <krzysztof.gogolewski@tweag.io>
*	CPR: Detect constructed products in `runRW#` apps (#19822)	Sebastian Graf	2021-05-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In #19822, we realised that the Simplifier's new habit of floating cases into `runRW#` continuations inhibits CPR analysis from giving key functions of `text` the CPR property, such as `singleton`. This patch fixes that by anticipating part of !5667 (Nested CPR) to give `runRW#` the proper CPR transformer it now deserves: Namely, `runRW# (\s -> e)` should have the CPR property iff `e` has it. The details are in `Note [Simplification of runRW#]` in GHC.CoreToStg.Prep. The output of T18086 changed a bit: `panic` (which calls `runRW#`) now has `botCpr`. As outlined in Note [Bottom CPR iff Dead-Ending Divergence], that's OK. Fixes #19822. Metric Decrease: T9872d
*	Correct warning for deprecated and unrecognised flags	Alfredo Di Napoli	2021-04-05	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	Fixes #19616. This commit changes the `GHC.Driver.Errors.handleFlagWarnings` function to rely on the newly introduced `DiagnosticReason`. This allows us to correctly pretty-print the flags which triggered some warnings and in turn remove the cruft around this function (like the extra filtering and the `shouldPrintWarning` function.
*	Rubbish literals for all representations (#18983)	Sebastian Graf	2021-03-26	1	-15/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch cleans up the complexity around WW's `mk_absent_let` by broadening the scope of `LitRubbish`. Rubbish literals now store the `PrimRep` they represent and are ultimately lowered in Cmm. This in turn allows absent literals of `VecRep` or `VoidRep`. The latter allows absent literals for unlifted coercions, as requested in #18983. I took the liberty to rewrite and clean up `Note [Absent fillers]` and `Note [Rubbish values]` to account for the new implementation and to make them more orthogonal in their description. I didn't add a new regression test, as `T18982` already contains the test in the ticket and its test output changes as expected. Fixes #18983.
*	Nested CPR light (#19398)	Sebastian Graf	2021-03-20	9	-13/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While fixing #19232, it became increasingly clear that the vestigial hack described in `Note [Optimistic field binder CPR]` is complicated and causes reboxing. Rather than make the hack worse, this patch gets rid of it completely in favor of giving deeply unboxed parameters the Nested CPR property. Example: ```hs f :: (Int, Int) -> Int f p = case p of (x, y) \| x == y = x \| otherwise = y ``` Based on `p`'s `idDemandInfo` `1P(1P(L),1P(L))`, we can see that both fields of `p` will be available unboxed. As a result, we give `p` the nested CPR property `1(1,1)`. When analysing the `case`, the field CPRs are transferred to the binders `x` and `y`, respectively, so that we ultimately give `f` the CPR property. I took the liberty to do a bit of refactoring: - I renamed `CprResult` ("Constructed product result result") to plain `Cpr`. - I Introduced `FlatConCpr` in addition to (now nested) `ConCpr` and and according pattern synonym that rewrites flat `ConCpr` to `FlatConCpr`s, purely for compiler perf reasons. - Similarly for performance reasons, we now store binders with a Top signature in a separate `IntSet`, see `Note [Efficient Top sigs in SigEnv]`. - I moved a bit of stuff around in `GHC.Core.Opt.WorkWrap.Utils` and introduced `UnboxingDecision` to replace the `Maybe DataConPatContext` type we used to return from `wantToUnbox`. - Since the `Outputable Cpr` instance changed anyway, I removed the leading `m` which we used to emit for `ConCpr`. It's just noise, especially now that we may output nested CPRs. Fixes #19398.
*	Use GHC2021 as default language	Joachim Breitner	2021-03-10	3	-0/+3
\|
*	DmdAnal: Better syntax for demand signatures (#19016)	Sebastian Graf	2021-03-03	21	-135/+135
\| \| \| \| \| \| \| \| \|	The update of the Outputable instance resulted in a slew of documentation changes within Notes that used the old syntax. The most important doc changes are to `Note [Demand notation]` and the user's guide. Fixes #19016.
*	CPR analysis: Use CPR of scrutinee for Case Binder CPR (#19232)	Sebastian Graf	2021-02-28	3	-34/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For years we have lived in a supposedly sweet spot that gave case binders the CPR property, unconditionally. Which is an optimistic hack that is now described in `Historical Note [Optimistic case binder CPR]`. In #19232 the concern was raised that this might do more harm than good and that might be better off simply by taking the CPR property of the scrutinee for the CPR type of the case binder. And indeed that's what we do now. Since `Note [CPR in a DataAlt case alternative]` is now only about field binders, I renamed and garbage collected it into `Note [Optimistic field binder CPR]`. NoFib approves: ``` NoFib Results -------------------------------------------------------------------------------- Program Allocs Instrs -------------------------------------------------------------------------------- anna +0.1% +0.1% nucleic2 -1.2% -0.6% sched 0.0% +0.9% transform -0.0% -0.1% -------------------------------------------------------------------------------- Min -1.2% -0.6% Max +0.1% +0.9% Geometric Mean -0.0% +0.0% ``` Fixes #19232.
*	Fix typos	Brian Wignall	2021-02-06	3	-3/+3
\|
*	WorkWrap: Use SysLocal Name for Thunk Splitting (#19180)	Sebastian Graf	2021-01-10	2	-0/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since !4493 we annotate top-level bindings with demands, which leads to novel opportunities for thunk splitting absent top-level thunks. It turns out that thunk splitting wasn't quite equipped for that, because it re-used top-level, `External` Names for local helper Ids. That triggered a CoreLint error (#19180), reproducible with `T19180`. Fixed by adjusting the thunk splitting code to produce `SysLocal` names for the local bindings. Fixes #19180. Metric Decrease: T12227 T18282
*	WorkWrap: Unbox constructors with existentials (#18982)	Sebastian Graf	2020-12-23	3	-0/+289
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Consider ```hs data Ex where Ex :: e -> Int -> Ex f :: Ex -> Int f (Ex e n) = e `seq` n + 1 ``` Worker/wrapper should build the following worker for `f`: ```hs $wf :: forall e. e -> Int# -> Int# $wf e n = e `seq` n +# 1# ``` But previously it didn't, because `Ex` binds an existential. This patch lifts that condition. That entailed having to instantiate existential binders in `GHC.Core.Opt.WorkWrap.Utils.mkWWstr` via `GHC.Core.Utils.dataConRepFSInstPat`, requiring a bit of a refactoring around what is now `DataConPatContext`. CPR W/W still won't unbox DataCons with existentials. See `Note [Which types are unboxed?]` for details. I also refactored the various `tyCon*DataCon(s)_maybe` functions in `GHC.Core.TyCon`, deleting some of them which are no longer needed (`isDataProductType_maybe` and `isDataSumType_maybe`). I cleaned up a couple of call sites, some of which weren't very explicit about whether they cared for existentials or not. The test output of `T18013` changed, because we now unbox the `Rule` data type. Its constructor carries existential state and will be w/w'd now. In the particular example, the worker functions inlines right back into the wrapper, which then unnecessarily has a (quite big) stable unfolding. I think this kind of fallout is inevitable; see also Note [Don't w/w inline small non-loop-breaker things]. There's a new regression test case `T18982`. Fixes #18982.
*	Add regression test for #19053	Sebastian Graf	2020-12-15	2	-0/+9
\|
*	Demand: Simplify `CU(U)` to `U` (#19005)	Sebastian Graf	2020-12-12	2	-4/+4
\| \| \| \| \| \|	Both sub-demands encode the same information. This is a trivial change and already affects a few regression tests (e.g. `T5075`), so no separate regression test is necessary.
*	DmdAnal: Annotate top-level function bindings with demands (#18894)	Sebastian Graf	2020-12-12	5	-0/+643
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It's useful to annotate a non-exported top-level function like `g` in ```hs module Lib (h) where g :: Int -> Int -> (Int,Int) g m 1 = (m, 0) g m n = (2 * m, 2 `div` n) {-# NOINLINE g #-} h :: Int -> Int h 1 = 0 h m \| odd m = snd (g m 2) \| otherwise = uncurry (+) (g 2 m) ``` with its demand `UCU(CS(P(1P(U),SP(U))`, which tells us that whenever `g` was called, the second component of the returned pair was evaluated strictly. Since #18903 we do so for local functions, where we can see all calls. For top-level functions, we can assume that all exported functions are demanded according to `topDmd` and thus get sound demands for non-exported top-level functions. The demand on `g` is crucial information for Nested CPR, which may the go on and unbox `g` for the second pair component. That is true even if that pair component may diverge, as is the case for the call site `g 13 0`, which throws a div-by-zero exception. In `T18894b`, you can even see the new demand annotation enabling us to eta-expand a function that we wouldn't be able to eta-expand without Call Arity. We only track bindings of function type in order not to risk huge compile-time regressions, see `isInterestingTopLevelFn`. There was a CoreLint check that rejected strict demand annotations on recursive or top-level bindings, which seems completely unjustified. All the cases I investigated were fine, so I removed it. Fixes #18894.
*	Demand: Interleave usage and strictness demands (#18903)	Sebastian Graf	2020-11-20	23	-76/+267
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As outlined in #18903, interleaving usage and strictness demands not only means a more compact demand representation, but also allows us to express demands that we weren't easily able to express before. Call demands are relative in the sense that a call demand `Cn(cd)` on `g` says "`g` is called `n` times. Whenever `g` is called, the result is used according to `cd`". Example from #18903: ```hs h :: Int -> Int h m = let g :: Int -> (Int,Int) g 1 = (m, 0) g n = (2 * n, 2 `div` n) {-# NOINLINE g #-} in case m of 1 -> 0 2 -> snd (g m) _ -> uncurry (+) (g m) ``` Without the interleaved representation, we would just get `L` for the strictness demand on `g`. Now we are able to express that whenever `g` is called, its second component is used strictly in denoting `g` by `1C1(P(1P(U),SP(U)))`. This would allow Nested CPR to unbox the division, for example. Fixes #18903. While fixing regressions, I also discovered and fixed #18957. Metric Decrease: T13253-spj
*	Arity: Record arity types for non-recursive lets	Sebastian Graf	2020-10-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In #18793, we saw a compelling example which requires us to look at non-recursive let-bindings during arity analysis and unleash their arity types at use sites. After the refactoring in the previous patch, the needed change is quite simple and very local to `arityType`'s defn for non-recurisve `Let`. Apart from that, we had to get rid of the second item of `Note [Dealing with bottoms]`, which was entirely a safety measure and hindered optimistic fixed-point iteration. Fixes #18793. The following metric increases are all caused by this commit and a result of the fact that we just do more work now: Metric Increase: T3294 T12545 T12707
*	Fix some missed opportunities for preInlineUnconditionally	Simon Peyton Jones	2020-10-14	3	-291/+236
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are two signficant changes here: * Ticket #18815 showed that we were missing some opportunities for preInlineUnconditionally. The one-line fix is in the code for GHC.Core.Opt.Simplify.Utils.preInlineUnconditionally, which now switches off only for INLINE pragmas. I expanded Note [Stable unfoldings and preInlineUnconditionally] to explain. * When doing this I discovered a way in which preInlineUnconditionally was occasionally /too/ eager. It's all explained in Note [Occurrences in stable unfoldings] in GHC.Core.Opt.OccurAnal, and the one-line change adding markAllMany to occAnalUnfolding. I also got confused about what NoUserInline meant, so I've renamed it to NoUserInlinePrag, and changed its pretty-printing slightly. That led to soem error messate wibbling, and touches quite a few files, but there is no change in functionality. I did a nofib run. As expected, no significant changes. Program Size Allocs ---------------------------------------- sphere -0.0% -0.4% ---------------------------------------- Min -0.0% -0.4% Max -0.0% +0.0% Geometric Mean -0.0% -0.0% I'm allowing a max-residency increase for T10370, which seems very irreproducible. (See comments on !4241.) There is always sampling error for max-residency measurements; and in any case the change shows up on some platforms but not others. Metric Increase: T10370
*	New linear types syntax: a %p -> b (#18459)	Vladislav Zavialov	2020-09-29	1	-1/+1
\| \| \| \| \| \|	Implements GHC Proposal #356 Updates the haddock submodule.