| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
A small refactoring in our Core Opt pipeline and some new functions for
transfering argument boxities from one signature to another to facilitate
`Note [Don't change boxity without worker/wrapper]`.
Fixes #21754.
|
|
|
|
|
|
|
|
| |
Justification in #22231. Short form: In a demand like `1C1(C1(L))`
it was too easy to confuse which `1` belongs to which `C`. Now
that should be more obvious.
Fixes #22231
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes #21286, by not unboxing dictionaries in
worker/wrapper (ever). The main payload is tiny:
* In `GHC.Core.Opt.DmdAnal.finaliseArgBoxities`, do not unbox
dictionaries in `get_dmd`. See Note [Do not unbox class dictionaries]
in that module
* I also found that imported wrappers were being fruitlessly
specialised, so I fixed that too, in canSpecImport.
See Note [Specialising imported functions] point (2).
In doing due diligence in the testsuite I fixed a number of
other things:
* Improve Note [Specialising unfoldings] in GHC.Core.Unfold.Make,
and Note [Inline specialisations] in GHC.Core.Opt.Specialise,
and remove duplication between the two. The new Note describes
how we specialise functions with an INLINABLE pragma.
And simplify the defn of `spec_unf` in `GHC.Core.Opt.Specialise.specCalls`.
* Improve Note [Worker/wrapper for INLINABLE functions] in
GHC.Core.Opt.WorkWrap.
And (critially) make an actual change which is to propagate the
user-written pragma from the original function to the wrapper; see
`mkStrWrapperInlinePrag`.
* Write new Note [Specialising imported functions] in
GHC.Core.Opt.Specialise
All this has a big effect on some compile times. This is
compiler/perf, showing only changes over 1%:
Metrics: compile_time/bytes allocated
-------------------------------------
LargeRecord(normal) -50.2% GOOD
ManyConstructors(normal) +1.0%
MultiLayerModulesTH_OneShot(normal) +2.6%
PmSeriesG(normal) -1.1%
T10547(normal) -1.2%
T11195(normal) -1.2%
T11276(normal) -1.0%
T11303b(normal) -1.6%
T11545(normal) -1.4%
T11822(normal) -1.3%
T12150(optasm) -1.0%
T12234(optasm) -1.2%
T13056(optasm) -9.3% GOOD
T13253(normal) -3.8% GOOD
T15164(normal) -3.6% GOOD
T16190(normal) -2.1%
T16577(normal) -2.8% GOOD
T16875(normal) -1.6%
T17836(normal) +2.2%
T17977b(normal) -1.0%
T18223(normal) -33.3% GOOD
T18282(normal) -3.4% GOOD
T18304(normal) -1.4%
T18698a(normal) -1.4% GOOD
T18698b(normal) -1.3% GOOD
T19695(normal) -2.5% GOOD
T5837(normal) -2.3%
T9630(normal) -33.0% GOOD
WWRec(normal) -9.7% GOOD
hard_hole_fits(normal) -2.1% GOOD
hie002(normal) +1.6%
geo. mean -2.2%
minimum -50.2%
maximum +2.6%
I diligently investigated some of the big drops.
* Caused by not doing w/w for dictionaries:
T13056, T15164, WWRec, T18223
* Caused by not fruitlessly specialising wrappers
LargeRecord, T9630
For runtimes, here is perf/should+_run:
Metrics: runtime/bytes allocated
--------------------------------
T12990(normal) -3.8%
T5205(normal) -1.3%
T9203(normal) -10.7% GOOD
haddock.Cabal(normal) +0.1%
haddock.base(normal) -1.1%
haddock.compiler(normal) -0.3%
lazy-bs-alloc(normal) -0.2%
------------------------------------------
geo. mean -0.3%
minimum -10.7%
maximum +0.1%
I did not investigate exactly what happens in T9203.
Nofib is a wash:
+-------------------------------++--+-----------+-----------+
| || | tsv (rel) | std. err. |
+===============================++==+===========+===========+
| real/anna || | -0.13% | 0.0% |
| real/fem || | +0.13% | 0.0% |
| real/fulsom || | -0.16% | 0.0% |
| real/lift || | -1.55% | 0.0% |
| real/reptile || | -0.11% | 0.0% |
| real/smallpt || | +0.51% | 0.0% |
| spectral/constraints || | +0.20% | 0.0% |
| spectral/dom-lt || | +1.80% | 0.0% |
| spectral/expert || | +0.33% | 0.0% |
+===============================++==+===========+===========+
| geom mean || | | |
+-------------------------------++--+-----------+-----------+
I spent quite some time investigating dom-lt, but it's pretty
complicated. See my note on !7847. Conclusion: it's just a delicate
inlining interaction, and we have plenty of those.
Metric Decrease:
LargeRecord
T13056
T13253
T15164
T16577
T18223
T18282
T18698a
T18698b
T19695
T9630
WWRec
hard_hole_fits
T9203
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In #21717 we saw a reportedly unsound strictness signature due to an unsound
definition of plusSubDmd on Calls. This patch contains a description and the fix
to the unsoundness as outlined in `Note [Call SubDemand vs. evaluation Demand]`.
This fix means we also get rid of the special handling of `-fpedantic-bottoms`
in eta-reduction. Thanks to less strict and actually sound strictness results,
we will no longer eta-reduce the problematic cases in the first place, even
without `-fpedantic-bottoms`.
So fixing the unsoundness also makes our eta-reduction code simpler with less
hacks to explain. But there is another, more unfortunate side-effect:
We *unfix* #21085, but fortunately we have a new fix ready:
See `Note [mkCall and plusSubDmd]`.
There's another change:
I decided to make `Note [SubDemand denotes at least one evaluation]` a lot
simpler by using `plusSubDmd` (instead of `lubPlusSubDmd`) even if both argument
demands are lazy. That leads to less precise results, but in turn rids ourselves
from the need for 4 different `OpMode`s and the complication of
`Note [Manual specialisation of lub*Dmd/plus*Dmd]`. The result is simpler code
that is in line with the paper draft on Demand Analysis.
I left the abandoned idea in `Note [Unrealised opportunity in plusDmd]` for
posterity. The fallout in terms of regressions is negligible, as the testsuite
and NoFib shows.
```
Program Allocs Instrs
--------------------------------------------------------------------------------
hidden +0.2% -0.2%
linear -0.0% -0.7%
--------------------------------------------------------------------------------
Min -0.0% -0.7%
Max +0.2% +0.0%
Geometric Mean +0.0% -0.0%
```
Fixes #21717.
|
|
|
|
|
|
|
|
|
|
| |
Rather conservatively return Top.
See Note [Untyped demand on case-alternative binders].
I also factored `addCaseBndrDmd` into two separate functions `scrutSubDmd` and
`fieldBndrDmds`.
Fixes #22039.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes #21888, and simplifies finaliseArgBoxities
by eliminating the (recently introduced) data type FinalDecision.
A delicate interaction meant that this patch
commit d1c25a48154236861a413e058ea38d1b8320273f
Date: Tue Jul 12 16:33:46 2022 +0100
Refactor wantToUnboxArg a bit
make worker/wrapper go into an infinite loop. This patch
fixes it by narrowing the handling of case (B) of
Note [Boxity for bottoming functions], to deal only the
arguemnts that are type variables. Only then do we drop
the trimBoxity call, which is what caused the bug.
I also
* Added documentation of case (B), which was previously
completely un-mentioned. And a regression test,
T21888a, to test it.
* Made unboxDeeplyDmd stop at lazy demands. It's rare anyway
for a bottoming function to have a lazy argument (mainly when
the data type is recursive and then we don't want to unbox
deeply). Plus there is Note [No lazy, Unboxed demands in
demand signature]
* Refactored the Case equation for dmdAnal a bit, to do less
redundant pattern matching.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We used to put OtherCon unfoldings on lambda binders of workers
and sometimes also join points/specializations with with the
assumption that since the wrapper would force these arguments
once we execute the RHS they would indeed be in WHNF.
This was wrong for reasons detailed in #21472. So now we purge
evaluated unfoldings from *all* lambda binders.
This fixes #21472, but at the cost of sometimes not using as efficient a
calling convention. It can also change inlining behaviour as some
occurances will no longer look like value arguments when they did
before.
As consequence we also change how we compute CBV information for
arguments slightly. We now *always* determine the CBV convention
for arguments during tidy. Earlier in the pipeline we merely mark
functions as candidates for having their arguments treated as CBV.
As before the process is described in the relevant notes:
Note [CBV Function Ids]
Note [Attaching CBV Marks to ids]
Note [Never put `OtherCon` unfoldigns on lambda binders]
-------------------------
Metric Decrease:
T12425
T13035
T18223
T18223
T18923
MultiLayerModulesTH_OneShot
Metric Increase:
WWRec
-------------------------
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a large collection of changes all relating to eta
reduction, originally triggered by #18993, but there followed
a long saga.
Specifics:
* Move state-hack stuff from GHC.Types.Id (where it never belonged)
to GHC.Core.Opt.Arity (which seems much more appropriate).
* Add a crucial mkCast in the Cast case of
GHC.Core.Opt.Arity.eta_expand; helps with T18223
* Add clarifying notes about eta-reducing to PAPs.
See Note [Do not eta reduce PAPs]
* I moved tryEtaReduce from GHC.Core.Utils to GHC.Core.Opt.Arity,
where it properly belongs. See Note [Eta reduce PAPs]
* In GHC.Core.Opt.Simplify.Utils.tryEtaExpandRhs, pull out the code for
when eta-expansion is wanted, to make wantEtaExpansion, and all that
same function in GHC.Core.Opt.Simplify.simplStableUnfolding. It was
previously inconsistent, but it's doing the same thing.
* I did a substantial refactor of ArityType; see Note [ArityType].
This allowed me to do away with the somewhat mysterious takeOneShots;
more generally it allows arityType to describe the function, leaving
its clients to decide how to use that information.
I made ArityType abstract, so that clients have to use functions
to access it.
* Make GHC.Core.Opt.Simplify.Utils.rebuildLam (was stupidly called
mkLam before) aware of the floats that the simplifier builds up, so
that it can still do eta-reduction even if there are some floats.
(Previously that would not happen.) That means passing the floats
to rebuildLam, and an extra check when eta-reducting (etaFloatOk).
* In GHC.Core.Opt.Simplify.Utils.tryEtaExpandRhs, make use of call-info
in the idDemandInfo of the binder, as well as the CallArity info. The
occurrence analyser did this but we were failing to take advantage here.
In the end I moved the heavy lifting to GHC.Core.Opt.Arity.findRhsArity;
see Note [Combining arityType with demand info], and functions
idDemandOneShots and combineWithDemandOneShots.
(These changes partly drove my refactoring of ArityType.)
* In GHC.Core.Opt.Arity.findRhsArity
* I'm now taking account of the demand on the binder to give
extra one-shot info. E.g. if the fn is always called with two
args, we can give better one-shot info on the binders
than if we just look at the RHS.
* Don't do any fixpointing in the non-recursive
case -- simple short cut.
* Trim arity inside the loop. See Note [Trim arity inside the loop]
* Make SimpleOpt respect the eta-reduction flag
(Some associated refactoring here.)
* I made the CallCtxt which the Simplifier uses distinguish between
recursive and non-recursive right-hand sides.
data CallCtxt = ... | RhsCtxt RecFlag | ...
It affects only one thing:
- We call an RHS context interesting only if it is non-recursive
see Note [RHS of lets] in GHC.Core.Unfold
* Remove eta-reduction in GHC.CoreToStg.Prep, a welcome simplification.
See Note [No eta reduction needed in rhsToBody] in GHC.CoreToStg.Prep.
Other incidental changes
* Fix a fairly long-standing outright bug in the ApplyToVal case of
GHC.Core.Opt.Simplify.mkDupableContWithDmds. I was failing to take the
tail of 'dmds' in the recursive call, which meant the demands were All
Wrong. I have no idea why this has not caused problems before now.
* Delete dead function GHC.Core.Opt.Simplify.Utils.contIsRhsOrArg
Metrics: compile_time/bytes allocated
Test Metric Baseline New value Change
---------------------------------------------------------------------------------------
MultiLayerModulesTH_OneShot(normal) ghc/alloc 2,743,297,692 2,619,762,992 -4.5% GOOD
T18223(normal) ghc/alloc 1,103,161,360 972,415,992 -11.9% GOOD
T3064(normal) ghc/alloc 201,222,500 184,085,360 -8.5% GOOD
T8095(normal) ghc/alloc 3,216,292,528 3,254,416,960 +1.2%
T9630(normal) ghc/alloc 1,514,131,032 1,557,719,312 +2.9% BAD
parsing001(normal) ghc/alloc 530,409,812 525,077,696 -1.0%
geo. mean -0.1%
Nofib:
Program Size Allocs Runtime Elapsed TotalMem
--------------------------------------------------------------------------------
banner +0.0% +0.4% -8.9% -8.7% 0.0%
exact-reals +0.0% -7.4% -36.3% -37.4% 0.0%
fannkuch-redux +0.0% -0.1% -1.0% -1.0% 0.0%
fft2 -0.1% -0.2% -17.8% -19.2% 0.0%
fluid +0.0% -1.3% -2.1% -2.1% 0.0%
gg -0.0% +2.2% -0.2% -0.1% 0.0%
spectral-norm +0.1% -0.2% 0.0% 0.0% 0.0%
tak +0.0% -0.3% -9.8% -9.8% 0.0%
x2n1 +0.0% -0.2% -3.2% -3.2% 0.0%
--------------------------------------------------------------------------------
Min -3.5% -7.4% -58.7% -59.9% 0.0%
Max +0.1% +2.2% +32.9% +32.9% 0.0%
Geometric Mean -0.0% -0.1% -14.2% -14.8% -0.0%
Metric Decrease:
MultiLayerModulesTH_OneShot
T18223
T3064
T15185
T14766
Metric Increase:
T9630
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
See the new `Note [SubDemand denotes at least one evaluation]`.
A demand `n :* sd` on a let binder `x=e` now means
> "`x` was evaluated `n` times and in any program trace it is evaluated, `e` is
> evaluated deeply in sub-demand `sd`."
The "any time it is evaluated" premise is what this patch adds. As a result,
we get better nested strictness. For example (T21081)
```hs
f :: (Bool, Bool) -> (Bool, Bool)
f pr = (case pr of (a,b) -> a /= b, True)
-- before: <MP(L,L)>
-- after: <MP(SL,SL)>
g :: Int -> (Bool, Bool)
g x = let y = let z = odd x in (z,z) in f y
```
The change in demand signature "before" to "after" allows us to case-bind `z`
here.
Similarly good things happen for the `sd` in call sub-demands `Cn(sd)`, which
allows for more eta-reduction (which is only sound with `-fno-pedantic-bottoms`,
albeit).
We also fix #21085, a surprising inconsistency with `Poly` to `Call` sub-demand
expansion.
In an attempt to fix a regression caused by less inlining due to eta-reduction
in T15426, I eta-expanded the definition of `elemIndex` and `elemIndices`, thus
fixing #21345 on the go.
The main point of this patch is that it fixes #21081 and #21133.
Annoyingly, I discovered that more precise demand signatures for join points can
transform a program into a lazier program if that join point gets floated to the
top-level, see #21392. There is no simple fix at the moment, but !5349 might.
Thus, we accept a ~5% regression in `MultiLayerModulesTH_OneShot`, where #21392
bites us in `addListToUniqDSet`. T21392 reliably reproduces the issue.
Surprisingly, ghc/alloc perf on Windows improves much more than on other jobs, by
0.4% in the geometric mean and by 2% in T16875.
Metric Increase:
MultiLayerModulesTH_OneShot
Metric Decrease:
T16875
|
|
|
|
|
|
|
| |
This will mean T9208 when run with lint will return a lint error instead
of resulting in a panic.
Fixes #21117
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
previously, GHC had the "let/app-invariant" which said that the RHS of a
let or the argument of an application must be of lifted type or ok for
speculation. We want this on let to freely float them around, and we
wanted that on app to freely convert between the two (e.g. in
beta-reduction or inlining).
However, the app invariant meant that simple code didn't stay simple and
this got in the way of rules matching. By removing the app invariant,
this thus fixes #20554.
The new invariant is now called "let-can-float invariant", which is
hopefully easier to guess its meaning correctly.
Dropping the app invariant means that everywhere where we effectively do
beta-reduction (in the two simplifiers, but also in `exprIsConApp_maybe`
and other innocent looking places) we now have to check if the argument
must be evaluated (unlifted and side-effecting), and analyses have to be
adjusted to the new semantics of `App`.
Also, `LetFloats` in the simplifier can now also carry such non-floating
bindings.
The fix for DmdAnal, refine by Sebastian, makes functions with unlifted
arguments strict in these arguments, which changes some signatures.
This causes some extra calls to `exprType` and `exprOkForSpeculation`,
so some perf benchmarks regress a bit (while others improve).
Metric Decrease:
T9020
Metric Increase:
LargeRecord
T12545
T15164
T16577
T18223
T5642
T9961
Co-authored-by: Sebastian Graf <sebastian.graf@kit.edu>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(#20385)
Once we are done parsing the header of a module to obtain the options, we
look through the rest of the tokens in order to determine if they contain any
misplaced file header pragmas that would usually be ignored, potentially
resulting in bad error messages.
The warnings are reported immediately so that later errors don't shadow
over potentially helpful warnings.
Metric Increase:
T13719
|
|
|
|
|
| |
Users are supposed to import GHC.Exts rather than GHC.Prim.
Part of #18749.
|
|
|
|
|
|
|
| |
These dependencies would affect the demand signature depending on
various rules and so on.
Fixes #21271
|
|
|
|
|
|
|
|
|
|
|
|
| |
Partial FUN apps like `(->) Bool` aren't detected by `splitFunTy_maybe`.
A silly oversight that is easily fixed by replacing `splitFunTy_maybe` with a
guard in the `splitTyConApp_maybe` case.
But fortunately, Simon nudged me into rewriting the whole `isRecDataCon`
function in a way that makes it much shorter and hence clearer which DataCons
are actually considered as recursive.
Fixes #21265.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we let `Unboxed` win in `lubBoxity`, which is unsoundly optimistic
in terms ob Boxity analysis. "Unsoundly" in the sense that we sometimes unbox
parameters that we better shouldn't unbox. Examples are #18907 and T19871.absent.
Until now, we thought that this hack pulled its weight becuase it worked around
some shortcomings of the phase separation between Boxity analysis and CPR
analysis. But it is a gross hack which caused regressions itself that needed all
kinds of fixes and workarounds. See for example #20767. It became impossible to
work with in !7599, so I want to remove it.
For example, at the moment, `lubDmd B dmd` will not unbox `dmd`,
but `lubDmd A dmd` will. Given that `B` is supposed to be the bottom element of
the lattice, it's hardly justifiable to get a better demand when `lub`bing with
`A`.
The consequence of letting `Boxed` win in `lubBoxity` is that we *would* regress
#2387, #16040 and parts of #5075 and T19871.sumIO, until Boxity and CPR
are able to communicate better. Fortunately, that is not the case since I could
tweak the other source of optimism in Boxity analysis that is described in
`Note [Unboxed demand on function bodies returning small products]` so that
we *recursively* assume unboxed demands on function bodies returning small
products. See the updated Note.
`Note [Boxity for bottoming functions]` describes why we need bottoming
functions to have signatures that say that they deeply unbox their arguments.
In so doing, I had to tweak `finaliseArgBoxities` so that it will never unbox
recursive data constructors. This is in line with our handling of them in CPR.
I updated `Note [Which types are unboxed?]` to reflect that.
In turn we fix #21119, #20767, #18907, T19871.absent and get a much simpler
implementation (at least to think about). We can also drop the very ad-hoc
definition of `deferAfterPreciseException` and its Note in favor of the
simple, intuitive definition we used to have.
Metric Decrease:
T16875
T18223
T18698a
T18698b
hard_hole_fits
Metric Increase:
LargeRecord
MultiComponentModulesRecomp
T15703
T8095
T9872d
Out of all the regresions, only the one in T9872d doesn't vanish in a perf
build, where the compiler is bootstrapped with -O2 and thus SpecConstr.
Reason for regressions:
* T9872d is due to `ty_co_subst` taking its `LiftingContext` boxed.
That is because the context is passed to a function argument, for
example in `liftCoSubstTyVarBndrUsing`.
* In T15703, LargeRecord and T8095, we get a bit more allocations in
`expand_syn` and `piResultTys`, because a `TCvSubst` isn't unboxed.
In both cases that guards against reboxing in some code paths.
* The same is true for MultiComponentModulesRecomp, where we get less unboxing
in `GHC.Unit.Finder.$wfindInstalledHomeModule`. In a perf build, allocations
actually *improve* by over 4%!
Results on NoFib:
--------------------------------------------------------------------------------
Program Allocs Instrs
--------------------------------------------------------------------------------
awards -0.4% +0.3%
cacheprof -0.3% +2.4%
fft -1.5% -5.1%
fibheaps +1.2% +0.8%
fluid -0.3% -0.1%
ida +0.4% +0.9%
k-nucleotide +0.4% -0.1%
last-piece +10.5% +13.9%
lift -4.4% +3.5%
mandel2 -99.7% -99.8%
mate -0.4% +3.6%
parser -1.0% +0.1%
puzzle -11.6% +6.5%
reverse-complem -3.0% +2.0%
scs -0.5% +0.1%
sphere -0.4% -0.2%
wave4main -8.2% -0.3%
--------------------------------------------------------------------------------
Summary excludes mandel2 because of excessive bias
Min -11.6% -5.1%
Max +10.5% +13.9%
Geometric Mean -0.2% +0.3%
--------------------------------------------------------------------------------
Not bad for a bug fix.
The regression in `last-piece` could become a win if SpecConstr would work on
non-recursive functions. The regression in `fibheaps` is due to
`Note [Reboxed crud for bottoming calls]`, e.g., #21128.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Issue #21150 shows that worker/wrapper allocated a worker function for a
function with multiple calls that said "called at most once" when the first
argument was absent. That's bad!
This patch makes it so that WW preserves at least one non-one-shot value lambda
(see `Note [Preserving float barriers]`) by passing around `void#` in place of
absent arguments.
Fixes #21150.
Since the fix is pretty similar to `Note [Protecting the last value argument]`,
I put the logic in `mkWorkerArgs`. There I realised (#21204) that
`-ffun-to-thunk` is basically useless with `-ffull-laziness`, so I deprecated
the flag, simplified and split into `needsVoidWorkerArg`/`addVoidWorkerArg`.
SpecConstr is another client of that API.
Fixes #21204.
Metric Decrease:
T14683
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Don't instantiate type variables for :type in
`GHC.Tc.Gen.App.tcInstFun`, to avoid inconsistently instantianting
`r1` but not `r2` in the type
forall {r1} (a :: TYPE r1) {r2} (b :: TYPE r2). ...
This fixes #21088.
This patch also changes the primop pretty-printer to ensure
that we put all the inferred type variables first. For example,
the type of reallyUnsafePtrEquality# is now
forall {l :: Levity} {k :: Levity}
(a :: TYPE (BoxedRep l))
(b :: TYPE (BoxedRep k)).
a -> b -> Int#
This means we avoid running into issue #21088 entirely with
the types of primops. Users can still write a type signature where
the inferred type variables don't come first, however.
This change to primops had a knock-on consequence, revealing that
we were sometimes performing eta reduction on keepAlive#.
This patch updates tryEtaReduce to avoid eta reducing functions
with no binding, bringing it in line with tryEtaReducePrep,
and thus fixing #21090.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This does three major things:
* Enforce the invariant that all strict fields must contain tagged
pointers.
* Try to predict the tag on bindings in order to omit tag checks.
* Allows functions to pass arguments unlifted (call-by-value).
The former is "simply" achieved by wrapping any constructor allocations with
a case which will evaluate the respective strict bindings.
The prediction is done by a new data flow analysis based on the STG
representation of a program. This also helps us to avoid generating
redudant cases for the above invariant.
StrictWorkers are created by W/W directly and SpecConstr indirectly.
See the Note [Strict Worker Ids]
Other minor changes:
* Add StgUtil module containing a few functions needed by, but
not specific to the tag analysis.
-------------------------
Metric Decrease:
T12545
T18698b
T18140
T18923
LargeRecord
Metric Increase:
LargeRecord
ManyAlternatives
ManyConstructors
T10421
T12425
T12707
T13035
T13056
T13253
T13253-spj
T13379
T15164
T18282
T18304
T18698a
T1969
T20049
T3294
T4801
T5321FD
T5321Fun
T783
T9233
T9675
T9961
T19695
WWRec
-------------------------
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements a fix for #20817. It ensures that
* The final strictness signature for a function accurately
reflects the unboxing done by the wrapper
See Note [Finalising boxity for demand signatures]
and Note [Finalising boxity for let-bound Ids]
* A much better "layer-at-a-time" implementation of the
budget for how many worker arguments we can have
See Note [Worker argument budget]
Generally this leads to a bit more worker/wrapper generation,
because instead of aborting entirely if the budget is exceeded
(and then lying about boxity), we unbox a bit.
Binary sizes in increase slightly (around 1.8%) because of the increase
in worker/wrapper generation. The big effects are to GHC.Ix,
GHC.Show, GHC.IO.Handle.Internals. If we did a better job of dropping
dead code, this effect might go away.
Some nofib perf improvements:
Program Size Allocs Runtime Elapsed TotalMem
--------------------------------------------------------------------------------
VSD +1.8% -0.5% 0.017 0.017 0.0%
awards +1.8% -0.1% +2.3% +2.3% 0.0%
banner +1.7% -0.2% +0.3% +0.3% 0.0%
bspt +1.8% -0.1% +3.1% +3.1% 0.0%
eliza +1.8% -0.1% +1.2% +1.2% 0.0%
expert +1.7% -0.1% +9.6% +9.6% 0.0%
fannkuch-redux +1.8% -0.4% -9.3% -9.3% 0.0%
kahan +1.8% -0.1% +22.7% +22.7% 0.0%
maillist +1.8% -0.9% +21.2% +21.6% 0.0%
nucleic2 +1.7% -5.1% +7.5% +7.6% 0.0%
pretty +1.8% -0.2% 0.000 0.000 0.0%
reverse-complem +1.8% -2.5% +12.2% +12.2% 0.0%
rfib +1.8% -0.2% +2.5% +2.5% 0.0%
scc +1.8% -0.4% 0.000 0.000 0.0%
simple +1.7% -1.3% +17.0% +17.0% +7.4%
spectral-norm +1.8% -0.1% +6.8% +6.7% 0.0%
sphere +1.7% -2.0% +13.3% +13.3% 0.0%
tak +1.8% -0.2% +3.3% +3.3% 0.0%
x2n1 +1.8% -0.4% +8.1% +8.1% 0.0%
--------------------------------------------------------------------------------
Min +1.1% -5.1% -23.6% -23.6% 0.0%
Max +1.8% +0.0% +36.2% +36.2% +7.4%
Geometric Mean +1.7% -0.1% +6.8% +6.8% +0.1%
Compiler allocations in CI have a geometric mean of +0.1%; many small
decreases but there are three bigger increases (7%), all because we do
more worker/wrapper than before, so there is simply more code to
compile. That's OK.
Perf benchmarks in perf/should_run improve in allocation by a geo mean
of -0.2%, which is good. None get worse. T12996 improves by -5.8%
Metric Decrease:
T12996
Metric Increase:
T18282
T18923
T9630
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As #20746 showed, the demand analyser behaved badly in a key I/O
library (`GHC.IO.Handle.Text`), by unnessarily boxing and reboxing.
This patch adjusts the subtle function deferAfterPreciseException;
it's quite easy, just a bit subtle.
See the new Note [deferAfterPreciseException]
And this MR deals only with Problem 2 in #20746.
Problem 1 is still open.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have a function in #20510 that is small enough to get a stable unfolding in WW:
```hs
small :: Int -> Int
small x = go 0 x
where
go z 0 = z * x
go z y = go (z+y) (y-1)
```
But it appears we failed to use the WW'd RHS as the stable unfolding. As a result,
inlining `small` would expose the non-WW'd version of `go`. That appears to regress
badly in #19727 which is a bit too large to extract a reproducer from that is
guaranteed to reproduce across GHC versions.
The solution is to simply update the unfolding in `certainlyWillInline` with the
WW'd RHS.
Fixes #20510.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes some abundant reboxing of `DynFlags` in
`GHC.HsToCore.Match.Literal.warnAboutOverflowedLit` (which was the topic
of #19407) by introducing a Boxity analysis to GHC, done as part of demand
analysis. This allows to accurately capture ad-hoc unboxing decisions previously
made in worker/wrapper in demand analysis now, where the boxity info can
propagate through demand signatures.
See the new `Note [Boxity analysis]`. The actual fix for #19407 is described in
`Note [No lazy, Unboxed demand in demand signature]`, but
`Note [Finalising boxity for demand signature]` is probably a better entry-point.
To support the fix for #19407, I had to change (what was)
`Note [Add demands for strict constructors]` a bit
(now `Note [Unboxing evaluated arguments]`). In particular, we now take care of
it in `finaliseBoxity` (which is only called from demand analaysis) instead of
`wantToUnboxArg`.
I also had to resurrect `Note [Product demands for function body]` and rename
it to `Note [Unboxed demand on function bodies returning small products]` to
avoid huge regressions in `join004` and `join007`, thereby fixing #4267 again.
See the updated Note for details.
A nice side-effect is that the worker/wrapper transformation no longer needs to
look at strictness info and other bits such as `InsideInlineableFun` flags
(needed for `Note [Do not unbox class dictionaries]`) at all. It simply collects
boxity info from argument demands and interprets them with a severely simplified
`wantToUnboxArg`. All the smartness is in `finaliseBoxity`, which could be moved
to DmdAnal completely, if it wasn't for the call to `dubiousDataConInstArgTys`
which would be awkward to export.
I spent some time figuring out the reason for why `T16197` failed prior to my
amendments to `Note [Unboxing evaluated arguments]`. After having it figured
out, I minimised it a bit and added `T16197b`, which simply compares computed
strictness signatures and thus should be far simpler to eyeball.
The 12% ghc/alloc regression in T11545 is because of the additional `Boxity`
field in `Poly` and `Prod` that results in more allocation during `lubSubDmd`
and `plusSubDmd`. I made sure in the ticky profiles that the number of calls
to those functions stayed the same. We can bear such an increase here, as we
recently improved it by -68% (in b760c1f).
T18698* regress slightly because there is more unboxing of dictionaries
happening and that causes Lint (mostly) to allocate more.
Fixes #19871, #19407, #4267, #16859, #18907 and #13331.
Metric Increase:
T11545
T18698a
T18698b
Metric Decrease:
T12425
T16577
T18223
T18282
T4267
T9961
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We've had Sum CPR (#5075) for top-level bindings for a couple of years now.
That begs the question why we didn't also activate it for local bindings, and
the reasons for that are described in `Note [CPR for sum types]`. Only that it
didn't make sense! The Note said that Sum CPR would destroy let-no-escapes, but
that should be a non-issue since we have syntactic join points in Core now and
we don't WW for them (`Note [Don't w/w join points for CPR]`).
So I simply activated CPR for all bindings of sum type, thus fixing #5075 and
\#16570. NoFib approves:
```
--------------------------------------------------------------------------------
Program Allocs Instrs
--------------------------------------------------------------------------------
comp_lab_zift -0.0% +0.7%
fluid +1.7% +0.7%
reptile +0.1% +0.1%
--------------------------------------------------------------------------------
Min -0.0% -0.2%
Max +1.7% +0.7%
Geometric Mean +0.0% +0.0%
```
There were quite a few metric decreases on the order of 1-4%, but T6048 seems to
regress significantly, by 26.1%. WW'ing for a `Just` constructor and the nested
data type meant additional Simplifier iterations and a 30% increase in term
sizes as well as a 200-300% in type sizes due to unboxed 9-tuples. There's not
much we can do about it, I'm afraid: We're just doing much more work there.
Metric Decrease:
T12425
T18698a
T18698b
T20049
T9020
WWRec
Metric Increase:
T6048
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch enables worker/wrapper for nested constructed products, as described
in `Note [Nested CPR]`. The machinery for expressing Nested CPR was already
there, since !5054. Worker/wrapper is equipped to exploit Nested CPR annotations
since !5338. CPR analysis already handles applications in batches since !5753.
This patch just needs to flip a few more switches:
1. In `cprTransformDataConWork`, we need to look at the field expressions
and their `CprType`s to see whether the evaluation of the expressions
terminates quickly (= is in HNF) or if they are put in strict fields.
If that is the case, then we retain their CPR info and may unbox nestedly
later on. More details in `Note [Nested CPR]`.
2. Enable nested `ConCPR` signatures in `GHC.Types.Cpr`.
3. In the `asConCpr` call in `GHC.Core.Opt.WorkWrap.Utils`, pass CPR info of
fields to the `Unbox`.
4. Instead of giving CPR signatures to DataCon workers and wrappers, we now have
`cprTransformDataConWork` for workers and treat wrappers by analysing their
unfolding. As a result, the code from GHC.Types.Id.Make went away completely.
5. I deactivated worker/wrappering for recursive DataCons and wrote a function
`isRecDataCon` to detect them. We really don't want to give `repeat` or
`replicate` the Nested CPR property.
See Note [CPR for recursive data structures] for which kind of recursive
DataCons we target.
6. Fix a couple of tests and their outputs.
I also documented that CPR can destroy sharing and lead to asymptotic increase
in allocations (which is tracked by #13331/#19326) in
`Note [CPR for data structures can destroy sharing]`.
Nofib results:
```
--------------------------------------------------------------------------------
Program Allocs Instrs
--------------------------------------------------------------------------------
ben-raytrace -3.1% -0.4%
binary-trees +0.8% -2.9%
digits-of-e2 +5.8% +1.2%
event +0.8% -2.1%
fannkuch-redux +0.0% -1.4%
fish 0.0% -1.5%
gamteb -1.4% -0.3%
mkhprog +1.4% +0.8%
multiplier +0.0% -1.9%
pic -0.6% -0.1%
reptile -20.9% -17.8%
wave4main +4.8% +0.4%
x2n1 -100.0% -7.6%
--------------------------------------------------------------------------------
Min -95.0% -17.8%
Max +5.8% +1.2%
Geometric Mean -2.9% -0.4%
```
The huge wins in x2n1 (loopy list) and reptile (see #19970) are due to
refraining from unboxing (:). Other benchmarks like digits-of-e2 or wave4main
regress because of that. Ultimately there are no great improvements due to
Nested CPR alone, but at least it's a win.
Binary sizes decrease by 0.6%.
There are a significant number of metric decreases. The most notable ones (>1%):
```
ManyAlternatives(normal) ghc/alloc 771656002.7 762187472.0 -1.2%
ManyConstructors(normal) ghc/alloc 4191073418.7 4114369216.0 -1.8%
MultiLayerModules(normal) ghc/alloc 3095678333.3 3128720704.0 +1.1%
PmSeriesG(normal) ghc/alloc 50096429.3 51495664.0 +2.8%
PmSeriesS(normal) ghc/alloc 63512989.3 64681600.0 +1.8%
PmSeriesV(normal) ghc/alloc 62575424.0 63767208.0 +1.9%
T10547(normal) ghc/alloc 29347469.3 29944240.0 +2.0%
T11303b(normal) ghc/alloc 46018752.0 47367576.0 +2.9%
T12150(optasm) ghc/alloc 81660890.7 82547696.0 +1.1%
T12234(optasm) ghc/alloc 59451253.3 60357952.0 +1.5%
T12545(normal) ghc/alloc 1705216250.7 1751278952.0 +2.7%
T12707(normal) ghc/alloc 981000472.0 968489800.0 -1.3% GOOD
T13056(optasm) ghc/alloc 389322664.0 372495160.0 -4.3% GOOD
T13253(normal) ghc/alloc 337174229.3 341954576.0 +1.4%
T13701(normal) ghc/alloc 2381455173.3 2439790328.0 +2.4% BAD
T14052(ghci) ghc/alloc 2162530642.7 2139108784.0 -1.1%
T14683(normal) ghc/alloc 3049744728.0 2977535064.0 -2.4% GOOD
T14697(normal) ghc/alloc 362980213.3 369304512.0 +1.7%
T15164(normal) ghc/alloc 1323102752.0 1307480600.0 -1.2%
T15304(normal) ghc/alloc 1304607429.3 1291024568.0 -1.0%
T16190(normal) ghc/alloc 281450410.7 284878048.0 +1.2%
T16577(normal) ghc/alloc 7984960789.3 7811668768.0 -2.2% GOOD
T17516(normal) ghc/alloc 1171051192.0 1153649664.0 -1.5%
T17836(normal) ghc/alloc 1115569746.7 1098197592.0 -1.6%
T17836b(normal) ghc/alloc 54322597.3 55518216.0 +2.2%
T17977(normal) ghc/alloc 47071754.7 48403408.0 +2.8%
T17977b(normal) ghc/alloc 42579133.3 43977392.0 +3.3%
T18923(normal) ghc/alloc 71764237.3 72566240.0 +1.1%
T1969(normal) ghc/alloc 784821002.7 773971776.0 -1.4% GOOD
T3294(normal) ghc/alloc 1634913973.3 1614323584.0 -1.3% GOOD
T4801(normal) ghc/alloc 295619648.0 292776440.0 -1.0%
T5321FD(normal) ghc/alloc 278827858.7 276067280.0 -1.0%
T5631(normal) ghc/alloc 586618202.7 577579960.0 -1.5%
T5642(normal) ghc/alloc 494923048.0 487927208.0 -1.4%
T5837(normal) ghc/alloc 37758061.3 39261608.0 +4.0%
T9020(optasm) ghc/alloc 257362077.3 254672416.0 -1.0%
T9198(normal) ghc/alloc 49313365.3 50603936.0 +2.6% BAD
T9233(normal) ghc/alloc 704944258.7 685692712.0 -2.7% GOOD
T9630(normal) ghc/alloc 1476621560.0 1455192784.0 -1.5%
T9675(optasm) ghc/alloc 443183173.3 433859696.0 -2.1% GOOD
T9872a(normal) ghc/alloc 1720926653.3 1693190072.0 -1.6% GOOD
T9872b(normal) ghc/alloc 2185618061.3 2162277568.0 -1.1% GOOD
T9872c(normal) ghc/alloc 1765842405.3 1733618088.0 -1.8% GOOD
TcPlugin_RewritePerf(normal) ghc/alloc 2388882730.7 2365504696.0 -1.0%
WWRec(normal) ghc/alloc 607073186.7 597512216.0 -1.6%
T9203(normal) run/alloc 107284064.0 102881832.0 -4.1%
haddock.Cabal(normal) run/alloc 24025329589.3 23768382560.0 -1.1%
haddock.base(normal) run/alloc 25660521653.3 25370321824.0 -1.1%
haddock.compiler(normal) run/alloc 74064171706.7 73358712280.0 -1.0%
```
The biggest exception to the rule is T13701 which seems to fluctuate as usual
(not unlike T12545). T14697 has a similar quality, being a generated
multi-module test. T5837 is small enough that it similarly doesn't measure
anything significant besides module loading overhead.
T13253 simply does one additional round of Simplification due to Nested CPR.
There are also some apparent regressions in T9198, T12234 and PmSeriesG that we
(@mpickering and I) were simply unable to reproduce locally. @mpickering tried
to run the CI script in a local Docker container and actually found that T9198
and PmSeriesG *improved*. In MRs that were rebased on top this one, like !4229,
I did not experience such increases. Let's not get hung up on these regression
tests, they were meant to test for asymptotic regressions.
The build-cabal test improves by 1.2% in -O0.
Metric Increase:
T10421
T12234
T12545
T13035
T13056
T13701
T14697
T18923
T5837
T9198
Metric Decrease:
ManyConstructors
T12545
T12707
T13056
T14683
T16577
T18223
T1969
T3294
T9203
T9233
T9675
T9872a
T9872b
T9872c
T9961
TcPlugin_RewritePerf
|
|
|
|
|
|
| |
Previously `GHC.Types.Id.Make.newLocal` would name all locals `dt`,
making it unnecessarily difficult to determine their origin.
Noticed while looking at #19557.
|
|
|
|
| |
The only item left in #17819. Fixes #17819.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In https://gitlab.haskell.org/ghc/ghc/-/merge_requests/5814#note_355144,
Simon noted that `mkWWstr` and `mkWWcpr` could generate fewer let bindings and
be implemented less indirectly by returning the rebuilt expressions directly, e.g. instead of
```
f :: (Int, Int) -> Int
f (x, y) = x+y
==>
f :: (Int, Int) -> Int
f p = case p of (x, y) ->
case x of I# x' ->
case y of I# y' ->
case $wf x' y' of r' ->
let r = I# r' -- immediately returned
in r
f :: Int# -> Int# -> Int#
$wf x' y' = let x = I# x' in -- only used in p
let y = I# y' in -- only used in p
let p = (x, y) in -- only used in the App below
case (\(x,y) -> x+y) p of I# r' ->
r'
```
we know generate
```
f :: (Int, Int) -> Int
f p = case p of (x, y) ->
case x of I# x' ->
case y of I# y' ->
case $wf x' y' of r' ->
I# r' -- 1 fewer let
f :: Int# -> Int# -> Int#
$wf x' y' = case (\(x,y) -> x+y) (I# x, I# y) of I# r' -> -- 3 fewer lets
r'
```
Which is much nicer and makes it easier to comprehend the output of
worker-wrapper pre-Simplification as well as puts less strain on the Simplifier.
I had to drop support for #18983, but we found that it's broken anyway.
Simon is working on a patch that provides a bit more justification.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
`mkWWargs`'s job was pushing casts inwards and doing eta expansion to match
the arity with the number of argument demands we w/w for.
Nowadays, we use the Simplifier to eta expand to arity. In fact, in recent years
we have even seen the eta expansion done by w/w as harmful, see Note [Don't eta
expand in w/w]. If a function hasn't enough manifest lambdas, don't w/w it!
What purpose does `mkWWargs` serve in this world? Not a great one, it turns out!
I could remove it by pulling some important bits,
notably Note [Freshen WW arguments] and Note [Join points and beta-redexes].
Result: We reuse the freshened binder names of the wrapper in the
worker where possible (see testuite changes), much nicer!
In order to avoid scoping errors due to lambda-bound unfoldings in worker
arguments, we zap those unfoldings now. In doing so, we fix #19766.
Fixes #19874.
|
|
|
|
|
|
|
|
|
|
| |
As #19882 pointed out, we were simply doing rubbish literals wrong.
(I'll refrain from explaining the wrong-ness here -- see the ticket.)
This patch fixes it by adding a Type (of kind RuntimeRep) as field of
LitRubbish, rather than [PrimRep].
The Note [Rubbish literals] in GHC.Types.Literal explains the details.
|
|
|
|
|
|
| |
Fixes #19849
Co-authored-by: Krzysztof Gogolewski <krzysztof.gogolewski@tweag.io>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In #19822, we realised that the Simplifier's new habit of floating cases into
`runRW#` continuations inhibits CPR analysis from giving key functions of `text`
the CPR property, such as `singleton`.
This patch fixes that by anticipating part of !5667 (Nested CPR) to give
`runRW#` the proper CPR transformer it now deserves: Namely, `runRW# (\s -> e)`
should have the CPR property iff `e` has it.
The details are in `Note [Simplification of runRW#]` in GHC.CoreToStg.Prep.
The output of T18086 changed a bit: `panic` (which calls `runRW#`) now has
`botCpr`. As outlined in Note [Bottom CPR iff Dead-Ending Divergence], that's
OK.
Fixes #19822.
Metric Decrease:
T9872d
|
|
|
|
|
|
|
|
|
|
| |
Fixes #19616.
This commit changes the `GHC.Driver.Errors.handleFlagWarnings` function
to rely on the newly introduced `DiagnosticReason`. This allows us to
correctly pretty-print the flags which triggered some warnings and in
turn remove the cruft around this function (like the extra filtering
and the `shouldPrintWarning` function.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch cleans up the complexity around WW's `mk_absent_let` by
broadening the scope of `LitRubbish`. Rubbish literals now store the
`PrimRep` they represent and are ultimately lowered in Cmm.
This in turn allows absent literals of `VecRep` or `VoidRep`. The latter
allows absent literals for unlifted coercions, as requested in #18983.
I took the liberty to rewrite and clean up `Note [Absent fillers]` and
`Note [Rubbish values]` to account for the new implementation and to
make them more orthogonal in their description.
I didn't add a new regression test, as `T18982` already contains the
test in the ticket and its test output changes as expected.
Fixes #18983.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While fixing #19232, it became increasingly clear that the vestigial
hack described in `Note [Optimistic field binder CPR]` is complicated
and causes reboxing. Rather than make the hack worse, this patch
gets rid of it completely in favor of giving deeply unboxed parameters
the Nested CPR property. Example:
```hs
f :: (Int, Int) -> Int
f p = case p of
(x, y) | x == y = x
| otherwise = y
```
Based on `p`'s `idDemandInfo` `1P(1P(L),1P(L))`, we can see that both
fields of `p` will be available unboxed. As a result, we give `p` the
nested CPR property `1(1,1)`. When analysing the `case`, the field
CPRs are transferred to the binders `x` and `y`, respectively, so that
we ultimately give `f` the CPR property.
I took the liberty to do a bit of refactoring:
- I renamed `CprResult` ("Constructed product result result") to plain
`Cpr`.
- I Introduced `FlatConCpr` in addition to (now nested) `ConCpr` and
and according pattern synonym that rewrites flat `ConCpr` to
`FlatConCpr`s, purely for compiler perf reasons.
- Similarly for performance reasons, we now store binders with a
Top signature in a separate `IntSet`,
see `Note [Efficient Top sigs in SigEnv]`.
- I moved a bit of stuff around in `GHC.Core.Opt.WorkWrap.Utils` and
introduced `UnboxingDecision` to replace the `Maybe DataConPatContext`
type we used to return from `wantToUnbox`.
- Since the `Outputable Cpr` instance changed anyway, I removed the
leading `m` which we used to emit for `ConCpr`. It's just noise,
especially now that we may output nested CPRs.
Fixes #19398.
|
| |
|
|
|
|
|
|
|
|
|
| |
The update of the Outputable instance resulted in a slew of
documentation changes within Notes that used the old syntax.
The most important doc changes are to `Note [Demand notation]`
and the user's guide.
Fixes #19016.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For years we have lived in a supposedly sweet spot that gave case
binders the CPR property, unconditionally. Which is an optimistic hack
that is now described in `Historical Note [Optimistic case binder CPR]`.
In #19232 the concern was raised that this might do more harm than good
and that might be better off simply by taking the CPR property of the
scrutinee for the CPR type of the case binder. And indeed that's what we
do now.
Since `Note [CPR in a DataAlt case alternative]` is now only about field
binders, I renamed and garbage collected it into
`Note [Optimistic field binder CPR]`.
NoFib approves:
```
NoFib Results
--------------------------------------------------------------------------------
Program Allocs Instrs
--------------------------------------------------------------------------------
anna +0.1% +0.1%
nucleic2 -1.2% -0.6%
sched 0.0% +0.9%
transform -0.0% -0.1%
--------------------------------------------------------------------------------
Min -1.2% -0.6%
Max +0.1% +0.9%
Geometric Mean -0.0% +0.0%
```
Fixes #19232.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since !4493 we annotate top-level bindings with demands, which leads to
novel opportunities for thunk splitting absent top-level thunks.
It turns out that thunk splitting wasn't quite equipped for that,
because it re-used top-level, `External` Names for local helper Ids.
That triggered a CoreLint error (#19180), reproducible with `T19180`.
Fixed by adjusting the thunk splitting code to produce `SysLocal` names
for the local bindings.
Fixes #19180.
Metric Decrease:
T12227
T18282
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Consider
```hs
data Ex where
Ex :: e -> Int -> Ex
f :: Ex -> Int
f (Ex e n) = e `seq` n + 1
```
Worker/wrapper should build the following worker for `f`:
```hs
$wf :: forall e. e -> Int# -> Int#
$wf e n = e `seq` n +# 1#
```
But previously it didn't, because `Ex` binds an existential.
This patch lifts that condition. That entailed having to instantiate
existential binders in `GHC.Core.Opt.WorkWrap.Utils.mkWWstr` via
`GHC.Core.Utils.dataConRepFSInstPat`, requiring a bit of a refactoring
around what is now `DataConPatContext`.
CPR W/W still won't unbox DataCons with existentials.
See `Note [Which types are unboxed?]` for details.
I also refactored the various `tyCon*DataCon(s)_maybe` functions in
`GHC.Core.TyCon`, deleting some of them which are no longer needed
(`isDataProductType_maybe` and `isDataSumType_maybe`).
I cleaned up a couple of call sites, some of which weren't very explicit
about whether they cared for existentials or not.
The test output of `T18013` changed, because we now unbox the `Rule`
data type. Its constructor carries existential state and will be
w/w'd now. In the particular example, the worker functions inlines right
back into the wrapper, which then unnecessarily has a (quite big) stable
unfolding. I think this kind of fallout is inevitable;
see also Note [Don't w/w inline small non-loop-breaker things].
There's a new regression test case `T18982`.
Fixes #18982.
|
| |
|
|
|
|
|
|
| |
Both sub-demands encode the same information.
This is a trivial change and already affects a few regression tests
(e.g. `T5075`), so no separate regression test is necessary.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's useful to annotate a non-exported top-level function like `g` in
```hs
module Lib (h) where
g :: Int -> Int -> (Int,Int)
g m 1 = (m, 0)
g m n = (2 * m, 2 `div` n)
{-# NOINLINE g #-}
h :: Int -> Int
h 1 = 0
h m
| odd m = snd (g m 2)
| otherwise = uncurry (+) (g 2 m)
```
with its demand `UCU(CS(P(1P(U),SP(U))`, which tells us that whenever `g` was
called, the second component of the returned pair was evaluated strictly.
Since #18903 we do so for local functions, where we can see all calls.
For top-level functions, we can assume that all *exported* functions are
demanded according to `topDmd` and thus get sound demands for
non-exported top-level functions.
The demand on `g` is crucial information for Nested CPR, which may the
go on and unbox `g` for the second pair component. That is true even if
that pair component may diverge, as is the case for the call site `g 13
0`, which throws a div-by-zero exception.
In `T18894b`, you can even see the new demand annotation enabling us to
eta-expand a function that we wouldn't be able to eta-expand without
Call Arity.
We only track bindings of function type in order not to risk huge compile-time
regressions, see `isInterestingTopLevelFn`.
There was a CoreLint check that rejected strict demand annotations on
recursive or top-level bindings, which seems completely unjustified.
All the cases I investigated were fine, so I removed it.
Fixes #18894.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As outlined in #18903, interleaving usage and strictness demands not
only means a more compact demand representation, but also allows us to
express demands that we weren't easily able to express before.
Call demands are *relative* in the sense that a call demand `Cn(cd)`
on `g` says "`g` is called `n` times. *Whenever `g` is called*, the
result is used according to `cd`". Example from #18903:
```hs
h :: Int -> Int
h m =
let g :: Int -> (Int,Int)
g 1 = (m, 0)
g n = (2 * n, 2 `div` n)
{-# NOINLINE g #-}
in case m of
1 -> 0
2 -> snd (g m)
_ -> uncurry (+) (g m)
```
Without the interleaved representation, we would just get `L` for the
strictness demand on `g`. Now we are able to express that whenever
`g` is called, its second component is used strictly in denoting `g`
by `1C1(P(1P(U),SP(U)))`. This would allow Nested CPR to unbox the
division, for example.
Fixes #18903.
While fixing regressions, I also discovered and fixed #18957.
Metric Decrease:
T13253-spj
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In #18793, we saw a compelling example which requires us to look at
non-recursive let-bindings during arity analysis and unleash their arity
types at use sites.
After the refactoring in the previous patch, the needed change is quite
simple and very local to `arityType`'s defn for non-recurisve `Let`.
Apart from that, we had to get rid of the second item of
`Note [Dealing with bottoms]`, which was entirely a safety measure and
hindered optimistic fixed-point iteration.
Fixes #18793.
The following metric increases are all caused by this commit and a
result of the fact that we just do more work now:
Metric Increase:
T3294
T12545
T12707
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are two signficant changes here:
* Ticket #18815 showed that we were missing some opportunities for
preInlineUnconditionally. The one-line fix is in the code for
GHC.Core.Opt.Simplify.Utils.preInlineUnconditionally, which now
switches off only for INLINE pragmas. I expanded
Note [Stable unfoldings and preInlineUnconditionally] to explain.
* When doing this I discovered a way in which preInlineUnconditionally
was occasionally /too/ eager. It's all explained in
Note [Occurrences in stable unfoldings] in GHC.Core.Opt.OccurAnal,
and the one-line change adding markAllMany to occAnalUnfolding.
I also got confused about what NoUserInline meant, so I've renamed
it to NoUserInlinePrag, and changed its pretty-printing slightly.
That led to soem error messate wibbling, and touches quite a few
files, but there is no change in functionality.
I did a nofib run. As expected, no significant changes.
Program Size Allocs
----------------------------------------
sphere -0.0% -0.4%
----------------------------------------
Min -0.0% -0.4%
Max -0.0% +0.0%
Geometric Mean -0.0% -0.0%
I'm allowing a max-residency increase for T10370, which seems
very irreproducible. (See comments on !4241.) There is always
sampling error for max-residency measurements; and in any case
the change shows up on some platforms but not others.
Metric Increase:
T10370
|
|
|
|
|
|
| |
Implements GHC Proposal #356
Updates the haddock submodule.
|