| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
Justification in #22231. Short form: In a demand like `1C1(C1(L))`
it was too easy to confuse which `1` belongs to which `C`. Now
that should be more obvious.
Fixes #22231
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I finally got tired of the way that IfaceUnfolding reflected
a previous structure of unfoldings, not the current one. This
MR refactors UnfoldingSource and IfaceUnfolding to be simpler
and more consistent.
It's largely just a refactor, but in UnfoldingSource (which moves
to GHC.Types.Basic, since it is now used in IfaceSyn too), I
distinguish between /user-specified/ and /system-generated/ stable
unfoldings.
data UnfoldingSource
= VanillaSrc
| StableUserSrc -- From a user-specified pragma
| StableSystemSrc -- From a system-generated unfolding
| CompulsorySrc
This has a minor effect in CSE (see the use of isisStableUserUnfolding
in GHC.Core.Opt.CSE), which I tripped over when working on
specialisation, but it seems like a Good Thing to know anyway.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
See the new `Note [SubDemand denotes at least one evaluation]`.
A demand `n :* sd` on a let binder `x=e` now means
> "`x` was evaluated `n` times and in any program trace it is evaluated, `e` is
> evaluated deeply in sub-demand `sd`."
The "any time it is evaluated" premise is what this patch adds. As a result,
we get better nested strictness. For example (T21081)
```hs
f :: (Bool, Bool) -> (Bool, Bool)
f pr = (case pr of (a,b) -> a /= b, True)
-- before: <MP(L,L)>
-- after: <MP(SL,SL)>
g :: Int -> (Bool, Bool)
g x = let y = let z = odd x in (z,z) in f y
```
The change in demand signature "before" to "after" allows us to case-bind `z`
here.
Similarly good things happen for the `sd` in call sub-demands `Cn(sd)`, which
allows for more eta-reduction (which is only sound with `-fno-pedantic-bottoms`,
albeit).
We also fix #21085, a surprising inconsistency with `Poly` to `Call` sub-demand
expansion.
In an attempt to fix a regression caused by less inlining due to eta-reduction
in T15426, I eta-expanded the definition of `elemIndex` and `elemIndices`, thus
fixing #21345 on the go.
The main point of this patch is that it fixes #21081 and #21133.
Annoyingly, I discovered that more precise demand signatures for join points can
transform a program into a lazier program if that join point gets floated to the
top-level, see #21392. There is no simple fix at the moment, but !5349 might.
Thus, we accept a ~5% regression in `MultiLayerModulesTH_OneShot`, where #21392
bites us in `addListToUniqDSet`. T21392 reliably reproduces the issue.
Surprisingly, ghc/alloc perf on Windows improves much more than on other jobs, by
0.4% in the geometric mean and by 2% in T16875.
Metric Increase:
MultiLayerModulesTH_OneShot
Metric Decrease:
T16875
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes #20541 by making mkTyConApp do more sharing of types.
In particular, replace
* BoxedRep Lifted ==> LiftedRep
* BoxedRep Unlifted ==> UnliftedRep
* TupleRep '[] ==> ZeroBitRep
* TYPE ZeroBitRep ==> ZeroBitType
In each case, the thing on the right is a type synonym
for the thing on the left, declared in ghc-prim:GHC.Types.
See Note [Using synonyms to compress types] in GHC.Core.Type.
The synonyms for ZeroBitRep and ZeroBitType are new, but absolutely
in the same spirit as the other ones. (These synonyms are mainly
for internal use, though the programmer can use them too.)
I also renamed GHC.Core.Ty.Rep.isVoidTy to isZeroBitTy, to be
compatible with the "zero-bit" nomenclature above. See discussion
on !6806.
There is a tricky wrinkle: see GHC.Core.Types
Note [Care using synonyms to compress types]
Compiler allocation decreases by up to 0.8%.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes some abundant reboxing of `DynFlags` in
`GHC.HsToCore.Match.Literal.warnAboutOverflowedLit` (which was the topic
of #19407) by introducing a Boxity analysis to GHC, done as part of demand
analysis. This allows to accurately capture ad-hoc unboxing decisions previously
made in worker/wrapper in demand analysis now, where the boxity info can
propagate through demand signatures.
See the new `Note [Boxity analysis]`. The actual fix for #19407 is described in
`Note [No lazy, Unboxed demand in demand signature]`, but
`Note [Finalising boxity for demand signature]` is probably a better entry-point.
To support the fix for #19407, I had to change (what was)
`Note [Add demands for strict constructors]` a bit
(now `Note [Unboxing evaluated arguments]`). In particular, we now take care of
it in `finaliseBoxity` (which is only called from demand analaysis) instead of
`wantToUnboxArg`.
I also had to resurrect `Note [Product demands for function body]` and rename
it to `Note [Unboxed demand on function bodies returning small products]` to
avoid huge regressions in `join004` and `join007`, thereby fixing #4267 again.
See the updated Note for details.
A nice side-effect is that the worker/wrapper transformation no longer needs to
look at strictness info and other bits such as `InsideInlineableFun` flags
(needed for `Note [Do not unbox class dictionaries]`) at all. It simply collects
boxity info from argument demands and interprets them with a severely simplified
`wantToUnboxArg`. All the smartness is in `finaliseBoxity`, which could be moved
to DmdAnal completely, if it wasn't for the call to `dubiousDataConInstArgTys`
which would be awkward to export.
I spent some time figuring out the reason for why `T16197` failed prior to my
amendments to `Note [Unboxing evaluated arguments]`. After having it figured
out, I minimised it a bit and added `T16197b`, which simply compares computed
strictness signatures and thus should be far simpler to eyeball.
The 12% ghc/alloc regression in T11545 is because of the additional `Boxity`
field in `Poly` and `Prod` that results in more allocation during `lubSubDmd`
and `plusSubDmd`. I made sure in the ticky profiles that the number of calls
to those functions stayed the same. We can bear such an increase here, as we
recently improved it by -68% (in b760c1f).
T18698* regress slightly because there is more unboxing of dictionaries
happening and that causes Lint (mostly) to allocate more.
Fixes #19871, #19407, #4267, #16859, #18907 and #13331.
Metric Increase:
T11545
T18698a
T18698b
Metric Decrease:
T12425
T16577
T18223
T18282
T4267
T9961
|
|
|
|
|
| |
This flag is used to remove the output of core stats per binding in Core
dumps.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
`mkWWargs`'s job was pushing casts inwards and doing eta expansion to match
the arity with the number of argument demands we w/w for.
Nowadays, we use the Simplifier to eta expand to arity. In fact, in recent years
we have even seen the eta expansion done by w/w as harmful, see Note [Don't eta
expand in w/w]. If a function hasn't enough manifest lambdas, don't w/w it!
What purpose does `mkWWargs` serve in this world? Not a great one, it turns out!
I could remove it by pulling some important bits,
notably Note [Freshen WW arguments] and Note [Join points and beta-redexes].
Result: We reuse the freshened binder names of the wrapper in the
worker where possible (see testuite changes), much nicer!
In order to avoid scoping errors due to lambda-bound unfoldings in worker
arguments, we zap those unfoldings now. In doing so, we fix #19766.
Fixes #19874.
|
|
|
|
|
|
| |
With -dsuppress-coercions, it's still good to be able to see the
type of the coercion. This patch prints the type. Maybe we should
have a flag to control this too.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In another small step towards bringing a manageable variant of Nested
CPR into GHC, this patch refactors worker/wrapper to be able to exploit
Nested CPR signatures. See the new Note [Worker/wrapper for CPR].
The nested code path is currently not triggered, though, because all
signatures that we annotate are still flat. So purely a refactoring.
I am very confident that it works, because I ripped it off !1866 95%
unchanged.
A few test case outputs changed, but only it's auxiliary names only.
I also added test cases for #18109 and #18401.
There's a 2.6% metric increase in T13056 after a rebase, caused by an
additional Simplifier run. It appears b1d0b9c saw a similar additional
iteration. I think it's just a fluke.
Metric Increase:
T13056
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This vastly reduces memory usage when compiling with `--make` mode, from
about 900M when compiling Cabal to about 300M.
As a matter of uniformity, it also ensures that reading from an
interface performs the same as using the in-memory cache. We can also
delete all the horrible knot-tying in updateIdInfos.
Goes some way to fixing #13586
Accept new output of tests fixing some bugs along the way
-------------------------
Metric Decrease:
T12545
-------------------------
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This implements the BoxedRep proposal, refactoring the `RuntimeRep`
hierarchy from:
```haskell
data RuntimeRep = LiftedPtrRep | UnliftedPtrRep | ...
```
to
```haskell
data RuntimeRep = BoxedRep Levity | ...
data Levity = Lifted | Unlifted
```
Updates binary, haddock submodules.
Closes #17526.
Metric Increase:
T12545
|
|
|
|
|
|
|
|
|
| |
The update of the Outputable instance resulted in a slew of
documentation changes within Notes that used the old syntax.
The most important doc changes are to `Note [Demand notation]`
and the user's guide.
Fixes #19016.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Consider
```hs
data Ex where
Ex :: e -> Int -> Ex
f :: Ex -> Int
f (Ex e n) = e `seq` n + 1
```
Worker/wrapper should build the following worker for `f`:
```hs
$wf :: forall e. e -> Int# -> Int#
$wf e n = e `seq` n +# 1#
```
But previously it didn't, because `Ex` binds an existential.
This patch lifts that condition. That entailed having to instantiate
existential binders in `GHC.Core.Opt.WorkWrap.Utils.mkWWstr` via
`GHC.Core.Utils.dataConRepFSInstPat`, requiring a bit of a refactoring
around what is now `DataConPatContext`.
CPR W/W still won't unbox DataCons with existentials.
See `Note [Which types are unboxed?]` for details.
I also refactored the various `tyCon*DataCon(s)_maybe` functions in
`GHC.Core.TyCon`, deleting some of them which are no longer needed
(`isDataProductType_maybe` and `isDataSumType_maybe`).
I cleaned up a couple of call sites, some of which weren't very explicit
about whether they cared for existentials or not.
The test output of `T18013` changed, because we now unbox the `Rule`
data type. Its constructor carries existential state and will be
w/w'd now. In the particular example, the worker functions inlines right
back into the wrapper, which then unnecessarily has a (quite big) stable
unfolding. I think this kind of fallout is inevitable;
see also Note [Don't w/w inline small non-loop-breaker things].
There's a new regression test case `T18982`.
Fixes #18982.
|
|
|
|
|
|
| |
This was inadvertently merged.
This reverts commit 6c2eb2232b39ff4720fda0a4a009fb6afbc9dcea.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This implements the BoxedRep proposal, refacoring the `RuntimeRep`
hierarchy from:
```haskell
data RuntimeRep = LiftedPtrRep | UnliftedPtrRep | ...
```
to
```haskell
data RuntimeRep = BoxedRep Levity | ...
data Levity = Lifted | Unlifted
```
Closes #17526.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During the compilation of programs GHC very frequently deals with
the `Type` type, which is a synonym of `TYPE 'LiftedRep`. This patch
teaches GHC to avoid expanding the `Type` synonym (and other nullary
type synonyms) during type comparisons, saving a good amount of work.
This optimisation is described in `Note [Comparing nullary type
synonyms]`.
To maximize the impact of this optimisation, we introduce a few
special-cases to reduce `TYPE 'LiftedRep` to `Type`. See
`Note [Prefer Type over TYPE 'LiftedPtrRep]`.
Closes #17958.
Metric Decrease:
T18698b
T1969
T12227
T12545
T12707
T14683
T3064
T5631
T5642
T9020
T9630
T9872a
T13035
haddock.Cabal
haddock.base
|
|
|
|
|
|
| |
This was inadvertently merged.
This reverts commit 7e9debd4ceb068effe8ac81892d2cabcb8f55850.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During the compilation of programs GHC very frequently deals with
the `Type` type, which is a synonym of `TYPE 'LiftedRep`. This patch
teaches GHC to avoid expanding the `Type` synonym (and other nullary
type synonyms) during type comparisons, saving a good amount of work.
This optimisation is described in `Note [Comparing nullary type
synonyms]`.
To maximize the impact of this optimisation, we introduce a few
special-cases to reduce `TYPE 'LiftedRep` to `Type`. See
`Note [Prefer Type over TYPE 'LiftedPtrRep]`.
Closes #17958.
Metric Decrease:
T18698b
T1969
T12227
T12545
T12707
T14683
T3064
T5631
T5642
T9020
T9630
T9872a
T13035
haddock.Cabal
haddock.base
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As outlined in #18903, interleaving usage and strictness demands not
only means a more compact demand representation, but also allows us to
express demands that we weren't easily able to express before.
Call demands are *relative* in the sense that a call demand `Cn(cd)`
on `g` says "`g` is called `n` times. *Whenever `g` is called*, the
result is used according to `cd`". Example from #18903:
```hs
h :: Int -> Int
h m =
let g :: Int -> (Int,Int)
g 1 = (m, 0)
g n = (2 * n, 2 `div` n)
{-# NOINLINE g #-}
in case m of
1 -> 0
2 -> snd (g m)
_ -> uncurry (+) (g m)
```
Without the interleaved representation, we would just get `L` for the
strictness demand on `g`. Now we are able to express that whenever
`g` is called, its second component is used strictly in denoting `g`
by `1C1(P(1P(U),SP(U)))`. This would allow Nested CPR to unbox the
division, for example.
Fixes #18903.
While fixing regressions, I also discovered and fixed #18957.
Metric Decrease:
T13253-spj
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Ticket #18282 showed that the result discount given by conSize
was massively too large. This patch reduces that discount to
a constant 10, which just balances the cost of the constructor
application itself.
Note [Constructor size and result discount] elaborates, as
does the ticket #18282.
Reducing result discount reduces inlining, which affects perf. I
found that I could increase the unfoldingUseThrehold from 80 to 90 in
compensation; in combination with the result discount change I get
these overall nofib numbers:
Program Size Allocs Runtime Elapsed TotalMem
--------------------------------------------------------------------------------
boyer -0.2% +5.4% -3.2% -3.4% 0.0%
cichelli -0.1% +5.9% -11.2% -11.7% 0.0%
compress2 -0.2% +9.6% -6.0% -6.8% 0.0%
cryptarithm2 -0.1% -3.9% -6.0% -5.7% 0.0%
gamteb -0.2% +2.6% -13.8% -14.4% 0.0%
genfft -0.1% -1.6% -29.5% -29.9% 0.0%
gg -0.0% -2.2% -17.2% -17.8% -20.0%
life -0.1% -2.2% -62.3% -63.4% 0.0%
mate +0.0% +1.4% -5.1% -5.1% -14.3%
parser -0.2% -2.1% +7.4% +6.7% 0.0%
primetest -0.2% -12.8% -14.3% -14.2% 0.0%
puzzle -0.2% +2.1% -10.0% -10.4% 0.0%
rsa -0.2% -11.7% -3.7% -3.8% 0.0%
simple -0.2% +2.8% -36.7% -38.3% -2.2%
wheel-sieve2 -0.1% -19.2% -48.8% -49.2% -42.9%
--------------------------------------------------------------------------------
Min -0.4% -19.2% -62.3% -63.4% -42.9%
Max +0.3% +9.6% +7.4% +11.0% +16.7%
Geometric Mean -0.1% -0.3% -17.6% -18.0% -0.7%
I'm ok with these numbers, remembering that this change removes
an *exponential* increase in code size in some in-the-wild cases.
I investigated compress2. The difference is entirely caused by this
function no longer inlining
WriteRoutines.$woutputCodes
= \ (w :: [CodeEvent]) ->
let result_s1Sr
= case WriteRoutines.outputCodes_$s$woutput w 0# 0# 8# 9# of
(# ww1, ww2 #) -> (ww1, ww2)
in (# case result_s1Sr of (x, _) ->
map @Int @Char WriteRoutines.outputCodes1 x
, case result_s1Sr of { (_, y) -> y } #)
It was right on the cusp before, driven by the excessive result
discount. Too bad!
Happily, the compiler/perf tests show a number of improvements:
T12227 compiler bytes-alloc -6.6%
T12545 compiler bytes-alloc -4.7%
T13056 compiler bytes-alloc -3.3%
T15263 runtime bytes-alloc -13.1%
T17499 runtime bytes-alloc -14.3%
T3294 compiler bytes-alloc -1.1%
T5030 compiler bytes-alloc -11.7%
T9872a compiler bytes-alloc -2.0%
T9872b compiler bytes-alloc -1.2%
T9872c compiler bytes-alloc -1.5%
Metric Decrease:
T12227
T12545
T13056
T15263
T17499
T3294
T5030
T9872a
T9872b
T9872c
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the first step towards implementation of the linear types proposal
(https://github.com/ghc-proposals/ghc-proposals/pull/111).
It features
* A language extension -XLinearTypes
* Syntax for linear functions in the surface language
* Linearity checking in Core Lint, enabled with -dlinear-core-lint
* Core-to-core passes are mostly compatible with linearity
* Fields in a data type can be linear or unrestricted; linear fields
have multiplicity-polymorphic constructors.
If -XLinearTypes is disabled, the GADT syntax defaults to linear fields
The following items are not yet supported:
* a # m -> b syntax (only prefix FUN is supported for now)
* Full multiplicity inference (multiplicities are really only checked)
* Decent linearity error messages
* Linear let, where, and case expressions in the surface language
(each of these currently introduce the unrestricted variant)
* Multiplicity-parametric fields
* Syntax for annotating lambda-bound or let-bound with a multiplicity
* Syntax for non-linear/multiple-field-multiplicity records
* Linear projections for records with a single linear field
* Linear pattern synonyms
* Multiplicity coercions (test LinearPolyType)
A high-level description can be found at
https://ghc.haskell.org/trac/ghc/wiki/LinearTypes/Implementation
Following the link above you will find a description of the changes made to Core.
This commit has been authored by
* Richard Eisenberg
* Krzysztof Gogolewski
* Matthew Pickering
* Arnaud Spiwack
With contributions from:
* Mark Barbone
* Alexander Vershilov
Updates haddock submodule.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead, look through expandable unfoldings in `cprTransform`.
See the new Note [CPR for expandable unfoldings]:
```
Long static data structures (whether top-level or not) like
xs = x1 : xs1
xs1 = x2 : xs2
xs2 = x3 : xs3
should not get CPR signatures, because they
* Never get WW'd, so their CPR signature should be irrelevant after analysis
(in fact the signature might even be harmful for that reason)
* Would need to be inlined/expanded to see their constructed product
* Recording CPR on them blows up interface file sizes and is redundant with
their unfolding. In case of Nested CPR, this blow-up can be quadratic!
But we can't just stop giving DataCon application bindings the CPR property,
for example
fac 0 = 1
fac n = n * fac (n-1)
fac certainly has the CPR property and should be WW'd! But FloatOut will
transform the first clause to
lvl = 1
fac 0 = lvl
If lvl doesn't have the CPR property, fac won't either. But lvl doesn't have a
CPR signature to extrapolate into a CPR transformer ('cprTransform'). So
instead we keep on cprAnal'ing through *expandable* unfoldings for these arity
0 bindings via 'cprExpandUnfolding_maybe'.
In practice, GHC generates a lot of (nested) TyCon and KindRep bindings, one
for each data declaration. It's wasteful to attach CPR signatures to each of
them (and intractable in case of Nested CPR).
```
Fixes #18154.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that DataCon wrappers don’t inline until phase 0 (see commit
b78cc64e923716ac0512c299f42d4d0012306c05), it’s important that
case-of-known-constructor and RULE matching be able to see saturated
applications of DataCon wrappers in unfoldings. Making them conlike is a
natural way to do it, since they are, in fact, precisely the sort of
thing the CONLIKE pragma exists to solve.
Fixes #18012.
This also bumps the version of the parsec submodule to incorporate a
patch that avoids a metric increase on the haddock perf tests. The
increase was not really a flaw in this patch, as parsec was implicitly
relying on inlining heuristics. The patch to parsec just adds some
INLINABLE pragmas, and we get a nice performance bump out of it (well
beyond the performance we lost from this patch).
Metric Decrease:
T12234
WWRec
haddock.Cabal
haddock.base
haddock.compiler
|
|
This fixes #18013 by adding INLINE pragmas to both Control.Category.>>>
and GHC.Desugar.>>>. The functional change in this patch is tiny (just
two lines of pragmas!), but an accompanying Note explains in gory
detail what’s going on.
|