| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
Both sub-demands encode the same information.
This is a trivial change and already affects a few regression tests
(e.g. `T5075`), so no separate regression test is necessary.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's useful to annotate a non-exported top-level function like `g` in
```hs
module Lib (h) where
g :: Int -> Int -> (Int,Int)
g m 1 = (m, 0)
g m n = (2 * m, 2 `div` n)
{-# NOINLINE g #-}
h :: Int -> Int
h 1 = 0
h m
| odd m = snd (g m 2)
| otherwise = uncurry (+) (g 2 m)
```
with its demand `UCU(CS(P(1P(U),SP(U))`, which tells us that whenever `g` was
called, the second component of the returned pair was evaluated strictly.
Since #18903 we do so for local functions, where we can see all calls.
For top-level functions, we can assume that all *exported* functions are
demanded according to `topDmd` and thus get sound demands for
non-exported top-level functions.
The demand on `g` is crucial information for Nested CPR, which may the
go on and unbox `g` for the second pair component. That is true even if
that pair component may diverge, as is the case for the call site `g 13
0`, which throws a div-by-zero exception.
In `T18894b`, you can even see the new demand annotation enabling us to
eta-expand a function that we wouldn't be able to eta-expand without
Call Arity.
We only track bindings of function type in order not to risk huge compile-time
regressions, see `isInterestingTopLevelFn`.
There was a CoreLint check that rejected strict demand annotations on
recursive or top-level bindings, which seems completely unjustified.
All the cases I investigated were fine, so I removed it.
Fixes #18894.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As outlined in #18903, interleaving usage and strictness demands not
only means a more compact demand representation, but also allows us to
express demands that we weren't easily able to express before.
Call demands are *relative* in the sense that a call demand `Cn(cd)`
on `g` says "`g` is called `n` times. *Whenever `g` is called*, the
result is used according to `cd`". Example from #18903:
```hs
h :: Int -> Int
h m =
let g :: Int -> (Int,Int)
g 1 = (m, 0)
g n = (2 * n, 2 `div` n)
{-# NOINLINE g #-}
in case m of
1 -> 0
2 -> snd (g m)
_ -> uncurry (+) (g m)
```
Without the interleaved representation, we would just get `L` for the
strictness demand on `g`. Now we are able to express that whenever
`g` is called, its second component is used strictly in denoting `g`
by `1C1(P(1P(U),SP(U)))`. This would allow Nested CPR to unbox the
division, for example.
Fixes #18903.
While fixing regressions, I also discovered and fixed #18957.
Metric Decrease:
T13253-spj
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In #18793, we saw a compelling example which requires us to look at
non-recursive let-bindings during arity analysis and unleash their arity
types at use sites.
After the refactoring in the previous patch, the needed change is quite
simple and very local to `arityType`'s defn for non-recurisve `Let`.
Apart from that, we had to get rid of the second item of
`Note [Dealing with bottoms]`, which was entirely a safety measure and
hindered optimistic fixed-point iteration.
Fixes #18793.
The following metric increases are all caused by this commit and a
result of the fact that we just do more work now:
Metric Increase:
T3294
T12545
T12707
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are two signficant changes here:
* Ticket #18815 showed that we were missing some opportunities for
preInlineUnconditionally. The one-line fix is in the code for
GHC.Core.Opt.Simplify.Utils.preInlineUnconditionally, which now
switches off only for INLINE pragmas. I expanded
Note [Stable unfoldings and preInlineUnconditionally] to explain.
* When doing this I discovered a way in which preInlineUnconditionally
was occasionally /too/ eager. It's all explained in
Note [Occurrences in stable unfoldings] in GHC.Core.Opt.OccurAnal,
and the one-line change adding markAllMany to occAnalUnfolding.
I also got confused about what NoUserInline meant, so I've renamed
it to NoUserInlinePrag, and changed its pretty-printing slightly.
That led to soem error messate wibbling, and touches quite a few
files, but there is no change in functionality.
I did a nofib run. As expected, no significant changes.
Program Size Allocs
----------------------------------------
sphere -0.0% -0.4%
----------------------------------------
Min -0.0% -0.4%
Max -0.0% +0.0%
Geometric Mean -0.0% -0.0%
I'm allowing a max-residency increase for T10370, which seems
very irreproducible. (See comments on !4241.) There is always
sampling error for max-residency measurements; and in any case
the change shows up on some platforms but not others.
Metric Increase:
T10370
|
|
|
|
|
|
| |
Implements GHC Proposal #356
Updates the haddock submodule.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes #18223, which made GHC generate an exponential
amount of code. There are three quite separate changes in here
1. Re-engineer eta-expansion (again). The eta-expander was
generating lots of intermediate stuff, which could be optimised
away, but which choked the simplifier meanwhile. Relatively
easy to kill it off at source.
See Note [The EtaInfo mechanism] in GHC.Core.Opt.Arity.
The main new thing is the use of pushCoArg in getArg_maybe.
2. Stop Specialise specalising DFuns. This is the cause of a huge
(and utterly unnecessary) blowup in program size in #18223.
See Note [Do not specialise DFuns] in GHC.Core.Opt.Specialise.
I also refactored the Specialise monad a bit... it was silly,
because it passed on unchanging values as if they were mutable
state.
3. Do an extra Simplifer run, after SpecConstra and before
late-Specialise. I found (investigating perf/compiler/T16473)
that failing to do this was crippling *both* SpecConstr *and*
Specialise. See Note [Simplify after SpecConstr] in
GHC.Core.Opt.Pipeline.
This change does mean an extra run of the Simplifier, but only
with -O2, and I think that's acceptable.
T16473 allocates *three* times less with this change. (I changed
it to check runtime rather than compile time.)
Some smaller consequences
* I moved pushCoercion, pushCoArg and friends from SimpleOpt
to Arity, because it was needed by the new etaInfoApp.
And pushCoValArg now returns a MCoercion rather than Coercion for
the argument Coercion.
* A minor, incidental improvement to Core pretty-printing
This does fix #18223, (which was otherwise uncompilable. Hooray. But
there is still a big intermediate because there are some very deeply
nested types in that program.
Modest reductions in compile-time allocation on a couple of benchmarks
T12425 -2.0%
T13253 -10.3%
Metric increase with -O2, due to extra simplifier run
T9233 +5.8%
T12227 +1.8%
T15630 +5.0%
There is a spurious apparent increase on heap residency on T9630,
on some architectures at least. I tried it with -G1 and the residency
is essentially unchanged.
Metric Increase
T9233
T12227
T9630
Metric Decrease
T12425
T13253
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Specifically:
#13253 exponential inlining
#10421 ditto
#18140 strict constructors
#18282 another nested-function call case
This patch makes one really significant changes: change the way that
mkDupableCont handles StrictArg. The details are explained in
GHC.Core.Opt.Simplify Note [Duplicating StrictArg].
Specific changes
* In mkDupableCont, when making auxiliary bindings for the other arguments
of a call, add extra plumbing so that we don't forget the demand on them.
Otherwise we haev to wait for another round of strictness analysis. But
actually all the info is to hand. This change affects:
- Make the strictness list in ArgInfo be [Demand] instead of [Bool],
and rename it to ai_dmds.
- Add as_dmd to ValArg
- Simplify.makeTrivial takes a Demand
- mkDupableContWithDmds takes a [Demand]
There are a number of other small changes
1. For Ids that are used at most once in each branch of a case, make
the occurrence analyser record the total number of syntactic
occurrences. Previously we recorded just OneBranch or
MultipleBranches.
I thought this was going to be useful, but I ended up barely
using it; see Note [Note [Suppress exponential blowup] in
GHC.Core.Opt.Simplify.Utils
Actual changes:
* See the occ_n_br field of OneOcc.
* postInlineUnconditionally
2. I found a small perf buglet in SetLevels; see the new
function GHC.Core.Opt.SetLevels.hasFreeJoin
3. Remove the sc_cci field of StrictArg. I found I could get
its information from the sc_fun field instead. Less to get
wrong!
4. In ArgInfo, arrange that ai_dmds and ai_discs have a simpler
invariant: they line up with the value arguments beyond ai_args
This allowed a bit of nice refactoring; see isStrictArgInfo,
lazyArgcontext, strictArgContext
There is virtually no difference in nofib. (The runtime numbers
are bogus -- I tried a few manually.)
Program Size Allocs Runtime Elapsed TotalMem
--------------------------------------------------------------------------------
fft +0.0% -2.0% -48.3% -49.4% 0.0%
multiplier +0.0% -2.2% -50.3% -50.9% 0.0%
--------------------------------------------------------------------------------
Min -0.4% -2.2% -59.2% -60.4% 0.0%
Max +0.0% +0.1% +3.3% +4.9% 0.0%
Geometric Mean +0.0% -0.0% -33.2% -34.3% -0.0%
Test T18282 is an existing example of these deeply-nested strict calls.
We get a big decrease in compile time (-85%) because so much less
inlining takes place.
Metric Decrease:
T18282
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* support detection of slow ghc-bignum backend (to replace the detection
of integer-simple use). There are still some test cases that the
native backend doesn't handle efficiently enough.
* remove tests for GMP only functions that have been removed from
ghc-bignum
* fix test results showing dependent packages (e.g. integer-gmp) or
showing suggested instances
* fix test using Integer/Natural API or showing internal names
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the first step towards implementation of the linear types proposal
(https://github.com/ghc-proposals/ghc-proposals/pull/111).
It features
* A language extension -XLinearTypes
* Syntax for linear functions in the surface language
* Linearity checking in Core Lint, enabled with -dlinear-core-lint
* Core-to-core passes are mostly compatible with linearity
* Fields in a data type can be linear or unrestricted; linear fields
have multiplicity-polymorphic constructors.
If -XLinearTypes is disabled, the GADT syntax defaults to linear fields
The following items are not yet supported:
* a # m -> b syntax (only prefix FUN is supported for now)
* Full multiplicity inference (multiplicities are really only checked)
* Decent linearity error messages
* Linear let, where, and case expressions in the surface language
(each of these currently introduce the unrestricted variant)
* Multiplicity-parametric fields
* Syntax for annotating lambda-bound or let-bound with a multiplicity
* Syntax for non-linear/multiple-field-multiplicity records
* Linear projections for records with a single linear field
* Linear pattern synonyms
* Multiplicity coercions (test LinearPolyType)
A high-level description can be found at
https://ghc.haskell.org/trac/ghc/wiki/LinearTypes/Implementation
Following the link above you will find a description of the changes made to Core.
This commit has been authored by
* Richard Eisenberg
* Krzysztof Gogolewski
* Matthew Pickering
* Arnaud Spiwack
With contributions from:
* Mark Barbone
* Alexander Vershilov
Updates haddock submodule.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The cast worker/wrapper transformation transforms
x = e |> co
into
y = e
x = y |> co
This is done by the simplifier, but we were being
careless about transferring IdInfo from x to y,
and about what to do if x is a NOINLNE function.
This resulted in a series of bugs:
#17673, #18093, #18078.
This patch fixes all that:
* Main change is in GHC.Core.Opt.Simplify, and
the new prepareBinding function, which does this
cast worker/wrapper transform.
See Note [Cast worker/wrappers].
* There is quite a bit of refactoring around
prepareRhs, makeTrivial etc. It's nicer now.
* Some wrappers from strictness and cast w/w, notably those for
a function with a NOINLINE, should inline very late. There
wasn't really a mechanism for that, which was an existing bug
really; so I invented a new finalPhase = Phase (-1). It's used
for all simplifier runs after the user-visible phase 2,1,0 have
run. (No new runs of the simplifier are introduced thereby.)
See new Note [Compiler phases] in GHC.Types.Basic;
the main changes are in GHC.Core.Opt.Driver
* Doing this made me trip over two places where the AnonArgFlag on a
FunTy was being lost so we could end up with (Num a -> ty)
rather than (Num a => ty)
- In coercionLKind/coercionRKind
- In contHoleType in the Simplifier
I fixed the former by defining mkFunctionType and using it in
coercionLKind/RKind.
I could have done the same for the latter, but the information
is almost to hand. So I fixed the latter by
- adding sc_hole_ty to ApplyToVal (like ApplyToTy),
- adding as_hole_ty to ValArg (like TyArg)
- adding sc_fun_ty to StrictArg
Turned out I could then remove ai_type from ArgInfo. This is
just moving the deck chairs around, but it worked out nicely.
See the new Note [AnonArgFlag] in GHC.Types.Var
* When looking at the 'arity decrease' thing (#18093) I discovered
that stable unfoldings had a much lower arity than the actual
optimised function. That's what led to the arity-decrease
message. Simple solution: eta-expand.
It's described in Note [Eta-expand stable unfoldings]
in GHC.Core.Opt.Simplify
* I also discovered that unsafeCoerce wasn't being inlined if
the context was boring. So (\x. f (unsafeCoerce x)) would
create a thunk -- yikes! I fixed that by making inlineBoringOK
a bit cleverer: see Note [Inline unsafeCoerce] in GHC.Core.Unfold.
I also found that unsafeCoerceName was unused, so I removed it.
I made a test case for #18078, and a very similar one for #17673.
The net effect of all this on nofib is very modest, but positive:
--------------------------------------------------------------------------------
Program Size Allocs Runtime Elapsed TotalMem
--------------------------------------------------------------------------------
anna -0.4% -0.1% -3.1% -3.1% 0.0%
fannkuch-redux -0.4% -0.3% -0.1% -0.1% 0.0%
maillist -0.4% -0.1% -7.8% -1.0% -14.3%
primetest -0.4% -15.6% -7.1% -6.6% 0.0%
--------------------------------------------------------------------------------
Min -0.9% -15.6% -13.3% -14.2% -14.3%
Max -0.3% 0.0% +12.1% +12.4% 0.0%
Geometric Mean -0.4% -0.2% -2.3% -2.2% -0.1%
All following metric decreases are compile-time allocation decreases
between -1% and -3%:
Metric Decrease:
T5631
T13701
T14697
T15164
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Consider
```hs
m :: IO ()
m = do
putStrLn "foo"
error "bar"
```
`m` (from #18086) always throws a (precise or imprecise) exception or
diverges. Yet demand analysis infers `<L,A>` as demand signature instead
of `<L,A>x` for it.
That's because the demand analyser sees `putStrLn` occuring in a case
scrutinee and decides that it has to `deferAfterPreciseException`,
because `putStrLn` throws a precise exception on some control flow
paths. This will mask the `botDiv` `Divergence`of the single case alt
containing `error` to `topDiv`. Since `putStrLn` has `topDiv` itself,
the final `Divergence` is `topDiv`.
This is easily fixed: `deferAfterPreciseException` works by `lub`ing
with the demand type of a virtual case branch denoting the precise
exceptional control flow. We used `nopDmdType` before, but we can be
more precise and use `exnDmdType`, which is `nopDmdType` with `exnDiv`.
Now the `Divergence` from the case alt will degrade `botDiv` to `exnDiv`
instead of `topDiv`, which combines with the result from the scrutinee
to `exnDiv`, and all is well.
Fixes #18086.
|
|
|
|
|
|
|
|
|
|
| |
We should allow a wrapper with up to 82 parameters when the original
function had 82 parameters to begin with.
I verified that this made no difference on NoFib, but then again
it doesn't use huge records...
Fixes #18122.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch does two things: Fix possible unsoundness in what was called
the "IO hack" and implement part 2.1 of the "fixing precise exceptions"
plan in
https://gitlab.haskell.org/ghc/ghc/wikis/fixing-precise-exceptions,
which, in combination with !2956, supersedes !3014 and !2525.
**IO hack**
The "IO hack" (which is a fallback to preserve precise exceptions
semantics and thus soundness, rather than some smart thing that
increases precision) is called `exprMayThrowPreciseException` now.
I came up with two testcases exemplifying possible unsoundness (if
twisted enough) in the old approach:
- `T13380d`: Demonstrating unsoundness of the "IO hack" when resorting
to manual state token threading and direct use of primops.
More details below.
- `T13380e`: Demonstrating unsoundness of the "IO hack" when we have
Nested CPR. Not currently relevant, as we don't have Nested
CPR yet.
- `T13380f`: Demonstrating unsoundness of the "IO hack" for safe FFI
calls.
Basically, the IO hack assumed that precise exceptions can only be
thrown from a case scrutinee of type `(# State# RealWorld, _ #)`. I
couldn't come up with a program using the `IO` abstraction that violates
this assumption. But it's easy to do so via manual state token threading
and direct use of primops, see `T13380d`. Also similar code might be
generated by Nested CPR in the (hopefully not too) distant future, see
`T13380e`. Hence, we now have a more careful test in `forcesRealWorld`
that passes `T13380{d,e}` (and will hopefully be robust to Nested CPR).
**Precise exceptions**
In #13380 and #17676 we saw that we didn't preserve precise exception
semantics in demand analysis. We fixed that with minimal changes in
!2956, but that was terribly unprincipled.
That unprincipledness resulted in a loss of precision, which is tracked
by these new test cases:
- `T13380b`: Regression in dead code elimination, because !2956 was too
syntactic about `raiseIO#`
- `T13380c`: No need to apply the "IO hack" when the IO action may not
throw a precise exception (and the existing IO hack doesn't
detect that)
Fixing both issues in !3014 turned out to be too complicated and had
the potential to regress in the future. Hence we decided to only fix
`T13380b` and augment the `Divergence` lattice with a new middle-layer
element, `ExnOrDiv`, which means either `Diverges` (, throws an
imprecise exception) or throws a *precise* exception.
See the wiki page on Step 2.1 for more implementational details:
https://gitlab.haskell.org/ghc/ghc/wikis/fixing-precise-exceptions#dead-code-elimination-for-raiseio-with-isdeadenddiv-introducing-exnordiv-step-21
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead, look through expandable unfoldings in `cprTransform`.
See the new Note [CPR for expandable unfoldings]:
```
Long static data structures (whether top-level or not) like
xs = x1 : xs1
xs1 = x2 : xs2
xs2 = x3 : xs3
should not get CPR signatures, because they
* Never get WW'd, so their CPR signature should be irrelevant after analysis
(in fact the signature might even be harmful for that reason)
* Would need to be inlined/expanded to see their constructed product
* Recording CPR on them blows up interface file sizes and is redundant with
their unfolding. In case of Nested CPR, this blow-up can be quadratic!
But we can't just stop giving DataCon application bindings the CPR property,
for example
fac 0 = 1
fac n = n * fac (n-1)
fac certainly has the CPR property and should be WW'd! But FloatOut will
transform the first clause to
lvl = 1
fac 0 = lvl
If lvl doesn't have the CPR property, fac won't either. But lvl doesn't have a
CPR signature to extrapolate into a CPR transformer ('cprTransform'). So
instead we keep on cprAnal'ing through *expandable* unfoldings for these arity
0 bindings via 'cprExpandUnfolding_maybe'.
In practice, GHC generates a lot of (nested) TyCon and KindRep bindings, one
for each data declaration. It's wasteful to attach CPR signatures to each of
them (and intractable in case of Nested CPR).
```
Fixes #18154.
|
|
|
|
|
|
|
| |
Introduce GHC.Unit.* hierarchy for everything concerning units, packages
and modules.
Update Haddock submodule
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that DataCon wrappers don’t inline until phase 0 (see commit
b78cc64e923716ac0512c299f42d4d0012306c05), it’s important that
case-of-known-constructor and RULE matching be able to see saturated
applications of DataCon wrappers in unfoldings. Making them conlike is a
natural way to do it, since they are, in fact, precisely the sort of
thing the CONLIKE pragma exists to solve.
Fixes #18012.
This also bumps the version of the parsec submodule to incorporate a
patch that avoids a metric increase on the haddock perf tests. The
increase was not really a flaw in this patch, as parsec was implicitly
relying on inlining heuristics. The patch to parsec just adds some
INLINABLE pragmas, and we get a nice performance bump out of it (well
beyond the performance we lost from this patch).
Metric Decrease:
T12234
WWRec
haddock.Cabal
haddock.base
haddock.compiler
|
|
|
|
|
|
|
|
|
|
|
| |
* GHC.Core.Op => GHC.Core.Opt
* GHC.Core.Opt.Simplify.Driver => GHC.Core.Opt.Driver
* GHC.Core.Opt.Tidy => GHC.Core.Tidy
* GHC.Core.Opt.WorkWrap.Lib => GHC.Core.Opt.WorkWrap.Utils
As discussed in:
* https://mail.haskell.org/pipermail/ghc-devs/2020-April/018758.html
* https://gitlab.haskell.org/ghc/ghc/issues/13009#note_264650
|
|
|
|
|
|
|
|
|
|
|
| |
Fix #13380 and #17676 by
1. Changing `raiseIO#` to have `topDiv` instead of `botDiv`
2. Give it special treatment in `Simplifier.Util.mkArgInfo`, treating it
as if it still had `botDiv`, to recover dead code elimination.
This is the first commit of the plan outlined in
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/2525#note_260886.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Ticket #17932 showed that we were using a stupid demand for the RHS
of a let-binding, when the result is a product. This was the result
of a "fix" in 2013, which (happily) turns out to no longer be
necessary.
So I just deleted the code, which simplifies the demand analyser,
and fixes #17932. That in turn uncovered that the anticipation
of worker/wrapper in CPR analysis was inaccurate, hence the logic
that decides whether to unbox an argument in WW was extracted into
a function `wantToUnbox`, now consulted by CPR analysis.
I tried nofib, and got 0.0% perf changes.
All this came up when messing about with !2873 (ticket #17917),
but is idependent of it.
Unfortunately, this patch regresses #4267 and realised that it is now
blocked on #16335.
|
| |
|
|
|
|
| |
fixes #17852
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We now always show "forall {a}. T" for inferred variables,
previously this was controlled by -fprint-explicit-foralls.
This implements part 1 of https://github.com/ghc-proposals/ghc-proposals/pull/179.
Part of GHC ticket #16320.
Furthermore, when printing a levity restriction error, we now display
the HsWrap of the expression. This lets users see the full elaboration with
-fprint-typechecker-elaboration (see also #17670)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The reasons for that can be found in the wiki:
https://gitlab.haskell.org/ghc/ghc/wikis/nested-cpr/split-off-cpr
We now run CPR after demand analysis (except for after the final demand
analysis run just before code gen). CPR got its own dump flags
(`-ddump-cpr-anal`, `-ddump-cpr-signatures`), but not its own flag to
activate/deactivate. It will run with `-fstrictness`/`-fworker-wrapper`.
As explained on the wiki page, this step is necessary for a sane Nested
CPR analysis. And it has quite positive impact on compiler performance:
Metric Decrease:
T9233
T9675
T9961
T15263
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch removes all CafInfo predictions and various hacks to preserve
predicted CafInfos from the compiler and assigns final CafInfos to
interface Ids after code generation. SRT analysis is extended to support
static data, and Cmm generator is modified to allow generating
static_link fields after SRT analysis.
This also fixes `-fcatch-bottoms`, which introduces error calls in case
expressions in CorePrep, which runs *after* CoreTidy (which is where we
decide on CafInfos) and turns previously non-CAFFY things into CAFFY.
Fixes #17648
Fixes #9718
Evaluation
==========
NoFib
-----
Boot with: `make boot mode=fast`
Run: `make mode=fast EXTRA_RUNTEST_OPTS="-cachegrind" NoFibRuns=1`
--------------------------------------------------------------------------------
Program Size Allocs Instrs Reads Writes
--------------------------------------------------------------------------------
CS -0.0% 0.0% -0.0% -0.0% -0.0%
CSD -0.0% 0.0% -0.0% -0.0% -0.0%
FS -0.0% 0.0% -0.0% -0.0% -0.0%
S -0.0% 0.0% -0.0% -0.0% -0.0%
VS -0.0% 0.0% -0.0% -0.0% -0.0%
VSD -0.0% 0.0% -0.0% -0.0% -0.5%
VSM -0.0% 0.0% -0.0% -0.0% -0.0%
anna -0.1% 0.0% -0.0% -0.0% -0.0%
ansi -0.0% 0.0% -0.0% -0.0% -0.0%
atom -0.0% 0.0% -0.0% -0.0% -0.0%
awards -0.0% 0.0% -0.0% -0.0% -0.0%
banner -0.0% 0.0% -0.0% -0.0% -0.0%
bernouilli -0.0% 0.0% -0.0% -0.0% -0.0%
binary-trees -0.0% 0.0% -0.0% -0.0% -0.0%
boyer -0.0% 0.0% -0.0% -0.0% -0.0%
boyer2 -0.0% 0.0% -0.0% -0.0% -0.0%
bspt -0.0% 0.0% -0.0% -0.0% -0.0%
cacheprof -0.0% 0.0% -0.0% -0.0% -0.0%
calendar -0.0% 0.0% -0.0% -0.0% -0.0%
cichelli -0.0% 0.0% -0.0% -0.0% -0.0%
circsim -0.0% 0.0% -0.0% -0.0% -0.0%
clausify -0.0% 0.0% -0.0% -0.0% -0.0%
comp_lab_zift -0.0% 0.0% -0.0% -0.0% -0.0%
compress -0.0% 0.0% -0.0% -0.0% -0.0%
compress2 -0.0% 0.0% -0.0% -0.0% -0.0%
constraints -0.0% 0.0% -0.0% -0.0% -0.0%
cryptarithm1 -0.0% 0.0% -0.0% -0.0% -0.0%
cryptarithm2 -0.0% 0.0% -0.0% -0.0% -0.0%
cse -0.0% 0.0% -0.0% -0.0% -0.0%
digits-of-e1 -0.0% 0.0% -0.0% -0.0% -0.0%
digits-of-e2 -0.0% 0.0% -0.0% -0.0% -0.0%
dom-lt -0.0% 0.0% -0.0% -0.0% -0.0%
eliza -0.0% 0.0% -0.0% -0.0% -0.0%
event -0.0% 0.0% -0.0% -0.0% -0.0%
exact-reals -0.0% 0.0% -0.0% -0.0% -0.0%
exp3_8 -0.0% 0.0% -0.0% -0.0% -0.0%
expert -0.0% 0.0% -0.0% -0.0% -0.0%
fannkuch-redux -0.0% 0.0% -0.0% -0.0% -0.0%
fasta -0.0% 0.0% -0.0% -0.0% -0.0%
fem -0.0% 0.0% -0.0% -0.0% -0.0%
fft -0.0% 0.0% -0.0% -0.0% -0.0%
fft2 -0.0% 0.0% -0.0% -0.0% -0.0%
fibheaps -0.0% 0.0% -0.0% -0.0% -0.0%
fish -0.0% 0.0% -0.0% -0.0% -0.0%
fluid -0.1% 0.0% -0.0% -0.0% -0.0%
fulsom -0.0% 0.0% -0.0% -0.0% -0.0%
gamteb -0.0% 0.0% -0.0% -0.0% -0.0%
gcd -0.0% 0.0% -0.0% -0.0% -0.0%
gen_regexps -0.0% 0.0% -0.0% -0.0% -0.0%
genfft -0.0% 0.0% -0.0% -0.0% -0.0%
gg -0.0% 0.0% -0.0% -0.0% -0.0%
grep -0.0% 0.0% -0.0% -0.0% -0.0%
hidden -0.0% 0.0% -0.0% -0.0% -0.0%
hpg -0.1% 0.0% -0.0% -0.0% -0.0%
ida -0.0% 0.0% -0.0% -0.0% -0.0%
infer -0.0% 0.0% -0.0% -0.0% -0.0%
integer -0.0% 0.0% -0.0% -0.0% -0.0%
integrate -0.0% 0.0% -0.0% -0.0% -0.0%
k-nucleotide -0.0% 0.0% -0.0% -0.0% -0.0%
kahan -0.0% 0.0% -0.0% -0.0% -0.0%
knights -0.0% 0.0% -0.0% -0.0% -0.0%
lambda -0.0% 0.0% -0.0% -0.0% -0.0%
last-piece -0.0% 0.0% -0.0% -0.0% -0.0%
lcss -0.0% 0.0% -0.0% -0.0% -0.0%
life -0.0% 0.0% -0.0% -0.0% -0.0%
lift -0.0% 0.0% -0.0% -0.0% -0.0%
linear -0.1% 0.0% -0.0% -0.0% -0.0%
listcompr -0.0% 0.0% -0.0% -0.0% -0.0%
listcopy -0.0% 0.0% -0.0% -0.0% -0.0%
maillist -0.0% 0.0% -0.0% -0.0% -0.0%
mandel -0.0% 0.0% -0.0% -0.0% -0.0%
mandel2 -0.0% 0.0% -0.0% -0.0% -0.0%
mate -0.0% 0.0% -0.0% -0.0% -0.0%
minimax -0.0% 0.0% -0.0% -0.0% -0.0%
mkhprog -0.0% 0.0% -0.0% -0.0% -0.0%
multiplier -0.0% 0.0% -0.0% -0.0% -0.0%
n-body -0.0% 0.0% -0.0% -0.0% -0.0%
nucleic2 -0.0% 0.0% -0.0% -0.0% -0.0%
para -0.0% 0.0% -0.0% -0.0% -0.0%
paraffins -0.0% 0.0% -0.0% -0.0% -0.0%
parser -0.1% 0.0% -0.0% -0.0% -0.0%
parstof -0.1% 0.0% -0.0% -0.0% -0.0%
pic -0.0% 0.0% -0.0% -0.0% -0.0%
pidigits -0.0% 0.0% -0.0% -0.0% -0.0%
power -0.0% 0.0% -0.0% -0.0% -0.0%
pretty -0.0% 0.0% -0.3% -0.4% -0.4%
primes -0.0% 0.0% -0.0% -0.0% -0.0%
primetest -0.0% 0.0% -0.0% -0.0% -0.0%
prolog -0.0% 0.0% -0.0% -0.0% -0.0%
puzzle -0.0% 0.0% -0.0% -0.0% -0.0%
queens -0.0% 0.0% -0.0% -0.0% -0.0%
reptile -0.0% 0.0% -0.0% -0.0% -0.0%
reverse-complem -0.0% 0.0% -0.0% -0.0% -0.0%
rewrite -0.0% 0.0% -0.0% -0.0% -0.0%
rfib -0.0% 0.0% -0.0% -0.0% -0.0%
rsa -0.0% 0.0% -0.0% -0.0% -0.0%
scc -0.0% 0.0% -0.3% -0.5% -0.4%
sched -0.0% 0.0% -0.0% -0.0% -0.0%
scs -0.0% 0.0% -0.0% -0.0% -0.0%
simple -0.1% 0.0% -0.0% -0.0% -0.0%
solid -0.0% 0.0% -0.0% -0.0% -0.0%
sorting -0.0% 0.0% -0.0% -0.0% -0.0%
spectral-norm -0.0% 0.0% -0.0% -0.0% -0.0%
sphere -0.0% 0.0% -0.0% -0.0% -0.0%
symalg -0.0% 0.0% -0.0% -0.0% -0.0%
tak -0.0% 0.0% -0.0% -0.0% -0.0%
transform -0.0% 0.0% -0.0% -0.0% -0.0%
treejoin -0.0% 0.0% -0.0% -0.0% -0.0%
typecheck -0.0% 0.0% -0.0% -0.0% -0.0%
veritas -0.0% 0.0% -0.0% -0.0% -0.0%
wang -0.0% 0.0% -0.0% -0.0% -0.0%
wave4main -0.0% 0.0% -0.0% -0.0% -0.0%
wheel-sieve1 -0.0% 0.0% -0.0% -0.0% -0.0%
wheel-sieve2 -0.0% 0.0% -0.0% -0.0% -0.0%
x2n1 -0.0% 0.0% -0.0% -0.0% -0.0%
--------------------------------------------------------------------------------
Min -0.1% 0.0% -0.3% -0.5% -0.5%
Max -0.0% 0.0% -0.0% -0.0% -0.0%
Geometric Mean -0.0% -0.0% -0.0% -0.0% -0.0%
--------------------------------------------------------------------------------
Program Size Allocs Instrs Reads Writes
--------------------------------------------------------------------------------
circsim -0.1% 0.0% -0.0% -0.0% -0.0%
constraints -0.0% 0.0% -0.0% -0.0% -0.0%
fibheaps -0.0% 0.0% -0.0% -0.0% -0.0%
gc_bench -0.0% 0.0% -0.0% -0.0% -0.0%
hash -0.0% 0.0% -0.0% -0.0% -0.0%
lcss -0.0% 0.0% -0.0% -0.0% -0.0%
power -0.0% 0.0% -0.0% -0.0% -0.0%
spellcheck -0.0% 0.0% -0.0% -0.0% -0.0%
--------------------------------------------------------------------------------
Min -0.1% 0.0% -0.0% -0.0% -0.0%
Max -0.0% 0.0% -0.0% -0.0% -0.0%
Geometric Mean -0.0% +0.0% -0.0% -0.0% -0.0%
Manual inspection of programs in testsuite/tests/programs
---------------------------------------------------------
I built these programs with a bunch of dump flags and `-O` and compared
STG, Cmm, and Asm dumps and file sizes.
(Below the numbers in parenthesis show number of modules in the program)
These programs have identical compiler (same .hi and .o sizes, STG, and
Cmm and Asm dumps):
- Queens (1), andre_monad (1), cholewo-eval (2), cvh_unboxing (3),
andy_cherry (7), fun_insts (1), hs-boot (4), fast2haskell (2),
jl_defaults (1), jq_readsPrec (1), jules_xref (1), jtod_circint (4),
jules_xref2 (1), lennart_range (1), lex (1), life_space_leak (1),
bargon-mangler-bug (7), record_upd (1), rittri (1), sanders_array (1),
strict_anns (1), thurston-module-arith (2), okeefe_neural (1),
joao-circular (6), 10queens (1)
Programs with different compiler outputs:
- jl_defaults (1): For some reason GHC HEAD marks a lot of top-level
`[Int]` closures as CAFFY for no reason. With this patch we no longer
make them CAFFY and generate less SRT entries. For some reason Main.o
is slightly larger with this patch (1.3%) and the executable sizes are
the same. (I'd expect both to be smaller)
- launchbury (1): Same as jl_defaults: top-level `[Int]` closures marked
as CAFFY for no reason. Similarly `Main.o` is 1.4% larger but the
executable sizes are the same.
- galois_raytrace (13): Differences are in the Parse module. There are a
lot, but some of the changes are caused by the fact that for some
reason (I think a bug) GHC HEAD marks the dictionary for `Functor
Identity` as CAFFY. Parse.o is 0.4% larger, the executable size is the
same.
- north_array: We now generate less SRT entries because some of array
primops used in this program like `NewArrayOp` get eliminated during
Stg-to-Cmm and turn some CAFFY things into non-CAFFY. Main.o gets 24%
larger (9224 bytes from 9000 bytes), executable sizes are the same.
- seward-space-leak: Difference in this program is better shown by this
smaller example:
module Lib where
data CDS
= Case [CDS] [(Int, CDS)]
| Call CDS CDS
instance Eq CDS where
Case sels1 rets1 == Case sels2 rets2 =
sels1 == sels2 && rets1 == rets2
Call a1 b1 == Call a2 b2 =
a1 == a2 && b1 == b2
_ == _ =
False
In this program GHC HEAD builds a new SRT for the recursive group of
`(==)`, `(/=)` and the dictionary closure. Then `/=` points to `==`
in its SRT field, and `==` uses the SRT object as its SRT. With this
patch we use the closure for `/=` as the SRT and add `==` there. Then
`/=` gets an empty SRT field and `==` points to `/=` in its SRT
field.
This change looks fine to me.
Main.o gets 0.07% larger, executable sizes are identical.
head.hackage
------------
head.hackage's CI script builds 428 packages from Hackage using this
patch with no failures.
Compiler performance
--------------------
The compiler perf tests report that the compiler allocates slightly more
(worst case observed so far is 4%). However most programs in the test
suite are small, single file programs. To benchmark compiler performance
on something more realistic I build Cabal (the library, 236 modules)
with different optimisation levels. For the "max residency" row I run
GHC with `+RTS -s -A100k -i0 -h` for more accurate numbers. Other rows
are generated with just `-s`. (This is because `-i0` causes running GC
much more frequently and as a result "bytes copied" gets inflated by
more than 25x in some cases)
* -O0
| | GHC HEAD | This MR | Diff |
| --------------- | -------------- | -------------- | ------ |
| Bytes allocated | 54,413,350,872 | 54,701,099,464 | +0.52% |
| Bytes copied | 4,926,037,184 | 4,990,638,760 | +1.31% |
| Max residency | 421,225,624 | 424,324,264 | +0.73% |
* -O1
| | GHC HEAD | This MR | Diff |
| --------------- | --------------- | --------------- | ------ |
| Bytes allocated | 245,849,209,992 | 246,562,088,672 | +0.28% |
| Bytes copied | 26,943,452,560 | 27,089,972,296 | +0.54% |
| Max residency | 982,643,440 | 991,663,432 | +0.91% |
* -O2
| | GHC HEAD | This MR | Diff |
| --------------- | --------------- | --------------- | ------ |
| Bytes allocated | 291,044,511,408 | 291,863,910,912 | +0.28% |
| Bytes copied | 37,044,237,616 | 36,121,690,472 | -2.49% |
| Max residency | 1,071,600,328 | 1,086,396,256 | +1.38% |
Extra compiler allocations
--------------------------
Runtime allocations of programs are as reported above (NoFib section).
The compiler now allocates more than before. Main source of allocation
in this patch compared to base commit is the new SRT algorithm
(GHC.Cmm.Info.Build). Below is some of the extra work we do with this
patch, numbers generated by profiled stage 2 compiler when building a
pathological case (the test 'ManyConstructors') with '-O2':
- We now sort the final STG for a module, which means traversing the
entire program, generating free variable set for each top-level
binding, doing SCC analysis, and re-ordering the program. In
ManyConstructors this step allocates 97,889,952 bytes.
- We now do SRT analysis on static data, which in a program like
ManyConstructors causes analysing 10,000 bindings that we would
previously just skip. This step allocates 70,898,352 bytes.
- We now maintain an SRT map for the entire module as we compile Cmm
groups:
data ModuleSRTInfo = ModuleSRTInfo
{ ...
, moduleSRTMap :: SRTMap
}
(SRTMap is just a strict Map from the 'containers' library)
This map gets an entry for most bindings in a module (exceptions are
THUNKs and CAFFY static functions). For ManyConstructors this map
gets 50015 entries.
- Once we're done with code generation we generate a NameSet from SRTMap
for the non-CAFFY names in the current module. This set gets the same
number of entries as the SRTMap.
- Finally we update CafInfos in ModDetails for the non-CAFFY Ids, using
the NameSet generated in the previous step. This usually does the
least amount of allocation among the work listed here.
Only place with this patch where we do less work in the CAF analysis in
the tidying pass (CoreTidy). However that doesn't save us much, as the
pass still needs to traverse the whole program and update IdInfos for
other reasons. Only thing we don't here do is the `hasCafRefs` pass over
the RHS of bindings, which is a stateless pass that returns a boolean
value, so it doesn't allocate much.
(Metric changes blow are all increased allocations)
Metric changes
--------------
Metric Increase:
ManyAlternatives
ManyConstructors
T13035
T14683
T1969
T9961
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously `makefile_test` and `run_command` tests could easily end up
in a situation where they wouldn't be run if the user used the
`only_ways` modifier. The reason is to build the set of a ways to run
the test in we first start with a candidate set determined by the test
type (e.g. `makefile_test`, `compile_run`, etc.) and then filter that
set with the constraints given by the test's modifiers.
`makefile_test` and `run_command` tests' candidate sets were simply
`{normal}`, and consequently most uses of `only_ways` would result in
the test being never run.
To avoid this we rather use all ways as the candidate sets for these
test types. This may result in a few more testcases than we would like
(given that some `run_command` tests are insensitive to way) but this
can be fixed by adding modifiers and we would much rather run too many
tests than too few.
This fixes #16042 and a number of other tests afflicted by the same issue.
However, there were a few cases that required special attention:
* `T14028` is currently failing and is therefore marked as broken due
to #17300
* `T-signals-child` is fragile in the `threaded1` and `threaded2` ways
(tracked in #17307)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This does four things:
1. Look at `idArity` instead of manifest lambdas to decide whether to use LetUp
2. Compute the strictness signature in LetDown assuming at least `idArity`
incoming arguments
3. Remove the special case for trivial RHSs, which is subsumed by 2
4. Don't perform the W/W split when doing so would eta expand a binding.
Otherwise we would eta expand PAPs, causing unnecessary churn in the
Simplifier.
NoFib Results
--------------------------------------------------------------------------------
Program Allocs Instrs
--------------------------------------------------------------------------------
fannkuch-redux +0.3% 0.0%
gg -0.0% -0.1%
maillist +0.2% +0.2%
minimax 0.0% +0.8%
pretty 0.0% -0.1%
reptile -0.0% -1.2%
--------------------------------------------------------------------------------
Min -0.0% -1.2%
Max +0.3% +0.8%
Geometric Mean +0.0% -0.0%
|
|
|
|
|
| |
This moves all URL references to Trac tickets to their corresponding
GitLab counterparts.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Trac #10069 revealed that small NOINLINE functions didn't get split
into worker and wrapper. This was due to `certainlyWillInline`
saying that any unfoldings with a guidance of `UnfWhen` inline
unconditionally. That isn't the case for NOINLINE functions, so we
catch this case earlier now.
Nofib results:
--------------------------------------------------------------------------------
Program Allocs Instrs
--------------------------------------------------------------------------------
fannkuch-redux -0.3% 0.0%
gg +0.0% +0.1%
maillist -0.2% -0.2%
minimax 0.0% -0.8%
--------------------------------------------------------------------------------
Min -0.3% -0.8%
Max +0.0% +0.1%
Geometric Mean -0.0% -0.0%
Fixes #10069.
-------------------------
Metric Increase:
T9233
-------------------------
|
| |
|
|
|
|
|
| |
This eliminates most uses of run_command in the testsuite in favor of the more
structured makefile_test.
|
|
|
|
| |
This reverts commit 76c8fd674435a652c75a96c85abbf26f1f221876.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch collects a few improvements triggered by Trac #15696,
and fixing Trac #16029
* Stop making toCleanDmd behave specially for unlifted types.
This special case was the cause of stupid behaviour in Trac
#16029. And to my joy I discovered the let/app invariant
rendered it unnecessary. (Maybe the special case pre-dated
the let/app invariant.)
Result: less special-case handling in the compiler, and
better perf for the compiled code.
* In WwLib.mkWWstr_one, treat seqDmd like U(AAA). It was not
being so treated before, which again led to stupid code.
* Update and improve Notes
There are .stderr test wibbles because we get slightly different
strictness signatures for an argumment of unlifted type:
<L,U> rather than <S,U> for Int#
<S,U> rather than <S(S),U(U)> for Int
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Trac #9279 reminded us that the worker wrapper transformation copes
really badly with absent unlifted boxed bindings.
As `Note [Absent errors]` in WwLib.hs points out, we can't just use
`absentError` for unlifted bindings because there is no bottom to hide
the error in.
So instead, we synthesise a new `RubbishLit` of type
`forall (a :: TYPE 'UnliftedRep). a`, which code-gen may subsitute for
any boxed value. We choose `()`, so that there is a good chance that
the program crashes instead instead of leading to corrupt data, should
absence analysis have been too optimistic (#11126).
Reviewers: simonpj, hvr, goldfire, bgamari, simonmar
Reviewed By: simonpj
Subscribers: osa1, rwbarton, carter
GHC Trac Issues: #15627, #9279, #4306, #11126
Differential Revision: https://phabricator.haskell.org/D5153
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch has a single significant change:
strictness wrapper functions are inlined earlier,
in phase 2 rather than phase 0.
As shown by Trac #15056, this gives a better chance for RULEs to fire.
Before this change, a function that would have inlined early without
strictness analyss was instead inlining late. Result: applying
"optimisation" made the program worse.
This does not make too much difference in nofib, but I've stumbled
over the problem more than once, so even a "no-change" result would be
quite acceptable. Here are the headlines:
--------------------------------------------------------------------------------
Program Size Allocs Runtime Elapsed TotalMem
--------------------------------------------------------------------------------
cacheprof -0.5% -0.5% +2.5% +2.5% 0.0%
fulsom -1.0% +2.6% -0.1% -0.1% 0.0%
mate -0.6% +2.4% -0.9% -0.9% 0.0%
veritas -0.7% -23.2% 0.002 0.002 0.0%
--------------------------------------------------------------------------------
Min -1.4% -23.2% -12.5% -15.3% 0.0%
Max +0.6% +2.6% +4.4% +4.3% +19.0%
Geometric Mean -0.7% -0.2% -1.4% -1.7% +0.2%
* A worthwhile reduction in binary size.
* Runtimes are not to be trusted much but look as if they
are moving the right way.
* A really big win in veritas, described in comment:1 of
Trac #15056; more fusion rules fired.
* I investigated the losses in 'mate' and 'fulsom'; see #15056.
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: bgamari
Reviewed By: bgamari
Subscribers: thomie, carter
Differential Revision: https://phabricator.haskell.org/D4565
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
See Trac #14626, comment:4. We want to maintain evaluted-ness
info on Ids into the code generateor for two reasons
(see Note [Preserve evaluated-ness in CorePrep] in CorePrep)
- DataToTag magic
- Potentially using it in the codegen (this is Gabor's
current work)
But it was all being done very inconsistently, and actually
outright wrong -- the DataToTag magic hasn't been working for
years.
This patch tidies it all up, with Notes to match.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This dark corner was exposed by Trac #14285. It involves the
interaction between absence analysis and INLINABLE pragmas.
There is a full explanation in Note [aBSENT_ERROR_ID] in MkCore,
which you can read there. The changes in this patch are
* Make exprIsHNF return True for absentError, treating
absentError like an honorary data constructor.
* Make absentError /not/ be diverging, unlike other error Ids.
This is all a bit horrible.
* While doing this I found that exprOkForSpeculation didn't
have a case for value lambdas so I added one. It's not
really called on lifted types much, but it seems like the
right thing
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The strictness signature for a data con wrapper wasn't including any
dictionary arguments, which meant that bangs on the fields of a
constructor with an existential context would be moved to the wrong
fields. See T14290 for an example.
Test Plan:
* New test T14290
* validate
Reviewers: simonpj, niteria, austin, bgamari, erikd
Reviewed By: simonpj, bgamari
Subscribers: rwbarton, thomie
GHC Trac Issues: #14290
Differential Revision: https://phabricator.haskell.org/D4040
|
|
|
|
|
|
|
|
| |
the worker/wrapper creates an artificial INLINE pragma, which caused CSE
to not do its work. We now recognize such artificial pragmas by using
`NoUserInline` instead of `Inline` as the `InlineSpec`.
Differential Revision: https://phabricator.haskell.org/D3939
|
|
|
|
|
|
| |
and move the generally useful helpers check_errmsg and grep_errmsg to
testlib.py. Some documentation can be found on
https://ghc.haskell.org/trac/ghc/wiki/Building/RunningTests/Adding
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This is follow-up to https://ghc.haskell.org/trac/ghc/ticket/10773
Reviewers: austin, goldfire, bgamari, RyanGlScott
Reviewed By: RyanGlScott
Subscribers: RyanGlScott, rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3816
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit da4687f63ffe5a6162e3d7856aa53de048dd0f42.
It's not entirely trivial to clean up the dead code this patch
introduced. In particular, when we see
```
case raiseIO# m s of
s' -> e
```
we want to know that `e` is dead. For scrutinees that are properly
bottom (which we don't want to consider `raiseIO# m s` to be, this
is handled by rewriting `bot` to `case bot of {}`. But if we do
that for `raiseIO#`, we end up with
```
case raiseIO# m s of {}
```
which looks a lot like bottom and could confuse demand analysis.
I think we need to wait with this change until we have a more
complete story.
Reviewers: austin, bgamari
Reviewed By: bgamari
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3413
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
They were `expect_broken` without any output. Add the actual
output and remove the `expect_broken`.
Reviewers: austin, bgamari
Reviewed By: bgamari
Subscribers: rwbarton, thomie
Differential Revision: https://phabricator.haskell.org/D3306
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The script I used is included as testsuite/driver/kill_extra_files.py,
though at this point it is for mostly historical interest.
Some of the tests in libraries/hpc relied on extra_files.py, so this
commit includes an update to that submodule.
One test in libraries/process also relies on extra_files.py, but we
cannot update that submodule so easily, so for now we special-case it
in the test driver.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Disabling worker/wrapper for NOINLINE things can cause unnecessary
reboxing of values. Consider
{-# NOINLINE f #-}
f :: Int -> a
f x = error (show x)
g :: Bool -> Bool -> Int -> Int
g True True p = f p
g False True p = p + 1
g b False p = g b True p
the strictness analysis will discover f and g are strict, but because f
has no wrapper, the worker for g will rebox p. So we get
$wg x y p# =
let p = I# p# in -- Yikes! Reboxing!
case x of
False ->
case y of
False -> $wg False True p#
True -> +# p# 1#
True ->
case y of
False -> $wg True True p#
True -> case f p of { }
g x y p = case p of (I# p#) -> $wg x y p#
Now, in this case the reboxing will float into the True branch, an so
the allocation will only happen on the error path. But it won't float
inwards if there are multiple branches that call (f p), so the reboxing
will happen on every call of g. Disaster.
Solution: do worker/wrapper even on NOINLINE things; but move the
NOINLINE pragma to the worker.
Test Plan: make test TEST="13143"
Reviewers: simonpj, bgamari, dfeuer, austin
Reviewed By: simonpj, bgamari
Subscribers: dfeuer, thomie
Differential Revision: https://phabricator.haskell.org/D3046
|