| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes some abundant reboxing of `DynFlags` in
`GHC.HsToCore.Match.Literal.warnAboutOverflowedLit` (which was the topic
of #19407) by introducing a Boxity analysis to GHC, done as part of demand
analysis. This allows to accurately capture ad-hoc unboxing decisions previously
made in worker/wrapper in demand analysis now, where the boxity info can
propagate through demand signatures.
See the new `Note [Boxity analysis]`. The actual fix for #19407 is described in
`Note [No lazy, Unboxed demand in demand signature]`, but
`Note [Finalising boxity for demand signature]` is probably a better entry-point.
To support the fix for #19407, I had to change (what was)
`Note [Add demands for strict constructors]` a bit
(now `Note [Unboxing evaluated arguments]`). In particular, we now take care of
it in `finaliseBoxity` (which is only called from demand analaysis) instead of
`wantToUnboxArg`.
I also had to resurrect `Note [Product demands for function body]` and rename
it to `Note [Unboxed demand on function bodies returning small products]` to
avoid huge regressions in `join004` and `join007`, thereby fixing #4267 again.
See the updated Note for details.
A nice side-effect is that the worker/wrapper transformation no longer needs to
look at strictness info and other bits such as `InsideInlineableFun` flags
(needed for `Note [Do not unbox class dictionaries]`) at all. It simply collects
boxity info from argument demands and interprets them with a severely simplified
`wantToUnboxArg`. All the smartness is in `finaliseBoxity`, which could be moved
to DmdAnal completely, if it wasn't for the call to `dubiousDataConInstArgTys`
which would be awkward to export.
I spent some time figuring out the reason for why `T16197` failed prior to my
amendments to `Note [Unboxing evaluated arguments]`. After having it figured
out, I minimised it a bit and added `T16197b`, which simply compares computed
strictness signatures and thus should be far simpler to eyeball.
The 12% ghc/alloc regression in T11545 is because of the additional `Boxity`
field in `Poly` and `Prod` that results in more allocation during `lubSubDmd`
and `plusSubDmd`. I made sure in the ticky profiles that the number of calls
to those functions stayed the same. We can bear such an increase here, as we
recently improved it by -68% (in b760c1f).
T18698* regress slightly because there is more unboxing of dictionaries
happening and that causes Lint (mostly) to allocate more.
Fixes #19871, #19407, #4267, #16859, #18907 and #13331.
Metric Increase:
T11545
T18698a
T18698b
Metric Decrease:
T12425
T16577
T18223
T18282
T4267
T9961
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
In #20539 we had a type
```hs
newtype Measured a = Measured { unmeasure :: () -> a }
```
and `isRecDataCon Measured` recursed into `go_arg_ty` for `(->) ()`, because
`unwrapNewTyConEtad_maybe` eta-reduced it. That triggered an assertion error a
bit later. Eta reducing the field type is completely wrong to do here! Just call
`unwrapNewTyCon_maybe` instead.
Fixes #20539 and adds a regression test T20539.
|
| |
|
|
| |
For #16040 and #2387.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
In #18824 we saw that the Simplifier didn't nuke a CPR signature of a join point
when it pushed a continuation into it when it better should have.
But join points are local, mostly non-exported bindings. We don't use their
CPR signature anyway and would discard it at the end of the Core pipeline.
Their main purpose is to propagate CPR info during CPR analysis and by the time
worker/wrapper runs the signature will have served its purpose. So we zap it!
Fixes #18824.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch enables worker/wrapper for nested constructed products, as described
in `Note [Nested CPR]`. The machinery for expressing Nested CPR was already
there, since !5054. Worker/wrapper is equipped to exploit Nested CPR annotations
since !5338. CPR analysis already handles applications in batches since !5753.
This patch just needs to flip a few more switches:
1. In `cprTransformDataConWork`, we need to look at the field expressions
and their `CprType`s to see whether the evaluation of the expressions
terminates quickly (= is in HNF) or if they are put in strict fields.
If that is the case, then we retain their CPR info and may unbox nestedly
later on. More details in `Note [Nested CPR]`.
2. Enable nested `ConCPR` signatures in `GHC.Types.Cpr`.
3. In the `asConCpr` call in `GHC.Core.Opt.WorkWrap.Utils`, pass CPR info of
fields to the `Unbox`.
4. Instead of giving CPR signatures to DataCon workers and wrappers, we now have
`cprTransformDataConWork` for workers and treat wrappers by analysing their
unfolding. As a result, the code from GHC.Types.Id.Make went away completely.
5. I deactivated worker/wrappering for recursive DataCons and wrote a function
`isRecDataCon` to detect them. We really don't want to give `repeat` or
`replicate` the Nested CPR property.
See Note [CPR for recursive data structures] for which kind of recursive
DataCons we target.
6. Fix a couple of tests and their outputs.
I also documented that CPR can destroy sharing and lead to asymptotic increase
in allocations (which is tracked by #13331/#19326) in
`Note [CPR for data structures can destroy sharing]`.
Nofib results:
```
--------------------------------------------------------------------------------
Program Allocs Instrs
--------------------------------------------------------------------------------
ben-raytrace -3.1% -0.4%
binary-trees +0.8% -2.9%
digits-of-e2 +5.8% +1.2%
event +0.8% -2.1%
fannkuch-redux +0.0% -1.4%
fish 0.0% -1.5%
gamteb -1.4% -0.3%
mkhprog +1.4% +0.8%
multiplier +0.0% -1.9%
pic -0.6% -0.1%
reptile -20.9% -17.8%
wave4main +4.8% +0.4%
x2n1 -100.0% -7.6%
--------------------------------------------------------------------------------
Min -95.0% -17.8%
Max +5.8% +1.2%
Geometric Mean -2.9% -0.4%
```
The huge wins in x2n1 (loopy list) and reptile (see #19970) are due to
refraining from unboxing (:). Other benchmarks like digits-of-e2 or wave4main
regress because of that. Ultimately there are no great improvements due to
Nested CPR alone, but at least it's a win.
Binary sizes decrease by 0.6%.
There are a significant number of metric decreases. The most notable ones (>1%):
```
ManyAlternatives(normal) ghc/alloc 771656002.7 762187472.0 -1.2%
ManyConstructors(normal) ghc/alloc 4191073418.7 4114369216.0 -1.8%
MultiLayerModules(normal) ghc/alloc 3095678333.3 3128720704.0 +1.1%
PmSeriesG(normal) ghc/alloc 50096429.3 51495664.0 +2.8%
PmSeriesS(normal) ghc/alloc 63512989.3 64681600.0 +1.8%
PmSeriesV(normal) ghc/alloc 62575424.0 63767208.0 +1.9%
T10547(normal) ghc/alloc 29347469.3 29944240.0 +2.0%
T11303b(normal) ghc/alloc 46018752.0 47367576.0 +2.9%
T12150(optasm) ghc/alloc 81660890.7 82547696.0 +1.1%
T12234(optasm) ghc/alloc 59451253.3 60357952.0 +1.5%
T12545(normal) ghc/alloc 1705216250.7 1751278952.0 +2.7%
T12707(normal) ghc/alloc 981000472.0 968489800.0 -1.3% GOOD
T13056(optasm) ghc/alloc 389322664.0 372495160.0 -4.3% GOOD
T13253(normal) ghc/alloc 337174229.3 341954576.0 +1.4%
T13701(normal) ghc/alloc 2381455173.3 2439790328.0 +2.4% BAD
T14052(ghci) ghc/alloc 2162530642.7 2139108784.0 -1.1%
T14683(normal) ghc/alloc 3049744728.0 2977535064.0 -2.4% GOOD
T14697(normal) ghc/alloc 362980213.3 369304512.0 +1.7%
T15164(normal) ghc/alloc 1323102752.0 1307480600.0 -1.2%
T15304(normal) ghc/alloc 1304607429.3 1291024568.0 -1.0%
T16190(normal) ghc/alloc 281450410.7 284878048.0 +1.2%
T16577(normal) ghc/alloc 7984960789.3 7811668768.0 -2.2% GOOD
T17516(normal) ghc/alloc 1171051192.0 1153649664.0 -1.5%
T17836(normal) ghc/alloc 1115569746.7 1098197592.0 -1.6%
T17836b(normal) ghc/alloc 54322597.3 55518216.0 +2.2%
T17977(normal) ghc/alloc 47071754.7 48403408.0 +2.8%
T17977b(normal) ghc/alloc 42579133.3 43977392.0 +3.3%
T18923(normal) ghc/alloc 71764237.3 72566240.0 +1.1%
T1969(normal) ghc/alloc 784821002.7 773971776.0 -1.4% GOOD
T3294(normal) ghc/alloc 1634913973.3 1614323584.0 -1.3% GOOD
T4801(normal) ghc/alloc 295619648.0 292776440.0 -1.0%
T5321FD(normal) ghc/alloc 278827858.7 276067280.0 -1.0%
T5631(normal) ghc/alloc 586618202.7 577579960.0 -1.5%
T5642(normal) ghc/alloc 494923048.0 487927208.0 -1.4%
T5837(normal) ghc/alloc 37758061.3 39261608.0 +4.0%
T9020(optasm) ghc/alloc 257362077.3 254672416.0 -1.0%
T9198(normal) ghc/alloc 49313365.3 50603936.0 +2.6% BAD
T9233(normal) ghc/alloc 704944258.7 685692712.0 -2.7% GOOD
T9630(normal) ghc/alloc 1476621560.0 1455192784.0 -1.5%
T9675(optasm) ghc/alloc 443183173.3 433859696.0 -2.1% GOOD
T9872a(normal) ghc/alloc 1720926653.3 1693190072.0 -1.6% GOOD
T9872b(normal) ghc/alloc 2185618061.3 2162277568.0 -1.1% GOOD
T9872c(normal) ghc/alloc 1765842405.3 1733618088.0 -1.8% GOOD
TcPlugin_RewritePerf(normal) ghc/alloc 2388882730.7 2365504696.0 -1.0%
WWRec(normal) ghc/alloc 607073186.7 597512216.0 -1.6%
T9203(normal) run/alloc 107284064.0 102881832.0 -4.1%
haddock.Cabal(normal) run/alloc 24025329589.3 23768382560.0 -1.1%
haddock.base(normal) run/alloc 25660521653.3 25370321824.0 -1.1%
haddock.compiler(normal) run/alloc 74064171706.7 73358712280.0 -1.0%
```
The biggest exception to the rule is T13701 which seems to fluctuate as usual
(not unlike T12545). T14697 has a similar quality, being a generated
multi-module test. T5837 is small enough that it similarly doesn't measure
anything significant besides module loading overhead.
T13253 simply does one additional round of Simplification due to Nested CPR.
There are also some apparent regressions in T9198, T12234 and PmSeriesG that we
(@mpickering and I) were simply unable to reproduce locally. @mpickering tried
to run the CI script in a local Docker container and actually found that T9198
and PmSeriesG *improved*. In MRs that were rebased on top this one, like !4229,
I did not experience such increases. Let's not get hung up on these regression
tests, they were meant to test for asymptotic regressions.
The build-cabal test improves by 1.2% in -O0.
Metric Increase:
T10421
T12234
T12545
T13035
T13056
T13701
T14697
T18923
T5837
T9198
Metric Decrease:
ManyConstructors
T12545
T12707
T13056
T14683
T16577
T18223
T1969
T3294
T9203
T9233
T9675
T9872a
T9872b
T9872c
T9961
TcPlugin_RewritePerf
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In #19822, we realised that the Simplifier's new habit of floating cases into
`runRW#` continuations inhibits CPR analysis from giving key functions of `text`
the CPR property, such as `singleton`.
This patch fixes that by anticipating part of !5667 (Nested CPR) to give
`runRW#` the proper CPR transformer it now deserves: Namely, `runRW# (\s -> e)`
should have the CPR property iff `e` has it.
The details are in `Note [Simplification of runRW#]` in GHC.CoreToStg.Prep.
The output of T18086 changed a bit: `panic` (which calls `runRW#`) now has
`botCpr`. As outlined in Note [Bottom CPR iff Dead-Ending Divergence], that's
OK.
Fixes #19822.
Metric Decrease:
T9872d
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In another small step towards bringing a manageable variant of Nested
CPR into GHC, this patch refactors worker/wrapper to be able to exploit
Nested CPR signatures. See the new Note [Worker/wrapper for CPR].
The nested code path is currently not triggered, though, because all
signatures that we annotate are still flat. So purely a refactoring.
I am very confident that it works, because I ripped it off !1866 95%
unchanged.
A few test case outputs changed, but only it's auxiliary names only.
I also added test cases for #18109 and #18401.
There's a 2.6% metric increase in T13056 after a rebase, caused by an
additional Simplifier run. It appears b1d0b9c saw a similar additional
iteration. I think it's just a fluke.
Metric Increase:
T13056
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While fixing #19232, it became increasingly clear that the vestigial
hack described in `Note [Optimistic field binder CPR]` is complicated
and causes reboxing. Rather than make the hack worse, this patch
gets rid of it completely in favor of giving deeply unboxed parameters
the Nested CPR property. Example:
```hs
f :: (Int, Int) -> Int
f p = case p of
(x, y) | x == y = x
| otherwise = y
```
Based on `p`'s `idDemandInfo` `1P(1P(L),1P(L))`, we can see that both
fields of `p` will be available unboxed. As a result, we give `p` the
nested CPR property `1(1,1)`. When analysing the `case`, the field
CPRs are transferred to the binders `x` and `y`, respectively, so that
we ultimately give `f` the CPR property.
I took the liberty to do a bit of refactoring:
- I renamed `CprResult` ("Constructed product result result") to plain
`Cpr`.
- I Introduced `FlatConCpr` in addition to (now nested) `ConCpr` and
and according pattern synonym that rewrites flat `ConCpr` to
`FlatConCpr`s, purely for compiler perf reasons.
- Similarly for performance reasons, we now store binders with a
Top signature in a separate `IntSet`,
see `Note [Efficient Top sigs in SigEnv]`.
- I moved a bit of stuff around in `GHC.Core.Opt.WorkWrap.Utils` and
introduced `UnboxingDecision` to replace the `Maybe DataConPatContext`
type we used to return from `wantToUnbox`.
- Since the `Outputable Cpr` instance changed anyway, I removed the
leading `m` which we used to emit for `ConCpr`. It's just noise,
especially now that we may output nested CPRs.
Fixes #19398.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For years we have lived in a supposedly sweet spot that gave case
binders the CPR property, unconditionally. Which is an optimistic hack
that is now described in `Historical Note [Optimistic case binder CPR]`.
In #19232 the concern was raised that this might do more harm than good
and that might be better off simply by taking the CPR property of the
scrutinee for the CPR type of the case binder. And indeed that's what we
do now.
Since `Note [CPR in a DataAlt case alternative]` is now only about field
binders, I renamed and garbage collected it into
`Note [Optimistic field binder CPR]`.
NoFib approves:
```
NoFib Results
--------------------------------------------------------------------------------
Program Allocs Instrs
--------------------------------------------------------------------------------
anna +0.1% +0.1%
nucleic2 -1.2% -0.6%
sched 0.0% +0.9%
transform -0.0% -0.1%
--------------------------------------------------------------------------------
Min -1.2% -0.6%
Max +0.1% +0.9%
Geometric Mean -0.0% +0.0%
```
Fixes #19232.
|
| |
|
|
|
|
|
|
|
| |
-------------------------
Metric Decrease:
T12425
Metric Increase:
T17516
-------------------------
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The `clean_cmd` and `extra_clean` setup functions don't do anything.
Remove them from .T files.
Created using https://github.com/thomie/refactor-ghc-testsuite. This
diff is a test for the .T-file parser/processor/pretty-printer in that
repository.
find . -name '*.T' -exec ~/refactor-ghc-testsuite/Main "{}" \;
Tests containing inline comments or multiline strings are not modified.
Preparation for #12223.
Test Plan: Harbormaster
Reviewers: austin, hvr, simonmar, mpickering, bgamari
Reviewed By: mpickering
Subscribers: mpickering
Differential Revision: https://phabricator.haskell.org/D3000
GHC Trac Issues: #12223
|
| | |
|
| |
|
|
|
|
|
|
|
| |
It's a bit confusing to have .gitignore files spread all over the
filesystem. This commit tries to consolidate those into one .gitignore
file per component. Moreover, we try to describe files to be ignored which
happen to have a common identifying pattern by glob patterns.
Signed-off-by: Herbert Valerio Riedel <hvr@gnu.org>
|
| |
|
|
|
|
|
|
| |
I used this shell command to automatically generate the lists:
for i in `git ls-files -o --exclude-standard --directory`; do echo "`basename $i`" >> "`dirname "$i"`/.gitignore"; done
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
|
| |
|
|
|
| |
...not that we do have nested CPR right now, but when we do, this should
better not break.
|
| |
|
|
|
| |
Authored-by: David Luposchainsky <dluposchainsky@gmail.com>
Signed-off-by: Austin Seipp <aseipp@pobox.com>
|
| |
|
|
|
|
|
|
| |
This allows them to give framework failures.
I also had to change how setTestOpts works. Now, rather than applying
the options to the directory's "default options", it just stores the
options to be applied for each test (i.e. once we know the test name).
|
| |
|