| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We've had Sum CPR (#5075) for top-level bindings for a couple of years now.
That begs the question why we didn't also activate it for local bindings, and
the reasons for that are described in `Note [CPR for sum types]`. Only that it
didn't make sense! The Note said that Sum CPR would destroy let-no-escapes, but
that should be a non-issue since we have syntactic join points in Core now and
we don't WW for them (`Note [Don't w/w join points for CPR]`).
So I simply activated CPR for all bindings of sum type, thus fixing #5075 and
\#16570. NoFib approves:
```
--------------------------------------------------------------------------------
Program Allocs Instrs
--------------------------------------------------------------------------------
comp_lab_zift -0.0% +0.7%
fluid +1.7% +0.7%
reptile +0.1% +0.1%
--------------------------------------------------------------------------------
Min -0.0% -0.2%
Max +1.7% +0.7%
Geometric Mean +0.0% +0.0%
```
There were quite a few metric decreases on the order of 1-4%, but T6048 seems to
regress significantly, by 26.1%. WW'ing for a `Just` constructor and the nested
data type meant additional Simplifier iterations and a 30% increase in term
sizes as well as a 200-300% in type sizes due to unboxed 9-tuples. There's not
much we can do about it, I'm afraid: We're just doing much more work there.
Metric Decrease:
T12425
T18698a
T18698b
T20049
T9020
WWRec
Metric Increase:
T6048
|
|
|
|
|
|
|
|
| |
The examples in the Note were inaccurate (`$s$dm` has arity 1 and that seems OK)
and the code didn't actually nuke the demand *signature* anyway. Specialise has
to nuke it, but it starts from a clean IdInfo anyway (in `newSpecIdM`).
So I just deleted the code. Fixes #20450.
|
|
|
|
|
|
|
|
|
|
|
|
| |
In #18824 we saw that the Simplifier didn't nuke a CPR signature of a join point
when it pushed a continuation into it when it better should have.
But join points are local, mostly non-exported bindings. We don't use their
CPR signature anyway and would discard it at the end of the Core pipeline.
Their main purpose is to propagate CPR info during CPR analysis and by the time
worker/wrapper runs the signature will have served its purpose. So we zap it!
Fixes #18824.
|
|
|
|
|
|
| |
Removed an intermediate list via a fold.
realRegsAlias: Manually inlined the list functions to get better code.
Linear.hs added a bang somewhere.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a plugins was specified using the -plugin-package-(id) flag then the
module it applied to was always recompiled.
The recompilation checker was previously using `findImportedModule`,
which looked for packages in the HPT and then in the package database
but only for modules specified using `-package`.
The correct lookup function for plugins is `findPluginModule`, therefore
we check normal imports with `findImportedModule` and plugins with
`findPluginModule`.
Fixes #20417
|
|
|
|
|
|
|
|
|
| |
Fixes #20416
I thought about adding a test for this case but I struggled to think of
something robust. Grepping -v3 will include different paths on different
systems and the structure of the result file depends on which
preprocessor you are using.
|
|
|
|
|
| |
Metric Decrease:
T12545
|
|
|
|
|
|
|
|
|
| |
Previously we were using `System.Environment.getEnvironment`, which
decodes all environment variables into Haskell `String`s, where a simple
environment lookup would do. This made the compiler's allocations
unnecessarily dependent on the environment.
Fixes #20431.
|
|
|
|
|
|
| |
Move HsTick and HsBinTick to XExpr, the extension tree of HsExpr.
Part of #16830 .
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch enables worker/wrapper for nested constructed products, as described
in `Note [Nested CPR]`. The machinery for expressing Nested CPR was already
there, since !5054. Worker/wrapper is equipped to exploit Nested CPR annotations
since !5338. CPR analysis already handles applications in batches since !5753.
This patch just needs to flip a few more switches:
1. In `cprTransformDataConWork`, we need to look at the field expressions
and their `CprType`s to see whether the evaluation of the expressions
terminates quickly (= is in HNF) or if they are put in strict fields.
If that is the case, then we retain their CPR info and may unbox nestedly
later on. More details in `Note [Nested CPR]`.
2. Enable nested `ConCPR` signatures in `GHC.Types.Cpr`.
3. In the `asConCpr` call in `GHC.Core.Opt.WorkWrap.Utils`, pass CPR info of
fields to the `Unbox`.
4. Instead of giving CPR signatures to DataCon workers and wrappers, we now have
`cprTransformDataConWork` for workers and treat wrappers by analysing their
unfolding. As a result, the code from GHC.Types.Id.Make went away completely.
5. I deactivated worker/wrappering for recursive DataCons and wrote a function
`isRecDataCon` to detect them. We really don't want to give `repeat` or
`replicate` the Nested CPR property.
See Note [CPR for recursive data structures] for which kind of recursive
DataCons we target.
6. Fix a couple of tests and their outputs.
I also documented that CPR can destroy sharing and lead to asymptotic increase
in allocations (which is tracked by #13331/#19326) in
`Note [CPR for data structures can destroy sharing]`.
Nofib results:
```
--------------------------------------------------------------------------------
Program Allocs Instrs
--------------------------------------------------------------------------------
ben-raytrace -3.1% -0.4%
binary-trees +0.8% -2.9%
digits-of-e2 +5.8% +1.2%
event +0.8% -2.1%
fannkuch-redux +0.0% -1.4%
fish 0.0% -1.5%
gamteb -1.4% -0.3%
mkhprog +1.4% +0.8%
multiplier +0.0% -1.9%
pic -0.6% -0.1%
reptile -20.9% -17.8%
wave4main +4.8% +0.4%
x2n1 -100.0% -7.6%
--------------------------------------------------------------------------------
Min -95.0% -17.8%
Max +5.8% +1.2%
Geometric Mean -2.9% -0.4%
```
The huge wins in x2n1 (loopy list) and reptile (see #19970) are due to
refraining from unboxing (:). Other benchmarks like digits-of-e2 or wave4main
regress because of that. Ultimately there are no great improvements due to
Nested CPR alone, but at least it's a win.
Binary sizes decrease by 0.6%.
There are a significant number of metric decreases. The most notable ones (>1%):
```
ManyAlternatives(normal) ghc/alloc 771656002.7 762187472.0 -1.2%
ManyConstructors(normal) ghc/alloc 4191073418.7 4114369216.0 -1.8%
MultiLayerModules(normal) ghc/alloc 3095678333.3 3128720704.0 +1.1%
PmSeriesG(normal) ghc/alloc 50096429.3 51495664.0 +2.8%
PmSeriesS(normal) ghc/alloc 63512989.3 64681600.0 +1.8%
PmSeriesV(normal) ghc/alloc 62575424.0 63767208.0 +1.9%
T10547(normal) ghc/alloc 29347469.3 29944240.0 +2.0%
T11303b(normal) ghc/alloc 46018752.0 47367576.0 +2.9%
T12150(optasm) ghc/alloc 81660890.7 82547696.0 +1.1%
T12234(optasm) ghc/alloc 59451253.3 60357952.0 +1.5%
T12545(normal) ghc/alloc 1705216250.7 1751278952.0 +2.7%
T12707(normal) ghc/alloc 981000472.0 968489800.0 -1.3% GOOD
T13056(optasm) ghc/alloc 389322664.0 372495160.0 -4.3% GOOD
T13253(normal) ghc/alloc 337174229.3 341954576.0 +1.4%
T13701(normal) ghc/alloc 2381455173.3 2439790328.0 +2.4% BAD
T14052(ghci) ghc/alloc 2162530642.7 2139108784.0 -1.1%
T14683(normal) ghc/alloc 3049744728.0 2977535064.0 -2.4% GOOD
T14697(normal) ghc/alloc 362980213.3 369304512.0 +1.7%
T15164(normal) ghc/alloc 1323102752.0 1307480600.0 -1.2%
T15304(normal) ghc/alloc 1304607429.3 1291024568.0 -1.0%
T16190(normal) ghc/alloc 281450410.7 284878048.0 +1.2%
T16577(normal) ghc/alloc 7984960789.3 7811668768.0 -2.2% GOOD
T17516(normal) ghc/alloc 1171051192.0 1153649664.0 -1.5%
T17836(normal) ghc/alloc 1115569746.7 1098197592.0 -1.6%
T17836b(normal) ghc/alloc 54322597.3 55518216.0 +2.2%
T17977(normal) ghc/alloc 47071754.7 48403408.0 +2.8%
T17977b(normal) ghc/alloc 42579133.3 43977392.0 +3.3%
T18923(normal) ghc/alloc 71764237.3 72566240.0 +1.1%
T1969(normal) ghc/alloc 784821002.7 773971776.0 -1.4% GOOD
T3294(normal) ghc/alloc 1634913973.3 1614323584.0 -1.3% GOOD
T4801(normal) ghc/alloc 295619648.0 292776440.0 -1.0%
T5321FD(normal) ghc/alloc 278827858.7 276067280.0 -1.0%
T5631(normal) ghc/alloc 586618202.7 577579960.0 -1.5%
T5642(normal) ghc/alloc 494923048.0 487927208.0 -1.4%
T5837(normal) ghc/alloc 37758061.3 39261608.0 +4.0%
T9020(optasm) ghc/alloc 257362077.3 254672416.0 -1.0%
T9198(normal) ghc/alloc 49313365.3 50603936.0 +2.6% BAD
T9233(normal) ghc/alloc 704944258.7 685692712.0 -2.7% GOOD
T9630(normal) ghc/alloc 1476621560.0 1455192784.0 -1.5%
T9675(optasm) ghc/alloc 443183173.3 433859696.0 -2.1% GOOD
T9872a(normal) ghc/alloc 1720926653.3 1693190072.0 -1.6% GOOD
T9872b(normal) ghc/alloc 2185618061.3 2162277568.0 -1.1% GOOD
T9872c(normal) ghc/alloc 1765842405.3 1733618088.0 -1.8% GOOD
TcPlugin_RewritePerf(normal) ghc/alloc 2388882730.7 2365504696.0 -1.0%
WWRec(normal) ghc/alloc 607073186.7 597512216.0 -1.6%
T9203(normal) run/alloc 107284064.0 102881832.0 -4.1%
haddock.Cabal(normal) run/alloc 24025329589.3 23768382560.0 -1.1%
haddock.base(normal) run/alloc 25660521653.3 25370321824.0 -1.1%
haddock.compiler(normal) run/alloc 74064171706.7 73358712280.0 -1.0%
```
The biggest exception to the rule is T13701 which seems to fluctuate as usual
(not unlike T12545). T14697 has a similar quality, being a generated
multi-module test. T5837 is small enough that it similarly doesn't measure
anything significant besides module loading overhead.
T13253 simply does one additional round of Simplification due to Nested CPR.
There are also some apparent regressions in T9198, T12234 and PmSeriesG that we
(@mpickering and I) were simply unable to reproduce locally. @mpickering tried
to run the CI script in a local Docker container and actually found that T9198
and PmSeriesG *improved*. In MRs that were rebased on top this one, like !4229,
I did not experience such increases. Let's not get hung up on these regression
tests, they were meant to test for asymptotic regressions.
The build-cabal test improves by 1.2% in -O0.
Metric Increase:
T10421
T12234
T12545
T13035
T13056
T13701
T14697
T18923
T5837
T9198
Metric Decrease:
ManyConstructors
T12545
T12707
T13056
T14683
T16577
T18223
T1969
T3294
T9203
T9233
T9675
T9872a
T9872b
T9872c
T9961
TcPlugin_RewritePerf
|
|
|
|
|
|
|
|
| |
Sebastian unfortunately wrote a very long commit message in !5667 which
caused `xargs` to fail on windows because the environment was too big.
Fortunately `xargs` and `rm` don't need anything from the environment so
just run those commands in an empty environment (which is what env -i
achieves).
|
| |
|
| |
|
| |
|
|
|
|
|
| |
Surprisingly this previously didn't appear to introduce any visible
non-determinism but it seems worth avoiding non-determinism here.
|
|
|
|
| |
Rename to nonDetTyConEnvElts.
|
|
|
|
| |
And remove the former.
|
|
|
|
|
| |
Rather than nonDetEltsUFM; this should eliminate some unnecessary list
allocations.
|
|
|
|
|
| |
It was already commented out and contained a reference to the
non-deterministic nameEnvElts so let's just drop it.
|
| |
|
|
|
|
| |
Closes ticket #17820.
|
|
|
|
| |
This isn't clear from the existing doc.
|
|
|
|
| |
and exhibit similar behaviors. Issue 20400
|
| |
|
|
|
|
|
| |
In the multi-threaded RTS this can lead to hard to debug
performance issues.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
allocateForCompact() is called when the current allocation for the
compact region does not fit in the nursery. It previously had a special
case for objects exceeding the large object threshold. In that case, it
would allocate a new compact region block just for that object. That led
to a lot of small blocks being allocated in compact regions with a
larger default block size (`autoBlockW`).
This commit removes this special case because having a lot of small
compact region blocks contributes significantly to memory fragmentation.
The removal should be valid because
- a more generic case for allocating a new compact region block follows
at the end of allocateForCompact(), and that one takes `autoBlockW`
into account
- the reason for allocating separate blocks for large objects in the
main heap seems to be to avoid copying during GCs, but once inside
the compact region, the object will never be copied anyway.
Fixes #18757.
A regression test T18757 was added.
|
|
|
|
|
|
|
|
|
|
|
| |
See Note [Equality on FunTys] in TyCoRep.
Close #17675.
Close #17655, about documentation improvements included in
this patch.
Close #19677, about a further mistake around FunTy.
test cases: typecheck/should_compile/T19677
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
Given the previous commit, `-V` allows us to see some useful information
in CI (such as the call stack on failure) which normally people don't
want to see.
As a result the $VERBOSE variable now tweaks the diagnostic level one
level higher (to Diagnostic), which produces a lot of output.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before we really only had two verbosity levels, normal and verbose.
There are now three levels:
Normal: Commands show stderr (no stdout) and minimal build failure messages.
Verbose (-V): Commands also show stdout, build failure message contains
callstack and additional information
Diagnostic (-VV): Very verbose output showing all command lines and passing -v3 to cabal commands.
-V is similar to the default verbosity from before (but a little more
verbose)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change reduces the default verbosity of error messages to omit the
stack trace information from the printed output.
For example, before all errors would have a long call trace:
```
Error when running Shake build system:
at action, called at src/Rules.hs:39:19 in main:Rules
at need, called at src/Rules.hs:61:5 in main:Rules
* Depends on: _build/stage1/lib/package.conf.d/ghc-9.3.conf
* Depends on: _build/stage1/compiler/build/libHSghc-9.3.a
* Depends on: _build/stage1/compiler/build/GHC/Tc/Solver/Rewrite.o
* Depends on: _build/stage1/compiler/build/GHC/Tc/Solver/Rewrite.o _build/stage1/compiler/build/GHC/Tc/Solver/Rewrite.hi
at cmd', called at src/Builder.hs:330:23 in main:Builder
at cmd, called at src/Builder.hs:432:8 in main:Builder
* Raised the exception:
```
Which can be useful but it confusing for GHC rather than hadrian
developers.
Ticket #20386
|
|
|
|
|
| |
This flag is used to remove the output of core stats per binding in Core
dumps.
|
|
|
|
|
|
|
|
|
| |
Even in -j1 we now fork all the work into it's own thread so that Ctrl-C
exceptions are thrown on the main thread, which is blocked waiting for
the work thread to finish. The default exception handler then picks up
Ctrl-C exception and the dangling thread is killed.
Fixes #20292
|
|
|
|
|
|
| |
NoGhcTc is removed from HsMatchContext. As a result of this,
HsMatchContext GhcTc is now a valid type that has Id in it,
instead of Name and tcMatchesFun now takes Id instead of Name.
|
| |
|
|
|
|
|
| |
As of cbfc0e933660626c9f4eaf5480076b6fcd31dceb we set $HOME to a
non-existent directory to ensure hermeticity.
|
| |
|
|
|
|
| |
Otherwise we are dependent upon the C++ compiler's default language.
|
|
|
|
|
| |
Otherwise we may get warnings from the toolchain if the bootstrap
compiler was built with a different deployment target.
|
|
|
|
|
|
|
|
|
| |
Previously the C/C++ language version check in STG could throw an
undefined macro warning due to __STDC_VERSION__ when compiled with a C++
compiler. Fix this by defining __STDC_VERSION__==0 when compiling with a
C++ compiler.
Fixes #20394.
|
|
|
|
|
| |
The BSD sed implementation doesn't allow `sed -i COMMAND FILE`; one must
rather use `sed -i -e COMMAND FILE`.
|
| |
|
| |
|
|
|
|
| |
ipeMap.c failed to #include <string.h>
|
| |
|