summaryrefslogtreecommitdiff
Commit message (Collapse)AuthorAgeFilesLines
* CprAnal: Activate Sum CPR for local bindingswip/T5075Sebastian Graf2021-10-053-74/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We've had Sum CPR (#5075) for top-level bindings for a couple of years now. That begs the question why we didn't also activate it for local bindings, and the reasons for that are described in `Note [CPR for sum types]`. Only that it didn't make sense! The Note said that Sum CPR would destroy let-no-escapes, but that should be a non-issue since we have syntactic join points in Core now and we don't WW for them (`Note [Don't w/w join points for CPR]`). So I simply activated CPR for all bindings of sum type, thus fixing #5075 and \#16570. NoFib approves: ``` -------------------------------------------------------------------------------- Program Allocs Instrs -------------------------------------------------------------------------------- comp_lab_zift -0.0% +0.7% fluid +1.7% +0.7% reptile +0.1% +0.1% -------------------------------------------------------------------------------- Min -0.0% -0.2% Max +1.7% +0.7% Geometric Mean +0.0% +0.0% ``` There were quite a few metric decreases on the order of 1-4%, but T6048 seems to regress significantly, by 26.1%. WW'ing for a `Just` constructor and the nested data type meant additional Simplifier iterations and a 30% increase in term sizes as well as a 200-300% in type sizes due to unboxed 9-tuples. There's not much we can do about it, I'm afraid: We're just doing much more work there. Metric Decrease: T12425 T18698a T18698b T20049 T9020 WWRec Metric Increase: T6048
* Simplifier: Get rid of demand zapping based on Note [Arity decrease]Sebastian Graf2021-10-041-32/+3
| | | | | | | | The examples in the Note were inaccurate (`$s$dm` has arity 1 and that seems OK) and the code didn't actually nuke the demand *signature* anyway. Specialise has to nuke it, but it starts from a clean IdInfo anyway (in `newSpecIdM`). So I just deleted the code. Fixes #20450.
* WorkWrap: Nuke CPR signatures of join points (#18824)Sebastian Graf2021-10-044-24/+79
| | | | | | | | | | | | In #18824 we saw that the Simplifier didn't nuke a CPR signature of a join point when it pushed a continuation into it when it better should have. But join points are local, mostly non-exported bindings. We don't use their CPR signature anyway and would discard it at the end of the Core pipeline. Their main purpose is to propagate CPR info during CPR analysis and by the time worker/wrapper runs the signature will have served its purpose. So we zap it! Fixes #18824.
* NCG: Linear-reg-alloc: A few small implemenation tweaks.Andreas Klebinger2021-09-303-25/+40
| | | | | | Removed an intermediate list via a fold. realRegsAlias: Manually inlined the list functions to get better code. Linear.hs added a bang somewhere.
* Recompilation: Handle -plugin-package correctlyMatthew Pickering2021-09-304-5/+25
| | | | | | | | | | | | | | | If a plugins was specified using the -plugin-package-(id) flag then the module it applied to was always recompiled. The recompilation checker was previously using `findImportedModule`, which looked for packages in the HPT and then in the package database but only for modules specified using `-package`. The correct lookup function for plugins is `findPluginModule`, therefore we check normal imports with `findImportedModule` and plugins with `findPluginModule`. Fixes #20417
* driver: Fix -E -XCPP, copy output from CPP ouput rather than .hs outputMatthew Pickering2021-09-301-2/+2
| | | | | | | | | Fixes #20416 I thought about adding a test for this case but I struggled to think of something robust. Grepping -v3 will include different paths on different systems and the structure of the result file depends on which preprocessor you are using.
* Rules for sized conversion primops (#19769)Sylvain Henry2021-09-3011-28/+337
| | | | | Metric Decrease: T12545
* ghc-boot: Eliminate unnecessary use of getEnvironmentBen Gamari2021-09-301-2/+2
| | | | | | | | | Previously we were using `System.Environment.getEnvironment`, which decodes all environment variables into Haskell `String`s, where a simple environment lookup would do. This made the compiler's allocations unnecessarily dependent on the environment. Fixes #20431.
* Trees That Grow refactor for HsTick and HsBinTickAndrea Condoluci2021-09-3012-86/+74
| | | | | | Move HsTick and HsBinTick to XExpr, the extension tree of HsExpr. Part of #16830 .
* Nested CPR light unleashed (#18174)Sebastian Graf2021-09-3035-443/+1631
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch enables worker/wrapper for nested constructed products, as described in `Note [Nested CPR]`. The machinery for expressing Nested CPR was already there, since !5054. Worker/wrapper is equipped to exploit Nested CPR annotations since !5338. CPR analysis already handles applications in batches since !5753. This patch just needs to flip a few more switches: 1. In `cprTransformDataConWork`, we need to look at the field expressions and their `CprType`s to see whether the evaluation of the expressions terminates quickly (= is in HNF) or if they are put in strict fields. If that is the case, then we retain their CPR info and may unbox nestedly later on. More details in `Note [Nested CPR]`. 2. Enable nested `ConCPR` signatures in `GHC.Types.Cpr`. 3. In the `asConCpr` call in `GHC.Core.Opt.WorkWrap.Utils`, pass CPR info of fields to the `Unbox`. 4. Instead of giving CPR signatures to DataCon workers and wrappers, we now have `cprTransformDataConWork` for workers and treat wrappers by analysing their unfolding. As a result, the code from GHC.Types.Id.Make went away completely. 5. I deactivated worker/wrappering for recursive DataCons and wrote a function `isRecDataCon` to detect them. We really don't want to give `repeat` or `replicate` the Nested CPR property. See Note [CPR for recursive data structures] for which kind of recursive DataCons we target. 6. Fix a couple of tests and their outputs. I also documented that CPR can destroy sharing and lead to asymptotic increase in allocations (which is tracked by #13331/#19326) in `Note [CPR for data structures can destroy sharing]`. Nofib results: ``` -------------------------------------------------------------------------------- Program Allocs Instrs -------------------------------------------------------------------------------- ben-raytrace -3.1% -0.4% binary-trees +0.8% -2.9% digits-of-e2 +5.8% +1.2% event +0.8% -2.1% fannkuch-redux +0.0% -1.4% fish 0.0% -1.5% gamteb -1.4% -0.3% mkhprog +1.4% +0.8% multiplier +0.0% -1.9% pic -0.6% -0.1% reptile -20.9% -17.8% wave4main +4.8% +0.4% x2n1 -100.0% -7.6% -------------------------------------------------------------------------------- Min -95.0% -17.8% Max +5.8% +1.2% Geometric Mean -2.9% -0.4% ``` The huge wins in x2n1 (loopy list) and reptile (see #19970) are due to refraining from unboxing (:). Other benchmarks like digits-of-e2 or wave4main regress because of that. Ultimately there are no great improvements due to Nested CPR alone, but at least it's a win. Binary sizes decrease by 0.6%. There are a significant number of metric decreases. The most notable ones (>1%): ``` ManyAlternatives(normal) ghc/alloc 771656002.7 762187472.0 -1.2% ManyConstructors(normal) ghc/alloc 4191073418.7 4114369216.0 -1.8% MultiLayerModules(normal) ghc/alloc 3095678333.3 3128720704.0 +1.1% PmSeriesG(normal) ghc/alloc 50096429.3 51495664.0 +2.8% PmSeriesS(normal) ghc/alloc 63512989.3 64681600.0 +1.8% PmSeriesV(normal) ghc/alloc 62575424.0 63767208.0 +1.9% T10547(normal) ghc/alloc 29347469.3 29944240.0 +2.0% T11303b(normal) ghc/alloc 46018752.0 47367576.0 +2.9% T12150(optasm) ghc/alloc 81660890.7 82547696.0 +1.1% T12234(optasm) ghc/alloc 59451253.3 60357952.0 +1.5% T12545(normal) ghc/alloc 1705216250.7 1751278952.0 +2.7% T12707(normal) ghc/alloc 981000472.0 968489800.0 -1.3% GOOD T13056(optasm) ghc/alloc 389322664.0 372495160.0 -4.3% GOOD T13253(normal) ghc/alloc 337174229.3 341954576.0 +1.4% T13701(normal) ghc/alloc 2381455173.3 2439790328.0 +2.4% BAD T14052(ghci) ghc/alloc 2162530642.7 2139108784.0 -1.1% T14683(normal) ghc/alloc 3049744728.0 2977535064.0 -2.4% GOOD T14697(normal) ghc/alloc 362980213.3 369304512.0 +1.7% T15164(normal) ghc/alloc 1323102752.0 1307480600.0 -1.2% T15304(normal) ghc/alloc 1304607429.3 1291024568.0 -1.0% T16190(normal) ghc/alloc 281450410.7 284878048.0 +1.2% T16577(normal) ghc/alloc 7984960789.3 7811668768.0 -2.2% GOOD T17516(normal) ghc/alloc 1171051192.0 1153649664.0 -1.5% T17836(normal) ghc/alloc 1115569746.7 1098197592.0 -1.6% T17836b(normal) ghc/alloc 54322597.3 55518216.0 +2.2% T17977(normal) ghc/alloc 47071754.7 48403408.0 +2.8% T17977b(normal) ghc/alloc 42579133.3 43977392.0 +3.3% T18923(normal) ghc/alloc 71764237.3 72566240.0 +1.1% T1969(normal) ghc/alloc 784821002.7 773971776.0 -1.4% GOOD T3294(normal) ghc/alloc 1634913973.3 1614323584.0 -1.3% GOOD T4801(normal) ghc/alloc 295619648.0 292776440.0 -1.0% T5321FD(normal) ghc/alloc 278827858.7 276067280.0 -1.0% T5631(normal) ghc/alloc 586618202.7 577579960.0 -1.5% T5642(normal) ghc/alloc 494923048.0 487927208.0 -1.4% T5837(normal) ghc/alloc 37758061.3 39261608.0 +4.0% T9020(optasm) ghc/alloc 257362077.3 254672416.0 -1.0% T9198(normal) ghc/alloc 49313365.3 50603936.0 +2.6% BAD T9233(normal) ghc/alloc 704944258.7 685692712.0 -2.7% GOOD T9630(normal) ghc/alloc 1476621560.0 1455192784.0 -1.5% T9675(optasm) ghc/alloc 443183173.3 433859696.0 -2.1% GOOD T9872a(normal) ghc/alloc 1720926653.3 1693190072.0 -1.6% GOOD T9872b(normal) ghc/alloc 2185618061.3 2162277568.0 -1.1% GOOD T9872c(normal) ghc/alloc 1765842405.3 1733618088.0 -1.8% GOOD TcPlugin_RewritePerf(normal) ghc/alloc 2388882730.7 2365504696.0 -1.0% WWRec(normal) ghc/alloc 607073186.7 597512216.0 -1.6% T9203(normal) run/alloc 107284064.0 102881832.0 -4.1% haddock.Cabal(normal) run/alloc 24025329589.3 23768382560.0 -1.1% haddock.base(normal) run/alloc 25660521653.3 25370321824.0 -1.1% haddock.compiler(normal) run/alloc 74064171706.7 73358712280.0 -1.0% ``` The biggest exception to the rule is T13701 which seems to fluctuate as usual (not unlike T12545). T14697 has a similar quality, being a generated multi-module test. T5837 is small enough that it similarly doesn't measure anything significant besides module loading overhead. T13253 simply does one additional round of Simplification due to Nested CPR. There are also some apparent regressions in T9198, T12234 and PmSeriesG that we (@mpickering and I) were simply unable to reproduce locally. @mpickering tried to run the CI script in a local Docker container and actually found that T9198 and PmSeriesG *improved*. In MRs that were rebased on top this one, like !4229, I did not experience such increases. Let's not get hung up on these regression tests, they were meant to test for asymptotic regressions. The build-cabal test improves by 1.2% in -O0. Metric Increase: T10421 T12234 T12545 T13035 T13056 T13701 T14697 T18923 T5837 T9198 Metric Decrease: ManyConstructors T12545 T12707 T13056 T14683 T16577 T18223 T1969 T3294 T9203 T9233 T9675 T9872a T9872b T9872c T9961 TcPlugin_RewritePerf
* testsuite: Make cabal01 more robust to large environmentsMatthew Pickering2021-09-301-2/+8
| | | | | | | | Sebastian unfortunately wrote a very long commit message in !5667 which caused `xargs` to fail on windows because the environment was too big. Fortunately `xargs` and `rm` don't need anything from the environment so just run those commands in an empty environment (which is what env -i achieves).
* compiler: occEnvElts -> nonDetOccEnvEltsBen Gamari2021-09-296-10/+10
|
* compiler: Use seqEltsNameEnv rather that nameEnvEltsBen Gamari2021-09-292-1/+4
|
* compiler: Rename nameEnvElts -> nonDetNameEnvEltsBen Gamari2021-09-2910-13/+13
|
* compiler: Make nubAvails deterministicBen Gamari2021-09-292-5/+12
| | | | | Surprisingly this previously didn't appear to introduce any visible non-determinism but it seems worth avoiding non-determinism here.
* compiler: Fix name of GHC.Core.TyCon.Env.nameEnvEltsBen Gamari2021-09-291-3/+3
| | | | Rename to nonDetTyConEnvElts.
* compiler: Rewrite all eltsUFM occurrences to nonDetEltsUFMBen Gamari2021-09-298-11/+8
| | | | And remove the former.
* compiler: Reimplement seqEltsUFM in terms of foldBen Gamari2021-09-292-3/+3
| | | | | Rather than nonDetEltsUFM; this should eliminate some unnecessary list allocations.
* GHC: Drop dead packageDbModulesBen Gamari2021-09-291-24/+0
| | | | | It was already commented out and contained a reference to the non-deterministic nameEnvElts so let's just drop it.
* Add tests for T17820Andrea Condoluci2021-09-2911-0/+69
|
* TH stage restriction check for constructors, selectors, and class methodsAndrea Condoluci2021-09-298-31/+68
| | | | Closes ticket #17820.
* Document that `eqType`/`coreView` do not look through type familiesZiyang Liu2021-09-291-2/+4
| | | | This isn't clear from the existing doc.
* Rectifying COMMENT and `mkComment` across platforms to work with SDocBenjamin Maurer2021-09-2912-25/+27
| | | | and exhibit similar behaviors. Issue 20400
* Add a regression test for #17912Kamil Dworakowski2021-09-293-0/+36
|
* Document interaction between unsafe FFI and GCAlexander Kjeldaas2021-09-291-5/+23
| | | | | In the multi-threaded RTS this can lead to hard to debug performance issues.
* Fix comment typosKirill Zaborsky2021-09-291-2/+2
|
* Remove special case for large objects in allocateForCompactFabian Thorand2021-09-295-11/+47
| | | | | | | | | | | | | | | | | | | | | | | | allocateForCompact() is called when the current allocation for the compact region does not fit in the nursery. It previously had a special case for objects exceeding the large object threshold. In that case, it would allocate a new compact region block just for that object. That led to a lot of small blocks being allocated in compact regions with a larger default block size (`autoBlockW`). This commit removes this special case because having a lot of small compact region blocks contributes significantly to memory fragmentation. The removal should be valid because - a more generic case for allocating a new compact region block follows at the end of allocateForCompact(), and that one takes `autoBlockW` into account - the reason for allocating separate blocks for large objects in the main heap seems to be to avoid copying during GCs, but once inside the compact region, the object will never be copied anyway. Fixes #18757. A regression test T18757 was added.
* Compare FunTys as if they were TyConApps.Richard Eisenberg2021-09-2912-72/+215
| | | | | | | | | | | See Note [Equality on FunTys] in TyCoRep. Close #17675. Close #17655, about documentation improvements included in this patch. Close #19677, about a further mistake around FunTy. test cases: typecheck/should_compile/T19677
* Documented yet undocumented dump flags #18641Benjamin Maurer2021-09-282-6/+38
|
* Remove outdated note about pragma layouttaylorfausak2021-09-281-3/+1
|
* hadrian: Update comments on verbosity handlingMatthew Pickering2021-09-282-2/+14
|
* hadrian: Update documentation for new verbosity optionsMatthew Pickering2021-09-282-37/+5
|
* ci: Increase default verbosity level to `-V` (Verbose)Matthew Pickering2021-09-281-0/+1
| | | | | | | | | Given the previous commit, `-V` allows us to see some useful information in CI (such as the call stack on failure) which normally people don't want to see. As a result the $VERBOSE variable now tweaks the diagnostic level one level higher (to Diagnostic), which produces a lot of output.
* hadrian: Rework the verbosity levelsMatthew Pickering2021-09-286-26/+27
| | | | | | | | | | | | | Before we really only had two verbosity levels, normal and verbose. There are now three levels: Normal: Commands show stderr (no stdout) and minimal build failure messages. Verbose (-V): Commands also show stdout, build failure message contains callstack and additional information Diagnostic (-VV): Very verbose output showing all command lines and passing -v3 to cabal commands. -V is similar to the default verbosity from before (but a little more verbose)
* hadrian: Remove deprecated tracing functionsMatthew Pickering2021-09-2814-27/+27
|
* hadrian: Reduce default verbosityMatthew Pickering2021-09-283-3/+28
| | | | | | | | | | | | | | | | | | | | | | | | | This change reduces the default verbosity of error messages to omit the stack trace information from the printed output. For example, before all errors would have a long call trace: ``` Error when running Shake build system: at action, called at src/Rules.hs:39:19 in main:Rules at need, called at src/Rules.hs:61:5 in main:Rules * Depends on: _build/stage1/lib/package.conf.d/ghc-9.3.conf * Depends on: _build/stage1/compiler/build/libHSghc-9.3.a * Depends on: _build/stage1/compiler/build/GHC/Tc/Solver/Rewrite.o * Depends on: _build/stage1/compiler/build/GHC/Tc/Solver/Rewrite.o _build/stage1/compiler/build/GHC/Tc/Solver/Rewrite.hi at cmd', called at src/Builder.hs:330:23 in main:Builder at cmd, called at src/Builder.hs:432:8 in main:Builder * Raised the exception: ``` Which can be useful but it confusing for GHC rather than hadrian developers. Ticket #20386
* Add `-dsuppress-core-sizes` flag (#20342)Sylvain Henry2021-09-2825-8/+201
| | | | | This flag is used to remove the output of core stats per binding in Core dumps.
* driver: Fix Ctrl-C handling with -j1Matthew Pickering2021-09-281-27/+35
| | | | | | | | | Even in -j1 we now fork all the work into it's own thread so that Ctrl-C exceptions are thrown on the main thread, which is blocked waiting for the work thread to finish. The default exception handler then picks up Ctrl-C exception and the dangling thread is killed. Fixes #20292
* Remove NoGhcTc usage from HsMatchContextArtyom Kuznetsov2021-09-2813-39/+70
| | | | | | NoGhcTc is removed from HsMatchContext. As a result of this, HsMatchContext GhcTc is now a valid type that has Id in it, instead of Name and tcMatchesFun now takes Id instead of Name.
* gitlab-ci: Ensure that temporary home existsBen Gamari2021-09-271-1/+2
|
* gitlab-ci: Don't rely on $HOME when pushing test metricsBen Gamari2021-09-242-9/+14
| | | | | As of cbfc0e933660626c9f4eaf5480076b6fcd31dceb we set $HOME to a non-existent directory to ensure hermeticity.
* Constant-folding for timesInt2# (#20374)Sylvain Henry2021-09-234-0/+94
|
* testsuite: Ensure that C++11 is used in T20199Ben Gamari2021-09-231-1/+1
| | | | Otherwise we are dependent upon the C++ compiler's default language.
* gitlab-ci: Unset MACOSX_DEPLOYMENT_TARGET in stage0 buildBen Gamari2021-09-231-0/+3
| | | | | Otherwise we may get warnings from the toolchain if the bootstrap compiler was built with a different deployment target.
* rts: Ensure that headers don't refer to undefined __STDC_VERSION__Ben Gamari2021-09-231-0/+6
| | | | | | | | | Previously the C/C++ language version check in STG could throw an undefined macro warning due to __STDC_VERSION__ when compiled with a C++ compiler. Fix this by defining __STDC_VERSION__==0 when compiling with a C++ compiler. Fixes #20394.
* testsuite: Fix gnu sed-ismBen Gamari2021-09-232-2/+2
| | | | | The BSD sed implementation doesn't allow `sed -i COMMAND FILE`; one must rather use `sed -i -e COMMAND FILE`.
* testsuite: Don't use cc directly in section_alignment testBen Gamari2021-09-231-1/+1
|
* testsuite: Make unsigned_reloc_macho_x64 and section_alignment makefile_testsBen Gamari2021-09-231-2/+2
|
* testsuite: Fix ipeMapBen Gamari2021-09-231-0/+1
| | | | ipeMap.c failed to #include <string.h>
* testsuite: Pass CFLAGS to hsc2hs testsBen Gamari2021-09-234-10/+19
|