path: root/compiler
Commit message | Author | Age | Files | Lines
...
* Don't use FastString for UTF-8 encoding only | Sylvain Henry | 2021-10-02 | 3 | -4/+14
* Use eqType, not tcEqType, in metavar kind check | Richard Eisenberg | 2021-10-02 | 5 | -41/+35
    Close #20356.

    See addendum to Note [coreView vs tcView] in GHC.Core.Type for the details.

    Also killed old Note about metaTyVarUpdateOK, which has been gone for some time.

    test case: typecheck/should_fail/T20356
* CmmToLlvm: Sign/Zero extend parameters for foreign calls on RISC-V | Andreas Schwab | 2021-10-02 | 1 | -0/+1
    Like S390 and PPC64, RISC-V requires parameters for foreign calls to be extended to full words.
* code gen: Improve efficiency of findPrefRealReg | Matthew Pickering | 2021-10-01 | 4 | -19/+54
    Old strategy: For each variable linearly scan through all the blocks and check to see if the variable is in any of the block register mappings. This is very slow when you have a lot of blocks.

    New strategy: Maintain a map from virtual registers to the first real register the virtual register was assigned to. Consult this map in findPrefRealReg. The map is updated when the register mapping is updated and is hidden behind the BlockAssigment abstraction.

    On the mmark package this reduces compilation time from about 44s to 32s.

    Ticket: #19471
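The two lookup strategies in the findPrefRealReg entry can be contrasted with a minimal sketch. This is not GHC's code: `VirtReg`, `RealReg`, `PrefMap`, `recordBlock` and both `findPref*` functions are hypothetical names, and registers are modelled as plain `Int`s.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical stand-ins for the allocator's register types.
type VirtReg = Int
type RealReg = Int

-- One block's assignment of virtual registers to real registers.
type BlockAssignment = M.Map VirtReg RealReg

-- Old strategy: scan the blocks in order until the variable turns up.
findPrefSlow :: [BlockAssignment] -> VirtReg -> Maybe RealReg
findPrefSlow blocks v =
  case [r | m <- blocks, Just r <- [M.lookup v m]] of
    (r : _) -> Just r
    []      -> Nothing

-- New strategy: a side map remembering the first real register
-- each virtual register was assigned to.
newtype PrefMap = PrefMap (M.Map VirtReg RealReg)

emptyPrefMap :: PrefMap
emptyPrefMap = PrefMap M.empty

-- Record a block's mapping. M.union is left-biased, so entries that
-- are already present win and the first assignment is kept.
recordBlock :: BlockAssignment -> PrefMap -> PrefMap
recordBlock m (PrefMap p) = PrefMap (M.union p m)

-- A single map lookup replaces the scan over all blocks.
findPrefFast :: PrefMap -> VirtReg -> Maybe RealReg
findPrefFast (PrefMap p) v = M.lookup v p
```

Keeping the side map behind a small interface (here the `PrefMap` newtype) mirrors the entry's point that the map is updated whenever the register mapping is updated.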
* Convert Diagnostics GHC.Tc.Gen.* (Part 3) | Aaron Allen | 2021-10-01 | 3 | -107/+266
    Converts all diagnostics in the `GHC.Tc.Gen.Expr` module. (#20116)
* NCG: Linear-reg-alloc: A few small implementation tweaks. | Andreas Klebinger | 2021-09-30 | 3 | -25/+40
    Removed an intermediate list via a fold.

    realRegsAlias: Manually inlined the list functions to get better code.

    Linear.hs added a bang somewhere.
* Recompilation: Handle -plugin-package correctly | Matthew Pickering | 2021-09-30 | 1 | -5/+9
    If a plugin was specified using the -plugin-package-(id) flag then the module it applied to was always recompiled.

    The recompilation checker was previously using `findImportedModule`, which looked for packages in the HPT and then in the package database but only for modules specified using `-package`.

    The correct lookup function for plugins is `findPluginModule`, therefore we check normal imports with `findImportedModule` and plugins with `findPluginModule`.

    Fixes #20417
* driver: Fix -E -XCPP, copy output from CPP output rather than .hs output | Matthew Pickering | 2021-09-30 | 1 | -2/+2
    Fixes #20416

    I thought about adding a test for this case but I struggled to think of something robust. Grepping -v3 will include different paths on different systems and the structure of the result file depends on which preprocessor you are using.
* Rules for sized conversion primops (#19769) | Sylvain Henry | 2021-09-30 | 1 | -27/+11
    Metric Decrease:
        T12545
* Trees That Grow refactor for HsTick and HsBinTick | Andrea Condoluci | 2021-09-30 | 11 | -82/+74
    Move HsTick and HsBinTick to XExpr, the extension tree of HsExpr.

    Part of #16830.
* Nested CPR light unleashed (#18174) | Sebastian Graf | 2021-09-30 | 7 | -248/+742
    This patch enables worker/wrapper for nested constructed products, as described in `Note [Nested CPR]`. The machinery for expressing Nested CPR was already there, since !5054. Worker/wrapper is equipped to exploit Nested CPR annotations since !5338. CPR analysis already handles applications in batches since !5753. This patch just needs to flip a few more switches:

    1. In `cprTransformDataConWork`, we need to look at the field expressions and their `CprType`s to see whether the evaluation of the expressions terminates quickly (= is in HNF) or if they are put in strict fields. If that is the case, then we retain their CPR info and may unbox nestedly later on. More details in `Note [Nested CPR]`.
    2. Enable nested `ConCPR` signatures in `GHC.Types.Cpr`.
    3. In the `asConCpr` call in `GHC.Core.Opt.WorkWrap.Utils`, pass CPR info of fields to the `Unbox`.
    4. Instead of giving CPR signatures to DataCon workers and wrappers, we now have `cprTransformDataConWork` for workers and treat wrappers by analysing their unfolding. As a result, the code from GHC.Types.Id.Make went away completely.
    5. I deactivated worker/wrappering for recursive DataCons and wrote a function `isRecDataCon` to detect them. We really don't want to give `repeat` or `replicate` the Nested CPR property. See Note [CPR for recursive data structures] for which kind of recursive DataCons we target.
    6. Fix a couple of tests and their outputs.

    I also documented that CPR can destroy sharing and lead to asymptotic increase in allocations (which is tracked by #13331/#19326) in `Note [CPR for data structures can destroy sharing]`.

    Nofib results:
    ```
    --------------------------------------------------------------------------------
            Program         Allocs    Instrs
    --------------------------------------------------------------------------------
       ben-raytrace          -3.1%     -0.4%
       binary-trees          +0.8%     -2.9%
       digits-of-e2          +5.8%     +1.2%
              event          +0.8%     -2.1%
     fannkuch-redux          +0.0%     -1.4%
               fish           0.0%     -1.5%
             gamteb          -1.4%     -0.3%
            mkhprog          +1.4%     +0.8%
         multiplier          +0.0%     -1.9%
                pic          -0.6%     -0.1%
            reptile         -20.9%    -17.8%
          wave4main          +4.8%     +0.4%
               x2n1        -100.0%     -7.6%
    --------------------------------------------------------------------------------
                Min         -95.0%    -17.8%
                Max          +5.8%     +1.2%
     Geometric Mean          -2.9%     -0.4%
    ```

    The huge wins in x2n1 (loopy list) and reptile (see #19970) are due to refraining from unboxing (:). Other benchmarks like digits-of-e2 or wave4main regress because of that. Ultimately there are no great improvements due to Nested CPR alone, but at least it's a win. Binary sizes decrease by 0.6%.

    There are a significant number of metric decreases. The most notable ones (>1%):
    ```
    ManyAlternatives(normal)      ghc/alloc  771656002.7    762187472.0    -1.2%
    ManyConstructors(normal)      ghc/alloc  4191073418.7   4114369216.0   -1.8%
    MultiLayerModules(normal)     ghc/alloc  3095678333.3   3128720704.0   +1.1%
    PmSeriesG(normal)             ghc/alloc  50096429.3     51495664.0     +2.8%
    PmSeriesS(normal)             ghc/alloc  63512989.3     64681600.0     +1.8%
    PmSeriesV(normal)             ghc/alloc  62575424.0     63767208.0     +1.9%
    T10547(normal)                ghc/alloc  29347469.3     29944240.0     +2.0%
    T11303b(normal)               ghc/alloc  46018752.0     47367576.0     +2.9%
    T12150(optasm)                ghc/alloc  81660890.7     82547696.0     +1.1%
    T12234(optasm)                ghc/alloc  59451253.3     60357952.0     +1.5%
    T12545(normal)                ghc/alloc  1705216250.7   1751278952.0   +2.7%
    T12707(normal)                ghc/alloc  981000472.0    968489800.0    -1.3% GOOD
    T13056(optasm)                ghc/alloc  389322664.0    372495160.0    -4.3% GOOD
    T13253(normal)                ghc/alloc  337174229.3    341954576.0    +1.4%
    T13701(normal)                ghc/alloc  2381455173.3   2439790328.0   +2.4% BAD
    T14052(ghci)                  ghc/alloc  2162530642.7   2139108784.0   -1.1%
    T14683(normal)                ghc/alloc  3049744728.0   2977535064.0   -2.4% GOOD
    T14697(normal)                ghc/alloc  362980213.3    369304512.0    +1.7%
    T15164(normal)                ghc/alloc  1323102752.0   1307480600.0   -1.2%
    T15304(normal)                ghc/alloc  1304607429.3   1291024568.0   -1.0%
    T16190(normal)                ghc/alloc  281450410.7    284878048.0    +1.2%
    T16577(normal)                ghc/alloc  7984960789.3   7811668768.0   -2.2% GOOD
    T17516(normal)                ghc/alloc  1171051192.0   1153649664.0   -1.5%
    T17836(normal)                ghc/alloc  1115569746.7   1098197592.0   -1.6%
    T17836b(normal)               ghc/alloc  54322597.3     55518216.0     +2.2%
    T17977(normal)                ghc/alloc  47071754.7     48403408.0     +2.8%
    T17977b(normal)               ghc/alloc  42579133.3     43977392.0     +3.3%
    T18923(normal)                ghc/alloc  71764237.3     72566240.0     +1.1%
    T1969(normal)                 ghc/alloc  784821002.7    773971776.0    -1.4% GOOD
    T3294(normal)                 ghc/alloc  1634913973.3   1614323584.0   -1.3% GOOD
    T4801(normal)                 ghc/alloc  295619648.0    292776440.0    -1.0%
    T5321FD(normal)               ghc/alloc  278827858.7    276067280.0    -1.0%
    T5631(normal)                 ghc/alloc  586618202.7    577579960.0    -1.5%
    T5642(normal)                 ghc/alloc  494923048.0    487927208.0    -1.4%
    T5837(normal)                 ghc/alloc  37758061.3     39261608.0     +4.0%
    T9020(optasm)                 ghc/alloc  257362077.3    254672416.0    -1.0%
    T9198(normal)                 ghc/alloc  49313365.3     50603936.0     +2.6% BAD
    T9233(normal)                 ghc/alloc  704944258.7    685692712.0    -2.7% GOOD
    T9630(normal)                 ghc/alloc  1476621560.0   1455192784.0   -1.5%
    T9675(optasm)                 ghc/alloc  443183173.3    433859696.0    -2.1% GOOD
    T9872a(normal)                ghc/alloc  1720926653.3   1693190072.0   -1.6% GOOD
    T9872b(normal)                ghc/alloc  2185618061.3   2162277568.0   -1.1% GOOD
    T9872c(normal)                ghc/alloc  1765842405.3   1733618088.0   -1.8% GOOD
    TcPlugin_RewritePerf(normal)  ghc/alloc  2388882730.7   2365504696.0   -1.0%
    WWRec(normal)                 ghc/alloc  607073186.7    597512216.0    -1.6%
    T9203(normal)                 run/alloc  107284064.0    102881832.0    -4.1%
    haddock.Cabal(normal)         run/alloc  24025329589.3  23768382560.0  -1.1%
    haddock.base(normal)          run/alloc  25660521653.3  25370321824.0  -1.1%
    haddock.compiler(normal)      run/alloc  74064171706.7  73358712280.0  -1.0%
    ```

    The biggest exception to the rule is T13701 which seems to fluctuate as usual (not unlike T12545). T14697 has a similar quality, being a generated multi-module test. T5837 is small enough that it similarly doesn't measure anything significant besides module loading overhead. T13253 simply does one additional round of Simplification due to Nested CPR.

    There are also some apparent regressions in T9198, T12234 and PmSeriesG that we (@mpickering and I) were simply unable to reproduce locally. @mpickering tried to run the CI script in a local Docker container and actually found that T9198 and PmSeriesG *improved*. In MRs that were rebased on top of this one, like !4229, I did not experience such increases. Let's not get hung up on these regression tests, they were meant to test for asymptotic regressions.

    The build-cabal test improves by 1.2% in -O0.

    Metric Increase:
        T10421 T12234 T12545 T13035 T13056 T13701 T14697 T18923 T5837 T9198
    Metric Decrease:
        ManyConstructors T12545 T12707 T13056 T14683 T16577 T18223 T1969 T3294 T9203 T9233 T9675 T9872a T9872b T9872c T9961 TcPlugin_RewritePerf
* compiler: occEnvElts -> nonDetOccEnvElts | Ben Gamari | 2021-09-29 | 6 | -10/+10
* compiler: Use seqEltsNameEnv rather than nameEnvElts | Ben Gamari | 2021-09-29 | 2 | -1/+4
* compiler: Rename nameEnvElts -> nonDetNameEnvElts | Ben Gamari | 2021-09-29 | 10 | -13/+13
* compiler: Make nubAvails deterministic | Ben Gamari | 2021-09-29 | 2 | -5/+12
    Surprisingly this previously didn't appear to introduce any visible non-determinism but it seems worth avoiding non-determinism here.
* compiler: Fix name of GHC.Core.TyCon.Env.nameEnvElts | Ben Gamari | 2021-09-29 | 1 | -3/+3
    Rename to nonDetTyConEnvElts.
* compiler: Rewrite all eltsUFM occurrences to nonDetEltsUFM | Ben Gamari | 2021-09-29 | 8 | -11/+8
    And remove the former.
* compiler: Reimplement seqEltsUFM in terms of fold | Ben Gamari | 2021-09-29 | 2 | -3/+3
    Rather than nonDetEltsUFM; this should eliminate some unnecessary list allocations.
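The idea behind the seqEltsUFM change (forcing every element with a fold instead of first building a list) can be sketched on an ordinary `IntMap`. This is an illustration only; `seqEltsViaList` and `seqEltsViaFold` are hypothetical names and `IntMap` merely stands in for GHC's `UniqFM`.

```haskell
import qualified Data.IntMap.Strict as IM

-- Force every element by first materialising the list of elements.
seqEltsViaList :: (a -> ()) -> IM.IntMap a -> ()
seqEltsViaList seqElt m = go (IM.elems m)
  where
    go []       = ()
    go (x : xs) = seqElt x `seq` go xs

-- Force every element with a strict fold over the map itself,
-- so no intermediate list of elements is needed.
seqEltsViaFold :: (a -> ()) -> IM.IntMap a -> ()
seqEltsViaFold seqElt = IM.foldl' (\acc x -> acc `seq` seqElt x) ()
```

Both functions visit the same elements; the second simply walks the tree directly.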
* GHC: Drop dead packageDbModules | Ben Gamari | 2021-09-29 | 1 | -24/+0
    It was already commented out and contained a reference to the non-deterministic nameEnvElts so let's just drop it.
* TH stage restriction check for constructors, selectors, and class methods | Andrea Condoluci | 2021-09-29 | 8 | -31/+68
    Closes ticket #17820.
* Document that `eqType`/`coreView` do not look through type families | Ziyang Liu | 2021-09-29 | 1 | -2/+4
    This isn't clear from the existing doc.
* Rectifying COMMENT and `mkComment` across platforms to work with SDoc | Benjamin Maurer | 2021-09-29 | 12 | -25/+27
    and exhibit similar behaviors. Issue 20400
* Compare FunTys as if they were TyConApps. | Richard Eisenberg | 2021-09-29 | 10 | -72/+195
    See Note [Equality on FunTys] in TyCoRep.

    Close #17675.
    Close #17655, about documentation improvements included in this patch.
    Close #19677, about a further mistake around FunTy.

    test cases: typecheck/should_compile/T19677
* Add `-dsuppress-core-sizes` flag (#20342) | Sylvain Henry | 2021-09-28 | 4 | -8/+14
    This flag is used to remove the output of core stats per binding in Core dumps.
* driver: Fix Ctrl-C handling with -j1 | Matthew Pickering | 2021-09-28 | 1 | -27/+35
    Even in -j1 we now fork all the work into its own thread so that Ctrl-C exceptions are thrown on the main thread, which is blocked waiting for the work thread to finish. The default exception handler then picks up the Ctrl-C exception and the dangling thread is killed.

    Fixes #20292
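The pattern described in the Ctrl-C entry, running the work in a forked thread while the calling thread blocks on its result, can be sketched with `base` alone. This is not the driver's code: `runInWorker` is a hypothetical helper, and here the worker is killed explicitly by the waiting thread, which is an assumption rather than what the patch does.

```haskell
import Control.Concurrent (forkIO, killThread)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, mask, onException, throwIO, try)

-- Run an action in a freshly forked thread while the calling thread
-- blocks waiting for the result. An asynchronous exception (such as
-- the one raised by Ctrl-C) then arrives in the waiting thread.
runInWorker :: IO a -> IO a
runInWorker work = do
  resultVar <- newEmptyMVar
  mask $ \restore -> do
    -- The worker stores either its result or the exception it died with.
    tid <- forkIO (try (restore work) >>= putMVar resultVar)
    -- If the wait is interrupted, kill the worker before re-raising.
    outcome <- restore (takeMVar resultVar) `onException` killThread tid
    case outcome of
      Left e  -> throwIO (e :: SomeException)
      Right x -> pure x
```

Because `takeMVar` is an interruptible blocking operation, the waiting thread can receive the interrupt promptly even while the worker is busy.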
* Remove NoGhcTc usage from HsMatchContext | Artyom Kuznetsov | 2021-09-28 | 12 | -38/+69
    NoGhcTc is removed from HsMatchContext. As a result of this, HsMatchContext GhcTc is now a valid type that has Id in it, instead of Name and tcMatchesFun now takes Id instead of Name.
* Constant-folding for timesInt2# (#20374) | Sylvain Henry | 2021-09-23 | 1 | -0/+33
* Use Info Table Provenances to decode cloned stack (#18163) | Sven Tennie | 2021-09-23 | 14 | -81/+356
    Emit an Info Table Provenance Entry (IPE) for every stack represented info table if -finfo-table-map is turned on.

    To decode a cloned stack, lookupIPE() is used. It provides a mapping between info tables and their source location.

    Please see these notes for details:
    - [Stacktraces from Info Table Provenance Entries (IPE based stack unwinding)]
    - [Mapping Info Tables to Source Positions]

    Metric Increase:
        T12545
* Introduce stack snapshotting / cloning (#18741) | Sven Tennie | 2021-09-23 | 4 | -11/+46
    Add `StackSnapshot#` primitive type that represents a cloned stack (StgStack). The cloning interface consists of two functions that clone either the thread's own stack (cloneMyStack) or another thread's stack (cloneThreadStack).

    The stack snapshot is offline/cold, i.e. it isn't evaluated any further. This is useful for analyses as it prevents concurrent modifications.

    For technical details, please see Note [Stack Cloning].

    Co-authored-by: Ben Gamari <bgamari.foss@gmail.com>
    Co-authored-by: Matthew Pickering <matthewtpickering@gmail.com>
* Typo [skip ci] (wip/typo-cg) | Matthew Pickering | 2021-09-23 | 1 | -1/+1
* Remove unused, undocumented debug/dump flag `-ddump-vt-trace`. See 20403. | Benjamin Maurer | 2021-09-22 | 2 | -4/+0
* Link with libm dynamically (#19877) | Sylvain Henry | 2021-09-22 | 4 | -10/+16
    The compiler should be independent of the target.
* Convert Diagnostics in GHC.Tc.Gen.* (Part 2) | Aaron Allen | 2021-09-22 | 4 | -114/+330
    Converts diagnostics in: (#20116)
    - GHC.Tc.Gen.Default
    - GHC.Tc.Gen.Export
* deriving: Always use module prefix in dataTypeName | Matthew Pickering | 2021-09-18 | 1 | -1/+6
    This fixes a long-standing bug where the module prefix was omitted from the data type name supplied by Data.Typeable instances.

    Instead of reusing the Outputable instance for TyCon, we now take matters into our own hands and explicitly print the module followed by the type constructor name.

    Fixes #20371
* CoreUtils: Make exprIsHNF return True for unlifted variables (#20140) | Sebastian Graf | 2021-09-18 | 1 | -0/+2
    Clearly, evaluating an unlifted variable will never perform any work.

    Fixes #20140.
* WorkWrap: Update Note [Wrapper activation] (#15056) | Sebastian Graf | 2021-09-18 | 1 | -23/+28
    The last point of the Conclusion was wrong; we inline functions without pragmas after the initial phase. It also appears that #15056 was fixed, as there already is a test T15056 which properly does foldr/build fusion for the reproducer. I made sure that T15056's `foo` is just large enough for WW to happen (which it wasn't), but for the worker to be small enough to inline into `blam`.

    Fixes #15056.
* Constant folding for ctz/clz/popCnt (#20376) | Sylvain Henry | 2021-09-17 | 1 | -0/+45
* Use an ADT for RecompReason | Sylvain Henry | 2021-09-17 | 3 | -54/+113
* Refactor module dependencies code | Sylvain Henry | 2021-09-17 | 11 | -192/+212
    * moved deps related code into GHC.Unit.Module.Deps
    * refactored Deps module to not export Dependencies constructor to help maintaining invariants
* Fix annoying warning about Data.List unqualified import | Sylvain Henry | 2021-09-17 | 1 | -1/+1
* Code Gen: Rewrite shortcutWeightMap more efficiently | Matthew Pickering | 2021-09-17 | 1 | -33/+53
    This function was one of the main sources of allocation in a ticky profile due to how it repeatedly deleted nodes from a large map.

    Now firstly the cuts are normalised, so that chains of cuts are eliminated before any rewrites are applied. Then the CFG is traversed and reconstructed once whilst applying the necessary rewrites to remove shortcutted edges (based on the normalised cuts).

    Ticket: #19471
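The two phases in the shortcutWeightMap entry (normalise the cuts, then rebuild the graph in one pass) can be illustrated on a toy edge list. This is a sketch under simplifying assumptions, not the real function: blocks are `Int`s, the CFG is a list of unweighted edges, the cuts are assumed to be acyclic, and `Cuts`, `resolve`, `normaliseCuts` and `applyCuts` are hypothetical names.

```haskell
import qualified Data.Map.Strict as M

type BlockId = Int

-- A cut "from -> to" means the block `from` is shortcut away and
-- anything that jumped to it should jump to `to` instead.
type Cuts = M.Map BlockId BlockId

-- Follow a chain of cuts to its final target (assumes no cycles).
resolve :: Cuts -> BlockId -> BlockId
resolve cuts b = maybe b (resolve cuts) (M.lookup b cuts)

-- Normalise so that every cut points directly at its final target.
normaliseCuts :: Cuts -> Cuts
normaliseCuts cuts = M.map (resolve cuts) cuts

-- Rebuild the edge list in a single pass: drop edges that leave a
-- shortcut block and redirect edges that pointed at one.
applyCuts :: Cuts -> [(BlockId, BlockId)] -> [(BlockId, BlockId)]
applyCuts cuts edges =
  [ (from, M.findWithDefault to to norm)
  | (from, to) <- edges
  , from `M.notMember` norm
  ]
  where
    norm = normaliseCuts cuts
```

With the cuts normalised up front, each edge needs one lookup, and no node is ever deleted from a large map.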
* Code Gen: Use more efficient block merging algorithm | Matthew Pickering | 2021-09-17 | 3 | -27/+109
    The previous algorithm scaled poorly when there was a large number of blocks and edges.

    The algorithm links together block chains which have edges between them in the CFG. The new algorithm uses a union find data structure in order to efficiently merge together blocks and calculate which block chain each block id belongs to.

    I copied the UnionFind data structure which already existed in Cabal into the GHC library rather than reimplement it myself.

    This change results in a very significant reduction in allocations when compiling the mmark package.

    Ticket: #19471
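For readers unfamiliar with union find, a minimal pure sketch follows. It is not the structure copied from Cabal: it omits the path compression and union-by-rank that give union find its near-constant amortised cost, and `UnionFind`, `findRep`, `unionBlocks` and `mergeChains` are hypothetical names.

```haskell
import qualified Data.Map.Strict as M

type BlockId = Int

-- Parent pointers; a block absent from the map is its own representative.
newtype UnionFind = UnionFind (M.Map BlockId BlockId)

emptyUF :: UnionFind
emptyUF = UnionFind M.empty

-- Find the representative of the chain a block belongs to.
findRep :: UnionFind -> BlockId -> BlockId
findRep uf@(UnionFind parents) b =
  case M.lookup b parents of
    Just p | p /= b -> findRep uf p
    _               -> b

-- Merge the chains containing the two blocks.
unionBlocks :: BlockId -> BlockId -> UnionFind -> UnionFind
unionBlocks a b uf@(UnionFind parents)
  | ra == rb  = uf
  | otherwise = UnionFind (M.insert ra rb parents)
  where
    ra = findRep uf a
    rb = findRep uf b

-- Link every pair of blocks that is joined by an edge.
mergeChains :: [(BlockId, BlockId)] -> UnionFind
mergeChains = foldr (uncurry unionBlocks) emptyUF
```

Two blocks are in the same chain exactly when `findRep` returns the same representative for both.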
* Code Gen: Optimise successors calculation in loop calculation | Matthew Pickering | 2021-09-17 | 1 | -5/+4
    Before this change, the whole map would be traversed in order to delete a node from the graph before calculating successors. This is quite inefficient if the CFG is big, as was the case in the mmark package. A more efficient alternative is to leave the CFG untouched and then just delete the node once after the lookups have been performed.

    Ticket: #19471
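The difference between the two approaches in the successors entry can be shown on a toy graph. This is a sketch, not the code from the patch: the CFG is modelled as a map from a block to its successor list, and `successorsSlow` and `successorsFast` are hypothetical names.

```haskell
import qualified Data.Map.Strict as M

type BlockId = Int
type CFG = M.Map BlockId [BlockId]

-- Before: remove the node from the whole graph (touching every
-- entry), then look up the successors of one block.
successorsSlow :: BlockId -> CFG -> BlockId -> [BlockId]
successorsSlow removed cfg b =
  M.findWithDefault [] b (M.map (filter (/= removed)) (M.delete removed cfg))

-- After: look up in the untouched graph, then drop the removed node
-- from that single result.
successorsFast :: BlockId -> CFG -> BlockId -> [BlockId]
successorsFast removed cfg b
  | b == removed = []
  | otherwise    = filter (/= removed) (M.findWithDefault [] b cfg)
```

Both return the same successors; the second does work proportional to one adjacency list instead of the whole map.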
* Code Gen: Replace another lazy fmap with strict mapMap | Matthew Pickering | 2021-09-17 | 1 | -1/+1
* Code Gen: Use strict map rather than lazy map in loop analysis | Matthew Pickering | 2021-09-17 | 1 | -1/+3
    We were ending up with a big 1GB thunk spike as the `fmap` operation did not force the key values promptly.

    This fixes the high maximum memory consumption when compiling the mmark package. Compilation is still slow and allocates a lot more than previous releases.

    Related to #19471
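The lazy versus strict mapping behaviour behind the thunk spike can be seen with the `containers` API. This is an illustration rather than the code touched by the patch (which uses GHC's own `mapMap`); `lazyBump` and `strictBump` are hypothetical names.

```haskell
import qualified Data.Map.Lazy as ML
import qualified Data.Map.Strict as MS

-- Lazy: each new value is stored as an unevaluated thunk that is
-- retained until something demands it.
lazyBump :: ML.Map Int Int -> ML.Map Int Int
lazyBump = fmap (+ 1)

-- Strict: each new value is evaluated to weak head normal form as
-- the new map is built, so no thunks pile up.
strictBump :: MS.Map Int Int -> MS.Map Int Int
strictBump = MS.map (+ 1)
```

The results are equal once fully evaluated; only when the work happens, and how much memory the pending thunks hold in the meantime, differs.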
* compiler: Ensure that all CoreTodos have SCCs | Ben Gamari | 2021-09-17 | 2 | -3/+5
    In #20365 we noticed that a significant amount of time is spent in the Core2Core cost-center, suggesting that some passes are likely missing SCC pragmas. Try to fix this.
* driver: Clean up temporary files after a module has been compiled | Matthew Pickering | 2021-09-17 | 1 | -2/+8
    The refactoring accidentally removed these calls to eagerly remove temporary files after a module has been compiled. This caused some issues with tmpdirs getting filled up on my system when the project had a large number of modules (for example, Agda).

    Fixes #20293
* Stop leaking <defunct> llc processes | Matthew Pickering | 2021-09-17 | 1 | -1/+2
    We needed to wait for the process to exit in the clean-up script as otherwise the `llc` process will not be killed until compilation finishes. This leads to running out of process spaces on some OSs.

    Thanks to Edsko de Vries for suggesting this fix.

    Fixes #20305
* Ensure .dyn_hi doesn't overwrite .hi | Ziyang Liu | 2021-09-17 | 3 | -9/+16
    This commit fixes the following bug: when `outputHi` is set, and both `.dyn_hi` and `.hi` are needed, both would be written to `outputHi`, causing `.dyn_hi` to overwrite `.hi`. This causes subsequent `readIface` to fail - "mismatched interface file profile tag (wanted "", got "dyn")" - triggering unnecessary rebuild.
* driver: -M allow omitting the -dep-suffix (means empty) (fix #15483) | Artem Pelenitsyn | 2021-09-17 | 1 | -4/+6