path: root/compiler/cmm
Commit message | Author | Age | Files | Lines
...
* Correctly add unwinding info in manifestSp and makeFixupBlocks | Bartosz Nitka | 2018-05-03 | 2 | -37/+116
  In `manifestSp` the unwind info was before the relevant instruction, not after. I added some notes to establish semantics. Also removes redundant annotation in stg_catch_frame.

  For `makeFixupBlocks` it looks like we were off by `wORD_SIZE dflags`. I'm not sure why, but it lines up with `manifestSp`. In fact it lines up so well that I can consolidate the Sp unwind logic in `maybeAddUnwind`.

  I detected the problems with `makeFixupBlocks` by running T14779b after patching D4559.

  Test Plan: added a new test
  Reviewers: bgamari, scpmw, simonmar, erikd
  Reviewed By: bgamari
  Subscribers: thomie, carter
  GHC Trac Issues: #14999
  Differential Revision: https://phabricator.haskell.org/D4606
* Use newtype deriving for Hoopl code | U-Maokai\andi | 2018-04-13 | 2 | -0/+4
  Hoopl.Collections/.Label has newtype containers which derive Functor/Traversable. Enabling GeneralizedNewtypeDeriving improves allocation count and compile time when building these files for GHC.

  ```
  Vanilla, O1
  <<ghc: 378555664 bytes, 76 GCs, 8663140/18535016 avg/max bytes residency (5 samples), 63M in use, 0.000 INIT (0.000 elapsed), 0.219 MUT (0.354 elapsed), 0.141 GC (0.138 elapsed) :ghc>>
  GeneralizedNewtypeDeriving, O1
  <<ghc: 301026536 bytes, 78 GCs, 8392886/17181088 avg/max bytes residency (5 samples), 63M in use, 0.000 INIT (0.000 elapsed), 0.156 MUT (0.230 elapsed), 0.094 GC (0.106 elapsed) :ghc>>
  ```

  Test Plan: ci
  Reviewers: bgamari, simonmar, RyanGlScott
  Reviewed By: RyanGlScott
  Subscribers: mpickering, RyanGlScott, thomie, carter
  Differential Revision: https://phabricator.haskell.org/D4583
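As a rough illustration of what newtype deriving buys here, the sketch below wraps an `IntMap` in a newtype and derives `Functor` with `GeneralizedNewtypeDeriving` enabled. The name `LabelMap` and the helper `incrementAll` are hypothetical stand-ins, not the actual Hoopl definitions.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import qualified Data.IntMap.Strict as IntMap

-- Hypothetical stand-in for a Hoopl-style container: a newtype over IntMap.
-- With GeneralizedNewtypeDeriving on, Functor is taken from IntMap by
-- coercion instead of being generated as fresh wrapping/unwrapping code.
newtype LabelMap v = LabelMap (IntMap.IntMap v)
  deriving (Eq, Show, Functor)

-- fmap on LabelMap goes through IntMap's own Functor instance.
incrementAll :: LabelMap Int -> LabelMap Int
incrementAll = fmap (+ 1)
```

For example, `incrementAll (LabelMap (IntMap.fromList [(1, 10)]))` yields a map that sends key 1 to 11.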
* Revert "CmmPipeline: add a second pass of CmmCommonBlockElim" | Michal Terepeta | 2018-04-13 | 2 | -43/+7
  This reverts commit d5c4d46a62ce6a0cfa6440344f707136eff18119. Please see #14989 for details.

  Test Plan: ./validate
  Reviewers: bgamari, simonmar
  Subscribers: thomie, carter
  GHC Trac Issues: #14989
  Differential Revision: https://phabricator.haskell.org/D4577
* CmmPipeline: add a second pass of CmmCommonBlockElim | Michal Terepeta | 2018-03-27 | 2 | -7/+43
  The sinking pass often gets rid of unnecessary registers/assignments, exposing more opportunities for CBE, so this commit adds a second round of CBE after the sinking pass and should fix #12915 (and some examples in #14226).

  Nofib results:
  * Binary size: 0.9% reduction on average
  * Compile allocations: 0.7% increase on average
  * Runtime: noisy, two separate runs of nofib showed a tiny reduction on average (~0.2-0.3%), but I think this is mostly noise
  * Compile time: very noisy, but generally within +/- 0.5% (one run faster, one slower)

  One interesting part of this change is that running CBE invalidates results of proc-point analysis. But instead of re-doing the whole analysis, we can use the map that CBE creates for replacing/comparing block labels (maps a redundant label to a useful one) to update the results of proc-point analysis. This lowers the overhead compared to the previous experiment in #12915.

  Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>

  Test Plan: ./validate
  Reviewers: bgamari, simonmar
  Reviewed By: simonmar
  Subscribers: rwbarton, thomie, carter
  GHC Trac Issues: #12915, #14226
  Differential Revision: https://phabricator.haskell.org/D4417
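The idea of reusing the CBE label map can be sketched as follows. This is a simplification under stated assumptions: labels are plain `Int`s, the proc-point result is modelled as a bare set of labels, and `updateProcPoints` is a hypothetical name rather than the function used in the commit.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical stand-in for a block label.
type Label = Int

-- Rewrite a set of proc points through the substitution produced by CBE,
-- which maps each redundant label to the label that replaced it.
-- Labels that CBE left alone are kept unchanged.
updateProcPoints :: Map.Map Label Label -> Set.Set Label -> Set.Set Label
updateProcPoints subst = Set.map rename
  where
    rename l = Map.findWithDefault l l subst
```

If block 3 was merged into block 1, then `updateProcPoints (Map.fromList [(3, 1)])` turns the proc-point set `{1, 3, 5}` into `{1, 5}` without rerunning the analysis.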
* CmmUtils: get rid of insertBlock | Michal Terepeta | 2018-03-19 | 5 | -24/+19
  `Hoopl.Graph` has almost exactly the same function, so let's use that. Also, use `IntMap.alter` to make it more efficient.

  Also switch `Hoopl` to use strict maps.

  Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>

  Test Plan: ./validate
  Reviewers: bgamari, simonmar
  Reviewed By: bgamari
  Subscribers: dfeuer, rwbarton, thomie, carter
  Differential Revision: https://phabricator.haskell.org/D4493
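A minimal sketch of why `IntMap.alter` helps: the presence check and the insertion happen in a single traversal of the map. The function name `insertBlockOnce` and the use of `Int` labels are assumptions for illustration, not the code in `Hoopl.Graph`.

```haskell
import qualified Data.IntMap.Strict as IntMap

-- Insert a block under its label in one traversal of the map.
-- An already-present label is treated as a bug in the caller.
insertBlockOnce :: Int -> b -> IntMap.IntMap b -> IntMap.IntMap b
insertBlockOnce lbl block = IntMap.alter add lbl
  where
    add Nothing  = Just block
    add (Just _) = error ("insertBlockOnce: duplicate label " ++ show lbl)
```

A separate lookup followed by an insert would walk the map twice to get the same result.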
* Hoopl: improve postorder calculation | Michal Terepeta | 2018-03-19 | 9 | -102/+79
  - Fix the naming and comments to indicate that we are calculating *reverse* postorder (and not the standard postorder).
  - Rewrite the calculation to avoid CPS code. I found it fairly difficult to understand and the new one seems faster (according to nofib, decreases compiler allocations by 0.2%)
  - Remove `LabelsPtr`, which seems unnecessary and could be *really* confusing. For instance, previously: `postorder_dfs_from <block with label X>` and `postorder_dfs_from <label X>` would actually mean quite different things (and give different results).
  - Change the `Dataflow` module to always use entry of the graph for reverse postorder calculation. This should be the only change in behavior of this commit. Previously, if the caller provided initial facts for some of the labels, we would use those labels for our postorder calculation. However, I don't think that's correct in general - if the initial facts did not contain the entry of the graph, we would never analyze the blocks reachable from the entry but unreachable from the labels provided with the initial facts. It seems that the only analysis that used this was proc-point analysis, which I think would always include the entry block (so I don't think there's any bug due to this).

  Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>

  Test Plan: ./validate
  Reviewers: bgamari, simonmar
  Reviewed By: simonmar
  Subscribers: rwbarton, thomie, carter
  Differential Revision: https://phabricator.haskell.org/D4464
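For readers unfamiliar with the term, here is a small non-CPS sketch of a reverse postorder traversal starting from the entry label. The graph representation (`IntMap` of successor lists) and the name `revPostorderFrom` are assumptions for illustration; the real Hoopl code works on Cmm blocks.

```haskell
import Data.List (foldl')
import qualified Data.IntMap.Strict as IntMap
import qualified Data.IntSet as IntSet

-- Hypothetical CFG: each label maps to the labels of its successors.
type Graph = IntMap.IntMap [Int]

-- Reverse postorder of the blocks reachable from the entry label.
-- A node is prepended only after all of its successors are finished,
-- so the accumulated list is already the reverse of the postorder.
revPostorderFrom :: Graph -> Int -> [Int]
revPostorderFrom g entry = snd (go entry (IntSet.empty, []))
  where
    go n (seen, acc)
      | n `IntSet.member` seen = (seen, acc)
      | otherwise =
          let succs = IntMap.findWithDefault [] n g
              (seen', acc') = foldl' (flip go) (IntSet.insert n seen, acc) succs
          in (seen', n : acc')
```

Because the traversal always starts at the entry, blocks that are unreachable from it never appear in the result, which matches the behavior change described in the last bullet.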
* Get rid of more CPP in cmm/ and codeGen/ | Michal Terepeta | 2018-03-19 | 4 | -17/+8
  This removes a bunch of unnecessary includes of `HsVersions.h` along with unnecessary CPP (e.g., due to checking for DEBUG which can be achieved by looking at `debugIsOn`)

  Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>

  Test Plan: ./validate
  Reviewers: bgamari, simonmar
  Reviewed By: bgamari
  Subscribers: rwbarton, thomie, carter
  Differential Revision: https://phabricator.haskell.org/D4462
* Be more selective in which conditionals we invert | Simon Marlow | 2018-03-19 | 3 | -31/+42
  Test Plan: validate
  Reviewers: bgamari, AndreasK, erikd
  Reviewed By: AndreasK
  Subscribers: rwbarton, thomie, carter
  Differential Revision: https://phabricator.haskell.org/D4398
* Remove splitEithers, use partitionEithers from base | Ömer Sinan Ağacan | 2018-03-12 | 1 | -2/+3
* Add -fexternal-dynamic-refs | Simon Marlow | 2018-03-08 | 1 | -6/+9
  Summary:
  The `-dynamic` flag does two things:
  * In the code generator, it generates code designed to link against external shared libraries. References outside of the current module go through platform-specific indirection tables (e.g. the GOT on ELF).
  * It enables a "way", which changes which hi files we look for (`Foo.dyn_hi`) and which libraries we link against.

  Some specialised applications want the first of these without the second. (I could go into detail here but it's probably not all that important).

  This diff splits out the code-generation effects of `-dynamic` from the "way" parts of its behaviour, via a new flag `-fexternal-dynamic-refs`.

  Test Plan: validate
  Reviewers: niteria, bgamari, erikd
  Subscribers: rwbarton, thomie, carter
  Differential Revision: https://phabricator.haskell.org/D4477
* cmm/: Avoid using lazy left folds | Michal Terepeta | 2018-03-06 | 6 | -21/+27
  This basically replaces all uses of `foldl` with `foldl'`. I've looked at all the call sites and there doesn't seem to be any reason to prefer the lazy version.

  Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>

  Test Plan: ./validate
  Reviewers: bgamari, simonmar
  Reviewed By: bgamari
  Subscribers: rwbarton, thomie, carter
  Differential Revision: https://phabricator.haskell.org/D4463
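The difference between the two folds can be illustrated with a tiny, generic example (the function `sumSizes` is hypothetical and not taken from the commit):

```haskell
import Data.List (foldl')

-- Strict left fold: the accumulator is forced at every step, so no chain
-- of unevaluated (+) thunks builds up as it can with the lazy foldl.
sumSizes :: [Int] -> Int
sumSizes = foldl' (+) 0
```

With the lazy `foldl`, the same sum can first build a long chain of suspended additions that is only collapsed at the end, unless the optimizer happens to make the accumulator strict.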
* CBE: re-introduce bgamari's fixes | Michal Terepeta | 2018-02-18 | 1 | -3/+8
  During some recent work on CBE we discovered that `zipWith` is used to check for equality, but that doesn't quite work if lists are of different lengths! This was fixed by bgamari, but unfortunately the fix had to be rolled back due to other changes in CBE in 50adbd7c5fe5894d3e6e2a58b353ed07e5f8949d. Since I wanted to have another look at CBE anyway, we agreed that the first thing to do would be to re-introduce the fix.

  Sadly I don't have any actual test case that would exercise this.

  Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>

  Test Plan: ./validate
  Reviewers: bgamari, simonmar
  Reviewed By: bgamari
  Subscribers: rwbarton, thomie, carter
  GHC Trac Issues: #14226
  Differential Revision: https://phabricator.haskell.org/D4387
* Tidy up and consolidate canned CmmReg and CmmGlobals | Simon Marlow | 2018-02-18 | 4 | -11/+38
  Test Plan: validate
  Reviewers: bgamari, erikd
  Reviewed By: bgamari
  Subscribers: rwbarton, thomie, carter
  Differential Revision: https://phabricator.haskell.org/D4380
* cmm: Remove unnecessary HsVersion.h includes | Michal Terepeta | 2018-02-06 | 10 | -32/+3
  Test Plan: ./validate
  Reviewers: goldfire, bgamari, simonmar
  Reviewed By: bgamari
  Subscribers: rwbarton, thomie, carter
  Differential Revision: https://phabricator.haskell.org/D4367
* cmm: Revert more aggressive CBE due to #14226 | Ben Gamari | 2018-02-03 | 3 | -237/+86
  Trac #14226 noted that the C-- CBE pass frequently fails to common up semantically identical blocks due to the differences in local register naming. These patches fixed this by making the pass consider equality up to alpha-renaming. However, the new logic failed to consider the possibility that local register naming *may* matter across multiple blocks. This led to the regression #14754. I'll need to do a bit of thinking on a proper solution to this but in the meantime I'm reverting all four patches.

  This reverts commit a27056f9823f8bbe2302f1924b3ab38fd6752e37.
  This reverts commit 6f990c54f922beae80362fe62426beededc21290.
  This reverts commit 9aa73892e10e90a1799b9277da593e816a827364.
  This reverts commit 7920a7d9c53083b234e060a3e72f00b601a46808.
* Hoopl.Collections: change right folds to strict left folds | Michal Terepeta | 2018-02-02 | 8 | -34/+41
  It seems that most uses of these folds should be strict left folds (I could only find a single place that benefits from a right fold). So this removes the existing `setFold`/`mapFold`/`mapFoldWithKey` and replaces them with:
  - `setFoldl`/`mapFoldl`/`mapFoldlWithKey` (strict left folds)
  - `setFoldr`/`mapFoldr` (for the less common case where a right fold actually makes sense, e.g., `CmmProcPoint`)

  Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>

  Test Plan: ./validate
  Reviewers: bgamari, simonmar
  Reviewed By: bgamari
  Subscribers: rwbarton, thomie, carter, kavon
  Differential Revision: https://phabricator.haskell.org/D4356
* Invert likeliness when improving conditionals | Alexander Biehl | 2018-01-29 | 1 | -1/+5
  ... in CmmSink
* cmm: Use two equality checks for two alt switch with default | U-Maokai\andi | 2018-01-26 | 1 | -0/+66
  For code like:

      f 1 = e1
      f 7 = e2
      f _ = e3

  We can treat it as a sparse jump table, check if we are outside of the range in one direction first and then start checking the values.

  GHC currently does this by checking for x > 7, then x <= 7 and at last x == 1.

  This patch changes this such that we only compare for equality against the two values and jump to the default if none are equal.

  The resulting code is both faster and smaller. wheel-sieve1 improves by 4-8% depending on problem size.

  This implements the idea from #14644

  Reviewers: bgamari, simonmar, simonpj, nomeata
  Reviewed By: simonpj, nomeata
  Subscribers: nomeata, simonpj, rwbarton, thomie, carter
  Differential Revision: https://phabricator.haskell.org/D4294
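The resulting decision structure can be modelled roughly as below. The `Plan` type, `twoAltPlan` and `runPlan` are hypothetical, simplified names for illustration; GHC's real switch plans are richer than this.

```haskell
-- Hypothetical, simplified plan type.
data Plan t
  = IfEqual Integer t (Plan t)  -- if scrutinee == n then jump to t, else continue
  | Jump t                      -- unconditional jump
  deriving (Eq, Show)

-- Two alternatives plus a default: two equality tests, then the default.
twoAltPlan :: (Integer, t) -> (Integer, t) -> t -> Plan t
twoAltPlan (n1, t1) (n2, t2) def = IfEqual n1 t1 (IfEqual n2 t2 (Jump def))

-- Interpret a plan for a concrete scrutinee value.
runPlan :: Plan t -> Integer -> t
runPlan (Jump t) _ = t
runPlan (IfEqual n t rest) x
  | x == n    = t
  | otherwise = runPlan rest x
```

For the `f` above, `twoAltPlan (1, "e1") (7, "e2") "e3"` performs at most two comparisons for any input and no range checks.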
* Remove Hoopl.Unique | Michal Terepeta | 2018-01-26 | 4 | -109/+85
  Reasons to remove:
  - It's confusing - we already have a widely used `Unique` module in `basicTypes/` that defines a newtype called `Unique`
  - `Hoopl.Unique` is not actually used much

  I've also moved the `Unique{Map,Set}` from `Hoopl.Unique` to `Hoopl.Collections` to keep things together. But that module is also a bit funny - it defines two type-classes that have only one instance each. So we should probably either remove them or use them more widely... In any case, that will be a separate change.

  Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>

  Test Plan: ./validate
  Reviewers: bgamari, simonmar
  Reviewed By: bgamari
  Subscribers: kavon, rwbarton, thomie, carter
  Differential Revision: https://phabricator.haskell.org/D4331
* Add ability to parse likely flags for ifs in Cmm. | klebinger.andreas@gmx.at | 2018-01-26 | 2 | -21/+41
  Adding the ability to parse likely flags in Cmm allows better codegen for cmm files.

  Test Plan: ci
  Reviewers: bgamari, simonmar
  Reviewed By: bgamari
  Subscribers: rwbarton, thomie, carter
  GHC Trac Issues: #14672
  Differential Revision: https://phabricator.haskell.org/D4316
* Handle the likely:True case in CmmContFlowOpt | klebinger.andreas@gmx.at | 2018-01-26 | 1 | -13/+32
  It's better to fall through to the likely case than to jump to it. We optimize for this in CmmContFlowOpt when likely:False. This commit extends the logic there to handle cases with likely:True as well.

  Test Plan: ci
  Reviewers: bgamari, simonmar
  Reviewed By: bgamari
  Subscribers: simonmar, alexbiehl, rwbarton, thomie, carter
  Differential Revision: https://phabricator.haskell.org/D4306
* Use IntSet in Dataflow | Bartosz Nitka | 2018-01-21 | 1 | -23/+11
  Before this change, a list was used as a substitute for a heap. This led to quadratic behavior on a simple program (see new test case).

  This change replaces it with IntSet, in effect reverting 5a1a2633553. @simonmar said it's fine to revert as long as nofib results are good.

  Test Plan:
  new test case:
  20% improvement
  3x improvement when N=10000

  nofib:
  I ran it twice for before and after because the compile time results are noisy.

  - Compile Allocations:
  ```
             before    before re-run    after    after re-run
  -1 s.d.    -----     -0.0%            -0.1%    -0.1%
  +1 s.d.    -----     +0.0%            +0.1%    +0.1%
  Average    -----     +0.0%            -0.0%    -0.0%
  ```

  - Compile Time:
  ```
             before    before re-run    after    after re-run
  -1 s.d.    -----     -0.1%            -2.3%    -2.6%
  +1 s.d.    -----     +5.2%            +3.7%    +4.4%
  Average    -----     +2.5%            +0.7%    +0.8%
  ```

  I checked each case and couldn't find consistent slow-down/speed-up on compile time. Full results here: P173

  Reviewers: simonpj, simonmar, bgamari
  Reviewed By: bgamari
  Subscribers: rwbarton, thomie, carter, simonmar
  GHC Trac Issues: #14667
  Differential Revision: https://phabricator.haskell.org/D4329
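A worklist backed by `IntSet` can be sketched as follows. This is only an illustration of the data-structure choice: `drain` and its `step` argument are hypothetical and far simpler than the fixpoint loop in `Hoopl.Dataflow`.

```haskell
import qualified Data.IntSet as IntSet

-- Process pending block indices in ascending order and return the order
-- in which they were visited. `step` (hypothetical) returns the indices
-- that must be (re)visited after processing a block. Duplicates collapse
-- automatically because the pending work is a set, not a list.
drain :: (Int -> [Int]) -> IntSet.IntSet -> [Int]
drain step = go
  where
    go todo = case IntSet.minView todo of
      Nothing        -> []
      Just (n, rest) -> n : go (IntSet.union (IntSet.fromList (step n)) rest)
```

Keeping a sorted list instead would require a linear-time insertion for every new entry, which is one way to end up with the quadratic behavior mentioned above.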
* Add new mbmi and mbmi2 compiler flags | John Ky | 2018-01-21 | 3 | -0/+14
  This adds support for the bit deposit and extraction operations provided by the BMI and BMI2 instruction set extensions on modern amd64 machines.

  Implement x86 code generator for pdep and pext.
  Properly initialise bmiVersion field.
  pdep and pext test cases
  Fix pattern match for pdep and pext instructions
  Fix build of pdep and pext code for 32-bit architectures

  Test Plan: Validate
  Reviewers: austin, simonmar, bgamari, angerman
  Reviewed By: bgamari
  Subscribers: trommler, carter, angerman, thomie, rwbarton, newhoggy
  GHC Trac Issues: #14206
  Differential Revision: https://phabricator.haskell.org/D4236
* cmm: Include braces on default branch as required by the parser | klebinger.andreas@gmx.at | 2018-01-18 | 1 | -2/+2
  Test Plan: Looking at cmm-dump
  Reviewers: bgamari, simonmar
  Reviewed By: simonmar
  Subscribers: rwbarton, thomie, carter
  Differential Revision: https://phabricator.haskell.org/D4293
* Fix references to cminusminus.org | Ben Gamari | 2018-01-18 | 3 | -6/+6
  Reviewers: simonmar
  Reviewed By: simonmar
  Subscribers: rwbarton, thomie, carter
  GHC Trac Issues: #14665
  Differential Revision: https://phabricator.haskell.org/D4311
* Simplify guard in createSwitchPlan. | klebinger.andreas@gmx.at | 2018-01-15 | 1 | -5/+4
  Given that we have two unique keys (guaranteed by Map) checking that `|range| == 1` is faster.

  The fact that `x1 == lo` and `x2 == hi` is guaranteed by mkSwitchTargets which removes values outside of the range.

  Test Plan: ci
  Reviewers: bgamari, simonmar
  Reviewed By: simonmar
  Subscribers: rwbarton, thomie, carter
  Differential Revision: https://phabricator.haskell.org/D4295
* Get rid of some stuttering in comments and docs | Gabor Greif | 2017-12-19 | 1 | -1/+1
* CLabel: A bit of documentation | Ben Gamari | 2017-11-28 | 1 | -3/+25
* Cmm: Add missing cases for BlockInfoTable | Ben Gamari | 2017-11-28 | 1 | -0/+2
  Silly rabbit, BlockInfoTables are data. This fixes the unregisterised build, finally fixing #14454.
* CLabel: More specific debug output from CLabel | Ben Gamari | 2017-11-28 | 1 | -2/+3
* CLabel: Refactor pprDynamicLinkerAsmLabel | Ben Gamari | 2017-11-28 | 1 | -49/+59
* cmm: Use LocalBlockLabel instead of AsmTempLabel to represent blocks | Ben Gamari | 2017-11-28 | 3 | -11/+47
  blockLbl was originally changed in 8b007abbeb3045900a11529d907a835080129176 to use mkTempAsmLabel to fix an inconsistency resulting in #14221. However, this breaks the C code generator, which doesn't support AsmTempLabels (#14454).

  Instead let's try going the other direction: use a new CLabel variety, LocalBlockLabel. Then we can teach the C code generator to deal with these as well.
* CLabel.labelType: Make catch-all case explicit | Ben Gamari | 2017-11-28 | 1 | -3/+14
* Revert "Add new mbmi and mbmi2 compiler flags" | Ben Gamari | 2017-11-22 | 3 | -14/+0
  This broke the 32-bit build.

  This reverts commit f5dc8ccc29429d0a1d011f62b6b430f6ae50290c.
* cmm: Optimise remainders by powers of two | Ben Gamari | 2017-11-21 | 1 | -25/+41
  Test Plan: validate
  Reviewers: bgamari, simonmar
  Reviewed By: bgamari
  Subscribers: rwbarton, thomie
  GHC Trac Issues: #14437
  Differential Revision: https://phabricator.haskell.org/D4180
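The commit message does not spell out the transformation. As background only, the standard identity for unsigned words is that a remainder by 2^k equals a bit mask with 2^k - 1; the sketch below shows just that unsigned case, with the hypothetical helper `remPow2`. The signed case needs extra care and is not covered here.

```haskell
import Data.Bits (shiftL, (.&.))
import Data.Word (Word64)

-- For unsigned words, x `rem` 2^k equals x masked with 2^k - 1.
-- Assumes 0 <= k < 64.
remPow2 :: Word64 -> Int -> Word64
remPow2 x k = x .&. ((1 `shiftL` k) - 1)
```

For instance, `remPow2 29 3` agrees with ``29 `rem` 8``.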
* CLabels: Remove CaseLabel | Ben Gamari | 2017-11-15 | 1 | -32/+0
  Reviewers: simonmar
  Subscribers: rwbarton, thomie
  Differential Revision: https://phabricator.haskell.org/D4188
* CLabel: Clean up unused label types | Ben Gamari | 2017-11-15 | 1 | -45/+5
  Test Plan: Validate
  Reviewers: trommler, simonmar
  Reviewed By: trommler
  Subscribers: rwbarton, thomie
  GHC Trac Issues: #14454
  Differential Revision: https://phabricator.haskell.org/D4182
* Add new mbmi and mbmi2 compiler flags | John Ky | 2017-11-15 | 3 | -0/+14
  This adds support for the bit deposit and extraction operations provided by the BMI and BMI2 instruction set extensions on modern amd64 machines.

  Test Plan: Validate
  Reviewers: austin, simonmar, bgamari, hvr, goldfire, erikd
  Reviewed By: bgamari
  Subscribers: goldfire, erikd, trommler, newhoggy, rwbarton, thomie
  GHC Trac Issues: #14206
  Differential Revision: https://phabricator.haskell.org/D4063
* Fix PPC NCG after blockID patch | Peter Trommler | 2017-11-09 | 2 | -2/+31
  Commit rGHC8b007ab assigns the same label to the first basic block of a proc and to the proc entry point. This violates the PPC 64-bit ELF v. 1.9 and v. 2.0 ABIs and leads to duplicate symbols. This patch fixes duplicate symbols caused by block labels.

  In commit rGHCd7b8da1 an info table label is generated from a block id. Getting the entry label from that info label leads to an undefined symbol because a suffix "_entry" is added that is not present in the block label. To fix that issue add a new info table label flavour for labels derived from block ids. Converting such a label with toEntryLabel produces the original block label.

  Fixes #14311

  Test Plan: ./validate
  Reviewers: austin, bgamari, simonmar, erikd, hvr, angerman
  Reviewed By: bgamari
  Subscribers: rwbarton, thomie
  GHC Trac Issues: #14311
  Differential Revision: https://phabricator.haskell.org/D4149
* cmm/CBE: Fix a few more zip uses | Ben Gamari | 2017-11-06 | 1 | -3/+8
  Ensure that we don't consider lists of different lengths to be equal. I noticed these while working on the fix for #14361.

  Reviewers: austin, simonmar, michalt
  Reviewed By: michalt
  Subscribers: rwbarton, thomie
  GHC Trac Issues: #14361
  Differential Revision: https://phabricator.haskell.org/D4153
* cmm/CBE: Fix comparison between blocks of different lengths | Ben Gamari | 2017-11-06 | 1 | -5/+9
  Previously CBE computed equality by taking the lists of middle nodes of the blocks being compared and zipping them together. It would then map over this list with the equality relation, and accumulate the result. However, this is completely wrong: Consider what will happen when we compare a block with no middle nodes with one with one or more. The result of `zip` will be empty and consequently the pass may conclude that the two are indeed equivalent (if their last nodes also match). This is very bad and the cause of #14361.

  The solution I chose was just to write out an explicit recursion, like I distinctly recall considering doing when I first wrote this code. Unfortunately I was feeling clever at the time.

  Unfortunately this case was just rare enough not to be triggered by the testsuite. I still need to find a testcase that doesn't have external dependencies.

  Test Plan: Need to find a more minimal testcase
  Reviewers: austin, simonmar, michalt
  Reviewed By: michalt
  Subscribers: michalt, rwbarton, thomie, hvr
  GHC Trac Issues: #14361
  Differential Revision: https://phabricator.haskell.org/D4152
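The pitfall and the explicit-recursion fix can be reproduced on plain lists. The names `eqByZip` and `eqList` are hypothetical; the real code compares lists of Cmm middle nodes.

```haskell
-- Buggy pattern: zipWith truncates to the shorter list, so an empty list
-- compares "equal" to anything.
eqByZip :: (a -> a -> Bool) -> [a] -> [a] -> Bool
eqByZip eq xs ys = and (zipWith eq xs ys)

-- Explicit recursion: the lists are equal only if they also end together.
eqList :: (a -> a -> Bool) -> [a] -> [a] -> Bool
eqList _  []       []       = True
eqList eq (x : xs) (y : ys) = eq x y && eqList eq xs ys
eqList _  _        _        = False
```

Here `eqByZip (==) [] [1]` is `True`, which is exactly the false positive described above, while `eqList (==) [] [1]` is `False`.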
* CmmSink: Use a IntSet instead of a list | alexbiehl | 2017-11-02 | 1 | -7/+25
  CmmProcs which have *lots* of local variables take a considerable amount of time in CmmSink. This was noticed by @tdammers in #7258 while compiling files with large records (~200-400 fields).

  Before:

  ```
  Sun Oct 29 19:58 2017 Time and Allocation Profiling Report  (Final)

     ghc-stage2 +RTS -p -RTS -B/Users/alexbiehl/git/ghc/inplace/lib /Users/alexbiehl/Downloads/W2.hs -fforce-recomp -O2

  total time  =       26.00 secs   (25996 ticks @ 1000 us, 1 processor)
  total alloc = 14,921,627,912 bytes  (excludes profiling overheads)

  COST CENTRE      MODULE      SRC                                                  %time %alloc

  sink             CmmPipeline compiler/cmm/CmmPipeline.hs:(104,13)-(105,59)         55.7   15.9
  SimplTopBinds    SimplCore   compiler/simplCore/SimplCore.hs:761:39-74             19.5   30.6
  FloatOutwards    SimplCore   compiler/simplCore/SimplCore.hs:471:40-66              4.2    9.0
  RegAlloc-linear  AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:(658,27)-(660,55)     4.0   11.1
  pprNativeCode    AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:(529,37)-(530,65)     2.8    6.3
  NewStranal       SimplCore   compiler/simplCore/SimplCore.hs:480:40-63              1.6    3.7
  OccAnal          SimplCore   compiler/simplCore/SimplCore.hs:(739,22)-(740,67)      1.5    3.5
  StgCmm           HscMain     compiler/main/HscMain.hs:(1426,13)-(1427,62)           1.2    2.4
  regLiveness      AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:(591,17)-(593,52)     1.2    1.9
  genMachCode      AsmCodeGen  compiler/nativeGen/AsmCodeGen.hs:(580,17)-(582,62)     0.9    1.8
  NativeCodeGen    CodeOutput  compiler/main/CodeOutput.hs:171:18-78                  0.9    2.1
  CoreTidy         HscMain     compiler/main/HscMain.hs:1253:27-67                    0.8    1.9
  ```

  After:

  ```
  Sun Oct 29 19:18 2017 Time and Allocation Profiling Report  (Final)

     ghc-stage2 +RTS -p -RTS -B/Users/alexbiehl/git/ghc/inplace/lib /Users/alexbiehl/Downloads/W2.hs -fforce-recomp -O2

  total time  =       13.31 secs   (13307 ticks @ 1000 us, 1 processor)
  total alloc = 15,772,184,488 bytes  (excludes profiling overheads)

  COST CENTRE      MODULE         SRC                                                  %time %alloc

  SimplTopBinds    SimplCore      compiler/simplCore/SimplCore.hs:761:39-74             38.3   29.0
  sink             CmmPipeline    compiler/cmm/CmmPipeline.hs:(104,13)-(105,59)         13.2   20.3
  RegAlloc-linear  AsmCodeGen     compiler/nativeGen/AsmCodeGen.hs:(658,27)-(660,55)     8.3   10.5
  FloatOutwards    SimplCore      compiler/simplCore/SimplCore.hs:471:40-66              8.1    8.5
  pprNativeCode    AsmCodeGen     compiler/nativeGen/AsmCodeGen.hs:(529,37)-(530,65)     5.4    5.9
  NewStranal       SimplCore      compiler/simplCore/SimplCore.hs:480:40-63              3.1    3.5
  OccAnal          SimplCore      compiler/simplCore/SimplCore.hs:(739,22)-(740,67)      2.9    3.3
  StgCmm           HscMain        compiler/main/HscMain.hs:(1426,13)-(1427,62)           2.3    2.3
  regLiveness      AsmCodeGen     compiler/nativeGen/AsmCodeGen.hs:(591,17)-(593,52)     2.1    1.8
  NativeCodeGen    CodeOutput     compiler/main/CodeOutput.hs:171:18-78                  1.7    2.0
  genMachCode      AsmCodeGen     compiler/nativeGen/AsmCodeGen.hs:(580,17)-(582,62)     1.6    1.7
  CoreTidy         HscMain        compiler/main/HscMain.hs:1253:27-67                    1.4    1.8
  foldNodesBwdOO   Hoopl.Dataflow compiler/cmm/Hoopl/Dataflow.hs:(397,1)-(403,17)        1.1    0.8
  ```

  Reviewers: austin, bgamari, simonmar
  Reviewed By: bgamari
  Subscribers: duog, rwbarton, thomie, tdammers
  GHC Trac Issues: #7258
  Differential Revision: https://phabricator.haskell.org/D4145
* Allow packing constructor fields | Michal Terepeta | 2017-10-29 | 2 | -5/+9
  This is another step for fixing #13825 and is based on D38 by Simon Marlow.

  The change allows storing multiple constructor fields within the same word. This currently applies only to `Float`s, e.g.,
  ```
  data Foo = Foo {-# UNPACK #-} !Float {-# UNPACK #-} !Float
  ```
  on 64-bit arch, will now store both fields within the same constructor word. For `WordX/IntX` we'll need to introduce new primop types.

  Main changes:
  - We now use sizes in bytes when we compute the offsets for constructor fields in `StgCmmLayout` and introduce padding if necessary (word-sized fields are still word-aligned)
  - `ByteCodeGen` had to be updated to correctly construct the data types. This required some new bytecode instructions to allow pushing things that are not full words onto the stack (and updating `Interpreter.c`). Note that we only use the packed stuff when constructing data types (i.e., for `PACK`), in all other cases the behavior should not change.
  - `RtClosureInspect` was changed to handle the new layout when extracting subterms. This seems to be used by things like `:print`. I've also added a test for this.
  - I deviated slightly from Simon's approach and use `PrimRep` instead of `ArgRep` for computing the size of fields. This seemed more natural and in the future we'll probably want to introduce new primitive types (e.g., `Int8#`) and `PrimRep` seems like a better place to do that (where we already have `Int64Rep` for example). `ArgRep` on the other hand seems to be more focused on calling functions.

  Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>

  Test Plan: ./validate
  Reviewers: bgamari, simonmar, austin, hvr, goldfire, erikd
  Reviewed By: bgamari
  Subscribers: maoe, rwbarton, thomie
  GHC Trac Issues: #13825
  Differential Revision: https://phabricator.haskell.org/D3809
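The byte-based offset computation with padding can be sketched like this. It is a simplification under one stated assumption: every field is aligned to its own size. The name `fieldOffsets` is hypothetical and this is not the `StgCmmLayout` code.

```haskell
-- Assign byte offsets to fields given their sizes in bytes (all positive).
-- Assumption for this sketch: each field is aligned to its own size, so
-- padding is inserted whenever the running offset is not a multiple of it.
fieldOffsets :: [Int] -> [Int]
fieldOffsets = go 0
  where
    go _ [] = []
    go off (sz : rest) =
      let aligned = roundUp off sz
      in aligned : go (aligned + sz) rest
    roundUp n a = ((n + a - 1) `div` a) * a
```

Two 4-byte `Float` fields get offsets 0 and 4 and so share one 8-byte word, while a 4-byte field followed by an 8-byte field leaves 4 bytes of padding so that the word-sized field stays word-aligned.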
* Turn `compareByteArrays#` out-of-line primop into inline primop | alexbiehl | 2017-10-29 | 3 | -0/+5
  Depends on D4090

  Reviewers: austin, bgamari, erikd, simonmar, alexbiehl
  Reviewed By: bgamari
  Subscribers: rwbarton, thomie
  Differential Revision: https://phabricator.haskell.org/D4091
* Add -falignment-sanitization flag | Ben Gamari | 2017-10-29 | 4 | -3/+24
  Here we add a flag to instruct the native code generator to add alignment checks in all info table dereferences. This is helpful in catching pointer tagging issues.

  Thanks to @jrtc27 for uncovering the tagging issues on Sparc which inspired this flag.

  Test Plan: Validate
  Reviewers: simonmar, austin, erikd
  Reviewed By: simonmar
  Subscribers: rwbarton, trofi, thomie, jrtc27
  Differential Revision: https://phabricator.haskell.org/D4101
* Typofix in comment | Gabor Greif | 2017-10-18 | 1 | -1/+1
* A bunch of typofixes | Gabor Greif | 2017-09-26 | 1 | -1/+1
* don't allow AsmTempLabel in UNREG mode (Trac #14264) | Sergei Trofimovich | 2017-09-24 | 1 | -0/+2
  AsmTempLabel is really a label that describes a label in assembly output (or an equivalent such as LLVM IR).

  The unregisterised build does not handle it correctly.

  This change does not fix the UNREG build failure in Ticket #14264 but reverts back to the panic:
      pprCLbl AsmTempLabel

  Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
* Fix AsmTempLabel | Moritz Angermann | 2017-09-23 | 1 | -1/+0
  Summary:
  This is more fallout from 8b007abb and should fix Trac #14264.

  I am not sure if this is complete. It does however allow me to build an iOS LLVM cross compiler.

  Reviewers: bgamari, trofi, austin, simonmar
  Reviewed By: trofi
  Subscribers: rwbarton, thomie
  GHC Trac Issues: #14264
  Differential Revision: https://phabricator.haskell.org/D4014
* Fix broken LLVM code gen | Moritz Angermann | 2017-09-21 | 1 | -2/+3
  In 8b007abbeb30 (nativeGen: Consistently use blockLbl to generate CLabels from BlockIds) all blockLbls were changed. This interfered with the `toInfoLbl` call in CmmProcPoint, and caused the LLVM backend to fall over.

  Reviewers: bgamari, austin, simonmar
  Reviewed By: bgamari
  Subscribers: rwbarton, thomie
  Differential Revision: https://phabricator.haskell.org/D4006