delta/haskell.git - gitlab.haskell.org: ghc/ghc.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	Modules: CmmToAsm (#13009)	Sylvain Henry	2020-02-24	1	-1236/+0
\|
*	Modules: Driver (#13009)	Sylvain Henry	2020-02-21	1	-1/+1
\| \| \| \|	submodule updates: nofib, haddock
*	Disentangle DynFlags and SDoc	Sylvain Henry	2020-02-20	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	Remove several uses of `sdocWithDynFlags`. The remaining ones are mostly CodeGen related (e.g. depend on target platform constants) and will be fixed separately. Metric Decrease: T12425 T9961 WWRec T1969 T14683
*	Do CafInfo/SRT analysis in Cmm	Ömer Sinan Ağacan	2020-01-31	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch removes all CafInfo predictions and various hacks to preserve predicted CafInfos from the compiler and assigns final CafInfos to interface Ids after code generation. SRT analysis is extended to support static data, and Cmm generator is modified to allow generating static_link fields after SRT analysis. This also fixes `-fcatch-bottoms`, which introduces error calls in case expressions in CorePrep, which runs after CoreTidy (which is where we decide on CafInfos) and turns previously non-CAFFY things into CAFFY. Fixes #17648 Fixes #9718 Evaluation ========== NoFib ----- Boot with: `make boot mode=fast` Run: `make mode=fast EXTRA_RUNTEST_OPTS="-cachegrind" NoFibRuns=1` -------------------------------------------------------------------------------- Program Size Allocs Instrs Reads Writes -------------------------------------------------------------------------------- CS -0.0% 0.0% -0.0% -0.0% -0.0% CSD -0.0% 0.0% -0.0% -0.0% -0.0% FS -0.0% 0.0% -0.0% -0.0% -0.0% S -0.0% 0.0% -0.0% -0.0% -0.0% VS -0.0% 0.0% -0.0% -0.0% -0.0% VSD -0.0% 0.0% -0.0% -0.0% -0.5% VSM -0.0% 0.0% -0.0% -0.0% -0.0% anna -0.1% 0.0% -0.0% -0.0% -0.0% ansi -0.0% 0.0% -0.0% -0.0% -0.0% atom -0.0% 0.0% -0.0% -0.0% -0.0% awards -0.0% 0.0% -0.0% -0.0% -0.0% banner -0.0% 0.0% -0.0% -0.0% -0.0% bernouilli -0.0% 0.0% -0.0% -0.0% -0.0% binary-trees -0.0% 0.0% -0.0% -0.0% -0.0% boyer -0.0% 0.0% -0.0% -0.0% -0.0% boyer2 -0.0% 0.0% -0.0% -0.0% -0.0% bspt -0.0% 0.0% -0.0% -0.0% -0.0% cacheprof -0.0% 0.0% -0.0% -0.0% -0.0% calendar -0.0% 0.0% -0.0% -0.0% -0.0% cichelli -0.0% 0.0% -0.0% -0.0% -0.0% circsim -0.0% 0.0% -0.0% -0.0% -0.0% clausify -0.0% 0.0% -0.0% -0.0% -0.0% comp_lab_zift -0.0% 0.0% -0.0% -0.0% -0.0% compress -0.0% 0.0% -0.0% -0.0% -0.0% compress2 -0.0% 0.0% -0.0% -0.0% -0.0% constraints -0.0% 0.0% -0.0% -0.0% -0.0% cryptarithm1 -0.0% 0.0% -0.0% -0.0% -0.0% cryptarithm2 -0.0% 0.0% -0.0% -0.0% -0.0% cse -0.0% 0.0% -0.0% -0.0% -0.0% digits-of-e1 -0.0% 0.0% -0.0% -0.0% -0.0% digits-of-e2 -0.0% 0.0% -0.0% -0.0% -0.0% dom-lt -0.0% 0.0% -0.0% -0.0% -0.0% eliza -0.0% 0.0% -0.0% -0.0% -0.0% event -0.0% 0.0% -0.0% -0.0% -0.0% exact-reals -0.0% 0.0% -0.0% -0.0% -0.0% exp3_8 -0.0% 0.0% -0.0% -0.0% -0.0% expert -0.0% 0.0% -0.0% -0.0% -0.0% fannkuch-redux -0.0% 0.0% -0.0% -0.0% -0.0% fasta -0.0% 0.0% -0.0% -0.0% -0.0% fem -0.0% 0.0% -0.0% -0.0% -0.0% fft -0.0% 0.0% -0.0% -0.0% -0.0% fft2 -0.0% 0.0% -0.0% -0.0% -0.0% fibheaps -0.0% 0.0% -0.0% -0.0% -0.0% fish -0.0% 0.0% -0.0% -0.0% -0.0% fluid -0.1% 0.0% -0.0% -0.0% -0.0% fulsom -0.0% 0.0% -0.0% -0.0% -0.0% gamteb -0.0% 0.0% -0.0% -0.0% -0.0% gcd -0.0% 0.0% -0.0% -0.0% -0.0% gen_regexps -0.0% 0.0% -0.0% -0.0% -0.0% genfft -0.0% 0.0% -0.0% -0.0% -0.0% gg -0.0% 0.0% -0.0% -0.0% -0.0% grep -0.0% 0.0% -0.0% -0.0% -0.0% hidden -0.0% 0.0% -0.0% -0.0% -0.0% hpg -0.1% 0.0% -0.0% -0.0% -0.0% ida -0.0% 0.0% -0.0% -0.0% -0.0% infer -0.0% 0.0% -0.0% -0.0% -0.0% integer -0.0% 0.0% -0.0% -0.0% -0.0% integrate -0.0% 0.0% -0.0% -0.0% -0.0% k-nucleotide -0.0% 0.0% -0.0% -0.0% -0.0% kahan -0.0% 0.0% -0.0% -0.0% -0.0% knights -0.0% 0.0% -0.0% -0.0% -0.0% lambda -0.0% 0.0% -0.0% -0.0% -0.0% last-piece -0.0% 0.0% -0.0% -0.0% -0.0% lcss -0.0% 0.0% -0.0% -0.0% -0.0% life -0.0% 0.0% -0.0% -0.0% -0.0% lift -0.0% 0.0% -0.0% -0.0% -0.0% linear -0.1% 0.0% -0.0% -0.0% -0.0% listcompr -0.0% 0.0% -0.0% -0.0% -0.0% listcopy -0.0% 0.0% -0.0% -0.0% -0.0% maillist -0.0% 0.0% -0.0% -0.0% -0.0% mandel -0.0% 0.0% -0.0% -0.0% -0.0% mandel2 -0.0% 0.0% -0.0% -0.0% -0.0% mate -0.0% 0.0% -0.0% -0.0% -0.0% minimax -0.0% 0.0% -0.0% -0.0% -0.0% mkhprog -0.0% 0.0% -0.0% -0.0% -0.0% multiplier -0.0% 0.0% -0.0% -0.0% -0.0% n-body -0.0% 0.0% -0.0% -0.0% -0.0% nucleic2 -0.0% 0.0% -0.0% -0.0% -0.0% para -0.0% 0.0% -0.0% -0.0% -0.0% paraffins -0.0% 0.0% -0.0% -0.0% -0.0% parser -0.1% 0.0% -0.0% -0.0% -0.0% parstof -0.1% 0.0% -0.0% -0.0% -0.0% pic -0.0% 0.0% -0.0% -0.0% -0.0% pidigits -0.0% 0.0% -0.0% -0.0% -0.0% power -0.0% 0.0% -0.0% -0.0% -0.0% pretty -0.0% 0.0% -0.3% -0.4% -0.4% primes -0.0% 0.0% -0.0% -0.0% -0.0% primetest -0.0% 0.0% -0.0% -0.0% -0.0% prolog -0.0% 0.0% -0.0% -0.0% -0.0% puzzle -0.0% 0.0% -0.0% -0.0% -0.0% queens -0.0% 0.0% -0.0% -0.0% -0.0% reptile -0.0% 0.0% -0.0% -0.0% -0.0% reverse-complem -0.0% 0.0% -0.0% -0.0% -0.0% rewrite -0.0% 0.0% -0.0% -0.0% -0.0% rfib -0.0% 0.0% -0.0% -0.0% -0.0% rsa -0.0% 0.0% -0.0% -0.0% -0.0% scc -0.0% 0.0% -0.3% -0.5% -0.4% sched -0.0% 0.0% -0.0% -0.0% -0.0% scs -0.0% 0.0% -0.0% -0.0% -0.0% simple -0.1% 0.0% -0.0% -0.0% -0.0% solid -0.0% 0.0% -0.0% -0.0% -0.0% sorting -0.0% 0.0% -0.0% -0.0% -0.0% spectral-norm -0.0% 0.0% -0.0% -0.0% -0.0% sphere -0.0% 0.0% -0.0% -0.0% -0.0% symalg -0.0% 0.0% -0.0% -0.0% -0.0% tak -0.0% 0.0% -0.0% -0.0% -0.0% transform -0.0% 0.0% -0.0% -0.0% -0.0% treejoin -0.0% 0.0% -0.0% -0.0% -0.0% typecheck -0.0% 0.0% -0.0% -0.0% -0.0% veritas -0.0% 0.0% -0.0% -0.0% -0.0% wang -0.0% 0.0% -0.0% -0.0% -0.0% wave4main -0.0% 0.0% -0.0% -0.0% -0.0% wheel-sieve1 -0.0% 0.0% -0.0% -0.0% -0.0% wheel-sieve2 -0.0% 0.0% -0.0% -0.0% -0.0% x2n1 -0.0% 0.0% -0.0% -0.0% -0.0% -------------------------------------------------------------------------------- Min -0.1% 0.0% -0.3% -0.5% -0.5% Max -0.0% 0.0% -0.0% -0.0% -0.0% Geometric Mean -0.0% -0.0% -0.0% -0.0% -0.0% -------------------------------------------------------------------------------- Program Size Allocs Instrs Reads Writes -------------------------------------------------------------------------------- circsim -0.1% 0.0% -0.0% -0.0% -0.0% constraints -0.0% 0.0% -0.0% -0.0% -0.0% fibheaps -0.0% 0.0% -0.0% -0.0% -0.0% gc_bench -0.0% 0.0% -0.0% -0.0% -0.0% hash -0.0% 0.0% -0.0% -0.0% -0.0% lcss -0.0% 0.0% -0.0% -0.0% -0.0% power -0.0% 0.0% -0.0% -0.0% -0.0% spellcheck -0.0% 0.0% -0.0% -0.0% -0.0% -------------------------------------------------------------------------------- Min -0.1% 0.0% -0.0% -0.0% -0.0% Max -0.0% 0.0% -0.0% -0.0% -0.0% Geometric Mean -0.0% +0.0% -0.0% -0.0% -0.0% Manual inspection of programs in testsuite/tests/programs --------------------------------------------------------- I built these programs with a bunch of dump flags and `-O` and compared STG, Cmm, and Asm dumps and file sizes. (Below the numbers in parenthesis show number of modules in the program) These programs have identical compiler (same .hi and .o sizes, STG, and Cmm and Asm dumps): - Queens (1), andre_monad (1), cholewo-eval (2), cvh_unboxing (3), andy_cherry (7), fun_insts (1), hs-boot (4), fast2haskell (2), jl_defaults (1), jq_readsPrec (1), jules_xref (1), jtod_circint (4), jules_xref2 (1), lennart_range (1), lex (1), life_space_leak (1), bargon-mangler-bug (7), record_upd (1), rittri (1), sanders_array (1), strict_anns (1), thurston-module-arith (2), okeefe_neural (1), joao-circular (6), 10queens (1) Programs with different compiler outputs: - jl_defaults (1): For some reason GHC HEAD marks a lot of top-level `[Int]` closures as CAFFY for no reason. With this patch we no longer make them CAFFY and generate less SRT entries. For some reason Main.o is slightly larger with this patch (1.3%) and the executable sizes are the same. (I'd expect both to be smaller) - launchbury (1): Same as jl_defaults: top-level `[Int]` closures marked as CAFFY for no reason. Similarly `Main.o` is 1.4% larger but the executable sizes are the same. - galois_raytrace (13): Differences are in the Parse module. There are a lot, but some of the changes are caused by the fact that for some reason (I think a bug) GHC HEAD marks the dictionary for `Functor Identity` as CAFFY. Parse.o is 0.4% larger, the executable size is the same. - north_array: We now generate less SRT entries because some of array primops used in this program like `NewArrayOp` get eliminated during Stg-to-Cmm and turn some CAFFY things into non-CAFFY. Main.o gets 24% larger (9224 bytes from 9000 bytes), executable sizes are the same. - seward-space-leak: Difference in this program is better shown by this smaller example: module Lib where data CDS = Case [CDS] [(Int, CDS)] \| Call CDS CDS instance Eq CDS where Case sels1 rets1 == Case sels2 rets2 = sels1 == sels2 && rets1 == rets2 Call a1 b1 == Call a2 b2 = a1 == a2 && b1 == b2 _ == _ = False In this program GHC HEAD builds a new SRT for the recursive group of `(==)`, `(/=)` and the dictionary closure. Then `/=` points to `==` in its SRT field, and `==` uses the SRT object as its SRT. With this patch we use the closure for `/=` as the SRT and add `==` there. Then `/=` gets an empty SRT field and `==` points to `/=` in its SRT field. This change looks fine to me. Main.o gets 0.07% larger, executable sizes are identical. head.hackage ------------ head.hackage's CI script builds 428 packages from Hackage using this patch with no failures. Compiler performance -------------------- The compiler perf tests report that the compiler allocates slightly more (worst case observed so far is 4%). However most programs in the test suite are small, single file programs. To benchmark compiler performance on something more realistic I build Cabal (the library, 236 modules) with different optimisation levels. For the "max residency" row I run GHC with `+RTS -s -A100k -i0 -h` for more accurate numbers. Other rows are generated with just `-s`. (This is because `-i0` causes running GC much more frequently and as a result "bytes copied" gets inflated by more than 25x in some cases) * -O0 \| \| GHC HEAD \| This MR \| Diff \| \| --------------- \| -------------- \| -------------- \| ------ \| \| Bytes allocated \| 54,413,350,872 \| 54,701,099,464 \| +0.52% \| \| Bytes copied \| 4,926,037,184 \| 4,990,638,760 \| +1.31% \| \| Max residency \| 421,225,624 \| 424,324,264 \| +0.73% \| * -O1 \| \| GHC HEAD \| This MR \| Diff \| \| --------------- \| --------------- \| --------------- \| ------ \| \| Bytes allocated \| 245,849,209,992 \| 246,562,088,672 \| +0.28% \| \| Bytes copied \| 26,943,452,560 \| 27,089,972,296 \| +0.54% \| \| Max residency \| 982,643,440 \| 991,663,432 \| +0.91% \| * -O2 \| \| GHC HEAD \| This MR \| Diff \| \| --------------- \| --------------- \| --------------- \| ------ \| \| Bytes allocated \| 291,044,511,408 \| 291,863,910,912 \| +0.28% \| \| Bytes copied \| 37,044,237,616 \| 36,121,690,472 \| -2.49% \| \| Max residency \| 1,071,600,328 \| 1,086,396,256 \| +1.38% \| Extra compiler allocations -------------------------- Runtime allocations of programs are as reported above (NoFib section). The compiler now allocates more than before. Main source of allocation in this patch compared to base commit is the new SRT algorithm (GHC.Cmm.Info.Build). Below is some of the extra work we do with this patch, numbers generated by profiled stage 2 compiler when building a pathological case (the test 'ManyConstructors') with '-O2': - We now sort the final STG for a module, which means traversing the entire program, generating free variable set for each top-level binding, doing SCC analysis, and re-ordering the program. In ManyConstructors this step allocates 97,889,952 bytes. - We now do SRT analysis on static data, which in a program like ManyConstructors causes analysing 10,000 bindings that we would previously just skip. This step allocates 70,898,352 bytes. - We now maintain an SRT map for the entire module as we compile Cmm groups: data ModuleSRTInfo = ModuleSRTInfo { ... , moduleSRTMap :: SRTMap } (SRTMap is just a strict Map from the 'containers' library) This map gets an entry for most bindings in a module (exceptions are THUNKs and CAFFY static functions). For ManyConstructors this map gets 50015 entries. - Once we're done with code generation we generate a NameSet from SRTMap for the non-CAFFY names in the current module. This set gets the same number of entries as the SRTMap. - Finally we update CafInfos in ModDetails for the non-CAFFY Ids, using the NameSet generated in the previous step. This usually does the least amount of allocation among the work listed here. Only place with this patch where we do less work in the CAF analysis in the tidying pass (CoreTidy). However that doesn't save us much, as the pass still needs to traverse the whole program and update IdInfos for other reasons. Only thing we don't here do is the `hasCafRefs` pass over the RHS of bindings, which is a stateless pass that returns a boolean value, so it doesn't allocate much. (Metric changes blow are all increased allocations) Metric changes -------------- Metric Increase: ManyAlternatives ManyConstructors T13035 T14683 T1969 T9961
*	Disable two warnings for files that trigger them	Tom Ellis	2020-01-27	1	-0/+2
\| \| \| \| \| \|	incomplete-uni-patterns and incomplete-record-updates will be in -Wall at a future date, so prepare for that by disabling those warnings on files that trigger them.
*	Module hierarchy: Cmm (cf #13009)	Sylvain Henry	2020-01-25	1	-11/+11
\|
*	Add GHC-API logging hooks	Sylvain Henry	2019-12-18	1	-5/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add 'dumpAction' hook to DynFlags. It allows GHC API users to catch dumped intermediate codes and information. The format of the dump (Core, Stg, raw text, etc.) is now reported allowing easier automatic handling. * Add 'traceAction' hook to DynFlags. Some dumps go through the trace mechanism (for instance unfoldings that have been considered for inlining). This is problematic because: 1) dumps aren't written into files even with -ddump-to-file on 2) dumps are written on stdout even with GHC API 3) in this specific case, dumping depends on unsafe globally stored DynFlags which is bad for GHC API users We introduce 'traceAction' hook which allows GHC API to catch those traces and to avoid using globally stored DynFlags. * Avoid dumping empty logs via dumpAction/traceAction (but still write empty files to keep the existing behavior)
*	Make dynflag argument for withTiming pure.	Andreas Klebinger	2019-10-23	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|	19 times out of 20 we already have dynflags in scope. We could just always use `return dflags`. But this is in fact not free. When looking at some STG code I noticed that we always allocate a closure for this expression in the heap. Clearly a waste in these cases. For the other cases we can either just modify the callsite to get dynflags or use the _D variants of withTiming I added which will use getDynFlags under the hood.
*	Fix bug in the x86 backend involving the CFG.	Andreas Klebinger	2019-10-23	1	-8/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is part two of fixing #17334. There are two parts to this commit: - A bugfix for computing loop levels - A bugfix of basic block invariants in the NCG. ----------------------------------------------------------- In the first bug we ended up with a CFG of the sort: [A -> B -> C] This was represented via maps as fromList [(A,B),(B,C)] and later transformed into a adjacency array. However the transformation did not include block C in the array (since we only looked at the keys of the map). This was still fine until we tried to look up successors for C and tried to read outside of the array bounds when accessing C. In order to prevent this in the future I refactored to code to include all nodes as keys in the map representation. And make this a invariant which is checked in a few places. Overall I expect this to make the code more robust as now any failed lookup will represent an error, versus failed lookups sometimes being expected and sometimes not. In terms of performance this makes some things cheaper (getting a list of all nodes) and others more expensive (adding a new edge). Overall this adds up to no noteable performance difference. ----------------------------------------------------------- Part 2: When the NCG generated a new basic block, it did not always insert a NEWBLOCK meta instruction in the stream which caused a quite subtle bug. During instruction selection a statement `s` in a block B with control of the sort: B -> C will sometimes result in control flow of the sort: ┌ < ┐ v ^ B -> B1 ┴ -> C as is the case for some atomic operations. Now to keep the CFG in sync when introducing B1 we clearly want to insert it between B and C. However there is a catch when we have to deal with self loops. We might start with code and a CFG of these forms: loop: stmt1 ┌ < ┐ .... v ^ stmtX loop ┘ stmtY .... goto loop: Now we introduce B1: ┌ ─ ─ ─ ─ ─┐ loop: │ ┌ < ┐ │ instrs v │ │ ^ .... loop ┴ B1 ┴ ┘ instrsFromX stmtY goto loop: This is simple, all outgoing edges from loop now simply start from B1 instead and the code generator knows which new edges it introduced for the self loop of B1. Disaster strikes if the statement Y follows the same pattern. If we apply the same rule that all outgoing edges change then we end up with: loop ─> B1 ─> B2 ┬─┐ │ │ └─<┤ │ │ └───<───┘ │ └───────<────────┘ This is problematic. The edge B1->B1 is modified as expected. However the modification is wrong! The assembly in this case looked like this: _loop: <instrs> _B1: ... cmpxchgq ... jne _B1 <instrs> <end _B1> _B2: ... cmpxchgq ... jne _B2 <instrs> jmp loop There is no edge _B2 -> _B1 here. It's still a self loop onto _B1. The problem here is that really B1 should be two basic blocks. Otherwise we have control flow in the middle of a basic block. A contradiction! So to account for this we add yet another basic block marker: _B: <instrs> _B1: ... cmpxchgq ... jne _B1 jmp _B1' _B1': <instrs> <end _B1> _B2: ... Now when inserting B2 we will only look at the outgoing edges of B1' and everything will work out nicely. You might also wonder why we don't insert jumps at the end of _B1'. There is no way another block ends up jumping to the labels _B1 or _B2 since they are essentially invisible to other blocks. View them as control flow labels local to the basic block if you'd like. Not doing this ultimately caused (part 2 of) #17334.
*	Implement s390x LLVM backend.	Stefan Schulze Frielinghaus	2019-10-22	1	-0/+1
\| \| \| \| \| \|	This patch adds support for the s390x architecture for the LLVM code generator. The patch includes a register mapping of STG registers onto s390x machine registers which enables a registerised build.
*	Add loop level analysis to the NCG backend.	klebinger.andreas@gmx.at	2019-10-16	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For backends maintaining the CFG during codegen we can now find loops and their nesting level. This is based on the Cmm CFG and dominator analysis. As a result we can estimate edge frequencies a lot better for methods, resulting in far better code layout. Speedup on nofib: ~1.5% Increase in compile times: ~1.9% To make this feasible this commit adds: * Dominator analysis based on the Lengauer-Tarjan Algorithm. * An algorithm estimating global edge frequences from branch probabilities - In CFG.hs A few static branch prediction heuristics: * Expect to take the backedge in loops. * Expect to take the branch NOT exiting a loop. * Expect integer vs constant comparisons to be false. We also treat heap/stack checks special for branch prediction to avoid them being treated as loops.
*	Fix #17334 where NCG did not properly update the CFG.wip/andreask/17334	Andreas Klebinger	2019-10-13	1	-14/+18
\| \| \| \| \| \| \| \| \| \| \| \| \|	Statements can change the basic block in which instructions are placed during instruction selection. We have to keep track of this switch of the current basic block as we need this information in order to properly update the CFG. This commit implements this change and fixes #17334. We do so by having stmtToInstr return the new block id if a statement changed the basic block.
*	ErrUtils: split withTiming into withTiming and withTimingSilent	Alp Mestanogullari	2019-09-19	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	'withTiming' becomes a function that, when passed '-vN' (N >= 2) or '-ddump-timings', will print timing (and possibly allocations) related information. When additionally built with '-eventlog' and executed with '+RTS -l', 'withTiming' will also emit both 'traceMarker' and 'traceEvent' events to the eventlog. 'withTimingSilent' on the other hand will never print any timing information, under any circumstance, and will only emit 'traceEvent' events to the eventlog. As pointed out in !1672, 'traceMarker' is better suited for things that we might want to visualize in tools like eventlog2html, while 'traceEvent' is better suited for internal events that occur a lot more often and that we don't necessarily want to visualize. This addresses #17138 by using 'withTimingSilent' for all the codegen bits that are expressed as a bunch of small computations over streams of codegen ASTs.
*	Remove empty NCG.h	John Ericson	2019-09-13	1	-2/+0
\|
*	Module hierarchy: StgToCmm (#13009)	Sylvain Henry	2019-09-10	1	-1/+1
\| \| \| \| \| \|	Add StgToCmm module hierarchy. Platform modules that are used in several other places (NCG, LLVM codegen, Cmm transformations) are put into GHC.Platform.
*	Few tweaks in -ddump-debug output, minor refactoring	Ömer Sinan Ağacan	2019-09-02	1	-8/+6
\| \| \| \| \| \| \|	- Fixes crazy indentation in -ddump-debug output - We no longer dump empty sections in -ddump-debug when a code block does not have any generated debug info - Minor refactoring in Debug.hs and AsmCodeGen.hs
*	Return results of Cmm streams in backends	Ömer Sinan Ağacan	2019-08-28	1	-12/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This generalizes code generators (outputAsm, outputLlvm, outputC, and the call site codeOutput) so that they'll return the return values of the passed Cmm streams. This allows accumulating data during Cmm generation and returning it to the call site in HscMain. Previously the Cmm streams were assumed to return (), so the code generators returned () as well. This change is required by !1304 and !1530. Skipping CI as this was tested before and I only updated the commit message. [skip ci]
*	compiler: emit finer grained codegen events to eventlog	Alp Mestanogullari	2019-08-02	1	-20/+25
\|
*	Move 'Platform' to ghc-boot	John Ericson	2019-06-19	1	-1/+1
\| \| \| \| \| \| \|	ghc-pkg needs to be aware of platforms so it can figure out which subdire within the user package db to use. This is admittedly roundabout, but maybe Cabal could use the same notion of a platform as GHC to good affect too.
*	Use DeriveFunctor throughout the codebase (#15654)	Krzysztof Gogolewski	2019-06-12	1	-5/+4
\|
*	Use datatype for unboxed returns when loading ghc into ghci	Michael Sloan	2019-05-22	1	-12/+30
\| \| \| \|	See #13101 and #15454
*	Remove all target-specific portions of Config.hs	John Ericson	2019-05-14	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1. If GHC is to be multi-target, these cannot be baked in at compile time. 2. Compile-time flags have a higher maintenance than run-time flags. 3. The old way makes build system implementation (various bootstrapping details) with the thing being built. E.g. GHC doesn't need to care about which integer library will be used---this is purely a crutch so the build system doesn't need to pass flags later when using that library. 4. Experience with cross compilation in Nixpkgs has shown things work nicer when compiler's can optionally delegate the bootstrapping the package manager. The package manager knows the entire end-goal build plan, and thus can make top-down decisions on bootstrapping. GHC can just worry about GHC, not even core library like base and ghc-prim!
*	removing x87 register support from native code gen	Carter Schonwald	2019-04-10	1	-20/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* simplifies registers to have GPR, Float and Double, by removing the SSE2 and X87 Constructors * makes -msse2 assumed/default for x86 platforms, fixing a long standing nondeterminism in rounding behavior in 32bit haskell code * removes the 80bit floating point representation from the supported float sizes * theres still 1 tiny bit of x87 support needed, for handling float and double return values in FFI calls wrt the C ABI on x86_32, but this one piece does not leak into the rest of NCG. * Lots of code thats not been touched in a long time got deleted as a consequence of all of this all in all, this change paves the way towards a lot of future further improvements in how GHC handles floating point computations, along with making the native code gen more accessible to a larger pool of contributors.
*	NCG: correctly escape path strings on Windows (#16389)	Sylvain Henry	2019-03-09	1	-1/+1
\| \| \| \| \|	GHC native code generator generates .incbin and .file directives. We need to escape those strings correctly on Windows (see #16389).
*	Rip out object splitting	Ben Gamari	2019-03-05	1	-25/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The splitter is an evil Perl script that processes assembler code. Its job can be done better by the linker's --gc-sections flag. GHC passes this flag to the linker whenever -split-sections is passed on the command line. This is based on @DemiMarie's D2768. Fixes Trac #11315 Fixes Trac #9832 Fixes Trac #8964 Fixes Trac #8685 Fixes Trac #8629
*	Don't wrap the entry map for LiveInfo in Maybe.	klebinger.andreas@gmx.at	2019-02-15	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	It never really encoded a invariant. * The linear register allocator just did partial pattern matches * The graph allocator just set it to (Just mapEmpty) for Nothing So I changed LiveInfo to directly contain the map. Further natCmmTopToLive which filled in Nothing is no longer exported. Instead we know call cmmTopLiveness which changes the type AND fills in the map.
*	Allow resizing the stack for the graph allocator.	klebinger.andreas@gmx.at	2019-02-08	1	-3/+15
\| \| \| \| \| \| \| \| \| \|	The graph allocator now dynamically resizes the number of stack slots when running into the limit. This fixes #8657. Also loop membership of basic blocks is now available in the register allocator for cost heuristics.
*	NCG: New code layout algorithm.	Andreas Klebinger	2018-11-17	1	-177/+157
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch implements a new code layout algorithm. It has been tested for x86 and is disabled on other platforms. Performance varies slightly be CPU/Machine but in general seems to be better by around 2%. Nofib shows only small differences of about +/- ~0.5% overall depending on flags/machine performance in other benchmarks improved significantly. Other benchmarks includes at least the benchmarks of: aeson, vector, megaparsec, attoparsec, containers, text and xeno. While the magnitude of gains differed three different CPUs where tested with all getting faster although to differing degrees. I tested: Sandy Bridge(Xeon), Haswell, Skylake * Library benchmark results summarized: * containers: ~1.5% faster * aeson: ~2% faster * megaparsec: ~2-5% faster * xml library benchmarks: 0.2%-1.1% faster * vector-benchmarks: 1-4% faster * text: 5.5% faster On average GHC compile times go down, as GHC compiled with the new layout is faster than the overhead introduced by using the new layout algorithm, Things this patch does: * Move code responsilbe for block layout in it's own module. * Move the NcgImpl Class into the NCGMonad module. * Extract a control flow graph from the input cmm. * Update this cfg to keep it in sync with changes during asm codegen. This has been tested on x64 but should work on x86. Other platforms still use the old codelayout. * Assign weights to the edges in the CFG based on type and limited static analysis which are then used for block layout. * Once we have the final code layout eliminate some redundant jumps. In particular turn a sequences of: jne .foo jmp .bar foo: into je bar foo: .. Test Plan: ci Reviewers: bgamari, jmct, jrtc27, simonmar, simonpj, RyanGlScott Reviewed By: RyanGlScott Subscribers: RyanGlScott, trommler, jmct, carter, thomie, rwbarton GHC Trac Issues: #15124 Differential Revision: https://phabricator.haskell.org/D4726
*	Replace most occurences of foldl with foldl'.	klebinger.andreas@gmx.at	2018-08-21	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds foldl' to GhcPrelude and changes must occurences of foldl to foldl'. This leads to better performance especially for quick builds where GHC does not perform strictness analysis. It does change strictness behaviour when we use foldl' to turn a argument list into function applications. But this is only a drawback if code looks ONLY at the last argument but not at the first. And as the benchmarks show leads to fewer allocations in practice at O2. Compiler performance for Nofib: O2 Allocations: -1 s.d. ----- -0.0% +1 s.d. ----- -0.0% Average ----- -0.0% O2 Compile Time: -1 s.d. ----- -2.8% +1 s.d. ----- +1.3% Average ----- -0.8% O0 Allocations: -1 s.d. ----- -0.2% +1 s.d. ----- -0.1% Average ----- -0.2% Test Plan: ci Reviewers: goldfire, bgamari, simonmar, tdammers, monoidal Reviewed By: bgamari, monoidal Subscribers: tdammers, rwbarton, thomie, carter Differential Revision: https://phabricator.haskell.org/D4929
*	Make shortcutting at the asm stage toggleable and default for O2.	Andreas Klebinger	2018-04-13	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Shortcutting during the asm stage of codegen is often redundant as most cases get caught during the Cmm passes. For example during compilation of all of nofib only 508 jumps are eleminated. For this reason I moved the pass from -O1 to -O2. I also made it toggleable with -fasm-shortcutting. Test Plan: ci Reviewers: bgamari Reviewed By: bgamari Subscribers: thomie, carter Differential Revision: https://phabricator.haskell.org/D4555
*	nativeGen: Use foldl' instead of foldl	Ben Gamari	2017-11-28	1	-1/+1
\|
*	nativeGen: Use plusUFMList instead of foldr	Ben Gamari	2017-11-28	1	-2/+2
\|
*	compiler: introduce custom "GhcPrelude" Prelude	Herbert Valerio Riedel	2017-09-19	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This switches the compiler/ component to get compiled with -XNoImplicitPrelude and a `import GhcPrelude` is inserted in all modules. This is motivated by the upcoming "Prelude" re-export of `Semigroup((<>))` which would cause lots of name clashes in every modulewhich imports also `Outputable` Reviewers: austin, goldfire, bgamari, alanz, simonmar Reviewed By: bgamari Subscribers: goldfire, rwbarton, thomie, mpickering, bgamari Differential Revision: https://phabricator.haskell.org/D3989
*	Add support for producing position-independent executables	Ben Gamari	2017-08-22	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously due to #12759 we disabled PIE support entirely. However, this breaks the user's ability to produce PIEs. Add an explicit flag, -fPIE, allowing the user to build PIEs. Test Plan: Validate Reviewers: rwbarton, austin, simonmar Subscribers: trommler, simonmar, trofi, jrtc27, thomie GHC Trac Issues: #12759, #13702 Differential Revision: https://phabricator.haskell.org/D3589
*	Use correct section types syntax for architecture	Ben Gamari	2017-07-11	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously GHC would always assume that section types began with `@` while producing assembly, which is not true. For instance, in ARM assembly syntax section types begin with `%`. This abstracts out section type pretty-printing and adjusts it to correctly account for the target architectures assembly flavor. Reviewers: austin, hvr, Phyx Reviewed By: Phyx Subscribers: Phyx, rwbarton, thomie, erikd GHC Trac Issues: #13937 Differential Revision: https://phabricator.haskell.org/D3712
*	Hoopl: remove dependency on Hoopl package	Michal Terepeta	2017-06-23	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This copies the subset of Hoopl's functionality needed by GHC to `cmm/Hoopl` and removes the dependency on the Hoopl package. The main motivation for this change is the confusing/noisy interface between GHC and Hoopl: - Hoopl has `Label` which is GHC's `BlockId` but different than GHC's `CLabel` - Hoopl has `Unique` which is different than GHC's `Unique` - Hoopl has `Unique{Map,Set}` which are different than GHC's `Uniq{FM,Set}` - GHC has its own specialized copy of `Dataflow`, so `cmm/Hoopl` is needed just to filter the exposed functions (filter out some of the Hoopl's and add the GHC ones) With this change, we'll be able to simplify this significantly. It'll also be much easier to do invasive changes (Hoopl is a public package on Hackage with users that depend on the current behavior) This should introduce no changes in functionality - it merely copies the relevant code. Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: ./validate Reviewers: austin, bgamari, simonmar Reviewed By: bgamari, simonmar Subscribers: simonpj, kavon, rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3616
*	Replace Digraph's Node type synonym with a data type	Matthew Pickering	2017-04-04	1	-8/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	This refactoring makes it more obvious when we are constructing a Node for the digraph rather than a less useful 3-tuple. Reviewers: austin, goldfire, bgamari, simonmar, dfeuer Reviewed By: dfeuer Subscribers: rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3414
*	Generalize CmmUnwind and pass unwind information through NCG	Ben Gamari	2017-02-08	1	-9/+87
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As discussed in D1532, Trac Trac #11337, and Trac Trac #11338, the stack unwinding information produced by GHC is currently quite approximate. Essentially we assume that register values do not change at all within a basic block. While this is somewhat true in normal Haskell code, blocks containing foreign calls often break this assumption. This results in unreliable call stacks, especially in the code containing foreign calls. This is worse than it sounds as unreliable unwinding information can at times result in segmentation faults. This patch set attempts to improve this situation by tracking unwinding information with finer granularity. By dispensing with the assumption of one unwinding table per block, we allow the compiler to accurately represent the areas surrounding foreign calls. Towards this end we generalize the representation of unwind information in the backend in three ways, * Multiple CmmUnwind nodes can occur per block * CmmUnwind nodes can now carry unwind information for multiple registers (while not strictly necessary; this makes emitting unwinding information a bit more convenient in the compiler) * The NCG backend is given an opportunity to modify the unwinding records since it may need to make adjustments due to, for instance, native calling convention requirements for foreign calls (see #11353). This sets the stage for resolving #11337 and #11338. Test Plan: Validate Reviewers: scpmw, simonmar, austin, erikd Subscribers: qnikst, thomie Differential Revision: https://phabricator.haskell.org/D2741
*	Fix terminal corruption bug and clean up SDoc interface.	Phil Ruffwind	2017-01-10	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Fix #13076 by wrapping `printDoc_` so that the terminal color is reset even if an exception occurs. - Add `printSDoc`, `printSDocLn`, and `bufLeftRenderSDoc` to keep `SDoc` values abstract (they are wrappers of `printDoc_`, `printDoc`, and `bufLeftRender` respectively). - Remove unused function: `printForAsm` Test Plan: manual Reviewers: RyanGlScott, austin, dfeuer, bgamari Reviewed By: dfeuer, bgamari Subscribers: dfeuer, mpickering, thomie Differential Revision: https://phabricator.haskell.org/D2932 GHC Trac Issues: #13076
*	BlockId: remove BlockMap and BlockSet synonyms	Michal Terepeta	2016-12-08	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This continues removal of `BlockId` module in favor of Hoopl's `Label`. Most of the changes here are mechanical, apart from the orphan `Outputable` instances for `LabelMap` and `LabelSet`. For now I've moved them to `cmm/Hoopl`, since it's already trying to manage all imports from Hoopl (to avoid any collisions). Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: validate Reviewers: bgamari, austin, simonmar Reviewed By: simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2800
*	Remove most functions from cmm/BlockId	Michal Terepeta	2016-11-29	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It seems that `BlockId` module could simply go away in favor of Hoopl's `Label`. This is the first step to do that. In a few places I had to add some type signatures, but most of them seem to help with code readability. Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: ./validate Reviewers: austin, simonmar, bgamari Reviewed By: bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2765
*	AsmCodeGen: Refactor worker in cmmNativeGens	Ben Gamari	2016-11-29	1	-10/+13
\| \| \| \| \| \| \| \| \| \| \| \|	Test Plan: Validate Reviewers: austin, simonmar, michalt Reviewed By: simonmar, michalt Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2736
*	Test for unnecessary register spills	Thomas Jakway	2016-11-15	1	-1/+12
\| \| \| \| \| \| \| \| \| \| \| \|	Reviewers: mainland, simonmar, michalt, bgamari, austin Reviewed By: bgamari Subscribers: simonpj, mpickering, thomie Differential Revision: https://phabricator.haskell.org/D2638 GHC Trac Issues: #12744, #12745
*	AsmCodeGen: Give linear-scan and coloring reg. allocators different cc names	Ömer Sinan Ağacan	2016-08-06	1	-2/+2
\|
*	Document some codegen nondeterminism	Bartosz Nitka	2016-07-07	1	-1/+3
\| \| \| \| \| \| \| \| \|	Bit-for-bit reproducible binaries are not a goal for now, so this is just marking places that could be a problem. Doing this will allow eltsUFM to be removed and will leave only nonDetEltsUFM. GHC Trac: #4012
*	nativeGen: Allow -fregs-graph to be used	Ben Gamari	2016-06-30	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously the flag was silently ignored due the #7679 and #8657. This, however, seems unnecessarily brutal and makes experimentation unduly difficult for users. Test Plan: Validate Reviewers: austin, simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2335 GHC Trac Issues: #7679, #8657
*	Provide Uniquable version of SCC	Bartosz Nitka	2016-06-23	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We want to remove the `Ord Unique` instance because there's no way to implement it in deterministic way and it's too easy to use by accident. We sometimes compute SCC for datatypes whose Ord instance is implemented in terms of Unique. The Ord constraint on SCC is just an artifact of some internal data structures. We can have an alternative implementation with a data structure that uses Uniquable instead. This does exactly that and I'm pleased that I didn't have to introduce any duplication to do that. Test Plan: ./validate I looked at performance tests and it's a tiny bit better. Reviewers: bgamari, simonmar, ezyang, austin, goldfire Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2359 GHC Trac Issues: #4012
*	Replace calls to `ptext . sLit` with `text`	Jan Stolarek	2016-01-18	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: In the past the canonical way for constructing an SDoc string literal was the composition `ptext . sLit`. But for some time now we have function `text` that does the same. Plus it has some rules that optimize its runtime behaviour. This patch takes all uses of `ptext . sLit` in the compiler and replaces them with calls to `text`. The main benefits of this patch are clener (shorter) code and less dependencies between module, because many modules now do not need to import `FastString`. I don't expect any performance benefits - we mostly use SDocs to report errors and it seems there is little to be gained here. Test Plan: ./validate Reviewers: bgamari, austin, goldfire, hvr, alanz Subscribers: goldfire, thomie, mpickering Differential Revision: https://phabricator.haskell.org/D1784
*	Remove some redundant definitions/constraints	Herbert Valerio Riedel	2015-12-31	1	-1/+0
\| \| \| \| \| \|	Starting with GHC 7.10 and base-4.8, `Monad` implies `Applicative`, which allows to simplify some definitions to exploit the superclass relationship. This a first refactoring to that end.
*	Drop pre-AMP compatibility CPP conditionals	Herbert Valerio Riedel	2015-12-31	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \| \|	Since GHC 8.1/8.2 only needs to be bootstrap-able by GHC 7.10 and GHC 8.0 (and GHC 8.2), we can now finally drop all that pre-AMP compatibility CPP-mess for good! Reviewers: austin, goldfire, bgamari Subscribers: goldfire, thomie, erikd Differential Revision: https://phabricator.haskell.org/D1724