delta/haskell.git - gitlab.haskell.org: ghc/ghc.git

	Commit message	Author	Age	Files	Lines
*	Modules: CmmToAsm (#13009)	Sylvain Henry	2020-02-24	18	-3912/+0
*	Modules: Driver (#13009)	Sylvain Henry	2020-02-21	7	-7/+7
	submodule updates: nofib, haddock
*	Do CafInfo/SRT analysis in Cmm	Ömer Sinan Ağacan	2020-01-31	6	-18/+18
	This patch removes all CafInfo predictions and various hacks to preserve predicted CafInfos from the compiler and assigns final CafInfos to interface Ids after code generation. SRT analysis is extended to support static data, and the Cmm generator is modified to allow generating static_link fields after SRT analysis.

	This also fixes `-fcatch-bottoms`, which introduces error calls in case expressions in CorePrep, which runs after CoreTidy (which is where we decide on CafInfos) and turns previously non-CAFFY things into CAFFY.

	Fixes #17648
	Fixes #9718

	Evaluation
	==========

	NoFib
	-----

	Boot with: `make boot mode=fast`
	Run: `make mode=fast EXTRA_RUNTEST_OPTS="-cachegrind" NoFibRuns=1`

	--------------------------------------------------------
	Program             Size  Allocs  Instrs   Reads  Writes
	--------------------------------------------------------
	CS                 -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	CSD                -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	FS                 -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	S                  -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	VS                 -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	VSD                -0.0%    0.0%   -0.0%   -0.0%   -0.5%
	VSM                -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	anna               -0.1%    0.0%   -0.0%   -0.0%   -0.0%
	ansi               -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	atom               -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	awards             -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	banner             -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	bernouilli         -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	binary-trees       -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	boyer              -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	boyer2             -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	bspt               -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	cacheprof          -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	calendar           -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	cichelli           -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	circsim            -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	clausify           -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	comp_lab_zift      -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	compress           -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	compress2          -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	constraints        -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	cryptarithm1       -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	cryptarithm2       -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	cse                -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	digits-of-e1       -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	digits-of-e2       -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	dom-lt             -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	eliza              -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	event              -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	exact-reals        -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	exp3_8             -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	expert             -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	fannkuch-redux     -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	fasta              -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	fem                -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	fft                -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	fft2               -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	fibheaps           -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	fish               -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	fluid              -0.1%    0.0%   -0.0%   -0.0%   -0.0%
	fulsom             -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	gamteb             -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	gcd                -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	gen_regexps        -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	genfft             -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	gg                 -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	grep               -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	hidden             -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	hpg                -0.1%    0.0%   -0.0%   -0.0%   -0.0%
	ida                -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	infer              -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	integer            -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	integrate          -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	k-nucleotide       -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	kahan              -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	knights            -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	lambda             -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	last-piece         -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	lcss               -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	life               -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	lift               -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	linear             -0.1%    0.0%   -0.0%   -0.0%   -0.0%
	listcompr          -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	listcopy           -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	maillist           -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	mandel             -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	mandel2            -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	mate               -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	minimax            -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	mkhprog            -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	multiplier         -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	n-body             -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	nucleic2           -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	para               -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	paraffins          -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	parser             -0.1%    0.0%   -0.0%   -0.0%   -0.0%
	parstof            -0.1%    0.0%   -0.0%   -0.0%   -0.0%
	pic                -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	pidigits           -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	power              -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	pretty             -0.0%    0.0%   -0.3%   -0.4%   -0.4%
	primes             -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	primetest          -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	prolog             -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	puzzle             -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	queens             -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	reptile            -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	reverse-complem    -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	rewrite            -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	rfib               -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	rsa                -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	scc                -0.0%    0.0%   -0.3%   -0.5%   -0.4%
	sched              -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	scs                -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	simple             -0.1%    0.0%   -0.0%   -0.0%   -0.0%
	solid              -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	sorting            -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	spectral-norm      -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	sphere             -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	symalg             -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	tak                -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	transform          -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	treejoin           -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	typecheck          -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	veritas            -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	wang               -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	wave4main          -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	wheel-sieve1       -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	wheel-sieve2       -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	x2n1               -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	--------------------------------------------------------
	Min                -0.1%    0.0%   -0.3%   -0.5%   -0.5%
	Max                -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	Geometric Mean     -0.0%   -0.0%   -0.0%   -0.0%   -0.0%

	--------------------------------------------------------
	Program             Size  Allocs  Instrs   Reads  Writes
	--------------------------------------------------------
	circsim            -0.1%    0.0%   -0.0%   -0.0%   -0.0%
	constraints        -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	fibheaps           -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	gc_bench           -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	hash               -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	lcss               -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	power              -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	spellcheck         -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	--------------------------------------------------------
	Min                -0.1%    0.0%   -0.0%   -0.0%   -0.0%
	Max                -0.0%    0.0%   -0.0%   -0.0%   -0.0%
	Geometric Mean     -0.0%   +0.0%   -0.0%   -0.0%   -0.0%

	Manual inspection of programs in testsuite/tests/programs
	---------------------------------------------------------

	I built these programs with a bunch of dump flags and `-O` and compared STG, Cmm, and Asm dumps and file sizes.

	(Below, the numbers in parentheses show the number of modules in the program.)

	These programs have identical compiler outputs (same .hi and .o sizes, and same STG, Cmm and Asm dumps):

	- Queens (1), andre_monad (1), cholewo-eval (2), cvh_unboxing (3), andy_cherry (7), fun_insts (1), hs-boot (4), fast2haskell (2), jl_defaults (1), jq_readsPrec (1), jules_xref (1), jtod_circint (4), jules_xref2 (1), lennart_range (1), lex (1), life_space_leak (1), bargon-mangler-bug (7), record_upd (1), rittri (1), sanders_array (1), strict_anns (1), thurston-module-arith (2), okeefe_neural (1), joao-circular (6), 10queens (1)

	Programs with different compiler outputs:

	- jl_defaults (1): GHC HEAD marks a lot of top-level `[Int]` closures as CAFFY for no apparent reason. With this patch we no longer make them CAFFY and generate fewer SRT entries. For some reason Main.o is slightly larger with this patch (1.3%) and the executable sizes are the same. (I'd expect both to be smaller.)

	- launchbury (1): Same as jl_defaults: top-level `[Int]` closures marked as CAFFY for no apparent reason. Similarly, `Main.o` is 1.4% larger but the executable sizes are the same.

	- galois_raytrace (13): Differences are in the Parse module. There are a lot, but some of the changes are caused by the fact that for some reason (I think a bug) GHC HEAD marks the dictionary for `Functor Identity` as CAFFY. Parse.o is 0.4% larger, the executable size is the same.

	- north_array: We now generate fewer SRT entries because some of the array primops used in this program, like `NewArrayOp`, get eliminated during Stg-to-Cmm and turn some CAFFY things into non-CAFFY. Main.o gets about 2.5% larger (9224 bytes, up from 9000 bytes), executable sizes are the same.

	- seward-space-leak: The difference in this program is better shown by this smaller example:

	      module Lib where

	      data CDS
	        = Case [CDS] [(Int, CDS)]
	        | Call CDS CDS

	      instance Eq CDS where
	        Case sels1 rets1 == Case sels2 rets2 =
	          sels1 == sels2 && rets1 == rets2
	        Call a1 b1 == Call a2 b2 =
	          a1 == a2 && b1 == b2
	        _ == _ =
	          False

	  In this program GHC HEAD builds a new SRT for the recursive group of `(==)`, `(/=)` and the dictionary closure. Then `/=` points to `==` in its SRT field, and `==` uses the SRT object as its SRT. With this patch we use the closure for `/=` as the SRT and add `==` there. Then `/=` gets an empty SRT field and `==` points to `/=` in its SRT field.

	  This change looks fine to me.

	  Main.o gets 0.07% larger, executable sizes are identical.

	head.hackage
	------------

	head.hackage's CI script builds 428 packages from Hackage using this patch with no failures.

	Compiler performance
	--------------------

	The compiler perf tests report that the compiler allocates slightly more (the worst case observed so far is 4%). However, most programs in the test suite are small, single-file programs. To benchmark compiler performance on something more realistic I built Cabal (the library, 236 modules) with different optimisation levels. For the "max residency" row I ran GHC with `+RTS -s -A100k -i0 -h` for more accurate numbers. The other rows were generated with just `-s`.
	(This is because `-i0` causes the GC to run much more frequently and as a result "bytes copied" gets inflated by more than 25x in some cases.)

	* -O0

	|                 | GHC HEAD       | This MR        | Diff   |
	| --------------- | -------------- | -------------- | ------ |
	| Bytes allocated | 54,413,350,872 | 54,701,099,464 | +0.52% |
	| Bytes copied    |  4,926,037,184 |  4,990,638,760 | +1.31% |
	| Max residency   |    421,225,624 |    424,324,264 | +0.73% |

	* -O1

	|                 | GHC HEAD        | This MR         | Diff   |
	| --------------- | --------------- | --------------- | ------ |
	| Bytes allocated | 245,849,209,992 | 246,562,088,672 | +0.28% |
	| Bytes copied    |  26,943,452,560 |  27,089,972,296 | +0.54% |
	| Max residency   |     982,643,440 |     991,663,432 | +0.91% |

	* -O2

	|                 | GHC HEAD        | This MR         | Diff   |
	| --------------- | --------------- | --------------- | ------ |
	| Bytes allocated | 291,044,511,408 | 291,863,910,912 | +0.28% |
	| Bytes copied    |  37,044,237,616 |  36,121,690,472 | -2.49% |
	| Max residency   |   1,071,600,328 |   1,086,396,256 | +1.38% |

	Extra compiler allocations
	--------------------------

	Runtime allocations of programs are as reported above (NoFib section).

	The compiler now allocates more than before. The main source of allocation in this patch compared to the base commit is the new SRT algorithm (GHC.Cmm.Info.Build). Below is some of the extra work we do with this patch, with numbers generated by a profiled stage 2 compiler when building a pathological case (the test 'ManyConstructors') with '-O2':

	- We now sort the final STG for a module, which means traversing the entire program, generating the free variable set for each top-level binding, doing SCC analysis, and re-ordering the program. In ManyConstructors this step allocates 97,889,952 bytes.

	- We now do SRT analysis on static data, which in a program like ManyConstructors means analysing 10,000 bindings that we would previously just skip. This step allocates 70,898,352 bytes.

	- We now maintain an SRT map for the entire module as we compile Cmm groups:

	      data ModuleSRTInfo = ModuleSRTInfo
	        { ...
	        , moduleSRTMap :: SRTMap
	        }

	  (SRTMap is just a strict Map from the 'containers' library)

	  This map gets an entry for most bindings in a module (exceptions are THUNKs and CAFFY static functions). For ManyConstructors this map gets 50015 entries.

	- Once we're done with code generation we generate a NameSet from SRTMap for the non-CAFFY names in the current module. This set gets the same number of entries as the SRTMap.

	- Finally we update CafInfos in ModDetails for the non-CAFFY Ids, using the NameSet generated in the previous step. This usually does the least amount of allocation among the work listed here.

	The only place where we do less work with this patch is the CAF analysis in the tidying pass (CoreTidy). However, that doesn't save us much, as the pass still needs to traverse the whole program and update IdInfos for other reasons. The only thing we no longer do here is the `hasCafRefs` pass over the RHS of bindings, which is a stateless pass that returns a boolean value, so it doesn't allocate much.

	(Metric changes below are all increased allocations)

	Metric changes
	--------------

	Metric Increase:
	    ManyAlternatives
	    ManyConstructors
	    T13035
	    T14683
	    T1969
	    T9961
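
The STG sorting step described in the message above (free variables per top-level binding, SCC analysis, re-ordering) can be sketched with `Data.Graph` from the `containers` library. This is only an illustration under stated assumptions, not the code from the patch: the `Binding` type, its fields, and the functions `sortBindings` and `reorder` are hypothetical names.

```haskell
import Data.Graph (SCC, flattenSCC, stronglyConnComp)

-- Hypothetical stand-in for a top-level binding: its name plus the
-- top-level names that occur free in its right-hand side.
data Binding = Binding
  { bndrName :: String
  , freeVars :: [String]
  } deriving (Eq, Show)

-- Group bindings into strongly connected components. stronglyConnComp
-- returns the components dependencies-first, so a binding appears after
-- everything it refers to; mutually recursive bindings share a component.
sortBindings :: [Binding] -> [SCC Binding]
sortBindings bs =
  stronglyConnComp [ (b, bndrName b, freeVars b) | b <- bs ]

-- Flatten the components back into a re-ordered program.
reorder :: [Binding] -> [Binding]
reorder = concatMap flattenSCC . sortBindings
```

Free variables that do not name a binding in the list (for example imported names) are ignored by `stronglyConnComp`, which is convenient for a per-module sort.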
*	Disable two warnings for files that trigger them	Tom Ellis	2020-01-27	1	-0/+2
	incomplete-uni-patterns and incomplete-record-updates will be in -Wall at a future date, so prepare for that by disabling those warnings on files that trigger them.
*	Module hierarchy: Cmm (cf #13009)	Sylvain Henry	2020-01-25	13	-30/+30
*	Add `timesInt2#` primop	Sylvain Henry	2019-12-02	1	-0/+1
*	Fix typos, using Wikipedia list of common typos	Brian Wignall	2019-11-28	1	-1/+1
*	Clean up `#include`s in the compiler	John Ericson	2019-10-05	1	-1/+0
	- Remove unneeded ones
	- Use <..> for inter-package.

	Besides general clean up, helps distinguish between the RTS we link against vs the RTS we compile for.
*	Remove empty NCG.h	John Ericson	2019-09-13	3	-3/+0
*	Module hierarchy: StgToCmm (#13009)	Sylvain Henry	2019-09-10	3	-3/+3
	Add StgToCmm module hierarchy. Platform modules that are used in several other places (NCG, LLVM codegen, Cmm transformations) are put into GHC.Platform.
*	Remove unused imports of the form 'import foo ()' (Fixes #17065)	James Foster	2019-08-15	6	-6/+4
	These kinds of imports are necessary in some cases such as importing instances of typeclasses or intentionally creating dependencies in the build system, but '-Wunused-imports' can't detect when they are no longer needed. This commit removes the unused ones currently in the code base (not including test files or submodules), with the hope that doing so may increase parallelism in the build system by removing unnecessary dependencies.
*	Revert "Add support for SIMD operations in the NCG"	Ben Gamari	2019-07-16	2	-5/+11
	Unfortunately this will require more work; register allocation is quite broken.

	This reverts commit acd795583625401c5554f8e04ec7efca18814011.
*	Add support for SIMD operations in the NCG	Abhiroop Sarkar	2019-07-03	2	-11/+5
	This adds support for constructing vector types from Float#, Double# etc. and performing arithmetic operations on them.

	Cleaned-Up-By: Ben Gamari <ben@well-typed.com>
*	Correct closure observation, construction, and mutation on weak memory machines.	Travis Whitaker	2019-06-28	1	-0/+3
	Here the following changes are introduced:

	- A read barrier machine op is added to Cmm.
	- The order in which a closure's fields are read and written is changed.
	- Memory barriers are added to RTS code to ensure correctness on out-of-order machines with weak memory ordering.

	Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this is lowered to an instruction that ensures memory reads that occur after said instruction in program order are not performed before reads coming before said instruction in program order. On machines with strong memory ordering properties (e.g. X86, SPARC in TSO mode) no such instruction is necessary, so MO_ReadBarrier is simply erased. However, such an instruction is necessary on weakly ordered machines, e.g. ARM and PowerPC.

	Weak memory ordering has consequences for how closures are observed and mutated. For example, consider a closure that needs to be updated to an indirection. In order for the indirection to be safe for concurrent observers to enter, said observers must read the indirection's info table before they read the indirectee. Furthermore, the entering observer makes assumptions about the closure based on its info table contents, e.g. an INFO_TYPE of IND implies the closure has an indirectee pointer that is safe to follow.

	When a closure is updated with an indirection, both its info table and its indirectee must be written. With weak memory ordering, these two writes can be arbitrarily reordered, and perhaps even interleaved with other threads' reads and writes (in the absence of memory barrier instructions). Consider this example of a bad reordering:

	- An updater writes to a closure's info table (INFO_TYPE is now IND).
	- A concurrent observer branches upon reading the closure's INFO_TYPE as IND.
	- A concurrent observer reads the closure's indirectee and enters it. (!!!)
	- An updater writes the closure's indirectee.

	Here the update to the indirectee comes too late and the concurrent observer has jumped off into the abyss. Speculative execution can also cause us issues. Consider:

	- An observer is about to case on a value in a closure's info table.
	- The observer speculatively reads one or more of the closure's fields.
	- An updater writes to the closure's info table.
	- The observer takes a branch based on the new info table value, but with the old closure fields!
	- The updater writes to the closure's other fields, but it's too late.

	Because of these effects, reads and writes to a closure's info table must be ordered carefully with respect to reads and writes to the closure's other fields, and memory barriers must be placed to ensure that reads and writes occur in program order. Specifically, updates to a closure must follow this pattern:

	- Update the closure's (non-info table) fields.
	- Write barrier.
	- Update the closure's info table.

	Observing a closure's fields must follow this pattern:

	- Read the closure's info pointer.
	- Read barrier.
	- Read the closure's (non-info table) fields.

	This patch updates RTS code to obey this pattern. This should fix long-standing SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting out-of-order execution) and PowerPC. This fixes issue #15449.

	Co-Authored-By: Ben Gamari <ben@well-typed.com>
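
The two bad scenarios and the update/observe patterns from the message above can be checked with a small pure model. This is a toy sketch, not RTS code: all names (`Event`, `World`, `alwaysSafe`, and so on) are hypothetical, a barrier is modelled simply as "the thread's events happen in the listed order", and a missing barrier is modelled by listing the events in the reordered order.

```haskell
-- Events performed by the updater (writes) and the observer (reads).
data Event = WriteInfo | WritePayload | ReadInfo | ReadPayload
  deriving (Eq, Show)

-- Shared closure state plus what the observer has seen so far.
data World = World
  { infoIsInd    :: Bool        -- info table already says IND?
  , payloadReady :: Bool        -- indirectee already written?
  , sawInd       :: Maybe Bool  -- result of the observer's info read
  , sawPayload   :: Maybe Bool  -- was the indirectee valid when read?
  }

step :: World -> Event -> World
step w WriteInfo    = w { infoIsInd = True }
step w WritePayload = w { payloadReady = True }
step w ReadInfo     = w { sawInd = Just (infoIsInd w) }
step w ReadPayload  = w { sawPayload = Just (payloadReady w) }

-- All interleavings of two threads that keep each thread's own order.
interleavings :: [a] -> [a] -> [[a]]
interleavings [] ys = [ys]
interleavings xs [] = [xs]
interleavings (x : xs) (y : ys) =
  map (x :) (interleavings xs (y : ys))
    ++ map (y :) (interleavings (x : xs) ys)

-- The observer goes wrong if it saw IND but read a stale indirectee.
isUnsafe :: World -> Bool
isUnsafe w = sawInd w == Just True && sawPayload w == Just False

-- True if no interleaving of the two per-thread orders is unsafe.
alwaysSafe :: [Event] -> [Event] -> Bool
alwaysSafe updater observer =
  not (any (isUnsafe . foldl step start) (interleavings updater observer))
  where
    start = World False False Nothing Nothing
```

In this model, `alwaysSafe [WritePayload, WriteInfo] [ReadInfo, ReadPayload]` corresponds to the pattern the patch enforces, while swapping either thread's order reproduces one of the two failures described in the message.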
*	Move 'Platform' to ghc-boot	John Ericson	2019-06-19	4	-4/+4
	ghc-pkg needs to be aware of platforms so it can figure out which subdirectory within the user package db to use. This is admittedly roundabout, but maybe Cabal could use the same notion of a platform as GHC to good effect too.
*	Introduce log1p and expm1 primops	chessai	2019-06-09	1	-0/+4
	Previously log and exp were primitives yet log1p and expm1 were FFI calls. Fix this non-uniformity.
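
For readers unfamiliar with these functions, here is a minimal sketch of why `log1p` and `expm1` exist at all; it says nothing about how the primops are implemented. It assumes a `base` version whose `Numeric` module exports the `Floating` methods `log1p` and `expm1`; the `naive*` helpers and `tiny` are hypothetical names.

```haskell
import Numeric (expm1, log1p)

-- Naive formulations lose all precision for tiny arguments, because
-- 1 + x (and exp x) round to exactly 1 in Double arithmetic.
naiveLog1p :: Double -> Double
naiveLog1p x = log (1 + x)

naiveExpm1 :: Double -> Double
naiveExpm1 x = exp x - 1

-- A value far below Double's machine epsilon.
tiny :: Double
tiny = 1e-20

-- Compare the naive and dedicated versions on the tiny input.
comparison :: [(String, Double, Double)]
comparison =
  [ ("log1p", naiveLog1p tiny, log1p tiny)
  , ("expm1", naiveExpm1 tiny, expm1 tiny)
  ]
```

For `tiny`, both naive versions collapse to zero, whereas the dedicated functions return a small positive result.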
*	asm-emit-time IND_STATIC elimination	Gabor Greif	2019-04-15	1	-1/+10
	When a new closure identifier is being established to a local or exported closure already emitted into the same module, refrain from adding an IND_STATIC closure, and instead emit an assembly-language alias.

	Inter-module IND_STATIC objects still remain, and need to be addressed by other measures.

	Binary-size savings on nofib are around 0.1%.
*	removing x87 register support from native code gen	Carter Schonwald	2019-04-10	3	-10/+4
	* simplifies registers to have GPR, Float and Double, by removing the SSE2 and X87 constructors
	* makes -msse2 assumed/default for x86 platforms, fixing a long-standing nondeterminism in rounding behavior in 32-bit Haskell code
	* removes the 80-bit floating point representation from the supported float sizes
	* there's still one tiny bit of x87 support needed, for handling float and double return values in FFI calls wrt the C ABI on x86_32, but this one piece does not leak into the rest of the NCG
	* lots of code that has not been touched in a long time got deleted as a consequence of all of this

	All in all, this change paves the way towards a lot of future improvements in how GHC handles floating point computations, along with making the native code gen more accessible to a larger pool of contributors.
*	Add support for bitreverse primop	Alexandre	2019-04-01	1	-0/+1
	This commit includes the necessary changes in code and documentation to support a primop that reverses a word's bits. It also includes a test.
*	Rip out object splitting	Ben Gamari	2019-03-05	1	-7/+4
	The splitter is an evil Perl script that processes assembler code. Its job can be done better by the linker's --gc-sections flag. GHC passes this flag to the linker whenever -split-sections is passed on the command line.

	This is based on @DemiMarie's D2768.

	Fixes Trac #11315
	Fixes Trac #9832
	Fixes Trac #8964
	Fixes Trac #8685
	Fixes Trac #8629
*	NCG: fast compilation of very large strings (#16190)	Sylvain Henry	2019-02-14	1	-7/+1
	This patch adds an optimization into the NCG: for large strings (threshold configurable via -fbinary-blob-threshold=NNN flag), instead of printing `.asciz "..."` in the generated ASM source, we print `.incbin "tmpXXX.dat"` and we dump the contents of the string into a temporary "tmpXXX.dat" file.

	See the note for more details.
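
The choice between the two directives can be pictured roughly as follows. This is a hypothetical sketch, not the NCG's code: the `StringDirective` type, `chooseDirective`, `render`, and the use of `>=` for the threshold comparison are all assumptions made for illustration.

```haskell
-- How a string literal could be emitted in the assembly output.
data StringDirective
  = Asciz String     -- literal printed inline:        .asciz "..."
  | IncBin FilePath  -- bytes dumped to a side file:   .incbin "file"
  deriving (Eq, Show)

-- Pick a directive from the configured threshold (assumed to be a size in
-- bytes), the name of the temporary file, and the literal itself.
chooseDirective :: Int -> FilePath -> String -> StringDirective
chooseDirective threshold tmpFile str
  | length str >= threshold = IncBin tmpFile
  | otherwise               = Asciz str

-- Render the directive. `show` applies Haskell string escaping, which
-- only approximates what an assembler expects; a real emitter needs
-- its own escaping rules.
render :: StringDirective -> String
render (Asciz s)     = "\t.asciz " ++ show s
render (IncBin path) = "\t.incbin " ++ show path
```

The point of the `.incbin` route is that the assembly text stays small no matter how large the literal is, since only the file name is printed.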
*	Use ByteString to represent Cmm string literals (#16198)	Sylvain Henry	2019-01-31	1	-1/+2
	Also used ByteString in some other relevant places
*	PPC NCG: Reduce memory consumption emitting string literals	Peter Trommler	2019-01-13	1	-10/+5
*	Rename literal constructors	Sylvain Henry	2018-11-22	1	-4/+4
	In a previous patch we replaced some built-in literal constructors (MachInt, MachWord, etc.) with a single LitNumber constructor.

	In this patch we replace the `Mach` prefix of the remaining constructors with `Lit` for consistency (e.g., LitChar, LitLabel, etc.).

	Sadly the name `LitString` was already taken for a kind of FastString and it would become misleading to have both `LitStr` (literal constructor renamed after `MachStr`) and `LitString` (FastString variant). Hence this patch renames the FastString variant `PtrString` (which is more accurate) and the literal string constructor now uses the least surprising `LitString` name.

	Both `Literal` and `LitString/PtrString` have recently seen breaking changes so doing this kind of renaming now shouldn't harm much.

	Reviewers: hvr, goldfire, bgamari, simonmar, jrtc27, tdammers
	Subscribers: tdammers, rwbarton, thomie, carter
	Differential Revision: https://phabricator.haskell.org/D4881
*	NCG: New code layout algorithm.	Andreas Klebinger	2018-11-17	2	-2/+7
	Summary:
	This patch implements a new code layout algorithm. It has been tested for x86 and is disabled on other platforms.

	Performance varies slightly by CPU/machine but in general seems to be better by around 2%. Nofib shows only small differences of about +/- ~0.5% overall depending on flags/machine. Performance in other benchmarks improved significantly. Other benchmarks include at least the benchmarks of: aeson, vector, megaparsec, attoparsec, containers, text and xeno.

	While the magnitude of gains differed, three different CPUs were tested, with all getting faster although to differing degrees. I tested: Sandy Bridge (Xeon), Haswell, Skylake.

	* Library benchmark results summarized:
	  * containers: ~1.5% faster
	  * aeson: ~2% faster
	  * megaparsec: ~2-5% faster
	  * xml library benchmarks: 0.2%-1.1% faster
	  * vector-benchmarks: 1-4% faster
	  * text: 5.5% faster

	On average GHC compile times go down, as GHC compiled with the new layout is faster than the overhead introduced by using the new layout algorithm.

	Things this patch does:

	* Move code responsible for block layout into its own module.
	* Move the NcgImpl class into the NCGMonad module.
	* Extract a control flow graph from the input cmm.
	* Update this cfg to keep it in sync with changes during asm codegen. This has been tested on x64 but should work on x86. Other platforms still use the old code layout.
	* Assign weights to the edges in the CFG based on type and limited static analysis, which are then used for block layout.
	* Once we have the final code layout, eliminate some redundant jumps. In particular turn a sequence of:

	          jne .foo
	          jmp .bar
	      foo:

	  into

	          je bar
	      foo:
	          ..

	Test Plan: ci
	Reviewers: bgamari, jmct, jrtc27, simonmar, simonpj, RyanGlScott
	Reviewed By: RyanGlScott
	Subscribers: RyanGlScott, trommler, jmct, carter, thomie, rwbarton
	GHC Trac Issues: #15124
	Differential Revision: https://phabricator.haskell.org/D4726
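
The redundant-jump elimination mentioned in the last bullet of the message above can be sketched as a peephole pass over a toy instruction type. This is an illustration only, not the NCG's representation: `Instr`, its constructors, and `elimJumps` are hypothetical names, and the symmetric `je` case is an assumption added for completeness.

```haskell
type Label = String

-- A toy instruction set, just rich enough to show the rewrite.
data Instr
  = JNE Label     -- jump if not equal
  | JE Label      -- jump if equal
  | JMP Label     -- unconditional jump
  | Lbl Label     -- label definition
  | Other String  -- any other instruction
  deriving (Eq, Show)

-- Rewrite   jne foo; jmp bar; foo:   into   je bar; foo:
-- The conditional jump only skipped over the unconditional one, so
-- inverting the condition makes the unconditional jump redundant.
elimJumps :: [Instr] -> [Instr]
elimJumps (JNE l : JMP t : Lbl l' : rest)
  | l == l' = JE t : Lbl l' : elimJumps rest
elimJumps (JE l : JMP t : Lbl l' : rest)
  | l == l' = JNE t : Lbl l' : elimJumps rest
elimJumps (i : rest) = i : elimJumps rest
elimJumps [] = []
```

The label itself is kept because other jumps may still target it; only the `jmp` disappears.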
*	Fix precision of asinh/acosh/atanh by making them primops	Artem Pelenitsyn	2018-08-21	1	-0/+8
	Reviewers: hvr, bgamari, simonmar, jrtc27
	Reviewed By: bgamari
	Subscribers: alpmestan, rwbarton, thomie, carter
	Differential Revision: https://phabricator.haskell.org/D5034
*	Turn on MonadFail desugaring by default	Herbert Valerio Riedel	2018-08-07	1	-2/+8
	Summary:
	This contains two commits:

	----

	Make GHC's code-base compatible w/ `MonadFail`

	There were a couple of use-sites which implicitly used pattern-matches in `do`-notation even though the underlying `Monad` didn't explicitly support `fail`.

	This refactoring turns those use-sites into explicit case discriminations and adds a `MonadFail` instance for `UniqSM` (`UniqSM` was the worst offender so this has been postponed for a follow-up refactoring).

	----

	Turn on MonadFail desugaring by default

	This finally implements the phase scheduled for GHC 8.6 according to

	https://prime.haskell.org/wiki/Libraries/Proposals/MonadFail#Transitionalstrategy

	This also preserves some tests that assumed MonadFail desugaring to be active; all ghc boot libs were already made compatible with this `MonadFail` long ago, so no changes were needed there.

	Test Plan: Locally performed ./validate --fast
	Reviewers: bgamari, simonmar, jrtc27, RyanGlScott
	Reviewed By: bgamari
	Subscribers: bgamari, RyanGlScott, rwbarton, thomie, carter
	Differential Revision: https://phabricator.haskell.org/D5028
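
The kind of refactoring described above, from an implicit failable pattern in `do`-notation to an explicit case, looks roughly like this. The functions `headImplicit` and `headExplicit` are hypothetical examples, not code from GHC; the import of `Control.Monad.Fail` is only needed on older `base` versions where `MonadFail` is not in the Prelude.

```haskell
import Control.Monad.Fail (MonadFail)

-- Implicit: the failable pattern (x : _) makes the desugarer call `fail`
-- on a mismatch, so with MonadFail desugaring the monad needs a
-- MonadFail instance, not just Monad.
headImplicit :: MonadFail m => m [a] -> m a
headImplicit act = do
  (x : _) <- act
  pure x

-- Explicit: the same logic with a case expression. No `fail` is involved,
-- so a plain Monad constraint is enough and the empty case is visible.
headExplicit :: Monad m => m [a] -> m (Maybe a)
headExplicit act = do
  xs <- act
  case xs of
    (x : _) -> pure (Just x)
    []      -> pure Nothing
```

In the `Maybe` monad, `headImplicit (Just [])` yields `Nothing` via `fail`, while `headExplicit (Just [])` yields `Just Nothing`, which makes the empty case explicit in the type.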
*	Allow CmmLabelDiffOff with different widths	Simon Marlow	2018-05-16	2	-3/+3
	Summary:
	This change makes it possible to generate a static 32-bit relative label offset on x86_64. Currently we can only generate word-sized label offsets.

	This will be used in D4634 to shrink info tables. See D4632 for more details.

	Test Plan: See D4632
	Reviewers: bgamari, niteria, michalt, erikd, jrtc27, osa1
	Subscribers: thomie, carter
	Differential Revision: https://phabricator.haskell.org/D4633
*	Add 'addWordC#' PrimOp	Sebastian Graf	2018-05-05	1	-0/+1
	This is mostly for congruence with 'subWordC#' and '{add,sub}IntC#'. I found 'plusWord2#' while implementing this, which both lacks documentation and has a slightly different specification than 'addWordC#', which means the generic implementation is unnecessarily complex.

	While I was at it, I also added lacking meta-information on PrimOps and refactored 'subWordC#'s generic implementation to be branchless.

	Reviewers: bgamari, simonmar, jrtc27, dfeuer
	Reviewed By: bgamari, dfeuer
	Subscribers: dfeuer, thomie, carter
	Differential Revision: https://phabricator.haskell.org/D4592
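
As a rough model of what an add-with-carry operation on machine words computes, the following sketch uses ordinary `Word` arithmetic. `addWordCarry` is a hypothetical name, and this is not the primop or its generic implementation; it only illustrates the usual trick of detecting the carry with a comparison rather than a branch on the operands.

```haskell
-- Add two machine words and report whether the addition overflowed.
-- Word addition wraps around, so the sum is smaller than either operand
-- exactly when a carry out of the top bit occurred.
addWordCarry :: Word -> Word -> (Word, Bool)
addWordCarry a b = (s, s < a)
  where
    s = a + b
```

For example, `addWordCarry maxBound 1` gives a zero result with the carry set.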
*	SPARC nativeGen: Support for MO_SS_Conv_W32_W64	Peter Trommler	2018-03-19	1	-0/+18
	Support for signed conversion from 32 bit to 64 bit integers is required by D4363.

	Test Plan: validate (perhaps also on SPARC)
	Reviewers: simonmar, bgamari, kgardas, jrtc27
	Reviewed By: bgamari
	Subscribers: rwbarton, thomie, carter
	Differential Revision: https://phabricator.haskell.org/D4489
*	Add new mbmi and mbmi2 compiler flags	John Ky	2018-01-21	1	-0/+2
	This adds support for the bit deposit and extraction operations provided by the BMI and BMI2 instruction set extensions on modern amd64 machines.

	Implement x86 code generator for pdep and pext.
	Properly initialise bmiVersion field.
	pdep and pext test cases
	Fix pattern match for pdep and pext instructions
	Fix build of pdep and pext code for 32-bit architectures

	Test Plan: Validate
	Reviewers: austin, simonmar, bgamari, angerman
	Reviewed By: bgamari
	Subscribers: trommler, carter, angerman, thomie, rwbarton, newhoggy
	GHC Trac Issues: #14206
	Differential Revision: https://phabricator.haskell.org/D4236
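
For reference, the semantics of bit deposit and bit extraction can be written as a slow, bit-by-bit model. This sketch is not the code generator or the primops: `pextRef` and `pdepRef` are hypothetical names, the word size is fixed to 64 bits, and the argument order (source first, then mask) is an assumption made for this example.

```haskell
import Data.Bits (setBit, testBit)
import Data.Word (Word64)

-- Extract: gather the bits of src at the positions where mask is set
-- and pack them into the low bits of the result.
pextRef :: Word64 -> Word64 -> Word64
pextRef src mask = go 0 0 0
  where
    go :: Int -> Int -> Word64 -> Word64
    go i k acc
      | i >= 64 = acc
      | testBit mask i =
          go (i + 1) (k + 1) (if testBit src i then setBit acc k else acc)
      | otherwise = go (i + 1) k acc

-- Deposit: scatter the low bits of src to the positions where mask is set.
pdepRef :: Word64 -> Word64 -> Word64
pdepRef src mask = go 0 0 0
  where
    go :: Int -> Int -> Word64 -> Word64
    go i k acc
      | i >= 64 = acc
      | testBit mask i =
          go (i + 1) (k + 1) (if testBit src k then setBit acc i else acc)
      | otherwise = go (i + 1) k acc
```

The two functions are inverse to each other on the bits selected by the mask: depositing a value and extracting it again with the same mask returns the original low bits.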
*	Get rid of some stuttering in comments and docs	Gabor Greif	2017-12-19	2	-2/+2
*	cmm: Use LocalBlockLabel instead of AsmTempLabel to represent blocks	Ben Gamari	2017-11-28	1	-2/+2
	blockLbl was originally changed in 8b007abbeb3045900a11529d907a835080129176 to use mkTempAsmLabel to fix an inconsistency resulting in #14221. However, this breaks the C code generator, which doesn't support AsmTempLabels (#14454).

	Instead let's try going the other direction: use a new CLabel variety, LocalBlockLabel. Then we can teach the C code generator to deal with these as well.
*	Revert "Add new mbmi and mbmi2 compiler flags"	Ben Gamari	2017-11-22	1	-2/+0
	This broke the 32-bit build.

	This reverts commit f5dc8ccc29429d0a1d011f62b6b430f6ae50290c.
*	Add new mbmi and mbmi2 compiler flags	John Ky	2017-11-15	1	-0/+2
	This adds support for the bit deposit and extraction operations provided by the BMI and BMI2 instruction set extensions on modern amd64 machines.

	Test Plan: Validate
	Reviewers: austin, simonmar, bgamari, hvr, goldfire, erikd
	Reviewed By: bgamari
	Subscribers: goldfire, erikd, trommler, newhoggy, rwbarton, thomie
	GHC Trac Issues: #14206
	Differential Revision: https://phabricator.haskell.org/D4063
*	Turn `compareByteArrays#` out-of-line primop into inline primop	alexbiehl	2017-10-29	1	-0/+1
	Depends on D4090

	Reviewers: austin, bgamari, erikd, simonmar, alexbiehl
	Reviewed By: bgamari
	Subscribers: rwbarton, thomie
	Differential Revision: https://phabricator.haskell.org/D4091
*	A bunch of typofixes	Gabor Greif	2017-09-26	1	-1/+1
*	compiler: introduce custom "GhcPrelude" Prelude	Herbert Valerio Riedel	2017-09-19	17	-0/+34
	This switches the compiler/ component to get compiled with -XNoImplicitPrelude and an `import GhcPrelude` is inserted in all modules.

	This is motivated by the upcoming "Prelude" re-export of `Semigroup((<>))`, which would cause lots of name clashes in every module which also imports `Outputable`.

	Reviewers: austin, goldfire, bgamari, alanz, simonmar
	Reviewed By: bgamari
	Subscribers: goldfire, rwbarton, thomie, mpickering, bgamari
	Differential Revision: https://phabricator.haskell.org/D3989
*	nativeGen: Consistently use blockLbl to generate CLabels from BlockIds	Ben Gamari	2017-09-19	3	-9/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes #14221, where the NCG and the DWARF code were apparently giving two different names to the same block. Test Plan: Validate with DWARF support enabled. Reviewers: simonmar, austin Subscribers: rwbarton, thomie GHC Trac Issues: #14221 Differential Revision: https://phabricator.haskell.org/D3977
*	Fix typos in diagnostics, testsuite and comments	Gabor Greif	2017-09-07	1	-1/+1
\|
*	Typo fixed	Gabor Greif	2017-08-24	1	-1/+1
\| \| \| \|	and update to the 'nofib' submodule
*	Add support for producing position-independent executables	Ben Gamari	2017-08-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously due to #12759 we disabled PIE support entirely. However, this breaks the user's ability to produce PIEs. Add an explicit flag, -fPIE, allowing the user to build PIEs. Test Plan: Validate Reviewers: rwbarton, austin, simonmar Subscribers: trommler, simonmar, trofi, jrtc27, thomie GHC Trac Issues: #12759, #13702 Differential Revision: https://phabricator.haskell.org/D3589
*	Hoopl: remove dependency on Hoopl package	Michal Terepeta	2017-06-23	2	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This copies the subset of Hoopl's functionality needed by GHC to `cmm/Hoopl` and removes the dependency on the Hoopl package. The main motivation for this change is the confusing/noisy interface between GHC and Hoopl: - Hoopl has `Label` which is GHC's `BlockId` but different than GHC's `CLabel` - Hoopl has `Unique` which is different than GHC's `Unique` - Hoopl has `Unique{Map,Set}` which are different than GHC's `Uniq{FM,Set}` - GHC has its own specialized copy of `Dataflow`, so `cmm/Hoopl` is needed just to filter the exposed functions (filter out some of the Hoopl's and add the GHC ones) With this change, we'll be able to simplify this significantly. It'll also be much easier to do invasive changes (Hoopl is a public package on Hackage with users that depend on the current behavior) This should introduce no changes in functionality - it merely copies the relevant code. Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: ./validate Reviewers: austin, bgamari, simonmar Reviewed By: bgamari, simonmar Subscribers: simonpj, kavon, rwbarton, thomie Differential Revision: https://phabricator.haskell.org/D3616
*	Generate better fp abs for X86 and llvm with default cmm otherwise	Dominic Steinitz	2017-03-07	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently we have this in libraries/base/GHC/Float.hs: ``` abs x \| x == 0 = 0 -- handles (-0.0) \| x > 0 = x \| otherwise = negateFloat x ``` But 3-4 years ago it was noted that this was inefficient: https://mail.haskell.org/pipermail/libraries/2013-April/019690.html We can generate better code for X86 and llvm and for others generate some custom cmm code which is similar to what the compiler generates now. Reviewers: austin, simonmar, hvr, bgamari Reviewed By: bgamari Subscribers: dfeuer, thomie Differential Revision: https://phabricator.haskell.org/D3265
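The guard-based definition quoted in the message above can be written out as a standalone sketch, next to one branch-free alternative that clears the IEEE 754 sign bit. This is an illustration only: `absGuards` and `absViaBits` are hypothetical names, `negate` stands in for `negateFloat`, and the bit-masking version is not claimed to be what the patch emits for X86 or llvm.

```haskell
import Data.Bits ((.&.))
import GHC.Float (castFloatToWord32, castWord32ToFloat)

-- Guard-based version, mirroring the snippet from GHC/Float.hs.
absGuards :: Float -> Float
absGuards x
  | x == 0    = 0         -- handles (-0.0), since (-0.0) == 0 is True
  | x > 0     = x
  | otherwise = negate x  -- negative values (and NaN) end up here

-- Branch-free sketch: mask off the top (sign) bit of the 32-bit encoding.
absViaBits :: Float -> Float
absViaBits x = castWord32ToFloat (castFloatToWord32 x .&. 0x7FFFFFFF)
```

Loaded in GHCi, both functions map negative zero to positive zero; the first needs up to two comparisons per call, while the second is a single bitwise operation on the value's representation.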
*	Honour -dsuppress-uniques more thoroughly	Simon Peyton Jones	2017-02-17	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \|	I found that tests parser/should_compile/DumpRenamedAst and friends were printing uniques, which makes the test fragile. But -dsuppress-uniques made no difference! It turned out that pprName wasn't properly consulting Opt_SuppressUniques. This patch fixes the problem, and updates those three tests to use -dsuppress-uniques
*	Nix typo and redundant where-clauses	Ryan Scott	2017-01-25	1	-2/+0
\|
*	BlockId: remove BlockMap and BlockSet synonyms	Michal Terepeta	2016-12-08	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This continues removal of `BlockId` module in favor of Hoopl's `Label`. Most of the changes here are mechanical, apart from the orphan `Outputable` instances for `LabelMap` and `LabelSet`. For now I've moved them to `cmm/Hoopl`, since it's already trying to manage all imports from Hoopl (to avoid any collisions). Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com> Test Plan: validate Reviewers: bgamari, austin, simonmar Reviewed By: simonmar Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2800
*	Reduce the size of string literals in binaries.	Thijs Alkemade	2016-12-06	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Removed the alignment for strings and mark them as cstring sections in the generated asm so the linker can merge duplicate sections. Reviewers: rwbarton, trofi, austin, trommler, simonmar, hvr, bgamari Reviewed By: hvr, bgamari Subscribers: simonpj, hvr, thomie Differential Revision: https://phabricator.haskell.org/D1290 GHC Trac Issues: #9577
*	Inline compiler/NOTES into X86/Ppr.hs	Matthew Pickering	2016-11-16	1	-1/+1
\| \| \| \| \| \| \| \|	Reviewers: austin, bgamari Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D2721
*	Restore original alignment for info tables	Simon Brenner	2016-01-27	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This was broken in 4a32bf925b8aba7885d9c745769fe84a10979a53, meaning that info tables and subsequent code are no longer guaranteed to have the recommended alignment. Split up the section header and section alignment printers, and print an appropriate alignment directive before each info table. Fixes Trac #11486 Reviewers: austin, bgamari, rwbarton Reviewed By: bgamari, rwbarton Subscribers: thomie Differential Revision: https://phabricator.haskell.org/D1847 GHC Trac Issues: #11486