path: root/compiler/cmm
Commit message | Author | Age | Files | Lines
* Add support for SIMD operations in the NCG | Abhiroop Sarkar | 2019-07-03 | 7 | -52/+122
  This adds support for constructing vector types from Float#, Double# etc.
  and performing arithmetic operations on them.

  Cleaned-Up-By: Ben Gamari <ben@well-typed.com>
* Correct closure observation, construction, and mutation on weak memory machines. | Travis Whitaker | 2019-06-28 | 3 | -0/+3
  Here the following changes are introduced:
    - A read barrier machine op is added to Cmm.
    - The order in which a closure's fields are read and written is changed.
    - Memory barriers are added to RTS code to ensure correctness on
      out-of-order machines with weak memory ordering.

  Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines,
  this is lowered to an instruction that ensures memory reads that occur after
  said instruction in program order are not performed before reads coming
  before said instruction in program order. On machines with strong memory
  ordering properties (e.g. X86, SPARC in TSO mode) no such instruction is
  necessary, so MO_ReadBarrier is simply erased. However, such an instruction
  is necessary on weakly ordered machines, e.g. ARM and PowerPC.

  Weak memory ordering has consequences for how closures are observed and
  mutated. For example, consider a closure that needs to be updated to an
  indirection. In order for the indirection to be safe for concurrent
  observers to enter, said observers must read the indirection's info table
  before they read the indirectee. Furthermore, the entering observer makes
  assumptions about the closure based on its info table contents, e.g. an
  INFO_TYPE of IND implies the closure has an indirectee pointer that is safe
  to follow.

  When a closure is updated with an indirection, both its info table and its
  indirectee must be written. With weak memory ordering, these two writes can
  be arbitrarily reordered, and perhaps even interleaved with other threads'
  reads and writes (in the absence of memory barrier instructions). Consider
  this example of a bad reordering:
    - An updater writes to a closure's info table (INFO_TYPE is now IND).
    - A concurrent observer branches upon reading the closure's INFO_TYPE as IND.
    - A concurrent observer reads the closure's indirectee and enters it. (!!!)
    - An updater writes the closure's indirectee.

  Here the update to the indirectee comes too late and the concurrent observer
  has jumped off into the abyss. Speculative execution can also cause us
  issues. Consider:
    - An observer is about to case on a value in closure's info table.
    - The observer speculatively reads one or more of closure's fields.
    - An updater writes to closure's info table.
    - The observer takes a branch based on the new info table value, but with
      the old closure fields!
    - The updater writes to the closure's other fields, but it's too late.

  Because of these effects, reads and writes to a closure's info table must be
  ordered carefully with respect to reads and writes to the closure's other
  fields, and memory barriers must be placed to ensure that reads and writes
  occur in program order. Specifically, updates to a closure must follow the
  following pattern:
    - Update the closure's (non-info table) fields.
    - Write barrier.
    - Update the closure's info table.

  Observing a closure's fields must follow the following pattern:
    - Read the closure's info pointer.
    - Read barrier.
    - Read the closure's (non-info table) fields.

  This patch updates RTS code to obey this pattern. This should fix
  long-standing SMP bugs on ARM (specifically newer aarch64 microarchitectures
  supporting out-of-order execution) and PowerPC. This fixes issue #15449.

  Co-Authored-By: Ben Gamari <ben@well-typed.com>
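The ordering argument in this commit message can be checked with a small toy model. The sketch below is not RTS code: it assumes barriers are modelled simply as "each thread's operations happen in the listed order", and the names `interleavings` and `observer_is_safe` are hypothetical. It enumerates every interleaving of one updater and one observer and reports whether the observer can ever see an `IND` info table together with an unwritten indirectee.

```python
from itertools import combinations


def interleavings(first, second):
    """Yield every merge of two operation lists that keeps each list's own order."""
    total = len(first) + len(second)
    for slots in combinations(range(total), len(first)):
        chosen = set(slots)
        a, b = iter(first), iter(second)
        yield [next(a) if i in chosen else next(b) for i in range(total)]


def observer_is_safe(updater_order, observer_order):
    """Return True if no interleaving shows IND together with a stale indirectee."""
    updater = [("updater", field) for field in updater_order]
    observer = [("observer", field) for field in observer_order]
    for schedule in interleavings(updater, observer):
        closure = {"info": "THUNK", "indirectee": None}  # state before the update
        seen = {}
        for who, field in schedule:
            if who == "updater":
                closure[field] = "IND" if field == "info" else "target"
            else:
                seen[field] = closure[field]
        if seen["info"] == "IND" and seen["indirectee"] is None:
            return False  # the observer would follow a pointer that was never written
    return True


# Pattern from the commit message: payload before info (writer), info before payload (reader).
print(observer_is_safe(["indirectee", "info"], ["info", "indirectee"]))
# Writer publishes the info table first: the bad reordering described above.
print(observer_is_safe(["info", "indirectee"], ["info", "indirectee"]))
# Reader speculatively loads the field before the info table.
print(observer_is_safe(["indirectee", "info"], ["indirectee", "info"]))
```

Only the first combination is safe in this model, which matches the update and observation patterns the message prescribes.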
* Simplify link_caf and mkForeignLabel functions | Ömer Sinan Ağacan | 2019-06-25 | 1 | -2/+1
* Move 'Platform' to ghc-boot | John Ericson | 2019-06-19 | 11 | -11/+11
  ghc-pkg needs to be aware of platforms so it can figure out which
  subdirectory within the user package db to use. This is admittedly
  roundabout, but maybe Cabal could use the same notion of a platform as GHC
  to good effect too.
* Fix a Note name in CmmNode | Ömer Sinan Ağacan | 2019-06-19 | 1 | -1/+1
  ("Continuation BlockIds" is referenced in CmmProcPoint)

  [skip ci]
* Use TupleSections in CmmParse.y, simplify a few exprs | Ömer Sinan Ağacan | 2019-06-16 | 1 | -26/+28
* Use DeriveFunctor throughout the codebase (#15654) | Krzysztof Gogolewski | 2019-06-12 | 3 | -18/+10
* Introduce log1p and expm1 primops | chessai | 2019-06-09 | 2 | -0/+8
  Previously log and exp were primitives yet log1p and expm1 were FFI calls.
  Fix this non-uniformity.
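For readers unfamiliar with these two functions, the sketch below shows why they exist alongside log and exp at all. It uses Python's `math` module rather than GHC primops, so it only illustrates the numerical behaviour, not the commit's implementation. The input value `x` is an arbitrary choice.

```python
import math

x = 1e-18  # far smaller than the spacing between doubles near 1.0

naive_log = math.log(1.0 + x)    # 1.0 + x rounds to exactly 1.0, so the result is 0.0
naive_exp = math.exp(x) - 1.0    # exp(x) rounds to (about) 1.0, losing x entirely

accurate_log = math.log1p(x)     # computes log(1 + x) without forming 1 + x
accurate_exp = math.expm1(x)     # computes exp(x) - 1 without the cancellation

print(naive_log, accurate_log)
print(naive_exp, accurate_exp)
```

For such small arguments both accurate results are approximately `x`, while the naive forms lose the value entirely.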
* Remove trailing whitespace | Matthew Pickering | 2019-06-08 | 1 | -2/+2
  [skip ci]

  This should really be caught by the linters! (#16711)
* Inline `Settings` into `DynFlags` | John Ericson | 2019-05-29 | 3 | -12/+12
  After the previous commit, `Settings` is just a thin wrapper around other
  groups of settings. While `Settings` is used by GHC-the-executable to
  initialize `DynFlags`, in principle another consumer of GHC-the-library
  could initialize `DynFlags` a different way. It therefore doesn't make sense
  for `DynFlags` itself (library code) to separate the settings that typically
  come from `Settings` from the settings that typically don't.
* Add missing opening braces in Cmm dumps | Ömer Sinan Ağacan | 2019-05-27 | 1 | -1/+1
  Previously -ddump-cmm was generating code with unbalanced curly braces:

      stg_atomically_entry() // [R1]
              { info_tbls: [(cfl,
                             label: stg_atomically_info
                             rep: tag:16 HeapRep 1 ptrs { Thunk }
                             srt: Nothing)]
                stack_info: arg_space: 8 updfr_space: Just 8
              }
          {offset
            cfl: // cfk
                unwind Sp = Just Sp + 0;
                _cfk::P64 = R1;
                //tick src<rts/PrimOps.cmm:(1243,1)-(1245,1)>
                R1 = I64[_cfk::P64 + 8 + 8 + 0 * 8];
                call stg_atomicallyzh(R1) args: 8, res: 0, upd: 8;
          }
      }, <---- OPENING BRACE MISSING

  After this patch:

      stg_atomically_entry() { // [R1] <---- MISSING OPENING BRACE HERE
              { info_tbls: [(cfl,
                             label: stg_atomically_info
                             rep: tag:16 HeapRep 1 ptrs { Thunk }
                             srt: Nothing)]
                stack_info: arg_space: 8 updfr_space: Just 8
              }
          {offset
            cfl: // cfk
                unwind Sp = Just Sp + 0;
                _cfk::P64 = R1;
                //tick src<rts/PrimOps.cmm:(1243,1)-(1245,1)>
                R1 = I64[_cfk::P64 + 8 + 8 + 0 * 8];
                call stg_atomicallyzh(R1) args: 8, res: 0, upd: 8;
          }
      },
* Remove all target-specific portions of Config.hs | John Ericson | 2019-05-14 | 1 | -30/+27
  1. If GHC is to be multi-target, these cannot be baked in at compile time.

  2. Compile-time flags have a higher maintenance cost than run-time flags.

  3. The old way mixes build system implementation (various bootstrapping
     details) with the thing being built. E.g. GHC doesn't need to care about
     which integer library *will* be used---this is purely a crutch so the
     build system doesn't need to pass flags later when using that library.

  4. Experience with cross compilation in Nixpkgs has shown things work nicer
     when compilers can *optionally* delegate the bootstrapping to the package
     manager. The package manager knows the entire end-goal build plan, and
     thus can make top-down decisions on bootstrapping. GHC can just worry
     about GHC, not even core libraries like base and ghc-prim!
* asm-emit-time IND_STATIC elimination | Gabor Greif | 2019-04-15 | 1 | -1/+137
  When a new closure identifier is being established to a local or exported
  closure already emitted into the same module, refrain from adding an
  IND_STATIC closure, and instead emit an assembly-language alias.

  Inter-module IND_STATIC objects still remain, and need to be addressed by
  other measures.

  Binary-size savings on nofib are around 0.1%.
* codegen: unroll memcpy calls for small bytearrays | Artem Pyanykh | 2019-04-14 | 1 | -1/+10
* removing x87 register support from native code gen | Carter Schonwald | 2019-04-10 | 3 | -9/+12
  * simplifies registers to have GPR, Float and Double, by removing the SSE2
    and X87 constructors
  * makes -msse2 assumed/default for x86 platforms, fixing a long-standing
    nondeterminism in rounding behavior in 32-bit Haskell code
  * removes the 80-bit floating point representation from the supported float
    sizes
  * there's still one tiny bit of x87 support needed, for handling float and
    double return values in FFI calls wrt the C ABI on x86_32, but this one
    piece does not leak into the rest of NCG.
  * lots of code that's not been touched in a long time got deleted as a
    consequence of all of this

  All in all, this change paves the way towards a lot of future further
  improvements in how GHC handles floating point computations, along with
  making the native code gen more accessible to a larger pool of contributors.
* Add support for bitreverse primop | Alexandre | 2019-04-01 | 2 | -0/+2
  This commit includes the necessary changes in code and documentation to
  support a primop that reverses a word's bits. It also includes a test.
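As a reference for what "reverses a word's bits" means, here is a minimal sketch of the operation in Python. The function name `bit_reverse` and its `width` parameter are illustrative only and are not the primop's actual name or signature.

```python
def bit_reverse(word, width=64):
    """Reverse the lowest `width` bits of a non-negative integer."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (word & 1)  # move the current low bit to the other end
        word >>= 1
    return result


print(bin(bit_reverse(0b00010110, width=8)))
print(hex(bit_reverse(1)))
```

Reversing twice at the same width gives back the original word, which makes a convenient property to test.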
* Update Wiki URLs to point to GitLab | Takenobu Tani | 2019-03-25 | 3 | -5/+5
  This moves all URL references to Trac Wiki to their corresponding GitLab
  counterparts.

  This substitution is classified as follows:

  1. Automated substitution using sed with Ben's mapping rule [1]
      Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...
      New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...

  2. Manual substitution for URLs containing `#` index
      Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...#Zzz
      New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...#zzz

  3. Manual substitution for strings starting with `Commentary`
      Old: Commentary/XxxYyy...
      New: commentary/xxx-yyy...

  See also !539

  [1]: https://gitlab.haskell.org/bgamari/gitlab-migration/blob/master/wiki-mapping.json
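The Old/New pairs above suggest a CamelCase to kebab-case rewrite. The sketch below encodes only that visible pattern. The real substitution used the mapping file referenced as [1], so the regular expression and the helper names `camel_to_kebab` and `convert` are assumptions for illustration, and they may not reproduce every entry of that mapping.

```python
import re

OLD_PREFIX = "ghc.haskell.org/trac/ghc/wiki/"
NEW_PREFIX = "gitlab.haskell.org/ghc/ghc/wikis/"


def camel_to_kebab(segment):
    """Turn 'XxxYyy' into 'xxx-yyy' (assumed rule, inferred from the examples)."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "-", segment).lower()


def convert(url):
    """Rewrite a Trac wiki URL to the GitLab form; leave other strings unchanged."""
    if not url.startswith(OLD_PREFIX):
        return url
    page, sep, anchor = url[len(OLD_PREFIX):].partition("#")
    new_page = "/".join(camel_to_kebab(part) for part in page.split("/"))
    return NEW_PREFIX + new_page + sep + anchor.lower()


print(convert("ghc.haskell.org/trac/ghc/wiki/XxxYyy"))
print(convert("ghc.haskell.org/trac/ghc/wiki/XxxYyy#Zzz"))
```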
* Update Trac ticket URLs to point to GitLab | Ryan Scott | 2019-03-15 | 3 | -3/+3
  This moves all URL references to Trac tickets to their corresponding GitLab
  counterparts.
* Rip out object splitting | Ben Gamari | 2019-03-05 | 1 | -3/+1
  The splitter is an evil Perl script that processes assembler code. Its job
  can be done better by the linker's --gc-sections flag. GHC passes this flag
  to the linker whenever -split-sections is passed on the command line.

  This is based on @DemiMarie's D2768.

  Fixes Trac #11315
  Fixes Trac #9832
  Fixes Trac #8964
  Fixes Trac #8685
  Fixes Trac #8629
* Fix warnings and fatal parsing errors | Vladislav Zavialov | 2019-02-17 | 2 | -6/+3
* Cmm: Promote stack arguments to word size | Peter Trommler | 2019-02-16 | 1 | -7/+29
  Smaller than word size integers must be promoted to word size when passed
  on the stack. While on little endian systems we can get away with writing a
  small integer to a word size stack slot and read it as a word ignoring the
  upper bits, on big endian systems a small integer write ends up in the most
  significant bits and a word size read that ignores the upper bits delivers
  a random value.

  On little endian systems a smaller than word size write to the stack might
  be more efficient but that decision is system specific and should be done
  as an optimization in the respective backends.

  Fixes #16258
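The endianness argument can be reproduced with plain byte buffers. This is a sketch under stated assumptions: an 8-byte stack slot, a 1-byte argument, and an arbitrary `stale` pattern standing in for whatever the slot held before. The function names are hypothetical.

```python
WORD_BYTES = 8


def store_narrow_load_word(value, byteorder, stale=0x1122334455667788):
    """Write one byte at the slot's address, then read the slot back as a word."""
    slot = bytearray(stale.to_bytes(WORD_BYTES, byteorder))  # leftover slot contents
    slot[0:1] = value.to_bytes(1, byteorder)                 # narrow store at the lowest address
    word = int.from_bytes(slot, byteorder)                   # word-sized load from the same address
    return word & 0xFF                                       # the reader ignores the upper bits


def store_promoted_load_word(value, byteorder):
    """Promote the value to word size before storing it, as the commit does."""
    slot = bytearray(value.to_bytes(WORD_BYTES, byteorder))
    return int.from_bytes(slot, byteorder) & 0xFF


print(hex(store_narrow_load_word(0x2A, "little")))
print(hex(store_narrow_load_word(0x2A, "big")))
print(hex(store_promoted_load_word(0x2A, "big")))
```

The narrow store survives on little endian but yields a byte of the stale pattern on big endian, while the promoted store reads back correctly on both.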
* PPC NCG: Promote integers to word size in C calls | Peter Trommler | 2019-01-31 | 1 | -2/+6
  Fixes #16222
* Use ByteString to represent Cmm string literals (#16198) | Sylvain Henry | 2019-01-31 | 7 | -33/+23
  Also used ByteString in some other relevant places
* Prepare source-tree for base-4.13 MFP bump | Herbert Valerio Riedel | 2019-01-18 | 1 | -0/+4
* PPC NCG: Remove Darwin support | Peter Trommler | 2019-01-01 | 1 | -7/+0
  Support for Mac OS X on PowerPC was dropped by Apple years ago. We follow
  suit and remove PowerPC support for Darwin.

  Fixes #16106.
* Fix unused-import warnings | David Eichmann | 2018-11-22 | 5 | -5/+3
  This patch fixes a fairly long-standing bug (dating back to 2015) in
  RdrName.bestImport, namely

      commit 9376249b6b78610db055a10d05f6592d6bbbea2f
      Author: Simon Peyton Jones <simonpj@microsoft.com>
      Date: Wed Oct 28 17:16:55 2015 +0000

      Fix unused-import stuff in a better way

  That patch got the sense of the comparison back to front, and thereby
  failed to implement the unused-import rules described in
      Note [Choosing the best import declaration] in RdrName

  This led to Trac #13064 and #15393

  Fixing this bug revealed a bunch of unused imports in libraries; the ones
  in the GHC repo are part of this commit.

  The two important changes are

  * Fix the bug in bestImport

  * Modified the rules by adding (a) in
      Note [Choosing the best import declaration] in RdrName

    Reason: the previous rules made Trac #5211 go bad again. And the new
    rule (a) makes sense to me.

  In unravelling this I also ended up doing a few other things

  * Refactor RnNames.ImportDeclUsage to use a [GlobalRdrElt] for the things
    that are used, rather than [AvailInfo]. This is simpler and more direct.

  * Rename greParentName to greParent_maybe, to follow GHC naming conventions

  * Delete dead code RdrName.greUsedRdrName

  Bumps a few submodules.

  Reviewers: hvr, goldfire, bgamari, simonmar, jrtc27

  Subscribers: rwbarton, carter

  Differential Revision: https://phabricator.haskell.org/D5312
* UNREG: PprC: Add support for adjacent floats | James Clarke | 2018-11-22 | 1 | -1/+23
  When two 32-bit floats are adjacent for a 64-bit target, there is no
  padding between them to force alignment, so we must combine their bit
  representations into a single word.

  Reviewers: bgamari, simonmar

  Reviewed By: simonmar

  Subscribers: rwbarton, carter

  GHC Trac Issues: #15853

  Differential Revision: https://phabricator.haskell.org/D5306
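Combining two adjacent 32-bit floats into one 64-bit word can be sketched with Python's `struct` module. This illustrates the bit-level idea only, not the PprC code itself. The function name `pack_adjacent_floats` is hypothetical, and the half of the word that each float occupies depends on the target's byte order.

```python
import struct


def pack_adjacent_floats(first, second, byteorder="little"):
    """Return the 64-bit word whose memory image is `first` followed by `second`."""
    prefix = "<" if byteorder == "little" else ">"
    raw = struct.pack(prefix + "ff", first, second)  # 8 bytes, no padding between the floats
    return int.from_bytes(raw, byteorder)


# 1.0f is 0x3F800000 and 2.0f is 0x40000000 in IEEE 754 single precision.
print(hex(pack_adjacent_floats(1.0, 2.0, "little")))
print(hex(pack_adjacent_floats(1.0, 2.0, "big")))
```

On a little endian target the first float lands in the low half of the word, and on a big endian target it lands in the high half.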
* Remove warnings-silencing flags for code generated by Alex | Simon Jakobi | 2018-11-22 | 1 | -7/+0
  Current versions of Alex don't seem to produce as many warnings any more.

  In order to silence a warning and to avoid overlong lines, I've taken the
  liberty of refactoring 'tok_num'.

  Test Plan: ./validate

  Reviewers: bgamari, simonmar

  Reviewed By: simonmar

  Subscribers: erikd, rwbarton, carter

  Differential Revision: https://phabricator.haskell.org/D5319
* Rename literal constructors | Sylvain Henry | 2018-11-22 | 2 | -3/+3
  In a previous patch we replaced some built-in literal constructors
  (MachInt, MachWord, etc.) with a single LitNumber constructor.

  In this patch we replace the `Mach` prefix of the remaining constructors
  with `Lit` for consistency (e.g., LitChar, LitLabel, etc.).

  Sadly the name `LitString` was already taken for a kind of FastString and
  it would become misleading to have both `LitStr` (literal constructor
  renamed after `MachStr`) and `LitString` (FastString variant). Hence this
  patch renames the FastString variant `PtrString` (which is more accurate)
  and the literal string constructor now uses the least surprising
  `LitString` name.

  Both `Literal` and `LitString/PtrString` have recently seen breaking
  changes so doing this kind of renaming now shouldn't harm much.

  Reviewers: hvr, goldfire, bgamari, simonmar, jrtc27, tdammers

  Subscribers: tdammers, rwbarton, thomie, carter

  Differential Revision: https://phabricator.haskell.org/D4881
* Introduce Int16# and Word16# | Abhiroop Sarkar | 2018-11-17 | 1 | -0/+4
  This builds off of D4475.

  Bumps binary submodule.

  Reviewers: carter, AndreasK, hvr, goldfire, bgamari, simonmar

  Subscribers: rwbarton, thomie

  Differential Revision: https://phabricator.haskell.org/D5006
* Minor refactoring | Gabor Greif | 2018-11-17 | 1 | -13/+13
  PR: https://github.com/ghc/ghc/pull/223/
* NCG: New code layout algorithm. | Andreas Klebinger | 2018-11-17 | 5 | -6/+49
  Summary:
  This patch implements a new code layout algorithm. It has been tested for
  x86 and is disabled on other platforms.

  Performance varies slightly by CPU/Machine but in general seems to be
  better by around 2%. Nofib shows only small differences of about
  +/- ~0.5% overall depending on flags/machine; performance in other
  benchmarks improved significantly.

  Other benchmarks include at least the benchmarks of: aeson, vector,
  megaparsec, attoparsec, containers, text and xeno.

  While the magnitude of gains differed, three different CPUs were tested
  with all getting faster although to differing degrees. I tested: Sandy
  Bridge(Xeon), Haswell, Skylake

  * Library benchmark results summarized:
    * containers: ~1.5% faster
    * aeson: ~2% faster
    * megaparsec: ~2-5% faster
    * xml library benchmarks: 0.2%-1.1% faster
    * vector-benchmarks: 1-4% faster
    * text: 5.5% faster

  On average GHC compile times go down, as the speedup from compiling GHC
  with the new layout outweighs the overhead introduced by the new layout
  algorithm.

  Things this patch does:

  * Move code responsible for block layout into its own module.
  * Move the NcgImpl Class into the NCGMonad module.
  * Extract a control flow graph from the input cmm.
  * Update this cfg to keep it in sync with changes during asm codegen. This
    has been tested on x64 but should work on x86. Other platforms still use
    the old codelayout.
  * Assign weights to the edges in the CFG based on type and limited static
    analysis which are then used for block layout.
  * Once we have the final code layout eliminate some redundant jumps.

    In particular turn a sequence of:
        jne .foo
        jmp .bar
      foo:
    into
        je bar
      foo:
        ..

  Test Plan: ci

  Reviewers: bgamari, jmct, jrtc27, simonmar, simonpj, RyanGlScott

  Reviewed By: RyanGlScott

  Subscribers: RyanGlScott, trommler, jmct, carter, thomie, rwbarton

  GHC Trac Issues: #15124

  Differential Revision: https://phabricator.haskell.org/D4726
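The redundant-jump elimination described in the last bullet can be sketched as a peephole pass over a toy instruction list. This is not the NCG implementation: instructions are modelled as hypothetical `(opcode, operand)` pairs, labels as `("label", name)`, and the `INVERTED` table covers only a few condition codes.

```python
# Map each conditional jump to the jump with the opposite condition.
INVERTED = {"je": "jne", "jne": "je", "jl": "jge", "jge": "jl", "jg": "jle", "jle": "jg"}


def eliminate_redundant_jumps(instrs):
    """Rewrite 'jcc L; jmp M; L:' into 'j!cc M; L:' wherever that pattern occurs."""
    out = []
    i = 0
    while i < len(instrs):
        if i + 2 < len(instrs):
            (op1, target1), (op2, target2), (op3, name3) = instrs[i:i + 3]
            jumps_over_jmp = op1 in INVERTED and op2 == "jmp"
            if jumps_over_jmp and op3 == "label" and target1 == name3:
                out.append((INVERTED[op1], target2))  # one inverted jump replaces both
                i += 2                                # the label is copied on the next pass
                continue
        out.append(instrs[i])
        i += 1
    return out


before = [("jne", "foo"), ("jmp", "bar"), ("label", "foo"), ("mov", "rax, rbx")]
print(eliminate_redundant_jumps(before))
```

The rewrite is only valid because the conditional jump's target is the label immediately after the unconditional jump, so falling through and jumping reach the same place.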
* Fix a bug in SRT generation (#15892) | Simon Marlow | 2018-11-15 | 1 | -1/+1
  Summary:
  The logic in `Note [recursive SRTs]` was correct. However, my
  implementation of it wasn't: I got the associativity of `Set.difference`
  wrong, which led to an extremely subtle and difficult to find bug.

  Fortunately now we have a test case. I was able to cut down the code to
  something manageable, and I've added it to the test suite.

  Test Plan:
  Before (using my stage 1 compiler without the fix):

  ```
  ====> T15892(normal) 1 of 1 [0, 0, 0]
  cd "T15892.run" && "/home/smarlow/ghc/inplace/bin/ghc-stage1" -o T15892 T15892.hs -dcore-lint -dcmm-lint -no-user-package-db -rtsopts -fno-warn-missed-specialisations -fshow-warning-groups -fdiagnostics-color=never -fno-diagnostics-show-caret -Werror=compat -dno-debug-output -O
  cd "T15892.run" && ./T15892 +RTS -G1 -A32k -RTS
  Wrong exit code for T15892(normal)(expected 0 , actual 134 )
  Stderr ( T15892 ):
  T15892: internal error: evacuate: strange closure type 0
      (GHC version 8.7.20181113 for x86_64_unknown_linux)
      Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
  Aborted (core dumped)
  *** unexpected failure for T15892(normal)
  =====> T15892(g1) 1 of 1 [0, 1, 0]
  cd "T15892.run" && "/home/smarlow/ghc/inplace/bin/ghc-stage1" -o T15892 T15892.hs -dcore-lint -dcmm-lint -no-user-package-db -rtsopts -fno-warn-missed-specialisations -fshow-warning-groups -fdiagnostics-color=never -fno-diagnostics-show-caret -Werror=compat -dno-debug-output -O
  cd "T15892.run" && ./T15892 +RTS -G1 -RTS +RTS -G1 -A32k -RTS
  Wrong exit code for T15892(g1)(expected 0 , actual 134 )
  Stderr ( T15892 ):
  T15892: internal error: evacuate: strange closure type 0
      (GHC version 8.7.20181113 for x86_64_unknown_linux)
      Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
  Aborted (core dumped)
  ```

  After (using my stage 2 compiler with the fix):

  ```
  =====> T15892(normal) 1 of 1 [0, 0, 0]
  cd "T15892.run" && "/home/smarlow/ghc/inplace/test spaces/ghc-stage2" -o T15892 T15892.hs -dcore-lint -dcmm-lint -no-user-package-db -rtsopts -fno-warn-missed-specialisations -fshow-warning-groups -fdiagnostics-color=never -fno-diagnostics-show-caret -Werror=compat -dno-debug-output
  cd "T15892.run" && ./T15892 +RTS -G1 -A32k -RTS
  =====> T15892(g1) 1 of 1 [0, 0, 0]
  cd "T15892.run" && "/home/smarlow/ghc/inplace/test spaces/ghc-stage2" -o T15892 T15892.hs -dcore-lint -dcmm-lint -no-user-package-db -rtsopts -fno-warn-missed-specialisations -fshow-warning-groups -fdiagnostics-color=never -fno-diagnostics-show-caret -Werror=compat -dno-debug-output
  cd "T15892.run" && ./T15892 +RTS -G1 -RTS +RTS -G1 -A32k -RTS
  ```

  Reviewers: bgamari, osa1, erikd

  Reviewed By: osa1

  Subscribers: rwbarton, carter

  GHC Trac Issues: #15892

  Differential Revision: https://phabricator.haskell.org/D5334
* Add Int8# and Word8# | Michal Terepeta | 2018-11-02 | 5 | -13/+84
  This is the first step of implementing:
  https://github.com/ghc-proposals/ghc-proposals/pull/74

  The main highlights/changes:

  primops.txt.pp gets two new sections for two new primitive types for
  signed and unsigned 8-bit integers (Int8# and Word8# respectively) along
  with basic arithmetic and comparison operations. PrimRep/RuntimeRep get
  two new constructors for them. All of the primops translate into the
  existing MachOps.

  For CmmCalls the codegen will now zero-extend the values at call site (so
  that they can be moved to the right register) and then truncate them back
  to their original width.

  x86 native codegen needed some updates, since it wasn't able to deal with
  the new widths, but all the changes are quite localized. LLVM backend
  seems to just work.

  This is the second attempt at merging this, after the first attempt in
  D4475 had to be backed out due to regressions on i386.

  Bumps binary submodule.

  Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>

  Test Plan: ./validate (on both x86-{32,64})

  Reviewers: bgamari, hvr, goldfire, simonmar

  Subscribers: rwbarton, carter

  Differential Revision: https://phabricator.haskell.org/D5258
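The zero-extend-then-truncate round trip at call sites can be illustrated with integer masks. This sketch models 8-bit values passed in 64-bit registers and is not the codegen itself. The helper names `zero_extend` and `truncate_signed` are hypothetical.

```python
def zero_extend(value, from_bits):
    """Keep the low `from_bits` bits and clear everything above them."""
    return value & ((1 << from_bits) - 1)


def truncate_signed(word, to_bits):
    """Narrow a register-sized word to `to_bits` and reinterpret it as signed."""
    low = word & ((1 << to_bits) - 1)
    sign_bit_set = low >> (to_bits - 1)
    return low - (1 << to_bits) if sign_bit_set else low


register = zero_extend(-1, 8)          # an Int8#-like -1 widened for the call
print(hex(register))
print(truncate_signed(register, 8))    # narrowed back after the call
```

Every 8-bit signed value survives the round trip, because zero extension preserves the low 8 bits exactly.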
* Generate correct relocation for external cost centre | Zejun Wu | 2018-10-15 | 1 | -4/+11
  We used to always generate direct access for cost centre labels. We fix
  this by generating an indirect data load for cost centres defined in
  external modules.

  Test Plan:
  The added test used to fail with error message
  ```
  /bin/ld.gold: error: T15723B.o: requires dynamic R_X86_64_PC32 reloc against 'T15723A_foo1_EXPR_cc' which may overflow at runtime; recompile with -fPIC
  ```
  and now passes.

  Also check that `R_X86_64_PC32` is generated for CostCentre from the same
  module and `R_X86_64_GOTPCREL` is generated for CostCentre from external
  module:
  ```
  $ objdump -rdS T15723B.o
  0000000000000028 <T15723B_test_info>:
    28: 48 8d 45 f0             lea -0x10(%rbp),%rax
    2c: 4c 39 f8                cmp %r15,%rax
    2f: 72 70                   jb a1 <T15723B_test_info+0x79>
    31: 48 83 ec 08             sub $0x8,%rsp
    35: 48 8d 35 00 00 00 00    lea 0x0(%rip),%rsi # 3c <T15723B_test_info+0x14>
            38: R_X86_64_PC32 T15723B_test1_EXPR_cc-0x4
    3c: 49 8b bd 60 03 00 00    mov 0x360(%r13),%rdi
    43: 31 c0                   xor %eax,%eax
    45: e8 00 00 00 00          callq 4a <T15723B_test_info+0x22>
            46: R_X86_64_PLT32 pushCostCentre-0x4
    4a: 48 83 c4 08             add $0x8,%rsp
    4e: 48 ff 40 30             incq 0x30(%rax)
    52: 49 89 85 60 03 00 00    mov %rax,0x360(%r13)
    59: 48 83 ec 08             sub $0x8,%rsp
    5d: 49 8b bd 60 03 00 00    mov 0x360(%r13),%rdi
    64: 48 8b 35 00 00 00 00    mov 0x0(%rip),%rsi # 6b <T15723B_test_info+0x43>
            67: R_X86_64_GOTPCREL T15723A_foo1_EXPR_cc-0x4
    6b: 31 c0                   xor %eax,%eax
    6d: e8 00 00 00 00          callq 72 <T15723B_test_info+0x4a>
            6e: R_X86_64_PLT32 pushCostCentre-0x4
    72: 48 83 c4 08             add $0x8,%rsp
    76: 48 ff 40 30             incq 0x30(%rax)
  ```

  Reviewers: simonmar, bgamari

  Reviewed By: simonmar

  Subscribers: rwbarton, carter

  GHC Trac Issues: #15723

  Differential Revision: https://phabricator.haskell.org/D5214
* Deprecate -fllvm-pass-vectors-in-regs | Ben Gamari | 2018-10-15 | 1 | -12/+5
  Summary: The behavior previously enabled by this flag has been the default
  since 8.6.1.

  Reviewers: simonmar

  Subscribers: rwbarton, carter

  Differential Revision: https://phabricator.haskell.org/D5193
* Revert "Add Int8# and Word8#" | Ben Gamari | 2018-10-09 | 5 | -84/+13
  This unfortunately broke i386 support since it introduced references to
  byte-sized registers that don't exist on that architecture.

  Reverts binary submodule

  This reverts commit 5d5307f943d7581d7013ffe20af22233273fba06.
* Add Int8# and Word8# | Michal Terepeta | 2018-10-07 | 5 | -13/+84
  This is the first step of implementing:
  https://github.com/ghc-proposals/ghc-proposals/pull/74

  The main highlights/changes:

  - `primops.txt.pp` gets two new sections for two new primitive types for
    signed and unsigned 8-bit integers (`Int8#` and `Word8#` respectively)
    along with basic arithmetic and comparison operations.
    `PrimRep`/`RuntimeRep` get two new constructors for them. All of the
    primops translate into the existing `MachOp`s.

  - For `CmmCall`s the codegen will now zero-extend the values at call site
    (so that they can be moved to the right register) and then truncate them
    back to their original width.

  - x86 native codegen needed some updates, since it wasn't able to deal
    with the new widths, but all the changes are quite localized. LLVM
    backend seems to just work.

  Bumps binary submodule.

  Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>

  Test Plan: ./validate with new tests

  Reviewers: hvr, goldfire, bgamari, simonmar

  Subscribers: Abhiroop, dfeuer, rwbarton, thomie, carter

  Differential Revision: https://phabricator.haskell.org/D4475
* UNREG: don't prefix asm prefixes in via-C mode | Sergei Trofimovich | 2018-10-06 | 1 | -1/+1
  commit 64c54fff2d6534e1229359a8d357ec1dc6c21b73
  ("Mark system and internal symbols as private symbols in asm")
  added the `internalNamePrefix` helper. Unfortunately it generates an
  invalid label in unregisterised mode:

  ```
  $ ./configure --enable-unregisterised

  /tmp/ghc19372_0/ghc_4.hc:2831:22: error:
       error: expected identifier or '(' before '.' token
       static const StgWord .Lcl3_info[]__attribute__((aligned(8)))= {
                            ^
  ```

  Here an asm-style prefix is applied to a C symbol.

  The fix is simple: apply asm-style labels only to assembly code.

  Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>

  Reviewers: simonmar, last_g, bgamari

  Reviewed By: simonmar

  Subscribers: rwbarton, carter

  Differential Revision: https://phabricator.haskell.org/D5207
* Don't shortcut SRTs for static functions (#15544) | Simon Marlow | 2018-09-18 | 1 | -22/+130
  Shortcutting the SRT for a static function can lead to resurrecting a
  static object at runtime, which violates assumptions in the GC. See
  comments for details.

  Test Plan:
  - manual testing (in progress)
  - validate

  Reviewers: osa1, bgamari, erikd

  Reviewed By: bgamari

  Subscribers: rwbarton, carter

  GHC Trac Issues: #15544

  Differential Revision: https://phabricator.haskell.org/D5145
* Mark code related symbols as @function not @object | Sergei Azovskov | 2018-09-14 | 1 | -1/+16
  Summary:
  This diff is a part of the bigger project whose goal is to improve common
  profiling tools support (perf) for GHC binaries.

  A similar job was already done and reverted in the past:
   * https://phabricator.haskell.org/rGHCb1f453e16f0ce11a2ab18cc4c350bdcbd36299a6
   * https://phabricator.haskell.org/rGHCf1f3c4f50650110ad0f700d6566a44c515b0548f

  Reasoning:

  `Perf` and similar tools build an in-memory symbol table from the .symtab
  section of the ELF file to display human-readable function names instead
  of the addresses in the output. `Perf` uses only two types of symbols:
  `@function` and `@notype`, but GHC is not capable of producing any
  `@function` symbols, so the `perf` output is pretty useless (all the
  Haskell symbols that you can see in `perf` now are `@notype` internal
  symbols extracted by mistake/hack).

  The changes:
   * mark code related symbols as @function
   * small hack to mark InfoTable symbols as code if TABLES_NEXT_TO_CODE is
     true

  Limitations:
   * The perf symbolization support is not complete after this patch but I'm
     working on the second patch.
   * Constructor symbols are not supported. To fix that we can issue extra
     local symbols which mark code sections as code and will be only used
     for debug.

  Test Plan:
  tests
  any additional ideas?

  Perf output on stock ghc 8.4.1:
  ```
       9.78%  FibbSlow  FibbSlow  [.] ckY_info
       9.59%  FibbSlow  FibbSlow  [.] cjqd_info
       7.17%  FibbSlow  FibbSlow  [.] c3sg_info
       6.62%  FibbSlow  FibbSlow  [.] c1X_info
       5.32%  FibbSlow  FibbSlow  [.] cjsX_info
       4.18%  FibbSlow  FibbSlow  [.] s3rN_info
       3.82%  FibbSlow  FibbSlow  [.] c2m_info
       3.68%  FibbSlow  FibbSlow  [.] cjlJ_info
       3.26%  FibbSlow  FibbSlow  [.] c3sb_info
       3.19%  FibbSlow  FibbSlow  [.] cjPQ_info
       3.05%  FibbSlow  FibbSlow  [.] cjQd_info
       2.97%  FibbSlow  FibbSlow  [.] cjAB_info
       2.78%  FibbSlow  FibbSlow  [.] cjzP_info
       2.40%  FibbSlow  FibbSlow  [.] cjOS_info
       2.38%  FibbSlow  FibbSlow  [.] s3rK_info
       2.27%  FibbSlow  FibbSlow  [.] cjq0_info
       2.18%  FibbSlow  FibbSlow  [.] cKQ_info
       2.13%  FibbSlow  FibbSlow  [.] cjSl_info
       1.99%  FibbSlow  FibbSlow  [.] s3rL_info
       1.98%  FibbSlow  FibbSlow  [.] c2cC_info
       1.80%  FibbSlow  FibbSlow  [.] s3rO_info
       1.37%  FibbSlow  FibbSlow  [.] c2f2_info
  ...
  ```

  Perf output on patched ghc:
  ```
       7.97%  FibbSlow  FibbSlow  [.] c3rM_info
       6.75%  FibbSlow  FibbSlow  [.] 0x000000000032cfa8
       6.63%  FibbSlow  FibbSlow  [.] cifA_info
       4.98%  FibbSlow  FibbSlow  [.] integerzmgmp_GHCziIntegerziType_eqIntegerzh_info
       4.55%  FibbSlow  FibbSlow  [.] chXn_info
       4.52%  FibbSlow  FibbSlow  [.] c3rH_info
       4.45%  FibbSlow  FibbSlow  [.] chZB_info
       4.04%  FibbSlow  FibbSlow  [.] Main_fibbzuslow_info
       4.03%  FibbSlow  FibbSlow  [.] stg_ap_0_fast
       3.76%  FibbSlow  FibbSlow  [.] chXA_info
       3.67%  FibbSlow  FibbSlow  [.] cifu_info
       3.25%  FibbSlow  FibbSlow  [.] ci4r_info
       2.64%  FibbSlow  FibbSlow  [.] s3rf_info
       2.42%  FibbSlow  FibbSlow  [.] s3rg_info
       2.39%  FibbSlow  FibbSlow  [.] integerzmgmp_GHCziIntegerziType_eqInteger_info
       2.25%  FibbSlow  FibbSlow  [.] integerzmgmp_GHCziIntegerziType_minusInteger_info
       2.17%  FibbSlow  FibbSlow  [.] ghczmprim_GHCziClasses_zeze_info
       2.09%  FibbSlow  FibbSlow  [.] cicc_info
       2.03%  FibbSlow  FibbSlow  [.] 0x0000000000331e15
       2.02%  FibbSlow  FibbSlow  [.] s3ri_info
       1.91%  FibbSlow  FibbSlow  [.] 0x0000000000331bb8
       1.89%  FibbSlow  FibbSlow  [.] ci4N_info
  ...
  ```

  Reviewers: simonmar, niteria, bgamari, goldfire

  Reviewed By: simonmar, bgamari

  Subscribers: lelf, rwbarton, thomie, carter

  GHC Trac Issues: #15501

  Differential Revision: https://phabricator.haskell.org/D4713
* Mark system and internal symbols as private symbols in asm | Sergei Azovskov | 2018-09-14 | 1 | -17/+38
  Summary:
  This marks system and internal symbols as private in asm output so those
  randomly generated symbols won't appear in .symtab

  Reasoning:
   * internal symbols don't help to debug because names are just random
   * the symbols style breaks perf logic
   * internal symbols can take ~75% of the .symtab. At the same time .symtab
     can take about 20% of the binary file size

  Notice:
  This diff mostly makes sense on top of the D4713 (or similar)

  Test Plan:
  tests

  Perf from D4713
  ```
       7.97%  FibbSlow  FibbSlow  [.] c3rM_info
       6.75%  FibbSlow  FibbSlow  [.] 0x000000000032cfa8
       6.63%  FibbSlow  FibbSlow  [.] cifA_info
       4.98%  FibbSlow  FibbSlow  [.] integerzmgmp_GHCziIntegerziType_eqIntegerzh_info
       4.55%  FibbSlow  FibbSlow  [.] chXn_info
       4.52%  FibbSlow  FibbSlow  [.] c3rH_info
       4.45%  FibbSlow  FibbSlow  [.] chZB_info
       4.04%  FibbSlow  FibbSlow  [.] Main_fibbzuslow_info
       4.03%  FibbSlow  FibbSlow  [.] stg_ap_0_fast
       3.76%  FibbSlow  FibbSlow  [.] chXA_info
       3.67%  FibbSlow  FibbSlow  [.] cifu_info
       3.25%  FibbSlow  FibbSlow  [.] ci4r_info
       2.64%  FibbSlow  FibbSlow  [.] s3rf_info
       2.42%  FibbSlow  FibbSlow  [.] s3rg_info
       2.39%  FibbSlow  FibbSlow  [.] integerzmgmp_GHCziIntegerziType_eqInteger_info
       2.25%  FibbSlow  FibbSlow  [.] integerzmgmp_GHCziIntegerziType_minusInteger_info
       2.17%  FibbSlow  FibbSlow  [.] ghczmprim_GHCziClasses_zeze_info
       2.09%  FibbSlow  FibbSlow  [.] cicc_info
       2.03%  FibbSlow  FibbSlow  [.] 0x0000000000331e15
       2.02%  FibbSlow  FibbSlow  [.] s3ri_info
       1.91%  FibbSlow  FibbSlow  [.] 0x0000000000331bb8
       1.89%  FibbSlow  FibbSlow  [.] ci4N_info
  ...
  ```

  Perf from this patch:
  ```
      15.37%  FibbSlow  FibbSlow  [.] Main_fibbzuslow_info
      15.33%  FibbSlow  FibbSlow  [.] integerzmgmp_GHCziIntegerziType_minusInteger_info
      13.34%  FibbSlow  FibbSlow  [.] integerzmgmp_GHCziIntegerziType_eqIntegerzh_info
       9.24%  FibbSlow  FibbSlow  [.] integerzmgmp_GHCziIntegerziType_plusInteger_info
       9.08%  FibbSlow  FibbSlow  [.] frame_dummy
       8.25%  FibbSlow  FibbSlow  [.] integerzmgmp_GHCziIntegerziType_eqInteger_info
       4.29%  FibbSlow  FibbSlow  [.] 0x0000000000321ab0
       3.84%  FibbSlow  FibbSlow  [.] stg_ap_0_fast
       3.07%  FibbSlow  FibbSlow  [.] ghczmprim_GHCziClasses_zeze_info
       2.39%  FibbSlow  FibbSlow  [.] 0x0000000000321ab7
       1.90%  FibbSlow  FibbSlow  [.] 0x00000000003266b8
       1.88%  FibbSlow  FibbSlow  [.] base_GHCziNum_zm_info
       1.83%  FibbSlow  FibbSlow  [.] 0x0000000000326915
       1.34%  FibbSlow  FibbSlow  [.] 0x00000000003248cc
       1.07%  FibbSlow  FibbSlow  [.] base_GHCziNum_zp_info
       0.98%  FibbSlow  FibbSlow  [.] 0x00000000003247c8
       0.80%  FibbSlow  FibbSlow  [.] 0x0000000000121498
       0.79%  FibbSlow  FibbSlow  [.] stg_gc_noregs
       0.75%  FibbSlow  FibbSlow  [.] 0x0000000000321ad6
       0.67%  FibbSlow  FibbSlow  [.] 0x0000000000321aca
       0.64%  FibbSlow  FibbSlow  [.] 0x0000000000321b4a
       0.61%  FibbSlow  FibbSlow  [.] 0x00000000002ff633
  ```

  Reviewers: simonmar, niteria, bgamari

  Reviewed By: simonmar

  Subscribers: lelf, angerman, olsner, rwbarton, thomie, carter

  Differential Revision: https://phabricator.haskell.org/D4722
* A few typos [ci skip] | Gabor Greif | 2018-08-30 | 1 | -1/+1
* Fix precision of asinh/acosh/atanh by making them primops | Artem Pelenitsyn | 2018-08-21 | 2 | -0/+12
Reviewers: hvr, bgamari, simonmar, jrtc27
Reviewed By: bgamari
Subscribers: alpmestan, rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D5034
* Replace most occurences of foldl with foldl'. | klebinger.andreas@gmx.at | 2018-08-21 | 4 | -4/+0
This patch adds foldl' to GhcPrelude and changes most occurrences of foldl to foldl'. This leads to better performance, especially for quick builds where GHC does not perform strictness analysis.

It does change strictness behaviour when we use foldl' to turn an argument list into function applications. But this is only a drawback if code looks ONLY at the last argument and not at the first. And, as the benchmarks show, it leads to fewer allocations in practice at O2.

Compiler performance for Nofib:

O2 Allocations:
        -1 s.d.  -----  -0.0%
        +1 s.d.  -----  -0.0%
        Average  -----  -0.0%

O2 Compile Time:
        -1 s.d.  -----  -2.8%
        +1 s.d.  -----  +1.3%
        Average  -----  -0.8%

O0 Allocations:
        -1 s.d.  -----  -0.2%
        +1 s.d.  -----  -0.1%
        Average  -----  -0.2%

Test Plan: ci
Reviewers: goldfire, bgamari, simonmar, tdammers, monoidal
Reviewed By: bgamari, monoidal
Subscribers: tdammers, rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4929
* Minor refactoring in CmmUtils.mkLiveness | Ömer Sinan Ağacan | 2018-07-12 | 2 | -12/+9
Test Plan: validate
Reviewers: bgamari, simonmar
Reviewed By: simonmar
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4957
* Typofixes in comments and whitespace only [ci skip] | Gabor Greif | 2018-06-26 | 1 | -1/+1
* Use __FILE__ for Cmm assertion locations, fix #8619 | Ömer Sinan Ağacan | 2018-06-17 | 1 | -2/+0
It seems like we currently support string literals in Cmm, so we can use the __FILE__ CPP macro in assertion macros. This improves error messages that previously looked like

    ASSERTION FAILED: file (null), line 1302

The (null) part now shows the actual file name.

Also inline some single-use string literals in PrimOps.cmm.

Reviewers: bgamari, simonmar, erikd
Reviewed By: bgamari
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4862
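As a rough illustration of the idea in the message above, the following C sketch builds an assertion-failure message from the preprocessor's `__FILE__` and `__LINE__` macros. It is not the RTS macro: `format_assert_failure` and `ASSERT_MESSAGE` are hypothetical names chosen for this example, and only the message layout is taken from the commit text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of GHC): writes an assertion-failure
   message in the layout quoted in the commit message. */
int format_assert_failure(char *buf, size_t n, const char *file, int line) {
    return snprintf(buf, n, "ASSERTION FAILED: file %s, line %d", file, line);
}

/* Hypothetical macro: the preprocessor substitutes the current source
   file name (a string literal) and line number at the point of use. */
#define ASSERT_MESSAGE(buf, n) \
    format_assert_failure((buf), (n), __FILE__, __LINE__)

/* Small self-check: a message built through the macro names a real file,
   so the "(null)" placeholder never appears. */
void check_assert_message(void) {
    char buf[512];
    int written = ASSERT_MESSAGE(buf, sizeof buf);
    assert(written > 0);
    assert(strstr(buf, "(null)") == NULL);
}
```

Because `__FILE__` expands to a string literal, this approach only works where the language accepts string literals, which is the precondition the commit message notes for Cmm.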
* UNREG: fix CmmRegOff large offset handling on W64 platforms | Sergei Trofimovich | 2018-06-17 | 1 | -8/+4
Gabor noticed a C warning when building an unregisterised 64-bit compiler on GHC.Integer.Types (from integer-simple).

Minimised example with a warning:

```haskell
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE NoImplicitPrelude #-}
{-# OPTIONS_GHC -Wall #-}
module M (bug) where

import GHC.Prim (Word#, minusWord#, ltWord#)
import GHC.Types (isTrue#)

-- assume Word = Word64
bug :: Word# -> Word#
bug x = if isTrue# (x `ltWord#` 0x8000000000000000##) then 0## else x `minusWord#` 0x8000000000000000##
```

```
$ LANG=C x86_64-UNREG-linux-gnu-ghc -O1 -c M.hs -fforce-recomp
/tmp/ghc30219_0/ghc_1.hc: In function 'M_bug_entry':
/tmp/ghc30219_0/ghc_1.hc:20:14: error:
     warning: integer constant is so large that it is unsigned
```

It's caused by limited handling of integer literals in CmmRegOff. This change switches to using the standard integer literal pretty-printer.

C code before the change:

```c
FN_(M_bug_entry) {
W_ _sAg;
_cAr:
_sAg = *Sp;
switch ((W_)(_sAg < 0x8000000000000000UL)) {
case 0x1UL: goto _cAq;
default: goto _cAp;
}
_cAp:
R1.w = _sAg+-9223372036854775808;
// ...
```

C code after the change:

```c
FN_(M_bug_entry) {
W_ _sAg;
_cAr:
_sAg = *Sp;
switch ((W_)(_sAg < 0x8000000000000000UL)) {
case 0x1UL: goto _cAq;
default: goto _cAp;
}
_cAp:
R1.w = _sAg+(-0x8000000000000000UL);
```

URL: https://mail.haskell.org/pipermail/ghc-devs/2018-June/015875.html

Reported-by: Gabor Greif
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>

Test Plan: test generated code on unregisterised mips64 and amd64
Reviewers: simonmar, ggreif, bgamari
Reviewed By: ggreif, bgamari
Subscribers: rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4856
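The following C sketch illustrates why the new form of the literal is unproblematic. The decimal constant 9223372036854775808 is one larger than the maximum signed 64-bit value, which is what triggers the warning, whereas `-0x8000000000000000UL` negates an unsigned constant, and unsigned arithmetic is defined modulo 2^64. The names `HIGH_BIT`, `sub_high_bit` and `add_negated_high_bit` are hypothetical and do not come from GHC's generated code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name for the constant from the example: bit 63 set. */
#define HIGH_BIT UINT64_C(0x8000000000000000)

/* What the Haskell source asks for: x `minusWord#` 0x8000000000000000## */
uint64_t sub_high_bit(uint64_t x) {
    return x - HIGH_BIT;
}

/* The shape of the generated C after the change: x + (-constant).
   Negating an unsigned value wraps modulo 2^64, so no signed literal
   is ever needed. */
uint64_t add_negated_high_bit(uint64_t x) {
    return x + (-HIGH_BIT);
}

/* Small self-check: both forms agree for a few sample inputs. */
void check_equivalence(void) {
    assert(sub_high_bit(0) == add_negated_high_bit(0));
    assert(sub_high_bit(5) == add_negated_high_bit(5));
    assert(sub_high_bit(UINT64_MAX) == add_negated_high_bit(UINT64_MAX));
}
```

Since 2^63 is its own negation modulo 2^64, adding the negated constant and subtracting the constant give the same result for every 64-bit input.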
* UNREG: PprC: add support for of W16 literals (Ticket #15237) | Sergei Trofimovich | 2018-06-15 | 1 | -0/+8
Fix UNREG build failure for 32-bit targets.

This change is the equivalent of commit 0238a6c78102d43dae2f56192bd3486e4f9ecf1d ("UNREG: PprC: add support for of W32 literals").

The change allows combining two subwords into one word on 32-bit targets.

Tested on nios2-unknown-linux-gnu.

GHC Trac Issues: #15237

Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
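A minimal C sketch of what "combining two subwords into one word" means arithmetically is shown below. It is not the PprC implementation: `combine_w16` is a hypothetical helper, and placing the first subword in the upper 16 bits is an assumption made for the example; which half belongs in the high bits depends on the target's byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack two 16-bit subwords into one 32-bit word.
   Assumption: `hi` occupies the upper 16 bits; for the opposite layout
   the two arguments would simply be swapped. */
uint32_t combine_w16(uint16_t hi, uint16_t lo) {
    return ((uint32_t)hi << 16) | (uint32_t)lo;
}

/* Small self-check: each subword can be recovered from the combined word. */
void check_combine(void) {
    uint32_t w = combine_w16(0x1234, 0xABCD);
    assert((uint16_t)(w >> 16) == 0x1234);
    assert((uint16_t)(w & 0xFFFFu) == 0xABCD);
}
```

Casting each half to `uint32_t` before shifting avoids the integer promotion to signed `int` that a bare `hi << 16` would go through.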