path: root/compiler/codeGen
Commit message (Collapse)AuthorAgeFilesLines
* Module hierarchy: StgToCmm (#13009)Sylvain Henry2019-09-1027-11330/+0
| | | | | | Add StgToCmm module hierarchy. Platform modules that are used in several other places (NCG, LLVM codegen, Cmm transformations) are put into GHC.Platform.
* Some more documentation for typePrimRep1 stuffÖmer Sinan Ağacan2019-08-271-2/+9
| | | | [skip ci]
* Cmm: constant folding `quotRem x 2^N`Sylvain Henry2019-08-151-11/+39
| | | | | | `quot` and `rem` are implemented efficiently when the second argument is a constant power of 2. This patch uses the same implementations for `quotRem` primop.
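A minimal sketch of the idea behind this commit, written in plain Haskell rather than Cmm. It shows one standard way to compute `quotRem x (2^n)` with shifts and masks; the function names are hypothetical and this is not the code the patch generates. It assumes `0 <= n` and `n` smaller than the word size minus one.

```haskell
import Data.Bits (finiteBitSize, shiftL, shiftR, (.&.))

-- Unsigned case: quotient is a logical shift, remainder is a mask.
quotRemPow2W :: Word -> Int -> (Word, Word)
quotRemPow2W x n = (x `shiftR` n, x .&. mask)
  where
    mask = (1 `shiftL` n) - 1

-- Signed case: `quot` truncates toward zero, but an arithmetic shift
-- rounds toward negative infinity, so negative inputs are biased by
-- (2^n - 1) before shifting.
quotRemPow2I :: Int -> Int -> (Int, Int)
quotRemPow2I x n = (q, r)
  where
    mask = (1 `shiftL` n) - 1
    -- All ones when x is negative, zero otherwise, then masked to the bias.
    bias = (x `shiftR` (finiteBitSize x - 1)) .&. mask
    q    = (x + bias) `shiftR` n
    r    = x - (q `shiftL` n)
```

In GHCi, `quotRemPow2I (-13) 2` can be compared against `(-13) `quotRem` 4` to check the signed case.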
* Remove unused imports of the form 'import foo ()' (Fixes #17065)James Foster2019-08-151-1/+1
| | | | | | | | | | | These kinds of imports are necessary in some cases such as importing instances of typeclasses or intentionally creating dependencies in the build system, but '-Wunused-imports' can't detect when they are no longer needed. This commit removes the unused ones currently in the code base (not including test files or submodules), with the hope that doing so may increase parallelism in the build system by removing unnecessary dependencies.
* compiler: emit finer grained codegen events to eventlogAlp Mestanogullari2019-08-021-1/+2
|
* Add Note [RuntimeRep and PrimRep] in RepTypeRichard Eisenberg2019-07-291-0/+1
| Also adds Note [Getting from RuntimeRep to PrimRep], which documents a related thorny process.
|
| This Note addresses #16964, which correctly observes that documentation for this thorny design is lacking.
|
| Documentation only.
* Create {Int,Word}32RepJohn Ericson2019-07-171-0/+2
| | | | | | | This prepares the way for making Int32# and Word32# the actual size they claim to be. Updates binary submodule for (de)serializing the new runtime reps.
* Revert "Add support for SIMD operations in the NCG"Ben Gamari2019-07-162-64/+33
| | | | | | | Unfortunately this will require more work; register allocation is quite broken. This reverts commit acd795583625401c5554f8e04ec7efca18814011.
* Don't gather ticks when only stripping them in STG.Andreas Klebinger2019-07-041-1/+1
| | | | | | | | | Adds stripStgTicksTopE which only returns the stripped expression. So far we also allocated a list for the stripped ticks which was never used. Allocation difference is as expected very small but present. About 0.02% difference when compiling with -O.
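A toy illustration of the difference this commit describes, assuming a made-up two-constructor expression type rather than GHC's STG syntax. `stripTicksTop` and `stripTicksTopE` here are hypothetical stand-ins: the first builds a list of the stripped ticks, the second returns only the expression and so never allocates that list.

```haskell
-- Toy expression type: a literal, possibly wrapped in named ticks.
data Expr
  = Lit Int
  | Tick String Expr
  deriving (Eq, Show)

-- Strips top-level ticks and also returns them, allocating a list.
stripTicksTop :: Expr -> ([String], Expr)
stripTicksTop (Tick t e) =
  let (ts, e') = stripTicksTop e
  in  (t : ts, e')
stripTicksTop e = ([], e)

-- Strips top-level ticks and returns only the expression.
stripTicksTopE :: Expr -> Expr
stripTicksTopE (Tick _ e) = stripTicksTopE e
stripTicksTopE e          = e
```

Callers that discard the tick list get the same expression from either function; the second simply skips the intermediate list.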
* Add support for SIMD operations in the NCGAbhiroop Sarkar2019-07-032-33/+64
| This adds support for constructing vector types from Float#, Double# etc and performing arithmetic operations on them.
|
| Cleaned-Up-By: Ben Gamari <ben@well-typed.com>
* Correct closure observation, construction, and mutation on weak memory machines.Travis Whitaker2019-06-281-0/+1
| Here the following changes are introduced:
| - A read barrier machine op is added to Cmm.
| - The order in which a closure's fields are read and written is changed.
| - Memory barriers are added to RTS code to ensure correctness on out-of-order machines with weak memory ordering.
|
| Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this is lowered to an instruction that ensures memory reads that occur after said instruction in program order are not performed before reads coming before said instruction in program order. On machines with strong memory ordering properties (e.g. X86, SPARC in TSO mode) no such instruction is necessary, so MO_ReadBarrier is simply erased. However, such an instruction is necessary on weakly ordered machines, e.g. ARM and PowerPC.
|
| Weak memory ordering has consequences for how closures are observed and mutated. For example, consider a closure that needs to be updated to an indirection. In order for the indirection to be safe for concurrent observers to enter, said observers must read the indirection's info table before they read the indirectee. Furthermore, the entering observer makes assumptions about the closure based on its info table contents, e.g. an INFO_TYPE of IND implies the closure has an indirectee pointer that is safe to follow.
|
| When a closure is updated with an indirection, both its info table and its indirectee must be written. With weak memory ordering, these two writes can be arbitrarily reordered, and perhaps even interleaved with other threads' reads and writes (in the absence of memory barrier instructions). Consider this example of a bad reordering:
| - An updater writes to a closure's info table (INFO_TYPE is now IND).
| - A concurrent observer branches upon reading the closure's INFO_TYPE as IND.
| - A concurrent observer reads the closure's indirectee and enters it. (!!!)
| - An updater writes the closure's indirectee.
|
| Here the update to the indirectee comes too late and the concurrent observer has jumped off into the abyss. Speculative execution can also cause us issues, consider:
| - An observer is about to case on a value in closure's info table.
| - The observer speculatively reads one or more of closure's fields.
| - An updater writes to closure's info table.
| - The observer takes a branch based on the new info table value, but with the old closure fields!
| - The updater writes to the closure's other fields, but it's too late.
|
| Because of these effects, reads and writes to a closure's info table must be ordered carefully with respect to reads and writes to the closure's other fields, and memory barriers must be placed to ensure that reads and writes occur in program order. Specifically, updates to a closure must follow the following pattern:
| - Update the closure's (non-info table) fields.
| - Write barrier.
| - Update the closure's info table.
|
| Observing a closure's fields must follow the following pattern:
| - Read the closure's info pointer.
| - Read barrier.
| - Read the closure's (non-info table) fields.
|
| This patch updates RTS code to obey this pattern. This should fix long-standing SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting out-of-order execution) and PowerPC. This fixes issue #15449.
|
| Co-Authored-By: Ben Gamari <ben@well-typed.com>
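The bad reordering in this commit message can be replayed with a small sequential model. This is only a sketch under strong assumptions: it is a pure Haskell simulation, not RTS code, and every type and function name in it is hypothetical. A list of steps stands for the order in which memory effects become visible; a barrier's job is to rule out the orders in which the info table write is visible before the indirectee write.

```haskell
data InfoType = Thunk | Ind
  deriving (Eq, Show)

-- Toy closure: an info table tag and an optional indirectee "pointer".
data Closure = Closure
  { infoType   :: InfoType
  , indirectee :: Maybe Int
  } deriving (Eq, Show)

-- Memory effects in the order they become visible to the observer.
data Step
  = WriteInfo             -- updater sets INFO_TYPE to IND
  | WriteIndirectee Int   -- updater stores the indirectee
  | Observe               -- observer reads info, then maybe the indirectee

data Observation
  = SawThunk      -- info still says THUNK, nothing followed
  | Followed Int  -- info says IND and the indirectee was valid
  | Dangling      -- info says IND but the indirectee was not yet written
  deriving (Eq, Show)

observe :: Closure -> Observation
observe c = case infoType c of
  Thunk -> SawThunk
  Ind   -> maybe Dangling Followed (indirectee c)

-- Replays the steps against a fresh thunk and collects what observers saw.
run :: [Step] -> [Observation]
run = go (Closure Thunk Nothing)
  where
    go _ []                         = []
    go c (WriteInfo : rest)         = go (c { infoType = Ind }) rest
    go c (WriteIndirectee v : rest) = go (c { indirectee = Just v }) rest
    go c (Observe : rest)           = observe c : go c rest
```

With the fields written first, `run [WriteIndirectee 42, WriteInfo, Observe]` follows a valid pointer, whereas `run [WriteInfo, Observe, WriteIndirectee 42]` reports a dangling read, which is the "abyss" case described above.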
* Simplify link_caf and mkForeignLabel functionsÖmer Sinan Ağacan2019-06-251-3/+2
|
* Move 'Platform' to ghc-bootJohn Ericson2019-06-193-3/+3
| ghc-pkg needs to be aware of platforms so it can figure out which subdir within the user package db to use. This is admittedly roundabout, but maybe Cabal could use the same notion of a platform as GHC to good effect too.
* Implement the -XUnliftedNewtypes extension.Andrew Martin2019-06-141-1/+4
| GHC Proposal: 0013-unlifted-newtypes.rst
| Discussion: https://github.com/ghc-proposals/ghc-proposals/pull/98
| Issues: #15219, #1311, #13595, #15883
| Implementation Details:
|   Note [Implementation of UnliftedNewtypes]
|   Note [Unifying data family kinds]
|   Note [Compulsory newtype unfolding]
|
| This patch introduces the -XUnliftedNewtypes extension. When this extension is enabled, GHC drops the restriction that the field in a newtype must be of kind (TYPE 'LiftedRep). This allows types like Int# and ByteArray# to be used in a newtype. Additionally, coerce is made levity-polymorphic so that it can be used with newtypes over unlifted types.
|
| The bulk of the changes are in TcTyClsDecls.hs. With -XUnliftedNewtypes, getInitialKind is more liberal, introducing a unification variable to return the kind (TYPE r0) rather than just returning (TYPE 'LiftedRep). When kind-checking a data constructor with kcConDecl, we attempt to unify the kind of a newtype with the kind of its field's type. When typechecking a data declaration with tcTyClDecl, we again perform a unification. See the implementation note for more on this.
|
| Co-authored-by: Richard Eisenberg <rae@richarde.dev>
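A small usage sketch of the extension, assuming a GHC release that ships -XUnliftedNewtypes. The `Meters` type and its helper functions are hypothetical and only show that a newtype may now wrap `Int#` directly.

```haskell
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnliftedNewtypes #-}

import GHC.Exts (Int (I#), Int#, (+#))

-- A newtype whose field has an unlifted kind (TYPE 'IntRep).
newtype Meters = Meters Int#

-- Arithmetic works directly on the unboxed payload.
addMeters :: Meters -> Meters -> Meters
addMeters (Meters a) (Meters b) = Meters (a +# b)

-- Box the result so it can be shown or compared with ordinary Ints.
metersToInt :: Meters -> Int
metersToInt (Meters n) = I# n
```

Values of type `Meters` cannot be bound at top level because they are unlifted, so they are only constructed inside expressions such as `metersToInt (addMeters (Meters 2#) (Meters 3#))`.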
* Remove unused Unique field from StgFCallOpÖmer Sinan Ağacan2019-06-132-2/+2
| | | | Fixes #16696
* Use DeriveFunctor throughout the codebase (#15654)Krzysztof Gogolewski2019-06-122-7/+5
|
* Introduce log1p and expm1 primopschessai2019-06-091-0/+4
| | | | | Previously log and exp were primitives yet log1p and expm1 were FFI calls. Fix this non-uniformity.
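For context, a short sketch of why `log1p` and `expm1` exist at all: near zero they keep precision that the naive formulas lose. It uses the `log1p` and `expm1` class methods exported from base's `Numeric` module; the binding names are illustrative only.

```haskell
import Numeric (expm1, log1p)

tiny :: Double
tiny = 1e-20

-- 1 + tiny rounds to exactly 1 in Double, so the naive results collapse to 0.
naiveLog, naiveExp :: Double
naiveLog = log (1 + tiny)
naiveExp = exp tiny - 1

-- The dedicated operations keep the small result.
preciseLog, preciseExp :: Double
preciseLog = log1p tiny
preciseExp = expm1 tiny
```

Here `naiveLog` and `naiveExp` are exactly zero, while `preciseLog` and `preciseExp` remain positive and close to `tiny`.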
* Remove trailing whitespaceMatthew Pickering2019-06-081-2/+2
| | | | | | [skip ci] This should really be caught by the linters! (#16711)
* Use a better strategy for determining the offset applied to foreign function arguments that have an unlifted boxed type.Andrew Martin2019-06-043-41/+131
| | | | We used to use the type of the argument. We now use the type of the foreign function. Add a test to confirm that the roundtrip conversion between an unlifted boxed type and Any is sound in the presence of a foreign function call.
* StgCmmMonad: remove emitProc_, don't export emitProcÖmer Sinan Ağacan2019-05-031-10/+6
|
* StgCmmPrim: remove an unnecessary instruction in doNewArrayOpMichal Terepeta2019-04-191-5/+2
| Previously we would generate a local variable pointing after the array header and use it to initialize the array elements. But we already use stores with offset, so it's easy to just add the header to those offsets during compilation and avoid generating the local variable (which would become a LEA instruction when using native codegen; LLVM already optimizes it away).
|
| Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
* asm-emit-time IND_STATIC eliminationGabor Greif2019-04-151-1/+3
| | | | | | | | | | | | When a new closure identifier is being established to a local or exported closure already emitted into the same module, refrain from adding an IND_STATIC closure, and instead emit an assembly-language alias. Inter-module IND_STATIC objects still remain, and need to be addressed by other measures. Binary-size savings on nofib are around 0.1%.
* codegen: unroll memcpy calls for small bytearraysArtem Pyanykh2019-04-141-24/+26
|
* removing x87 register support from native code genCarter Schonwald2019-04-101-0/+30
| * simplifies registers to have GPR, Float and Double, by removing the SSE2 and X87 Constructors
| * makes -msse2 assumed/default for x86 platforms, fixing a long-standing nondeterminism in rounding behavior in 32bit haskell code
| * removes the 80bit floating point representation from the supported float sizes
| * there's still 1 tiny bit of x87 support needed, for handling float and double return values in FFI calls wrt the C ABI on x86_32, but this one piece does not leak into the rest of NCG.
| * Lots of code that's not been touched in a long time got deleted as a consequence of all of this
|
| All in all, this change paves the way towards a lot of future further improvements in how GHC handles floating point computations, along with making the native code gen more accessible to a larger pool of contributors.
* codegen: use newtype for Alignment in BasicTypesArtem Pyanykh2019-04-091-10/+9
|
* codegen: fix memset unroll for small bytearrays, add 64-bit setsArtem Pyanykh2019-04-091-4/+12
| Fixes #16052
|
| When the offset in `setByteArray#` is statically known, we can provide better alignment guarantees than just 1 byte. Also, memset can now do 64-bit wide sets.
|
| The current memset intrinsic is not optimal however and can be improved for the case when we know that we deal with
|
|   (baseAddress at known alignment) + offset
|
| For instance, on 64-bit `setByteArray# s 1# 23# 0#` given that bytearray is 8 bytes aligned could be unrolled into `movb, movw, movl, movq, movq`; but currently it is `movb x23` since alignment of 1 is all we can embed into MO_Memset op.
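The `movb, movw, movl, movq, movq` sequence mentioned in this commit can be derived with a simple greedy rule: at each offset, use the widest store that is both naturally aligned and fits in the remaining length. The sketch below is an illustration of that rule only, assuming an 8-byte-aligned base address; `storeWidths` is a hypothetical helper, not a function from GHC.

```haskell
-- Widths (in bytes) of the stores needed to fill `len` bytes starting at
-- byte offset `off` from an 8-byte-aligned base address.
storeWidths :: Int -> Int -> [Int]
storeWidths off len
  | len <= 0  = []
  | otherwise = w : storeWidths (off + w) (len - w)
  where
    -- Width 1 always qualifies when len >= 1, so the list is never empty.
    w = head [c | c <- [8, 4, 2, 1], off `mod` c == 0, c <= len]
```

For offset 1 and length 23 this yields widths 1, 2, 4, 8, 8, matching the byte, word, long and two quad stores from the commit message.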
* Generate straightline code for inline array allocationMichal Terepeta2019-04-081-11/+5
| | | | | | | | | | | | | | | GHC has an optimization for allocating arrays when the size is statically known -- it'll generate the code allocating and initializing the array inline (instead of a call to a procedure from `rts/PrimOps.cmm`). However, the generated code uses a loop to do the initialization. Since we already check that the requested size is small (we check against `maxInlineAllocSize`), we can generate faster straightline code instead. This brings about 15% improvement for `newSmallArray#` in my testing and slightly simplifies the code in GHC. Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
* Remove unnecessary uses of UnboxedTuples pragma (see #13101 / #15454)Michael Sloan2019-04-011-1/+1
| | | | Also removes a couple unnecessary MagicHash pragmas
* Add support for bitreverse primopAlexandre2019-04-011-0/+13
| | | | | | This commit includes the necessary changes in code and documentation to support a primop that reverses a word's bits. It also includes a test.
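As a reference for what "reverses a word's bits" means, here is a naive 8-bit version in plain Haskell. It is a hypothetical helper written for illustration, not the primop itself and not how the code generator implements it.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- Moves bit i of the input to bit (7 - i) of the output.
reverseBits8 :: Word8 -> Word8
reverseBits8 w = foldl step 0 [0 .. 7]
  where
    -- Shift the accumulator left and append bit i of the input.
    step acc i = (acc `shiftL` 1) .|. ((w `shiftR` i) .&. 1)
```

Applying the function twice returns the original value, which makes a convenient sanity check.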
* Minor refactoring in copy array primops:Ömer Sinan Ağacan2019-03-271-15/+17
| - `emitCopySmallArray` now checks size before generating code and doesn't generate any code when size is 0. `emitCopyArray` already does this so this makes small/large array cases the same in argument checking.
| - In both `emitCopySmallArray` and `emitCopyArray` read the `dflags` after checking the argument.
* Update Wiki URLs to point to GitLabTakenobu Tani2019-03-251-1/+1
| This moves all URL references to Trac Wiki to their corresponding GitLab counterparts.
|
| This substitution is classified as follows:
|
| 1. Automated substitution using sed with Ben's mapping rule [1]
|      Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...
|      New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...
|
| 2. Manual substitution for URLs containing `#` index
|      Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...#Zzz
|      New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...#zzz
|
| 3. Manual substitution for strings starting with `Commentary`
|      Old: Commentary/XxxYyy...
|      New: commentary/xxx-yyy...
|
| See also !539
|
| [1]: https://gitlab.haskell.org/bgamari/gitlab-migration/blob/master/wiki-mapping.json
* Update Trac ticket URLs to point to GitLabRyan Scott2019-03-153-6/+6
| | | | | This moves all URL references to Trac tickets to their corresponding GitLab counterparts.
* Remove duplicate functions in StgCmmUtils, use functions from CgUtilsÖmer Sinan Ağacan2019-03-122-51/+11
| | | | Also remove unused arg from get_Regtable_addr_from_offset
* Rip out object splittingBen Gamari2019-03-053-51/+8
| The splitter is an evil Perl script that processes assembler code. Its job can be done better by the linker's --gc-sections flag. GHC passes this flag to the linker whenever -split-sections is passed on the command line.
|
| This is based on @DemiMarie's D2768.
|
| Fixes Trac #11315
| Fixes Trac #9832
| Fixes Trac #8964
| Fixes Trac #8685
| Fixes Trac #8629
* Add AnonArgFlag to FunTySimon Peyton Jones2019-02-231-3/+3
| The big payload of this patch is:
|
|   Add an AnonArgFlag to the FunTy constructor of Type, so that
|     (FunTy VisArg t1 t2) means (t1 -> t2)
|     (FunTy InvisArg t1 t2) means (t1 => t2)
|
| The big payoff is that we have a simple, local test to make when decomposing a type, leading to many fewer calls to isPredTy. To me the code seems a lot tidier, and probably more efficient (isPredTy has to take the kind of the type).
|
| See Note [Function types] in TyCoRep.
|
| There are lots of consequences
|
| * I made FunTy into a record, so that it'll be easier when we add a linearity field, something that is coming down the road.
|
| * Lots of code gets touched in a routine way, simply because it pattern matches on FunTy.
|
| * I wanted to make a pattern synonym for (FunTy2 arg res), which picks out just the argument and result type from the record. But alas the pattern-match overlap checker has a heart attack, and either reports false positives, or takes too long. In the end I gave up on pattern synonyms. There's some commented-out code in TyCoRep that shows what I wanted to do.
|
| * Much more clarity about predicate types, constraint types and (in particular) equality constraints in kinds. See TyCoRep Note [Types for coercions, predicates, and evidence] and Note [Constraints in kinds].
|
|   This made me realise that we need an AnonArgFlag on AnonTCB in a TyConBinder, something that was really plain wrong before. See TyCon Note [AnonTCB InvisArg]
|
| * When building function types we must know whether we need VisArg (mkVisFunTy) or InvisArg (mkInvisFunTy). This turned out to be pretty easy in practice.
|
| * Pretty-printing of types, esp in IfaceType, gets tidier, because we were already recording the (->) vs (=>) distinction in an ad-hoc way. Death to IfaceFunTy.
|
| * mkLamType needs to keep track of whether it is building (t1 -> t2) or (t1 => t2). See Type Note [mkLamType: dictionary arguments]
|
| Other minor stuff
|
| * Some tidy-up in validity checking involving constraints; Trac #16263
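A toy model of the design described in this commit, to show why the flag gives a "simple, local test". The `Type` below is a drastically reduced stand-in for GHC's real type representation; the record field names, `pprType` and `splitInvisArgs` are assumptions made for this sketch, while `mkVisFunTy` and `mkInvisFunTy` mirror the names mentioned in the message.

```haskell
data AnonArgFlag = VisArg | InvisArg
  deriving (Eq, Show)

-- Toy types: named atoms and function arrows tagged with a flag.
data Type
  = TyVar String
  | FunTy { ft_af :: AnonArgFlag, ft_arg :: Type, ft_res :: Type }
  deriving (Eq, Show)

mkVisFunTy, mkInvisFunTy :: Type -> Type -> Type
mkVisFunTy   = FunTy VisArg    -- t1 -> t2
mkInvisFunTy = FunTy InvisArg  -- t1 => t2

-- The arrow to print is decided by looking at the flag alone.
pprType :: Type -> String
pprType (TyVar v)      = v
pprType (FunTy af a r) = pprArg a ++ arrow af ++ pprType r
  where
    arrow VisArg   = " -> "
    arrow InvisArg = " => "
    pprArg t@FunTy{} = "(" ++ pprType t ++ ")"
    pprArg t         = pprType t

-- Peels off leading invisible (constraint) arguments with a pattern match,
-- with no need to inspect the kind of the argument.
splitInvisArgs :: Type -> ([Type], Type)
splitInvisArgs (FunTy InvisArg a r) =
  let (as, res) = splitInvisArgs r
  in  (a : as, res)
splitInvisArgs t = ([], t)
```

For example, `pprType (mkInvisFunTy (TyVar "Eq a") (mkVisFunTy (TyVar "a") (TyVar "Bool")))` renders the constraint arrow and the ordinary arrow differently without any predicate check on the argument.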
* Use ByteString to represent Cmm string literals (#16198)Sylvain Henry2019-01-313-12/+11
| | | | Also used ByteString in some other relevant places
* PPC NCG: Remove Darwin supportPeter Trommler2019-01-012-27/+5
| Support for Mac OS X on PowerPC was dropped by Apple years ago. We follow suit and remove PowerPC support for Darwin.
|
| Fixes #16106.
* Typo fix, replace a foldl with foldl'Ömer Sinan Ağacan2018-12-121-3/+3
|
* PPC NCG: Generate MO_?_QuotRem for subword sizesPeter Trommler2018-12-111-22/+13
| Handle Int*QuotRemOP and Word*QuotRemOp in PPC NCG. Refactor common code with remainder operation.
|
| Test Plan: validate (I validated on Linux powerpc64le and x86_64)
|
| Reviewers: erikd, hvr, bgamari, simonmar
|
| Reviewed By: bgamari
|
| Subscribers: rwbarton, carter
|
| Differential Revision: https://phabricator.haskell.org/D5323
* Don't use a generic apply thunk for known callsSebastian Graf2018-12-061-0/+1
| Summary:
| Currently, an AP thunk like `sat = f a b c` will not have its own entry point and info pointer and will instead reuse a generic apply thunk like `stg_ap_4_upd`.
|
| That's great from a code size perspective, but if `f` is a known function, a specialised entry point with a plain call can be much faster than figuring out the arity and doing dynamic dispatch.
|
| This looks at `f`'s arity to figure out if it is a known function and if so, it will not lower it to a generic apply function.
|
| Benchmark results are encouraging: No changes to allocation, but 0.2% fewer counted instructions.
|
| Test Plan: Validates locally
|
| Reviewers: simonmar, osa1, simonpj, bgamari
|
| Reviewed By: simonpj
|
| Subscribers: rwbarton, carter
|
| GHC Trac Issues: #16005
|
| Differential Revision: https://phabricator.haskell.org/D5414
* Deduplicate decision to count thunks in `-ticky`Sebastian Graf2018-11-301-8/+14
| Summary:
| Previously, the logic that checks whether a thunk has a counter or not was duplicated in multiple functions. This led to thunk enters being accounted to their enclosing functions in `StgCmmTicky.tickyEnterThunk`, because the outer call to `withNewTickyCounterThunk` didn't set the counter label for the thunk.
|
| And rightly so! `tickyEnterThunk` should only account thunk enters to a counter if `-ticky-dyn-thunk` is on. This patch extracts the logic that was already present in its most general form in `withNewTickyCounterThunk` into its own functions and lets all other call sites checking for `-ticky-dyn-thunk` call this new function named `thunkHasCounter` instead.
|
| Reviewers: bgamari, simonmar
|
| Reviewed By: simonmar
|
| Subscribers: rwbarton, carter
|
| Differential Revision: https://phabricator.haskell.org/D5392
* Comments onlySimon Peyton Jones2018-11-282-19/+23
|
* Add Note [Dead case binders in -O0]Sebastian Graf2018-11-281-1/+13
| After reverting Phab:D5358, Simon (Peyton Jones) asked for a Note summarising why we want to keep the dead case binder check in `cgCase`.
|
| Summary from mail conversation:
|
| * Phab:D5324 means that we no longer /recompute/ dead-ness of case-binders in STG-land
| * But TidyPgm preserves dead-ness info (see CoreTidy.tidyIdBndr)
| * And so we can take advantage of it to avoid a redundant load. This load would be eliminated by CmmSink, but that only happens with -O
* Revert "Remove redundant check in cgCase"Ömer Sinan Ağacan2018-11-261-4/+7
| | | | | | This reverts commit d13b7d60650cb84af11ee15b3f51c3511548cfdb. (See discussion in D5358)
* Implement late lambda liftSebastian Graf2018-11-233-9/+6
| Summary:
| This implements a selective lambda-lifting pass late in the STG pipeline.
|
| Lambda lifting has the effect of avoiding closure allocation at the cost of having to make former free vars available at call sites, possibly enlarging closures surrounding call sites in turn.
|
| We identify beneficial cases by means of an analysis that estimates closure growth.
|
| There's a Wiki page at https://ghc.haskell.org/trac/ghc/wiki/LateLamLift.
|
| Reviewers: simonpj, bgamari, simonmar
|
| Reviewed By: simonpj
|
| Subscribers: rwbarton, carter
|
| GHC Trac Issues: #9476
|
| Differential Revision: https://phabricator.haskell.org/D5224
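To make the trade-off concrete, here is lambda lifting done by hand on a source-level example. This is only an analogy: the actual pass works on STG and decides automatically, and all names below are hypothetical.

```haskell
-- Before lifting: `go` is a local function whose closure captures `n`.
addAllLocal :: Int -> [Int] -> [Int]
addAllLocal n xs = go xs
  where
    go []       = []
    go (y : ys) = (y + n) : go ys

-- After lifting: `n` is no longer a free variable but an extra argument,
-- so `goLifted` can live at top level and needs no closure for `n`.
goLifted :: Int -> [Int] -> [Int]
goLifted _ []       = []
goLifted n (y : ys) = (y + n) : goLifted n ys

addAllLifted :: Int -> [Int] -> [Int]
addAllLifted n xs = goLifted n xs
```

Both versions compute the same result; the lifted one trades a captured variable for an extra argument at every call site, which is the cost the closure-growth analysis has to weigh.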
* Fix unused-import warningsDavid Eichmann2018-11-221-1/+0
| This patch fixes a fairly long-standing bug (dating back to 2015) in RdrName.bestImport, namely
|
|    commit 9376249b6b78610db055a10d05f6592d6bbbea2f
|    Author: Simon Peyton Jones <simonpj@microsoft.com>
|    Date:   Wed Oct 28 17:16:55 2015 +0000
|
|    Fix unused-import stuff in a better way
|
| That patch got the sense of the comparison back to front, and thereby failed to implement the unused-import rules described in
|   Note [Choosing the best import declaration] in RdrName
|
| This led to Trac #13064 and #15393
|
| Fixing this bug revealed a bunch of unused imports in libraries; the ones in the GHC repo are part of this commit.
|
| The two important changes are
|
| * Fix the bug in bestImport
|
| * Modified the rules by adding (a) in Note [Choosing the best import declaration] in RdrName
|
|   Reason: the previous rules made Trac #5211 go bad again. And the new rule (a) makes sense to me.
|
| In unravelling this I also ended up doing a few other things
|
| * Refactor RnNames.ImportDeclUsage to use a [GlobalRdrElt] for the things that are used, rather than [AvailInfo]. This is simpler and more direct.
|
| * Rename greParentName to greParent_maybe, to follow GHC naming conventions
|
| * Delete dead code RdrName.greUsedRdrName
|
| Bumps a few submodules.
|
| Reviewers: hvr, goldfire, bgamari, simonmar, jrtc27
|
| Subscribers: rwbarton, carter
|
| Differential Revision: https://phabricator.haskell.org/D5312
* LLVM: Use generic code for small size quot-rem opsPeter Trommler2018-11-221-2/+2
|
* Rename literal constructorsSylvain Henry2018-11-222-15/+15
| In a previous patch we replaced some built-in literal constructors (MachInt, MachWord, etc.) with a single LitNumber constructor.
|
| In this patch we replace the `Mach` prefix of the remaining constructors with `Lit` for consistency (e.g., LitChar, LitLabel, etc.).
|
| Sadly the name `LitString` was already taken for a kind of FastString and it would become misleading to have both `LitStr` (literal constructor renamed after `MachStr`) and `LitString` (FastString variant). Hence this patch renames the FastString variant `PtrString` (which is more accurate) and the literal string constructor now uses the least surprising `LitString` name.
|
| Both `Literal` and `LitString/PtrString` have recently seen breaking changes so doing this kind of renaming now shouldn't harm much.
|
| Reviewers: hvr, goldfire, bgamari, simonmar, jrtc27, tdammers
|
| Subscribers: tdammers, rwbarton, thomie, carter
|
| Differential Revision: https://phabricator.haskell.org/D4881
* Remove redundant check in cgCaseÖmer Sinan Ağacan2018-11-201-7/+4
| D5339 (part of D5324) removed the dead case binder analysis done during CoreToStg so this condition always holds now.
|
| Test Plan: Validated locally.
|
| Reviewers: sgraf, bgamari, simonmar
|
| Subscribers: rwbarton, carter
|
| Differential Revision: https://phabricator.haskell.org/D5358
* Don't track free variables in STG syntax by defaultSebastian Graf2018-11-194-29/+35
| Summary:
| Currently, `CoreToStg` annotates `StgRhsClosure`s with their set of non-global free variables. This free variable information is only needed in the final code generation step (i.e. `StgCmm.codeGen`), which leads to transformations such as `StgCse` and `StgUnarise` having to maintain this information.
|
| This is tiresome and unnecessary, so this patch introduces a trees-to-grow-like approach that only introduces the free variable set into the syntax tree in the code gen pass, along with a free variable analysis on STG terms to generate that information.
|
| Fixes #15754.
|
| Reviewers: simonpj, osa1, bgamari, simonmar
|
| Reviewed By: osa1
|
| Subscribers: rwbarton, carter
|
| GHC Trac Issues: #15754
|
| Differential Revision: https://phabricator.haskell.org/D5324
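A minimal sketch of the kind of free variable analysis the commit refers to, over a made-up expression type rather than GHC's STG syntax. The `Expr` constructors and `freeVars` are hypothetical, and the distinction between global and non-global variables is not modelled.

```haskell
import Data.List (nub)

-- Toy expression language with variables, application, lambdas and
-- a non-recursive let.
data Expr
  = Var String
  | App Expr [Expr]
  | Lam [String] Expr
  | Let String Expr Expr
  deriving (Eq, Show)

-- Computes free variables on demand instead of storing them in the tree.
freeVars :: Expr -> [String]
freeVars (Var v)          = [v]
freeVars (App f args)     = nub (concatMap freeVars (f : args))
freeVars (Lam bndrs body) = filter (`notElem` bndrs) (freeVars body)
freeVars (Let x rhs body) =
  nub (freeVars rhs ++ filter (/= x) (freeVars body))
```

Because the set is recomputed from the term when it is needed, passes that rewrite the tree do not have to keep an annotation up to date, which is the maintenance burden the commit removes.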