path: root/compiler/GHC/CmmToAsm
Commit message (Author, Age, Files, Lines)
* [AArch64/Darwin] fix packed calling conv alignment (Moritz Angermann, 2021-08-02, 1 file, -6/+38)
| Apparently we need some padding as well. Fixes #20137
* PrimOps: Add CAS op for all int sizes (Peter Trommler, 2021-08-02, 2 files, -3/+37)
| PPC NCG: Implement CAS inline for 32 and 64 bit
| testsuite: Add tests for smaller atomic CAS
| X86 NCG: Catch calls to CAS C fallback
| Primops: Add atomicCasWord[8|16|32|64]Addr#
| Add tests for atomicCasWord[8|16|32|64]Addr#
| Add changelog entry for new primops
| X86 NCG: Fix MO-Cmpxchg W64 on 32-bit arch
| ghc-prim: 64-bit CAS C fallback on all archs
* PIC: test for cross-module references (Sylvain Henry, 2021-07-27, 1 file, -7/+4)
|
* Fix #19931 (John Ericson, 2021-07-21, 3 files, -7/+28)
| The issue was that the renderer for x86 addressing modes assumes native size registers, but we were passing in a possibly-smaller index in conjunction with a native-sized base pointer. The easiest thing to do is just extend the register first.
| I also changed the other NCG backends implementing jump tables accordingly. On one hand, I think PowerPC and Sparc don't have the small sub-registers anyway so there is less to worry about. On the other hand, to the extent that's true the zero extension can become a no-op.
| I should give credit where it's due: @hsyl20 really did all the work for me in https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4717#note_355874, but I was daft and missed the "Oops" and so ended up spending a silly amount of time putting it all back together myself.
| The unregisterised backend change is a bit different, because here we are translating the actual case not a jump table, and the fix is to handle right-sized literals not addressing modes. But it makes sense to include here too because it's the same change in the subsequent commit that exposes both bugs.
* Add Word64#/Int64# primops (Sylvain Henry, 2021-07-15, 4 files, -0/+124)
| Word64#/Int64# are only used on 32-bit architectures. Before this patch, operations on these types were directly using the FFI. Now we use real primops that are then lowered into ccalls.
| The advantage of doing this is that we can now perform constant folding on Word64#/Int64# (#19024).
| Most of this work was done by John Ericson in !3658. However this patch doesn't go as far as e.g. changing Word64 to always be using Word64#.
| Noticeable performance improvements:
|   T9203(normal)          run/alloc     89870808.0     66662456.0  -25.8% GOOD
|   haddock.Cabal(normal)  run/alloc  14215777340.8  12780374172.0  -10.1% GOOD
|   haddock.base(normal)   run/alloc  15420020877.6  13643834480.0  -11.5% GOOD
| Metric Decrease: T9203 haddock.Cabal haddock.base
* Fix #19889 - Invalid BMI2 instructions generated. [wip/andreask/bim-fix] (Andreas Klebinger, 2021-07-06, 2 files, -24/+26)
| When arguments are 8 *or 16* bits wide, then truncate before/after and use the 32bit operation.
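The truncation strategy named in this fix can be modelled in plain Haskell. This is only a sketch: `pdep32` is a hypothetical software stand-in for a 32-bit BMI2 deposit instruction, and `pdep8` is an assumed name for the narrow variant; neither is taken from GHC's code generator.

```haskell
module Main (main) where

import Data.Bits (shiftR, testBit, (.&.), (.|.))
import Data.Word (Word32, Word8)

-- Software model of a 32-bit PDEP: deposit the low bits of `src`
-- into the positions of the set bits of `mask`, lowest bit first.
pdep32 :: Word32 -> Word32 -> Word32
pdep32 = go 0
  where
    go acc _ 0 = acc
    go acc src mask =
      let lowest = mask .&. negate mask              -- lowest set bit of the mask
          acc'   = if testBit src 0 then acc .|. lowest else acc
      in go acc' (src `shiftR` 1) (mask .&. (mask - 1))

-- 8-bit PDEP expressed through the 32-bit operation: the operands are
-- zero-extended (so no stale upper bits reach the wide operation) and
-- the result is truncated back to 8 bits afterwards.
pdep8 :: Word8 -> Word8 -> Word8
pdep8 src mask = fromIntegral (pdep32 (fromIntegral src) (fromIntegral mask))

main :: IO ()
main = print (pdep8 5 26)
```

The same wrapper shape would apply to a 16-bit variant by swapping `Word8` for `Word16`.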
* [aarch64 NCG] Add better support for sub-word primops (Moritz Angermann, 2021-06-23, 3 files, -35/+150)
| During the initial NCG development, GHC did not have support for anything below Words. As such the NCG didn't support any of this either.
| AArch64-Darwin however needs support for subword, as arguments in excess of the first eight (8) passed via registers are passed on the stack, and there in a packed fashion. Thus ghc learned about subword sizes. This then led us to gain subword primops, and these subsequently highlighted deficiencies in the AArch64 NCG.
| This patch rectifies the ones I found via the test-suite. I do not claim this to be exhaustive.
| Fixes: #19993
| Metric Increase: T10421 T13035 T13719 T14697 T1969 T9203 T9872a T9872b T9872c T9872d T9961 haddock.Cabal haddock.base parsing001
* Put tracing functions into their own module (Sylvain Henry, 2021-06-22, 1 file, -11/+7)
| Now that Outputable is independent of DynFlags, we can put tracing functions using SDocs into their own module that doesn't transitively depend on any GHC.Driver.* module.
| A few modules needed to be moved to avoid loops in DEBUG mode.
* PPC NCG: Fix panic in linear register allocator (Peter Trommler, 2021-06-16, 1 file, -1/+1)
|
* Make Logger independent of DynFlags (Sylvain Henry, 2021-06-07, 1 file, -0/+4)
| Introduce LogFlags as an independent subset of DynFlags used for logging. As a consequence in many places we don't have to pass both Logger and DynFlags anymore.
| The main reason for this refactoring is that I want to refactor the systools interfaces: for now many systools functions use DynFlags both to use the Logger and to fetch their parameters (e.g. ldInputs for the linker). I'm interested in refactoring the way they fetch their parameters (i.e. use dedicated XxxOpts data types instead of DynFlags) for #19877. But if I did this refactoring before refactoring the Logger, we would have duplicate parameters (e.g. ldInputs from DynFlags and linkerInputs from LinkerOpts). Hence this patch first.
| Some flags don't really belong to LogFlags because they are subsystem specific (e.g. most DumpFlags). For example -ddump-asm should better be passed in NCGConfig somehow. This patch doesn't fix this tight coupling: the dump flags are part of the UI but they are passed all the way down for example to infer the file name for the dumps.
| Because LogFlags are a subset of the DynFlags, we must update the former when the latter changes (not so often). As a consequence we now use accessors to read/write DynFlags in HscEnv instead of using `hsc_dflags` directly.
| In the process I've also made some subsystems less dependent on DynFlags:
| - CmmToAsm: by passing some missing flags via NCGConfig (see new fields in GHC.CmmToAsm.Config)
| - Core.Opt.*:
|   - by passing -dinline-check value into UnfoldingOpts
|   - by fixing some Core passes interfaces (e.g. CallArity, FloatIn) that took DynFlags argument for no good reason.
|   - as a side-effect GHC.Core.Opt.Pipeline.doCorePass is much less convoluted.
* Adds AArch64 Native Code Generator (Moritz Angermann, 2021-06-05, 25 files, -86/+3345)
| In which we add a new code generator to the Glasgow Haskell Compiler. This codegen supports ELF and Mach-O targets, thus covering Linux, macOS, and BSDs in principle. It was tested only on macOS and Linux.
| The NCG follows a similar structure as the other native code generators we already have, and should therefore be relatively easy to follow.
| It supports most of the features required for a proper native code generator, but does not claim to be perfect or fully optimised. There are still opportunities for optimisations.
| Metric Decrease: ManyAlternatives ManyConstructors MultiLayerModules PmSeriesG PmSeriesS PmSeriesT PmSeriesV T10421 T10421a T10858 T11195 T11276 T11303b T11374 T11822 T12227 T12545 T12707 T13035 T13253 T13253-spj T13379 T13701 T13719 T14683 T14697 T15164 T15630 T16577 T17096 T17516 T17836 T17836b T17977 T17977b T18140 T18282 T18304 T18478 T18698a T18698b T18923 T1969 T3064 T5030 T5321FD T5321Fun T5631 T5642 T5837 T783 T9198 T9233 T9630 T9872d T9961 WWRec
| Metric Increase: T4801
* Use GHC's State monad consistently (Ben Gamari, 2021-05-29, 1 file, -1/+1)
| GHC's internal State monad benefits from oneShot annotations on its state, allowing for more aggressive eta expansion. We currently don't have monad transformers with the same optimisation, so we only change uses of the pure State monad here.
| See #19657 and #19380.
| Metric Decrease: hie002
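The oneShot idea can be sketched with a small standalone State monad. This is an illustration, not GHC's internal definition: the `State` type, the `state` smart constructor, and `counter` are assumptions made for the example; only `oneShot` (exported by `GHC.Exts`) is a real library function.

```haskell
module Main (main) where

import GHC.Exts (oneShot)

newtype State s a = State { runState :: s -> (a, s) }

-- Smart constructor: every state-passing lambda is wrapped in oneShot,
-- which tells the optimiser that the lambda is applied at most once.
state :: (s -> (a, s)) -> State s a
state f = State (oneShot f)

instance Functor (State s) where
  fmap f m = state $ \s -> case runState m s of
    (a, s') -> (f a, s')

instance Applicative (State s) where
  pure a = state $ \s -> (a, s)
  mf <*> mx = state $ \s -> case runState mf s of
    (f, s') -> case runState mx s' of
      (x, s'') -> (f x, s'')

instance Monad (State s) where
  m >>= k = state $ \s -> case runState m s of
    (a, s') -> runState (k a) s'

get :: State s s
get = state $ \s -> (s, s)

put :: s -> State s ()
put s = state $ \_ -> ((), s)

-- Adds n, n-1, ..., 1 to the state.
counter :: Int -> State Int ()
counter 0 = pure ()
counter n = do
  x <- get
  put (x + n)
  counter (n - 1)

main :: IO ()
main = print (snd (runState (counter 10) 0))
```

Semantically `oneShot f` behaves like `f`; the annotation only affects how freely the optimiser may eta-expand, so the program's result is the same with or without it.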
* Split GHC.Utils.Monad.State into .Strict and .Lazy (Ben Gamari, 2021-05-29, 6 files, -6/+6)
|
* PPC NCG: Fix unsigned compare with 16-bit constants (Peter Trommler, 2021-05-19, 1 file, -1/+2)
| Fixes #19852 and #19609
* Cmm: fix sinking after suspendThread (Sylvain Henry, 2021-05-19, 3 files, -0/+9)
| Suppose a safe call: myCall(x,y,z)
| It is lowered into three unsafe calls in Cmm:
|     r = suspendThread(...);
|     myCall(x,y,z);
|     resumeThread(r);
| Consider the following situation for myCall arguments:
|     x = Sp[..] -- stack
|     y = Hp[..] -- heap
|     z = R1     -- global register
|     r = suspendThread(...);
|     myCall(x,y,z);
|     resumeThread(r);
| The sink pass assumes that unsafe calls clobber memory (heap and stack), hence x and y assignments are not sunk after `suspendThread`.
| The sink pass also correctly handles global register clobbering for all unsafe calls, except `suspendThread`!
| `suspendThread` is special because it releases the capability the thread is running on. Hence the sink pass must also take into account global registers that are mapped into memory (in the capability).
| In the example above, we could get:
|     r = suspendThread(...);
|     z = R1
|     myCall(x,y,z);
|     resumeThread(r);
| But this transformation isn't valid if R1 is (BaseReg->rR1) as BaseReg is invalid between suspendThread and resumeThread. This caused argument corruption at least with the C backend ("unregisterised") in #19237.
| Fix #19237
* Remove useless {-# LANGUAGE CPP #-} pragmas (Sylvain Henry, 2021-05-12, 21 files, -25/+12)
|
* Fully remove HsVersions.h (Sylvain Henry, 2021-05-12, 19 files, -39/+0)
| Replace uses of WARN macro with calls to:
|     warnPprTrace :: Bool -> SDoc -> a -> a
| Remove the now unused HsVersions.h
| Bump haddock submodule
* Replace CPP assertions with Haskell functions (Sylvain Henry, 2021-05-12, 4 files, -24/+28)
| There is no reason to use CPP. __LINE__ and __FILE__ macros are now better replaced with GHC's CallStack. As a bonus, assert error messages now contain more information (function name, column).
| Here is the mapping table (HasCallStack omitted):
| * ASSERT:   assert     :: Bool -> a -> a
| * MASSERT:  massert    :: Bool -> m ()
| * ASSERTM:  assertM    :: m Bool -> m ()
| * ASSERT2:  assertPpr  :: Bool -> SDoc -> a -> a
| * MASSERT2: massertPpr :: Bool -> SDoc -> m ()
| * ASSERTM2: assertPprM :: m Bool -> SDoc -> m ()
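A minimal sketch of how a CallStack-based assertion can replace the `__FILE__`/`__LINE__` macros. The names `assertSketch`, `massertSketch`, and `safeDiv` are hypothetical and only mirror the shapes in the mapping table; GHC's own definitions live in its utility modules and may differ in detail.

```haskell
module Main (main) where

import GHC.Stack (HasCallStack, callStack, prettyCallStack)

-- Pure assertion: returns its second argument when the condition holds.
-- On failure, the HasCallStack constraint supplies the caller's location.
assertSketch :: HasCallStack => Bool -> a -> a
assertSketch True  x = x
assertSketch False _ =
  error ("Assertion failed\n" ++ prettyCallStack callStack)

-- Monadic variant in the shape of `massert :: Bool -> m ()`.
massertSketch :: (HasCallStack, Applicative m) => Bool -> m ()
massertSketch cond = assertSketch cond (pure ())

-- Example caller: propagating HasCallStack makes the reported stack
-- include whoever called safeDiv as well.
safeDiv :: HasCallStack => Int -> Int -> Int
safeDiv x y = assertSketch (y /= 0) (x `div` y)

main :: IO ()
main = do
  massertSketch (safeDiv 10 2 == 5)
  print (safeDiv 10 2)
```

Because the location comes from the `HasCallStack` constraint rather than from the preprocessor, the module needs no CPP pragma.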
* Replace (ptext .. sLit) with `text` (Sylvain Henry, 2021-04-29, 12 files, -611/+608)
| 1. `text` is as efficient as `ptext . sLit` thanks to the rewrite rules
| 2. `text` is visually nicer than `ptext . sLit`
| 3. `ptext . sLit` encourages using one `ptext` for several `sLit` as in:
|        ptext $ case xy of
|          ... -> sLit ...
|          ... -> sLit ...
|    which may allocate SDoc's TextBeside constructors at runtime instead of sharing them into CAFs.
* Enhance pretty-printing perf (Sylvain Henry, 2021-04-10, 1 file, -31/+40)
| A few refactorings made after looking at Core/STG
| * Use Doc instead of SDoc in pprASCII to avoid passing the SDocContext that is never used.
| * Inline every SDoc wrapper in GHC.Utils.Outputable to expose Doc constructs
| * Add text/[] rule for empty strings (i.e., text "")
| * Use a single occurrence of pprGNUSectionHeader
| * Use bangs on Platform parameters and some others
| Metric Decrease: ManyAlternatives ManyConstructors T12707 T13035 T13379 T18698a T18698b T1969 T3294 T4801 T5321FD T783
* Re-export GHC.Bits from GHC.Prelude with custom shift implementation. (Andreas Klebinger, 2021-04-09, 10 files, -10/+0)
| This allows us to use the unsafe shifts in non-debug builds for performance.
| For older versions of base we instead export Data.Bits.
| See also #19618
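The debug/non-debug split can be sketched as follows. This is an assumption-laden illustration: `debugBuild` is a hypothetical constant standing in for a build-time setting, and the local `shiftL` is not the definition used in GHC.Prelude; `unsafeShiftL` and `finiteBitSize` are the real `Data.Bits` functions.

```haskell
module Main (main) where

import Data.Bits (finiteBitSize, unsafeShiftL)
import qualified Data.Bits as Bits

-- Hypothetical stand-in for a build-time debug setting.
debugBuild :: Bool
debugBuild = False

-- Shift that checks its argument only in debug builds. In non-debug
-- builds it uses unsafeShiftL, whose result is undefined for shift
-- amounts outside [0, bit size).
shiftL :: Int -> Int -> Int
shiftL x n
  | debugBuild =
      if n < 0 || n >= finiteBitSize x
        then error "shiftL: shift amount out of range"
        else x `Bits.shiftL` n
  | otherwise = x `unsafeShiftL` n

main :: IO ()
main = print (1 `shiftL` 10)
```

Callers keep writing `shiftL`; only the module that defines it decides whether the range check is paid for.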
* CmmToAsm.Reg.Linear: oneShot-ify RegM [wip/ncg-perf] (Ben Gamari, 2021-03-24, 1 file, -16/+22)
| -------------------------
| Metric Decrease: T783 T4801 T12707 T13379 T3294 T4801 T5321FD
| -------------------------
* CmmToAsm.Reg.Linear: Use concat rather than repeated (++) (Ben Gamari, 2021-03-24, 1 file, -2/+1)
|
* PPC NCG: Fix int to float conversion (Peter Trommler, 2021-03-23, 1 file, -6/+26)
| In commit 540fa6b2 integer to float conversions were changed to round to the nearest even. Implement a special case for 64 bit integer to single precision floating point numbers.
| Fixes #19563.
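One plausible reason a 64-bit integer to single-precision conversion needs its own case is double rounding: going through double precision first can give a different answer than rounding once. The commit message does not spell this out, so the sketch below is an assumption about the motivation, and the value `n` is chosen purely for illustration.

```haskell
module Main (main) where

import Data.Int (Int64)

-- 2^60 + 2^36 + 1 lies just above the midpoint between two adjacent
-- single-precision values (2^60 and 2^60 + 2^37).
n :: Int64
n = 2 ^ (60 :: Int) + 2 ^ (36 :: Int) + 1

-- Rounded once, directly from the exact value: the trailing +1 pushes
-- the result up to 2^60 + 2^37.
direct :: Float
direct = fromRational (toRational n)

-- Rounded twice: the first rounding (to Double) drops the +1, leaving
-- an exact tie that round-to-nearest-even resolves down to 2^60.
viaDouble :: Float
viaDouble = realToFrac (fromIntegral n :: Double)

main :: IO ()
main = do
  print direct
  print viaDouble
  print (direct == viaDouble)
```

The two results differ by one unit in the last place of the `Float`, which is the kind of discrepancy a single correctly rounded conversion avoids.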
* Transfer tickish things to GHC.Types.Tickish (Luite Stegeman, 2021-03-20, 3 files, -3/+3)
| Metric Increase: MultiLayerModules
* remove superfluous 'id' type parameter from GenTickish (Luite Stegeman, 2021-03-20, 1 file, -2/+2)
| The 'id' type is now determined by the pass, using the XTickishId type family.
* Save the type of breakpoints in the Breakpoint tick in STG (Luite Stegeman, 2021-03-20, 3 files, -3/+3)
| GHCi needs to know the types of all breakpoints, but it's not possible to get the exprType of any expression in STG.
| This is preparation for the upcoming change to make GHCi bytecode from STG instead of Core.
* CmmToAsm.Reg.Linear: Rewrite process (Ben Gamari, 2021-03-17, 1 file, -33/+29)
| CmmToAsm.Reg.Linear: More strictness
| More strictness
* CmmToAsm.Reg.Linear: Make linearRA body a join point (Ben Gamari, 2021-03-17, 1 file, -19/+17)
| Avoid top-level recursion.
* Eliminate selector thunk allocations (Ben Gamari, 2021-03-17, 1 file, -1/+1)
|
* Require GHC 8.10 as the minimum compiler for bootstrapping (Ryan Scott, 2021-03-09, 1 file, -7/+0)
| Now that GHC 9.0.1 is released, it is time to drop support for bootstrapping with GHC 8.8, as we only support building with the previous two major GHC releases.
| As an added bonus, this allows us to remove several bits of CPP that are either always true or no longer reachable.
* Implement riscv64 LLVM backend (Andreas Schwab, 2021-03-05, 4 files, -0/+10)
| This enables a registerised build for the riscv64 architecture.
* Fix array and cleanup conversion primops (#19026) (Sylvain Henry, 2021-03-03, 1 file, -8/+6)
| The first change makes the array ones use the proper fixed-size types, which also means that just like before, they can be used without explicit conversions with the boxed sized types. (Before, it was Int# / Word# on both sides, now it is fixed sized on both sides).
| For the second change, don't use "extend" or "narrow" in some of the user-facing primops names for conversions.
| - Names like `narrowInt32#` are misleading when `Int` is 32-bits.
| - Names like `extendInt64#` are flat-out wrong when `Int` is 32-bits.
| - `narrow{Int,Word}<N>#` however map a type to itself, and so don't suffer from this problem. They are left as-is.
| These changes are batched together because Alex happened to use the array ops. We can only use released versions of Alex at this time, sadly, and I don't want to have to have a release that won't work for the final GHC 9.2. So by combining these we get all the changes for Alex done at once.
| Bump hackage state in a few places, and also make that workflow slightly easier for the future.
| Bump minimum Alex version
| Bump Cabal, array, bytestring, containers, text, and binary submodules
* PPC NCG: print procedure end label for debug (Peter Trommler, 2021-02-17, 1 file, -5/+11)
| Fixes #19118
* Drop GHC_LOADED_IN_GHCI (Ben Gamari, 2021-02-14, 1 file, -12/+0)
| This previously supported the ghc-in-ghci script which has been since dropped. Hadrian's ghci support does not need this macro (which disabled uses of UnboxedTuples) since it uses `-fno-code` rather than produce bytecode.
* Fix typos (Brian Wignall, 2021-02-06, 3 files, -3/+3)
|
* Add explicit import lists to Data.List imports (Oleg Grenrus, 2021-01-29, 5 files, -5/+5)
| Related to a future change in Data.List, https://downloads.haskell.org/ghc/8.10.3/docs/html/users_guide/using-warnings.html?highlight=wcompat#ghc-flag--Wcompat-unqualified-imports
| Companion pull&merge requests:
| - https://github.com/judah/haskeline/pull/153
| - https://github.com/haskell/containers/pull/762
| - https://gitlab.haskell.org/ghc/packages/hpc/-/merge_requests/9
| After these the actual change in Data.List should be easy to do.
* RegAlloc: Add missing raPlatform field to RegAllocStatsSpill (Andreas Klebinger, 2020-11-26, 2 files, -2/+7)
| Fixes #18994
| Co-Author: Benjamin Maurer <maurer.benjamin@gmail.com>
* [Sized Cmm] properly retain sizes. (Moritz Angermann, 2020-11-26, 1 file, -3/+11)
| This replaces all Word<N> = W<N># Word# and Int<N> = I<N># Int# with Word<N> = W<N># Word<N># and Int<N> = I<N># Int<N>#, thus providing us with properly sized primitives in the code generator instead of pretending they are all full machine words.
| This came up when implementing darwinpcs for arm64. The darwinpcs requires us to pack function arguments in excess of registers on the stack. While most procedure call standards (pcs) assume arguments are just passed in 8 byte slots, and thus the caller does not need to know the exact signature to make the call, darwinpcs requires us to adhere to the prototype, and thus have the correct sizes. If we specify CInt in the FFI call, it should correspond to the C int, and not just be Word sized, when it's only half the size.
| This does change the expected output of T16402 but the new result is no less correct as it eliminates the narrowing (instead of the `and` as was previously done).
| Bumps the array, bytestring, text, and binary submodules.
| Co-Authored-By: Ben Gamari <ben@well-typed.com>
| Metric Increase: T13701 T14697
* dwarf: Apply info table offset consistently (Ben Gamari, 2020-11-21, 1 file, -5/+19)
| Previously we failed to apply the info table offset to the aranges and DIEs, meaning that we often failed to unwind in gdb. For some reason this only seemed to manifest in the RTS's Cmm closures. Nevertheless, now we can unwind completely up to `main`.
* Add Addr# atomic primops (#17751) (Sylvain Henry, 2020-11-16, 1 file, -5/+5)
| This reuses the codegen used for ByteArray#'s atomic primops.
* AArch64/arm64 adjustments (Moritz Angermann, 2020-11-15, 4 files, -10/+10)
| This adds the necessary logic to support aarch64 on elf, as well as aarch64 on mach-o, which Apple calls arm64.
| We change architecture name to AArch64, which is the official arm naming scheme.
* nativeGen/dwarf: Use DW_AT_linkage instead of DW_AT_MIPS_linkage (Ben Gamari, 2020-11-15, 2 files, -3/+3)
|
* nativeGen/dwarf: Only produce DW_AT_source_note DIEs in -g3 (Ben Gamari, 2020-11-15, 2 files, -5/+9)
| Standard debugging tools don't know how to understand these so let's not produce them unless asked.
* nativeGen/dwarf: Fix procedure end addresses (Ben Gamari, 2020-11-15, 3 files, -15/+25)
| Previously the `.debug_aranges` and `.debug_info` (DIE) DWARF information would claim that procedures (represented with a `DW_TAG_subprogram` DIE) would only span the range covered by their entry block. This omitted all of the continuation blocks (represented by `DW_TAG_lexical_block` DIEs), confusing `perf`. Fix this by introducing an end-of-procedure label and using this as the `DW_AT_high_pc` of procedure `DW_TAG_subprogram` DIEs.
| Fixes #17605.
* codeGen: Produce local symbols for module-internal functions (Ben Gamari, 2020-11-11, 3 files, -2/+15)
| It turns out that some important native debugging/profiling tools (e.g. perf) rely only on symbol tables for function name resolution (as opposed to using DWARF DIEs). However, previously GHC would emit temporary symbols (e.g. `.La42b`) to identify module-internal entities. Such symbols are dropped during linking and therefore not visible to runtime tools (in addition to having rather unhelpful unique names). For instance, `perf report` would often end up attributing all cost to the libc `frame_dummy` symbol since Haskell code was not covered by any proper symbol (see #17605).
| We now rather follow the model of C compilers and emit descriptively-named local symbols for module internal things. Since this will increase object file size this behavior can be disabled with the `-fno-expose-internal-symbols` flag.
| With this `perf record` can finally be used against Haskell executables. Even more, with `-g3` `perf annotate` provides inline source code.
* Move this_module into NCGConfig (Ben Gamari, 2020-11-11, 3 files, -32/+29)
| In various places in the NCG we need the Module currently being compiled. Let's move this into the environment instead of chewing through another register.
* Don't use LEA with 8-bit registers (#18614) (Sylvain Henry, 2020-11-04, 1 file, -2/+6)
|
* NCG: Fix 64bit int comparisons on 32bit x86 (Andreas Klebinger, 2020-11-04, 2 files, -30/+100)
| We now compare these by doing 64bit subtraction and checking the resulting flags.
| We used to do this differently but the old approach was broken when the high bits compared equal and the comparison was one of >= or <=.
| The new approach should be both correct and faster.
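The subtract-and-check-flags idea can be modelled in Haskell for the signed less-than case. This is a sketch under stated assumptions, not the instruction sequence the NCG emits: `split` and `ltViaSub` are hypothetical names, and the sign and overflow flags are computed by hand to mimic a low-word subtract followed by a high-word subtract-with-borrow.

```haskell
module Main (main) where

import Data.Bits (shiftR, testBit, xor, (.&.))
import Data.Int (Int64)
import Data.Word (Word32)

-- Split a 64-bit value into (high, low) 32-bit halves.
split :: Int64 -> (Word32, Word32)
split x = (fromIntegral (x `shiftR` 32), fromIntegral x)

-- Signed a < b, decided from the flags of a 64-bit subtraction that is
-- carried out as two 32-bit steps.
ltViaSub :: Int64 -> Int64 -> Bool
ltViaSub a b = signFlag /= overflowFlag
  where
    (ah, al) = split a
    (bh, bl) = split b
    -- Borrow out of the low-word subtraction (the carry flag).
    borrow :: Word32
    borrow = if al < bl then 1 else 0
    -- High-word subtraction with borrow.
    rh :: Word32
    rh = ah - bh - borrow
    signFlag     = testBit rh 31
    -- Signed overflow: operand signs differ and the result's sign
    -- differs from the minuend's sign.
    overflowFlag = testBit ((ah `xor` bh) .&. (ah `xor` rh)) 31

main :: IO ()
main = do
  let samples = [minBound, -1, 0, 1, 2 ^ (32 :: Int), maxBound] :: [Int64]
  -- Compare the flag-based result with the built-in (<) on all pairs.
  print (and [ltViaSub a b == (a < b) | a <- samples, b <- samples])
```

The other signed comparisons follow from the same flags (for example, `a >= b` is the negation of the result above), so no separate handling is needed when the high words are equal.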
* Add the proper HLint rules and remove redundant keywords from compiler (Hécate, 2020-11-01, 6 files, -85/+76)
|