path: root/compiler/GHC/Cmm
Commit message | Author | Age | Files | Lines
* compiler: Use compact representation/FastStrings for `SourceNote`s | Zubin Duggal | 2023-05-16 | 2 | -2/+2
    `SourceNote`s should not be stored as [Char] as this is highly wasteful and in certain scenarios can be highly duplicated.

    Metric Decrease: hard_hole_fits
* Add fused multiply-add instructions | sheaf | 2023-05-11 | 2 | -1/+44
    This patch adds eight new primops that fuse a multiplication and an addition or subtraction:

      - `{fmadd,fmsub,fnmadd,fnmsub}{Float,Double}#`

    fmadd x y z is x * y + z, computed with a single rounding step.

    This patch implements code generation for these primops in the following backends:

      - X86, AArch64 and PowerPC NCG,
      - LLVM
      - C

    WASM uses the C implementation. The primops are unsupported in the JavaScript backend.

    The following constant folding rules are also provided:

      - compute a * b + c when a, b, c are all literals,
      - x * y + 0 ==> x * y,
      - ±1 * y + z ==> z ± y and x * ±1 + z ==> z ± x.

    NB: the constant folding rules incorrectly handle signed zero. This is a known limitation with GHC's floating-point constant folding rules (#21227), which we hope to resolve in the future.
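The signed-zero caveat in the NB above can be illustrated with ordinary `Double` arithmetic. This is a minimal sketch, not the new primops: `mulAddRef` and `foldedAddZero` are hypothetical helper names, and `mulAddRef` rounds twice rather than once. The `NOINLINE` pragmas are there so that the arithmetic happens at run time rather than being folded by the compiler.

```haskell
-- Unfused reference for `fmadd x y z` (two roundings; illustration only).
mulAddRef :: Double -> Double -> Double -> Double
mulAddRef x y z = x * y + z
{-# NOINLINE mulAddRef #-}

-- What the rewrite  x * y + 0 ==> x * y  computes instead.
foldedAddZero :: Double -> Double -> Double
foldedAddZero x y = x * y
{-# NOINLINE foldedAddZero #-}

main :: IO ()
main = do
  let x = -1.0
      y = 0.0
  -- Under IEEE 754 round-to-nearest, (-0.0) + 0.0 is +0.0, so adding zero
  -- is not a no-op when the product is negative zero.
  putStrLn ("reference is negative zero: " ++ show (isNegativeZero (mulAddRef x y 0.0)))
  putStrLn ("folded    is negative zero: " ++ show (isNegativeZero (foldedAddZero x y)))
```

With these inputs the reference expression yields `+0.0` while the rewritten one yields `-0.0`, which is the kind of discrepancy the NB refers to.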
* StgToCmm: Upgrade -fcheck-prim-bounds behavior | Matthew Craven | 2023-04-04 | 1 | -2/+5
    Fixes #21054. Additionally, we can now check for range overlap when generating Cmm for primops that use memcpy internally.
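As a rough picture of what a range-overlap check for a memcpy-style copy has to decide, here is a small sketch. It is not the Cmm that GHC emits; `rangesOverlap` and its argument names are hypothetical, and offsets are modelled as plain `Int`s.

```haskell
-- True when the half-open byte ranges [src, src + len) and [dst, dst + len)
-- share at least one byte. memcpy requires that they do not.
rangesOverlap :: Int -> Int -> Int -> Bool
rangesOverlap src dst len =
  len > 0 && src < dst + len && dst < src + len

main :: IO ()
main = do
  print (rangesOverlap 0 8 8)  -- adjacent ranges: False
  print (rangesOverlap 0 4 8)  -- dst starts inside src: True
```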
* cmm: implement parsing of MO_AtomicRMW from hand-written CMM files | Bodigrim | 2023-04-02 | 1 | -0/+6
    Fixes #23206
* codeGen/tsan: Disable instrumentation of unaligned stores | Ben Gamari | 2023-03-24 | 1 | -8/+7
    There is some disagreement regarding the prototype of `__tsan_unaligned_write` (specifically whether it takes just the written address, or the address and the value as an argument). Moreover, I have observed crashes which appear to be due to it.

    Disable instrumentation of unaligned stores as a temporary mitigation.

    Fixes #23096.
* Revert "Cmm Lint: relax SIMD register assignment check" | sheaf | 2023-01-31 | 1 | -14/+1
    This reverts commit 3be48877, which weakened a Cmm Lint check involving SIMD vectors. Now that we keep track of the type a global register is used at, we can restore the original stronger check.
* Cmm: track the type of global registers | sheaf | 2023-01-31 | 15 | -331/+370
    This patch tracks the type of Cmm global registers. This is needed in order to lint uses of polymorphic registers, such as SIMD vector registers that can be used both for floating-point and integer values.

    This change allows us to refactor VanillaReg to not store VGcPtr, as that information is instead stored in the type of the usage of the register.

    Fixes #22297
* Add PrimCallConv support to GHCi | Luite Stegeman | 2023-01-18 | 3 | -11/+108
    This adds support for calling Cmm code from bytecode using the native calling convention, allowing modules that use `foreign import prim` to be loaded and debugged in GHCi.

    This patch introduces a new `PRIMCALL` bytecode instruction and a helper stack frame `stg_primcall`. The code is based on the existing functionality for dealing with unboxed tuples in bytecode, which has been generalised to handle arbitrary calls.

    Fixes #22051
* Scrub some partiality in `GHC.Cmm.Info.Build`: `doSRTs` takes a `[(CAFSet, CmmDecl)]` but truly wants a `[(CAFSet, CmmStatics)]`. | M Farkas-Dyck | 2022-12-20 | 2 | -9/+7
* codeGen: Introduce ThreadSanitizer instrumentation | Ben Gamari | 2022-12-15 | 3 | -0/+294
    This introduces a new Cmm pass which instruments the program with ThreadSanitizer annotations, allowing full tracking of mutator memory accesses via TSAN.
* cmm/Parser: Atomic load syntax | Ben Gamari | 2022-12-15 | 1 | -3/+23
    Originally I had thought I would just use the `prim` call syntax instead of introducing new syntax for atomic loads. However, it turns out that `prim` call syntax tends to make things quite unreadable. This new syntax seems quite natural.
* cmm/Parser: Add syntax for ordered loads and stores | Ben Gamari | 2022-12-15 | 2 | -4/+49
* cmm/Parser: Reduce some repetition | Ben Gamari | 2022-12-15 | 1 | -29/+20
* cmm: Introduce MemoryOrderings | Ben Gamari | 2022-12-15 | 1 | -3/+14
* cmm: Introduce blockConcat | Ben Gamari | 2022-12-15 | 2 | -1/+4
* Properly cast values when writing/reading unboxed sums. | Andreas Klebinger | 2022-11-30 | 1 | -1/+2
    Unboxed sums might store an Int8# value as Int64#. This patch makes sure we keep track of the actual value type.

    See Note [Casting slot arguments] for the details.
* compiler: remove unused MO_U_MulMayOflo | Cheng Shao | 2022-11-28 | 2 | -5/+0
    We actually only emit MO_S_MulMayOflo and never emit MO_U_MulMayOflo anywhere.
* Scrub some no-warning pragmas. | M Farkas-Dyck | 2022-11-23 | 3 | -7/+2
* Use a more efficient printer for code generation (#21853) | Krzysztof Gogolewski | 2022-11-11 | 4 | -34/+60
    The changes in `GHC.Utils.Outputable` are the bulk of the patch and drive the rest. The types `HLine` and `HDoc` in Outputable can be used instead of `SDoc` and support printing directly to a handle with `bPutHDoc`. See Note [SDoc versus HDoc] and Note [HLine versus HDoc].

    The classes `IsLine` and `IsDoc` are used to make the existing code polymorphic over `HLine`/`HDoc` and `SDoc`. This is done for X86, PPC, AArch64, DWARF and dependencies (printing module names, labels etc.).

    Co-authored-by: Alexis King <lexi.lambda@gmail.com>

    Metric Decrease:
        CoOpt_Read ManyAlternatives ManyConstructors T10421 T12425 T12707
        T13035 T13056 T13253 T13379 T18140 T18282 T18698a T18698b T1969
        T20049 T21839c T21839r T3064 T3294 T4801 T5321FD T5321Fun T5631
        T6048 T783 T9198 T9233
* compiler: annotate CmmFileEmbed with blob length | Cheng Shao | 2022-11-11 | 1 | -3/+3
    This patch adds the blob length field to CmmFileEmbed. The wasm32 NCG needs to know the precise size of each data segment.
* add new modules for reducibility and WebAssembly translation | Norman Ramsey | 2022-11-11 | 1 | -0/+224
* Fix Cmm symbol kind | Cheng Shao | 2022-11-11 | 1 | -3/+6
* Minor refactor around FastStrings | Krzysztof Gogolewski | 2022-11-05 | 2 | -12/+13
    Pass FastStrings to functions directly, to make sure the rule for fsLit "literal" fires.

    Remove SDoc indirection in GHCi.UI.Tags and GHC.Unit.Module.Graph.
* Export pprTrace and friends from GHC.Prelude. | Andreas Klebinger | 2022-11-03 | 1 | -1/+0
    Introduces GHC.Prelude.Basic which can be used in modules which are a dependency of the ppr code.
* Minor SDoc-related cleanup | Krzysztof Gogolewski | 2022-10-28 | 3 | -13/+27
      * Rename pprCLabel to pprCLabelStyle, and use the name pprCLabel for a function using CStyle (analogous to pprAsmLabel)
      * Move LabelStyle to the CLabel module, it no longer needs to be in Outputable.
      * Move calls to 'text' right next to literals, to make sure the text/str rule is triggered.
      * Remove FastString/String roundtrip in Tc.Deriv.Generate
      * Introduce showSDocForUser', which abstracts over a pattern in GHCi.UI
* Introduce a standard thunk for allocating strings | Ömer Sinan Ağacan | 2022-10-22 | 3 | -24/+66
    Currently for a top-level closure in the form

        hey = unpackCString# x

    we generate code like this:

        Main.hey_entry() // [R1]
                { info_tbls: [(c2T4,
                               label: Main.hey_info
                               rep: HeapRep static { Thunk }
                               srt: Nothing)]
                  stack_info: arg_space: 8 updfr_space: Just 8
                }
            {offset
              c2T4: // global
                  _rqm::P64 = R1;
                  if ((Sp + 8) - 24 < SpLim) (likely: False) goto c2T5; else goto c2T6;
              c2T5: // global
                  R1 = _rqm::P64;
                  call (stg_gc_enter_1)(R1) args: 8, res: 0, upd: 8;
              c2T6: // global
                  (_c2T1::I64) = call "ccall" arg hints: [PtrHint, PtrHint] result hints: [PtrHint] newCAF(BaseReg, _rqm::P64);
                  if (_c2T1::I64 == 0) goto c2T3; else goto c2T2;
              c2T3: // global
                  call (I64[_rqm::P64])() args: 8, res: 0, upd: 8;
              c2T2: // global
                  I64[Sp - 16] = stg_bh_upd_frame_info;
                  I64[Sp - 8] = _c2T1::I64;
                  R2 = hey1_r2Gg_bytes;
                  Sp = Sp - 16;
                  call GHC.CString.unpackCString#_info(R2) args: 24, res: 0, upd: 24;
            }
        }

    This code is generated for every string literal. The only difference between top-level closures like this is the argument for the bytes of the string (hey1_r2Gg_bytes in the code above).

    With this patch we introduce a standard thunk in the RTS, called stg_MK_STRING_info, that does what `unpackCString# x` does, except it gets the bytes address from the payload. Using this, for the closure above, we generate this:

        Main.hey_closure" {
            Main.hey_closure:
                const stg_MK_STRING_info;
                const 0; // padding for indirectee
                const 0; // static link
                const 0; // saved info
                const hey1_r1Gg_bytes; // the payload
        }

    This is much smaller in code.

    Metric Decrease: T10421 T11195 T12150 T12425 T16577 T18282 T18698a T18698b

    Co-Authored By: Ben Gamari <ben@well-typed.com>
* remove a no-warn directive from GHC.Cmm.ContFlowOpt | Curran McConnell | 2022-10-21 | 2 | -6/+9
    This patch is motivated by the desire to remove the {-# OPTIONS_GHC -fno-warn-incomplete-patterns #-} directive at the top of GHC.Cmm.ContFlowOpt. (Based on the text in this coding standards doc, I understand it's a goal of the project to remove such directives.) I chose this task because I'm a new contributor to GHC, and it seemed like a good way to get acquainted with the patching process.

    In order to address the warning that arose when I removed the no-warn directive, I added a case to removeUnreachableBlocksProc to handle the CmmData constructor. Clearly, since this partial function has not been erroring out in the wild, its inputs are always in practice wrapped by the CmmProc constructor. Therefore the CmmData case is handled by a precise panic (which is an improvement over the partial pattern match from before).
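The pattern described above, replacing an incomplete match with an explicit, descriptive failure, can be sketched on a toy type. Everything here is a hypothetical stand-in: `Decl`, `Proc` and `Dat` only mimic the shape of GHC's `CmmProc`/`CmmData` split, and `error` stands in for GHC's panic machinery.

```haskell
import Data.Maybe (fromMaybe)

type Label = Int

-- Toy declaration type: a procedure (entry label plus blocks with their
-- successor labels) or a data section.
data Decl
  = Proc Label [(Label, [Label])]
  | Dat String
  deriving (Eq, Show)

-- Labels reachable from the entry block, by a simple worklist traversal.
reachable :: Label -> [(Label, [Label])] -> [Label]
reachable entry blocks = go [] [entry]
  where
    go seen [] = seen
    go seen (l : ls)
      | l `elem` seen = go seen ls
      | otherwise     = go (l : seen) (fromMaybe [] (lookup l blocks) ++ ls)

-- Every constructor is matched; the unexpected one fails with a message
-- that names the function instead of a generic pattern-match error.
removeUnreachableBlocks :: Decl -> Decl
removeUnreachableBlocks (Proc entry blocks) =
  Proc entry [b | b@(l, _) <- blocks, l `elem` live]
  where
    live = reachable entry blocks
removeUnreachableBlocks (Dat name) =
  error ("removeUnreachableBlocks: unexpected data declaration: " ++ name)

main :: IO ()
main = print (removeUnreachableBlocks (Proc 0 [(0, [1]), (1, []), (2, [1])]))
```

Block 2 is dropped because nothing reachable from the entry jumps to it, and the module compiles without needing `-fno-warn-incomplete-patterns`.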
* Scrub various partiality involving lists (again). | M Farkas-Dyck | 2022-10-19 | 1 | -4/+4
    Lets us avoid some use of `head` and `tail`, and some panics.
* Cmm Lint: relax SIMD register assignment check | sheaf | 2022-10-19 | 1 | -3/+15
    As noted in #22297, SIMD vector registers can be used to store different kinds of values, e.g. xmm1 can be used both to store integer and floating point values. The Cmm type system doesn't properly account for this, so we weaken the Cmm register assignment lint check to only compare widths when comparing a vector type with its allocated vector register.
* Remove SIMD conversions | sheaf | 2022-10-19 | 1 | -5/+5
    This patch makes it so that packing/unpacking SIMD vectors always uses the right sized types, e.g. unpacking a Word16X4# will give a tuple of Word16#s.

    As a result, we can get rid of the conversion instructions that were previously required.

    Fixes #22296
* Add VecSlot for unboxed sums of SIMD vectors | Dai | 2022-10-19 | 1 | -1/+2
    This patch adds the missing `VecRep` case to the `primRepSlot` function and all the necessary machinery to carry this new `VecSlot` through code generation. This allows programs involving unboxed sums of SIMD vectors to be written and compiled.

    Fixes #22187
* Make `Functor` a superclass of `TrieMap`, which lets us derive the `map` functions. | M Farkas-Dyck | 2022-10-18 | 1 | -1/+0
* Make Cmm Lint messages use dump style | Krzysztof Gogolewski | 2022-10-11 | 1 | -1/+2
    Lint errors indicate an internal error in GHC, so it makes sense to use the dump style instead of the user style. This is consistent with Core Lint and STG Lint:

    https://gitlab.haskell.org/ghc/ghc/-/blob/22096652/compiler/GHC/Core/Lint.hs#L429
    https://gitlab.haskell.org/ghc/ghc/-/blob/22096652/compiler/GHC/Stg/Lint.hs#L144

    Fixes #22218.
* Refactor IPE initialization | Ben Gamari | 2022-10-11 | 2 | -5/+13
    Here we refactor the representation of info table provenance information in object code to significantly reduce its size and link-time impact. Specifically, we deduplicate strings and represent them as 32-bit offsets into a common string table.

    In addition, we rework the registration logic to eliminate allocation from the registration path, which is run from a static initializer where things like allocation are technically undefined behavior (although it did previously seem to work). For similar reasons we eliminate lock usage from the registration path, instead relying on atomic CAS.

    Closes #22077.
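The idea of deduplicating strings and referring to them by 32-bit offsets into a shared table can be sketched as follows. This is an illustration only, not GHC's IPE code: `StrTab`, `intern` and `render` are hypothetical names, and strings are assumed to be ASCII so that `length` equals the byte count.

```haskell
import Data.Word (Word32)
import qualified Data.Map.Strict as Map

-- A string table under construction.
data StrTab = StrTab
  { stOffsets :: Map.Map String Word32  -- string -> offset of its first byte
  , stChunks  :: [String]               -- distinct strings, newest first
  , stSize    :: Word32                 -- bytes used so far
  }

emptyTab :: StrTab
emptyTab = StrTab Map.empty [] 0

-- Return the offset of a string, appending it only if it is not present yet.
intern :: String -> StrTab -> (Word32, StrTab)
intern s tab =
  case Map.lookup s (stOffsets tab) of
    Just off -> (off, tab)
    Nothing  ->
      let off  = stSize tab
          size = off + fromIntegral (length s) + 1  -- +1 for the NUL terminator
      in (off, StrTab (Map.insert s off (stOffsets tab)) (s : stChunks tab) size)

-- The final table: every distinct string once, NUL-terminated.
render :: StrTab -> String
render = concatMap (++ "\0") . reverse . stChunks

main :: IO ()
main = do
  let (a, t1) = intern "Main" emptyTab
      (b, t2) = intern "foo" t1
      (c, t3) = intern "Main" t2
  print (a, b, c)             -- (0,5,0): the second "Main" reuses offset 0
  print (length (render t3))  -- 9 bytes in total
```

Each record then stores a fixed-size `Word32` instead of a pointer to its own copy of the string, which is where the size reduction comes from.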
* CLabel: fix isInfoTableLabel | Cheng Shao | 2022-10-11 | 1 | -0/+7
    isInfoTableLabel does not take Cmm info table into account. This patch is required for data section layout of wasm32 NCG to work.
* Scrub various partiality involving empty lists. | M Farkas-Dyck | 2022-09-30 | 2 | -6/+7
    Avoids some uses of `head` and `tail`, and some panics when an argument is null.
* Avoid Data.List.group; prefer Data.List.NonEmpty.group | Bodigrim | 2022-09-28 | 1 | -4/+3
    This allows us to avoid further partiality, e.g., `map head . group` is replaced by `map NE.head . NE.group`, and there are fewer panic calls.
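The replacement mentioned above can be seen in a few lines of standalone Haskell. `dedupAdjacent` is a hypothetical name used only for this sketch; `NE.group` and `NE.head` are the real functions from `Data.List.NonEmpty` in base.

```haskell
import qualified Data.List.NonEmpty as NE

-- `NE.group` returns a list of NonEmpty groups, so `NE.head` is total here.
-- The older spelling, `map head . group`, relies on the unchecked fact that
-- `group` never produces an empty sublist.
dedupAdjacent :: Eq a => [a] -> [a]
dedupAdjacent = map NE.head . NE.group

main :: IO ()
main = print (dedupAdjacent "aabccc")  -- "abc"
```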
* Minor refactor around Outputable | Krzysztof Gogolewski | 2022-09-22 | 2 | -5/+5
      * Replace 'text . show' and 'ppr' with 'int'.
      * Remove Outputable.hs-boot, no longer needed
      * Use pprWithCommas
      * Factor out instructions in AArch64 codegen
* Clean up some. In particular: | M Farkas-Dyck | 2022-09-17 | 4 | -42/+27
      • Delete some dead code, largely under `GHC.Utils`.
      • Clean up a few definitions in `GHC.Utils.(Misc, Monad)`.
      • Clean up `GHC.Types.SrcLoc`.
      • Derive stock `Functor, Foldable, Traversable` for more types.
      • Derive more instances for newtypes.

    Bump haddock submodule.
* Fix typos | Krzysztof Gogolewski | 2022-09-14 | 1 | -1/+1
* Fix typos | Eric Lindblad | 2022-09-14 | 6 | -9/+9
    This fixes various typos and spelling mistakes in the compiler.

    Fixes #21891
* Add native delimited continuations to the RTS | Alexis King | 2022-09-11 | 1 | -5/+4
    This patch implements GHC proposal 313, "Delimited continuation primops", by adding native support for delimited continuations to the GHC RTS.

    All things considered, the patch is relatively small. It almost exclusively consists of changes to the RTS; the compiler itself is essentially unaffected. The primops come with fairly extensive Haddock documentation, and an overview of the implementation strategy is given in the Notes in rts/Continuation.c.

    This first stab at the implementation prioritizes simplicity over performance. Most notably, every continuation is always stored as a single, contiguous chunk of stack. If one of these chunks is particularly large, it can result in poor performance, as the current implementation does not attempt to cleverly squeeze a subset of the stack frames into the existing stack: it must fit all at once. If this proves to be a performance issue in practice, a cleverer strategy would be a worthwhile target for future improvements.
* Minor SDoc cleanup | Krzysztof Gogolewski | 2022-09-07 | 2 | -5/+3
    Replace calls to renderWithContext with showSDocOneLine; it's more efficient and explanatory.

    Remove polyPatSig (unused)
* Remove label style from printing context | Krzysztof Gogolewski | 2022-08-26 | 5 | -16/+20
    Previously, the SDocContext used for code generation contained information about whether the labels should use Asm or C style. However, at every individual call site, this is known statically. This removes the parameter to 'PprCode' and replaces every 'pdoc' used to print a label in code style with 'pprCLabel' or 'pprAsmLabel'. The OutputableP instance is now used only for dumps.

    The output of T15155 changes; it now uses the Asm style (which is faithful to what actually happens).
* Scrub some partiality in `CommonBlockElim`. | M Farkas-Dyck | 2022-08-25 | 1 | -10/+9
* Cleanups around pretty-printing | Krzysztof Gogolewski | 2022-08-09 | 1 | -10/+7
      * Remove hack when printing OccNames. No longer needed since e3dcc0d5
      * Remove unused `pprCmms` and `instance Outputable Instr`
      * Simplify `pprCLabel` (no need to pass platform)
      * Remove evil `Show`/`Eq` instances for `SDoc`. They were needed by ImmLit, but that can take just a String instead.
      * Remove instance `Outputable CLabel` - proper output of labels needs a platform, and is done by the `OutputableP` instance
* compiler: Eliminate two uses of foldr in favor of foldl' | Ben Gamari | 2022-08-06 | 2 | -2/+2
    These two uses constructed maps, which is a case where foldl' is generally more efficient since we avoid constructing an intermediate O(n)-depth stack.
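A small sketch of the map-building pattern in question, assuming the `containers` package that ships with GHC. `countWords` is a hypothetical example and not one of the two call sites changed by the commit.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map

-- Build a frequency map with a strict left fold: the accumulator map is
-- forced at every step, so no chain of pending insertions builds up.
-- A foldr-based version would have to reach the end of the list before
-- performing the first insert.
countWords :: [String] -> Map.Map String Int
countWords = foldl' (\m w -> Map.insertWith (+) w 1 m) Map.empty

main :: IO ()
main = print (Map.toList (countWords ["a", "b", "a"]))  -- [("a",2),("b",1)]
```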
* Change `-fprof-late` to insert cost centres after unfolding creation. | Andreas Klebinger | 2022-08-06 | 1 | -2/+2
    The former behaviour of adding cost centres after optimization but before unfoldings are created is now available via the flag `-fprof-late-inline` instead.

    I also reduced the overhead of -fprof-late* by pushing the cost centres into lambdas. This means the cost centres will only account for execution of functions and not their partial application.

    Further, I made LATE_CC cost centres their own CC flavour so they now won't clash with user-defined ones if a user uses the same string for a custom scc.

    LateCC: Don't put cost centres inside constructor workers.

    With -fprof-late they are rarely useful as the worker is usually inlined. Even if the worker is not inlined or we use -fprof-late-inline they are generally not helpful but bloat compile and run time significantly. So we just don't add sccs inside constructor workers.

    Metric Decrease: T13701
* cmm: Move toBlockList to GHC.Cmm | Ben Gamari | 2022-07-16 | 3 | -5/+0
* cmm: Eliminate orphan Outputable instances | Ben Gamari | 2022-07-16 | 15 | -1110/+812
    Here we reorganize `GHC.Cmm` to eliminate the orphan `Outputable` and `OutputableP` instances for the Cmm AST. This makes it significantly easier to use the Cmm pretty-printers in tracing output without incurring module import cycles.