| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
`SourceNote`s should not be stored as [Char] as this is highly wasteful
and in certain scenarios can be highly duplicated.
Metric Decrease:
hard_hole_fits
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds eight new primops that fuse a multiplication and an
addition or subtraction:
- `{fmadd,fmsub,fnmadd,fnmsub}{Float,Double}#`
fmadd x y z is x * y + z, computed with a single rounding step.
This patch implements code generation for these primops in the following
backends:
- X86, AArch64 and PowerPC NCG,
- LLVM
- C
WASM uses the C implementation. The primops are unsupported in the
JavaScript backend.
The following constant folding rules are also provided:
- compute a * b + c when a, b, c are all literals,
- x * y + 0 ==> x * y,
- ±1 * y + z ==> z ± y and x * ±1 + z ==> z ± x.
NB: the constant folding rules incorrectly handle signed zero.
This is a known limitation with GHC's floating-point constant folding
rules (#21227), which we hope to resolve in the future.
|
|
|
|
|
| |
Fixes #21054. Additionally, we can now check for range overlap
when generating Cmm for primops that use memcpy internally.
|
|
|
|
| |
Fixes #23206
|
|
|
|
|
|
|
|
|
|
| |
There is some disagreement regarding the prototype of
`__tsan_unaligned_write` (specifically whether it takes just the written
address, or the address and the value as an argument). Moreover, I have
observed crashes which appear to be due to it. Disable instrumentation
of unaligned stores as a temporary mitigation.
Fixes #23096.
|
|
|
|
|
|
| |
This reverts commit 3be48877, which weakened a Cmm Lint check involving
SIMD vectors. Now that we keep track of the type a global register is
used at, we can restore the original stronger check.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch tracks the type of Cmm global registers. This is needed
in order to lint uses of polymorphic registers, such as SIMD vector
registers that can be used both for floating-point and integer values.
This changes allows us to refactor VanillaReg to not store VGcPtr,
as that information is instead stored in the type of the usage of the
register.
Fixes #22297
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds support for calling Cmm code from bytecode using the native
calling convention, allowing modules that use `foreign import prim`
to be loaded and debugged in GHCi.
This patch introduces a new `PRIMCALL` bytecode instruction and
a helper stack frame `stg_primcall`. The code is based on the
existing functionality for dealing with unboxed tuples in bytecode,
which has been generalised to handle arbitrary calls.
Fixes #22051
|
|
|
|
| |
CmmDecl)]` but truly wants a `[(CAFSet, CmmStatics)]`.
|
|
|
|
|
|
| |
This introduces a new Cmm pass which instruments the program with
ThreadSanitizer annotations, allowing full tracking of mutator memory
accesses via TSAN.
|
|
|
|
|
|
|
| |
Originally I had thought I would just use the `prim` call syntax instead
of introducing new syntax for atomic loads. However, it turns out that
`prim` call syntax tends to make things quite unreadable. This new
syntax seems quite natural.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
| |
Unboxed sums might store a Int8# value as Int64#. This patch
makes sure we keep track of the actual value type.
See Note [Casting slot arguments] for the details.
|
|
|
|
| |
We actually only emit MO_S_MulMayOflo and never emit MO_U_MulMayOflo anywhere.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The changes in `GHC.Utils.Outputable` are the bulk of the patch
and drive the rest.
The types `HLine` and `HDoc` in Outputable can be used instead of `SDoc`
and support printing directly to a handle with `bPutHDoc`.
See Note [SDoc versus HDoc] and Note [HLine versus HDoc].
The classes `IsLine` and `IsDoc` are used to make the existing code polymorphic
over `HLine`/`HDoc` and `SDoc`. This is done for X86, PPC, AArch64, DWARF
and dependencies (printing module names, labels etc.).
Co-authored-by: Alexis King <lexi.lambda@gmail.com>
Metric Decrease:
CoOpt_Read
ManyAlternatives
ManyConstructors
T10421
T12425
T12707
T13035
T13056
T13253
T13379
T18140
T18282
T18698a
T18698b
T1969
T20049
T21839c
T21839r
T3064
T3294
T4801
T5321FD
T5321Fun
T5631
T6048
T783
T9198
T9233
|
|
|
|
|
| |
This patch adds the blob length field to CmmFileEmbed. The wasm32 NCG
needs to know the precise size of each data segment.
|
| |
|
| |
|
|
|
|
|
|
|
| |
Pass FastStrings to functions directly, to make sure the rule
for fsLit "literal" fires.
Remove SDoc indirection in GHCi.UI.Tags and GHC.Unit.Module.Graph.
|
|
|
|
|
| |
Introduces GHC.Prelude.Basic which can be used in modules which are a
dependency of the ppr code.
|
|
|
|
|
|
|
|
|
|
|
| |
* Rename pprCLabel to pprCLabelStyle, and use the name pprCLabel
for a function using CStyle (analogous to pprAsmLabel)
* Move LabelStyle to the CLabel module, it no longer needs to be in Outputable.
* Move calls to 'text' right next to literals, to make sure the text/str
rule is triggered.
* Remove FastString/String roundtrip in Tc.Deriv.Generate
* Introduce showSDocForUser', which abstracts over a pattern in
GHCi.UI
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently for a top-level closure in the form
hey = unpackCString# x
we generate code like this:
Main.hey_entry() // [R1]
{ info_tbls: [(c2T4,
label: Main.hey_info
rep: HeapRep static { Thunk }
srt: Nothing)]
stack_info: arg_space: 8 updfr_space: Just 8
}
{offset
c2T4: // global
_rqm::P64 = R1;
if ((Sp + 8) - 24 < SpLim) (likely: False) goto c2T5; else goto c2T6;
c2T5: // global
R1 = _rqm::P64;
call (stg_gc_enter_1)(R1) args: 8, res: 0, upd: 8;
c2T6: // global
(_c2T1::I64) = call "ccall" arg hints: [PtrHint,
PtrHint] result hints: [PtrHint] newCAF(BaseReg, _rqm::P64);
if (_c2T1::I64 == 0) goto c2T3; else goto c2T2;
c2T3: // global
call (I64[_rqm::P64])() args: 8, res: 0, upd: 8;
c2T2: // global
I64[Sp - 16] = stg_bh_upd_frame_info;
I64[Sp - 8] = _c2T1::I64;
R2 = hey1_r2Gg_bytes;
Sp = Sp - 16;
call GHC.CString.unpackCString#_info(R2) args: 24, res: 0, upd: 24;
}
}
This code is generated for every string literal. Only difference between
top-level closures like this is the argument for the bytes of the string
(hey1_r2Gg_bytes in the code above).
With this patch we introduce a standard thunk in the RTS, called
stg_MK_STRING_info, that does what `unpackCString# x` does, except it
gets the bytes address from the payload. Using this, for the closure
above, we generate this:
Main.hey_closure" {
Main.hey_closure:
const stg_MK_STRING_info;
const 0; // padding for indirectee
const 0; // static link
const 0; // saved info
const hey1_r1Gg_bytes; // the payload
}
This is much smaller in code.
Metric Decrease:
T10421
T11195
T12150
T12425
T16577
T18282
T18698a
T18698b
Co-Authored By: Ben Gamari <ben@well-typed.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch is motivated by the desire to remove the {-# OPTIONS_GHC
-fno-warn-incomplete-patterns #-} directive at the top of
GHC.Cmm.ContFlowOpt. (Based on the text in this coding standards doc, I
understand it's a goal of the project to remove such directives.) I
chose this task because I'm a new contributor to GHC, and it seemed like
a good way to get acquainted with the patching process.
In order to address the warning that arose when I removed the no-warn
directive, I added a case to removeUnreachableBlocksProc to handle the
CmmData constructor. Clearly, since this partial function has not been
erroring out in the wild, its inputs are always in practice wrapped by
the CmmProc constructor. Therefore the CmmData case is handled by a
precise panic (which is an improvement over the partial pattern match
from before).
|
|
|
|
| |
Lets us avoid some use of `head` and `tail`, and some panics.
|
|
|
|
|
|
|
|
|
|
| |
As noted in #22297, SIMD vector registers can be used
to store different kinds of values, e.g. xmm1 can be used
both to store integer and floating point values.
The Cmm type system doesn't properly account for this, so
we weaken the Cmm register assignment lint check to only
compare widths when comparing a vector type with its
allocated vector register.
|
|
|
|
|
|
|
|
|
|
|
| |
This patch makes it so that packing/unpacking SIMD
vectors always uses the right sized types, e.g.
unpacking a Word16X4# will give a tuple of Word16#s.
As a result, we can get rid of the conversion instructions
that were previously required.
Fixes #22296
|
|
|
|
|
|
|
|
|
| |
This patch adds the missing `VecRep` case to `primRepSlot` function and
all the necessary machinery to carry this new `VecSlot` through code
generation. This allows programs involving unboxed sums of SIMD vectors
to be written and compiled.
Fixes #22187
|
|
|
|
| |
functions.
|
|
|
|
|
|
|
|
|
|
|
| |
Lint errors indicate an internal error in GHC, so it makes sense to use
it instead of the user style. This is consistent with Core Lint and STG Lint:
https://gitlab.haskell.org/ghc/ghc/-/blob/22096652/compiler/GHC/Core/Lint.hs#L429
https://gitlab.haskell.org/ghc/ghc/-/blob/22096652/compiler/GHC/Stg/Lint.hs#L144
Fixes #22218.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Here we refactor the representation of info table provenance information
in object code to significantly reduce its size and link-time impact.
Specifically, we deduplicate strings and represent them as 32-bit
offsets into a common string table.
In addition, we rework the registration logic to eliminate allocation
from the registration path, which is run from a static initializer where
things like allocation are technically undefined behavior (although it
did previously seem to work). For similar reasons we eliminate lock
usage from registration path, instead relying on atomic CAS.
Closes #22077.
|
|
|
|
| |
isInfoTableLabel does not take Cmm info table into account. This patch is required for data section layout of wasm32 NCG to work.
|
|
|
|
| |
Avoids some uses of `head` and `tail`, and some panics when an argument is null.
|
|
|
|
|
| |
This allows to avoid further partiality, e. g., map head . group is
replaced by map NE.head . NE.group, and there are less panic calls.
|
|
|
|
|
|
|
| |
* Replace 'text . show' and 'ppr' with 'int'.
* Remove Outputable.hs-boot, no longer needed
* Use pprWithCommas
* Factor out instructions in AArch64 codegen
|
|
|
|
|
|
|
|
|
|
| |
• Delete some dead code, largely under `GHC.Utils`.
• Clean up a few definitions in `GHC.Utils.(Misc, Monad)`.
• Clean up `GHC.Types.SrcLoc`.
• Derive stock `Functor, Foldable, Traversable` for more types.
• Derive more instances for newtypes.
Bump haddock submodule.
|
| |
|
|
|
|
|
|
|
| |
This fixes various typos and spelling mistakes
in the compiler.
Fixes #21891
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements GHC proposal 313, "Delimited continuation
primops", by adding native support for delimited continuations to the
GHC RTS.
All things considered, the patch is relatively small. It almost
exclusively consists of changes to the RTS; the compiler itself is
essentially unaffected. The primops come with fairly extensive Haddock
documentation, and an overview of the implementation strategy is given
in the Notes in rts/Continuation.c.
This first stab at the implementation prioritizes simplicity over
performance. Most notably, every continuation is always stored as a
single, contiguous chunk of stack. If one of these chunks is
particularly large, it can result in poor performance, as the current
implementation does not attempt to cleverly squeeze a subset of the
stack frames into the existing stack: it must fit all at once. If this
proves to be a performance issue in practice, a cleverer strategy would
be a worthwhile target for future improvements.
|
|
|
|
|
|
|
| |
Change calls to renderWithContext with showSDocOneLine; it's more
efficient and explanatory.
Remove polyPatSig (unused)
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, the SDocContext used for code generation contained
information whether the labels should use Asm or C style.
However, at every individual call site, this is known statically.
This removes the parameter to 'PprCode' and replaces every 'pdoc'
used to print a label in code style with 'pprCLabel' or 'pprAsmLabel'.
The OutputableP instance is now used only for dumps.
The output of T15155 changes, it now uses the Asm style
(which is faithful to what actually happens).
|
| |
|
|
|
|
|
|
|
|
|
|
| |
* Remove hack when printing OccNames. No longer needed since e3dcc0d5
* Remove unused `pprCmms` and `instance Outputable Instr`
* Simplify `pprCLabel` (no need to pass platform)
* Remove evil `Show`/`Eq` instances for `SDoc`. They were needed by
ImmLit, but that can take just a String instead.
* Remove instance `Outputable CLabel` - proper output of labels
needs a platform, and is done by the `OutputableP` instance
|
|
|
|
|
|
| |
These two uses constructed maps, which is a case where foldl' is
generally more efficient since we avoid constructing an intermediate
O(n)-depth stack.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The former behaviour of adding cost centres after optimization but
before unfoldings are created is not available via the flag
`prof-late-inline` instead.
I also reduced the overhead of -fprof-late* by pushing the cost centres
into lambdas. This means the cost centres will only account for
execution of functions and not their partial application.
Further I made LATE_CC cost centres it's own CC flavour so they now
won't clash with user defined ones if a user uses the same string for
a custom scc.
LateCC: Don't put cost centres inside constructor workers.
With -fprof-late they are rarely useful as the worker is usually
inlined. Even if the worker is not inlined or we use -fprof-late-linline
they are generally not helpful but bloat compile and run time
significantly. So we just don't add sccs inside constructor workers.
-------------------------
Metric Decrease:
T13701
-------------------------
|
| |
|
|
|
|
|
|
|
| |
Here we reorganize `GHC.Cmm` to eliminate the orphan `Outputable` and
`OutputableP` instances for the Cmm AST. This makes it significantly
easier to use the Cmm pretty-printers in tracing output without
incurring module import cycles.
|