| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
Apparently we need some padding as well.
Fixes #20137
|
|
|
|
|
|
|
|
|
|
|
| |
PPC NCG: Implement CAS inline for 32 and 64 bit
testsuite: Add tests for smaller atomic CAS
X86 NCG: Catch calls to CAS C fallback
Primops: Add atomicCasWord[8|16|32|64]Addr#
Add tests for atomicCasWord[8|16|32|64]Addr#
Add changelog entry for new primops
X86 NCG: Fix MO-Cmpxchg W64 on 32-bit arch
ghc-prim: 64-bit CAS C fallback on all archs
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The issue was the renderer for x86 addressing modes assumes native size
registers, but we were passing in a possibly-smaller index in
conjunction with a native-sized base pointer.
The easist thing to do is just extend the register first.
I also changed the other NGC backends implementing jump tables
accordingly. On one hand, I think PowerPC and Sparc don't have the small
sub-registers anyways so there is less to worry about. On the other
hand, to the extent that's true the zero extension can become a no-op.
I should give credit where it's due: @hsyl20 really did all the work for
me in
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4717#note_355874,
but I was daft and missed the "Oops" and so ended up spending a silly
amount of time putting it all back together myself.
The unregisterised backend change is a bit different, because here we
are translating the actual case not a jump table, and the fix is to
handle right-sized literals not addressing modes. But it makes sense to
include here too because it's the same change in the subsequent commit
that exposes both bugs.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Word64#/Int64# are only used on 32-bit architectures. Before this patch,
operations on these types were directly using the FFI. Now we use real
primops that are then lowered into ccalls.
The advantage of doing this is that we can now perform constant folding on
Word64#/Int64# (#19024).
Most of this work was done by John Ericson in !3658. However this patch
doesn't go as far as e.g. changing Word64 to always be using Word64#.
Noticeable performance improvements
T9203(normal) run/alloc 89870808.0 66662456.0 -25.8% GOOD
haddock.Cabal(normal) run/alloc 14215777340.8 12780374172.0 -10.1% GOOD
haddock.base(normal) run/alloc 15420020877.6 13643834480.0 -11.5% GOOD
Metric Decrease:
T9203
haddock.Cabal
haddock.base
|
|
|
|
|
| |
When arguments are 8 *or 16* bits wide, then truncate before/after
and use the 32bit operation.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During the intial NCG development, GHC did not have support for
anything below Words. As such the NCG didn't support any of this
either. AArch64-Darwin however needs support for subword, as
arguments in excess of the first eight (8) passed via registers
are passed on the stack, and there in a packed fashion. Thus
ghc learned about subword sizes. This than lead us to gain
subword primops, and these subsequently highlighted deficiencies
in the AArch64 NCG.
This patch rectifies the ones I found through via the test-suite.
I do not claim this to be exhaustive.
Fixes: #19993
Metric Increase:
T10421
T13035
T13719
T14697
T1969
T9203
T9872a
T9872b
T9872c
T9872d
T9961
haddock.Cabal
haddock.base
parsing001
|
|
|
|
|
|
|
|
| |
Now that Outputable is independent of DynFlags, we can put tracing
functions using SDocs into their own module that doesn't transitively
depend on any GHC.Driver.* module.
A few modules needed to be moved to avoid loops in DEBUG mode.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce LogFlags as a independent subset of DynFlags used for logging.
As a consequence in many places we don't have to pass both Logger and
DynFlags anymore.
The main reason for this refactoring is that I want to refactor the
systools interfaces: for now many systools functions use DynFlags both
to use the Logger and to fetch their parameters (e.g. ldInputs for the
linker). I'm interested in refactoring the way they fetch their
parameters (i.e. use dedicated XxxOpts data types instead of DynFlags)
for #19877. But if I did this refactoring before refactoring the Logger,
we would have duplicate parameters (e.g. ldInputs from DynFlags and
linkerInputs from LinkerOpts). Hence this patch first.
Some flags don't really belong to LogFlags because they are subsystem
specific (e.g. most DumpFlags). For example -ddump-asm should better be
passed in NCGConfig somehow. This patch doesn't fix this tight coupling:
the dump flags are part of the UI but they are passed all the way down
for example to infer the file name for the dumps.
Because LogFlags are a subset of the DynFlags, we must update the former
when the latter changes (not so often). As a consequence we now use
accessors to read/write DynFlags in HscEnv instead of using `hsc_dflags`
directly.
In the process I've also made some subsystems less dependent on DynFlags:
- CmmToAsm: by passing some missing flags via NCGConfig (see new fields
in GHC.CmmToAsm.Config)
- Core.Opt.*:
- by passing -dinline-check value into UnfoldingOpts
- by fixing some Core passes interfaces (e.g. CallArity, FloatIn)
that took DynFlags argument for no good reason.
- as a side-effect GHC.Core.Opt.Pipeline.doCorePass is much less
convoluted.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In which we add a new code generator to the Glasgow Haskell
Compiler. This codegen supports ELF and Mach-O targets, thus covering
Linux, macOS, and BSDs in principle. It was tested only on macOS and
Linux. The NCG follows a similar structure as the other native code
generators we already have, and should therfore be realtively easy to
follow.
It supports most of the features required for a proper native code
generator, but does not claim to be perfect or fully optimised. There
are still opportunities for optimisations.
Metric Decrease:
ManyAlternatives
ManyConstructors
MultiLayerModules
PmSeriesG
PmSeriesS
PmSeriesT
PmSeriesV
T10421
T10421a
T10858
T11195
T11276
T11303b
T11374
T11822
T12227
T12545
T12707
T13035
T13253
T13253-spj
T13379
T13701
T13719
T14683
T14697
T15164
T15630
T16577
T17096
T17516
T17836
T17836b
T17977
T17977b
T18140
T18282
T18304
T18478
T18698a
T18698b
T18923
T1969
T3064
T5030
T5321FD
T5321Fun
T5631
T5642
T5837
T783
T9198
T9233
T9630
T9872d
T9961
WWRec
Metric Increase:
T4801
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GHC's internal State monad benefits from oneShot annotations on its
state, allowing for more aggressive eta expansion.
We currently don't have monad transformers with the same optimisation,
so we only change uses of the pure State monad here.
See #19657 and 19380.
Metric Decrease:
hie002
|
| |
|
|
|
|
| |
Fixes #19852 and #19609
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Suppose a safe call: myCall(x,y,z)
It is lowered into three unsafe calls in Cmm:
r = suspendThread(...);
myCall(x,y,z);
resumeThread(r);
Consider the following situation for myCall arguments:
x = Sp[..] -- stack
y = Hp[..] -- heap
z = R1 -- global register
r = suspendThread(...);
myCall(x,y,z);
resumeThread(r);
The sink pass assumes that unsafe calls clobber memory (heap and stack),
hence x and y assignments are not sunk after `suspendThread`. The sink
pass also correctly handles global register clobbering for all unsafe
calls, except `suspendThread`!
`suspendThread` is special because it releases the capability the thread
is running on. Hence the sink pass must also take into account global
registers that are mapped into memory (in the capability).
In the example above, we could get:
r = suspendThread(...);
z = R1
myCall(x,y,z);
resumeThread(r);
But this transformation isn't valid if R1 is (BaseReg->rR1) as BaseReg
is invalid between suspendThread and resumeThread. This caused argument
corruption at least with the C backend ("unregisterised") in #19237.
Fix #19237
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Replace uses of WARN macro with calls to:
warnPprTrace :: Bool -> SDoc -> a -> a
Remove the now unused HsVersions.h
Bump haddock submodule
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is no reason to use CPP. __LINE__ and __FILE__ macros are now
better replaced with GHC's CallStack. As a bonus, assert error messages
now contain more information (function name, column).
Here is the mapping table (HasCallStack omitted):
* ASSERT: assert :: Bool -> a -> a
* MASSERT: massert :: Bool -> m ()
* ASSERTM: assertM :: m Bool -> m ()
* ASSERT2: assertPpr :: Bool -> SDoc -> a -> a
* MASSERT2: massertPpr :: Bool -> SDoc -> m ()
* ASSERTM2: assertPprM :: m Bool -> SDoc -> m ()
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. `text` is as efficient as `ptext . sLit` thanks to the rewrite rules
2. `text` is visually nicer than `ptext . sLit`
3. `ptext . sLit` encourages using one `ptext` for several `sLit` as in:
ptext $ case xy of
... -> sLit ...
... -> sLit ...
which may allocate SDoc's TextBeside constructors at runtime instead
of sharing them into CAFs.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A few refactorings made after looking at Core/STG
* Use Doc instead of SDoc in pprASCII to avoid passing the SDocContext
that is never used.
* Inline every SDoc wrappers in GHC.Utils.Outputable to expose Doc
constructs
* Add text/[] rule for empty strings (i.e., text "")
* Use a single occurrence of pprGNUSectionHeader
* Use bangs on Platform parameters and some others
Metric Decrease:
ManyAlternatives
ManyConstructors
T12707
T13035
T13379
T18698a
T18698b
T1969
T3294
T4801
T5321FD
T783
|
|
|
|
|
|
|
| |
This allows us to use the unsafe shifts in non-debug builds for performance.
For older versions of base we instead export Data.Bits
See also #19618
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
-------------------------
Metric Decrease:
T783
T4801
T12707
T13379
T3294
T4801
T5321FD
-------------------------
|
| |
|
|
|
|
|
|
|
|
| |
In commit 540fa6b2 integer to float conversions were changed to round to
the nearest even. Implement a special case for 64 bit integer to single
precision floating point numbers.
Fixes #19563.
|
|
|
|
|
| |
Metric Increase:
MultiLayerModules
|
|
|
|
|
| |
The 'id' type is now determined by the pass, using the XTickishId
type family.
|
|
|
|
|
|
|
|
| |
GHCi needs to know the types of all breakpoints, but it's
not possible to get the exprType of any expression in STG.
This is preparation for the upcoming change to make GHCi
bytecode from STG instead of Core.
|
|
|
|
|
|
| |
CmmToAsm.Reg.Linear: More strictness
More strictness
|
|
|
|
| |
Avoid top-level recursion.
|
| |
|
|
|
|
|
|
|
| |
Now that GHC 9.0.1 is released, it is time to drop support for bootstrapping
with GHC 8.8, as we only support building with the previous two major GHC
releases. As an added bonus, this allows us to remove several bits of CPP that
are either always true or no longer reachable.
|
|
|
|
| |
This enables a registerised build for the riscv64 architecture.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The first change makes the array ones use the proper fixed-size types,
which also means that just like before, they can be used without
explicit conversions with the boxed sized types. (Before, it was Int# /
Word# on both sides, now it is fixed sized on both sides).
For the second change, don't use "extend" or "narrow" in some of the
user-facing primops names for conversions.
- Names like `narrowInt32#` are misleading when `Int` is 32-bits.
- Names like `extendInt64#` are flat-out wrong when `Int is
32-bits.
- `narrow{Int,Word}<N>#` however map a type to itself, and so don't
suffer from this problem. They are left as-is.
These changes are batched together because Alex happend to use the array
ops. We can only use released versions of Alex at this time, sadly, and
I don't want to have to have a release thatwon't work for the final GHC
9.2. So by combining these we get all the changes for Alex done at once.
Bump hackage state in a few places, and also make that workflow slightly
easier for the future.
Bump minimum Alex version
Bump Cabal, array, bytestring, containers, text, and binary submodules
|
|
|
|
| |
Fixes #19118
|
|
|
|
|
|
|
| |
This previously supported the ghc-in-ghci script which has been since
dropped. Hadrian's ghci support does not need this macro (which disabled
uses of UnboxedTuples) since it uses `-fno-code` rather than produce
bytecode.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Related to a future change in Data.List,
https://downloads.haskell.org/ghc/8.10.3/docs/html/users_guide/using-warnings.html?highlight=wcompat#ghc-flag--Wcompat-unqualified-imports
Companion pull&merge requests:
- https://github.com/judah/haskeline/pull/153
- https://github.com/haskell/containers/pull/762
- https://gitlab.haskell.org/ghc/packages/hpc/-/merge_requests/9
After these the actual change in Data.List should be easy to do.
|
|
|
|
|
|
| |
Fixes #18994
Co-Author: Benjamin Maurer <maurer.benjamin@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This replaces all Word<N> = W<N># Word# and Int<N> = I<N># Int# with
Word<N> = W<N># Word<N># and Int<N> = I<N># Int<N>#, thus providing us
with properly sized primitives in the codegenerator instead of pretending
they are all full machine words.
This came up when implementing darwinpcs for arm64. The darwinpcs reqires
us to pack function argugments in excess of registers on the stack. While
most procedure call standards (pcs) assume arguments are just passed in
8 byte slots; and thus the caller does not know the exact signature to make
the call, darwinpcs requires us to adhere to the prototype, and thus have
the correct sizes. If we specify CInt in the FFI call, it should correspond
to the C int, and not just be Word sized, when it's only half the size.
This does change the expected output of T16402 but the new result is no
less correct as it eliminates the narrowing (instead of the `and` as was
previously done).
Bumps the array, bytestring, text, and binary submodules.
Co-Authored-By: Ben Gamari <ben@well-typed.com>
Metric Increase:
T13701
T14697
|
|
|
|
|
|
|
| |
Previously we failed to apply the info table offset to the aranges and
DIEs, meaning that we often failed to unwind in gdb. For some reason
this only seemed to manifest in the RTS's Cmm closures. Nevertheless,
now we can unwind completely up to `main`
|
|
|
|
| |
This reuses the codegen used for ByteArray#'s atomic primops.
|
|
|
|
|
|
|
|
| |
This addes the necessary logic to support aarch64 on elf, as well
as aarch64 on mach-o, which Apple calls arm64.
We change architecture name to AArch64, which is the official arm
naming scheme.
|
| |
|
|
|
|
|
| |
Standard debugging tools don't know how to understand these so let's not
produce them unless asked.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the `.debug_aranges` and `.debug_info` (DIE) DWARF
information would claim that procedures (represented with a
`DW_TAG_subprogram` DIE) would only span the range covered by their entry
block. This omitted all of the continuation blocks (represented by
`DW_TAG_lexical_block` DIEs), confusing `perf`. Fix this by introducing
a end-of-procedure label and using this as the `DW_AT_high_pc` of
procedure `DW_TAG_subprogram` DIEs
Fixes #17605.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It turns out that some important native debugging/profiling tools (e.g.
perf) rely only on symbol tables for function name resolution (as
opposed to using DWARF DIEs). However, previously GHC would emit
temporary symbols (e.g. `.La42b`) to identify module-internal
entities. Such symbols are dropped during linking and therefore not
visible to runtime tools (in addition to having rather un-helpful unique
names). For instance, `perf report` would often end up attributing all
cost to the libc `frame_dummy` symbol since Haskell code was no covered
by any proper symbol (see #17605).
We now rather follow the model of C compilers and emit
descriptively-named local symbols for module internal things. Since this
will increase object file size this behavior can be disabled with the
`-fno-expose-internal-symbols` flag.
With this `perf record` can finally be used against Haskell executables.
Even more, with `-g3` `perf annotate` provides inline source code.
|
|
|
|
|
|
| |
In various places in the NCG we need the Module currently being
compiled. Let's move this into the environment instead of chewing threw
another register.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
We no compare these by doing 64bit subtraction and
checking the resulting flags.
We used to do this differently but the old approach was
broken when the high bits compared equal and the comparison
was one of >= or <=.
The new approach should be both correct and faster.
|
| |
|