| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
Add StgToCmm module hierarchy. Platform modules that are used in several
other places (NCG, LLVM codegen, Cmm transformations) are put into
GHC.Platform.
|
|
|
|
| |
[skip ci]
|
|
|
|
|
|
| |
`quot` and `rem` are implemented efficiently when the second argument
is a constant power of 2. This patch uses the same implementations for
`quotRem` primop.
|
|
|
|
|
|
|
|
|
|
|
| |
These kinds of imports are necessary in some cases such as
importing instances of typeclasses or intentionally creating
dependencies in the build system, but '-Wunused-imports' can't
detect when they are no longer needed. This commit removes the
unused ones currently in the code base (not including test files
or submodules), with the hope that doing so may increase
parallelism in the build system by removing unnecessary
dependencies.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Also adds Note [Getting from RuntimeRep to PrimRep], which
deocuments a related thorny process.
This Note addresses #16964, which correctly observes that
documentation for this thorny design is lacking.
Documentation only.
|
|
|
|
|
|
|
| |
This prepares the way for making Int32# and Word32# the actual size they
claim to be.
Updates binary submodule for (de)serializing the new runtime reps.
|
|
|
|
|
|
|
| |
Unfortunately this will require more work; register allocation is
quite broken.
This reverts commit acd795583625401c5554f8e04ec7efca18814011.
|
|
|
|
|
|
|
|
|
| |
Adds stripStgTicksTopE which only returns the stripped expression.
So far we also allocated a list for the stripped ticks which was
never used.
Allocation difference is as expected very small but present.
About 0.02% difference when compiling with -O.
|
|
|
|
|
|
|
| |
This adds support for constructing vector types from Float#, Double# etc
and performing arithmetic operations on them
Cleaned-Up-By: Ben Gamari <ben@well-typed.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Here the following changes are introduced:
- A read barrier machine op is added to Cmm.
- The order in which a closure's fields are read and written is changed.
- Memory barriers are added to RTS code to ensure correctness on
out-or-order machines with weak memory ordering.
Cmm has a new CallishMachOp called MO_ReadBarrier. On weak memory machines, this
is lowered to an instruction that ensures memory reads that occur after said
instruction in program order are not performed before reads coming before said
instruction in program order. On machines with strong memory ordering properties
(e.g. X86, SPARC in TSO mode) no such instruction is necessary, so
MO_ReadBarrier is simply erased. However, such an instruction is necessary on
weakly ordered machines, e.g. ARM and PowerPC.
Weam memory ordering has consequences for how closures are observed and mutated.
For example, consider a closure that needs to be updated to an indirection. In
order for the indirection to be safe for concurrent observers to enter, said
observers must read the indirection's info table before they read the
indirectee. Furthermore, the entering observer makes assumptions about the
closure based on its info table contents, e.g. an INFO_TYPE of IND imples the
closure has an indirectee pointer that is safe to follow.
When a closure is updated with an indirection, both its info table and its
indirectee must be written. With weak memory ordering, these two writes can be
arbitrarily reordered, and perhaps even interleaved with other threads' reads
and writes (in the absence of memory barrier instructions). Consider this
example of a bad reordering:
- An updater writes to a closure's info table (INFO_TYPE is now IND).
- A concurrent observer branches upon reading the closure's INFO_TYPE as IND.
- A concurrent observer reads the closure's indirectee and enters it. (!!!)
- An updater writes the closure's indirectee.
Here the update to the indirectee comes too late and the concurrent observer has
jumped off into the abyss. Speculative execution can also cause us issues,
consider:
- An observer is about to case on a value in closure's info table.
- The observer speculatively reads one or more of closure's fields.
- An updater writes to closure's info table.
- The observer takes a branch based on the new info table value, but with the
old closure fields!
- The updater writes to the closure's other fields, but its too late.
Because of these effects, reads and writes to a closure's info table must be
ordered carefully with respect to reads and writes to the closure's other
fields, and memory barriers must be placed to ensure that reads and writes occur
in program order. Specifically, updates to a closure must follow the following
pattern:
- Update the closure's (non-info table) fields.
- Write barrier.
- Update the closure's info table.
Observing a closure's fields must follow the following pattern:
- Read the closure's info pointer.
- Read barrier.
- Read the closure's (non-info table) fields.
This patch updates RTS code to obey this pattern. This should fix long-standing
SMP bugs on ARM (specifically newer aarch64 microarchitectures supporting
out-of-order execution) and PowerPC. This fixes issue #15449.
Co-Authored-By: Ben Gamari <ben@well-typed.com>
|
| |
|
|
|
|
|
|
|
| |
ghc-pkg needs to be aware of platforms so it can figure out which
subdire within the user package db to use. This is admittedly
roundabout, but maybe Cabal could use the same notion of a platform as
GHC to good affect too.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GHC Proposal: 0013-unlifted-newtypes.rst
Discussion: https://github.com/ghc-proposals/ghc-proposals/pull/98
Issues: #15219, #1311, #13595, #15883
Implementation Details:
Note [Implementation of UnliftedNewtypes]
Note [Unifying data family kinds]
Note [Compulsory newtype unfolding]
This patch introduces the -XUnliftedNewtypes extension. When this
extension is enabled, GHC drops the restriction that the field in
a newtype must be of kind (TYPE 'LiftedRep). This allows types
like Int# and ByteArray# to be used in a newtype. Additionally,
coerce is made levity-polymorphic so that it can be used with
newtypes over unlifted types.
The bulk of the changes are in TcTyClsDecls.hs. With -XUnliftedNewtypes,
getInitialKind is more liberal, introducing a unification variable to
return the kind (TYPE r0) rather than just returning (TYPE 'LiftedRep).
When kind-checking a data constructor with kcConDecl, we attempt to
unify the kind of a newtype with the kind of its field's type. When
typechecking a data declaration with tcTyClDecl, we again perform a
unification. See the implementation note for more on this.
Co-authored-by: Richard Eisenberg <rae@richarde.dev>
|
|
|
|
| |
Fixes #16696
|
| |
|
|
|
|
|
| |
Previously log and exp were primitives yet log1p and expm1 were FFI
calls. Fix this non-uniformity.
|
|
|
|
|
|
| |
[skip ci]
This should really be caught by the linters! (#16711)
|
|
|
|
| |
arguments that have an unlifted boxed type. We used to use the type of the argument. We now use the type of the foreign function. Add a test to confirm that the roundtrip conversion between an unlifted boxed type and Any is sound in the presence of a foreign function call.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we would generate a local variable pointing after the array
header and use it to initialize the array elements. But we already use
stores with offset, so it's easy to just add the header to those offsets
during compilation and avoid generating the local variable (which would
become a LEA instruction when using native codegen; LLVM already
optimizes it away).
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a new closure identifier is being established to a
local or exported closure already emitted into the same
module, refrain from adding an IND_STATIC closure, and
instead emit an assembly-language alias.
Inter-module IND_STATIC objects still remain, and need to be
addressed by other measures.
Binary-size savings on nofib are around 0.1%.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* simplifies registers to have GPR, Float and Double, by removing the SSE2 and X87 Constructors
* makes -msse2 assumed/default for x86 platforms, fixing a long standing nondeterminism in rounding
behavior in 32bit haskell code
* removes the 80bit floating point representation from the supported float sizes
* theres still 1 tiny bit of x87 support needed,
for handling float and double return values in FFI calls wrt the C ABI on x86_32,
but this one piece does not leak into the rest of NCG.
* Lots of code thats not been touched in a long time got deleted as a
consequence of all of this
all in all, this change paves the way towards a lot of future further
improvements in how GHC handles floating point computations, along with
making the native code gen more accessible to a larger pool of contributors.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes #16052
When the offset in `setByteArray#` is statically known, we can provide
better alignment guarantees then just 1 byte.
Also, memset can now do 64-bit wide sets.
The current memset intrinsic is not optimal however and can be
improved for the case when we know that we deal with
(baseAddress at known alignment) + offset
For instance, on 64-bit
`setByteArray# s 1# 23# 0#`
given that bytearray is 8 bytes aligned could be unrolled into
`movb, movw, movl, movq, movq`; but currently it is
`movb x23` since alignment of 1 is all we can embed into MO_Memset op.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GHC has an optimization for allocating arrays when the size is
statically known -- it'll generate the code allocating and initializing
the array inline (instead of a call to a procedure from
`rts/PrimOps.cmm`).
However, the generated code uses a loop to do the initialization. Since
we already check that the requested size is small (we check against
`maxInlineAllocSize`), we can generate faster straightline code instead.
This brings about 15% improvement for `newSmallArray#` in my testing and
slightly simplifies the code in GHC.
Signed-off-by: Michal Terepeta <michal.terepeta@gmail.com>
|
|
|
|
| |
Also removes a couple unnecessary MagicHash pragmas
|
|
|
|
|
|
| |
This commit includes the necessary changes in code and
documentation to support a primop that reverses a word's
bits. It also includes a test.
|
|
|
|
|
|
|
|
|
|
| |
- `emitCopySmallArray` now checks size before generating code and
doesn't generate any code when size is 0. `emitCopyArray` already does
this so this makes small/large array cases the same in argument
checking.
- In both `emitCopySmallArray` and `emitCopyArray` read the `dflags`
after checking the argument.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This moves all URL references to Trac Wiki to their corresponding
GitLab counterparts.
This substitution is classified as follows:
1. Automated substitution using sed with Ben's mapping rule [1]
Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...
New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...
2. Manual substitution for URLs containing `#` index
Old: ghc.haskell.org/trac/ghc/wiki/XxxYyy...#Zzz
New: gitlab.haskell.org/ghc/ghc/wikis/xxx-yyy...#zzz
3. Manual substitution for strings starting with `Commentary`
Old: Commentary/XxxYyy...
New: commentary/xxx-yyy...
See also !539
[1]: https://gitlab.haskell.org/bgamari/gitlab-migration/blob/master/wiki-mapping.json
|
|
|
|
|
| |
This moves all URL references to Trac tickets to their corresponding
GitLab counterparts.
|
|
|
|
| |
Also remove unused arg from get_Regtable_addr_from_offset
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The splitter is an evil Perl script that processes assembler code.
Its job can be done better by the linker's --gc-sections flag. GHC
passes this flag to the linker whenever -split-sections is passed on
the command line.
This is based on @DemiMarie's D2768.
Fixes Trac #11315
Fixes Trac #9832
Fixes Trac #8964
Fixes Trac #8685
Fixes Trac #8629
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The big payload of this patch is:
Add an AnonArgFlag to the FunTy constructor
of Type, so that
(FunTy VisArg t1 t2) means (t1 -> t2)
(FunTy InvisArg t1 t2) means (t1 => t2)
The big payoff is that we have a simple, local test to make
when decomposing a type, leading to many fewer calls to
isPredTy. To me the code seems a lot tidier, and probably
more efficient (isPredTy has to take the kind of the type).
See Note [Function types] in TyCoRep.
There are lots of consequences
* I made FunTy into a record, so that it'll be easier
when we add a linearity field, something that is coming
down the road.
* Lots of code gets touched in a routine way, simply because it
pattern matches on FunTy.
* I wanted to make a pattern synonym for (FunTy2 arg res), which
picks out just the argument and result type from the record. But
alas the pattern-match overlap checker has a heart attack, and
either reports false positives, or takes too long. In the end
I gave up on pattern synonyms.
There's some commented-out code in TyCoRep that shows what I
wanted to do.
* Much more clarity about predicate types, constraint types
and (in particular) equality constraints in kinds. See TyCoRep
Note [Types for coercions, predicates, and evidence]
and Note [Constraints in kinds].
This made me realise that we need an AnonArgFlag on
AnonTCB in a TyConBinder, something that was really plain
wrong before. See TyCon Note [AnonTCB InivsArg]
* When building function types we must know whether we
need VisArg (mkVisFunTy) or InvisArg (mkInvisFunTy).
This turned out to be pretty easy in practice.
* Pretty-printing of types, esp in IfaceType, gets
tidier, because we were already recording the (->)
vs (=>) distinction in an ad-hoc way. Death to
IfaceFunTy.
* mkLamType needs to keep track of whether it is building
(t1 -> t2) or (t1 => t2). See Type
Note [mkLamType: dictionary arguments]
Other minor stuff
* Some tidy-up in validity checking involving constraints;
Trac #16263
|
|
|
|
| |
Also used ByteString in some other relevant places
|
|
|
|
|
|
|
| |
Support for Mac OS X on PowerPC has been dropped by Apple years ago. We
follow suit and remove PowerPC support for Darwin.
Fixes #16106.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Handle Int*QuotRemOP and Word*QuotRemOp in PPC NCG.
Refactor common code with remainder operation.
Test Plan: validate (I validated on Linux powerpc64le and x86_64)
Reviewers: erikd, hvr, bgamari, simonmar
Reviewed By: bgamari
Subscribers: rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5323
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Currently, an AP thunk like `sat = f a b c` will not have its own entry
point and info pointer and will instead reuse a generic apply thunk
like `stg_ap_4_upd`.
That's great from a code size perspective, but if `f` is a known
function, a specialised entry point with a plain call can be much faster
than figuring out the arity and doing dynamic dispatch.
This looks at `f`s arity to figure out if it is a known function and if so, it
will not lower it to a generic apply function.
Benchmark results are encouraging: No changes to allocation, but 0.2% less
counted instructions.
Test Plan: Validates locally
Reviewers: simonmar, osa1, simonpj, bgamari
Reviewed By: simonpj
Subscribers: rwbarton, carter
GHC Trac Issues: #16005
Differential Revision: https://phabricator.haskell.org/D5414
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Previously, the logic that checks whether a thunk has a counter or not
was duplicated in multiple functions.
This led to thunk enters being accounted to their enclosing functions in
`StgCmmTicky.tickyEnterThunk`, because the outer call to
`withNewTickyCounterThunk` didn't set the counter label for the thunk.
And rightly so! `tickyEnterThunk` should only account thunk enters to a
counter if `-ticky-dyn-thunk` is on.
This patch extracts the logic that was already present in its most
general form in `withNewTickyCounterThunk` into its own functions and
lets all other call sites checking for `-ticky-dyn-thunk` call this new
function named `thunkHasCounter` instead.
Reviewers: bgamari, simonmar
Reviewed By: simonmar
Subscribers: rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5392
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After reverting Phab:D5358, Simon (Peyton Jones) asked for a Note
summarising why we want to keep the dead case binder check in `cgCase`.
Summary from mail conversation:
* Phab:D5324 means that we no longer /recompute/ dead-ness of case-binders in
STG-land
* But TidyPgm preserves dead-ness info (see CoreTidy.tidyIdBndr)
* And so we can take advantage of it to avoid a redundant load. This load
would be eliminated by CmmSink, but that only happens with -O
|
|
|
|
|
|
| |
This reverts commit d13b7d60650cb84af11ee15b3f51c3511548cfdb.
(See discussion in D5358)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This implements a selective lambda-lifting pass late in the STG
pipeline.
Lambda lifting has the effect of avoiding closure allocation at the cost
of having to make former free vars available at call sites, possibly
enlarging closures surrounding call sites in turn.
We identify beneficial cases by means of an analysis that estimates
closure growth.
There's a Wiki page at
https://ghc.haskell.org/trac/ghc/wiki/LateLamLift.
Reviewers: simonpj, bgamari, simonmar
Reviewed By: simonpj
Subscribers: rwbarton, carter
GHC Trac Issues: #9476
Differential Revision: https://phabricator.haskell.org/D5224
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes a fairly long-standing bug (dating back to 2015) in
RdrName.bestImport, namely
commit 9376249b6b78610db055a10d05f6592d6bbbea2f
Author: Simon Peyton Jones <simonpj@microsoft.com>
Date: Wed Oct 28 17:16:55 2015 +0000
Fix unused-import stuff in a better way
In that patch got the sense of the comparison back to front, and
thereby failed to implement the unused-import rules described in
Note [Choosing the best import declaration] in RdrName
This led to Trac #13064 and #15393
Fixing this bug revealed a bunch of unused imports in libraries;
the ones in the GHC repo are part of this commit.
The two important changes are
* Fix the bug in bestImport
* Modified the rules by adding (a) in
Note [Choosing the best import declaration] in RdrName
Reason: the previosu rules made Trac #5211 go bad again. And
the new rule (a) makes sense to me.
In unravalling this I also ended up doing a few other things
* Refactor RnNames.ImportDeclUsage to use a [GlobalRdrElt] for the
things that are used, rather than [AvailInfo]. This is simpler
and more direct.
* Rename greParentName to greParent_maybe, to follow GHC
naming conventions
* Delete dead code RdrName.greUsedRdrName
Bumps a few submodules.
Reviewers: hvr, goldfire, bgamari, simonmar, jrtc27
Subscribers: rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5312
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In a previous patch we replaced some built-in literal constructors
(MachInt, MachWord, etc.) with a single LitNumber constructor.
In this patch we replace the `Mach` prefix of the remaining constructors
with `Lit` for consistency (e.g., LitChar, LitLabel, etc.).
Sadly the name `LitString` was already taken for a kind of FastString
and it would become misleading to have both `LitStr` (literal
constructor renamed after `MachStr`) and `LitString` (FastString
variant). Hence this patch renames the FastString variant `PtrString`
(which is more accurate) and the literal string constructor now uses the
least surprising `LitString` name.
Both `Literal` and `LitString/PtrString` have recently seen breaking
changes so doing this kind of renaming now shouldn't harm much.
Reviewers: hvr, goldfire, bgamari, simonmar, jrtc27, tdammers
Subscribers: tdammers, rwbarton, thomie, carter
Differential Revision: https://phabricator.haskell.org/D4881
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
D5339 (part of D5324) removed the dead case binder analysis done during
CoreToStg so this condition always holds now.
Test Plan: Validated locally.
Reviewers: sgraf, bgamari, simonmar
Subscribers: rwbarton, carter
Differential Revision: https://phabricator.haskell.org/D5358
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Currently, `CoreToStg` annotates `StgRhsClosure`s with their set of non-global
free variables. This free variable information is only needed in the final
code generation step (i.e. `StgCmm.codeGen`), which leads to transformations
such as `StgCse` and `StgUnarise` having to maintain this information.
This is tiresome and unnecessary, so this patch introduces a trees-to-grow-like
approach that only introduces the free variable set into the syntax tree in the
code gen pass, along with a free variable analysis on STG terms to generate
that information.
Fixes #15754.
Reviewers: simonpj, osa1, bgamari, simonmar
Reviewed By: osa1
Subscribers: rwbarton, carter
GHC Trac Issues: #15754
Differential Revision: https://phabricator.haskell.org/D5324
|