| Commit message | Author | Age | Files | Lines |
| |
Don't export `getUs` and `getUniqueUs`. `UniqSM` has a `MonadUnique` instance:
    instance MonadUnique UniqSM where
        getUniqueSupplyM = getUs
        getUniqueM       = getUniqueUs
        getUniquesM      = getUniquesUs
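For illustration, a minimal sketch of what call sites look like after this
change (the helper name is hypothetical, not part of the patch):
    import UniqSupply
    import Unique (Unique)

    -- Allocate a fresh Unique through the overloaded MonadUnique interface
    -- rather than the UniqSM-specific getUniqueUs.
    freshUnique :: UniqSM Unique
    freshUnique = getUniqueM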
Commandline-fu used:
    git grep -l 'getUs\>' |
        grep -v compiler/basicTypes/UniqSupply.lhs |
        xargs sed -i 's/getUs\>/getUniqueSupplyM/g'
    git grep -l 'getUniqueUs\>' |
        grep -v compiler/basicTypes/UniqSupply.lhs |
        xargs sed -i 's/getUniqueUs\>/getUniqueM/g'
Follow up on b522d3a3f970a043397a0d6556ca555648e7a9c3
Reviewed By: austin, hvr
Differential Revision: https://phabricator.haskell.org/D220
| |
...some files more or less recently touched by me
[ci skip]
| |
Summary:
These MachOps are used by addIntC# and subIntC#, which in turn are
used in integer-gmp when adding or subtracting small Integers. The
following benchmark shows a ~6% speedup after this commit on x86_64
(building GHC with BuildFlavour=perf).
    {-# LANGUAGE MagicHash #-}
    import GHC.Exts
    import Criterion.Main

    count :: Int -> Integer
    count (I# n#) = go n# 0
      where go :: Int# -> Integer -> Integer
            go 0# acc = acc
            go n# acc = go (n# -# 1#) $! acc + 1

    main :: IO ()
    main = defaultMain [bgroup "count" [bench "100" $ whnf count 100]]
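For reference, a hedged sketch of the primop pattern these MachOps
accelerate (the wrapper name is hypothetical): addIntC# returns the
wrapped-around sum together with a flag that is nonzero on signed overflow.
    {-# LANGUAGE MagicHash, UnboxedTuples #-}
    import GHC.Exts

    -- Overflow-checked addition, as integer-gmp does for small Integers.
    addChecked :: Int -> Int -> Maybe Int
    addChecked (I# x#) (I# y#) =
      case addIntC# x# y# of
        (# s#, 0# #) -> Just (I# s#)  -- no overflow: keep the small result
        (# _,  _  #) -> Nothing       -- overflow: caller falls back to gmp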
Differential Revision: https://phabricator.haskell.org/D140
| |
This implements the new primops
    clz#, clz32#, clz64#,
    ctz#, ctz32#, ctz64#
which provide efficient implementations of the popular count-leading-zeros
and count-trailing-zeros operations, respectively
(see the testcase for a pure Haskell reference implementation).
On x86, both the NCG and LLVM generate code based on the BSF/BSR
instructions (which need extra logic to make the 0-case well-defined).
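As a quick illustration of what these primops expose (a minimal sketch; the
wrapper names are hypothetical):
    {-# LANGUAGE MagicHash #-}
    import GHC.Exts

    -- Count leading/trailing zeros of a full-width Word.
    countLeading, countTrailing :: Word -> Word
    countLeading  (W# w#) = W# (clz# w#)
    countTrailing (W# w#) = W# (ctz# w#)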
Test Plan: validate and successful tests on i686 and amd64
Reviewers: rwbarton, simonmar, ezyang, austin
Subscribers: simonmar, relrod, ezyang, carter
Differential Revision: https://phabricator.haskell.org/D144
GHC Trac Issues: #9340
| |
This is the second attempt to add this functionality. The first
attempt was reverted in 950fcae46a82569e7cd1fba1637a23b419e00ecd, due
to register allocator failure on x86. Given how the register
allocator currently works, we don't have enough registers on x86 to
support cmpxchg using complicated addressing modes. Instead we fall
back to a simpler addressing mode on x86.
Adds the following primops:
* atomicReadIntArray#
* atomicWriteIntArray#
* fetchSubIntArray#
* fetchOrIntArray#
* fetchXorIntArray#
* fetchAndIntArray#
Makes these pre-existing out-of-line primops inline:
* fetchAddIntArray#
* casIntArray#
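A hedged sketch of how one of the now-inline primops is used (the helper
name is hypothetical; fetchAddIntArray# returns the element's value from
before the addition):
    {-# LANGUAGE MagicHash, UnboxedTuples #-}
    import GHC.Exts
    import GHC.IO (IO(..))

    -- Atomically add n to the Int at word index i of a mutable byte array,
    -- returning the previous value.
    fetchAddInt :: MutableByteArray# RealWorld -> Int -> Int -> IO Int
    fetchAddInt mba# (I# i#) (I# n#) = IO $ \s ->
      case fetchAddIntArray# mba# i# n# s of
        (# s', old# #) -> (# s', I# old# #)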
| |
This commit caused the register allocator to fail on i386.
This reverts commit d8abf85f8ca176854e9d5d0b12371c4bc402aac3 and
04dd7cb3423f1940242fdfe2ea2e3b8abd68a177 (the second being a fix to
the first).
| |
Summary:
Add more primops for atomic ops on byte arrays
Adds the following primops:
* atomicReadIntArray#
* atomicWriteIntArray#
* fetchSubIntArray#
* fetchOrIntArray#
* fetchXorIntArray#
* fetchAndIntArray#
Makes these pre-existing out-of-line primops inline:
* fetchAddIntArray#
* casIntArray#
| |
In some cases, the layout of the LANGUAGE/OPTIONS_GHC lines has been
reorganized, following the convention to
- place `{-# LANGUAGE #-}` pragmas at the top of the source file, before
  any `{-# OPTIONS_GHC #-}` lines;
- if the list of language extensions fits on a single
  `{-# LANGUAGE ... #-}` line (shorter than 80 characters), keep it on one
  line; otherwise split it into one `{-# LANGUAGE ... #-}` line per
  individual language extension. In both cases, try to keep the
  enumeration alphabetically ordered.
(The latter layout is preferable as it's more diff-friendly.)
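A sketch of the two layouts (the extension names are arbitrary examples):
    -- Fits in 80 columns: keep on one line, alphabetically ordered.
    {-# LANGUAGE BangPatterns, CPP, MagicHash #-}

    -- Too long for one line: one extension per line, alphabetically ordered.
    {-# LANGUAGE BangPatterns #-}
    {-# LANGUAGE CPP #-}
    {-# LANGUAGE MagicHash #-}
    {-# LANGUAGE ScopedTypeVariables #-}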
While at it, this also replaces obsolete `{-# OPTIONS ... #-}` pragma
occurrences with `{-# OPTIONS_GHC ... #-}` pragmas.
| |
This cleanup allows the following refactoring commit to avoid adding a
few `{-# LANGUAGE NondecreasingIndentation #-}` pragmas.
Signed-off-by: Herbert Valerio Riedel <hvr@gnu.org>
| |
This patch adds support for several new primitive operations which use
processor-specific instructions to help guide data and cache locality
decisions. The locality levels range over [0..3].
For LLVM, we generate llvm.prefetch intrinsics at the proper locality
level (similar to GCC).
For x86 we generate prefetch{nta,t2,t1,t0} instructions. On SPARC and
PowerPC, the locality levels are ignored.
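A hedged sketch of use, written against the state-token form these primops
take in later GHC releases (the helper name is hypothetical):
    {-# LANGUAGE MagicHash, UnboxedTuples #-}
    import GHC.Exts
    import GHC.IO (IO(..))

    -- Hint that a byte-array offset is about to be read; level 3 keeps the
    -- data closest to the core, level 0 is non-temporal (prefetchnta on x86).
    prefetchHot :: ByteArray# -> Int -> IO ()
    prefetchHot ba# (I# off#) = IO $ \s ->
      (# prefetchByteArray3# ba# off# s, () #)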
This closes #8256.
Authored-by: Carter Tazio Schonwald <carter.schonwald@gmail.com>
Signed-off-by: Austin Seipp <austin@well-typed.com>
| |
SIMD primops are now polymorphic in vector size and element type, but
only internally to the compiler. More specifically, utils/genprimopcode
has been extended so that it "knows" about SIMD vectors. This allows us
to, for example, write a single definition for the "add two vectors"
primop in primops.txt.pp and have it instantiated at many vector types.
This generates a primop in GHC.Prim for each vector type at which "add
two vectors" is instantiated, but only one data constructor for the
PrimOp data type, so the code generator is much, much simpler.
| |
This patch encompasses most of the basic infrastructure for GHCJS. It
includes:
* A new extension, -XJavaScriptFFI
* A new architecture, ArchJavaScript
* Parser and lexer support for 'foreign import javascript', only
available under -XJavaScriptFFI, using ArchJavaScript.
* As a knock-on, there is also a new 'WayCustom' constructor in
DynFlags, so clients of the GHC API can add custom 'tags' to their
built files. This should be useful for other users as well.
The remaining changes are really just the resulting fallout, making sure
all the cases are handled appropriately for DynFlags and Platform.
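A hedged sketch of the syntax this enables (GHCJS-style; the import string
and names are illustrative, not from this patch):
    {-# LANGUAGE JavaScriptFFI #-}

    -- Only accepted with -XJavaScriptFFI when targeting ArchJavaScript.
    foreign import javascript unsafe "console.log($1)"
      js_logInt :: Int -> IO ()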
Authored-by: Luite Stegeman <stegeman@gmail.com>
Signed-off-by: Austin Seipp <aseipp@pobox.com>
| |
* Exposes bSwap{,16,32,64}# primops.
* Adds a new MachOp, MO_BSwap.
* Uses a Stg implementation (hs_bswap{16,32,64}) where the NCG has no
  native implementation.
* In the x86 NCG, generates bswap for 32 and 64 bits, and bswap+shr
  instead of xchg for 16 bits.
* Generates llvm.bswap intrinsics in the LLVM codegen.
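To illustrate the observable behaviour, via the byteSwap wrappers that
later base releases provide on top of these primops:
    import Data.Word (byteSwap32)
    import Numeric (showHex)

    -- byteSwap32 reverses the byte order of a Word32:
    main :: IO ()
    main = putStrLn (showHex (byteSwap32 0x11223344) "")  -- "44332211"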
Authored-by: Vincent Hanquez <tab@snarc.org>
Signed-off-by: Austin Seipp <aseipp@pobox.com>
| |
It seems the last parameter to llvm.prefetch was added in LLVM 3.0.
Signed-off-by: David Terei <davidterei@gmail.com>
| |
This combined patch reworks the LLVM backend in a number of ways:
1. Most prominently, we introduce a LlvmM monad carrying the contents of
the old LlvmEnv around. This patch completely removes LlvmEnv and
refactors towards standard library monad combinators wherever possible.
2. Support for streaming - we can now generate chunks of LLVM code for Cmm as
it comes in. This might improve compilation speed.
3. To allow streaming, we need a more flexible way to handle forward
references. The solution (getGlobalPtr) unifies LlvmCodeGen.Data
and getHsFunc as well.
4. Skip alloca-allocation for registers that are never actually written.
LLVM will automatically eliminate these, but output is smaller and
friendlier to human eyes this way.
5. We use LlvmM to collect references for llvm.used. This allows places
other than cmmProcLlvmGens to generate entries.
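A hedged sketch of the pattern described in points 1 and 5 (all names are
hypothetical simplifications, not the actual GHC definitions):
    import Control.Monad.State (State, modify)

    -- The environment that used to be threaded by hand through every
    -- code-generation function...
    data LlvmEnv = LlvmEnv { envUsedGlobals :: [String] }

    -- ...now lives in a monad, so any part of the code generator can
    -- register an llvm.used entry without plumbing LlvmEnv around.
    type LlvmM = State LlvmEnv

    markUsedVar :: String -> LlvmM ()
    markUsedVar v =
      modify $ \env -> env { envUsedGlobals = v : envUsedGlobals env }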
| |
Also give them a proper constructor - getGlobalVar and getGlobalValue
map directly to the accessors.
| |
This patch reworks some parts of the LLVM pretty-printing code that were
still using Show and String. Now we should be using SDoc and Outputable
throughout. Note that many get*Name functions become pp*Name
here as a side-effect.
| |
- MetaArgs is not needed, as variables are already meta-data.
- The same goes for MetaVal - its only reason for existing seems to be to
  support LLVM's strange pretty-printing of meta-data annotations, and
  I feel it is better to keep the data structure clean and handle that in
  the pretty-printing instead.
- Rename "MetaData" to "MetaAnnot". Meta-data is still meta-data when it
  is not associated with an expression or statement - for example,
  compile-unit data for debugging. I feel the old name was a bit misleading.
- Make the renamed MetaAnnot a proper data type instead of a type alias
  for a pair.
- Rename the "MetaExpr" constructor to "MetaStruct", as the data is much
  more like an LLVM structure (not an array, as it can contain values).
- Fix a warning.
| |
This reverts commit 1c5b0511a89488f5280523569d45ee61c0d09ffa.
| |
* Exposes bSwap{,16,32,64}# primops.
* Adds a new MachOp, MO_BSwap.
* Uses a Stg implementation (hs_bswap{16,32,64}) where the NCG has no
  native implementation.
* In the x86 NCG, generates bswap for 32 and 64 bits, and bswap+shr
  instead of xchg for 16 bits.
* Generates llvm.bswap intrinsics in the LLVM codegen.
Patch from Vincent Hanquez.
| |
In OldCmm, the false case of a conditional was a fallthrough. In Cmm,
conditionals have both true and false successors. When we convert Cmm to LLVM,
we now first re-order Cmm blocks so that the false successor of a conditional
occurs next in the list of basic blocks, i.e., it is a fallthrough, just like it
(necessarily) did in OldCmm. Surprisingly, this can make a big performance
difference.
| |
This patch adds support for 6 XMM registers on x86-64 which overlap with the F
and D registers and may hold 128-bit wide SIMD vectors. Because there is not a
good way to attach type information to STG registers, we aggressively bitcast in
the LLVM back-end.
| |
This patch lays the groundwork needed for primop support for SIMD vectors. In
addition to the groundwork, we add support for the FloatX4# primitive type and
associated primops.
* Add the FloatX4# primitive type and associated primops.
* Add CodeGen support for Float vectors.
* Compile vector operations to LLVM vector operations in the LLVM code
generator.
* Make the x86 native backend fail gracefully when encountering vector primops.
* Only generate primop wrappers for vector primops when using LLVM.
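A hedged sketch of the kind of code this enables, using present-day primop
names (requires -fllvm; the helper name is hypothetical):
    {-# LANGUAGE MagicHash #-}
    import GHC.Exts

    -- Add the scalar x to every lane of a FloatX4# vector.
    addScalar :: Float -> FloatX4# -> FloatX4#
    addScalar (F# x#) v# = plusFloatX4# (broadcastFloatX4# x#) v#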
| |
Vector values are now always passed on the stack. This isn't particularly
efficient, but it will have to do for now.
| |
This bug was introduced in the recent fix for #7571, which extended some
existing infrastructure in the LLVM backend that handled the conflict
between LLVM's return type for comparison operations (i1) and what GHC
expects (word). By extending it to handle literals, though, we forced all
literals to be i1 or word, breaking other code.
This patch resolves this breakage and handles #7571 still, cleaning up
the code for both a little. The overall approach is not ideal but
changing that is left for the future.
| |
We need to be sure that when generating code for literals, we properly narrow
the type of the literal to i1. See Note [Literals and branch conditions] in the
LlvmCodeGen.CodeGen module.
This occurs rarely, as the optimizer will remove conditional branches on
literals; however, the situation can arise with hand-written Cmm code.
This fixes Trac #7571.
Signed-off-by: David Terei <davidterei@gmail.com>
| |
This removes the OldCmm data type and the CmmCvt pass that converts
new Cmm to OldCmm. The backends (NCGs, LLVM and C) have all been
converted to consume new Cmm.
The main difference between the two data types is that conditional
branches in new Cmm have both true/false successors, whereas in OldCmm
the false case was a fallthrough. To generate slightly better code we
occasionally need to invert a conditional to ensure that the
branch-not-taken becomes a fallthrough; this was previously done in
CmmCvt, and it is now done in CmmContFlowOpt.
We could go further and use the Hoopl Block representation for native
code, which would mean that we could use Hoopl's postorderDfs and
analyses for native code, but for now I've left it as is, using the
old ListGraph representation for native code.
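A hedged sketch of the shape of the change (simplified, hypothetical types -
not the actual Cmm definitions):
    type BlockId = Int   -- placeholder
    data CmmExpr         -- placeholder

    -- OldCmm: one successor; the false case fell through to the next block.
    data OldLast = OldCondBranch CmmExpr BlockId

    -- New Cmm: both successors are explicit, so block order no longer
    -- encodes control flow; CmmContFlowOpt now inverts a condition when
    -- that makes the branch-not-taken a fallthrough.
    data NewLast = CmmCondBranch CmmExpr BlockId BlockId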
| |
We now have accurate global register liveness information attached to all Cmm
procedures and jumps. With this patch, the LLVM back end uses this information
to pass only the live floating point (F and D) registers on tail calls. This
makes the LLVM back end compatible with the new register allocation strategy.
Ideally the GHC LLVM calling convention would put all registers that are always
live first in the parameter sequence. Unfortunately the specification is written
so that on x86-64 SpLim (always live) is passed after the R registers. Therefore
we must always pass *something* in the R registers, so we pass the LLVM value
undef.
| |
All Cmm procedures now include the set of global registers that are live on
procedure entry, i.e., the global registers used to pass arguments to the
procedure. Only global registers that are used to pass arguments are
included in this list.
| |
Jumps now always have live register information attached, so drop the Maybes.
| |
Except for CgUtils.fixStgRegisters, which is used in the NCG and LLVM
backends and should probably be moved somewhere else.
| |
Mostly d -> g (matching DynFlag -> GeneralFlag).
Also renamed if* to when*, matching the Haskell if/when names.
| |
The main change here is that the Cmm parser now allows high-level cmm
code with argument-passing and function calls. For example:
    foo ( gcptr a, bits32 b )
    {
        if (b > 0) {
            // we can make tail calls passing arguments:
            jump stg_ap_0_fast(a);
        }
        return (a, b);
    }
More details on the new cmm syntax are in Note [Syntax of .cmm files]
in CmmParse.y.
The old syntax is still more-or-less supported, for those occasional
code fragments that really need to explicitly manipulate the stack.
However, there are a couple of differences: it is now obligatory to
give a list of live GlobalRegs on every jump, e.g.
    jump %ENTRY_CODE(Sp(0)) [R1];
Again, more details in Note [Syntax of .cmm files].
I have rewritten most of the .cmm files in the RTS into the new
syntax, except for AutoApply.cmm which is generated by the genapply
program: this file could be generated in the new syntax instead and
would probably be better off for it, but I ran out of enthusiasm.
Some other changes in this batch:
- The PrimOp calling convention is gone; primops now use the ordinary
  NativeNodeCall convention. This means that primops and "foreign
  import prim" code must be written in high-level cmm, but they can
  now take more than 10 arguments.
- CmmSink now does constant-folding (should fix #7219).
- .cmm files now go through the cmmPipeline, and as a result we
  generate better code in many cases. All the object files generated
  for the RTS .cmm files are now smaller. Performance should be
  better too, but I haven't measured it yet.
- RET_DYN frames are removed from the RTS; lots of code goes away.
- We now have some more canned GC points to cover unboxed tuples with
  2-4 pointers, which will reduce code size a little.