| Commit message | Author | Age | Files | Lines |
|
* Expose the bSwap{,16,32,64}# primops (see the example below).
* Add a new MachOp, MO_BSwap.
* Use the hs_bswap{16,32,64} helpers as the fallback implementation for
  cases the NCG does not handle natively.
* Generate bswap in the x86 NCG for 32 and 64 bits; for 16 bits, generate
  bswap+shr instead of using xchg.
* Generate llvm.bswap intrinsics in the LLVM code generator.
Patch from Vincent Hanquez.
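For users, these primops are reachable through the byteSwap16/32/64 wrappers
that Data.Word gained around the same time (base 4.7 / GHC 7.8); a minimal
sketch, assuming those wrappers are available:

    import Data.Word (Word32, byteSwap32)
    import Numeric   (showHex)

    -- byteSwap32 reverses the byte order of a 32-bit word, which is the
    -- operation the bSwap32# primop (and the x86 bswap instruction) performs.
    main :: IO ()
    main = putStrLn (showHex (byteSwap32 0x11223344 :: Word32) "")
    -- prints "44332211"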
|
Patch from Stephen Blackheath.
|
In OldCmm, the false case of a conditional was a fallthrough. In Cmm,
conditionals have both true and false successors. When we convert Cmm to LLVM,
we now first re-order Cmm blocks so that the false successor of a conditional
occurs next in the list of basic blocks, i.e., it is a fallthrough, just as it
(necessarily) was in OldCmm. Surprisingly, this can make a big performance
difference.
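A much-simplified sketch of this kind of layout pass (invented types, not the
actual GHC code): lay blocks out depth-first, always emitting the false
successor of a conditional immediately after it.

    import qualified Data.Map as M
    import qualified Data.Set as S

    type BlockId = Int

    data Terminator
      = Goto BlockId
      | Cond BlockId BlockId   -- true successor, false successor
      | Ret

    data Block = Block { blockId :: BlockId, term :: Terminator }

    -- Depth-first layout from the entry block; pushing the false successor
    -- first means it is placed directly after its conditional, so the false
    -- branch becomes a fallthrough.
    layout :: BlockId -> M.Map BlockId Block -> [Block]
    layout entry blocks = go S.empty [entry]
      where
        go _    []            = []
        go seen (b:rest)
          | b `S.member` seen = go seen rest
          | otherwise         =
              case M.lookup b blocks of
                Nothing  -> go seen rest
                Just blk -> blk : go (S.insert b seen) (succs (term blk) ++ rest)
        succs (Cond t f) = [f, t]   -- false successor laid out next
        succs (Goto l)   = [l]
        succs Ret        = []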
|
This patch adds support for six XMM registers on x86-64, which overlap with the
F and D registers and may hold 128-bit wide SIMD vectors. Because there is no
good way to attach type information to STG registers, we bitcast aggressively in
the LLVM back-end.
|
This patch lays the groundwork needed for primop support for SIMD vectors. In
addition to the groundwork, we add support for the FloatX4# primitive type and
associated primops.
* Add the FloatX4# primitive type and associated primops.
* Add CodeGen support for Float vectors.
* Compile vector operations to LLVM vector operations in the LLVM code
generator.
* Make the x86 native backend fail gracefully when encountering vector primops.
* Only generate primop wrappers for vector primops when using LLVM.
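A rough usage sketch of the new primops (names as they later appear in
GHC.Exts; requires -fllvm, since only the LLVM back end generates vector code):

    {-# LANGUAGE MagicHash, UnboxedTuples #-}
    import GHC.Exts

    -- Add two 4-wide float vectors with plusFloatX4# and return the first lane.
    addX4 :: (Float, Float, Float, Float) -> (Float, Float, Float, Float) -> Float
    addX4 (F# a, F# b, F# c, F# d) (F# e, F# f, F# g, F# h) =
      case unpackFloatX4# (plusFloatX4# (packFloatX4# (# a, b, c, d #))
                                        (packFloatX4# (# e, f, g, h #))) of
        (# x, _, _, _ #) -> F# x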
|
Vector values are now always passed on the stack. This isn't particularly
efficient, but it will have to do for now.
|
Signed-off-by: David Terei <davidterei@gmail.com>
|
This bug was introduced by the recent fix for #7571, which extended some
existing infrastructure in the LLVM backend that handles the conflict
between LLVM's return type for comparison operations (i1) and what GHC
expects (word). By extending it to handle literals, though, we forced all
literals to be i1 or word, breaking other code.
This patch fixes that breakage while still handling #7571, and cleans up
the code for both a little. The overall approach is not ideal, but
changing it is left for the future.
|
We need to be sure that when generating code for literals, we properly narrow
the type of the literal to i1. See Note [Literals and branch conditions] in the
LlvmCodeGen.CodeGen module.
This occurs rarely, as the optimizer will remove conditional branches on
literals; however, the situation can still arise with hand-written Cmm
code.
This fixes Trac #7571.
Signed-off-by: David Terei <davidterei@gmail.com>
|
Actual support is in progress, but we will accept bug reports against these
versions. LLVM 3.2 seems to be in good shape at this point anyway.
|
This removes the OldCmm data type and the CmmCvt pass that converts
new Cmm to OldCmm. The backends (NCGs, LLVM and C) have all been
converted to consume new Cmm.
The main difference between the two data types is that conditional
branches in new Cmm have both true/false successors, whereas in OldCmm
the false case was a fallthrough. To generate slightly better code we
occasionally need to invert a conditional to ensure that the
branch-not-taken becomes a fallthrough; this was previously done in
CmmCvt, and it is now done in CmmContFlowOpt.
We could go further and use the Hoopl Block representation for native
code, which would let us use Hoopl's postorderDfs and analyses there
too, but for now I've left it as is, still using the old ListGraph
representation for native code.
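The inversion itself is a small local rewrite; an illustrative sketch with
invented types (not the CmmContFlowOpt code): if the true successor happens to
be the next block in the layout, swap the successors and negate the condition
so that the branch-not-taken falls through.

    data Expr  = Var String | Not Expr   deriving Show

    data Instr
      = CondBranch Expr Int Int          -- condition, true target, false target
      | Branch Int
      deriving Show

    invertForFallthrough :: Int -> Instr -> Instr
    invertForFallthrough next (CondBranch c t f)
      | t == next = CondBranch (Not c) f t   -- false target now falls through
    invertForFallthrough _ i = i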
|
We now have accurate global register liveness information attached to all Cmm
procedures and jumps. With this patch, the LLVM back end uses this information
to pass only the live floating point (F and D) registers on tail calls. This
makes the LLVM back end compatible with the new register allocation strategy.
Ideally the GHC LLVM calling convention would put all registers that are always
live first in the parameter sequence. Unfortunately the specification is written
so that on x86-64 SpLim (always live) is passed after the R registers. Therefore
we must always pass *something* in the R registers, so we pass the LLVM value
undef.
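A toy sketch of the padding idea (invented types, not the actual back-end
code): walk the registers in the order fixed by the calling convention, passing
a real value for live registers and undef for the R registers that are dead but
cannot be omitted.

    data GlobalReg = R1 | R2 | R3 | R4 | SpLim    deriving (Eq, Show)
    data CallArg   = RealArg GlobalReg | UndefArg deriving Show

    -- Simplified convention order: the R registers come before the
    -- always-live SpLim, so a slot must be supplied for each of them.
    convOrder :: [GlobalReg]
    convOrder = [R1, R2, R3, R4, SpLim]

    callArgs :: [GlobalReg] -> [CallArg]
    callArgs live = [ if r `elem` live then RealArg r else UndefArg
                    | r <- convOrder ]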
|
x86-64.
On x86-64 F and D registers are both drawn from SSE registers, so there is no
reason not to draw them from the same pool of available SSE registers. This
means that whereas previously a function could only receive two Double arguments
in registers even if it did not have any Float arguments, now it can receive up
to 6 arguments that are any mix of Float and Double in registers.
This patch breaks the LLVM back end. The next patch will fix this breakage.
|
All Cmm procedures now include the set of global registers that are live on
procedure entry, i.e., the global registers used to pass arguments to the
procedure. Only global registers that are used to pass arguments are included in
this list.
|
Jumps now always have live register information attached, so drop Maybes.
|
Except for CgUtils.fixStgRegisters, which is used in the NCG and LLVM
backends and should probably be moved somewhere else.
|
Mostly d -> g (matching DynFlag -> GeneralFlag).
Also renamed if* to when*, matching the Haskell if/when names.
|
The main change here is that the Cmm parser now allows high-level cmm
code with argument-passing and function calls. For example:
  foo ( gcptr a, bits32 b )
  {
    if (b > 0) {
       // we can make tail calls passing arguments:
       jump stg_ap_0_fast(a);
    }
    return (x,y);
  }
More details on the new cmm syntax are in Note [Syntax of .cmm files]
in CmmParse.y.
The old syntax is still more-or-less supported for those occasional
code fragments that really need to explicitly manipulate the stack.
However there are a couple of differences: it is now obligatory to
give a list of live GlobalRegs on every jump, e.g.
  jump %ENTRY_CODE(Sp(0)) [R1];
Again, more details in Note [Syntax of .cmm files].
I have rewritten most of the .cmm files in the RTS into the new
syntax, except for AutoApply.cmm which is generated by the genapply
program: this file could be generated in the new syntax instead and
would probably be better off for it, but I ran out of enthusiasm.
Some other changes in this batch:
- The PrimOp calling convention is gone; primops now use the ordinary
NativeNodeCall convention. This means that primops and "foreign
import prim" code must be written in high-level cmm, but they can
now take more than 10 arguments (a sketch of the Haskell side of
"foreign import prim" follows this list).
- CmmSink now does constant-folding (should fix #7219)
- .cmm files now go through the cmmPipeline, and as a result we
generate better code in many cases. All the object files generated
for the RTS .cmm files are now smaller. Performance should be
better too, but I haven't measured it yet.
- RET_DYN frames are removed from the RTS, lots of code goes away
- we now have some more canned GC points to cover unboxed-tuples with
2-4 pointers, which will reduce code size a little.
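To illustrate the consumer side of "foreign import prim", here is a hedged
sketch; "myAdd2zh" is a hypothetical Cmm function written in the new high-level
syntax and linked into the program, not something this patch provides.

    {-# LANGUAGE ForeignFunctionInterface, GHCForeignImportPrim,
                 MagicHash, UnliftedFFITypes #-}
    import GHC.Exts

    -- Hypothetical Cmm entry point; its arguments and result must be unlifted.
    foreign import prim "myAdd2zh" myAdd2# :: Word# -> Word# -> Word#

    addWords :: Word -> Word -> Word
    addWords (W# x) (W# y) = W# (myAdd2# x y)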
|
I've switched to passing DynFlags rather than Platform, as (a) it's
simpler to not have to extract targetPlatform in so many places, and
(b) it may be useful to have DynFlags around in future.
|
I changed the behaviour slightly, e.g. i386/FreeBSD will no longer
fall through and use the Linux "i386-pc-linux-gnu", but will get the
final empty case instead. I assume that that's the right thing to do.
|
This means that we now generate the same code whatever platform we are
on, which should help avoid changes on one platform breaking the build
on another.
It's also another step towards full cross-compilation.
|
To explicitly choose whether you want an unregisterised build you now
need to use the "--enable-unregisterised"/"--disable-unregisterised"
configure flags.
|
Proc-point splitting is only required by backends that do not support
having proc-points within a code block (that is, everything except the
native backend, i.e. LLVM and C).
Not doing proc-point splitting saves some compilation time, and might
produce slightly better code in some cases.
|
earlier did, so we avoid them.
|
In particular, this makes life simpler when we want to use a general
GHC SDoc in the middle of some LLVM code.
|
It allows you to do
(high, low) `quotRem` d
provided high < d.
Currently it has only an inefficient fallback implementation.
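A reference model of the intended semantics in plain Haskell (via Integer; the
name quotRem2 is invented here, and this is not the primop or its fallback,
just a sketch of what the operation computes):

    import Data.Word (Word64)

    -- Interpret (high, low) as a single double-width numerator and divide by d.
    -- The precondition high < d guarantees that the quotient fits in one word.
    quotRem2 :: Word64 -> Word64 -> Word64 -> (Word64, Word64)
    quotRem2 high low d = (fromInteger q, fromInteger r)
      where
        n      = toInteger high * 2 ^ (64 :: Int) + toInteger low
        (q, r) = n `quotRem` toInteger d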
|
Currently no NCGs support it
|
No special-casing in any NCGs yet
|
Only amd64 has an efficient implementation currently.
|
This means we no longer perform the division twice when using quotRem
(on platforms where the op is supported; currently only amd64).
|
Signed-off-by: David Terei <davidterei@gmail.com>
|
is used for optimisation. (enabled by default)
|