| Commit message | Author | Age | Files | Lines |

This reverts commit 2f5db98e90cf0cff1a11971c85f108a7480528ed.
This reverts commit 9026c77a07533bda3773c3c3f3df1c6592bc80c7.
When compiling a function we can determine how much stack space it will
use. We therefore need to perform only a single stack check, at the
beginning of a function, to see whether we have enough stack space.
Instead of referring directly to Sp - as we did in the past - the code
generator uses (old + 0) in the stack check. The stack layout phase turns
(old + 0) into Sp. The idea here is that, while we need to perform only
one stack check per function, we could in theory place more stack checks
later in the function. They would be redundant, but not incorrect (in the
sense that they would not change program behaviour). We need to make sure,
however, that a stack check inserted after incrementing the stack pointer
checks for a correspondingly smaller amount of stack space. This would not
be the case if the code generator produced direct references to Sp. By
referencing (old + 0) we make sure that we always check for the correct
amount of stack: when converting (old + 0) to Sp, the stack layout phase
takes into account the changes already made to the stack pointer. The idea
for this change came from observations made while debugging #8275.
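To illustrate the idea, here is a minimal, purely hypothetical Haskell
sketch (the StackRef type and resolveSp function below are made up for
illustration and are not GHC's actual Cmm representation): once a
reference to the old stack pointer is resolved against the adjustment
already made to Sp, a later stack check automatically asks for
correspondingly less space.

    -- Illustrative toy model only, not GHC code.
    -- 'OldSp n'  means n bytes above the stack pointer as it was on entry;
    -- 'RealSp n' means n bytes above the current, already-adjusted Sp.
    data StackRef = OldSp Int | RealSp Int
      deriving Show

    -- 'spAdjustment' is how far Sp currently sits below its value at entry.
    -- Folding that adjustment in means a check phrased against (old + 0)
    -- ends up asking only for the stack that is still outstanding.
    resolveSp :: Int -> StackRef -> StackRef
    resolveSp spAdjustment (OldSp n) = RealSp (n + spAdjustment)
    resolveSp _            ref       = ref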
This patch adds support for several new primitive operations which
support using processor-specific instructions to help guide data and
cache locality decisions. The locality levels range over [0..3].
For LLVM, we generate llvm.prefetch intrinsics at the proper locality
level (similar to GCC).
For x86 we generate prefetch{NTA, t2, t1, t0} instructions. On SPARC and
PowerPC, the locality levels are ignored.
This closes #8256.
Authored-by: Carter Tazio Schonwald <carter.schonwald@gmail.com>
Signed-off-by: Austin Seipp <austin@well-typed.com>
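By way of illustration, a small sketch of using one of these primops from
Haskell; the name and State#-threaded type shown are an assumption based
on how the primops appear in later GHC.Exts versions, so check GHC.Prim
for the authoritative signatures:

    {-# LANGUAGE MagicHash, UnboxedTuples #-}
    module PrefetchDemo where

    import GHC.Exts
    import GHC.IO (IO (..))

    -- Hint that a byte-array element will be needed soon, at locality level 3
    -- (keep it in all cache levels).  Assumed type:
    --   prefetchByteArray3# :: ByteArray# -> Int# -> State# s -> State# s
    prefetchElem :: ByteArray# -> Int -> IO ()
    prefetchElem ba (I# i) = IO (\s -> (# prefetchByteArray3# ba i s, () #))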
dynamic flags.
SIMD vector instructions currently require the LLVM back-end. The set of
available instructions also depends on the set of architecture flags specified
on the command line.
width and element type.
SIMD primops are now polymorphic in vector size and element type, but
only internally to the compiler. More specifically, utils/genprimopcode
has been extended so that it "knows" about SIMD vectors. This allows us
to, for example, write a single definition for the "add two vectors"
primop in primops.txt.pp and have it instantiated at many vector types.
This generates a primop in GHC.Prim for each vector type at which "add
two vectors" is instantiated, but only one data constructor for the
PrimOp data type, so the code generator is much, much simpler.
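For instance, with the LLVM back-end one can now write code like the
following against the instantiated vector primops (a small sketch assuming
the FloatX4# instantiation; it needs -fllvm and a suitable target):

    {-# LANGUAGE MagicHash #-}
    module VecDemo where

    import GHC.Prim

    -- Add 4.0 to every lane of a FloatX4# using the instantiated
    -- "add two vectors" primop.  Compile with -fllvm.
    bump4 :: FloatX4# -> FloatX4#
    bump4 v = plusFloatX4# v (broadcastFloatX4# 4.0#)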
It is off by default, which is meant to be a workaround for #8275.
Once #8275 is fixed we will enable this option by default.
We have primops for copying ranges of bytes between ByteArray#s:
* ByteArray# -> MutableByteArray#
* MutableByteArray# -> MutableByteArray#
This patch extends that set with three further cases:
* Addr# -> MutableByteArray#
* ByteArray# -> Addr#
* MutableByteArray# -> Addr#
One use case for these is copying between ForeignPtr-based
representations and in-heap arrays (like Text, UArray etc).
The implementation is essentially the same as for the existing
primops, and shares the memcpy stuff in the code generators.
Deficiencies / future directions: none of these primops (existing
or the new ones) let one take advantage of knowing that ByteArray#s
are word-aligned in memory. Though it is unclear that any of the
code generators would make use of this information unless the size
to copy is also known at compile time.
Signed-off-by: Austin Seipp <austin@well-typed.com>
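A small usage sketch of one of the new cases (the argument order in the
comment is my reading of the primop's type; consult GHC.Prim for the
authoritative signature):

    {-# LANGUAGE MagicHash, UnboxedTuples #-}
    module CopyDemo where

    import GHC.Exts
    import GHC.IO (IO (..))

    -- Copy 'len' bytes from a raw address into a mutable byte array at
    -- offset 'off'.  Assumed type:
    --   copyAddrToByteArray# :: Addr# -> MutableByteArray# s
    --                        -> Int# -> Int# -> State# s -> State# s
    --   (source address, destination array, destination offset, byte count)
    copyFromAddr :: Addr# -> MutableByteArray# RealWorld -> Int -> Int -> IO ()
    copyFromAddr src dst (I# off) (I# len) =
      IO (\s -> (# copyAddrToByteArray# src dst off len s, () #))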
Authored-by: David Luposchainsky <dluposchainsky@gmail.com>
Signed-off-by: Austin Seipp <austin@well-typed.com>
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
This patch implements the loopification optimization. It was described
in "Low-level code optimisations in the Glasgow Haskell Compiler" by
Krzysztof Woś, but we use a different approach here. Krzysztof's
approach was to perform the optimization as a Cmm-to-Cmm pass. Our
approach is to generate properly optimized tail calls in the code
generator, which saves us the trouble of processing Cmm. This idea
was proposed by Simon Marlow. Implementation details are explained
in Note [Self-recursive tail calls].
Performance of most nofib benchmarks is not affected. There are
some benchmarks that show 5-7% improvement, with an average improvement
of 2.6%. It would require some further investigation to check whether this
is due to benchmarking noise or whether this optimization really does
make some class of programs faster.
As a minor cleanup, this patch renames forkProc to forkLneBody.
It also moves some data declarations from StgCmmMonad to
StgCmmClosure, because they are needed there and it seems that
StgCmmClosure is on top of the whole StgCmm* hierarchy.
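A tiny, purely illustrative example of the kind of code this targets: the
recursive call in 'go' is a tail call to 'go' itself, so it can now be
compiled as a jump back to a local loop header instead of a call through
the function's entry point.

    -- Self-recursive tail calls of this shape are what
    -- Note [Self-recursive tail calls] is about.
    sumTo :: Int -> Int
    sumTo n = go n 0
      where
        go :: Int -> Int -> Int
        go 0 acc = acc
        go k acc = go (k - 1) (acc + k)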
388e14e2 unfortunately broke a subtle invariant in the code generator:
when generating code for an application, names may need to be
externalised, in case you're building against something external that
was built with -split-objs.
We were never externalising the ids of the applied functions. This means
that if the libraries are split and we call into them, the compiler
may not generate correct ids when making references to functions
in the library (causing linker failures).
I'm not entirely sure how this didn't break everything, but it certainly
caused several failures for a bunch of people. I had to fiddle with my
tree a little to make this occur.
This should fix #8166.
Signed-off-by: Austin Seipp <aseipp@pobox.com>
This comment is no longer true.
I missed that file yesterday when I was cleaning up the codeGen/ directory.
Previously the logic of these functions was something like this:
  cgIdApp x = case x of
    A -> cgLneJump x
    _ -> cgTailCall x
  cgTailCall x = case x of
    B -> ...
    C -> ...
    _ -> ...
After merging there is no nesting of cases:
  cgIdApp x = case x of
    A -> -- body of cgLneJump
    B -> ...
    C -> ...
    _ -> ...
This commit removes the module StgCmmGran, which contains only no-op
functions. According to comments in the module it was used by GpH, but
the GpH project appears to have been dead for a couple of years now.
This cleanup includes:
* removing dead code. This includes the forkStatics function,
  which was in fact one big no-op, and global bindings in
  CgInfoDownwards,
* converting functions that used the FCode monad only to
  access DynFlags into functions that take DynFlags
  as a parameter and don't work in a monad,
* making the addBindC function smarter: it now extracts the Id
  from the CgIdInfo passed to it, in the same way addBindsC does.
  Previously this was done at every call site, which was
  redundant.
A major cleanup of trailing whitespaces and tabs in codeGen/
directory. I also adjusted code formatting in some places.
This patch modifies all comparison primops for Char#, Int#, Word#, Double#,
Float# and Addr# to return Int# instead of Bool. A value of 1# represents True
and 0# represents False. For a more detailed description of motivation for this
change, discussion of implementation details and benchmarking results please
visit the wiki page: http://hackage.haskell.org/trac/ghc/wiki/PrimBool
There's also some cleanup: whitespace fixes in files that were extensively
edited in this patch, and constant folding rules for Integer div and mod
operators (which for some reason had been left out until now).
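For example, code that used to scrutinise the old Bool result can go
through the isTrue# helper exposed from GHC.Exts (a short sketch):

    {-# LANGUAGE MagicHash #-}
    module CmpDemo where

    import GHC.Exts

    -- (>#) now returns an Int# (1# for True, 0# for False); isTrue#
    -- converts that result back to a Bool for use in ordinary guards.
    maxInt :: Int -> Int -> Int
    maxInt a@(I# x) b@(I# y)
      | isTrue# (x ># y) = a
      | otherwise        = b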
We weren't properly tracking the number of stack arguments in the
continuation of a foreign call. It happened to work when the
continuation was not a join point, but when it was a join point we
were using the wrong amount of stack fixup.
* Exposes bSwap{,16,32,64}# primops
* Add a new machop: MO_BSwap
* Use a Stg implementation (hs_bswap{16,32,64}) for the remaining
  architectures in the NCG.
* Generate bswap in X86 NCG for 32 and 64 bits, and for 16 bits, bswap+shr
instead of using xchg.
* Generate llvm.bswap intrinsics in llvm codegen.
Authored-by: Vincent Hanquez <tab@snarc.org>
Signed-off-by: Austin Seipp <aseipp@pobox.com>
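For reference, a small usage sketch of the user-facing wrappers that sit
on top of these primops in Data.Word (byteSwap16/32/64, assuming a base
version that ships them):

    module SwapDemo where

    import Data.Word (Word32, byteSwap32)

    -- Convert a big-endian on-the-wire word to host order on a
    -- little-endian machine; via the new MO_BSwap machop this should
    -- compile to a single bswap instruction on x86.
    fromBigEndian :: Word32 -> Word32
    fromBigEndian = byteSwap32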
In many places, 'splitUniqSupply' + 'uniqFromSupply' is used to split a
UniqSupply into a Unique and a new UniqSupply. In such places we should
instead use the more efficient and more appropriate
'takeUniqFromSupply' (or equivalent).
Not only is the former method slower, it also generates and throws away
an extra Unique.
Signed-off-by: Austin Seipp <aseipp@pobox.com>
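A sketch of the rewrite this commit applies, written against GHC's
internal UniqSupply API (the import lines below assume the module layout
of the GHC tree at the time):

    import UniqSupply (UniqSupply, splitUniqSupply, uniqFromSupply,
                       takeUniqFromSupply)
    import Unique (Unique)

    -- The old pattern: split the supply, read one Unique off one half,
    -- and throw away the rest of that half.
    oldWay :: UniqSupply -> (Unique, UniqSupply)
    oldWay us = let (us1, us2) = splitUniqSupply us
                in (uniqFromSupply us1, us2)

    -- The preferred pattern: take one Unique and the remaining supply directly.
    newWay :: UniqSupply -> (Unique, UniqSupply)
    newWay = takeUniqFromSupply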
A comment claimed that the ticky counters are unsigned longs, but
as far as I can see that isn't the case: They're already word-sized
values.
See Note [GC recovery]. To come: clean-up of StgCmmBind.cgRhs.
This patch fixes profiling at the cost of losing cost centre accounting in a
very small number of cases. I am working on a better fix.
Clang doesn't like whitespace between macro and arguments.
Signed-off-by: Austin Seipp <aseipp@pobox.com>
This reverts commit 1c5b0511a89488f5280523569d45ee61c0d09ffa.
* Exposes bSwap{,16,32,64}# primops
* Add a new machops MO_BSwap
* Use a Stg implementation (hs_bswap{16,32,64}) for the remaining
  architectures in the NCG.
* Generate bswap in X86 NCG for 32 and 64 bits, and for 16 bits, bswap+shr
instead of using xchg.
* Generate llvm.bswap intrinsics in llvm codegen.
Patch from Vincent Hanquez.
This major patch implements the cardinality analysis described
in our paper "Higher order cardinality analysis". It is joint
work with Ilya Sergey and Dimitrios Vytiniotis.
The basic idea is to augment the absence-analysis part of the demand
analyser so that it can tell when something is used
never
at most once
some other way
The "at most once" information is used
a) to enable transformations, and
in particular to identify one-shot lambdas
b) to allow updates on thunks to be omitted.
There are two new flags, mainly there so you can do performance
comparisons:
-fkill-absence stops GHC doing absence analysis at all
-fkill-one-shot stops GHC spotting one-shot lambdas
and single-entry thunks
The big changes are:
* The Demand type is substantially refactored. In particular
the UseDmd is factored as follows
  data UseDmd
    = UCall Count UseDmd
    | UProd [MaybeUsed]
    | UHead
    | Used
  data MaybeUsed = Abs | Use Count UseDmd
  data Count = One | Many
Notice that UCall recurses straight to UseDmd, whereas
UProd goes via MaybeUsed.
The "Count" embodies the "at most once" or "many" idea.
* The demand analyser itself was refactored a lot
* The previously ad-hoc stuff in the occurrence analyser for foldr and
build goes away entirely. Before, if we had build (\cn -> ...x... )
then the "\cn" was hackily made one-shot (by spotting 'build' as
special). That's essential to allow x to be inlined. Now the
occurrence analyser propagates info gotten from build's strictness
signature (so build isn't special); and that strictness sig is
in turn derived entirely automatically. Much nicer!
* The ticky stuff is improved to count single-entry thunks separately.
One shortcoming is that there is no DEBUG way to spot if an
allegedly-single-entry thunk is actually entered more than once. It
would not be hard to generate a bit of code to check for this, and it
would be reassuring. But it's fiddly and I have not done it.
Despite all this fuss, the performance numbers are rather under-whelming.
See the paper for more discussion.
nucleic2 -0.8% -10.9% 0.10 0.10 +0.0%
sphere -0.7% -1.5% 0.08 0.08 +0.0%
--------------------------------------------------------------------------------
Min -4.7% -10.9% -9.3% -9.3% -50.0%
Max -0.4% +0.5% +2.2% +2.3% +7.4%
Geometric Mean -0.8% -0.2% -1.3% -1.3% -1.8%
I don't quite know how much credence to place in the runtime changes,
but movement seems generally in the right direction.
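A purely illustrative example of what the "at most once" information buys:

    -- The thunk 't' is entered at most once on any execution path, so the
    -- analysis can mark it single-entry and the code generator can omit
    -- its update; -fkill-one-shot switches this detection off.
    f :: Int -> [Int] -> Int
    f x ys =
      let t = sum ys          -- used only in the 'then' branch
      in if x > 0 then t + x else x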