| Commit message | Author | Age | Files | Lines |
| |
Unwind information allows the debugger to discover more information
about a program state, by allowing it to "reconstruct" other states of
the program. In practice, this means that we explain to the debugger
how to unravel stack frames, which comes down mostly to explaining how
to find their Sp and Ip register values.
* We declare yet another new constructor for CmmNode - and this time
there's actually little choice, as unwind information can and will
change mid-block. We don't actually make use of these capabilities,
and back-end support would be tricky (generate new labels?), but it
feels like the right way to do it.
* Even though we only use it for Sp so far, we allow CmmUnwind to specify
unwind information for any register. This is pretty cheap and could
come in useful in future.
* We allow full CmmExpr expressions for specifying unwind values. The
advantage here is that we don't have to make up new syntax, and can e.g.
use the WDS macro directly. On the other hand, the back-end will now
have to simplify the expression until it can sensibly be converted
into DWARF byte code - a process which might fail, yielding NCG panics.
Then again, when you're writing Cmm by hand you really ought to
know what you're doing.
(From Phabricator D169)
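To make the shape of the change concrete, here is a toy model of the new
node; it is only a sketch under the assumption that CmmNode stays a
shape-indexed GADT, not GHC's actual definition from compiler/cmm/CmmNode.hs:

    {-# LANGUAGE GADTs #-}

    data O  -- "open" shape: may occur in the middle of a block
    data C  -- "closed" shape

    -- Simplified stand-ins for the real GlobalReg / CmmExpr types.
    data GlobalReg = Sp | Hp | BaseReg
    data CmmExpr   = CmmReg GlobalReg | CmmLit Integer | CmmAdd CmmExpr CmmExpr

    data CmmNode e x where
      -- "From here on, register r can be recovered by evaluating expression e."
      -- Being an O/O (middle) node is what lets unwind info change mid-block.
      CmmUnwind :: GlobalReg -> CmmExpr -> CmmNode O O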
| |
This patch solves the scoping problem of CmmTick nodes: If we just put
CmmTicks into blocks we have no idea what exactly they are meant to
cover. Here we introduce tick scopes, which allow us to create
sub-scopes and merged scopes easily.
Notes:
* Given that the code often passes Cmm around "head-less", we have to
make sure that its intended scope does not get lost. To keep the amount
of passing-around to a minimum we define a CmmAGraphScoped type synonym
here that just bundles the scope with a portion of Cmm to be assembled
later.
* We introduce new scopes at somewhat random places, aligning with
getCode calls. This works surprisingly well, but we might have to
add new scopes into the mix later on if we find things to be too
coarse-grained.
(From Phabricator D169)
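As a rough illustration (toy stand-in types; the real definitions live in
the Cmm sources), the synonym just pairs head-less Cmm with the scope its
ticks are meant to live in:

    -- Toy stand-ins so this compiles on its own:
    data CmmAGraph    = CmmAGraph             -- a "head-less" chunk of Cmm
    data CmmTickScope = GlobalScope
                      | SubScope Int CmmTickScope                -- a freshly opened sub-scope
                      | CombinedScope CmmTickScope CmmTickScope  -- merge of two scopes

    -- Cmm to be assembled later, bundled with its intended tick scope so the
    -- scope cannot get lost while the graph is passed around.
    type CmmAGraphScoped = (CmmAGraph, CmmTickScope)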
| |
This patch adds CmmTick nodes to Cmm code. This is relatively
straightforward, but also not very useful by itself, as many blocks will
simply end up with no annotations whatsoever.
Notes:
* We use this design over, say, putting ticks into the entry node of all
blocks, as it seems to work better alongside existing optimisations.
Now granted, the reason for this is that currently GHC's main Cmm
optimisations seem to mainly reorganize and merge code, so this might
change in the future.
* We have the Cmm parser generate a few source notes as well. This is
relatively easy to do - worst part is that it complicates the CmmParse
implementation a bit.
(From Phabricator D169)
| |
This is basically just about continuing to maintain source notes after
the Core stage. Unfortunately, this is more involved than it might seem,
as there are more restrictions on where ticks are allowed to show up.
Notes:
* We replace the StgTick / StgSCC constructors with a unified StgTick
that can carry any tickish.
* For handling constructor or lambda applications, we generally float
ticks out.
* Note that thanks to the NonLam placement, we know that source notes
can never appear on lambdas. This means that as long as we are
careful to always use mkTick, we will never violate CorePrep
invariants.
* This is however not automatically true for eta expansion, which
needs to somewhat awkwardly strip, then re-tick the expression in
question.
* Where CorePrep floats out lets, we make sure to wrap them in the
same spirit as FloatOut.
* Detecting selector thunks becomes a bit more involved, as we can run
into ticks at multiple points.
(From Phabricator D169)
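Illustrative sketch only (the real type is GenStgExpr in
compiler/stgSyn/StgSyn.hs): the separate cost-centre and tick constructors
collapse into one constructor that can carry any tickish:

    -- Toy expression type showing the shape of the change:
    data StgExpr' tickish
      = StgApp'  String [String]             -- stand-in for the other constructors
      | StgTick' tickish (StgExpr' tickish)  -- unified: SCCs, HPC ticks and the
                                             -- new source notes all go through here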
| |
Summary:
The current primops for prefetching do not properly work in pure code;
namely, the primops are not 'hoisted' into the correct call sites based
on when arguments are evaluated. Instead, they should use a `seq`-like
interface, so that the prefetch happens when the needed term is evaluated.
See #9353 for the full discussion.
Test Plan: updated tests for pure prefetch in T8256 to reflect the design changes in #9353
Reviewers: simonmar, hvr, ekmett, austin
Reviewed By: ekmett, austin
Subscribers: merijn, thomie, carter, simonmar
Differential Revision: https://phabricator.haskell.org/D350
GHC Trac Issues: #9353
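A hedged sketch of what the state-token style looks like from Haskell; the
primop name prefetchValue0# and its exact signature are assumptions based on
the description above, not a quote of the final API:

    {-# LANGUAGE MagicHash, UnboxedTuples #-}
    import GHC.Exts
    import GHC.IO (IO(..))

    -- The prefetch is threaded through State#, so it runs when this IO action
    -- runs, instead of floating to wherever its argument happens to be forced.
    prefetchIO :: a -> IO ()
    prefetchIO x = IO $ \s ->
      case prefetchValue0# x s of
        s' -> (# s', () #)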
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
| |
This reverts commit f0fcc41d755876a1b02d1c7c79f57515059f6417.
New changes: now works on 32-bit platforms too. I added some basic
support for 64-bit subtraction and comparison operations to the x86
NCG.
| |
This reverts commit b23ba2a7d612c6b466521399b33fe9aacf5c4f75.
Conflicts:
compiler/cmm/PprCmmDecl.hs
compiler/nativeGen/PPC/Ppr.hs
compiler/nativeGen/SPARC/Ppr.hs
compiler/nativeGen/X86/Ppr.hs
| |
Summary:
The primary reason for doing this is assisting debuggability:
if static closures are all in the same section, they are
guaranteed to be adjacent to one another. This will help
later when we add some code that takes section start/end and
uses this to sanity-check the sections.
Part of the remove-HEAP_ALLOCED patch set (#8199)
Signed-off-by: Edward Z. Yang <ezyang@mit.edu>
Test Plan: validate
Reviewers: simonmar, austin
Subscribers: simonmar, ezyang, carter, thomie
Differential Revision: https://phabricator.haskell.org/D263
GHC Trac Issues: #8199
| |
Summary:
This includes pretty much all the changes needed to make `Applicative`
a superclass of `Monad` finally. It's mostly reshuffling in the
interest of avoiding orphans and boot files, but luckily we can resolve
pretty much all of them. The only catch was that Alternative/MonadPlus
also had to go into Prelude to avoid this.
As a result, we must update the hsc2hs and haddock submodules.
Signed-off-by: Austin Seipp <austin@well-typed.com>
Test Plan: Build things, they might not explode horribly.
Reviewers: hvr, simonmar
Subscribers: simonmar
Differential Revision: https://phabricator.haskell.org/D13
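For instance-writing code, the practical upshot of the new hierarchy is that
every Monad must also be a Functor and an Applicative; a minimal
self-contained example (toy Box type, standard base classes):

    newtype Box a = Box a

    instance Functor Box where
      fmap f (Box a) = Box (f a)

    instance Applicative Box where
      pure            = Box
      Box f <*> Box a = Box (f a)

    instance Monad Box where
      return      = pure        -- return can now simply delegate to pure
      Box a >>= f = f a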
| |
Summary:
These MachOps are used by addIntC# and subIntC#, which in turn are
used in integer-gmp when adding or subtracting small Integers. The
following benchmark shows a ~6% speedup after this commit on x86_64
(building GHC with BuildFlavour=perf).
    {-# LANGUAGE MagicHash #-}
    import GHC.Exts
    import Criterion.Main

    count :: Int -> Integer
    count (I# n#) = go n# 0
      where
        go :: Int# -> Integer -> Integer
        go 0# acc = acc
        go n# acc = go (n# -# 1#) $! acc + 1

    main = defaultMain [bgroup "count" [bench "100" $ whnf count 100]]
Differential Revision: https://phabricator.haskell.org/D140
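For reference, a hedged usage sketch of one of these primops from Haskell
(addIntC# returns the possibly wrapped sum plus a flag that is non-zero on
signed overflow):

    {-# LANGUAGE MagicHash, UnboxedTuples #-}
    import GHC.Exts

    -- Overflow-checked addition on Int, the pattern integer-gmp uses to decide
    -- whether a small-Integer result must be promoted to a big one.
    addChecked :: Int -> Int -> Maybe Int
    addChecked (I# x#) (I# y#) =
      case addIntC# x# y# of
        (# r#, 0# #) -> Just (I# r#)
        _            -> Nothing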
| |
This implements the new primops
clz#, clz32#, clz64#,
ctz#, ctz32#, ctz64#
which provide efficient implementations of the popular
count-leading-zeros and count-trailing-zeros operations, respectively
(see testcase for a pure Haskell reference implementation).
On x86, both the NCG and LLVM generate code based on the BSF/BSR
instructions (which need extra logic to make the 0-case well-defined).
Test Plan: validate and successful tests on i686 and amd64
Reviewers: rwbarton, simonmar, ezyang, austin
Subscribers: simonmar, relrod, ezyang, carter
Differential Revision: https://phabricator.haskell.org/D144
GHC Trac Issues: #9340
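A small usage sketch of the word-sized variants (the fixed-width versions
work analogously):

    {-# LANGUAGE MagicHash #-}
    import GHC.Exts

    -- Count leading and trailing zeros of a Word.
    countZeros :: Word -> (Word, Word)
    countZeros (W# w#) = (W# (clz# w#), W# (ctz# w#))

    main :: IO ()
    main = print (countZeros 8)   -- on a 64-bit machine: (60,3)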
| |
We use fixed-size signed types to represent e.g. array sizes. This
means that the size can overflow.
| |
There were two overflow issues in shouldInlinePrimOp. The first one is
due to a negative CmmInt literal being created if the array size was
given as larger than 2^63-1 (on a 64-bit platform). This meant that
large array sizes could compare as being smaller than
maxInlineAllocSize.
The second issue is that we cast the Integer to an Int in the
comparison, which again meant that large array sizes could compare as
being smaller than maxInlineAllocSize.
The attempt to allocate a large array inline then caused a segfault.
Fixes #9416.
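A minimal sketch of the safer style of check, using hypothetical names; this
is not the actual code in StgCmmPrim, only an illustration of keeping the
comparison in Integer so a huge request can neither go negative nor wrap:

    safeToInlineAlloc :: Integer   -- requested size in bytes, straight from the literal
                      -> Int       -- maxInlineAllocSize from the DynFlags
                      -> Bool
    safeToInlineAlloc nBytes maxInline =
      nBytes >= 0 && nBytes <= fromIntegral maxInline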
| |
... in preparation for backend-specific implementations.
No functional changes in this commit (except in panic messages
for ill-formed Cmm).
Differential Revision: https://phabricator.haskell.org/D138
| |
Summary:
Previously, both Cabal and GHC defined the type PackageId, and we expected
them to be roughly equivalent (but represented differently). This refactoring
separates these two notions.
A package ID is a user-visible identifier; it's the thing you write in a
Cabal file, e.g. containers-0.9. The components of this ID are semantically
meaningful, and decompose into a package name and a package version.
A package key is an opaque identifier used by GHC to generate linking symbols.
Presently, it just consists of a package name and a package version, but
pursuant to #9265 we are planning to extend it to record other information.
Within a single executable, it uniquely identifies a package. It is *not* an
InstalledPackageId, as the choice of a package key affects the ABI of a package
(whereas an InstalledPackageId is computed after compilation). Cabal computes
a package key for the package and passes it to GHC using -package-name (now
*extremely* misnamed).
As an added bonus, we don't have to worry about shadowing anymore.
As a follow-on, we should introduce -current-package-key having the same role as
-package-name, and deprecate the old flag. This commit is just renaming.
The haddock submodule needed to be updated.
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
Test Plan: validate
Reviewers: simonpj, simonmar, hvr, austin
Subscribers: simonmar, relrod, carter
Differential Revision: https://phabricator.haskell.org/D79
Conflicts:
compiler/main/HscTypes.lhs
compiler/main/Packages.lhs
utils/haddock
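A toy illustration of the distinction described above (not GHC's actual
definitions, just the two notions side by side):

    newtype PackageId  = PackageId  String   -- user-visible, e.g. "containers-0.9"
    newtype PackageKey = PackageKey String   -- opaque; the basis for linker symbols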
| |
This is the second attempt to add this functionality. The first
attempt was reverted in 950fcae46a82569e7cd1fba1637a23b419e00ecd, due
to register allocator failure on x86. Given how the register
allocator currently works, we don't have enough registers on x86 to
support cmpxchg using complicated addressing modes. Instead we fall
back to a simpler addressing mode on x86.
Adds the following primops:
* atomicReadIntArray#
* atomicWriteIntArray#
* fetchSubIntArray#
* fetchOrIntArray#
* fetchXorIntArray#
* fetchAndIntArray#
Makes these pre-existing out-of-line primops inline:
* fetchAddIntArray#
* casIntArray#
|
| |
This commit caused the register allocator to fail on i386.
This reverts commit d8abf85f8ca176854e9d5d0b12371c4bc402aac3 and
04dd7cb3423f1940242fdfe2ea2e3b8abd68a177 (the second being a fix to
the first).
| |
Summary:
Add more primops for atomic ops on byte arrays
Adds the following primops:
* atomicReadIntArray#
* atomicWriteIntArray#
* fetchSubIntArray#
* fetchOrIntArray#
* fetchXorIntArray#
* fetchAndIntArray#
Makes these pre-existing out-of-line primops inline:
* fetchAddIntArray#
* casIntArray#
| |
In some cases, the layout of the LANGUAGE/OPTIONS_GHC lines has been
reorganized to follow the convention:
- place `{-# LANGUAGE #-}` pragmas at the top of the source file, before
any `{-# OPTIONS_GHC #-}` lines.
- Moreover, if the list of language extensions fits into a single
`{-# LANGUAGE ... #-}` line (shorter than 80 characters), keep it on one
line. Otherwise, split it into one `{-# LANGUAGE ... #-}` line per
individual language extension. In both cases, try to keep the
enumeration alphabetically ordered.
(The latter layout is preferable as it's more diff-friendly.)
While at it, this also replaces obsolete `{-# OPTIONS ... #-}` pragma
occurrences by `{-# OPTIONS_GHC ... #-}` pragmas.
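For illustration, the two layouts this convention distinguishes (module
header only, extensions chosen arbitrarily):

    -- Fits on one line (shorter than 80 characters): keep it on one line.
    {-# LANGUAGE BangPatterns, CPP, MagicHash #-}
    {-# OPTIONS_GHC -fno-warn-orphans #-}

    -- Too long for one line: one LANGUAGE pragma per extension, alphabetically.
    {-# LANGUAGE BangPatterns #-}
    {-# LANGUAGE CPP #-}
    {-# LANGUAGE MagicHash #-}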
| |
Problems were found on 32-bit platforms; I'll commit again when I have a fix.
This reverts the following commits:
54b31f744848da872c7c6366dea840748e01b5cf
b0534f78a73f972e279eed4447a5687bd6a8308e
| |
This tracks the amount of memory allocation by each thread in a
counter stored in the TSO. Optionally, when the counter drops below
zero (it counts down), the thread can be sent an asynchronous
exception: AllocationLimitExceeded. When this happens, the thread is given a
small additional limit so that it can handle the exception. See the
documentation in GHC.Conc for more details.
Allocation limits are similar to timeouts, but
- timeouts use real time, not CPU time. Allocation limits do not
count anything while the thread is blocked or in foreign code.
- timeouts don't re-trigger if the thread catches the exception;
allocation limits do.
- timeouts can catch non-allocating loops, if you use
-fno-omit-yields. This doesn't work for allocation limits.
I couldn't measure any impact on benchmarks with these changes, even
for nofib/smp.
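A hedged sketch of how the new interface is used from GHC.Conc (function
names as described there; treat the exact types as an assumption):

    import GHC.Conc (setAllocationCounter, enableAllocationLimit)
    import Control.Exception (SomeException, try)

    -- Allow the current thread roughly 1 MB of further allocation before an
    -- asynchronous AllocationLimitExceeded exception is delivered.
    withSmallAllocLimit :: IO a -> IO (Either SomeException a)
    withSmallAllocLimit act = do
      setAllocationCounter (1024 * 1024)
      enableAllocationLimit
      try act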
| |
If the number of elements being copied is known statically, this might
lead to the copy loop being unrolled in the backend.
| |
These array types are smaller than Array# and MutableArray# and are
faster when the array size is small, as they don't have the overhead
of a card table. Having no card table reduces the closure size by two
words in the typical small array case and leads to less work when
updating or garbage-collecting the array.
Reduces both the runtime and memory allocation by 8.8% on my insert
benchmark for the HashMap type in the unordered-containers package,
which makes use of lots of small arrays. With tuned GC settings
(i.e. `+RTS -A6M`) the runtime reduction is 15%.
Fixes #8923.
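A hedged usage sketch of the new primops (allocate, write, freeze, read
back); the names follow the pattern of the existing Array# primops:

    {-# LANGUAGE MagicHash, UnboxedTuples #-}
    import GHC.Exts
    import GHC.IO (IO(..))

    -- Build a 4-element SmallMutableArray#, write one slot, freeze it, and
    -- read the slot back. No card table is involved at any point.
    smallArrayDemo :: IO Int
    smallArrayDemo = IO $ \s0 ->
      case newSmallArray# 4# (0 :: Int) s0 of
        (# s1, marr# #) ->
          case writeSmallArray# marr# 2# 42 s1 of
            s2 -> case unsafeFreezeSmallArray# marr# s2 of
              (# s3, arr# #) -> case indexSmallArray# arr# 2# of
                (# v #) -> (# s3, v #)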
| |
This should reduce code size when there's little to gain from inlining
these primops, while still retaining the inlining benefit when the
size of the copy is known statically.
| |
The inline allocation version is 69% faster than the out-of-line
version, when cloning an array of 16 unit elements on a 64-bit
machine.
Comparing the new and the old primop implementations isn't
straightforward. The old version had a missing heap check that I
discovered during the development of the new version. Comparing the
old and the new version would require fixing the old version, which
in turn means reimplementing the equivalent of MAYBE_GC in StgCmmPrim.
The inline allocation threshold is configurable via
-fmax-inline-alloc-size which gives the maximum array size, in bytes,
to allocate inline. The size does not include the closure header size.
Allowing the same primop to be either inline or out-of-line has some
implications for how we lay out heap checks. We always place a heap
check around out-of-line primops, as they may allocate outside of our
knowledge. However, for the inline primops we only allow allocation
via the standard means (i.e. virtHp). Since the clone primops might be
either inline or out-of-line the heap check layout code now consults
shouldInlinePrimOp to know whether a primop will be inlined.
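A hedged sketch of the kind of call that can now allocate inline (whether it
actually does depends on the requested size and on -fmax-inline-alloc-size):

    {-# LANGUAGE MagicHash, UnboxedTuples #-}
    import GHC.Exts
    import GHC.IO (IO(..))

    -- Build a small array, freeze it, clone all of it, and read one element
    -- from the clone.
    cloneDemo :: Int -> IO Int
    cloneDemo (I# n#) = IO $ \s0 ->
      case newArray# n# (0 :: Int) s0 of
        (# s1, marr# #) -> case unsafeFreezeArray# marr# s1 of
          (# s2, arr# #) -> case cloneArray# arr# 0# n# of
            cloned -> case indexArray# cloned 0# of
              (# v #) -> (# s2, v #)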
| |
This results in a 57% runtime decrease when allocating an array of 128
bytes on a 64-bit machine.
Fixes #8876.
| |
Documentation in response to Johan's questions
Plus, don't export hpRel from StgCmmHeap, StgCmmLayout
(it is only used locally in StgCmmLayout)
| |
Also make sure allocHeapClosure updates profiling counters with the
memory allocated.
| |
- Move array representation knowledge into SMRep
- Separate out low-level heap-object allocation so that we can reuse
it from doNewArrayOp
- remove card-table initialisation, we can safely ignore the card
table for newly allocated arrays.
|
| |
I'd like to be able to pack together non-pointer fields that are less
than a word in size, and this is a necessary prerequisite.
| |
This results in a 46% runtime decrease when allocating an array of 16
unit elements on a 64-bit machine.
In order to allow newArray# to have both an inline and an out-of-line
implementation, cgOpApp is refactored slightly. The new implementation
of cgOpApp should make it easier to add other primops with both inline
and out-of-line implementations in the future.
| |
To evaluate most non-updatable thunks, we can jump directly to the
entry code if we know what it is. But not for a selector thunk: these
might be updated by the garbage collector, so we have to enter the
closure with an indirect jump through its info pointer.
| |
Fixes #8585
When emitting the label of a self-recursive tail call (i.e. when
performing the loopification optimization), we emit the loop header
label after the stack check but before the heap check. The reason is
that tail-recursive functions use a constant amount of stack space,
so we don't need to repeat the stack check on every iteration. But they
can grow the heap, so the heap check must be repeated on every call.
See Note [Self-recursive tail calls] and [Self-recursive loop header].
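For illustration, the kind of code this affects: a self-recursive tail call
whose stack check can be done once before the loop header, while the heap
check (each iteration allocates a new Integer) stays inside the loop:

    sumTo :: Int -> Integer -> Integer
    sumTo 0 acc = acc
    sumTo n acc = sumTo (n - 1) (acc + 1)   -- tail call back to the loop header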
| |
After commit 55c703b8fdb0, this code is no longer used anywhere.
| |
We now do the allocation of the blackhole indirection closure inside the
RTS procedure 'newCAF' instead of generating the allocation code inline
in the closure body of each CAF. This slightly decreases code size in
modules with a lot of CAFs.
As a result of this change, for example, the size of DynFlags.o drops by
~60KB and HsExpr.o by ~100KB.
| |
This is just a modest refactoring