| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements two related changes, both inspired by
the discussion on Trac #13027, comment:23:
* exprOkForSpeculation (op# a1 .. an), where op# is a primop,
now skips over arguments ai of lifted type. See the comments
at Note [Primops with lifted arguments] in CoreUtils.
There is no need to treat dataToTag# specially any more.
* dataToTag# is now treated as a can-fail primop. See
Note [dataToTag#] in primops.txt.pp
I don't expect this to have a visible effect on anything, but
it's much more solid than before.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This commit makes various improvements and addresses some issues with
Compact Regions (aka Compact Normal Forms).
This was the most important thing I wanted to fix. Compaction
previously prevented GC from running until it was complete, which
would be a problem in a multicore setting. Now, we compact using a
hand-written Cmm routine that can be interrupted at any point. When a
GC is triggered during a sharing-enabled compaction, the GC has to
traverse and update the hash table, so this hash table is now stored
in the StgCompactNFData object.
Previously, compaction consisted of a deepseq using the NFData class,
followed by a traversal in C code to copy the data. This is now done
in a single pass with hand-written Cmm (see rts/Compact.cmm). We no
longer use the NFData instances, instead the Cmm routine evaluates
components directly as it compacts.
The new compaction is about 50% faster than the old one with no
sharing, and a little faster on average with sharing (the cost of the
hash table dominates when we're doing sharing).
Static objects that don't (transitively) refer to any CAFs don't need
to be copied into the compact region. In particular this means we
often avoid copying Char values and small Int values, because these
are static closures in the runtime.
Each Compact# object can support a single compactAdd# operation at any
given time, so the Data.Compact library now enforces mutual exclusion
using an MVar stored in the Compact object.
We now get exceptions rather than killing everything with a barf()
when we encounter an object that cannot be compacted (a function, or a
mutable object). We now also detect pinned objects, which can't be
compacted either.
The Data.Compact API has been refactored and cleaned up. A new
compactSize operation returns the size (in bytes) of the compact
object.
Most of the documentation is in the Haddock docs for the compact
library, which I've expanded and improved here.
Various comments in the code have been improved, especially the main
Note [Compact Normal Forms] in rts/sm/CNF.c.
I've added a few tests, and expanded a few of the tests that were
there. We now also run the tests with GHCi, and in a new test way
that enables sanity checking (+RTS -DS).
There's a benchmark in libraries/compact/tests/compact_bench.hs for
measuring compaction speed and comparing sharing vs. no sharing.
The field totalDataW in StgCompactNFData was unnecessary.
Test Plan:
* new unit tests
* validate
* tested manually that we can compact Data.Aeson data
Reviewers: gcampax, bgamari, ezyang, austin, niteria, hvr, erikd
Subscribers: thomie, simonpj
Differential Revision: https://phabricator.haskell.org/D2751
GHC Trac Issues: #12455
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This implements #5615 for divInt# and modInt#.
I also included rules to do constant-folding when the both arguments
are known.
Test Plan: validate
Reviewers: austin, simonmar, bgamari
Reviewed By: bgamari
Subscribers: hvr, thomie
Differential Revision: https://phabricator.haskell.org/D2486
GHC Trac Issues: #5615
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This brings in initial support for compact regions, as described in the
ICFP 2015 paper "Efficient Communication and Collection with Compact
Normal Forms" (Edward Z. Yang et.al.) and implemented by Giovanni
Campagna.
Some things may change before the 8.2 release, but I (Simon M.) wanted
to get the main patch committed so that we can iterate.
What documentation there is is in the Data.Compact module in the new
compact package. We'll need to extend and polish the documentation
before the release.
Test Plan:
validate
(new test cases included)
Reviewers: ezyang, simonmar, hvr, bgamari, austin
Subscribers: vikraman, Yuras, RyanGlScott, qnikst, mboes, facundominguez, rrnewton, thomie, erikd
Differential Revision: https://phabricator.haskell.org/D1264
GHC Trac Issues: #11493
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: simonmar, duncan, erikd, austin
Reviewed By: austin
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2290
GHC Trac Issues: #12059
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds a primitive operation to determine whether a particular
`MutableByteArray#` is backed by a pinned buffer.
Test Plan: Validate with included testcase
Reviewers: austin, simonmar
Reviewed By: austin, simonmar
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2217
GHC Trac Issues: #12059
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This turns `Any` into a standard wired-in type family defined in
`GHC.Types`, instead its current incarnation as a magical creature
provided by the `GHC.Prim`. Also kill `AnyK`.
See #10886.
Test Plan: Validate
Reviewers: simonpj, goldfire, austin, hvr
Reviewed By: simonpj
Subscribers: goldfire, thomie
Differential Revision: https://phabricator.haskell.org/D2049
GHC Trac Issues: #10886
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As Trac #11222, and #10712 note, the strictness analyser
needs to be rather careful about exceptions. Previously
it treated them as identical to divergence, but that
won't quite do.
See Note [Exceptions and strictness] in Demand, which
explains the deal.
Getting more strictness in 'catch' and friends is a
very good thing. Here is the nofib summary, keeping
only the big ones.
--------------------------------------------------------------------------------
Program Size Allocs Runtime Elapsed TotalMem
--------------------------------------------------------------------------------
fasta -0.1% -6.9% -3.0% -3.0% +0.0%
hpg -0.1% -2.0% -6.2% -6.2% +0.0%
maillist -0.1% -0.3% 0.08 0.09 +1.2%
reverse-complem -0.1% -10.9% -6.0% -5.9% +0.0%
sphere -0.1% -4.3% 0.08 0.08 +0.0%
x2n1 -0.1% -0.0% 0.00 0.00 +0.0%
--------------------------------------------------------------------------------
Min -0.2% -10.9% -17.4% -17.3% +0.0%
Max -0.0% +0.0% +4.3% +4.4% +1.2%
Geometric Mean -0.1% -0.3% -2.9% -3.0% +0.0%
On the way I did quite a bit of refactoring in Demand.hs
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Breakpoints become SCCs, so we have detailed call-stack info for
interpreted code. Currently this only works when GHC is compiled with
-prof, but D1562 (Remote GHCi) removes this constraint so that in the
future call stacks will be available without building your own GHCi.
How can you get a stack trace?
* programmatically: GHC.Stack.currentCallStack
* I've added an experimental :where command that shows the stack when
stopped at a breakpoint
* `error` attaches a call stack automatically, although since calls to
`error` are often lifted out to the top level, this is less useful
than it might be (ImplicitParams still works though).
* Later we might attach call stacks to all exceptions
Other related changes in this diff:
* I reduced the number of places that get ticks attached for
breakpoints. In particular there was a breakpoint around the whole
declaration, which was often redundant because it bound no variables.
This reduces clutter in the stack traces and speeds up compilation.
* I tidied up some RealSrcSpan stuff in InteractiveUI, and made a few
other small cleanups
Test Plan: validate
Reviewers: ezyang, bgamari, austin, hvr
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1595
GHC Trac Issues: #11047
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is something very peculiar about the `catch` family of operations
with respect to strictness analysis: they turn divergence into
non-divergence. For this reason, it isn't safe to mark them as strict in
the expression whose exceptions they are catching. The reason is this:
Consider,
let r = \st -> raiseIO# blah st
in catch (\st -> ...(r st)..) handler st
If we give the first argument of catch a strict signature, we'll get a
demand 'C(S)' for 'r'; that is, 'r' is definitely called with one
argument, which indeed it is. The trouble comes when we feed 'C(S)' into
'r's RHS as the demand of the body as this will lead us to conclude that
the whole 'let' will diverge; clearly this isn't right.
This is essentially the problem in #10712, which arose when
7c0fff41789669450b02dc1db7f5d7babba5dee6 marked the `catch*` primops as
being strict in the thing to be evaluated. Here I've partially reverted
this commit, again marking the first argument of these primops as lazy.
Fixes #10712.
Test Plan: Validate checking `exceptionsrun001`
Reviewers: simonpj, austin
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1616
GHC Trac Issues: #10712, #11222
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a subWordC# primop which implements subtraction with overflow
reporting.
Reviewers: tibbe, goldfire, rwbarton, bgamari, austin, hvr
Reviewed By: bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1334
GHC Trac Issues: #10962
|
|
|
|
|
|
|
|
|
|
| |
To quote Simon Marlow,
We don't expect users to ever write code that uses mkWeak# or
finalizeWeak#, we have safe interfaces to these. Let's document the type
unsafety and fix the problem with () without introducing any overhead.
Updates stm submodule.
|
|
|
|
|
|
|
| |
Previously the types needlessly used (), which is defined ghc-prim,
leading to unfortunate import cycles. See #10867 for details.
Updates stm submodule.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now since ByteArrays are mutable we need to be more explicit about when
the size is queried.
Test Plan: Add testcase and validate
Reviewers: goldfire, hvr, austin
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1139
GHC Trac Issues: #9447
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: validate
Reviewers: austin, bgamari, simonpj
Reviewed By: simonpj
Subscribers: simonpj, thomie
Differential Revision: https://phabricator.haskell.org/D1116
GHC Trac Issues: #10481
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In "Improve strictness analysis for exceptions"
commit 7c0fff41789669450b02dc1db7f5d7babba5dee6
Author: Simon Peyton Jones <simonpj@microsoft.com>
Date: Tue Jul 21 12:28:42 2015 +0100
I made catch# strict in its first argument. But today I found
a very old comment suggesting the opposite. I disagree with the
old comment, but I've elaborated the Note, which I reproduce here:
{- Note [Strictness for mask/unmask/catch]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Consider this example, which comes from GHC.IO.Handle.Internals:
wantReadableHandle3 f ma b st
= case ... of
DEFAULT -> case ma of MVar a -> ...
0# -> maskAsynchExceptions#
(\st -> case ma of MVar a -> ...)
The outer case just decides whether to mask exceptions, but we
don't want thereby to hide the strictness in 'ma'! Hence the use
of strictApply1Dmd.
For catch, we know that the first branch will be evaluated, but
not necessarily the second. Hence strictApply1Dmd and
lazyApply1Dmd
Howver, consider
catch# (\st -> case x of ...) (..handler..) st
We'll see that the entire thing is strict in 'x', so 'x' may be
evaluated before the catch#. So fi evaluting 'x' causes a
divide-by-zero exception, it won't be caught. This seems
acceptable:
- x might be evaluated somewhere else outside the catch# anyway
- It's an imprecise eception anyway. Synchronous exceptions (in
the IO monad) will never move in this way.
There was originally a comment
"Catch is actually strict in its first argument
but we don't want to tell the strictness
analyser about that, so that exceptions stay inside it."
but tracing it back through the commit logs did not give any
rationale. And making catch# lazy has performance costs for
everyone.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Two things here:
* For exceptions-catching primops like catch#, we know
that the main argument function will be called, so
we can use strictApply1Dmd, rather than lazy
Changes in primops.txt.pp
* When a 'case' scrutinises a I/O-performing primop,
the Note [IO hack in the demand analyser] was
throwing away all strictness from the code that
followed.
I found that this was causing quite a bit of unnecessary
reboxing in the (heavily used) function
GHC.IO.Handle.Internals.wantReadableHandle
So this patch prevents the hack applying when the
case scrutinises a primop. See the revised
Note [IO hack in the demand analyser]
Thse two things buy us quite a lot in programs that do a lot of IO.
Program Size Allocs Runtime Elapsed TotalMem
--------------------------------------------------------------------------------
hpg -0.4% -2.9% -0.9% -1.0% +0.0%
reverse-complem -0.4% -10.9% +10.7% +10.9% +0.0%
simple -0.3% -0.0% +26.2% +26.2% +3.7%
sphere -0.3% -6.3% 0.09 0.09 +0.0%
--------------------------------------------------------------------------------
Min -0.7% -10.9% -4.6% -4.7% -1.7%
Max -0.2% +0.0% +26.2% +26.2% +6.5%
Geometric Mean -0.4% -0.3% +2.1% +2.1% +0.1%
I think the increase in runtime for 'simple' is measurement error.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: none
Reviewers: simonmar, austin, hvr
Subscribers: hvr, thomie
Differential Revision: https://phabricator.haskell.org/D1082
GHC Trac Issues: #10640
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
An attempt to use these resulted in an error like:
[1 of 1] Compiling Main ( p.hs, p.o )
ghc: panic! (the 'impossible' happened)
(GHC version 7.8.4 for x86_64-unknown-linux):
emitPrimOp: can't translate PrimOp parAt#{v}
Test Plan: validate
Reviewers: thomie, austin
Reviewed By: thomie, austin
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D758
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The current primops for prefetching do not properly work in pure code;
namely, the primops are not 'hoisted' into the correct call sites based
on when arguments are evaluated. Instead, they should use a `seq`-like
interface, which will cause it to be evaluated when the needed term is.
See #9353 for the full discussion.
Test Plan: updated tests for pure prefetch in T8256 to reflect the design changes in #9353
Reviewers: simonmar, hvr, ekmett, austin
Reviewed By: ekmett, austin
Subscribers: merijn, thomie, carter, simonmar
Differential Revision: https://phabricator.haskell.org/D350
GHC Trac Issues: #9353
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The existing `decodeDouble_2Int#` primop is rather inconvenient to use
(and in fact is not even used by `integer-gmp`) as the mantissa is split
into 3 components which would actually fit in an `Int64#` value.
However, `decodeDouble_Int64#` is to be used by the new `integer-gmp2`
re-implementation (see #9281).
Moreover, `decodeDouble_2Int#` performs direct bit-wise operations on the
IEEE representation which can be replaced by a combination of the
portable standard C99 `scalbn(3)` and `frexp(3)` functions.
Differential Revision: https://phabricator.haskell.org/D160
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
The documentation for `seq` was recently augmented via #9390 &
cbfa107604f4cbfaf02bd633c1faa6ecb90c6dd7. However, it doesn't show
up in the Haddock generated docs because `#ifdef __HADDOCK__` doesn't
work as expected. Also, it's easier to just fix the problem at the
origin (which in this is case is the primops.txt.pp file).
The benefit/downside of this is that now the extended documentation
shows up everywhere `seq` is re-exported directly.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The two new primops with the type-signatures
resizeMutableByteArray# :: MutableByteArray# s -> Int#
-> State# s -> (# State# s, MutableByteArray# s #)
shrinkMutableByteArray# :: MutableByteArray# s -> Int#
-> State# s -> State# s
allow to resize MutableByteArray#s in-place (when possible), and are useful
for algorithms where memory is temporarily over-allocated. The motivating
use-case is for implementing integer backends, where the final target size of
the result is either N or N+1, and only known after the operation has been
performed.
A future commit will implement a stateful variant of the
`sizeofMutableByteArray#` operation (see #9447 for details), since now the
size of a `MutableByteArray#` may change over its lifetime (i.e before
it gets frozen or GCed).
Test Plan: ./validate --slow
Reviewers: ezyang, austin, simonmar
Reviewed By: austin, simonmar
Differential Revision: https://phabricator.haskell.org/D133
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This implements the new primops
clz#, clz32#, clz64#,
ctz#, ctz32#, ctz64#
which provide efficient implementations of the popular
count-leading-zero and count-trailing-zero respectively
(see testcase for a pure Haskell reference implementation).
On x86, NCG as well as LLVM generates code based on the BSF/BSR
instructions (which need extra logic to make the 0-case well-defined).
Test Plan: validate and succesful tests on i686 and amd64
Reviewers: rwbarton, simonmar, ezyang, austin
Subscribers: simonmar, relrod, ezyang, carter
Differential Revision: https://phabricator.haskell.org/D144
GHC Trac Issues: #9340
|
| |
|
|
|
|
|
|
|
| |
According to the definition of has_side_effets in PrimOp,
raise# clearly has side effects! In practice it makes little
difference becuase the fact that it returns bottom is more
important... but still it's better to say it right.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the second attempt to add this functionality. The first
attempt was reverted in 950fcae46a82569e7cd1fba1637a23b419e00ecd, due
to register allocator failure on x86. Given how the register
allocator currently works, we don't have enough registers on x86 to
support cmpxchg using complicated addressing modes. Instead we fall
back to a simpler addressing mode on x86.
Adds the following primops:
* atomicReadIntArray#
* atomicWriteIntArray#
* fetchSubIntArray#
* fetchOrIntArray#
* fetchXorIntArray#
* fetchAndIntArray#
Makes these pre-existing out-of-line primops inline:
* fetchAddIntArray#
* casIntArray#
|
|
|
|
|
|
|
|
| |
This commit caused the register allocator to fail on i386.
This reverts commit d8abf85f8ca176854e9d5d0b12371c4bc402aac3 and
04dd7cb3423f1940242fdfe2ea2e3b8abd68a177 (the second being a fix to
the first).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Add more primops for atomic ops on byte arrays
Adds the following primops:
* atomicReadIntArray#
* atomicWriteIntArray#
* fetchSubIntArray#
* fetchOrIntArray#
* fetchXorIntArray#
* fetchAndIntArray#
Makes these pre-existing out-of-line primops inline:
* fetchAddIntArray#
* casIntArray#
|
|
|
|
| |
`Any` is now an abstract (that is, no equations) closed type family.
|
|
|
|
|
|
| |
Say how it differs from Array in terms of size and performance.
These are primitives so it's also ok to talk a bit about implementation
details like card tables.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These array types are smaller than Array# and MutableArray# and are
faster when the array size is small, as they don't have the overhead
of a card table. Having no card table reduces the closure size with 2
words in the typical small array case and leads to less work when
updating or GC:ing the array.
Reduces both the runtime and memory allocation by 8.8% on my insert
benchmark for the HashMap type in the unordered-containers package,
which makes use of lots of small arrays. With tuned GC settings
(i.e. `+RTS -A6M`) the runtime reduction is 15%.
Fixes #8923.
|
|
|
|
|
|
| |
This should reduce code size when there's little to gain from inlining
these primops, while still retaining the inlining benefit when the
size of the copy is known statically.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The inline allocation version is 69% faster than the out-of-line
version, when cloning an array of 16 unit elements on a 64-bit
machine.
Comparing the new and the old primop implementations isn't
straightforward. The old version had a missing heap check that I
discovered during the development of the new version. Comparing the
old and the new version would requiring fixing the old version, which
in turn means reimplementing the equivalent of MAYBE_CG in StgCmmPrim.
The inline allocation threshold is configurable via
-fmax-inline-alloc-size which gives the maximum array size, in bytes,
to allocate inline. The size does not include the closure header size.
Allowing the same primop to be either inline or out-of-line has some
implication for how we lay out heap checks. We always place a heap
check around out-of-line primops, as they may allocate outside of our
knowledge. However, for the inline primops we only allow allocation
via the standard means (i.e. virtHp). Since the clone primops might be
either inline or out-of-line the heap check layout code now consults
shouldInlinePrimOp to know whether a primop will be inlined.
|
|
|
|
|
| |
so do not export it in GHC.Prim, and also have the pseudo-code for
GHC.Prim import GHC.Types, so that haddock is happy.
|
|
|
|
|
|
| |
Clarify the order of the arguments. Also, remove any use of # in the
comments, which would make the rest of that comment line disappear in
the docs, due to being treated as a comment by the preprocessor.
|
|
|
|
|
| |
We want it to show up in GHC.Exts, so we need to put the documentation
in GHC.Types, where the datatype Coercible is defined.
|
|
|
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
|
| |
This patch was authored by SPJ, and extracted from "Improve the handling
of used-once stuff" by Joachim.
|
|
|
|
|
| |
because it is not a top deman (see previous commit), and it is only used
in an argument to mkStrictSig.
|
| |
|
|
|
|
|
|
|
| |
Also remove can_fail=True since it's likely unnecessary upon discussion
(see #8256.)
Signed-off-by: Austin Seipp <austin@well-typed.com>
|