| Commit message | Author | Age | Files | Lines |
... | |
| |
|
|
|
|
|
|
|
|
|
| |
While the original head and tail of the TSO queue may be in the same
generation as the MVAR, interior elements of the queue could be younger
after a GC run and may then be exposed by a putMVar operation that updates
the queue head.
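A minimal sketch of the required barrier (generic names, not GHC's
actual code; gen_of and remember are illustrative stand-ins):

    /* Fire the write barrier whenever a queue-head update exposes a
     * younger interior element to an older MVar. */
    typedef struct Obj Obj;
    extern int  gen_of(Obj *o);     /* hypothetical: object's generation   */
    extern void remember(Obj *o);   /* hypothetical: add to remembered set */

    void set_queue_head(Obj *mvar, Obj **head_field, Obj *elem)
    {
        *head_field = elem;
        if (gen_of(elem) < gen_of(mvar)) {
            remember(mvar);
        }
    }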
Resolves #18919
|
|
|
|
|
|
|
|
| |
In the past some people have confused ASSERT, which is for checking
internal invariants, with CHECK, which should be used when checking
things that might fail due to bad input (and therefore should be enabled
even in the release compiler). Change some of these cases in the linker
to use CHECK.
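A sketch of the distinction (illustrative definitions; GHC's real
macros live in the RTS headers):

    #include <stdio.h>
    #include <stdlib.h>

    static void barf(const char *msg) { fprintf(stderr, "%s\n", msg); abort(); }

    #if defined(DEBUG)
    #define ASSERT(p) do { if (!(p)) barf("ASSERT failed: " #p); } while (0)
    #else
    #define ASSERT(p) ((void)0)   /* compiled out of release builds */
    #endif

    /* CHECK is always on: it guards conditions that bad input can violate. */
    #define CHECK(p)  do { if (!(p)) barf("CHECK failed: " #p); } while (0)

    struct obj_header { unsigned magic; };

    static void load_object(struct obj_header *hdr, int n_sections)
    {
        ASSERT(n_sections >= 0);          /* internal invariant only       */
        CHECK(hdr->magic == 0x7f454c46);  /* untrusted, user-supplied file */
    }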
|
|
|
|
| |
Use the GHC wrappers instead of <assert.h>.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, in an attempt to reduce fragmentation, each new allocator
would map a region of M32_MAX_PAGES fresh pages to seed itself. However,
this ends up being extremely wasteful since it turns out that we often
use fewer than this. Consequently, these pages end up getting freed,
which ends up fragmenting our address space more than if we had naively
allocated pages on demand.
Here we refactor m32 to avoid this waste while achieving the
fragmentation mitigation previously desired. In particular, we move all
page allocation into the global m32_alloc_page, which will pull a page
from the free page pool. If the free page pool is empty we then refill
it by allocating a region of M32_MAP_PAGES and adding them to the pool.
Furthermore, we do away with the initial seeding entirely. That is, the
allocator starts with no active pages: pages are rather allocated on an
as-needed basis.
On the whole this ends up being a pleasingly simple change,
simultaneously making m32 more efficient, more robust, and simpler.
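A simplified sketch of the new scheme (constants and details are
illustrative, not GHC's exact code):

    #include <stddef.h>
    #include <sys/mman.h>

    #define M32_MAP_PAGES 32      /* pages mapped per refill (illustrative) */
    #define PAGE_SIZE     4096

    static void  *free_page_pool[M32_MAP_PAGES];
    static size_t free_page_count = 0;

    static void *m32_alloc_page(void)
    {
        if (free_page_count == 0) {
            /* Refill the pool with one contiguous region: batching keeps
             * pages close together (the fragmentation mitigation) while
             * allocating on demand avoids the waste of up-front seeding. */
            char *chunk = mmap(NULL, M32_MAP_PAGES * PAGE_SIZE,
                               PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (chunk == MAP_FAILED) return NULL;
            for (size_t i = 0; i < M32_MAP_PAGES; i++)
                free_page_pool[free_page_count++] = chunk + i * PAGE_SIZE;
        }
        return free_page_pool[--free_page_count];
    }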
Fixes #18980.
|
|
|
|
| |
See Note [Non-moving GC: Marking evacuated objects].
|
| |
|
|
|
|
| |
The mark thread is not joinable as we detach from it on creation.
|
|
|
|
| |
pthread_join returns its error code and apparently doesn't set errno.
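Hence the error must be taken from the return value, e.g.:

    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>

    static void join_thread(pthread_t tid)
    {
        int rc = pthread_join(tid, NULL);
        if (rc != 0) {
            /* Pass rc to strerror; inspecting errno here would be wrong. */
            fprintf(stderr, "pthread_join failed: %s\n", strerror(rc));
        }
    }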
|
|
|
|
|
| |
Ensure that the free variables have been pushed to the update
remembered set before we zero the slop.
|
| |
|
|
|
|
|
|
| |
After a THROWTO message has been handled, the message closure is
overwritten by a NULL message. We must ensure that the original
closure's pointers continue to be visible to the nonmoving GC.
|
|
|
|
|
|
|
| |
The TSAN rework (specifically aad1f803) introduced a subtle regression
in GC.c, swapping `g0` in place of `gen`. Whoops!
Fixes #18997.
|
|
|
|
|
|
|
|
| |
When threadPaused blackholes a thunk it calls `OVERWRITING_CLOSURE` to
zero the slop for the benefit of the sanity checker. Previously this was
done *before* pushing the thunk's free variables to the update
remembered set. Consequently we would push zero'd pointers onto the
update remembered set.
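The corrected ordering, sketched with hypothetical helper names (not
GHC's actual functions):

    typedef struct StgClosure_ StgClosure;
    extern void rset_push_thunk(StgClosure *p);          /* hypothetical */
    extern void overwrite_with_blackhole(StgClosure *p); /* zeroes slop  */

    static void blackhole_thunk(StgClosure *thunk)
    {
        /* Push the free variables while the payload is still intact... */
        rset_push_thunk(thunk);
        /* ...and only then zero the slop; the old order pushed zeroed
         * pointers. */
        overwrite_with_blackhole(thunk);
    }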
|
|
|
|
|
|
| |
Co-authored-by: Sven Tennie <sven.tennie@gmail.com>
Co-authored-by: Matthew Pickering <matthewtpickering@gmail.com>
Co-authored-by: Ben Gamari <bgamari.foss@gmail.com>
|
|
|
|
|
|
|
| |
As noted in #18991, we would previously allocate the heap in low memory.
Because of this, the linker, which typically *needs* low memory, would
end up competing with the heap. In longer builds we would run out of
low memory entirely, leading to linking failures.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Windows, gcc-10 failed to inline copy_tag into evacuate.
To fix this we now set the always_inline attribute for the various
copy* functions in Evac.c. The main motivation here is not the
overhead of the function call, but rather that this allows the code
to "specialize" for the size of the closure we copy which is often
known at compile time.
An earlier commit also tried to prevent evacuate_large from being
inlined, but didn't quite succeed. So I also marked evacuate_large as
noinline.
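The shape of the fix, roughly (illustrative, not the actual Evac.c
code):

    #include <string.h>

    /* Force inlining so the body can specialize on a closure size that
     * is a compile-time constant at most call sites. */
    __attribute__((always_inline)) static inline void
    copy_words(void *dst, const void *src, size_t size_w)
    {
        memcpy(dst, src, size_w * sizeof(void *));
    }

    /* Deliberately out of line: rare path, keeps evacuate() small. */
    __attribute__((noinline)) static void
    evacuate_large(void *p)
    {
        (void)p;
    }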
Fixes #12416
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This replaces all Word<N> = W<N># Word# and Int<N> = I<N># Int# with
Word<N> = W<N># Word<N># and Int<N> = I<N># Int<N>#, thus providing us
with properly sized primitives in the code generator instead of
pretending they are all full machine words.
This came up when implementing darwinpcs for arm64. The darwinpcs
requires us to pack function arguments in excess of registers on the
stack. While most procedure call standards (pcs) assume arguments are
just passed in 8-byte slots, so the caller does not need to know the
exact signature to make the call, darwinpcs requires us to adhere to
the prototype and thus use the correct sizes. If we specify CInt in the
FFI call, it should correspond to C's int, and not just be word sized,
when it's only half the size.
This does change the expected output of T16402 but the new result is no
less correct as it eliminates the narrowing (instead of the `and` as was
previously done).
Bumps the array, bytestring, text, and binary submodules.
Co-Authored-By: Ben Gamari <ben@well-typed.com>
Metric Increase:
T13701
T14697
|
|
|
|
|
|
| |
Otherwise `opt` fails with:
error: use of undefined value '@memcmp$def'
|
|
|
|
|
|
|
|
|
|
|
|
| |
As noted in #18043, flushTrace failed to flush anything beyond the
writer. This means that a significant amount of data sitting in
capability-local event buffers might never get flushed, despite the
users' pleas for us to flush.
Fix this by making flushEventLog flush all of the event buffers before
flushing the writer.
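A sketch of the fixed flush path (illustrative names; the real
functions live in the RTS eventlog code):

    extern unsigned n_capabilities;                       /* cap count    */
    extern void flush_cap_event_buffer(unsigned cap_no);  /* hypothetical */
    extern void flush_writer(void);                       /* hypothetical */

    void flush_all_event_buffers(void)
    {
        /* Previously only the writer was flushed, stranding data still
         * sitting in the per-capability buffers. */
        for (unsigned i = 0; i < n_capabilities; i++)
            flush_cap_event_buffer(i);
        flush_writer();
    }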
Fixes #18043.
|
|
|
|
|
|
|
|
| |
We currently only post the entry counters, not the other global
counters, as in my experience the former are more useful. We use the
heap profiler's census period to decide when to dump.
profiler's census period to decide when to dump.
Also spruces up the documentation surrounding ticky-ticky a bit.
|
|
|
|
|
|
|
| |
We place symbol_extras right after bss. We also need
to ensure that symbol_extras can be mprotect'd independently from the
rest of the image. To ensure this we round up the size of bss to a page
boundary, thus ensuring that symbol_extras is also page-aligned.
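The rounding itself is the usual power-of-two trick, sketched here:

    #include <stdint.h>

    /* Round size up to the next multiple of page_size, which must be a
     * power of two; applied to bss, this leaves symbol_extras starting
     * on its own page so it can be mprotect'd independently. */
    static uint64_t round_up_to_page(uint64_t size, uint64_t page_size)
    {
        return (size + page_size - 1) & ~(page_size - 1);
    }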
|
|
|
|
|
|
|
|
| |
This adds the necessary logic to support aarch64 on ELF, as well as
aarch64 on Mach-O, which Apple calls arm64.
We change the architecture name to AArch64, which is the official Arm
naming scheme.
|
|
|
|
|
|
|
|
| |
These are used to find the current roots of the garbage collector.
Co-authored-by: Sven Tennie <sven.tennie@gmail.com>
Co-authored-by: Matthew Pickering <matthewtpickering@gmail.com>
Co-authored-by: Ben Gamari <bgamari.foss@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(This change was originally written by niteria.)
This adds two functions:
* `loadNativeObj`
* `unloadNativeObj`
and implements them for Linux.
They are useful if you want to load a shared object with Haskell code
using the system linker and have GHC call dlclose() after the
code is no longer referenced from the heap.
Using the system linker allows you to load the shared object
above outside the low-mem region. It also loads the DWARF sections
in a way that `perf` understands.
`dl_iterate_phdr` is what makes this implementation Linux specific.
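A sketch of the Linux mechanism (illustrative, not GHC's code):

    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <link.h>
    #include <string.h>

    static int find_obj(struct dl_phdr_info *info, size_t size, void *data)
    {
        const char *wanted = data;
        (void)size;
        /* info->dlpi_addr is the load base; the mapped ranges let the GC
         * decide when the object's code is no longer referenced. */
        if (info->dlpi_name != NULL && strcmp(info->dlpi_name, wanted) == 0)
            return 1;  /* found; stop iterating */
        return 0;
    }

    void *load_native_obj(const char *path)
    {
        /* The system linker places the object wherever it likes,
         * typically well outside the RTS linker's low-memory region. */
        void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
        if (handle != NULL)
            dl_iterate_phdr(find_obj, (void *)path);
        return handle;  /* later: dlclose(handle) once unreachable */
    }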
|
|
|
|
|
|
|
| |
Fixes #16525 by tracking dependencies between object file symbols and
marking symbol liveness during garbage collection
See Note [Object unloading] in CheckUnload.c for details.
|
|
|
|
|
|
| |
Co-authored-by: Sven Tennie <sven.tennie@gmail.com>
Co-authored-by: Matthew Pickering <matthewtpickering@gmail.com>
Co-authored-by: Ben Gamari <bgamari.foss@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the overflow check for the IMAGE_REL_AMD64_ADDR32NB
relocation failed to account for the signed nature of the value.
Specifically, the overflow check was:
uint64_t v;
v = S + A;
if (v >> 32) { ... }
However, `v` ultimately needs to fit into 32-bits as a signed value.
Consequently, values `v > 2^31` in fact overflow, yet this was not
caught by the existing overflow check.
Here we rewrite the overflow check to rather ensure that
`INT32_MIN <= v <= INT32_MAX`. There is now quite a bit of repetition
between the `IMAGE_REL_AMD64_REL32` and `IMAGE_REL_AMD64_ADDR32` cases
but I am leaving fixing this for future work.
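The reworked check, sketched:

    #include <stdint.h>

    static int fits_in_signed_32(uint64_t S, uint64_t A)
    {
        int64_t v = (int64_t)(S + A);
        /* The old `if (v >> 32)` test missed values just above 2^31,
         * which overflow once the field is read back as signed. */
        return v >= INT32_MIN && v <= INT32_MAX;
    }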
This bug was first noticed by @awson.
Fixes #15808.
|
|\ |
|
| |\ |
|
| | |
| | |
| | |
| | | |
Since the latter wants to call getRTSStats.
|
| | |
| | |
| | |
| | |
| | | |
While at face value this seems a bit heavy, I think it's far better than
enforcing ordering on every access.
|
| | | |
|
| |\ \ |
|
| | | |
| | | |
| | | |
| | | |
| | | | |
We can generally be pretty relaxed about the barriers here since the timer
thread is a loop.
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Previously `initScheduler` would attempt to pause the ticker and in so
doing acquire the ticker mutex. However, initTicker, which is
responsible for initializing said mutex, hadn't been called
yet.
|
| | | | |
|
| | | | |
|
| | | |
| | | |
| | | |
| | | | |
This avoids #17289.
|
| | |/ |
|
| |\ \ |
|
| | | |
| | | |
| | | |
| | | | |
This suppresses the other side of a race during shutdown.
|
| | |/ |
|
| |\ \ |
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Previously the `current_value`, `first_watch_queue_entry`, and
`num_updates` fields of `StgTVar` were marked as `volatile` in an
attempt to provide strong ordering. Of course, this isn't sufficient.
We now use proper atomic operations. In most of these cases I strengthen
the ordering all the way to SEQ_CST although it's possible that some
could be weakened with some thought.
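An illustrative before/after using C11 atomics (GHC's RTS uses its own
wrapper macros over the same primitives):

    #include <stdatomic.h>

    struct tvar_like {
        /* before: volatile void *current_value;  -- no real ordering */
        _Atomic(void *)        current_value;
        _Atomic(unsigned long) num_updates;
    };

    static void *read_current_value(struct tvar_like *tv)
    {
        /* SEQ_CST is the conservative choice taken here; some of these
         * loads could likely be weakened after further analysis. */
        return atomic_load_explicit(&tv->current_value,
                                    memory_order_seq_cst);
    }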
|
| | |/
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This fixes a potentially harmful race where we failed to synchronize
before looking at a TVar's current_value.
Also did a bit of refactoring to abstract over the management of
max_commits.
|
| |\ \ |
|
| | | |
| | | |
| | | |
| | | |
| | | | |
Here we are doing lazy initialization; it's okay if we do the check
more than once, hence a relaxed operation is fine.
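Sketched with C11 atomics (illustrative; assumes the initialization is
idempotent, as the reasoning above requires):

    #include <stdatomic.h>
    #include <stdbool.h>

    static atomic_bool initialized;   /* starts false       */
    extern void do_init(void);        /* safe to run twice  */

    static void ensure_initialized(void)
    {
        if (!atomic_load_explicit(&initialized, memory_order_relaxed)) {
            do_init();
            atomic_store_explicit(&initialized, true,
                                  memory_order_relaxed);
        }
    }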
|
| | | |
| | | |
| | | |
| | | | |
Fixes #17275.
|