List all blocks on the non-moving heap.
Resolves #22627
|
When using the copying collector there is still a lot of data which
isn't copied (such as pinned, compacted, and large objects). The logic
to decide how much memory to retain didn't take into account that
these wouldn't be copied. Therefore we pessimistically retained twice
the amount of memory for these blocks even though they wouldn't be
copied by the collector.

The solution is to split the heap into two parts: the part which will
be copied and the part which won't be copied. The appropriate factor
is then applied to each part individually (2 for copying and 1.2 for
not copying).

The T23221 test demonstrates this improvement with a program which
first allocates many unpinned ByteArray# followed by many pinned
ByteArray# and observes the difference in the ultimate memory baseline
between the two.

There are some charts on #23221.

Fixes #23221
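
To make the arithmetic concrete, here is a small standalone C sketch
of the split accounting described above (function and policy names are
illustrative, not GHC's actual code):

```c
#include <stdio.h>

/* Illustrative sketch of the retention decision described above.
 * The factors match the commit message: copied data needs headroom
 * for to-space (factor 2), while uncopied data (pinned, compacted,
 * large objects) only needs a modest growth factor (1.2). */
static size_t retained_bytes(size_t copied_live, size_t uncopied_live)
{
    return 2 * copied_live + (size_t)(1.2 * (double)uncopied_live);
}

int main(void)
{
    /* 100 MB of copyable data and 400 MB pinned: the old policy
     * retained 2 * 500 = 1000 MB; the split policy retains
     * 2*100 + 1.2*400 = 680 MB. */
    size_t old_policy = 2 * (100 + 400);
    size_t new_policy = retained_bytes(100, 400);
    printf("old: %zu MB, new: %zu MB\n", old_policy, new_policy);
    return 0;
}
```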
|
This patch adds eight new primops that fuse a multiplication and an
addition or subtraction:

- `{fmadd,fmsub,fnmadd,fnmsub}{Float,Double}#`

`fmadd x y z` is `x * y + z`, computed with a single rounding step.

This patch implements code generation for these primops in the
following backends:

- X86, AArch64 and PowerPC NCG
- LLVM
- C

WASM uses the C implementation. The primops are unsupported in the
JavaScript backend.

The following constant folding rules are also provided:

- compute `a * b + c` when `a`, `b`, `c` are all literals,
- `x * y + 0 ==> x * y`,
- `±1 * y + z ==> z ± y` and `x * ±1 + z ==> z ± x`.

NB: the constant folding rules incorrectly handle signed zero. This is
a known limitation with GHC's floating-point constant folding rules
(#21227), which we hope to resolve in the future.
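
For intuition about the single rounding step, C's standard `fma`
(which a C-backend implementation can lean on) computes the same fused
operation. This standalone example, mine rather than part of the
patch, shows a case where the fused form preserves a term that the
two-rounding form loses:

```c
#include <math.h>
#include <stdio.h>

int main(void)   /* link with -lm */
{
    /* (1 + 2^-52) * (1 - 2^-52) = 1 - 2^-104, which rounds to 1.0
     * in double precision, so x * y + z loses the -2^-104 term. */
    double x = 1.0 + 0x1p-52;
    double y = 1.0 - 0x1p-52;
    double z = -1.0;

    double two_roundings = x * y + z;     /* 0.0: product was rounded */
    double one_rounding  = fma(x, y, z);  /* -2^-104: exact product */

    printf("x*y + z      = %a\n", two_roundings);
    printf("fma(x, y, z) = %a\n", one_rounding);
    return 0;
}
```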
|
Previously we failed to account direct mutator allocations into the
nonmoving heap against the mutator's allocation limit and
`cap->total_allocated`. This only manifests during CAF evaluation (since
we allocate the CAF's blackhole directly into the nonmoving heap).
Fixes #23312.
|
As requested by @treeowl in CLC#139.
|
As noticed by @Terrorjack, `hs_init_ghc` previously used non-atomic
increment/decrement on the RTS's initialization count. This may go wrong
in a multithreaded program which initializes the runtime multiple times.
Closes #22756.
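
A minimal sketch of the atomic counting pattern involved, using C11
atomics; the function and counter names here are hypothetical, not the
RTS's actual symbols:

```c
#include <stdatomic.h>

/* Hypothetical init counter; the real RTS keeps its own. */
static atomic_int init_count = 0;

void my_hs_init(void)
{
    /* Atomic increment: only the first caller performs real init,
     * even if several threads race to initialize the runtime. */
    if (atomic_fetch_add(&init_count, 1) == 0) {
        /* ... one-time runtime initialization ... */
    }
}

void my_hs_exit(void)
{
    /* Atomic decrement: only the last caller tears down. */
    if (atomic_fetch_sub(&init_count, 1) == 1) {
        /* ... one-time runtime teardown ... */
    }
}
```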
|
While fixing these I've also changed the way we store addresses into
ByteArray#. An Addr# is composed of two parts: a JavaScript array and
an offset (a 32-bit number).

Suppose we want to store an Addr# in a ByteArray# foo at offset i.
Before this patch, we were storing both fields as a tuple in the "arr"
array field:

    foo.arr[i] = [addr_arr, addr_offset];

Now we only store the array part in the "arr" field and the offset
directly in the array:

    foo.dv.setInt32(i, addr_offset);
    foo.arr[i] = addr_arr;

This avoids wasting space on the tuple.
|
* For ByteArray-based bounds-checking, the JavaScript backend must use
  the `len` field, instead of the built-in JavaScript `length` field.
* Range-based operations must also check both the start and end of the
  range for bounds.
* All indices are valid for ranges of size zero, since they are
  essentially no-ops.
* For cases of ByteArray accesses (e.g. read as Int), the end index is
  (i * sizeof(type) + sizeof(type) - 1), while the previous
  implementation used (i + sizeof(type) - 1). In the Int32 example,
  this is (i * 4 + 3) (see the sketch after this list).
* IndexByteArrayOp_Word8As* primitives use byte array indices (unlike
  the previous point), but now check both start and end indices.
* Byte array copies now check if the arrays are the same by identity
  and then if the ranges overlap.
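
A hedged C sketch of the two bounds rules above (not the JS backend's
actual code; it also ignores size_t overflow for readability):

```c
#include <stdbool.h>
#include <stddef.h>

/* An element access at index i with element size elem_size touches
 * bytes [i*elem_size, i*elem_size + elem_size - 1]; the last byte
 * must lie within the array's len bytes. */
static bool element_access_ok(size_t len, size_t i, size_t elem_size)
{
    size_t start = i * elem_size;          /* first byte touched */
    size_t end   = start + elem_size - 1;  /* last byte, e.g. i*4 + 3 */
    return end < len;                      /* start < len follows */
}

/* Ranges of size zero are always in bounds, per the rule above. */
static bool range_access_ok(size_t len, size_t off, size_t n)
{
    if (n == 0) return true;       /* zero-sized range: no-op */
    if (off >= len) return false;  /* start out of bounds */
    return n <= len - off;         /* end in bounds, overflow-safe */
}
```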
|
This patch does a few things:
- Always build 64-bit atomic ops in rts/ghc-prim, even on 32-bit
platforms
- Remove legacy "64bit" cabal flag of rts package
- Fix hs_xchg64 function prototype for 32-bit platforms
- Fix AtomicFetch test for wasm32
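
A hedged sketch of what a 64-bit atomic exchange helper like the
hs_xchg64 mentioned above can look like (the RTS's real prototype may
differ; `__atomic_exchange_n` is the GCC/Clang builtin):

```c
#include <stdint.h>

/* On 32-bit platforms the compiler lowers this to a library call or
 * a CAS loop, which is why the 64-bit variants must still be built
 * there rather than being gated behind a "64bit" flag. */
uint64_t my_xchg64(uint64_t *ptr, uint64_t val)
{
    return __atomic_exchange_n(ptr, val, __ATOMIC_SEQ_CST);
}
```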
|
Previously the implementation of listThreads# failed to initialize the
header of the created array, leading to various nastiness.
Fixes #23071
|
implementation
|
Fixes #23142
|
As noted in #23170, the nonmoving GC can race with a mutator zeroing the
slop of an updated thunk (in much the same way that two mutators would
race). Consequently, we must disable slop-zeroing when the nonmoving GC
is in use.
Closes #23170
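
A heavily hedged sketch of the guard this implies (the flag name is
hypothetical, not necessarily what the patch uses):

```c
#include <stdbool.h>
#include <stddef.h>

extern bool use_nonmoving_gc;   /* hypothetical runtime flag */

static void zero_slop(char *slop_start, size_t n)
{
    /* With the concurrent nonmoving collector, the marker may read
     * this memory while we write it, so zeroing would be a data
     * race; skip it entirely in that configuration. */
    if (use_nonmoving_gc)
        return;
    for (size_t i = 0; i < n; i++)
        slop_start[i] = 0;
}
```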
|
Fixes #21054. Additionally, we can now check for range overlap
when generating Cmm for primops that use memcpy internally.
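
For reference, a small C sketch of the overlap predicate such a check
amounts to (my formulation, not the generated Cmm): two ranges
[dst, dst+n) and [src, src+n) overlap unless one ends before the other
begins, and memcpy requires non-overlapping ranges (overlapping copies
need memmove).

```c
#include <stdbool.h>
#include <stddef.h>

static bool ranges_overlap(const char *dst, const char *src, size_t n)
{
    /* Empty ranges never overlap; otherwise each must start before
     * the other ends. */
    return n != 0 && dst < src + n && src < dst + n;
}
```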
|
Previously `zeroSlop` examined `RtsFlags` to determine whether the
program was single-threaded. This is wrong; a program may be started
with `+RTS -N1` yet the process may later increase the capability
count with `setNumCapabilities`. This led to quite subtle and rare
crashes.

Fixes #23088.
|
Previously we relied on calling EXTERN_INLINE functions defined in
ClosureMacros.h from Cmm to zero slop. However, as far as I can tell,
this is no longer safe to do in C99 as EXTERN_INLINE definitions may be emitted
in each compilation unit.
Fix this by explicitly declaring a new set of non-inline functions in
ZeroSlop.c which can be called from Cmm and marking the ClosureMacros.h
definitions as INLINE_HEADER.
In the future we should try to eliminate EXTERN_INLINE.
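
A hedged C sketch of the two linkage patterns involved (the macro
definitions follow common RTS conventions, not necessarily the exact
ones in the headers):

```c
#include <stddef.h>

/* In the header: INLINE_HEADER gives each translation unit its own
 * private copy, so no out-of-line symbol is ever emitted or needed. */
#define INLINE_HEADER static inline

INLINE_HEADER void zeroSlop(char *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        p[i] = 0;
}

/* In ZeroSlop.c: a real, non-inline function with a stable symbol
 * that hand-written Cmm code can safely call. */
void zeroSlopCmm(char *p, size_t n)
{
    zeroSlop(p, n);
}
```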
|
This patch does a few things:

- Add the missing RtsSymbols.c entry for performBlockingMajorGC
- Make hs_perform_gc call performBlockingMajorGC, which restores the
  previous behavior
- Use hs_perform_gc in ffi023
- Remove the rts_clearMemory() call in ffi023; it now works again in
  some test ways previously marked as broken

Fixes #23089
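
The restored behavior amounts to a one-line wrapper; a sketch, with
both signatures assumed to be the usual no-argument void forms:

```c
extern void performBlockingMajorGC(void);

/* hs_perform_gc once again performs a blocking major collection. */
void hs_perform_gc(void)
{
    performBlockingMajorGC();
}
```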
|
Previously IND and IND_STATIC lacked the acquire barriers enjoyed by
BLACKHOLE. As noted in the (now updated) Note [Heap memory barriers],
this barrier is critical to ensure that the indirectee is visible to the
entering core.
Fixes #22872.
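
A hedged sketch of the barrier pairing this describes, written with
C11 atomics rather than the RTS's own macros (types are stand-ins for
the real closure layout):

```c
#include <stdatomic.h>

struct closure;                                   /* opaque closure */
struct ind { struct closure *_Atomic indirectee; };

/* The updater publishes the indirectee with a release store ... */
static void update_ind(struct ind *p, struct closure *value)
{
    atomic_store_explicit(&p->indirectee, value, memory_order_release);
}

/* ... and whoever enters the IND must use an acquire load, so the
 * indirectee's fields are guaranteed visible to the entering core. */
static struct closure *enter_ind(struct ind *p)
{
    return atomic_load_explicit(&p->indirectee, memory_order_acquire);
}
```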
|
Make sure that we keep track of the size of large and compact objects
that have been moved onto the nonmoving heap. We keep track of their
size and add it to the amount of live bytes in nonmoving segments to
get the total size of the live nonmoving heap.

Resolves #17574
|
The number of distinct arguments passed to GarbageCollect was getting a
bit out of hand.
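
One common shape for such a cleanup, sketched here with entirely
hypothetical field names, is to bundle the flags into a single
configuration struct passed by value:

```c
#include <stdbool.h>

typedef struct {
    unsigned int collect_gen;     /* which generation to collect */
    bool         do_heap_census;  /* take a heap census? */
    bool         deadlock_detect; /* deadlock-detection GC? */
    bool         nonconcurrent;   /* force a nonconcurrent collection? */
} GcConfig;

void GarbageCollect(GcConfig config /* , ... */);
```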
|
This should be INLINE_HEADER lest we get unused declaration warnings.
|
To reflect the fact that these are to do with the nonmoving collector,
now that they are exposed and no longer static.
|
This makes it a bit easier to add instrumentation on this spinlock
while debugging.
|
When the nonmoving GC is in use we do not call `checkUnload` (since we
don't unload code) and therefore should not call `prepareUnloadCheck`,
lest we run into assertions.
|
The nonmoving collector does not use `oldest_gen->blocks` to track its
block list. However, it nevertheless updates `oldest_gen->n_blocks` to
ensure that its size is accounted for by the storage manager.
Consequently, we must not attempt to assert consistency between the two.
|
Some references to Note [Deadlock detection under the non-moving
collector] were missing an article.
|
The current segments are conceptually owned by the mutator, not the
collector. Consequently, it was quite tricky to prove that the mutator
would not race with the collector due to this shared state. It turns
out that such races are possible: when resizing the current segment
array we may concurrently try to take a heap census. This will attempt
to walk the current segment array, causing a data race.

Fix this by moving the current segment array into `Capability`, where
it belongs.

Fixes #22926.
|
Here we significantly improve the bound on sync phase pause times by
imposing a limit on the amount of work that we can perform during the
sync. If we find that we have exceeded our marking budget then we allow
the mutators to resume, return to concurrent marking, and try
synchronizing again later.
Fixes #22929.
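
A hedged sketch of the bounded sync this describes (all names are
hypothetical): marking during the sync pause is capped by a budget,
and running out of budget abandons the sync rather than extending the
pause.

```c
#include <stdbool.h>

typedef struct mark_queue mark_queue;
extern bool mark_queue_empty(mark_queue *q);
extern void mark_one_object(mark_queue *q);

static bool try_complete_sync(mark_queue *q, long budget)
{
    while (!mark_queue_empty(q)) {
        if (budget-- <= 0)
            return false;   /* over budget: resume mutators, retry
                             * from concurrent marking later */
        mark_one_object(q);
    }
    return true;            /* marking finished within the pause */
}
```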
|
Previously we left various segment link pointers dangling. None of
this was wrong per se, but it did make it harder than necessary to
debug.
|
This fixes the selector optimisation, adding a few write barriers which
are necessary for soundness. See the inline comments for details.
Fixes #22930.
|
Previously `storageAddCapabilities` (called by `setNumCapabilities`) would
clobber the update remembered sets of existing capabilities when
increasing the capability count. Fix this by only initializing the
update remembered sets of the newly-created capabilities.
Fixes #22927.
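
A sketch of the fix's shape, with hypothetical types and names: when
growing the capability count, only the newly created capabilities get
fresh remembered sets, since reinitializing existing ones would drop
entries they have already accumulated.

```c
typedef struct { void *entries; /* ... */ } UpdRemSet;
typedef struct { UpdRemSet upd_rem_set; /* ... */ } Capability;

extern void initUpdRemSet(UpdRemSet *rs);

static void initNewCapabilityRemSets(Capability **caps,
                                     unsigned old_n, unsigned new_n)
{
    /* Initialize only indices [old_n, new_n); leave existing
     * capabilities' remembered sets untouched. */
    for (unsigned i = old_n; i < new_n; i++)
        initUpdRemSet(&caps[i]->upd_rem_set);
}
```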
|
We must conservatively assume that new closures are reachable since we
are not guaranteed to mark such blocks.
|
Previously we only updated the state of the segment at the head of each
allocator's filled list.
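
In sketch form (hypothetical names), the fix is to walk the whole
filled list rather than touching only its head:

```c
#include <stddef.h>

typedef struct segment {
    int state;              /* e.g. filled vs. active */
    struct segment *link;   /* next segment in the filled list */
} segment;

enum { SEG_FILLED = 1 };

static void mark_all_filled(segment *filled_head)
{
    /* Previously only filled_head had its state updated. */
    for (segment *seg = filled_head; seg != NULL; seg = seg->link)
        seg->state = SEG_FILLED;
}
```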
|
Assert that entries in the nonmoving generation's generational
remembered set (a.k.a. mutable list) live in the nonmoving generation.
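
A hedged sketch of such an assertion, following the block-descriptor
scheme the RTS uses (the flag value and accessor here are stand-ins,
not the real definitions):

```c
#include <assert.h>

#define BF_NONMOVING 1024               /* illustrative flag value */
typedef struct { unsigned flags; } bdescr;
extern bdescr *Bdescr(void *p);         /* block descriptor lookup */

static void check_mut_list_entry(void *closure)
{
    /* Every closure on the nonmoving generation's mutable list
     * should itself reside in a nonmoving block. */
    assert(Bdescr(closure)->flags & BF_NONMOVING);
}
```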