| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
+RTS -AL<size> controls the total size of large objects that can be
allocated before a GC is triggered. Previously this was always just the
value of -A, and the limit mainly existed to prevent runaway allocation
in pathalogical programs that allocate a lot of large objects. However,
since the limit is shared between all cores, on a large multicore the
default becomes more restrictive, and can end up triggering GC well
before it would normally have been.
Arguably a better default would be A*N, but this is probably excessive.
Adding a flag lets you choose, and I've left the default as it was.
See docs for usage.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows the GC to use fewer threads than the number of capabilities.
At each GC, we choose some of the capabilities to be "idle", which means
that the thread running on that capability (if any) will sleep for the
duration of the GC, and the other threads will do its work. We choose
capabilities that are already idle (if any) to be the idle capabilities.
The idea is that this helps in the following situation:
* We want to use a large -N value so as to make use of hyperthreaded
cores
* We use a large heap size, so GC is infrequent
* But we don't want to use all -N threads in the GC, because that
thrashes the memory too much.
See docs for usage.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Rename to the (more correct) NUM_FREE_LISTS
- NUM_FREE_LISTS should be derived from the block and mblock sizes, not
defined manually. It was actually too large by one, which caused a
little bit of (benign) extra work in the form of a redundant loop
iteration in some cases.
- Add some ASSERTs for input preconditions to log_2() and log_2_ceil()
- Fix some comments
- Fix usage in allocLargeChunk, to account for the fact that
log_2_ceil() can return NUM_FREE_LISTS.
|
|
|
|
|
|
|
| |
This reverts commit 546f24e4f8a7c086b1e5afcdda624176610cbcf8.
And adds a fix for Windows: we need to use __builtin_clzll() rather than
__builtin_clzl(), because StgWord is unsigned long long on Windows.
|
|
|
|
|
|
|
|
| |
Some old stuff related to the PAR way.
Reviewed by: austin, simonmar
Differential Revision: https://phabricator.haskell.org/D2137
|
|
|
|
| |
This reverts commit 24864ba5587c1a0447beabae90529e8bb4fa117a.
|
| |
|
|
|
|
| |
A microoptimisation in the block allocator.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Avoids contention for the block allocator lock in the GC; this can be
seen in the gc_alloc_block_sync counter emitted by +RTS -s.
I experimented with this a while ago, and there was already
commented-out code for it in GCUtils.c, but I've now improved it so that
it doesn't result in significantly worse memory usage.
* The old method of putting spare blocks on ws->part_list was wasteful,
the spare blocks are now shared between all generations and retained
between GCs.
* repeated allocGroup() results in fragmentation, so I switched to using
allocLargeChunk() instead which is fragmentation-friendly; we already
use it for the same reason in nursery allocation.
|
|
|
|
|
|
|
|
|
| |
After a parallel GC, it is possible to have a long list of blocks in
ws->part_list, if we did a lot of work stealing but didn't fill up the
blocks we stole. These blocks persist until the next load-balanced GC,
which might be a long time, and during every GC we were traversing this
list to find its size. The fix is to maintain the size all the time, so
we don't have to compute it.
|
|
|
|
|
| |
DEAD_WEAK used to have a different layout, see
d61c623ed6b2d352474a7497a65015dbf6a72e12
|
| |
|
|
|
|
|
| |
This reverts commit 6c2c853b11fe25c106469da7b105e2be596c17de which was
supposed to be merged as individual commits.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
this Diff contains small, self-contained changes as I work towards
fixing #10613. It is mostly created to let harbormaster do its job, but
feedback is welcome as well.
Please do not merge this via arc; I’d like to push the individual
patches as layed out here. I might push mostly trivial ones even without
review, as long as the build passes.
Reviewers: austin, bgamari
Reviewed By: bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2014
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 5d52d9b64c21dcf77849866584744722f8121389 removed
global 'blackhole_queue' in favour of new mechanism:
when TSO hits blackhole TSO blocks waiting for
'MessgaeBlackhole' delivery.
Patch removed unused global and updates stale comments.
Noticed by Yuras Shumovich.
Signed-off-by: Sergei Trofimovich <siarheit@google.com>
Test Plan: build test
Reviewers: simonmar, austin, Yuras, bgamari
Reviewed By: Yuras, bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1953
|
|
|
|
|
|
|
|
| |
Noticed by uselex.rb:
copied: [R]: exported from:
./rts/dist/build/sm/GC.o
Signed-off-by: Sergei Trofimovich <siarheit@google.com>
|
|
|
|
|
|
|
|
|
|
| |
Noticed by uselex.rb:
scavenge_mutable_list: [R]: exported from:
./rts/dist/build/sm/Scav.o
scavenge_mutable_list1: [R]: exported from:
./rts/dist/build/sm/Scav.thr_o
Signed-off-by: Sergei Trofimovich <siarheit@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use of these helper functions was removed by
commit 18896fa2b06844407fd1e0d3f85cd3db97a96ff4
Author: Simon Marlow <marlowsd@gmail.com>
Date: Wed Feb 2 15:49:55 2011 +0000
Noticed by uselex.rb:
calcLiveBlocks: [R]: exported from:
./rts/dist/build/sm/Storage.o
calcLiveWords: [R]: exported from:
./rts/dist/build/sm/Storage.o
Signed-off-by: Sergei Trofimovich <siarheit@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
it seems that this closure type has not been in use since 5d52d9, so all
this is dead and untested code. This removes it. Some of the code might
be useful for a counting indirection as described in #10613, so when
implementing that, have a look at what this commit removes.
Test Plan: validate on harbormaster
Reviewers: austin, bgamari, simonmar
Reviewed By: simonmar
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1821
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch fixes gc_thread related compilation failure
on Solaris/i386 platform. It uses Linux way of __thread declared
gc_thread variable for register starving i386 from now.
Reviewers: bgamari, austin, erikd
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1688
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This hasn't been used for a very long time and will soon be superceded
by perf_events support.
Test Plan: validate
Reviewers: austin, simonmar
Reviewed By: austin, simonmar
Subscribers: thomie, erikd
Differential Revision: https://phabricator.haskell.org/D1493
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since the two-step allocator the RTS asks the kernel for a large upfront
mmap'd region of memory (on the order of terabytes). While we have no
expectation that this entire region will be backed by physical memory,
this scheme nevertheless fails on some systems with resource limits.
Here we use a back-off scheme to reduce our allocation request until we
find a size agreeable to the kernel. Fixes #10877.
This also fixes a latent bug wherein the heap reservation retry logic
would fail to free the previously reserved address space, which would
likely result in a heap allocation failure.
Test Plan:
set address space limit with `ulimit -v 67108864` and try running
a compiled program
Reviewers: simonmar, austin
Reviewed By: simonmar
Subscribers: thomie, RyanGlScott
Differential Revision: https://phabricator.haskell.org/D1405
GHC Trac Issues: #10877
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously this was introduced in D524 as a compile-time constant.
Sadly, this isn't flexible enough to allow for environments where
ulimits restrict the maximum address space size (see, for instance,
Consequently, we are forced to make this dynamic. In principle this
shouldn't be so terrible as we can place both the beginning and end
addresses within the same cache line, likely incurring only one or so
additional instruction in HEAP_ALLOCED.
Test Plan: validate
Reviewers: austin, simonmar
Reviewed By: simonmar
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1353
GHC Trac Issues: #10877
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It was possible to read non-existent memory, if we try to read the
srt_offset field of an info table when there is no SRT, and the info
table is right at the start of the text section.
This actually happened to me, I'm not sure why it never happened
before.
Test Plan: validate
Reviewers: rwbarton, ezyang, austin, bgamari
Reviewed By: austin, bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1401
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: See Note [MallocPtr finalizers]
Test Plan: validate; new test T10904
Reviewers: ezyang, bgamari, austin, hvr, rwbarton
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1275
|
|
|
|
|
|
|
|
|
|
| |
Rename StgArrWords to StgArrBytes (see Trac #8552)
Reviewed By: austin
Differential Revision: https://phabricator.haskell.org/D1233
GHC Trac Issues: #8552
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 0d1a8d09f4 added a two step allocator for 64 bit systems. This
allocator mmaps a huge (1 TB) chunk of memory out of which it does
smaller allocations. On AArch64/Arm64 linux, this mmap was failing
due to the Arm64 Linux kernel parameter CONFIG_ARM64_VA_BITS
defaulting to 39 bits.
Therefore reducing the AArch64 value for MBLOCK_SPACE_SIZE to make
this allocation 1/4 TB while remaining 1 TB for other archs.
Reviewers: ezyang, austin, bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1171
GHC Trac Issues: #10682
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The fix is a bit clunky, and is perhaps not the best fix, but I'm not
sure how much work it would be to fix it the other way (see comments
for more info).
Test Plan: T7919 doesn't crash
Reviewers: austin, rwbarton, ezyang, bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1113
GHC Trac Issues: #7919
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
[Revised version of D1076 that was committed and then backed out]
In a workload with a large amount of code, zero_static_objects_list()
takes a significant amount of time, and furthermore it is in the
single-threaded part of the GC.
This patch uses a slightly fiddly scheme for marking objects on the
static object lists, using a flag in the low 2 bits that flips between
two states to indicate whether an object has been visited during this
GC or not. We also have to take into account objects that have not
been visited yet, which might appear at any time due to runtime linking.
Test Plan: validate
Reviewers: austin, ezyang, rwbarton, bgamari, thomie
Reviewed By: bgamari, thomie
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1106
|
|
|
|
| |
This reverts commit b949c96b4960168a3b399fe14485b24a2167b982.
|
|
|
|
|
|
| |
C99 does not allow unnamed parameters in definition argument lists [1].
[1] http://stackoverflow.com/questions/8776810/parameter-name-omitted-c-vs-c
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The current OS memory allocator conflates the concepts of allocating
address space and allocating memory, which makes the HEAP_ALLOCED()
implementation excessively complicated (as the only thing it cares
about is address space layout) and slow. Instead, what we want
is to allocate a single insanely large contiguous block of address
space (to make HEAP_ALLOCED() checks fast), and then commit subportions
of that in 1MB blocks as we did before.
This is currently behind a flag, USE_LARGE_ADDRESS_SPACE, that is only enabled for
certain OSes.
Test Plan: validate
Reviewers: simonmar, ezyang, austin
Subscribers: thomie, carter
Differential Revision: https://phabricator.haskell.org/D524
GHC Trac Issues: #9706
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
In a workload with a large amount of code, zero_static_objects_list()
takes a significant amount of time, and furthermore it is in the
single-threaded part of the GC.
This patch uses a slightly fiddly scheme for marking objects on the
static object lists, using a flag in the low 2 bits that flips between
two states to indicate whether an object has been visited during this
GC or not. We also have to take into account objects that have not
been visited yet, which might appear at any time due to runtime linking.
Test Plan: validate
Reviewers: austin, bgamari, ezyang, rwbarton
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1076
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Initialising the whole group is expensive and unnecessary.
Test Plan: validate
Reviewers: austin, bgamari, rwbarton
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1071
|
|
|
|
|
|
|
|
|
|
| |
Test Plan: validate
Reviewers: austin, bgamari
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D1047
|
|
|
|
|
|
|
| |
getNewNursery() was unconditionally incrementing next_nursery, which
is normally fine but it broke an assumption in
storageAddCapabilities(). This manifested as an occasional crash in
the setnumcapabilities001 test.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In a situaion where we have some statically-linked code and we want to
load and unload a series of objects, we need the CAFs in the
statically-linked code to be retained indefinitely, while the CAFs in
the dynamically-linked code should be GC'd as normal, so that we can
detect when the code is unloadable. This was wrong before - we GC'd
CAFs in the static code, leading to a crash in the rare case where we
use a CAF, GC it, and then load a new object that uses it again.
I also did some tidy up: RtsConfig now has a field keep_cafs to
indicate whether we want CAFs to be retained in static code.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
There's a race condition like this:
# A foreign pointer gets promoted to the last generation
# It has its finalizer called manually
# We start shutting down the runtime in `hs_exit_` from the main
thread
# A minor GC starts running (`scheduleDoGC`) on one of the threads
# The minor GC notices that we're in `SCHED_INTERRUPTING` state and
advances to `SCHED_SHUTTING_DOWN`
# The main thread tries to do major GC (with `scheduleDoGC`), but it
exits early because we're in `SCHED_SHUTTING_DOWN` state
# We end up with a `DEAD_WEAK` left on the list of weak pointers of
the last generation, because it relied on major GC removing it from
that list
This change:
* Ignores DEAD_WEAK finalizers when shutting down
* Makes the major GC on shutdown more likely
* Fixes a bogus assert
Test Plan:
before this diff https://ghc.haskell.org/trac/ghc/ticket/7170#comment:5
reproduced and after it doesn't
Reviewers: ezyang, austin, simonmar
Reviewed By: simonmar
Subscribers: bgamari, thomie
Differential Revision: https://phabricator.haskell.org/D921
GHC Trac Issues: #7170
|
|
|
|
| |
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Hooks rely on static linking semantics, and are broken by -Bsymbolic
which we need when using dynamic linking.
Test Plan: Built it
Reviewers: austin, hvr, tibbe
Differential Revision: https://phabricator.haskell.org/D8
|
|
|
|
|
|
|
|
|
|
| |
#10043)
Reviewers: austin, simonmar
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D657
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Very large modules can sometimes contain very large SRT bitmaps (this
is a separate problem that I need to look into). The large bitmaps
often contain a lot of zeros, so this patch skips over empty words in
the bitmap.
It makes a dramatic difference in the particular example that I saw,
where an old gen GC was taking 0.5s before this change and 0.07s after
it.
|
|
|
|
| |
See the documentation for details.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
clearNursery resets all the bd->free pointers of nursery blocks to
make the blocks empty. In profiles we've seen clearNursery taking
significant amounts of time particularly with large -N and -A values.
This patch moves the work of clearNursery to the point at which we
actually need the new block, thereby introducing an invariant that
blocks to the right of the CurrentNursery pointer still need their
bd->free pointer reset. This should make things faster overall,
because we don't need to clear blocks that we don't use.
Test Plan: validate
Reviewers: AndreasVoellmy, ezyang, austin
Subscribers: thomie, carter, ezyang, simonmar
Differential Revision: https://phabricator.haskell.org/D318
|
|
|
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|
|
|
|
|
|
|
|
| |
This reverts commit f0fcc41d755876a1b02d1c7c79f57515059f6417.
New changes: now works on 32-bit platforms too. I added some basic
support for 64-bit subtraction and comparison operations to the x86
NCG.
|
|
|
|
|
|
|
|
|
| |
When there's a conflict between two threads evacuating the same TSO,
in some cases we would update the incall->tso pointer to point to the
wrong copy of the TSO. This would get fixed during the next GC, but
if the thread completed in the meantime, it would likely crash. We're
seeing this about once per day on a heavily loaded machine (it varies
a lot though).
|
|
|
|
| |
Signed-off-by: Austin Seipp <austin@well-typed.com>
|