| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Remove an invalid assumption about GC checking what_next field. The GC
doesn't care about what_next at all, if a TSO is reachable then all
its pointers are followed (other than global_tso, which is only
followed by compacting GC).
- Remove checkSTACK in checkTSO: TSO stacks will be visited in
checkHeapChain, or checkLargeObjects etc.
- Add an assertion in checkTSO to check that the global_link field is
sane.
- Did some refactor to remove forward decls in checkGlobalTSOList and
added braces around single-statement if statements.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes #17785
Here's how the problem occurs:
- In generation 0 we have a TSO that is finished (i.e. it has no more
work to do or it is killed).
- The TSO only becomes reachable after collectDeadWeakPtrs().
- After collectDeadWeakPtrs() we switch to WeakDone phase where we don't
move TSOs to different lists anymore (like the next gen's thread list
or the resurrected_threads list).
- So the TSO will never be moved to a generation's thread list, but it
will be promoted to generation 1.
- Generation 1 collected via mark-compact, and because the TSO is
reachable it is marked, and its `global_link` field, which is bogus at
this point (because the TSO is not in a list), will be threaded.
- Chaos ensues.
In other words, when these conditions hold:
- A TSO is reachable only after collectDeadWeakPtrs()
- It's finished (what_next is ThreadComplete or ThreadKilled)
- It's retained by mark-compact collector (moving collector doesn't
evacuate the global_list field)
We end up doing random mutations on the heap because the TSO's
global_list field is not valid, but it still looks like a heap pointer
so we thread it during compacting GC.
The fix is simple: when we traverse old_threads lists to resurrect
unreachable threads the threads that won't be resurrected currently
stays on the old_threads lists. Those threads will never be visited
again by MarkWeak so we now reset the global_list fields. This way
compacting GC does not thread pointers to nowhere.
Testing
-------
The reproducer in #17785 is quite large and hard to build, because of
the dependencies, so I'm not adding a regression test.
In my testing the reproducer would take a less than 5 seconds to run,
and once in every ~5 runs would fail with a segfault or an assertion
error. In other cases it also fails with a test failure. Because the
tests never fail with the bug fix, assuming the code is correct, this
also means that this bug can sometimes lead to incorrect runtime
results.
After the fix I was able to run the reproducer repeatedly for about an
hour, with no runtime crashes or test failures.
To run the reproducer clone the git repo:
$ git clone https://github.com/osa1/streamly --branch ghc-segfault
Then clone primitive and atomic-primops from their git repos and point
to the clones in cabal.project.local. The project should then be
buildable using GHC HEAD. Run the executable `properties` with `+RTS -c
-DZ`.
In addition to the reproducer above I run the test suite using:
$ make slowtest EXTRA_HC_OPTS="-debug -with-rtsopts=-DS \
-with-rtsopts=-c +RTS -c -RTS" SKIPWAY='nonmoving nonmoving_thr'
This enables compacting GC always in both GHC when building the test
programs and when running the test programs, and also enables sanity
checking when running the test programs. These set of flags are not
compatible for all tests so there are some failures, but I got the same
set of failures with this patch compared to GHC HEAD.
|
| |
|
|
|
|
|
| |
nonmovingSweep already clears the bitmap in the sweep loop. There is no
reason to do so a second time.
|
|
|
|
|
|
|
|
|
| |
The non-moving collector would previously walk the entire filled segment
list during the preparatory pause. However, this is far more work than
is strictly necessary. We can rather get away with merely collecting the
allocators' filled segment list heads and process the lists themselves
during the concurrent phase. This can significantly reduce the maximum
gen1 GC pause time in programs with high rates of long-lived allocations.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In copying GC, with the relevant debug flags enabled, we release the old
blocks after a GC, and the block allocator zeroes the space before
releasing a block. This effectively zeros the old heap.
In compacting GC we reuse the blocks and previously we didn't zero the
unused space in a compacting generation after compaction. With this
patch we zero the slop between the free pointer and the end of the block
when we're done with compaction and when switching to a new block
(because the current block doesn't have enough space for the next object
we're shifting).
|
|
|
|
|
|
|
| |
macOS Catalina now supports a non-POSIX-compliant version of clock_gettime
which cannot use the clock_gettime codepath.
Fixes #17906.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Previously sparks living in the non-moving heap would be promptly GC'd
by the minor collector since pruneSparkQueue uses the BF_EVACUATED flag,
which non-moving heap blocks do not have set.
Fix this by implementing proper support in pruneSparkQueue for
determining reachability in the non-moving heap. The story is told in
Note [Spark management in the nonmoving heap].
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
This fixes #17893
[skip-ci]
|
| |
|
|
|
|
|
|
|
|
| |
- Added a few comments in StgPAP
- Added a few comments and assertions in scavenge_small_bitmap and
walk_large_bitmap
- Did tiny refactor in GHC.Data.Bitmap: added some comments, deleted
dead code, used PlatformWordSize type.
|
|
|
|
|
|
|
|
|
| |
Previously we were tracing the object we were asked to mark, even if it
lives in a compact region. However, there is no need to do this; we need
only to mark the region itself as live.
I have seen a segfault due to this due to the concurrent mark seeing a
an object in the process of being compacted by the mutator.
|
|
|
|
| |
Update haddock submodule
|
| |
|
|
|
|
| |
This will help track down #17035.
|
|
|
|
| |
Update haddock submodule
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Previously we would assert that threads which are sending a
`MSG_THROWTO` message must have their blocking status be blocked on the
message. In the usual case of a thread throwing to another thread this
is guaranteed by `stg_killThreadzh`. However, `throwToSelf`, used by
the GC to kill threads which ran out of heap, failed to guarantee this.
Noted while debugging #17785.
|
|
|
|
| |
usleep was removed in POSIX.1-2008.
|
| |
|
|
|
|
|
|
|
|
| |
When requesting more than BLOCKS_PER_MBLOCK blocks allocGroup can return a
different number of blocks than requested. Here we use the number of
requested blocks, however arenaFree will subtract the actual number of
blocks we got from arena_blocks (possibly) resulting in a negative value
and triggering ASSERT(arena_blocks >= 0).
|
|
|
|
| |
If hs_try_putmvar was called through an unsafe import, it would lose track of the running cap causing a deadlock
|
| |
|
|
|
|
|
|
| |
For unregisterised builds StgRun/StgReturn are implemented via a mini
interpreter in StgCRun.c and therefore would collide with the
implementations in StgCRunAsm.S.
|
|
|
|
|
|
| |
The m32 allocator's `pages` list may contain NULLs in the case that the
page was flushed. Some `munmap` implementations (e.g. FreeBSD's) don't
like it if we pass them NULL. Don't do that.
|
|
|
|
|
|
| |
Similar to SELinux, NetBSD "PaX mprotect" prohibits marking a page
mapping both writable and executable at the same time. Use libffi
which knows how to work around it.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently compacting GC has the invariant that in a chain all fields are tagged
the same. However this does not really hold: root pointers are not tagged, so
when we thread a root we initialize a chain without a tag. When the pointed
objects is evaluated and we have more pointers to it from the heap, we then add
*tagged* fields to the chain (because pointers to it from the heap are tagged),
ending up chaining fields with different tags (pointers from roots are NOT
tagged, pointers from heap are). This breaks the invariant and as a result
compacting GC turns tagged pointers into non-tagged.
This later causes problem in the generated code where we do reads assuming that
the pointer is aligned, e.g.
0x7(%rax) -- assumes that pointer is tagged 1
which causes misaligned reads. This caused #17088.
We fix this using the "pointer tagging for large families" patch (#14373,
!1742):
- With the pointer tagging patch the GC can know what the tagged pointer to a
CONSTR should be (previously we'd need to know the family size -- large
families are always tagged 1, small families are tagged depending on the
constructor).
- Since we now know what the tags should be we no longer need to store the
pointer tag in the info table pointers when forming chains in the compacting
GC.
As a result we no longer need to tag pointers in chains with 1/2 depending on
whether the field points to an info table pointer, or to another field: an info
table pointer is always tagged 0, everything else in the chain is tagged 1. The
lost tags in pointers can be retrieved by looking at the info table.
Finally, instead of using tag 1 for fields and tag 0 for info table pointers, we
use two different tags for fields:
- 1 for fields that have untagged pointers
- 2 for fields that have tagged pointers
When unchaining we then look at the pointer to a field, and depending on its tag
we either leave a tagged pointer or an untagged pointer in the field.
This allows chaining untagged and tagged fields together in compacting GC.
Fixes #17088
Nofib results
-------------
Binaries are smaller because of smaller `Compact.c` code.
make mode=fast EXTRA_RUNTEST_OPTS="-cachegrind" EXTRA_HC_OPTS="-with-rtsopts=-c" NoFibRuns=1
--------------------------------------------------------------------------------
Program Size Allocs Instrs Reads Writes
--------------------------------------------------------------------------------
CS -0.3% 0.0% +0.0% +0.0% +0.0%
CSD -0.3% 0.0% +0.0% +0.0% +0.0%
FS -0.3% 0.0% +0.0% -0.0% -0.0%
S -0.3% 0.0% +5.4% +0.8% +3.9%
VS -0.3% 0.0% +0.0% -0.0% -0.0%
VSD -0.3% 0.0% -0.0% -0.0% -0.2%
VSM -0.3% 0.0% +0.0% +0.0% +0.0%
anna -0.1% 0.0% +0.0% +0.0% +0.0%
ansi -0.3% 0.0% +0.1% +0.0% +0.0%
atom -0.2% 0.0% +0.0% +0.0% +0.0%
awards -0.2% 0.0% +0.0% 0.0% -0.0%
banner -0.3% 0.0% +0.0% +0.0% +0.0%
bernouilli -0.3% 0.0% +0.1% +0.0% +0.0%
binary-trees -0.2% 0.0% +0.0% 0.0% +0.0%
boyer -0.3% 0.0% +0.2% +0.0% +0.0%
boyer2 -0.2% 0.0% +0.2% +0.1% +0.0%
bspt -0.2% 0.0% +0.0% +0.0% +0.0%
cacheprof -0.2% 0.0% +0.0% +0.0% +0.0%
calendar -0.3% 0.0% +0.0% +0.0% +0.0%
cichelli -0.3% 0.0% +1.1% +0.2% +0.5%
circsim -0.2% 0.0% +0.0% -0.0% -0.0%
clausify -0.3% 0.0% +0.0% -0.0% -0.0%
comp_lab_zift -0.2% 0.0% +0.0% +0.0% +0.0%
compress -0.3% 0.0% +0.0% +0.0% +0.0%
compress2 -0.3% 0.0% +0.0% -0.0% -0.0%
constraints -0.3% 0.0% +0.2% +0.1% +0.1%
cryptarithm1 -0.3% 0.0% +0.0% -0.0% 0.0%
cryptarithm2 -0.3% 0.0% +0.0% +0.0% +0.0%
cse -0.3% 0.0% +0.0% +0.0% +0.0%
digits-of-e1 -0.3% 0.0% +0.0% +0.0% +0.0%
digits-of-e2 -0.3% 0.0% +0.0% +0.0% -0.0%
dom-lt -0.2% 0.0% +0.0% +0.0% +0.0%
eliza -0.2% 0.0% +0.0% +0.0% +0.0%
event -0.3% 0.0% +0.1% +0.0% -0.0%
exact-reals -0.2% 0.0% +0.0% +0.0% +0.0%
exp3_8 -0.3% 0.0% +0.0% +0.0% +0.0%
expert -0.2% 0.0% +0.0% +0.0% +0.0%
fannkuch-redux -0.3% 0.0% -0.0% -0.0% -0.0%
fasta -0.3% 0.0% +0.0% +0.0% +0.0%
fem -0.2% 0.0% +0.1% +0.0% +0.0%
fft -0.2% 0.0% +0.0% -0.0% -0.0%
fft2 -0.2% 0.0% +0.0% -0.0% +0.0%
fibheaps -0.3% 0.0% +0.0% -0.0% -0.0%
fish -0.3% 0.0% +0.0% +0.0% +0.0%
fluid -0.2% 0.0% +0.4% +0.1% +0.1%
fulsom -0.2% 0.0% +0.0% +0.0% +0.0%
gamteb -0.2% 0.0% +0.1% +0.0% +0.0%
gcd -0.3% 0.0% +0.0% +0.0% +0.0%
gen_regexps -0.3% 0.0% +0.0% -0.0% -0.0%
genfft -0.3% 0.0% +0.0% +0.0% +0.0%
gg -0.2% 0.0% +0.7% +0.3% +0.2%
grep -0.2% 0.0% +0.0% +0.0% +0.0%
hidden -0.2% 0.0% +0.0% +0.0% +0.0%
hpg -0.2% 0.0% +0.1% +0.0% +0.0%
ida -0.3% 0.0% +0.0% +0.0% +0.0%
infer -0.2% 0.0% +0.0% -0.0% -0.0%
integer -0.3% 0.0% +0.0% +0.0% +0.0%
integrate -0.2% 0.0% +0.0% +0.0% +0.0%
k-nucleotide -0.2% 0.0% +0.0% +0.0% -0.0%
kahan -0.3% 0.0% -0.0% -0.0% -0.0%
knights -0.3% 0.0% +0.0% -0.0% -0.0%
lambda -0.3% 0.0% +0.0% -0.0% -0.0%
last-piece -0.3% 0.0% +0.0% +0.0% +0.0%
lcss -0.3% 0.0% +0.0% +0.0% 0.0%
life -0.3% 0.0% +0.0% -0.0% -0.0%
lift -0.2% 0.0% +0.0% +0.0% +0.0%
linear -0.2% 0.0% +0.0% +0.0% +0.0%
listcompr -0.3% 0.0% +0.0% +0.0% +0.0%
listcopy -0.3% 0.0% +0.0% +0.0% +0.0%
maillist -0.3% 0.0% +0.0% -0.0% -0.0%
mandel -0.2% 0.0% +0.0% +0.0% +0.0%
mandel2 -0.3% 0.0% +0.0% +0.0% +0.0%
mate -0.2% 0.0% +0.0% +0.0% +0.0%
minimax -0.3% 0.0% +0.0% +0.0% +0.0%
mkhprog -0.2% 0.0% +0.0% +0.0% +0.0%
multiplier -0.3% 0.0% +0.0% -0.0% -0.0%
n-body -0.2% 0.0% -0.0% -0.0% -0.0%
nucleic2 -0.2% 0.0% +0.0% +0.0% +0.0%
para -0.2% 0.0% +0.0% -0.0% -0.0%
paraffins -0.3% 0.0% +0.0% -0.0% -0.0%
parser -0.2% 0.0% +0.0% +0.0% +0.0%
parstof -0.2% 0.0% +0.8% +0.2% +0.2%
pic -0.2% 0.0% +0.1% -0.1% -0.1%
pidigits -0.3% 0.0% +0.0% +0.0% +0.0%
power -0.2% 0.0% +0.0% -0.0% -0.0%
pretty -0.3% 0.0% -0.0% -0.0% -0.1%
primes -0.3% 0.0% +0.0% +0.0% -0.0%
primetest -0.2% 0.0% +0.0% -0.0% -0.0%
prolog -0.3% 0.0% +0.0% -0.0% -0.0%
puzzle -0.3% 0.0% +0.0% +0.0% +0.0%
queens -0.3% 0.0% +0.0% +0.0% +0.0%
reptile -0.2% 0.0% +0.2% +0.1% +0.0%
reverse-complem -0.3% 0.0% +0.0% +0.0% +0.0%
rewrite -0.3% 0.0% +0.0% -0.0% -0.0%
rfib -0.2% 0.0% +0.0% +0.0% -0.0%
rsa -0.2% 0.0% +0.0% +0.0% +0.0%
scc -0.3% 0.0% -0.0% -0.0% -0.1%
sched -0.3% 0.0% +0.0% +0.0% +0.0%
scs -0.2% 0.0% +0.1% +0.0% +0.0%
simple -0.2% 0.0% +3.4% +1.0% +1.8%
solid -0.2% 0.0% +0.0% +0.0% +0.0%
sorting -0.3% 0.0% +0.0% +0.0% +0.0%
spectral-norm -0.2% 0.0% -0.0% -0.0% -0.0%
sphere -0.2% 0.0% +0.0% +0.0% +0.0%
symalg -0.2% 0.0% +0.0% +0.0% +0.0%
tak -0.3% 0.0% +0.0% +0.0% -0.0%
transform -0.2% 0.0% +0.2% +0.1% +0.1%
treejoin -0.3% 0.0% +0.2% -0.0% -0.1%
typecheck -0.3% 0.0% +0.0% +0.0% +0.0%
veritas -0.1% 0.0% +0.0% +0.0% +0.0%
wang -0.2% 0.0% +0.0% -0.0% -0.0%
wave4main -0.3% 0.0% +0.0% -0.0% -0.0%
wheel-sieve1 -0.3% 0.0% +0.0% -0.0% -0.0%
wheel-sieve2 -0.3% 0.0% +0.0% -0.0% -0.0%
x2n1 -0.3% 0.0% +0.0% +0.0% +0.0%
--------------------------------------------------------------------------------
Min -0.3% 0.0% -0.0% -0.1% -0.2%
Max -0.1% 0.0% +5.4% +1.0% +3.9%
Geometric Mean -0.3% -0.0% +0.1% +0.0% +0.1%
--------------------------------------------------------------------------------
Program Size Allocs Instrs Reads Writes
--------------------------------------------------------------------------------
circsim -0.2% 0.0% +1.6% +0.4% +0.7%
constraints -0.3% 0.0% +4.3% +1.5% +2.3%
fibheaps -0.3% 0.0% +3.5% +1.2% +1.3%
fulsom -0.2% 0.0% +3.6% +1.2% +1.8%
gc_bench -0.3% 0.0% +4.1% +1.3% +2.3%
hash -0.3% 0.0% +6.6% +2.2% +3.6%
lcss -0.3% 0.0% +0.7% +0.2% +0.7%
mutstore1 -0.3% 0.0% +4.8% +1.4% +2.8%
mutstore2 -0.3% 0.0% +3.4% +1.0% +1.7%
power -0.2% 0.0% +2.7% +0.6% +1.9%
spellcheck -0.3% 0.0% +1.1% +0.4% +0.4%
--------------------------------------------------------------------------------
Min -0.3% 0.0% +0.7% +0.2% +0.4%
Max -0.2% 0.0% +6.6% +2.2% +3.6%
Geometric Mean -0.3% +0.0% +3.3% +1.0% +1.8%
Metric changes
--------------
While it sounds ridiculous, this change causes increased allocations in
the following tests. We concluded that this change can't cause a
difference in allocations and decided to land this patch. Fluctuations
in "bytes allocated" metric is tracked in #17686.
Metric Increase:
Naperian
T10547
T12150
T12234
T12425
T13035
T5837
T6048
|
|
|
|
|
|
|
|
|
| |
Stack squeezing is done on context switch, not on GC or stack overflow.
Fix the documentation.
Fixes #17685
[ci skip]
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
As noted in #17606, Docker disallows the get_mempolicy syscall by
default. This caused numerous tests to fail under CI in the `debug_numa`
way. Avoid this by disabling the NUMA probing logic when --debug-numa is
in use, instead setting n_numa_nodes in RtsFlags.c.
Fixes #17606.
|
|
|
|
|
| |
Previously things like `+RTS --numa-debug` would enable NUMA support,
despite being an invalid flag.
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
We can do a heap census with a non-profiling RTS. With a non-profiling
RTS we don't zero superfluous bytes of shrunk arrays hence a need to
handle the case specifically to avoid a crash.
Revert part of a586b33f8e8ad60b5c5ef3501c89e9b71794bbed
|
|
|
|
|
|
|
| |
This seems to have regressed builds using `--with-system-libffi`
(#17520).
This reverts commit 3ce18700f80a12c48a029b49c6201ad2410071bb.
|